Stochastic emulation with enhanced partial‐ and no‐replication strategies for seismic response distribution estimation

Modern performance earthquake engineering practices frequently require a large number of time‐consuming non‐linear time‐history simulations to appropriately address excitation and structural uncertainties when estimating engineering demand parameter (EDP) distributions. Surrogate modeling techniques have emerged as an attractive tool for alleviating such high computational burden in similar engineering problems. A key challenge for the application of surrogate models in earthquake engineering context relates to the aleatoric variability associated with the seismic hazard. This variability is typically expressed as high‐dimensional or non‐parametric uncertainty, and so cannot be easily incorporated within standard surrogate modeling frameworks. Rather, a surrogate modeling approach that can directly approximate the full distribution of the response output is warranted for this application. This approach needs to additionally address the fact that the response variability may change as input parameter changes, yielding a heteroscedastic behavior. Stochastic emulation techniques have emerged as a viable solution to accurately capture aleatoric uncertainties in similar contexts, and recent work by the second author has established a framework to accommodate this for earthquake engineering applications, using Gaussian Process (GP) regression to predict the EDP response distribution. The established formulation requires for a portion of the training samples the replication of simulations for different descriptions of the aleatoric uncertainty. In particular, the replicated samples are used to build a secondary GP model to predict the heteroscedastic characteristics, and these predictions are then used to formulate the primary GP that produces the full EDP distribution. This practice, however, has two downsides: it always requires minimum replications when training the secondary GP, and the information from the non‐replicated samples is utilized only for the primary GP. This research adopts an alternative stochastic GP formulation that can address both limitations. To this end, the secondary GP is trained by measuring the square of sample deviations from the mean instead of the crude sample variances. To establish the primitive mean estimates, another auxiliary GP is introduced. This way, information from all replicated and non‐replicated samples is fully leveraged for estimating both the EDP distribution and the underlying heteroscedastic behavior, while formulation accommodates an implementation using no replications. The case study examples using three different stochastic ground motion models demonstrate that the proposed approach can address both aforementioned challenges.


INTRODUCTION
Assessment of risk in earthquake engineering applications requires proper consideration of various types of uncertainties for the seismic hazard and the infrastructure models. 1 Recent advancement of high-fidelity computational simulation models 2,3 have dramatically improved our ability to provide accurate predictions for infrastructure response through non-linear response history analysis (NLRHA), but the use of such models in assessing seismic risk creates frequently a prohibitive computational burden due to the large number of simulations needed to address the aforementioned uncertainty sources.][6][7][8][9][10] Surrogate models, also referred to as metamodels or emulators, replace the original high-fidelity model with a computationally efficient (fast) data-driven approximation developed using a small number of judicially chosen simulations, frequently referenced as computer experiments or training points.][13][14][15][16][17][18] Although attractive, surrogating seismic structural behavior is challenging because of the aleatoric uncertainty associated with the hazard description. 11This type of uncertainty originates from the complex physical mechanism in propagation/generation of earthquakes, which precludes the complete description of the seismic excitation through parametrized explanatory variables (EVs). 1,19,20Depending on the approach adopted for modeling the seismic excitation (ground motion time-history in context of NLRHA) 21 this variability may be expressed as high-dimensional or nonparametric uncertainty.From the perspective of the numerical simulations to predict the engineering demand parameters (EDPs) of interest, this yields simulation models that are stochastic rather than deterministic: the response output for the same explanatory seismic input parameters (e.g., intensity measure vector) will vary across simulations depending on the exact selection of the ground motion according to the underlying aleatoric uncertainty in the hazard description. 20his variability is large and crucial to the downstream analysis of structural performance estimation 1 necessitating for a high-precision estimation of the underlying EDP distribution.Meanwhile, the level of variability depends significantly on structural and seismic explanatory inputs 22 (representing a heteroscedastic behavior) increasing the complexity of the risk assessment.
4][25][26][27][28][29][30][31] Many of the stochastic emulation approaches leverage specially designed configuration of simulation experiments called replications, established by repeating the simulation for identical input values.Such replications of the stochastic simulation code can be used to provide sample-based estimates of the response statistics of the output, such as the non-stationary mean and heteroscedastic variance across replications, and are used directly by the stochastic emulator to approximate the output distribution. 23,26lbeit the demonstrated effectiveness of such techniques, the computational burden associated with the need to perform replications for the simulation experiments limits their applicability for many practical applications with expensive simulation models.This challenge has been further addressed by recent developments that reduce 26 or even entirely avoid 29,31 the need for replications by introducing latent variables to approximate the underlying statistics, instead of establishing sample-based approximations.
In the field of earthquake engineering, stochastic emulation techniques were recently adopted 32 for describing the EDP distribution, utilizing Gaussian Process (GP) regression (also referred to as kriging) as foundational surrogate model technique.In this instance, the non-deterministic features of the structural simulations originate from the aleatoric component of the seismic hazard variability.The work in ref. [32] builds closely upon the developments in refs.[23, 26]  and relies on the concept of partial-replications to achieve computational efficiency, formulating a framework which incorporates both replicated and non-replicated simulation experiments for the emulator development.In particular, the replicated samples are used to obtain variance estimates at the sampled training points, and an auxiliary GP, termed secondary GP, is then introduced to denoise the sample variance observations and interpolate them to construct a continuous mapping of a variance-field across the input space.Then, one can constrain the relative scales of the variance (i.e., the heteroscedastic characteristics) to train a primary GP to approximate the EDP distribution, exploiting both replicated and non-replicated samples. 32The approach is computationally attractive as, by introducing the point estimate of variance, it avoids complex alternative solutions for approximating the heteroscedastic behavior, 26,[33][34][35] but has two drawbacks: (i) it always requires some minimal replications; and (ii) the information from non-replicated samples is unused during the approximation of the heteroscedastic variance.Both these drawbacks originate from the fact that the variance approximation in the secondary GP leverages estimates of the sample variance which cannot be provided from the non-replicated samples.
To address the two limitations, this work proposes a modified version of the partial-replication-based stochastic emulation approach that fully utilizes the information from both replicated and non-replicated samples.In particular, it enables the use of non-replicated samples in the variance estimation by adopting the techniques discussed in refs.[30, 36] to replace the observations of sample variance with the squared sample deviation from the mean.To estimate this deviation a primitive mean estimate at each sample location is required, obtained by introducing an additional, tertiary, GP model relying on a homoscedasticity assumption.The latter reliance is justified by the results reported in ref. [32], showing that the mean field estimation is relatively robust to the variance approximation model, even when the underlying observations have high heteroscedastic aleatoric variability.Accordingly, the mean of the response is first approximated using the tertiary GP under the simplified assumption of constant aleatoric variance.Then, taking advantage of the knowledge of the mean, the response variance is approximated in terms of sample squared deviations from the mean, as opposed to the original approach that relied on the sample variance estimation.In this way, not only replicated but also non-replicated samples can contribute in developing the secondary GP for the variance-field.This further enables the framework to be used without any replications, allowing the computational budget previously used for the replication to be used to further explore the unexplored part of the input domain.Meanwhile, this workflow inherits an attractive property of the original strategy, that optimization across latent features is not required, as the latter task may create overfitting challenges to the observation data for problems with high levels of aleatoric uncertainty or large number of exploratory variables. 26The proposed strategy is demonstrated utilizing stochastic ground motion models to describe the seismic hazard, examining three different parameterization settings to explore the impact of the different sources of excitation aleatoric uncertainty. 22

SEISMIC RISK ASSESSMENT USING SURROGATE MODELS
8][39] The most influential types of uncertainty are associated with the seismic excitation, and can be broadly divided into two classes.The first class pertains to key seismicity characteristics and/or ground motion features.Depending on the approach adopted to describe the seismic hazard, these may pertain to the moment magnitude, rupture location and local soil properties, to underlying excitation characteristics such as arias intensity, significant duration and frequency content, or to properly chosen intensity measures.This class represents the parametric explanatory input variables that may be used to describe the dominant hazard features in the problem formulation, but can only partially explain the structural response variability.The remaining variability, that is the portion that cannot be described using such EVs, is attributed, as discussed in the introduction, to miscellaneous sources of randomness and represents the second class of uncertainty in the hazard description, 20 corresponding to aleatoric randomness.Depending on the modeling approach for describing the seismic excitation, this uncertainty source might not have a parametric description (corresponds to latent features of the excitation model), or might correspond to a high-dimensional stochastic sequence. 32Within any predictive modeling setting, this randomness needs to be treated as inherent uncertainty in the model output given the values of the seismic hazard EVs.
Additionally, the structural model might have its own uncertain parameters, such as material and geometric properties, that need to be used as EVs when estimating structural response.
Based on this uncertainty description, the objective of the surrogate modeling within the seismic risk assessment context is to create a mapping between all the EVs, that is, seismicity features and structural parameters, and the EDP distribution of a target structure, where the distribution randomness is attributed to the aleatory uncertainties. 15,32,40To formalize this concept, let us denote the structural response (EDP) by z, the explanatory seismic excitation characteristics by x h , and the uncertain structural parameters by x s .The augmented vector of excitation and structural EVs will be denoted by  = { ℎ ,   } ∈ ℝ   , with n x representing its dimension.The goal of the surrogate model is to produce an approximation to the distribution (|), with notation "|" introduced to denote conditional relationship.It should be pointed out that the estimation of this distribution is one of the key components of modern seismic risk assessment practices. 1,3Once this distribution as a function of x is accurately approximated, the propagation of all uncertainties associated with x, to accommodate the risk assessment, can be efficiently performed, for example, using standard Monte Carlo Simulation (MCS) techniques. 32,41tochastic emulation techniques establish directly an approximation for the full EDP distribution, (|), instead of approximating, for example, the EDP statistics. 11Specifically, GP-based emulation is considered in this work for the (|) approximation, representing one of the most popular stochastic emulation techniques. 23,26The foundation of the GP emulation is the approximation of the chosen response y as a realization of a GP.Instead of approximating directly the EDP, in seismic risk assessment applications it is recommended to approximate its logarithm, setting y = ln(z) for the GP output. 32This agrees with the lognormal assumption for (|) frequently utilized in earthquake engineering. 1he underlying GP has some mean trend function over -typically expressed by a vector of basis functions  () and their regression coefficients -and stationary auto-covariance model denoted as σ2 (,  ′ |) with kernel parameters  and process variance σ2 . 8,42Leveraging the so-called nugget 27,43 to establish a regression over the training data and address the aleatoric randomness, the stochastic emulation model yields the following distribution approximation 32 where (, ) stands for a Normal distribution with mean a and variance b, ỹ() and  2 () represent the predictive mean and variance for the GP and  2 () the nugget variance.Note that the nugget variable  ∼ (0,  2 ()) is a zero mean normal variable statistically independent across the location , 27,43 that is,  can be interpreted as white noise with varying scale respective to .When  2 () = 0, the GP regression reduces to an exact interpolation which is useful only when one has noiseless observation of (well-posed) deterministic simulators.When  2 () =  2 in Equation ( 1) is a constant, the problem reduces to a simpler homoscedastic case.In this case, only one parameter  2 needs to be identified to describe the aleatoric variance scale, though it has been shown 32 that this is a poor approximation for describing hysteretic structural response distributions.Therefore, the emphasis is herein on how to address a heteroscedastic nugget variance  2 () across the domain .Appendix A presents a generic GP formulation covering all aforementioned variants.The first two implementations, that is, no nugget and homoscedastic nugget, are obtained by selecting as output v = y = ln(z) and adopting the simplified assumptions reviewed in Table A1.

REVIEW OF STOCHASTIC EMULATION WITH PARTIAL REPLICATIONS
This section briefly reviews the stochastic emulation with partially replicated samples. 32This approach leverages two GP models: the primary one with heteroscedastic nugget approximates the response z(x), while its nugget variance τ 2 (x) is approximated by the secondary GP.
For formulating the GP metamodel, a database of observations is first developed by performing simulations for N t different input configurations {  ;  = 1, … ,   }, representing the GP training points.The training points should cover the entire input domain, denoted as X herein, where the surrogate model is anticipated to be used for, so that any extrapolations are avoided.For each training point, the EDP prediction is repeated a total of n i times yielding observations { () ;  = 1, ..,   }.These represent replications of the output for the same input x i , that is, y i(k) = y (k) (x i ), within a stochastic emulation/simulation setting, and are obtained by utilizing different realizations for the aleatoric hazard description for the same selection for the excitation EVs   ℎ .For example, in the context of NLRHA utilizing stochastic ground motion models, these realizations correspond to acceleration time-histories obtained by utilizing different stochastic sequences. 32erein superscripts will be used to distinguish the training points and superscripts in parenthesis will be used to distinguish the replications.Following ref. [32] let us consider a special configuration of partially replicated training set, where first N r samples have n r replications (  =   for i ≤ N r ) and remaining N n = N t -N r samples are non-replicated samples (  = 1 for N r < i ≤ N t ).The set of training points for which replications are available will be denoted as X p .The total number of high-fidelity simulations for the stochastic emulation training is N sim = n r N r +N n .
For the training inputs with replications (X p ), unbiased estimates for the sample variance are obtained as: These estimates can serve as noisy observations to train a secondary GP model, denoted GP τ herein, to predict the continuous field of the response variance.The noise in these observations originates from the variability in the sample variances of Equation (2) [sample-based estimation error], and can be addressed using a homoscedastic nugget in the GP formulation. 32he predictions of GP τ are utilized to describe the heteroscedastic behavior of the EDP distribution, denoted as τ2 (), and scaling the nugget variance as  2 () =  2  τ2 (), where  2  is a scaling parameter to be calibrated based on the available observations, formulates then the primary GP, denoted as GP y .The workflow for this stochastic emulation using the partial replications, denoted PR-SE, is summarized by the following steps 32 : Step 1: Estimation of sample variance: For the X p training points {  ;  = 1, ..,   }, the available replications { () ;  = 1, ..,   } are used to obtain the sample-based estimates ( τ2 )  for the aleatoric variance using Equation (2).
Step 2: Approximation of heteroscedastic variance by GP τ : Train a GP model that approximates the response variance across the input domain using the observation set {  , ( τ2 )  ;  = 1, ..,   }.Since the variance is strictly positive, following recommendations in ref. [26], its logarithm is actually approximated (though note this is not necessary).This further justifies 32 the use of a homoscedastic nugget that greatly simplifies the GP τ formulation.The GP τ follows the Ho-NR formulation (Table A1) discussed in Appendix A utilizing output selection  = ln( 2 ), number of training points N s = N r , and provides approximation ln( τ2 ()) given by Equation (A2).Formulation entails the hyper-parameter calibration given by Equation (A6) utilizing observation set for the replicated samples {  , ( τ2 )  ;  = 1, ..,   }.
Step 3: Estimation of the response by GP y : Using τ2 () as unscaled variance component, formulate heteroscedastic GP model that approximates the response distribution across the domain of interest utilizing all available observations {  ,  () ;  = 1, … ,   ,  = 1, … ,   }.The GP y follows the He-R formulation ( As discussed in the introduction, drawbacks of this stochastic emulation strategy are that: (i) replication is always needed in Step 1 to obtain the observation of Equation (2) utilized in Step 2 to establish the GP τ ; and (ii) the samples without replications are utilized only in Step 3 for the calibration of GP y and not in Step 2 for the calibration of GP τ , even though correct predictions for GP τ play a critical role in accurately estimating the EDP distribution.Both issues arise from the fact that the variance estimation in Equation (2) hinges on only the the replicated samples.Such discrepancies in the information utilized from the replicated and non-replicated samples, creates trade-offs in the choice of the number and size of replications, that is, given the total computational budget N sim , having more replications restrict the exploration of domain (smaller N t ), while more non-replication samples will lead to poor variance estimation (smaller N r or n r ).This conflicting benefits have been also investigated by numerical experiments 32 which showed that, given number of total model evaluations, the overall accuracy improves as more replications are included only until the ratio reaches certain point at which it starts to decrease.Presumably, this conflict arises mainly because the non-replicated sample did not contribute to improving the accuracy in the response variance approximation.Furthermore, generating sufficient replications may not be always feasible.This can occur, for example, when x h represents a higher-dimensional intensity measure vector, and one cannot find multiple ground motions for all desired x h values, limiting the ability to consider replications for them.To address these challenges an enhanced replication strategy is discussed next, allowing stochastic emulation even for the limiting case without any replications.

Algorithmic implementation of the improved stochastic emulation
The objective of the improved stochastic emulation implementation is to utilize all points (even the non-replication points) for the variance-field approximation in Step 2 of the PR-SE algorithm discussed in Section 3. Following ref. [30], this is achieved by introducing the squared sample deviation to replace the sample variance utilized in that step, so that information from both replicated and non-replicated samples can be utilized.The sample deviation is defined as the distance of the response observation from some primitive estimate of the response mean, denoted m(), and its square is expressed as This sample deviation can be used as an alternative to Equation ( 2) for the variance-field observations, and requires to first establish the prediction of m().Different smoothing techniques [44][45][46] can be used to approximate m(), and the one adopted here is to introduce an additional GP with homoskedesity assumption. 30,36This tertiary GP will be denoted herein as GP m .The workflow for this updated stochastic emulation algorithm, denoted ER-SE, is shown in Figure 1 and summarized below.Note that the major difference between PR-SE and ER-SE lies in Steps 1 and 2, while Step 3 is identical for the two implementations.
Step 1: Preliminary estimation of the mean response GP m and approximation of the sample squared deviation: Train a GP model that establishes a crude approximation of the mean response using a homoscedastic nugget assumption and all available observations {  ,  () ;  = 1, … ,   ,  = 1, … ,   }.The GP m follows the Ho-R formulation (Table A1) discussed in Appendix A utilizing output selection  = ln(), number of training points N s = N t , homoscedastic nugget selection, and provides approximation for the mean across all training points m(  ), provided by Equation (A2).Formulation entails the hyper-parameter calibration given by Equation (A6) utilizing observation set composed of both the replicated and non-replicated samples, {  ,  () ;  = 1, … ,   ;  = 1, … ,   } and {  ,   ;  =   + 1, … ,   }, respectively.Then, obtain for each observation the squared deviation from the mean utilizing Equation ( 3) and the mean field approximation m(  ).
Step 3: Estimation of the response by GP y : Using τ2 () as unscaled variance component, formulate heteroscedastic GP model that approximates the response distribution across the domain of interest utilizing all available observations {  ,  () ;  = 1, … ,   ,  = 1, … ,   }.Implementation is identical to Step 3 of PR-SE algorithm.
The major difference between the variance estimators of the original (GP τ in PR-SE) and alternative (GP s in ER-SE) stochastic emulation formulations is that the original [utilizing Equation (2) as observations] relies on the pure sample variance, while the modified one [utilizing Equation (3) as observations] treats the mean estimate obtained from the tertiary GP model as the true mean (population mean) and approximates variance as the sample squared deviation from the mean.This difference allows the latter to accommodate the non-replicated samples for approximating both the heteroscedastic behavior (Step 2) and the final EDP distribution (Step 3).It also means that formulation can be adopted without any replications.These characteristics address the two challenges identified in Section 3. Figure 2 summarizes the utilized information in each step of PR-SE and ER-SE, illustrating that only the latter fully utilizes the information from all samples in all three steps.Note that while PR-SE can be used without any non-replicated samples, the best practice suggested in previous literature 32 is to facilitate both replicated and non-replicated samples.
Of course, it should be noted that the proposed method is heuristic in a way that it assumes that the mean estimate obtained from the crude homoscedastic nugget assumption, m(), gives a reasonably reliable estimate.This assumption is justified by past studies, 22,32 that have shown that for seismic applications the mean-field prediction is relatively robust to the choice of the variance model, especially compared to the level of variability observed in the sample replications.This was also observed in more general, non-seismic, applications investigated in ref. [30].Furthermore, note that if the final estimation of mean ỹ() (from Step 3) is significantly different from the initial estimate m() (from Step 1), then one may iterate Step 2−3 by replacing m() with the ỹ() until the two functions become more similar.Note that, as mentioned in ref. [30], this iteration is rather heuristic and is not guaranteed to converge.
Α final remark is warranted for the Gaussianity of the observations and potential prediction bias for the GP s and GP τ surrogate models.The sample variance follows a chi-squared distribution with k = n i degrees of freedom 46 and therefore the output observation used in GP s (for ER-SE) are expected to be more skewed than the observations used for GP τ (for PR-SE), deviating more from the underlying Gaussianity assumption.Nevertheless, the mean estimate of each GP is known to be the best linear unbiased estimator (BLUE) for given basis functions and covariance kernel assumption, even when the Gaussianity condition is violated.However, the log-transformation in Step 2 may introduce bias when converting the predictions back to the original space to obtain τ2 ().This bias is partially accounted for when estimating  2 () through the calibration of the variance scaling parameter  as detailed in Appendix A, that is, the average of variance across the domain is unbiased.This applies to both the ER-SE and PR-SE formulations, though it is expected to be of a greater utility for the case with larger potential bias ER-SE, due to the larger deviation of the observations from the underlying Gaussianity assumption.

Illustration for single degree of freedom application
To illustrate the key differences in the PR-SE and ER-SE implementations NLRHA for a single degree of freedom (SDoF) oscillator is utilized.The SDoF has a modal damping equal to 3% and a restoring force that corresponds to an elastic-perfectly plastic hysteresis model with yield displacement δ y = 2 cm and pre-to-post yield ratio of 0.3.For describing the acceleration time-history seismic excitation a stochastic point source model 47 is utilized, with characteristics identical to the ones adopted in study. 11This model provides sample realizations of the excitation using as inputs the moment magnitude M, and the rupture distance, r rup .The range for M is considered to be [5.5 7.5], while r rup is taken to be 20 km.
For the EDPs of interest, both the peak SDoF displacement and peak absolute acceleration are considered and results are presented for the median and logarithmic standard deviation.Reference estimates for these statistics are obtained using a large number of n r = 300 replications to accomplish high accuracy for the statistical estimates.The predictions for the EDP distribution statistics using PR-SE and ER-SE for a total of N sim = 70 simulations are examined.For PR-SE, 71.4% of the simulations corresponds to replications with n r = 5, leading to N t = 20, N r = 10.For ER-SE, two setups are considered, the first one is identical to the one examined for PR-SE and the second uses no replications.Training points are obtained using space-filling sampling form M, yielding independent set of samples for the two PR-SE variants.Figure 3  1 of PR-SE implementation (under homoscedastic assumption) is shown as the dotted line.Figure 4 presents the same results for the acceleration EDP. Figure 5 presents the curves corresponding to the median and median ± one logarithmic standard deviation for both EDPs (column of figure) for the same surrogate model cases presented earlier in Figures 3  and 4. Figure 5 facilitates ultimately a more direct comparison for the distribution results based on the individual statistics reported in Figures 3 and 4.
Results verify first of all the trends reported in ref. [32], indicating significant aleatoric uncertainty in the hazard description and heteroscedastic behavior, with level of variance in EDPs strongly dependent on seismic intensity.All implementations yield good accuracy for the median EDP predictions, but big differences can be observed for the logarithmic standard deviation estimates (right column of Figures 3 and 4), and, ultimately, the overall probabilistic characterization of the aleatoric response variability (Figure 5).With respect to the predictions for the EDP variability (logarithmic standard deviation) ER-SE slightly outperforms PR-SE under the partial replication setting, while the accuracy substantially improves for the ER-SE implementation without any replications.The comparisons for the partial replication setting show that, as expected, the enrichment of the sample coverage allowed by the use of non-replicated samples for the secondary surrogate model in ER-SE can offer tangible benefits.The comparison for the setting with no replications demonstrates that this coverage benefit can be further extended by removing replications altogether.More importantly, comparisons for both the EDP median and logarithmic standard deviation, reveal that there is no evident penalty in avoiding (or not being able − as discussed earlier) to use replications.
Finally, it is important to note that even under homoscedasticity assumption (PR-SE primitive case) the median EDP predictions appear to be quite accurate, especially considering the sample variability arising from high aleatoric uncertainty.This observation aligns well with the core assumption taken in PR-SE algorithm that allows the replacement of (  ) with m(  ) in Equation (3).Of course, the final PR-SE predictions improve upon this primitive mean, indicating that all suggested steps in the PR-SE implementation are essential components.

ILLUSTRATIVE EXAMPLE
The illustrative example revisits the examples discussed in refs.[22, 32] to allow a direct comparison between the original (PR-SE) and updated (ER-SE) stochastic emulation algorithms across different settings for quantifying seismic hazard and the underlying aleatoric variability.The example considers the approximation of the distribution for peak drift and absolute floor acceleration EDPs for a three-story concrete moment resisting frame (MRF), with seismic excitation described through different stochastic ground motion models.

Structural model
The model for the three-story concrete MRF building is described in detail in ref. [48].The uncertain structural model parameters 22,32 include the damping ratio ζ s and the characteristics of the constitutive laws for the concrete and steel materials: (1) the maximum compressive stress f c and the strain at maximum stress ε c for concrete, and (2) the modulus of elasticity E s , the yielding stress f y and the strain hardening ratio a for steel.Common characteristics across all floors are assumed for the steel and different ones for each floor for the concrete, following standard realistic construction applications, with f c1 , f c2 and f c3 denoting the compressive strength for the first, second and third floors, respectively, and ε c1 , ε c2 and ε c3 denoting the strain at maximum stress for the same floors.This leads to The input domain for each of these parameters is reviewed in Table 1 and is chosen so that it extends at least 2.5 standard deviations from the median of the probability distribution-also shown in Table 1-which is used later on in the seismic risk assessment implementation (Section 5.5).This guarantees that the domain of the surrogate model formulation is broad enough for the intended purposes of its use.

Excitation models and aleatoric uncertainty description
The seismic excitation is described using a stochastic ground motion model, established 47,49,50 by modulating a highdimensional stochastic sequence through functions that offer the desired frequency and time-domain characteristics for the earthquake acceleration time-history (excitation).These functions are parameterized through a group of functional variables (defining the ground motion model features), such as ground motion duration and arias intensity.For the probabilistic hazard quantification these variables are subsequently connected through predictive relationships to seismicity and local site characteristics, such as the moment magnitude, M, the rupture distance, r rup , the fault type, and the shear wave velocity for the top 30 meters of soil V s30 .Notations x g and x m will be used herein to describe, respectively, the vectors describing (i) the ground motion features and (ii) the seismicity characteristics.The predicting models relating x g to x m can be purely deterministic 47 or correspond to a probabilistic mapping p(x g |x m ). 49,50or selecting the EVs, x h , used to describe the seismic hazard in surrogate modeling context, three alternative options exist. 22These result in different definitions for the input x h (and subsequently for x), and to different selections for the source of aleatoric variability.As discussed in detail in ref. [22] the alternatives should be carefully evaluated in each application in order to limit the dimensionality of x h while maintaining a lognormal distribution under the influence of the aleatoric variability.The latter is critical for accommodating the underlying Gaussian assumption (for addressing aleatoric variability) invoked within the GP-based stochastic emulation strategy.The first option, denoted EV 1 , is to choose x h = x m , that is, EVs consisting of only the seismicity characteristics, with aleatoric uncertainty in the hazard description corresponding to both the stochastic sequence utilized in the ground motion model and the variability in the probabilistic mapping p(x g |x m ).This option reduces the dimensionality of the input but might face challenges related to the underlying assumption of Gaussian aleatoric distribution.The second option, denoted EV 2 , is to choose x h = x g , that is, EVs consisting of only the ground motion features, with aleatoric uncertainty originating strictly from the stochastic sequence.Note that if x g is known then the stochastic ground motion model is completely defined, and one does not need to additionally know x m in order to create a realization of the excitation, in other words x g is statistically sufficient to define the EDP distribution (|  ,   ) = (|  ,   ,   ).The Gaussian distribution assumption for the aleatoric variability has been shown to consistently hold in this case (with aleatoric variability originating only from stochastic sequence), but, depending on the ground motion characteristics, the dimensionality of the input might prohibitively increase.The final option, denoted EV 3 , is to consider a partitioning   = [ 1   2  ] and choose  ℎ = [   1  ], that is, EVs composed of the seismicity characteristics and a subset of the ground motion features, with aleatoric uncertainty corresponding to both the stochastic sequence and the variability in the probabilistic mapping for the remaining ground motion features ( 2  |  ,  partitioning of x g, EV 3 can reduce the dimensionality of the input while also maintaining an underlying Gaussian aleatoric distribution.Further details for these selections, including extensive discussion on the relevant implications when the surrogate model is subsequently used to quantify seismic risk can be found in ref. [22].Note that if predictive models are deterministic, then only EV 1 exists, with aleatoric uncertainty in this instance corresponding strictly to the stochastic sequence utilized in the ground motion model 32 and Gaussianity assumption illustrating good agreement.To establish comprehensive comparisons three different stochastic ground motion models covering all aforementioned options for the EV selection are considered here, reviewed in Table 2.The assumption of lognormal EDP distribution under the assumed (in each case) aleatoric variability has been shown to provide high statistical accuracy for all three options in previous studies. 22,32he first ground motion model, denoted GM 1 corresponds to a point source stochastic model, 47 with deterministic predictive relationships.The ground motion model characteristics identical to the ones adopted in study. 11EV 1 is adopted for the EVs with both the moment magnitude, M, and the rupture distance, r rup , considered as input parameters, leading to  ℎ = {,   }.The input domain for M is chosen as [5 8] and for r rup as [3 60] (km).This setup is identical to the one considered in ref. [32], and was chosen based on the intended probability models used for the risk assessment in that study.
The second ground motion model, denoted GM 2 corresponds to the record-based stochastic ground motion model developed by Rezaeian and Der Kiureghian. 49EV 2 is adopted for the EVs with ground motion features corresponding to the arias intensity I a , the significant duration D 5-95 , the time corresponding to 50% of the intensity t mid the associated median spectral frequency ω mid , the rate of change for that frequency  ′ , and the damping ratio ζ d for the excitation spectrum, leading to  ℎ = {  ,  5−95 ,   ,   ,  ′ ,   }.The predictive relationships for this model, relating x m to x g consisting of M, r rup , V s30 and fault type, can be found in ref. [49].The input domain for  ℎ is defined based on these predictive relationships within the range of model applicability, 49 M in [6 8] and r rup in [10 60] (km), and deterministic values for V s30 = 760 m/s and strike slip fault.This setup is identical to the one considered in ref. [22], and was chosen based on the intended probability models used for the risk assessment in that study.
The third ground motion model, denoted GM 3 corresponds to the record-based stochastic ground motion model developed by Vlachos, Papakonstantinou and Deodatis. 50The ground motion model parameter vector x g in this case is composed of the arias intensity I a , the duration of the excitation T d , the shape c d and scale d d parameters for the energy accumulation envelope, and for each of the two modes of a bi-modal excitation power spectrum (e = 1,2), the damping ratio {  } defines the energy-based variation.The predictive relationships for this model, relating x m to x g consisting of M, r rup , and V s30 , can be found in ref. [50].To reduce input dimensionality, EV 3 is adopted for the EVs with  ℎ = {,   ,   ,   } following recommendations in ref. [22].The input domain for  ℎ is defined as M in [6 8], r rup in [10 60] (km) while for {  ,   } this domain is chosen, similarly to GM 2 based on the underlying predictive relationships. 50As in the case for GM 2 , this setup is identical to the one considered in ref. [22], and was chosen based on the intended probability models used for the risk assessment in that study.

Implementation details and validation metrics
The three excitation models discussed in Section 5.2 lead to three different surrogate model formulations.The total number of model parameters, composed of x s and x h , is n x = 12 for GM 1 , n x = 16 for GM 2 , and n x = 14 for GM 3 .The investigation focuses on the comparison of the baseline PR-SE approach discussed in Section 3 and the proposed here modification ER-SE discussed in Section 4. Since both accommodate partial replications, we further investigated the benefit offered from replications by exploiting different mixtures of replicated and non-replicated training samples.In particular, two different values of replication size (n r = 5 and 10) are investigated, as well as the varying numbers of the total simulations (N sim = 1500, 1000, and 500).Similarly, chosen cases of N n are N n = 200, 400, 600, 800, 1000, and 1500.Among the listed N n and N sim values, the pairs that satisfy N n < N sim are investigated in the original PR-SE implementation, and N n = N sim cases (i.e., no replications) are additionally included for the proposed ER-SE implementation, as the latter became applicable after the enhancements offered in Section 4. Note that, for example, when (N sim = 1500, N n = 200, n r = 10) is chosen, the number of replicated points was N r = (N sim -N n )/n r = 130.Similarly, when (N sim = 1500, N n = 1500) is chosen, regardless of n r , all the samples in the training sets are non-replication points and N r is zero.The training points are selected using space filling Latin Hypercube sampling in the input domains X defined in Sections 5.1 and 5.2, while the formulation is repeated 15 times in each case for different realizations of the training samples.Linear basis functions are utilized for the influential input parameters, corresponding to the moment magnitude and/or arias intensity depending on the surrogate model variant (depending on whether these parameters are included in the EVs definition or not), while a generalized exponential correlation function was used for the correlation kernel.Hyper-parameter calibration is performed, as discussed in Appendix A, using a pattern-search algorithm.The value of the nugget was chosen to be larger than 1, reflecting the large degree of aleatoric variability in the seismic hazard description and the additional statistical variability that comes from the estimation of sample values for the mean and squared deviation from the mean using a limited number of replications.The latter is especially true for the no-replication case.Following ref. [22], to facilitate consistent comparison across the variant implementations, the validation is established with respect to the conditional EDP distribution (or statistics) for a specific seismic scenario and structural configuration {  ,   }, as this distribution can be estimated across all of them, and moreover this is a distribution that is frequently explicitly encountered in seismic loss assessment and performance based earthquake engineering applications. 51,52To simplify discussions, notation   = {  ,   } will be also used to describe the input vector used in the validation stage.The surrogate model for explanatory variables EV 1 provides directly the approximation of the distribution for y = ln(z), but for EV 2 and EV 3 , this conditional distribution is obtained through the total probability theorem, respectively, as: where p(ln()|  ,   ) in Equation ( 4) and p(ln()|  ,   ,  1  ) in Equation ( 5) are the distributions approximated in the respective implementations for selection of x h = x g (for EV 2 ) and  ℎ = [   1  ] (for EV 3 ), respectively.Leveraging the computational efficiency of the underlying surrogate model the desired conditional distributions and statistics can be readily calculated using MCS with large number of samples. 22alidation is performed using a test-sample setting, considering N v = 1000 samples obtained through Latin Hypercube Sampling (LHS) within X, {   ,    ;  = 1, … ,   } = {   ;  = 1, … ,   }.As in refs.[22, 32] the surrogate model predictions are then compared to reference results obtained using Monte Carlo estimates corresponding to n r = 200 replications.To evaluate the overall prediction error in the response distribution, the Kullback-Leibler (KL) divergence measure is used, quantifying the discrepancy between the estimated and reference probability distributions, as in ref. [53].The averaged KL value over the N v inputs is utilized as validation metric, given by: Details for estimation of the integral in Equation ( 6), including establishing the Monte Carlo approximation for (|  ,   ) are included in ref. [22].The estimation of p(ln()|  ) based on the stochastic emulation predictions is discussed in Appendix B. Note that the KL divergence is invariant to transformation for the EDP, so whether it is calculated as difference of the distributions for z or ln(z) the result is identical.Estimation for ln(z) as detailed above is preferred, as this avoids any transformation for the stochastic emulation Gaussian predictions.The KL divergence values are dimensionless and have only a lower bound (zero), attained when the compared distributions are identical.To better frame the divergence values, the reference comparisons discussed in ref. [32] may be used.In the case that the two distributions examined are Gaussians and have the same mean, but their standard deviations differ by 0.1, then the KL divergence value is 0.14.On the other hand, if the standard deviation is the same, but the mean differs by 0.1 standard deviations, then the  KL divergence value is 0.1.If the discrepancy is 0.05 instead of 0.1, the KL divergence values for the two aforementioned cases are 0.07 and 0.051, respectively.Additionally, the first and second order EDP statistics, corresponding to median and logarithmic standard deviation are compared, utilizing as validation metric the correlation coefficient (cc) between surrogate model predictions and reference values.For the logarithmic standard deviation, the cc over the test-sample is estimated as: )) 2 where s ( with (

Validation results and Discussion
The validation results are presented in Figures

Percentage of training points with no replications (%)
F I G U R E 7 Averaged KL divergence across all floors for inter-story drift EDPs for PR-SE (left column) and ER-SE (right column) implementations for ground motion GM 1 with explanatory variables EV 1.The validation results are shown for different percentage of samples with no replications (abscissas), for N sim equal to (A) 500 and (B) 1500, and for n r equal to 5 (first row) and 10 (second row).Box plots characteristics are identical to the ones in Figure 6.EDP, engineering demand parameter; KL, Kullback-Leibler.
replications (N n /N sim ×100), where N n ∈ [200, 400, 600, 800, 1000, 1500] and N sim ∈ [500, 1000, 1500] as previously discussed, representing different computational budget allocation scenarios.Note that 100% (i.e., "no replication" implementation) is presented only for ER-SE approach because the implementation without replication cannot be accommodated by the previous PR-SE implementation.Figure 6 shows the results averaged across all EDPs for GM 1 with EV option EV 1 and for all three considered N sim values for n r = 10.This figure allows an easier comparison of the performance for different cases with respect to the total computational budget, but averages across all EDPs.In contrast, the remaining figures present results separately for the average (across floors) inter-story drift and absolute acceleration EDPs, but only for selective cases for N sim .Figures 7, 9, 11 present results for the drift EDPs for the three ground motion models considered in Table 2, respectively, while Figures 8, 10  2. For these figures, only the mean performance across 15 repetitions is shown, and the EDP results are averaged across all floors.To allow easier comparisons the same scale is used in the y-axis for the implementations that use the same computational budget (same N sim ) but to allow for better resolution across the different variants examined for each N sim that scale is modified across the different N sim cases.
Results in Figure 6-12 show that all observed cases show good prediction accuracy for N sim = 1500, with AD KL values being consistently smaller than 0.1.As expected, the prediction accuracy for the target distribution reduces (AD KL increases) as N sim reduces, but still, the majority of results (apart from some cases for the PR-SE implementation) are lower than 0.15.The cc results in Figures 13-15 show excellent match to the reference for the predicted median response (values of cc m close to 1) but larger differences for the logarithmic standard deviation (lower cc s values), agreeing with the results reported earlier in Figures 3-5 and the identified challenges for accurately approximating the heteroscedastic variance.Comparing the same implementation for drift and acceleration EDPs, prediction of the latter appears to be more challenging.The results for n r = 5 tend to yield better performance, as this choice allows for the use of a larger number of training points.These trends with respect to the impact of n r , N sim and N t , on the stochastic emulation performance and the relative greater challenges for predicting the acceleration EDPs agree with the trends reported in previous studies. 22,32nterested readers are referred to these studies for more detailed discussions and comparisons for these specific aspects of the stochastic emulation implementation.
The more interesting comparison is, of course, with respect to the different implementations PR-SE and ER-SE.The trends differ across the different ground motion and EV selection cases.These differences appear to be mostly related to the EV selection, with the implementation corresponding to EV 1, and therefore emulator input corresponding directly only to the features with respect to which the validation is performed, exhibiting different trends compared to the implementations using EV 2 or EV 3 .The results for EV 1 selection (GM 1 ) in Figures 6-8 and 13 show that, as the percentage of training points without replications increases, the accuracy of ER-SE continues to increase until the percentage reaches 100%, while that of PR-SE might start to decay from 40%−60%, depending on what is the value of N sim .This trend is especially evident for smaller number of simulations (smaller total computational budget), demonstrating that the proposed enhancements can support substantially improved computational efficiency.The improvements are consistent for all EDP types and across all replication sizes.Figure 13, in particular, clearly demonstrates that the reduction of accuracy for PR-SE is related to the challenge in predicting the variance.The prediction accuracy for the median consistently increases as the percentage of non-replication points increases while that of the variance decays.The latter can be explained by the reduction of information on the variance that is fed into the secondary GP for PR-SE.However, albeit the decay of variance prediction, exploration offered by non-replication points provides a larger benefit in the median prediction as evidenced by the increasing trend of the accuracy.Still, the case of 80% non-replication shows that a substantial error in the variance estimation can actually lead to poor median prediction.On the other hand, ER-SE utilizes fully the information from the training points without replications for training the secondary GP, and this allows ER-SE to show a consistent variance prediction accuracy across decreasing values for the percentage of no replication points.The predictions for the median initially follow a similar trend to the PR-SE approach, but it continues to increase until it reaches the 100% non-replication case.This is explained by non-degraded performance in the variance prediction.The variance performance is similar when the replication points dominate the training samples, suggesting that addition of non-replication points does not provide any benefit in such cases.However, as the proportion of non-replication points increases, they clearly dominate the variance predictions and restore the level of accuracy observed in the heavily replicated cases.It is important to note that the implementation without any training points with replications (100% for the ratio in the x-axis across all figures) shows the best performance, implying that the improved coverage of the training domain achieved by circumventing replications offers meaningful benefits.This agrees with the trends for the SDoF implementation examined in Section 4.2.
The benefits of ER-SE become smaller, and consistently appear only for the lower computational budget of N sim = 500, for the other two ground motion models and EV selections, that is, GM 2 with EV 2 (Figures 9, 10, and 14) and GM 3 with EV 3 (Figures 11, 12, and 15).For the larger N sim values, cases are observed in which PR-SE outperforms the ER-SE, though ER-SE ends up giving comparable performance to the best instance of PR-SE if the total computational budget is assigned to cases with no replications.In particular, Figures 14 and 15 indicate that PR-SE and ER-SE have comparable performance both for median and variance, and that the performance rank between the two is manifested primarily as sample fluctuations in the variance prediction, rather than meaningful performance trends.The comparisons still indicate a preference towards ER-SE over PR-SE, though at a smaller degree compared to the previous instance, as only the former can accommodate implementation without any replications for which, according to the dominant trend, we expect the best performance, while circumventing challenges of creating the replications as reported earlier.Looking into more detail on the replication size n r , for ER-SE small only impact is observed, while PR-SE appears to prefer smaller replication sizes, n r = 5, especially when training sample size is small, N sim = 500.This trend is consistent with the results in refs.[22, 32] and implies that the loss in the domain coverage (fewer training points) for a large replication size has a more negative impact in PR-SE, presumably because of the reduction of the training points available for the secondary surrogate model development-something that is avoided in ER-SE by leveraging all training points (even the ones without replications) for that development.

Implementation and validation for seismic risk estimation
The validated metamodels are now used for the seismic risk assessment of the three-story structure.The setup is identical to the one adopted in refs.[22, 32].For quantifying the seismic risk, the probability distributions for the structural parameters are the ones presented in Table 1, while for the uncertain hazard parameters the following are adopted: for GM 1 based on ref. [32] the rupture distance, r rup , a lognormal distribution with median 15 km and a coefficient of variation 50% is assumed, whereas for the moment magnitude, M, the Gutenberg-Richter model truncated in interval [M min , M max ] = [5 8] is utilized, leading to the probability density function () =    −   ∕[ −   min −  −   max ].The seismicity parameter for M is taken to be   = 0.9log  (10).For convenience these probability distributions for the structural and hazard parameters will be denoted as (  ) and (  ), respectively.For GM 2 and GM 3 implementations, and based on ref. [22] the statistics for r rup are adjusted to median 20 km and coefficient of variation 30% while the truncation interval for M to [6 8].These differences are necessitated by the domain of applicability for each of the three ground motion models.Risk is quantified as the probability that the EDP will exceed a certain threshold β, corresponding to the complementary cumulative distribution function (CCDF).For estimating this risk, the total probability theorem is utilized to propagate the various sources of uncertainty, while Monte Carlo (MC) simulation is used for the numerical estimation.Utilizing NLRHA for the actual excitation and structural model, the reference seismic risk estimate for the EDP of interest z, and threshold β, is expressed as: where N h is the total number of samples used for the MC estimation, with z h representing the hth response sample corresponding to structural configuration  ℎ  ∼ (  ), seismic hazard description  ℎ  ∼ (  ) and aleatoric hazard description as prescribed by the underlying excitation model.I[.] stands for the indicator function which is one if the quantity within the brackets holds, else it is zero.Utilizing the surrogate model approximation for the EDP distribution, it is straightforward to show that the seismic risk is approximated as follows 22,32 : where and N q is the total number of samples used for the MC estimation, Φ[.], stands for the standard Gaussian cumulative distribution function, x q is the qth sample of the surrogate model input vector, following the appropriate probability distributions based on the definition of this vector (selection of EV): for explanatory variables EV   8) requires the execution of NLRHAs, while ỹ(  ) and  2 (  ) +  2 (  ) in Equation ( 9) the simple evaluation of the (already calibrated) surrogate model, accommodating substantial computational savings even if N q >> N h .Similar to refs.[22, 32] to reduce any MC variability in the estimates, both N h and N q are assigned a very high value, equal to 5000 samples, and common random numbers are used for the hazard and structural parameters to facilitate the comparison between the curves (establishing a consistent MC error estimation).
The CCDF prediction obtained by Equation ( 9) is compared with the reference analysis obtained by MCSs using Equation (8), and the results are presented selectively in Figures 16-18.The predictions from both no replication (ER-SE) and partial replication (ER-SE and PR-SE) implementations are presented, separately for each inter-story displacement EDP and the absolute floor acceleration EDP. Figure 16 shows the prediction under ground motion GM 1 and implementation setup is N sim = 1000 for no replication case (part A) and additionally n r = 10 and N t = 400 for partial replication cases (parts B and C).Similarly, Figures 17 and 18 respectively present the prediction under ground motion GM 2 and GM 3 for N sim = 1500, n r = 5, and N t = 600, where the latter two values apply only to partial replication implementations.Note that results corresponding to higher computational budget are presented for GM 2 and GM 3, since for lower total computational budgets, the accuracy is decreased due to the higher input dimension associated EV 2 and EV 3 .
As expected from the high accuracy in the previous validation results, the risk estimates of all the inspected cases show good match with the reference curves.This is an important outcome, indicating that the validation setting can provide a faithful indication of the quality of the risk estimates when the trained surrogate model is subsequently used in a risk assessment setting.Minor improvements in the accuracy are observed in the drift EDP by introducing proposed method with no replication strategy.Overall, the risk assessment results agree with the ones reported in previous studies, 22,32 demonstrating the benefits (high accuracy with great computational efficiency) that can be achieved through stochastic emulation strategies within seismic risk and loss estimation.

CONCLUSIONS
This paper offered an advancement of the recently developed partial-replication-based stochastic emulation framework, established by incorporating more holistically information from training points without replications.The original framework leverages a secondary GP to estimate the heteroscedastic variability of the primary GP model, with the latter providing the desired approximation for the EDP response distribution.In this setting, among the partially replicated training data, only the portion with replications is utilized for the inference of the variance-field.The proposed enhancement additionally considers the non-replicated sample portion for this objective, by training the secondary GP based on observations of the deviation of non-replication samples from an approximate (primitive) mean estimate.This primitive mean estimate is obtained leveraging a homoscedastic GP.Compared to the previous partial replication approach, the enhanced approach in this study (i) significantly increases the contribution of no replication samples in estimating the heteroscedastic variance, while it ultimately (ii) enables an implementation without any replications, accommodating applications for which creating such replications across the entire surrogate model input domain can be challenging.Such merits were demonstrated for earthquake engineering applications.
A comprehensive case study was presented to validate the proposed method, considering three different setups with respect to the aleatoric seismic hazard uncertainty description within NLRHA-based seismic risk assessment.The case study results demonstrated that the complete engagement of non-replication training points can provide a better overall performance.Though differences were reported across the different setups, the enhanced approach shows similar or better performance to the original formulation.Noticeable performance improvement was observed as the proportion of the non-replicated samples increases, especially when the total computational budget (total number of NLRHA simulations) is maintained moderately low.Results indicate that the enrichment of the training domain coverage allowed by the use of non-replicated samples for the secondary surrogate model can offer tangible benefits and that this benefit can be further extended by removing replications altogether.Noticeably, the enhanced approach consistently provided very good performance even for implementation without any replications, demonstrating that it offers an attractive alternative to the existing approach or even a viable solution for applications for which that approach could not be adopted (inability to perform replications).Interesting future research topics include the adaptive selection of simulation experiments for improving prediction accuracy across both the mean and variance stochastic fields and the extension of the stochastic emulation framework to predict the correlation of the EDP responses.

2
Schematic comparison of the flow of information in ER-SE and PR-SE.

3
Displacement EDP statistics (left column for median and right column for logarithmic standard deviation) for implementations with replications (top row) and without replications (bottom row).Both the reference results and the approximation for PR-SE and ER-SE are shown.EDP, engineering demand parameter.

4
presents the estimates of the median (right column) and the logarithmic standard deviation (left column) for the displacement EDP for the PR-SE and ER-SE implementation with replications (top row) or the ER-SE implementation with no replications (bottom row).The observation data utilized for the surrogate model development are also shown, with training points corresponding to replications depicted separately.Additionally, the primitive estimation of median obtained after Step Acceleration EDP statistics (left column for median and right column for logarithmic standard deviation) for implementations with replications (top row) and without replications (bottom row).Both the reference results and the approximation for PR-SE and ER-SE are shown.EDP, engineering demand parameter.

5
EDP response (left column for displacement and right column for standard deviation) for implementations with replications (top row) and without replications (bottom row).Both the reference results and the approximation for PR-SE and ER-SE are shown for the median and the median ± one logarithmic standard deviation.EDP, engineering demand parameter.
the three parameters for determining the modal participation of each of the modes {

𝑗𝑣)
corresponding to the reference statistics and s(   ) corresponding to the emulation-based estimate.Identical equation holds for the cc of the median EDP, cc m .Details for the estimation of the relevant statistical quantities for the different stochastic emulation variants are included in Appendix B.
, 12 present results for the absolute floor acceleration EDPs.Additionally, the effect of replication sizes (n r ) is investigated in Figures 7-12 showing results for both n r = 5 and n r = 10 values.The remaining Figures 13-15 present the cc validation metrics for the predicted median cc m (top row) and logarithmic standard deviation cc s (bottom row) for both drift (left column) and acceleration (right column) EDPs for different N sim values for both ER-SE and PR-SE implementations for n r = 10.Each of these figures corresponds to a different ground motion (and EV) selection from the ones reviewed in Table

F I G U R E 8
Averaged KL divergence across all floors for absolute floor acceleration EDPs for PR-SE (left column) and ER-SE (right column) implementations for ground motion GM 1 with explanatory variables EV 1.The validation results are shown for different percentage of samples with no replications (abscissas), for N sim equal to (A) 500 and (B) 1500, and for n r equal to 5 (first row) and 10 (second row).Box plots characteristics are identical to the ones in Figure6.EDP, engineering demand parameter; KL, Kullback-Leibler.

F I G U R E 9
Averaged KL divergence across all floors for inter-story drift EDPs for PR-SE (left column) and ER-SE (right column) implementations for ground motion GM 2 with explanatory variables EV 2. The validation results are shown for different percentage of samples with no replications (abscissas), for N sim equal to (A) 500 and (B) 1500, and for n r equal to 5 (first row) and 10 (second row).Box plots characteristics are identical to the ones in Figure6.EDP, engineering demand parameter; KL, Kullback-Leibler.

F I G U R E 1 0
points with no replications (%) Averaged KL divergence across all floors for absolute floor acceleration EDPs for PR-SE (left column) and ER-SE (right column) implementations for ground motion GM 2 with explanatory variables EV 2. The validation results are shown for different percentage of samples with no replications (abscissas), for N sim equal to (A) 500 and (B) 1500, and for n r equal to 5 (first row) and 10 (second row).Box plots characteristics are identical to the ones in Figure6.EDP, engineering demand parameter; KL, Kullback-Leibler.

F I G U R E 1 1
points with no replications (%) Averaged KL divergence across all floors for inter-story drift EDPs for PR-SE (left column) and ER-SE (right column) implementations for ground motion GM 3 with explanatory variables EV 3. The validation results are shown for different percentage of samples with no replications (abscissas), for N sim equal to (A) 500 and (B) 1500, and for n r equal to 5 (first row) and 10 (second row).Box plots characteristics are identical to the ones in Figure6.EDP, engineering demand parameter; KL, Kullback-Leibler.

3 F I G U R E 1 7
No replication (ER-SE) (B) Partial replications, 40% samples without replications (ER-SE) (C) Partial replications, 40% samples without replications (PR-SE) Comparison of CCDF curves for the high fidelity model (reference curve) and the surrogate model approximations under the ground motion model GM 2 with explanatory variables EV 2. Shown are the results for selective cases of no replication and partial replication (N t = 600 and n r = 5) implementations with N sim = 1500.Results are presented separately for the inter-story displacement EDP (left column) and the absolute floor acceleration EDP (right column).CCDF, complementary cumulative distribution function; EDP, engineering demand parameter.

3 F I G U R E 1 8
No replication (ER-SE) (B) Partial replications, 40% samples without replications (ER-SE) (C) Partial replications, 40% samples without replications (PR-SE) Comparison of CCDF curves for the high fidelity model (reference curve) and the surrogate model approximations under the ground motion model GM 3 with explanatory variables EV 3. Shown are the results for selective cases of no replication and partial replication (N t = 600 and n r = 5) implementations with N sim = 1500.Results are presented separately for the inter-story displacement EDP (left column) and the absolute floor acceleration EDP (right column).CCDF, complementary cumulative distribution function; EDP, engineering demand parameter.

Table A1
Train primary surrogate model GP y to estimate final mean and variance fields using {x i , y i(k) ; i=1,…,N r , k=1,…n r } and {x i , y i ; i= N r +1,…,N t } and formulation He-R + unscaled variance estimates .
* Points with replications are not necessary Structural parameter input domain for the surrogate model development and the associated probability distributions used in the risk assessment.
TA B L E 1 Different excitation models considered in the numerical examples.
1  ).By appropriate TA B L E 2 Comparison of CCDF curves for the high fidelity model (reference curve) and the surrogate model approximations under the ground motion model GM 1 with explanatory variables EV 1. Shown are the results for selective cases of no replication and partial replication (N t = 400 and n r = 10) implementations with N sim = 1000.Results are presented separately for the inter-story displacement EDP (left column) and the absolute floor acceleration EDP (right column).CCDF, complementary cumulative distribution function; EDP, engineering demand parameter.
|  ).Note that z h in Equation (