Statistical modeling of fully nonlinear hydrodynamic loads on offshore wind turbine monopile foundations using wave episodes and targeted CFD simulations through active sampling

Accurately determining hydrodynamic force statistics is crucial for designing offshore engineering structures, including offshore wind turbine foundations, due to the significant impact of nonlinear wave – structure interactions. However, obtaining precise load statistics often involves computationally intensive simulations. Furthermore, the estimation of statistics using current practices is subject to ongoing discussion due to the inherent uncertainty involved. To address these challenges, we present a novel machine learning framework that leverages data-driven surrogate modeling to predict hydrodynamic loads on monopile foundations while reducing reliance on costly simulations and facilitate the load statistics reconstruction. The primary advantage of our approach is the significant reduction in evaluation time compared to traditional modeling methods. The novelty of our framework lies in its efficient construction of the surrogate model, utilizing the Gaussian process regression machine learning technique and a Bayesian active learning method to sequentially sample wave episodes that contribute to accurate predictions of extreme hydrodynamic forces. Additionally, a spectrum transfer technique combines computational fluid dynamics (CFD) results from both quiescent and extreme waves, further reducing data requirements. This study focuses on reducing the dimensionality of stochastic irregular wave episodes and their associated hydrodynamic force time series. Although the dimensionality reduction is linear, Gaussian process regression successfully captures high-order correlations. Furthermore, our framework incorporates built-in uncertainty quantification capabilities, facilitating efficient parameter sampling using traditional CFD tools. This paper provides comprehensive implementation details and demonstrates the effectiveness of our approach in delivering reliable statistics for hydrodynamic loads while overcoming the computational cost constraints associated with classical modeling methods.


| INTRODUCTION
2][3] As wind turbine power levels continue to increase, there is a growing demand for larger OWT structures.However, a significant challenge faced by the offshore wind industry is to ensure the reliability and survivability of these structures while simultaneously reducing costs.[6][7][8][9] Accurately capturing the aero-hydro-servo-elastic loads on OWT structures is essential.Specifically, precise analysis of the ultimate limit state (ULS) and fatigue limit state (FLS) enables less conservative designs and, consequently, cost reduction.To adhere to current best practices, such as designing based on the extreme loads corresponding to a 50-year return period, [10][11][12] engineers require prior knowledge of the short-term and long-term extreme load statistics on critical structural components.The extreme value theory is crucial for estimating the distribution of long-term extreme loads by leveraging information from short-term extreme loads. 13While the IEC standard provides recommendations for estimating 50-year loads using statistical extrapolation methods, 14 it allows flexibility for designers to choose the specific details of the statistical procedure. 126][17][18] These techniques are accompanied by fitting functions, including normal, Weibull, three-parameter Weibull, Gumbel, and Lognormal distributions, to extrapolate the data.The choice of statistical extrapolation methods significantly impacts estimation results, although limited research has compared their effects.Addressing the challenge of handling large variations in environmental conditions, Monte Carlobased sampling procedures offer a more realistic approach. 19,20However, basic randomization may exhibit slow convergence in achieving desired 50-year return period estimates.Implementing these methods in engineering design poses challenges, with a primary focus on validating selected extrapolation methods through comparison with measurement or simulation data to ensure accuracy and reliability.Additionally, capturing dynamic load behavior and assessing extrapolation method performance necessitates a substantial number of transient simulations and/or physical experiments.
For industry and engineering purposes, semi-empirical analytical models are available and are commonly used for the determination of extreme loads in OWT structures. 21,22Although lower fidelity numerical models are fast, they cannot capture high nonlinear phenomena while the peak loads are captured accurately.This is a critical issue for the reliable design of the offshore wind turbine foundations.In the literature, several studies are available that utilize CFD simulations to examine the monopile-type foundation in breaking waves, [23][24][25] analyze the slamming effects, 26,27 investigate the flow and scour around the monopile 28,29 and identify the second-order hydrodynamic loads. 30,31The high-fidelity simulations provide an accurate solution yet demand expensive computational cost.3][34] Developing surrogate models to capture the behavior of offshore wind turbines is essential during the design stage.These models offer a cost-effective alternative to numerical modeling and experiments, enabling efficient and reliable design processes.They support various analyses, including sensitivity analysis, optimization, uncertainty quantification, and statistical reconstruction for defining ULS and FLS.[42][43][44] In addition, scientists have addressed the question of experimental design for these studies-what waves to simulate-with a number of methods, including stochastic wavegroups, 45,46 critical wavegroups, [47][48][49][50] equivalent waves, 51 reduced order wavegroups, 40,52,53 and Karhunen-Loève (KL) wave episodes. 54While previous studies have explored surrogate models using methods like polynomial-chaos expansion and Kriging, 55 there is a growing interest in utilizing machine learning approaches.However, according to Elyasichamazkoti and Khajehpoor, 56 most existing studies develop surrogate models focusing on specific areas such as fault detection, wind speed and power forecasting, power optimization, and control.Limited research has been conducted on creating surrogate models that accurately capture the dynamic response of offshore wind turbines.For instance, Ahmad et al 57 utilized artificial neural networks to develop a surrogate model for the aero-hydro-servo elastic performance of hybrid floating offshore wind turbines.The model was trained on mid-fidelity data; however, there is a need to develop surrogates that can accurately capture extreme dynamic phenomena by utilizing high-fidelity datasets.This work introduces a novel framework for modeling the short-term probability distribution of extreme loads on the monopile foundation of OWTs across several sea states.As the increasing power levels of wind turbines require larger monopiles with significant diameters, the nonlinear wave-monopile interaction, especially in breaking wave conditions, is crucial to consider.Therefore, we focus on the hydrodynamic loads.This study takes the advantage of the predictive capabilities and the computational simplicity of the GPR machine learning method jointly with the efficiency of the active sampling method to develop a surrogate model.The surrogate maps the relationship between irregular wave episodes and hydrodynamic force time series, enabling the estimation of extreme load statistics for various sea states through Monte Carlo sampling.The surrogate model is trained using high-fidelity load datasets from CFD simulations, capturing strong nonlinear hydrodynamic effects like wave breaking and snap loads.The study validates the predicted probability distributions against steady-state CFD simulations, confirming the method's validity and significant computational cost savings.A unique feature of the framework lies in the employed techniques.Specifically, the spectrum transfer approach utilizes simulation data from multiple sea states to describe both quiescent and extreme waves, reducing the need for additional numerical simulations.To handle time series prediction via the surrogate modeling, dimension reduction techniques such as KL and PCA are employed, while the GPR model accounts for nonlinear dependencies, including those associated with extreme events.While similar techniques have been used in ship hull load calculations, 54 wave episode methods have not been applied to estimate hydrodynamic loads on monopile structures before.
Overall, this work presents a pioneering framework to model short-term extreme hydrodynamic loads statistics on monopile foundations showcasing its potential in improving the understanding and prediction of extreme loads in offshore wind turbine design.
We structure this paper as follows.In Section 2, we briefly recapitulate the design of KL wave episodes first described in Guth and Sapsis. 54In Section 3, we describe the techniques we use to build the surrogate model: GPR, active sampling, and spectrum transfer.In Section 4, we describe our CFD simulation design in the OpenFOAM software, and describe some of the challenges we faced adapting this approach to nonlinear waves.In particular, we describe both how we account for nonlinear dynamics when the KL wave episodes have Gaussian statistics, as well our approach to generating steady state validation data.Finally, in Section 5, we present results for three sea states with H s f5m,9m,13mg to represent quiescent, intermediate, and very extreme seas, respectively.9][60] Additionally, we include Appendix A with details about the CFD simulations setup and the characteristics of examined wave episodes.

| IRREGULAR WAVE EPISODES
Our goal is to estimate the hydrodynamic load statistics for an offshore wind turbine monopile foundation subjected to a selected sea state.Traditionally, a sea state realization (i.e., steady state) is approximated as an irregular wave record with duration from 30 min to 3 h.To avoid the enormous computational cost involved with CFD simulations of this duration, we instead perform a number of shorter simulations with carefully selected irregular wave episodes, which are defined from the wave spectrum of the examined sea state.We initially focus on the selection of the irregular wave episodes.However, we cannot create a GPR surrogate model to learn the relationship between the wave episode time series and hydrodynamic force time series.Therefore, we need to represent the time series in a reduced order form.
We follow Guth and Sapsis 54 for constructing irregular wave episodes using a KL basis and a parametrizing coefficient vector.The choice of basis corresponds to the choice of sea state, while the choice of coefficients corresponds to a particular sample from that sea state.In the remainder of this section, we will briefly recapitulate this method (Figure 1).

| Wave episode representation
The sea surface elevation xðξ,tÞ is a stochastic process, which we assume to be zero mean, statistically stationary, and described by Gaussian statistics.Additionally, we assume that waves are long-crested (unidirectional).We use the JONSWAP spectrum 61 to describe the time series xðξ ¼ 0, tÞ at a fixed spatial location: where constants are given by and F is the wind fetch, g is the acceleration due to gravity, and U 10 is the wind speed at 10 m elevation.In this work, we consider multiple sea states, each of which with peak period T p ¼ 2π ωp ¼ 8s and with a range of significant wave height crest to trough of 5m ≤ H s ≤ 13m ¼ 8s, while the significant wave height H s ¼ 5 m in the mild sea state and H s ¼ 13 m in the extreme sea state.The option to examine mild and extreme sea states is taken to emphasize the nonlinear effects associated with high and steep waves.
8][49] In the random phase method, the frequency space is discretized between ½ω min , ω max (with spacing δ ω ), and the recovered signal is given by where ω i is sampled uniformly between ½ω iÀ1 , ω i , ϕ i is a random phase drawn uniformly on ½0,2π, and a i is coefficient given by However, the random phase model does not provide the low dimensional parametrization necessary to apply ML techniques.Instead, we follow Guth and Sapsis 54 in building a basis set corresponding to a particular sea state and interval.We present the KL theorem 62,63 : Theorem 2.1 Karhunen Loève.Consider the stochastic process xðtÞ that is zero mean and square integrable on the probability space ðΩ,F ,PÞ.Define the covariance function with corresponding integral operator over the interval ½0,T, Then by Mercer's Theorem for every interval ½0,T the operator T Kx has an orthonormal basis of eigenvectors fê i,T ðtÞg and corresponding eigenvalues fλ i g.Moreover, the coefficients are centered orthogonal random variables: Furthermore, we can expand the random process xðtÞ as In summary, the eigenvectors of the spatial covariance matrix of the sea surface form an orthonormal basis.The decomposition of xðtÞ onto this basis produces a set of centered, orthogonal (in the random sense) coefficients.In particular, we can change back and forth between the function representation xðtÞ,t ½0, T and the coefficients representation α i , i ¼ 1,2,:::.In the literature, this procedure is also known as principle component analysis (PCA), proper orthogonal decomposition (POD). 64For this work, we will always refer to this general technique as KL.
We truncate the KL expansion in Equation ( 8) in a finite number of modes, n.In this way, we may represent the stochastic process on the interval ½0,T as an n-dimensional vector α of KL coefficients, each component of which is an orthogonal random variable with variance λ i .Each irregular wave episode corresponds to a distinct choice of coefficients.
Finally, we note that we stochastically extend the wave episode outside of the interval ½0, T by use of a Gaussian process extrapolation technique. 54This is important for physical realization of the sea surface elevation over an extended domain, and for the initialization phase of the OpenFOAM simulation.

| Wave episode generation
Applying the KL expansion, we represent each wave episode as a truncated series of KL mode coefficients, the vector α.The truncation order, n, determines both what fraction of energy included in the wave episode is captured by the KL reduced order wave, as well as the dimensionality of the reduced wave episode.For this work, we chose a truncation order of n ¼ 3 to balance fidelity against dimensionality, and in Section 3.2, we describe our method for choosing the vector α.
We follow the same technique described in Guth and Sapsis 54 to convert from the coefficient vector α to the OpenFOAM input.As we describe in Section A.2.2 of Appendix A, the OpenFOAM wave maker requires a series of linear waves with prescribed amplitude, period, and phase.These prescribed wave components are fixed on the interval ½20,52 s, where T ¼ 32 s is the KL interval length and T pre ¼ 20 s is the numerical initialization time associated with the ramp time in OpenFOAM simulations.We emphasize, that while OpenFOAM is uses these linear waves to define the boundary conditions, especially at the inlet, within the numerical wave tank the wave evolution is fully nonlinear.
Then, we extrapolate the signal onto the interval ½0,20 using the stochastic prelude technique described in Guth and Sapsis 54 that allows for statistically-consistent extrapolation.Specifically, this technique generates encounter conditions, which are statistically consistent with the sea spectrum but also transition smoothly to the prescribed wave episode.Finally, we perform a discrete Fourier transform (DFT) on this extended time series, and retain the N ¼ 90 Fourier components with greatest magnitude.This DFT truncation order was chosen to balance simulation costs against reconstruction fidelity.
To summarize, OpenFOAM simulates the wave episodes for 52 s, where the first 20 s are an initialization and ramp-up period and the rest is the real simulation time.We note that during data analysis, we drop the prelude and shift the time scale forward by T pre ¼ 20 s so that our data records always correspond to the interval ½0,32 s.

| Breaking wave criterion
Before performing the CFD simulations, we want to have a priori-knowledge whether the analytical irregular wave episodes appear breaking wave behavior.They are assessed in relation to the breaking wave criterion, 65 which is defined by where H b is the breaking wave height, T p is the peak wave period, L p is the wave length, k is the wave number, and d is the water depth.For irregular waves, 0.12 < A < 0.18.The wave number, k is estimated from the dispersion relation: To employ the breaking wave criterion for the examined wave episodes, T p = 8 s, d = 33 m, and the breaking wave height, H s , is equal to the maximum wave height in the 32 s duration irregular wave episode.The values are summarized in Appendix A: Table A2, Table A3, and Table A4.
Figure 2 presents the wave breaking assessment of the analytical wave episodes.The left plot refers to the wave episodes obtained from the quiescent sea state.Most of the waves are below the breaking limit.Wave episodes 310 and 451 constitute the exception, with the former being just above the lower breaking limit and the latter being above the upper breaking limit; thus, it is very likely to break.The right plot refers to the wave episodes obtained from the extreme sea state, and they are all above the upper breaking limit; therefore, they are expected to break.
While in Figure 2, we presented the simulated wave episodes compared to the breaking limit; in Figure 3, we compare the recovered sea states in comparison to the breaking limit.We see that for the quiescent H s ¼ 5m case, the significant wave height is well below the wave breaking limit, while for the very extreme H s ¼ 13m, the significant wave height nearly exceeds the upper limit.In the intermediate H s ¼ 9 m case, the significant wave height is approximately the lower limit, suggesting that nonlinear breaking effects will be important only for fraction of the steeper waves encountered in the steady state.

| Hydrodynamic force representation
In OpenFOAM, while the structure is modeled to interact with the irregular wave episode, a number of kinematic and dynamic quantities are recorded.We choose the total hydrodynamic force in the x-direction as the primary quantity of interest.An important obstacle we encounter is that the total force is a time series.That is to say, we cannot describe the force-on-structure across the whole interval ½20,52 s with a single scalar quantity.In order to fit the output into the optimal experimental design framework, we first represent the time series as a low dimensional vector q, and each component, q i , of the vector is treated separately.Specifically, we employ the following reduced order representation for the force: where the q i are the reduced order coefficients and the μi,T ðtÞ are the force modes.We represent each q i as a separate output component.We use n out ¼ 12 modes to describe the force, chosen to recover more than 99% of the signal energy.For this problem, we compute the output modes by applying a separate KL expansion to the training data, 54 a set of force-on-structure time series associated with the particular wave episodes we simulated.We additionally note that, while the reduced order model used to represent the force is linear, we are able to represent non-Gaussian loads and load statistics through the q i model, described in the next section.
The analytical wave episodes are located based on their wave breaking limit.Left: At the quiescent sea state (H s ¼ 5m), the majority of the wave episodes are below the breaking region (orange), except the wave episode 451, which exceeds the maximum breaking limit (blue).The wave episode 310 is probable to break as it is above the minimum breaking limit.Right: In the extreme sea state (H s ¼ 13m), all the wave episodes are above the maximum breaking limit (blue); therefore, breaking wave phenomena is expected to occur.
F I G U R E 3 Comparison of the three sea states considered (H s f5m,9m,13m) with the breaking wave criterion from Equation 9.

| Gaussian process regression
In this section, we describe the construction of a data-drive surrogate model used to estimate the load coefficients q i using the GPR technique.
For an introduction to Gaussian process theory, see Rasmussen and Williams. 35Briefly, the Gaussian process constructs a normal distribution for every choice of α given by where n is the aleatoric uncertainty, K, K * , and K * * are kernel matrices, and Q is the observed output data vector.The mean of this posterior distribution is the model's best prediction of the loads associated with a particular wave episode, and the variance is a combination of epistemic uncertainty (from limited data) and aleatoric uncertainty (from irreducible randomness, e.g., associated with the stochastic prelude).
For our application, we use a squared exponential kernel with automatic relevance determination (ARD), given by where M ¼ diagðl i Þ.The kernel hyperparameter σ k is a measure of variability due to choice of α, and the l i are a measure of the length scale associated with input dimension.The final hyperparameter, not associated with the kernel but part of the GPR scheme, is the aleatoric uncertainty σ 2 n (also referred as irreducible uncertainty), sometimes interpreted as a regularization parameter as in ridge regression.
In Equation (11), we give a representation of the output time series F x ðtÞ as weighted sum of basis vectors.We use this representation to build a separate surrogate for each output mode as a function of the wave episode vector α.That is, for each i ½1,n out , we build a distinct Gaussian process to estimate a posterior distribution This collection of independent scalar models allows us to model a low dimensional projection of the full time series F x ðtjαÞ.We can sample from the combined model by sampling each q i individually from the Gaussian process posteriors, and then combining the coefficients using Equation (11).While the variance for each q i is sampled independently, the means μ i ðαÞ are allowed to vary non-parametrically.This allows for the short-term distribution of each q i to have non-Gaussian (joint) statistics, despite each conditional distribution in Equation ( 16) having Gaussian form.To restate, the Gaussian distribution inherited from GPR represents the uncertainty quantification of the model, not the predicted shortterm statistics.

| Active sampling of training data
In order to build the surrogate models, we require a set of training data-representative wave episodes, and the associated force-on-structure time series calculated with OpenFOAM.The training data should be representative of both quiescent and extreme wave episodes, because the GPR model is more reliable in interpolation than extrapolation.We build our training set through two steps.
First, we uniformly sample wave episodes from the space of possible wave episodes.We perform this step by using Latin Hypercube sampling from a hyper box with side length D ¼ 2z * ffiffiffi ffi λ i p , where ffiffiffi ffi λ i p is the KL length scale associated with the i th component of α, and z * ¼ 3 is a control parameter that balances between extreme and quiescent waves.We choose Latin hypercube sampling in order to avoid the 'clumps and voids' that are associated with independent sampling.Figure 9 (right) also shows the spectral decay associated with those modes.We indeed see a major drop after the the first 3 directly excited modes-the higher order output modes primarily represent the effects of nonlinear physics.
Second, we make use of the initial uniform data points in order to optimally choose subsequent simulation designs for additional training data.
Active learning, or Bayesian optimal experimental design, is a method to use existing data, along with a surrogate model and an acquisition function, to design later experiments.The new training data can be added to refine the intermediate model in a loop. 43,66r this problem, we use Likelihood Weighted Uncertainty Sampling (LW-US), an acquisition function developed in the literature 42,43 to preferentially sample inputs likely to contribute to the output statistical tailsrare events.The functional form of the acquisition function is given by u LWÀUS ðαÞ ¼ p A ðαÞ p q ðqðαÞÞ σ 2 q ðαÞ ð 17Þ where p A ðαÞ is the probability of the encountering the wave episode in the steady state (from the KL construction), p q ðqðαÞÞ is the likelihood associated with qðαÞ from the surrogate, and σ 2 q ðαÞ is the posterior variance associated with the surrogate.In the standard formulation, the LW-US acquisition function is designed for scalar outputs.Following Guth and Sapsis, 54 we make two changes when borrowing u LWÀUS ðαÞ.First, because we represent time series using a set of multiple output modes, we associate with each mode q j a separate acquisition function u LWÀUS,j ðαÞ.During the active sampling loop, we iterate through these j output modes in a round robin format, selecting one experimental design for each.Second, because there is non-negligible intrinsic noise associated with this system, we perform uncertainty regularization.For this technique, we first estimate the intrinsic noise associated with the surrogate model.For a Gaussian process, this is simply the hyperparameter σ 2 n .Then, we replace the Gaussian process posterior variance σ 2 q ðαÞ with the epistemic variance σ 2 q ðαÞ À σ 2 n , that is, total variance reduced by the aleatoric variance.This regularization step improves acquisition function performance in weighted uncertainty sampling schemes when the true function has significant aleatoric variance.
Taken together, the final acquisition function we use for each output component, q j , is given by u LWÀUS,j ðαÞ ¼ p α ðαÞ p q j ðq j ðαÞÞ σ 2 q j ðαÞ À σ 2 n,j : ð18Þ

| Spectrum transfer
One of the advantages of wave-episode sampling is that the pairs of wave-episodes and hydrodynamic-responses can be used to quantify the statistical responses of different spectra, assuming that these have the same modal period and have similar shapes. 54For the present context, we have input-output pairs α i , q i associated with significant wave height H s,1 and we want to use this information for quantifying statistics for another significant wave height H s,2 .
We follow the spectrum transfer technique described in Guth and Sapsis. 54First, we calculate the KL basis for each spectrum, S 1 and S 2 .Second, we calculate the eigenvalue ratios Finally, we adjust the α from J ð1Þ to J ð2Þ by rescaling Through this energy-based rescaling of the KL coefficients we are able to use the same wave-episode data to build surrogate models corresponding to different sea states.We caution, however, that the initial data set, associated with waves having significant height H 1 has poor skill on describing forces associated with waves of H 2 , if the latter is significantly larger.This is not a surprise given that in the low-energy spectrum only small amplitude waves are needed to capture the overall statistics.OpenFOAM and further description of the numerical setup is available in Appendix A.

| Wave induced force
The wave induced force on the structure, F, is calculated by integrating the normal pressure, pn, where n is unit normal matrix, and the tangential viscous stress vector, τ, over the surface, A, of the structure.
In our application, the horizontal component of the force, which is known as the inline force, F x , has the major contribution; thus, it is chosen as the quantity of interest in this study.

| Numerical wave calibration
In OpenFOAM simulations, to produce the target irregular wave episode (e.g., analytical) at a specific location in the numerical wave tank (e.g., where the monopile foundation will be placed), we follow a wave calibration procedure. 67Figure 4 shows the schematic depiction of the wave calibration, which is similar to the one presented in Windt et al. 68 and previously has been applied by previous studies. 69,70e target wave episode and the numerical wave obtained from the last iteration are compared through their density wave spectrum.Figure 5 compares four representative wave episodes.For the waves obtained from the quiescent sea state (i.e., 404 and 403), the analytical and numerical spectra match very well-in terms of frequency and amplitude.Conversely, the two spectrum corresponding to extreme waves (i.e., 503 and 504) present significant divergence.In general, due to complex nonlinear hydrodynamic phenomena related to high and steep waves, the numerical wave episodes 501-514 cannot match the target wave profile.In high KC numbers (see Section A.1), the real wave propagation breaks down the theoretical approach.In practice, the nonlinear interaction between the wave components leads to breaking wave phenomena, which may contribute to changing wave profile, and thus difficulty to match with the analytical solution anymore.
In Figure 2, we compare the analytical wave episodes to the two estimates of the breaking limitdefined from the breaking wave criterion (Section 2.3).We observe that, for the quiescent sea state (H s ¼ 5 m), all but one of the waves are firmly below the breaking limit.However, for F I G U R E 4 An overview of the numerical wave calibration procedure.(1) Definition of the target irregular wave, n t ðtÞ, (2) which is analyzed into sinusoidal wave components via DFT.For each frequency, f j , the corresponding amplitude, a t ðf j Þ, and phase, ϕ t ðf j Þ, are obtained.3a) To numerically reproduce the target wave episode, the OpenFOAM numerical wave maker should receive the amplitude and phase for each frequency component.In the first iteration, the components of the target wave are the inputs to the numerical wave maker; however, these values are corrected at every iteration, that is, a w,i ðf j Þ ! a w,iþ1 ðf j Þ and ϕ w,i ðf j Þ !ϕ w,iþ1 ðf j Þ. 3b) The OpenFOAM simulation runs and 4) the numerical wave episode, n n ðtÞ, is recorded at a selected location in the numerical wave tank.5) Similarly to target wave, the numerical wave is analyzed into sinusoidal wave components via DFT.For each frequency, f j , the numerical amplitude, a n,i ðf j Þ, and phase, ϕ n,i ðf j Þ, are obtained and compared to the corresponding values from the target wave.6) An amplitude correction factor, which is the ratio of the target to numerical wave amplitude, is applied to the new wave maker inputs.7) A phase correction factor, which is the difference between target and numerical phase, is also applied to the new wave maker input.Steps (3)-( 7) are repeated either until numerical and target waves are converged or further improvement is not succeeded.
the extreme sea state (H s ¼ 13 m), nearly all of the waves are above the breaking limit.In Figure 6 and Figure 7, we provide an OpenFOAM visualization of the breaking wave evolution and its interaction with the monopile foundation, respectively.

| Ground truth solution
To evaluate the accuracy of the GPR surrogate model, we compare the model prediction with a reference solution-that is, the ground truth.For the purpose of this study, the ground truth is the statistics (i.e., pdf) of the hydrodynamic force-on-structure during the steady state realization (i.e., the full sea state realization).We calculate the ground truth force statistics for three sea states (H s f5m,9m,13mg) by simulating the wavestructure interaction for a 30 min period (H s ¼ 5m), a 75 min period (H s ¼ 9m), or 60 min period (H s ¼ 13m).
In CFD simulations, because of the expensive computational cost as explained in Section 5.2, the wave-structure interaction is usually modeled for a short time period (a few minutes).In the case of steady state realizations, in order to accelerate the numerical simulation process and avoid possible numerical instabilities due to running such a long simulation, we split the 30 min and 60 min simulations into 12 and F I G U R E 5 Numerical (CFD) and analytical wave spectral density over frequencies for the sea states with significant wave height, H s = 5 m (wave episodes 402 and 403), and H s = 13 m (wave episodes 503 and 504).
F I G U R E 6 Wave episode 508: Schematic depiction of the wave breaking process in the empty numerical wave tank (without the presence of the monopile foundation).
the ramp time that the wave train needs to reach the desired profile (transient), and thus, 160 s of wave-structure interaction are retained from each interval, as explained in Section 2.2.The intervals are then merged to reconstruct a representative long time-record for the steady state of the particular sea state.Our application studies the interaction of the irregular wave episode with the monopile foundation of an offshore wind turbine.For this application, the dynamic memory effects are not so critical as the monopile is a fixed structure.That is to say, the reconstruction of the full sea state by merging short-time intervals does not affect significantly the solution.Instead, for a floating structure whose current position is dependent on the previous dynamics, more care must be taken when reconstructing long time statistics from shorter interval simulations.
In order to reproduce the short wave-structure interaction intervals, we use the random phase model (discussed in Section 2, in Equations 2 and 3) to generate random waves from the examined sea state.The sea state is characterized by the significant wave height H s , period T p , and is described by the JONSWAP wave spectrum.For this procedure, we discretize the wave spectrum into N ¼ 90 wave components with angular frequency equally spaced in the range ½ω min , ω max ¼ ½0:44, 2:26 rad/s.

| Surrogate model of the monopile foundation
We model the distortion introduced between the target (analytical) wave episode and the numerical wave episode as a nonlinear mapping.The numerical solver along with the boundary conditions and absorption, implicitly define the map, f, between the target and numerical wave episodes (this is the wave calibration from Section 4.0.2).In Figure 5, we display the wave spectral density comparison of the analytical wave (in black) and OpenFOAM realization (in red).The numerical solver also defines the mapping g from the numerical wave episode to the numerical force on the structure.
When we use a machine learning technique to construct a surrogate model from the target wave episode (parametrized by the coefficient vector α), we are learning the composed function h ¼ g ∘ f.This is not the same as g, which maps the numerical waves to numerical forces, unless the OpenFOAM realization perfectly matches the target wave (i.e., fðÁÞ is very nearly the identity).In Figure 8, we show a schematic depiction the relationship between the target and OpenFOAM wave realization, the resulting force on the structure as estimated from OpenFOAM, and what the surrogate learns to map.
This shift is a problem for training a model that converges pointwise.That is to say, if we want the precise structural loads corresponding to the target wave episode (that we parametrized with α ! ), the OpenFOAM solution for the force-on-structure does not correspond to the desired wave.However, if instead our goal is to train a model that merely calculates the correct short-term statistics, including nonlinearities from real physics, the developed surrogate model successfully achieves this goal.
How can we recover the correct statistics without a precise model?If, in Figure 8 the function fðÁÞ does not change the short-term statistics, then the machine learning model may still learn the short-term statistics of the forces on structure.One straightforward case where this could be so is if fðÁÞ were simply the forward time evolution operator.In this ideal case, our surrogate model for hðÁÞ ¼ g ∘ fðÁÞ might still learn the correct statistics of forces, even if it incorrectly predicts the force associated with any particular wave episode.
Here the deviation of the OpenFOAM realization from the target wave episode is likely caused by nonlinear effects.Our target wave episode model assumes linear wave theory, which begins to break down in the extreme, steep waves (due to wave breaking, overtopping and other higher order hydrodynamic effects) that lead to extreme forces on structure (slamming effects-especially for the sea state 13 m).This form of realization error is statistically important, because peak sharpening and trough broadening are likely to have some impact on the distribution of extreme loads.However, this effect is likely to counteract an earlier modeling error of (incorrectly) assuming that the extreme steep waves we examine are well modeled by linear wave theory.Taken as a whole, we cannot say that the divergence between target and numerical wave episode does not impact the expected results.However, we expect this effect to be less significant for statistical reconstruction relative to the recovery of specific force time series.

| Irregular wave episodes dimension reduction
In this study, we choose sea states described by the JONSWAP wave spectrum and parameters H s f5m,9m,13mg and T p ¼ 8 s.For the KL dimension reduction of the irregular wave episode, we choose T ¼ 32 s and n ¼ 3 distinct non-zero wave episode coefficients.Figure 9 (Left) displays the mode shapes for the KL basis, while Figure 9 (right) displays the decay of eigenspectrum with the increasing number of KL modes, which practically confirms that higher KL modes carry negligible amount of energy compared the first KL modes.We note that in this study we choose n ¼ 3; therefore, we only use the first three KL modes shown in the figure .F I G U R E 8 Flow chart showing the procedure to estimate the hydrodynamic force followed by (top) OpenFOAM simulation and (bottom) GPR surrogate model.In OpenFOAM simulations, the numerical wave may differ from the analytical wave due to the strong nonlinear hydrodynamic phenomena.The forces are estimated corresponding to the numerical wave.The surrogate model provides the map between the analytic wave episode and numerical force.
F I G U R E 9 Left: KL mode shapes for the wave episodes, displayed in pairs due to approximate even/odd symmetries.Note that due to our normalization convention, the y-axis is arbitrary.Right: KL spectrum decay.X-axis counts the number of KL coefficient.The plots account for the quiescent sea state with wave height H s ¼ 5 m.For extreme sea state H s ¼ 13 m, the plots are identical except for scale.
We construct the irregular wave episodes by drawing Latin hypercube samples from a hyperbox with size parameter z * ¼ 3 from which we choose the KL coefficients {α 1 , α 2 , α 3 }.The resulting irregular wave episode is decomposed via DFT into N ¼ 90 distinct sinusoids, which are the inputs to the OpenFOAM wave maker.The numerical wave is simulated for 52 s, that is, the T ¼ 32 s wave episode, along with a T pre ¼ 20 s ramp-up period to reduce the transients.
Finally, the GPR surrogate model is trained on irregular wave episodes of 32 s duration, which are obtained from a mild and an extreme sea state.In particular, from the wave spectrum of the sea state with H s ¼ 5 m, we initially generate 16 training irregular wave realizations, selected via Latin hypercube sampling, which are followed by four additional training irregular wave episodes using active sampling.Subsequently, we use the spectrum transfer technique described in Section 3.3 to adjust our existing training wave data to the spectrum of the sea spectrum with H s ¼ 13 m.From the extreme sea state, we collect an additional nine irregular wave realizations using active sampling.We additionally simulated assorted validation wave episodes.

| Computational resources
In this study, the CFD simulations are performed on the Tetralith HPC cluster utilizing 128 processors per simulation.Each irregular wave episode, used for training the surrogate model, is simulated to interact with the monopile foundation for 52 s.This simulation requires approximately 12-18 h real time.The steady state (or full sea state) simulations for sea states with H s = 5 m and H s = 13 m model the wave-structure interaction for 30 min and 60 min, respectively.As mentioned in Section 4.0.3, this time is split into 12 and 24 intervals, which has the advantage that the shorter interval simulations can run in parallel, and thus, the modeling of the full sea state is accelerated.Each interval models 180 s of wavestructure interaction and requires 40-45 h of real time on the HPC cluster.

| Hydrodynamic force dimension reduction
Figure 10 shows contour slices for the KL-GPR surrogate for different combinations of PCA modes.The contours show the regions where the peak hydrodynamic force presents maxima and minima values.This is helpful for engineers in the design stage to extract information about how the waves (which are described by the KL coefficient vector α) impact the maximum hydrodynamic force.In other words, Figure 10 answers for which combination of α i the force is maximized.F I G U R E 1 0 Maximum (unsigned) predicted force, F x , of KL-GPR surrogate in ½N, for the extreme sea state with H s ¼ 13 m.The colormap shows the change in force for varying input waves, which are described by vector α.Left:

| KL-GPR model predictions for the force time series and statistics
We note a few features here.First, the surrogate model presents higher uncertainty at the beginning of the interval.This is probably caused both by the encounter condition, which is stochastic, and by transients from initializing the numerical simulation.Second, we note that agreement is weakest near the extreme peaks, particularly in wave 404.This is likely because the nonlinear wave phenomena are most significant in this wave.Initially, the KL-GPR model is trained on wave episodes from the quiescent sea state and their characteristics are listed in Table A2 Appendix  A3 Appendix A. This led to our decision to additionally train on extreme waves associated with the spectrum H s ¼ 13 m.
Ultimately, our goal is to recover the short-term statistics (statistics of the steady state) of the forces on the structure.The first force we examine is the signed hydrodynamic axial pressure force F P x , the dominating component of the total axial force.In Figure 12, we present the pdf of F P x for the three sea states we have considered: H s f5m,9m,13mg.For each plot, we display the direct Monte Carlo statistics of the steady state (in black), a linear estimate derived from potential flow and the Wiener-Khintchine theorem (in blue), [71][72][73] and our reconstructed statistics from the KL-GPR model (in red).The KL-GPR statistics are generated by Monte Carlo sampling a large number (1 Â 10 5 ) of wave episodes, and calculating the corresponding forces from the GPR model.We estimate the pdf by concatenating each of these generated force time series, and then computing the time statistics via histogram.Alternately, we might have recovered the pdf using a kernel density estimator in place of simple histograms, but plots generated by histogram emphasize the statistical limit to our precision.We note here that both the black line (sampling directly from CFD simulations of the steady state) and the red line (sampling from the KL-GPR model) employ the Monte Carlo sampling technique for accurate unbiased statistics.The major difference is that for the red line, our method, samples are evaluated from the surrogate model instead of a full new CFD simulation, which is many orders of magnitude faster.
It is clear that the reconstructed short-term statistical distribution does a better job of capturing both the heavy tail loads, and the asymmetry between positive and negative axial loads in heavier seas.At the same time, though the model outperforms linear estimate, its predictions slightly underestimate the heavy right tail, especially in the H s ¼ 9m case.This underestimate is likely from two sources, both ultimately stemming from limited data.First, the n ¼ 3 truncation order for the KL wave episodes truncates some of the high frequency wave energy, in the interest of making the statistical problem tractable.This truncation slightly reduces wave steepness and therefore may underpredict the the hydrodynamic pressure force.We consider this issue more fully below in Section 5.5.Second, our training simulations were actively chosen with H ¼ 5m and H s ¼ 13m in mind specifically.While the simulations themselves are reusable due to the spectrum transfer technique, it may have created a 'gap' corresponding to waves that contribute most significantly to the extreme loads at H s ¼ 9m, but not in more extreme seas.
Finally, in Figure 13, we present the recovered viscous forces F ν x and F ν z , along with the steady state simulations, for each of H s ¼ 9m and H s ¼ 13m (the viscous forces are essentially zero in the H s ¼ 5m case).Because viscous forces are essentially nonlinear, we do not present linear estimates.We note that our recovery of the viscous especially the component, are less accurate.This is in part due to our decision to prioritize the axial hydrodynamic force, F x , in our active search acquisition function for samples selection, which in turn is dominated by the contribution from the pressure force, F P x .Furthermore, the KL-GPR model fails to capture the essential asymmetry of the viscous forces.This is likely due to interaction between the typical wave episode, which has negligible viscous forces, and the energy preserving reduced order model, which pays insufficient attention to the rare, but extreme, viscous loads.

| Effect of energy truncation
In this study, we construct the wave episodes choosing a truncation order at n ¼ 3. The energy decay associated with the number of KL coefficients are shown in Figure 9, and it is clear that a fraction of the energy is truncated by choosing only the first three coefficients.At this point we examine whether this energy truncation effect the recovered force statistics.
We study the effects of energy truncation in the hydrodynamic force predicted from the KL-GPR surrogate model.We chose wave episode 507 to estimate the CFD axial and vertical force components, shown in Figure 14, for the basis of our comparison.This wave episode is constructed as previously described, that is, with n ¼ 3 nonzero components of the KL vector α, and a stochastic prelude.Importantly, our stochastic prelude technique means that any adjustment to wave episode 507 (other than global rescaling) will unavoidably add some aleatoric variance to a second OpenFOAM simulation, especially at the very beginning of the region of interest.
We consider two methods to correct for the effect of energy truncation.One option is to scale the coefficient vector α by an energy factor.This would have the effect of shifting energy from high wave numbers to lower wave numbers.At the same time, by retaining the low dimensionality of α, this method maintains both the shape of the wave episode, and its easy applicability to machine learning.We constructed wave episode 701 by scaling the first three components of α from wave episode 507.The corresponding CFD force to wave 701 is also shown in Figure 14.It is clear that each peak force on structure is significantly increased relative to wave episode 507.This increase is too large to be consistent with the steady state, and demonstrates that the distribution of wave energy across different scales is important on the force applied to the marine structure.
The second option is to simply avoid truncation of the higher order modes from the wave episode.That is to say, instead of a low order truncation n ¼ 3, choose a very high order truncation (here, n ¼ 25) to ensure that most of the wave energy is captured.This method has the advantage of modeling the waves with the highest fidelity.However, it has the disadvantage of increasing the dimensionality of the wave episode space, which makes the application of machine learning challenging.In Figure 14, we construct wave episode 702 by copying the first three components from α from wave episode 507, and then drawing the remaining components ðk 4, :::25) from the KL distribution.We can see that the first and fourth peak of the force on structure are significantly different between wave episodes 507 and 702, but the second and third peak are very similar.We explain this by noting that the first peak, at t ¼ 20 s, is at the border of the stochastic prelude region and the wave-episode region.The divergence of this peak is well explained by the random draw of stochastic prelude.The significant difference for the fourth peak cannot be explained away so easily.By comparing with wave 702, and examining the viscous force, it is likely that wave 507 has no significant wave crashing here.However, a small addition of energy, either in lower wave numbers or higher, was enough to cause a wave crash at the fourth peak.
In summary, correcting for the energy deficit by moving energy from high wave numbers to low wave numbers significantly and nonphysically overestimates the peak forces on the structure.However, correcting for the energy deficit by including the truncated KL modes increases the peak forces by a much less dramatic margin.This suggests that the KL truncation is likely responsible for some of the tail underestimate in Figure 13.We focus our optimal experimental design on the force component F x , and in the quiescent sea states (with H s ¼ 5 m) the GPR model is able to accurately recover the load statistics with only a handful of wave episodes in the training set.In the extreme sea state (with H s ¼ 13 m), we need more data points to recover the heavier distribution right tail.When we examine other force components with more strongly Comparison of truncated wave episode (507) with energy-corrected wave episodes (701 and 702).Wave episode 507 (black) is taken from our training set.Wave 701 (red) used the scaling coefficients correction, and wave 702 (blue) use the un-truncated coefficients correction.Left: Pressure component non-Gaussian character, F ν x and F ν z , (which are not considered during the active sampling phase) the KL-GPR model is less accurate.Finally, by studying how the energy truncation of the wave episodes effects our results, we determine that the truncated energy itself likely is not the cause of any remaining statistical error, nonlinear effects from the truncated shape complexity may play a subtle role in the distribution of extreme loads.

| CONCLUSIONS
We have presented a efficient computational framework for the statistical quantification of the force on a wind turbine monopile foundation caused by nonlinear interactions with irregular waves.The model is using GPR, a data-driven machine learning technique combined with active learning and order reduction methods.To ensure that the developed model provides accurate predictions, we train with data from CFD simulations.CFD models are able to capture the nonlinear and complex phenomena in the wave-structure interaction, yet they command expensive computational costs and are thus unsuitable when massive simulations are required (e.g., uncertainty quantification, sensitivity analysis, statistical calculations for risk, etc).
As a adjunct to the active learned GPR model, our approach relies on two dimensionality reduction steps: to describe irregular wave episodes and to describe the force-on-structure time series.Specifically, we use the KL theorem to construct low dimensional wave episodes that nonetheless adequately represent the sea state associated with a specific power spectrum (i.e., JONSWAP).At the same time, we use KL to reduce the dimensionality of the force-on-structure time history.Together, these dimensionality reduction steps allow the construction of the efficient surrogate modeling with the GPR method, which otherwise has difficulty with the high dimensionality associated with directly modeling time series.
The wave episode samples are the inputs for the high-fidelity CFD simulations-the OpenFOAM code-which are able to capture nonlinear wave phenomena, such as breaking wave and slamming loads.While important for accurately resolving the interactions between extreme waves and offshore structures, these phenomena also raise the difficulty of choosing experimental designs due to a matching problem: the numerical irregular wave may not match the target analytic wave profile.We explicitly address this matching problem with a wave calibration procedure, but this mismatch is also implicitly accounted for via machine learning.
Once we obtain a first set of simulation data, we train a surrogate model using the GPR machine learning technique to learn the relationship from the wave episode to the structural load.Particularly, each surrogate model maps the KL coefficients (α) that describe the wave episode to a single output KL mode (q i ).Thus, we construct as many GPR models as the number of output KL modes.Finally, we model the force time series by combining the output KL coefficients from each trained GPR model.
To evaluate the generated force statistics as estimated from the surrogate model, we compare with longer time OpenFOAM simulations that give an accurate description of the short-term statistics.The prediction and the ground truth of the axial hydrodynamic force show a good agreement both in quiescent and extreme sea states.However, the viscous part of the force is not very well predicted-especially in the right tail.We attribute this to the fact that the active sampling method chooses wave episodes that improve the GPR model accuracy in predicting the pressure part of the total hydrodynamic force.
In conclusion, this study's framework shows great promise in addressing design challenges for offshore wind turbine foundations or other offshore applications.By utilizing machine learning, it overcomes limitations of classical modeling, offering innovative ways to analyze complex structures.The introduced surrogate modeling method enables efficient exploration of more design scenarios, including extreme events, while estimating probability density functions effectively.This can contribute to the significant cost reduction and faster design processes, benefiting industry professionals.This framework's adaptability extends to a range of offshore wind turbine foundations, including jackets, floating structures, and other offshore systems like wave energy converters.All that's needed is a reliable model or experimental setup to produce the required training data.In particular, while high-fidelity models, such as CFD simulations, are essential for capturing nonlinear wave-structure interactions under extreme conditions, the framework remains adaptable to data from experimental campaigns, even to lower-fidelity model when extremes are not a primary concern.
In consideration of higher-dimensional systems, it is acknowledged that the use of GPR may encounter challenges due to the significant increase in data requirements as the number of input features or dimensions grows.The computational expense associated with the inverse of the covariance matrix in GPR becomes prohibitive in such cases.To address this, we propose the utilization of deep neural networks or other machine learning techniques adept at handling high-dimensional data.

PEER REVIEW
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/we.2880.

A.2.1 | Governing equations
The Navier-Stokes equations are used to determine the motion of a fluid and can be seen as Newton's second law for fluid motion.For an incompressible Newtonian fluid flow, the Navier-Stokes equations take the form where u is the fluid velocity, p is the fluid pressure, ρ is the fluid density, ν is the fluid dynamic viscosity, and f b includes the external forces.The i and j denote the indices in x and y direction respectively.The left-hand side of this equation corresponds to the inertial forces while in the righthand side the first term corresponds to pressure forces, the second term to viscous forces and the third term to the external forces applies in the fluid.These equations are always solved together with the continuity equation The CFD code the numerical model solves the Reynolds-averaged Navier-Stokes (RANS) equations, which are a reduced form of the general Navier-Stokes equations.The Reynolds decomposition is applied, according to which the variables u and p of Navier-Stokes equations are written as the sum of the time-averaged and fluctuating part The RANS equations and the mass conservation take the following form: where τ ij is the Reynolds stress tensor, τ ij ¼ Àu 0 i u 0 j .To achieve a close form the RANS equations, the k À ω SST turbulence model is adopted.More details about the computational domain and the boundary conditions are provided in the Appendix A (Section A.2.1).

A.2.2 | Free surface modeling
The free water surface is captured using the volume of fluid method (VOF). 75The two-phase fluid problem is treated as a single fluid and the phase fraction, α, is used to indicate the mixture between air (α = 0) and water (α = 1) at each cell.The conservation of the phase fraction, α, is essential and the transport equation should be added to describe the motion of the phases: where the last term is an artificial compression to keep the surface sharp, u c is the fluid velocity normal to the interface, where c α is the compression coefficient.At each cell, the fluid properties (density, ρ, and dynamic viscosity, μ) are computed as

| Solver and algorithm
In this study, OpenFOAM version 1906, 76 is utilized to model the wave-structure interaction.
OpenFOAM is an open source CFD toolbox able to solve complex fluid applications based on the cell-centered finite volume method to solve the RANS of the two phase fluid flow, that is, air and water.In this approach, the equations are integrated over each of the control volumes (i.e., each cell of the computational mesh).The volume integrals are converted to surface integrals using Gauss's theorem.The surface integrals are calculated as the weighted sum of the cell faces.The pressure-velocity coupling in RANS equations is solved via the PIMPLE algorithm, while the interFoam solver is operated for capturing the two incompressible, isothermal fluids using volume of fluid method for interface capturing.

A.3 | Numerical wave tank
A 3D numerical wave tank (NWT) is setup in order to reproduce the wave-structure interaction modeling.The total length of the NWT was defined by the wavelength and it is equal to 300 m (approximately 3λ), the width is 42 m corresponding to 6D, where D is the diameter of the monopile.The water depth is 33 m.The NWT utilizes symmetric side planes in order to reduce the computational domain and thereby the computational cost.A monopile with diameter of 7 m was fixed at 100 m from the inlet boundary (approximately a wavelength).Figure A1 shows the computational domain with the boundary labeling.
The grid resolution is discretized based on the mesh sensitivity study presented in the following section.The quality of the wave propagation heavily depends on the aspect ratio of the cells close to the free water surface.The grid cells keep aspect ratio close to 1, which means the computational mesh consists of cubic cells.In a distance of one wavelength downstream of the monopile, the mesh grading technique is applied resulting in cells with larger aspect ratio.This is a technique for reducing the computational cost of each simulation.
For turbulence modeling, the k À ω SST model was implemented with wall functions for the resolution of the boundary layer around cylinder.
The boundary conditions are presented in the Table A1.The boundary conditions for the seabed and the side surfaces are type zeroGradient and symmetryPlane, respectively.

A.4 | Numerical wave generation and absorption
The CFD modeling of waves requires a special set of boundary conditions.The wave is generated at the inlet which provides the appropriate time-dependent velocity field and surface elevation the outlet boundary offers the absorption capability.In OpenFOAM, several custom boundary conditions have been built for wave generation and absorption including a wide range of wave theories.In this study, the IHFOAM library is utilized for the wave generation and absorption. 77The irregular wave is composed of the superposition of regular wave components; that is, the first-order irregular wave episode is generated by the summation of the linear regular waves components.For the wave generation, the static Boundaries of the computational domain on an example of mesh used for CFD simulations.boundary method is implemented at the inlet boundary in which the expressions for the free surface elevation (Equation A11) and wave velocities (Equation A12) are applied as Dirichlet boundary conditions based on the linear wave theory.The prescribed velocities determine the pressure and the free surface elevation.
where N is the number of wave components, H is the wave height, k x and k y are the components of the wave number vector k, ω is the angular frequency and ϕ is the wave phase.The orbital velocity components in horizontal and vertical directions of the wave propagation are given by where h is the water depth, z is the water elevation from the still water surface and θ is the total phase of each wave component.
The absorption of the generated waves is a very important feature in the numerical modeling.The restrictive dimensions of the wave tank might causes unwanted reflections of the wave components which disturb the study zone of the domain.Therefore, the wave absorption should be handled properly in order to do not distort the results.In this study, the active wave absorption method, available on the IHFOAM library, was implemented.Based on Shallow Water Equation, the outlet boundary generates a velocity profile equal and opposite to the incident wave velocity that cancels the incoming wave.The active wave absorption presents good performance at long spurious waves and it was found that reduces the computational cost compared to other methods (e.g., relaxation zones).The methodology is further described in Schäffer and Klopman. 78 OpenFOAM, for the construction of the irregular wave episode, the user chooses the irregularMultiDirectional wave model and provides the wave height, wave period and phase of each wave component.Further description about the wave generation is provided in the following subsections.

A.5 | Grid convergence study
The mesh sensitivity study is conducted without the presence of the monopile foundation.We want to evaluate the wave generation without being affected by the reflection effects from the foundation.The wave elevation is measured at the location where the monopile will be placed A2) discretization of the NWT close the monopile.
T A B L E A 1 The boundary conditions.later.In the mesh sensitivity study, the spectra wave density is evaluated.The numerical tests are performed for a two-dimensional numerical wave tank.In this study, we consider two sea states; the milder sea state with Hs = 5 m and the extreme sea state with Hs = 13 m.The mesh resolution is different for the waves episodes of each sea state; that is, it will be computational costly to use the same fine resolution for the milder sea state, since the wave episodes are less extremes.Therefore, separate mesh study is carried out for each sea state.To evaluate the mesh resolution, steep irregular wave episodes are selected.The resolution is defined as cells per wave height (CPH); that is, in our case, the wave episode is irregular; therefore, the resolution is defined based on the mean wave height of the irregular realization.
In particular, for the milder sea state (Hs = 5 m), the resolution is evaluated based on the wave episodes 307 and 310.F I G U R E A 4 The l1 norm is considered as the error metric for the evaluation of the spectrum wave density convergence (Figure A3).Each resolution is compared with the finest resolution for each case.comparison of the mesh sensitivity study.The wave episodes 307 and 310 (milder sea state) present very good convergence for all the examined resolutions.The peak of the SWD is also captured accurately.For the waves 502 and 510 (extreme sea state), the SWD changes with the increasing mesh resolution, while the SWD better converges for 40 CPH and 45 CPH.
To quantitative evaluate the mesh convergence, we compute the l 1 norm on the SWD difference between each resolution with the finest mesh for each case on the interval ½f l , f u = [0.09,0.17].In particular, the l 1 error metric is defined as:  where the SWD is the spectral wave density shown in Figure A3, the subscript finest declares the mesh with the finest resolution for each examined wave, while the subscript i declares the each of the other examined resolutions.f l and f u are the lower and upper boundaries of the examined frequency range.Figure A4 presents the l1 metric for each wave episode.For the milder sea state, the resolution 20 CPH satisfy l 1 < 0.015; therefore, the 20 CPH is used as resolution for all the wave episodes of the milder sea state.For the extreme sea state, l 1 < 0.02 is observed for resolution 40 CPH; therefore, this is used for all the wave episodes.

A.6 | Wave characteristics
Some critical characteristics of the wave episodes are listed in the Table A2 High-fidelity CFD simulations are employed to provide the hydrodynamic performance of the monopile foundation interacting with the irregular wave episodes.Complex physical flow mechanisms, for example, fluid viscosity, wave diffraction, radiation, wave overtopping, and slamming can be captured by CFD modeling based on the solution of Navier-Stokes equation.The simulations performed with the open-source software

F I G U R E 7
Isometric views of the wave-structure interaction (wave episode 508).The colormap expresses the velocity magnitude.Upper Left: Initiation of breaking wave at t = 23 s, Upper Right: Breaking wave hits the monopile foundation at t = 24.5 s, Lower Left: Wave run-up on the foundation at t = 25.5 s, Lower Right: Second-order effects around the monopile at t = 28 s.

Figure 11
Figure 11 shows the KL-GPR-model predicted time series of the axial hydrodynamic force, F x ðtÞ, for wave episodes 401-404, which are used for the validation of the KL-GPR-based surrogate model (i.e., these wave episodes are not used for the construction of the surrogate model).The wave episodes 401-404 represent "large waves" associated with the H s ¼ 5 m sea state.
A The training waves are characterized by smaller wave height, wave steepness and KC number compared to the corresponding values of the validation wave 402 and 404.The validation waves 401 and 403 are closer to the training waves.The characteristics of validation waves are listed in Table F x ðtÞ from (red) OpenFoam and (blue) KL-GPR-model with posterior uncertainty, for quiescent sea state with H s ¼ 5 m.Y-axis counts 1 Â 10 5 [N].These waves are only used for validation purposesnot used during the model training.F I G U R E 1 2 Comparison of the hydrodynamic pressure force F P x pdf predicted from KL-GPR-model, for (Top Left) H s ¼ 5m, (Top Right) H s ¼ 9m, and (Bottom) H s ¼ 13m.Monte Carlo direct CFD statistics (black) with 1σ bootstrap error bars as well as linear potential flow statistics (blue) included for comparison.F I G U R E 1 3 Comparison of KL-GPR-modeled pdfs of viscous forces.Top: H s ¼ 9m.Bottom: H s ¼ 13m.Left: axial viscous force F ν x .Right: vertical viscous force F ν z .Monte Carlo direct CFD estimates with 1σ bootstrap error bars included.
Four different resolutions are examined, that is, 14 CPH, 17 CPH, 20 CPH, and 28 CPH.For the extreme sea state (Hs = 13 m), six resolutions are examined since the wave episodes are strongly nonlinear, that is, 20 CPH, 25 CPH, 30 CPH, 35 CPH, 40 CPH, and 45 CPH.Figure A3 presents the qualitativeF I G U R E A 3For the mesh sensitivity study, the convergence of the spectra wave density is examined.For the sea state with Hs = 5 m, wave episodes 307 and 310 are evaluated.For the sea state with hs = 13 m, wave episodes 502 and 508 are evaluated.The resolution is defined as cells per wave height (CPH).For the sea state with Hs = 13 m, more resolutions examined due to the more complex nonlinear waves.
jSWD i ðfÞ À SWD finest ðfÞjdfðA13ÞF I G U R E A5 Visual depiction of the wave episodes steepness.Mild sea state (left): The initial training wave episodes are randomly selected (blue) and they are followed by wave episodes which are actively sampled (red).The testing waves are also depicted (green).Extreme sea state (right): The training wave episodes are randomly sampled (Figure A5).T A B L E A 2 Mild sea state (H s = 5 m, T = 8 s): Characteristics of the wave episodes used for training purposes.The waves 301-316 are randomly selected and the waves 451-454 are actively sampled.
Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/we.2880by Massachusetts Institute Of Technology, Wiley Online Library on [21/12/2023].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License 98GUTH ET AL. 10991824, 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/we.2880by Massachusetts Institute Of Technology, Wiley Online Library on [21/12/2023].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License and Table A3 (mild sea state), and Table A4 (extreme sea states) T A B L E A 3 Mild sea state (H s = 5 m, T = 8 s): Characteristics of the wave episodes used for testing purposes.Extreme sea state (H s = 13 m, T = 8 s): Characteristics of the wave episodes used for training purposes.All the waves are actively selected., 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/we.2880by Massachusetts Institute Of Technology, Wiley Online Library on [21/12/2023].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License 10991824