Predicting near optimal interpolation points for parametric model order reduction using regression models

The design of vibro-acoustic systems, such as vehicle interiors, regarding optimal vibrational or sound radiation properties requires the solution of many numerical models under varying parameters, as often material or geometric uncertainties have to be considered. Vibro-acoustic systems are typically large and numerically expensive to solve, so it is desirable to use an efficient parametrized surrogate model for optimization tasks. High quality reduced models of non-parametric systems can be computed by projection, given a set of optimal expansion points. However, finding a set of optimal expansion points can be computationally expensive and each set is valid for a specific parameter realization only. The method presented in this contribution learns the map between a model's parameter realizations and the corresponding sets of expansion points using data-driven methods. Queried with an unknown set of parameters, the learned model returns a set of expansion points which are used to compute the corresponding reduced model efficiently. Numerical experiments on two vibro-acoustic models of different complexity are performed and three data-driven regression methods are evaluated: multivariate polynomial regression, k-nearest neighbors, and support vector regression. k-nearest neighbors regression in particular yields accurate results for different types of physical models while being computationally inexpensive to fit.


Introduction
Large parameter dependent models are encountered in many engineering applications. Optimizing the noise and vibration of vehicles towards a high level of comfort for the passengers, for example, requires many repeated evaluations of high fidelity models under varying parameters. Such parameters can include material properties, locations of excitation, or variations in the model geometry. Because of the high computational costs of the high fidelity models, low dimensional representations retaining the dominant features of the original model are required. Model order reduction (MOR) methods provide established tools to find such representations by projecting the original system onto a lower dimensional subspace containing the features required for the desired solution [1]. Moment matching methods aim at approximating the moments of the original model's transfer function in reduced space and interpolate between them. A valid projection subspace for such methods contains snapshots of the original transfer function at some frequency values. Their locations are not known a priori but can be found iteratively [2,3]. This optimization step involves considerable computational cost, as linear systems of equations of the original system order have to be solved in each iteration step. A fast convergence towards the (near) optimal snapshot locations is therefore desirable.
Parametric MOR methods retain the full model's dependence on some parameters in reduced space by, for example, combining the projection bases obtained for different parameter realizations of the full model [4]. Many of these methods require an affine representation of the parametrized system, which cannot always be computed efficiently. Other methods do not require insight into the structure of the dynamic system at all and compute parametric reduced models solely based on system output data [5][6][7]. However, these methods do not preserve the structure of the original system in reduced space. The sampling of the parameter space has a large influence on the quality of the parametric reduced model. Beattie et al. [8] proposed a method to retain the dependence on parameters in reduced space without the need of sampling the parameter space. This method again requires an affine representation of the original system. Combining reduced parametric models with different approximation qualities enables the efficient evaluation of a parametric system, for example for parameter studies or optimization tasks [9]. A different approach to compute parametrized reduced models is to find valid snapshot locations for a moment matching method by interpolating the locations of models with similar parameters, for which the optimal locations have already been found. Benner et al. [10] use radial basis functions for the interpolation of the optimal locations; the reduced model is then directly computed with the locations obtained from interpolation, yielding a good approximation of the original system. Yue et al. [11] describe the transfer functions of reduced models depending on a parameter using the transfer functions' poles and corresponding residues. The poles and residues for an unknown parameter are retrieved by interpolating these values between the two models in the sample dataset with the nearest parameter realizations.
The numerical poles are matched between these two models, so that they correspond to the same physical pole. Similar to these approaches, we propose a procedure combining a non-parametric moment matching method with classic regression models as well as strategies based on machine learning: First, we iteratively compute the optimal frequency locations for a moment matching method for different parameter realizations. Second, the locations are processed such that each snapshot location corresponds to a physical pole of the transfer function. Finally, we train a data-driven model to learn the map between the model parameters and the optimal frequency shifts. Querying the trained model with a yet unknown combination of parameters now yields a good approximation of the optimal locations, which are used to project the system onto the lower dimensional subspace.
By characterizing the resulting system by its dominant poles, rather than the actual system output, the result space of the learned model stays low-dimensional while containing the most important features of the physical model. Additionally, the model's physical behavior is included in the learning framework and the resulting reduced model preserves the original system structure, which is favorable for many application cases [12][13][14]. Learning strategies not solely relying on data are often termed physics informed machine learning [15]. In an engineering context, machine learning methods have been combined with model order reduction for example to find coefficients for proper orthogonal decomposition (POD) in [16]; a neural network was trained in [17] to predict the vibrational behavior of resonant substructures used to control structural vibrations; [18] reconstructed constitutive laws for finite element simulations from data.
Building on this literature, we propose to combine moment matching methods and data-driven methods into a physics informed strategy for parametric model order reduction of vibro-acoustic systems. The remainder of the paper is outlined as follows: In Sec. 2 model order reduction by moment matching is revisited. The strategy to interpolate optimal snapshot locations is presented in Sec. 3. Numerical experiments in Sec. 4 show the applicability of the proposed strategy and comments on further work in Sec. 5 conclude the article.

Model order reduction by moment matching
Vibro-acoustic systems model the interaction between vibrating structures and a surrounding acoustic fluid. Such systems with a single input and a single output depending on a set of parameters p ∈ R^l are given in the frequency domain by the Laplace transform of a second-order differential equation,

(s² M(p) + s C(p) + K(p)) x(s, p) = f(p) u(s),    y(s, p) = g(p)^T x(s, p),    (1)

with the mass, damping, and stiffness matrices M(p), C(p), K(p) ∈ R^{n×n}, input and output vectors f(p), g(p) ∈ R^n, system state x(s, p) ∈ C^n, harmonic load u(s), and system output y(s, p). The system is excited at the complex driving frequency s ∈ C. Its transfer function reads

H(s, p) = g(p)^T (s² M(p) + s C(p) + K(p))^{-1} f(p).    (2)

Because n can be very large, it is desirable to have an approximation H_r(s, p) of the original system's transfer function H(s, p) with order r ≪ n. The structure of the original system is retained in reduced space, such that the reduced system is defined by

(s² M_r(p) + s C_r(p) + K_r(p)) x_r(s, p) = f_r(p) u(s),    y_r(s, p) = g_r(p)^T x_r(s, p),    (3)

where M_r(p), C_r(p), K_r(p) ∈ R^{r×r}, f_r(p), g_r(p) ∈ R^r, and u, y_r denote scalar input and output. The reduced model's transfer function is given by

H_r(s, p) = g_r(p)^T (s² M_r(p) + s C_r(p) + K_r(p))^{-1} f_r(p).    (4)

An established method for reducing non-parametrized systems of shape (1), in this setting for a given p = p_j, is the Petrov-Galerkin projection of the system matrices with two projection matrices V, W ∈ R^{n×r}. The matrices are chosen such that

x(s, p) ≈ V x_r(s, p)    (5)

in order to approximate the full model's result. Here, subscript r denotes a reduced quantity of order r. The reduced system matrices are thus given by

M_r = W^T M V,    C_r = W^T C V,    K_r = W^T K V,    f_r = W^T f,    g_r = V^T g.    (6)

The projection matrices V, W can be computed based on expansion points σ = {s_1, …, s_r}, such that

Ran(V) = span{(s_i² M + s_i C + K)^{-1} f : i = 1, …, r},
Ran(W) = span{(s_i² M + s_i C + K)^{-T} g : i = 1, …, r},    (7)

where Ran(·) is the range of a matrix [2]. If the system matrices in (1) are symmetric, setting W = V preserves the symmetry in reduced space [19]. The quality of the reduced model greatly depends on the choice of the expansion points σ, which are not known a priori. A valid choice are the mirror images of the full system's eigenvalues, reflected at the imaginary axis.
However, a full eigenvalue decomposition is not always available or feasible in the scope of an efficient reduction algorithm. The iterative rational Krylov algorithm (IRKA) and its extension to second-order systems, SO-IRKA [19], is therefore used to find optimal interpolation (or expansion) points. Upon convergence, the reduced system's eigenvalues are the mirror images of the used interpolation points. In each iteration, IRKA needs to solve 2r linear systems of equations of order n (or r linear systems if the system is symmetric), making the method computationally rather expensive. If a set of nearly optimal expansion points is known a priori, the number of iterations until convergence can therefore be reduced without affecting the quality of the resulting reduced model. Contrary to first-order IRKA, the intermediate reduced second-order systems in SO-IRKA have 2r eigenvalues λ_1, …, λ_{2r}. If the mirror images of all eigenvalues were chosen as expansion points for the next iteration, the reduced model's size would double in each iteration. Therefore, only a subset of eigenvalues can be chosen to update the expansion points. Choosing the mirror images of the r eigenvalues closest to the imaginary axis, SO-IRKA yields a reduced model valid in a range starting from frequency zero. Nevertheless, it is also possible to choose any of the 2r eigenvalues, so a reduced model can be created to match the original system in a specified frequency range [20]. If the reduced model should approximate the full solution in a specific frequency range rather than having a fixed reduced order, more than r eigenvalues can be chosen during the iterations. If the mirror images of all eigenvalues inside the frequency range of interest are set as updated interpolation points, the reduced model can adaptively grow (or shrink) to the size required to match the full model's response in this specified frequency range.
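As a concrete illustration, the SO-IRKA loop can be sketched in a few lines of Python. This is a simplified, hypothetical implementation, not the full algorithm of [19]: it assumes a SISO system, uses a one-sided Galerkin projection (W = V), keeps the reduced order fixed at r, and selects the r eigenvalues closest to the imaginary axis in each iteration; the helper names so_irka and transfer are ours.

```python
import numpy as np
from scipy.linalg import eig, solve

def transfer(M, C, K, f, g, s):
    # H(s) = g^T (s^2 M + s C + K)^{-1} f
    return g @ solve(s**2 * M + s * C + K, f.astype(complex))

def so_irka(M, C, K, f, sigma0, tol=1e-4, max_iter=50):
    # Iterate the expansion points until they are the mirror images of
    # the reduced model's eigenvalues (one-sided sketch, W = V).
    sigma = np.asarray(sigma0, dtype=complex)
    r = len(sigma)
    for _ in range(max_iter):
        # projection basis from the current expansion points, cf. Eq. (7)
        cols = [solve(s**2 * M + s * C + K, f.astype(complex)) for s in sigma]
        V, _ = np.linalg.qr(np.column_stack(cols))
        Mr, Cr, Kr = (V.conj().T @ A @ V for A in (M, C, K))
        # 2r eigenvalues of the reduced quadratic eigenvalue problem,
        # computed via a first-order linearization
        I, Z = np.eye(r), np.zeros((r, r))
        lam = eig(np.block([[Z, I], [-Kr, -Cr]]),
                  np.block([[I, Z], [Z, Mr]]), right=False)
        # keep the r eigenvalues closest to the imaginary axis and
        # mirror them across it to obtain the next expansion points
        lam = lam[np.argsort(np.abs(lam.real))][:r]
        sigma_new = -lam
        change = np.max(np.abs(np.sort_complex(sigma_new) - np.sort_complex(sigma)))
        sigma = sigma_new
        if change < tol * np.max(np.abs(sigma)):
            break
    # final projection basis for the returned expansion points
    cols = [solve(s**2 * M + s * C + K, f.astype(complex)) for s in sigma]
    V, _ = np.linalg.qr(np.column_stack(cols))
    return sigma, V
```

Because the returned basis contains the snapshots at the returned expansion points, the reduced transfer function interpolates the full one there, regardless of whether the iteration has fully converged.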

Parametric model order reduction by interpolating optimal expansion points
Parametric model order reduction methods seek a representation Σ_r(p) of the original system Σ(p) while retaining the dependence on some parameters. Such methods are typically divided into two phases: an offline and an online phase. During the offline phase, many computational resources are available; during the online phase, time and computational resources are limited, so the parametric reduced model has to be computationally inexpensive to evaluate. The proposed method is outlined in the following: During the offline phase, the map σ : P → S between the realizations of the parameter set p ∈ P and the corresponding optimal expansion points σ(p) ∈ S obtained from SO-IRKA is sought. Different data-driven regression methods are employed to obtain this map. The offline phase is sketched in Algorithm 3.1.

Algorithm 3.1 Offline phase
1: Sample parameter realizations p_i, i = 1, …, k, and store them in P
2: for i = 1, …, k do
3: Set up the full order model for p_i
4: Compute optimal expansion points σ_i using SO-IRKA
5: Postprocess σ_i (cf. Sec. 3.2)
6: S(:, i) = σ_i
7: end for
8: Use P and S to train a regression model to find the map σ : p → σ(p)

In the online phase, appropriate interpolation points for a new parameter set p_new are retrieved from this map and used to compute the reduction basis without IRKA iterations. Algorithm 3.2 outlines the online phase. The individual steps of the algorithms are laid out in detail in the following sections.

Interpolation of optimal expansion points
In the proposed method, full order models with different parameter realizations p are used in the offline phase to fill a database of optimal expansion points σ(p), which is a function of p in the parametric context. The gathered data will be termed training data. We then use a data-driven method to learn the map from the parameter input p to the interpolation points σ(p). In the following online phase, the learned model is queried with an unknown set of parameters and returns a new set of interpolation points, which are used to compute the projection matrices V, W according to (7). Projecting the full model onto this basis yields a new reduced model which is not a local optimum, but has a reasonable accuracy. Up to 2r linear systems of equations of size n have to be solved during the online phase for each reduced model, which can be of considerable computational cost. Benner et al. suggest using intermediate models of moderate size but not necessarily optimal quality to compute the actual reduced model [10]. There, the projection matrices computed in the training phase are reused to assemble projection matrices that project the full system onto an intermediate subspace of order m > r. This intermediate model is then used to compute the projection matrices for an unknown set of parameters during the online phase. However, this requires the original system matrices to be in affine form, which is not applicable in all cases, especially if the geometric location of parts of the model is parametrized. In the following, we will assume the computational cost of solving r equation systems of size n to be affordable during the online phase. To speed up the creation of the dataset, it is advisable to reuse the expansion points at convergence of a model with a similar parameter configuration as initial expansion points, as suggested in [21,22].
Rather than interpolating the complete set-valued function σ(p) at once, we learn r independent models, one for each interpolation point. By doing so, the individual functions σ_i(p), i = 1, …, r, remain smooth and can be approximated by interpolation methods more easily. For each model in the training dataset, the expansion points are ordered by their absolute value before training the data-driven model. This means that there will be one model for the "first" expansion point (i.e. the one with the lowest absolute value), another for the "second", and so on. To illustrate this, Fig. 1 plots the absolute value of the first expansion point for a model of a cantilevered beam over different parameter values for the beam length and its Young's modulus. It can be seen that the function surface is smooth and no jumps are present, as the number of modes per frequency band changes smoothly with the geometric and material properties of the model. This is the case for most vibro-acoustic systems [23]. After querying each model, the obtained interpolated expansion points are combined to a new set σ(p) and the corresponding projection matrices V, W are computed using the full order model. If the expansion points are matched one by one, i.e. the order of the reduced systems in the training data is not constant, it is necessary to determine a reasonable order of the reduced model for a new parameter set p. For this, another model is trained, which maps the parameter sets in the training data to the acquired reduced orders. Again, the required orders change relatively smoothly with varying parameters, so a reasonable interpolation quality can be achieved. Figure 2 plots the required orders for the beam model if a frequency range from zero to 10 kHz is to be matched.
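The per-point training strategy can be sketched with scikit-learn. Here fit_expansion_point_models and predict_expansion_points are hypothetical helpers, and the data layout (one row of P per parameter sample, the corresponding row of S holding the expansion points already postprocessed and sorted by absolute value) is an assumption; real and imaginary parts are regressed jointly as two targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fit_expansion_point_models(P, S, k=3):
    # One independent regressor per expansion point: model i learns the
    # map p -> sigma_i(p) from the i-th column of S.
    models = []
    for i in range(S.shape[1]):
        target = np.column_stack([S[:, i].real, S[:, i].imag])
        m = KNeighborsRegressor(n_neighbors=k, weights="distance")
        models.append(m.fit(P, target))
    return models

def predict_expansion_points(models, p_new):
    # Query every per-point model and recombine the predictions into a
    # new set of complex expansion points for the parameter set p_new.
    p_new = np.atleast_2d(p_new)
    preds = [m.predict(p_new)[0] for m in models]
    return np.array([re + 1j * im for re, im in preds])
```

The predicted set would then be handed to the projection step of Eq. (7) in place of the SO-IRKA result.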

Postprocessing of IRKA expansion points
Not all optimal interpolation points obtained from IRKA are necessarily mirror images of poles of the corresponding transfer function [3, p. 87]. This means that IRKA can yield multiple interpolation points around one physical pole for one set of parameters but not for the neighboring set, for example. Therefore, the interpolation points need to be processed such that the "first" point in the sense of Sec. 3.1 corresponds to the "first" pole, and so on. We achieve this by converting the reduced model's transfer function into the pole-residue form

H_r(s) = Σ_{i=1}^{2r} R_i / (s − λ_i),

with the reduced system's eigenvalues λ_i, which are, up to the convergence tolerance, identical to the mirror images of the interpolation points obtained from IRKA, and the corresponding residues R_i. The residues can be computed from an eigenvalue decomposition of the reduced system [24,25], which is computationally cheap to perform. The higher the absolute value of a residue, the more dominant the corresponding pole is in the transfer function, so removing expansion points with a small residue does not affect the system's transfer function considerably. Because we only want to use expansion points corresponding to physical poles to train the regression models, we only use the eigenvalues λ_i with a residue R_i larger than a certain threshold to compute the projection bases V, W. Just as removing these terms from the pole-residue form, removing the corresponding expansion points from the projection bases has only little effect on the transfer function of the reduced system. The resulting sets of interpolation points for different parameters can now be ordered such that the first entry in the set corresponds to the first physical pole and so on. Again, all poles can be treated individually, as proposed in Sec. 3.1.
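A minimal sketch of this postprocessing step, assuming the reduced system is diagonalizable with simple poles; pole_residues and dominant_expansion_points are hypothetical helper names, and the residues are obtained from an eigendecomposition of a first-order linearization of the reduced system.

```python
import numpy as np
from scipy.linalg import eig, solve

def pole_residues(Mr, Cr, Kr, fr, gr):
    # Express H_r(s) = sum_i R_i / (s - lambda_i) through the eigenvalues
    # and eigenvectors of a first-order linearization of the reduced system.
    r = Mr.shape[0]
    I, Z = np.eye(r), np.zeros((r, r))
    E = np.block([[I, Z], [Z, Mr]])
    A = np.block([[Z, I], [-Kr, -Cr]])
    b = np.concatenate([np.zeros(r), fr]).astype(complex)
    c = np.concatenate([gr, np.zeros(r)]).astype(complex)
    lam, Vr = eig(A, E)
    # For simple poles, (s E - A)^{-1} = Vr (s I - Lambda)^{-1} (E Vr)^{-1},
    # so the residues are R_i = (c^T Vr)_i * ((E Vr)^{-1} b)_i.
    R = (c @ Vr) * solve(E @ Vr, b)
    return lam, R

def dominant_expansion_points(lam, R, threshold):
    # Keep only poles whose residue magnitude exceeds the threshold;
    # their mirror images serve as the retained expansion points.
    keep = np.abs(R) > threshold
    return lam[keep], R[keep]
```

Sorting the retained poles, e.g. by absolute value, then gives the consistent "first pole, second pole, …" ordering used for training.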

Data-driven interpolation methods
The parametric reduction strategy proposed in Section 3.1 can be used with different data-driven interpolation or machine learning methods. No special requirements have to be met by the chosen strategy. In the following and for the numerical experiments, a nearest neighbors regression, a multivariate polynomial regression, and a support vector machine regression model are considered.

Nearest neighbors regression
A k-nearest neighbors (kNN) model fits the predicted value σ̂_i(p) for a parameter set p not present in the training data based on the k data points in the training dataset having the shortest distance in a defined metric. The fit is defined as

σ̂_i(p) = ( Σ_{p_j ∈ N_k(p)} w_j σ_{i,j} ) / ( Σ_{p_j ∈ N_k(p)} w_j ),

where p_j are the k closest points to the new parameter set p, N_k(p) defines the neighborhood of p according to a metric, w_j is a weighting factor, and σ_{i,j} is the ith expansion point of the jth neighbor of the queried value. The value of the approximation σ̂_i(p) is thus the weighted mean of the k values σ_i(p_j) with the smallest distance between p and p_j in parameter space [26]. We choose the Euclidean distance as metric and the inverse of the distance as weighting function. This assigns a higher weight to training points with a smaller distance, allowing a smoother approximation of the original solution space. For the numerical experiments covered in the following, the kNN models were found to approximate also higher dimensional solution spaces with good accuracy.
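The weighted fit can be written out directly; knn_predict is a hypothetical helper implementing the inverse-distance weighting with the Euclidean metric for a single expansion point (complex-valued targets work the same way).

```python
import numpy as np

def knn_predict(P_train, y_train, p, k=3):
    # sigma_hat_i(p) = sum_j w_j * sigma_ij / sum_j w_j over the k nearest
    # neighbors of p, with inverse-distance weights w_j = 1 / ||p - p_j||.
    d = np.linalg.norm(P_train - p, axis=1)
    idx = np.argsort(d)[:k]
    if d[idx[0]] == 0.0:
        # query coincides with a training point: return its value exactly
        return y_train[idx[0]]
    w = 1.0 / d[idx]
    return np.sum(w * y_train[idx]) / np.sum(w)
```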

Multivariate polynomial regression
In the context of multivariate polynomial regression (MPR), the function σ_i(p) of the location of the ith expansion point can be approximated by a polynomial of order q of the form

σ_i(p) = α_0 + Σ_{j_1=1}^{l} α_{j_1} p_{j_1} + Σ_{j_1=1}^{l} Σ_{j_2=j_1}^{l} α_{j_1 j_2} p_{j_1} p_{j_2} + ⋯ + Σ_{j_1=1}^{l} Σ_{j_2=j_1}^{l} ⋯ Σ_{j_q=j_{q−1}}^{l} α_{j_1 … j_q} p_{j_1} p_{j_2} ⋯ p_{j_q},

with coefficients α_0, α_{j_1}, …, α_{j_1 … j_q}. The coefficients are obtained by solving an overdetermined equation system and minimizing the mean squared residual error, also known as least squares fitting. If the individual parameter values in p greatly differ in size, the parameters should be scaled to a similar magnitude before fitting the polynomial; otherwise, the least squares problem is likely to be ill-conditioned. A high order q might also lead to an ill-conditioned least squares problem. However, in the setting of this paper, the functions σ_i(p) are relatively smooth, so a sufficient accuracy can be achieved with a relatively low q. Compared to the local approach of the kNN regression, the MPR is fitted to the global parameter space, allowing a better generalization if the solution space is smooth, but it is less accurate near jumps or discontinuities in the solution space. Increasing the polynomial order to approximate such behavior is very likely to lead to a numerically ill-conditioned least squares problem. MPR therefore performs better when approximating smooth functions.
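With scikit-learn, the MPR including the recommended parameter scaling can be sketched as follows; fit_mpr is a hypothetical helper, and the real and imaginary parts of an expansion point are fitted as two regression targets.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

def fit_mpr(P, targets, q=2):
    # Standardize the parameters first so the least squares problem stays
    # well conditioned, then fit a polynomial of total order q.
    return make_pipeline(StandardScaler(),
                         PolynomialFeatures(degree=q),
                         LinearRegression()).fit(P, targets)
```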

Support vector regression
A support vector machine (SVM) finds vectors describing a high dimensional tube containing samples forming a smooth manifold [27]. This tube is found by solving a regression under a certain error tolerance and penalizing all samples lying outside the tube with a loss function. The resulting support vectors can be used for regression analyses, termed support vector regression (SVR). Predictions for new input parameters are found by

σ̂_i(p) = Σ_j (α_j − α*_j) G(p_j, p) + b,

where the parameters α_j, α*_j are found by solving a minimization problem, b is the intercept of the regression, and G(·,·) is the kernel function used to build the Gram matrix of the training samples. This function can be a radial basis function G(p_i, p_j) = exp(−‖p_i − p_j‖²), for example.
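A sketch using the scikit-learn SVR with an RBF kernel G(p_i, p_j) = exp(−γ ‖p_i − p_j‖²); fit_svr is a hypothetical helper, and the hyperparameters C and epsilon shown are illustrative values that would need tuning for the actual datasets.

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr(P, target, C=100.0, epsilon=0.01):
    # Epsilon-SVR with an RBF kernel for one real-valued expansion-point
    # quantity; gamma="scale" sets the kernel width from the data variance.
    return SVR(kernel="rbf", C=C, epsilon=epsilon, gamma="scale").fit(P, target)
```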

Numerical experiments
To assess the performance of the different regression models, two parametrized models of vibrating structures are considered: an Euler-Bernoulli beam, clamped at one end and excited by a tip load at the other end, and an acoustic cavity, excited by a vibrating membrane attached to a boundary of the cavity. The beam model is a standard benchmark example for model order reduction of second-order dynamic systems, having symmetric system matrices and proportional damping. Due to the vibro-acoustic coupling, the cavity model's system matrices are non-symmetric and the damping behavior is spatially varying, making it an interesting experiment to assess the methods in a vibro-acoustic context. The performance of the parametric reduced order models regarding the different regression models and underlying reduction procedures is measured using the normalized root mean square error (NRMSE)

NRMSE = sqrt( (1/n_f) Σ_{i=1}^{n_f} |y_i − y_{r,i}|² ) / ( max_i |y_i| − min_i |y_i| ),

where y is the frequency response of the full order model in the considered frequency range, y_r the response of the reduced model obtained with predicted expansion points, and n_f the number of frequency samples in the response. All computations have been performed on a workstation with an Intel® Xeon® Gold 6136 CPU @ 3.0 GHz with 256 GB RAM.
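The error measure can be sketched as follows; since the exact normalization convention is not spelled out here, this hypothetical nrmse helper normalizes the root mean square error by the range of the full model's response magnitude, one common convention.

```python
import numpy as np

def nrmse(y_full, y_red):
    # Root mean square of the pointwise response error over n_f frequency
    # samples, normalized by the range of the full model's response magnitude.
    err = np.sqrt(np.mean(np.abs(y_full - y_red) ** 2))
    return err / (np.max(np.abs(y_full)) - np.min(np.abs(y_full)))
```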

Beam model
We model the vibration of two sets of clamped Euler-Bernoulli beams with a square cross section of area A = 1 × 10⁻⁶ m², subjected to a harmonic load at the free end, using a finite element model of order n = 300. Proportional damping is applied, such that C = αM + βK. Set P_1 is parametrized regarding length l and Young's modulus E; set P_2 additionally regarding material density ρ and the proportional damping parameters α, β. The parameter sampling for both datasets is summarized in Table 1. Each combination of the parameters is present in the dataset, resulting in 200 samples for P_1 and 1125 samples for P_2. The training data for both sets contain every third parameter combination of the complete dataset; the rest of the dataset is used for testing the approximation. The training set for P_1 contains 67 samples and the resulting models are tested with the remaining 133 samples; the training set for P_2 contains 375 samples, its test set contains 750 samples. For each set of models P_1, P_2, two datasets are computed: one with IRKA, reducing the original model to a fixed size of r = 20, and one using the adaptive IRKA strategy described in Sec. 3.1 for a frequency range of s = [0, …, 10 000] rad/s. Increasing l and ρ shifts the transfer function peaks towards lower frequencies; increasing E shifts them towards higher frequencies. Increasing α, β decreases the amplitude of the transfer function but has little effect on the location of the peaks. The effect of the parameters on the transfer function is shown in Fig. 3. Comparing the model with the most response peaks to the one with the fewest, it is obvious that different reduced orders are required to reduce these two models efficiently; setting a fixed reduced order for all samples in the dataset would either lead to a poor approximation for some parameter sets or to an overestimation of the required order for the others, reducing the efficiency of the reduced model.
For each sample, the expansion points are predicted with the chosen regression method. The corresponding reduced model is computed and the NRMSE over the frequency range s = [0, …, 10 000] rad/s is evaluated. Figure 4 shows the NRMSE of the approximation for all models created with kNN, MPR, and SVR with the training data from P_1, Fig. 5 for the training data from P_2. The boxes mark the lower and upper quartile, the horizontal lines in the boxes show the median error, and the whiskers reach to the maximum and minimum values of the NRMSE. The kNN model was fit for k = 3 neighbors and a Gaussian kernel function G(p_i, p_j) = exp(−‖p_i − p_j‖²) was used in the SVR. MPR is employed using a quadratic polynomial (q = 2) as well as a polynomial of order q = 8. The least squares problem is well-conditioned for all considered cases. Figure 6 exemplarily plots the transfer functions for one parameter realization. The full model's transfer function is compared to the transfer functions of the reduced model computed with IRKA and the reduced model computed with interpolated expansion points obtained from a kNN model. The computation times to fit the different models for the two parameter case are t_kNN = 0.076 s for the kNN model, t_MPR,2 = 0.057 s and t_MPR,8 = 0.063 s for the MPRs, and t_SVR = 4.358 s for the SVR; for the five parameter case the models take t_kNN = 0.351 s, t_MPR,2 = 0.133 s, t_MPR,8 = 4.428 s, and t_SVR = 28.485 s to be fitted. The SVR did not succeed in robustly approximating the locations of the expansion points in the five parameter setting, possibly due to the small size of the training dataset. kNN and MPR yield good results, even for the high dimensional parameter space of P_2. For the two parameter case P_1, all methods performed similarly with comparable median errors.
Using IRKA with an adaptive order and learning an additional model for this order does not have a large influence on the median approximation error for P_1. However, learning the required reduced order for P_2 using MPR yields a model prone to underestimating the required order for parameter combinations at the boundaries of the sampled region. In this case, the local approach of kNN outperforms the global approach of the MPR: the resulting outliers have very high errors compared to the kNN models. The approximation of the expansion point locations themselves is of comparable quality for kNN and MPR, so combining a kNN model for the required order with an MPR approximation for the expansion point locations yields better results. The SVR with adaptive order also produced some outliers originating from an underestimated required order of the reduced system.
The order of the polynomial employed in the MPR does not have a large influence on the approximation quality. Despite the rather sparse parameter sampling in P_2, the models fit with kNN and MPR perform well. The average order of all considered models is r = 10.4 for P_1 and r = 12.9 for P_2, compared to the fixed order of r = 20. By also fitting the reduced orders, the computational cost of computing the projection basis can therefore be decreased by around 40 % in this case.

Acoustic cavity model
The second model considered is a two-dimensional rectangular acoustic cavity with rigid walls parallel to the coordinate axes. At the wall along x = 0, a vibrating membrane of height a attached to an aperture in the middle of the wall is excited by a harmonic pressure load and radiates into the acoustic fluid of the cavity. The acoustic pressure is recorded at a location near the middle of the cavity. The model of order n = 3000 is parametrized regarding the size of the vibrating membrane a ∈ [0.05, 0.2] m and the cavity length in x-direction l ∈ [0.4, 0.7] m; the dataset contains 200 samples. The training dataset contains 67 samples, the remaining 133 samples represent the test dataset. The parameters are sampled according to a linear distribution between the respective minima and maxima and the training data is chosen to be every third parameter combination. Contrary to the beam model, increasing the parameter a does not shift the peaks of the transfer function as a whole, but changes the distribution of the peaks, as plotted in Fig. 7. Because the model's shape is parametrized in a, it is not trivial to find an affine parameter representation of the model, which is a prerequisite for many parametric model order reduction methods. Figure 9 compares the NRMS approximation errors of the reduced models using the parameters from the test dataset regarding the different regression models. All learning methods yield reduced models of comparable quality with acceptable accuracy. The median errors are slightly higher than those for the beam model and outliers with errors some orders of magnitude higher than the median can be observed. The models based on fixed and adaptive order IRKA again yield results of comparable quality. Contrary to the beam models, not all peaks in the transfer function are well separated and some modes may overlay each other. This results in expansion point locations lying on a less smooth surface than for the beam model (see Figs. 1, 2) and thus a higher overall error. Additionally, a mode being overlaid by other modes for a certain parameter combination might be required for a good approximation of a model with similar parameters. In this case, the interpolation misses this expansion point location, again resulting in a higher approximation error. Nevertheless, all errors are in an acceptable range and no reduced model fails completely to approximate the full system response. The computation times to fit the different models for the two parameter case are t_kNN = 0.063 s for the kNN model, t_MPR,2 = 0.054 s and t_MPR,8 = 0.082 s for the MPRs, and t_SVR = 1.767 s for the SVR. This shows, compared to the two-parameter beam model, that the time to fit the models is independent of the full model order n.
Fig. 9: NRMSE for the cavity reduced models obtained with expansion points from kNN, MPR, and SVR models with parameters from the test dataset.

Conclusions
We present a strategy for parametric model order reduction of vibro-acoustic systems using regression models. The regression model yields expansion points for a moment matching MOR method suitable to create a reduced model of good approximation quality without iterations. After the database of optimal expansion points has been obtained for a number of parameter samples, the main computational cost lies in projecting the full order model onto the reduced space, requiring r solves of a linear equation system of size n. As the original problem is not required to have an efficient affine parameter representation, the strategy from [10] to alleviate this computational cost by reusing the subspaces created while computing the database cannot be applied here. The kNN regression model in particular is able to robustly predict valid locations for the expansion points while being computationally inexpensive to fit. The MPR performs well, also for a higher number of parameters, if the solution space is smooth. The SVR yields good results for smooth solution spaces, but cannot be fitted for a larger number of parameters given the limited size of the dataset; fitting the SVR also has a substantially higher computational cost than the other two methods. It should be noted that the cost of fitting all models is independent of the full model order n. The presented method requires the full system's operators to be available during the procedure. As this is not always possible, a completely data-driven approach would be favorable.