Nonlinear dimensionality reduction for parametric problems: a kernel Proper Orthogonal Decomposition (kPOD)

Reduced-order models are essential tools to deal with parametric problems in the context of optimization, uncertainty quantification, or control and inverse problems. The set of parametric solutions lies in a low-dimensional manifold (with dimension equal to the number of independent parameters) embedded in a large-dimensional space (dimension equal to the number of degrees of freedom of the full-order discrete model). A posteriori model reduction is based on constructing a basis from a family of snapshots (solutions of the full-order model computed offline), and then using this new basis to solve the subsequent instances online. Proper Orthogonal Decomposition (POD) reduces the problem into a linear subspace of lower dimension, eliminating redundancies in the family of snapshots. The strategy proposed here is to use a nonlinear dimensionality reduction technique, namely kernel Principal Component Analysis (kPCA), in order to find a nonlinear manifold of much lower expected dimension, and to solve the problem in this low-dimensional manifold. Guided by this paradigm, the methodology devised here introduces several novel ideas, namely: 1) the nonlinear manifold is characterized using local tangent spaces, where the reduced-order problem is linear and based on the neighbouring snapshots; 2) the approximation space is enriched with the cross-products of the snapshots, introducing a quadratic description; 3) the kernel for kPCA is defined ad hoc, based on physical considerations; and 4) the iterations in the reduced-dimensional space are performed using an algorithm based on a Delaunay tessellation of the cloud of snapshots in the reduced space. The resulting computational strategy performs outstandingly in the numerical tests, alleviating many of the problems associated with POD and improving the numerical accuracy.


1 Introduction
Solving parametric problems is an important challenge in computational mechanics. Usually, the set of parametric solutions lies in a manifold of low dimension (equal to the number of independent free parameters, that is the intrinsic dimension of the problem) inside a large-dimensional Euclidean space (typically, the dimension is the number of degrees of freedom of the discretization). Reduced-Order Models (ROM) are devised to decrease the computational complexity, curtailing the number of degrees of freedom of the full-order problem and enabling real-time solutions of problems requiring multiple queries to the model. The methodology presented in this paper is classified as an a posteriori ROM. That is, it is based on sampling the parametric space, computing the corresponding solutions offline, and devising a strategy to drastically diminish the cost of the subsequent online computations (corresponding to different values of the parameters). Alternative a priori approaches construct the ROM offline, explicitly accounting for the parametric dependence with no arbitrary selection of the snapshots [3,4,5,6]: then, the subsequent computations are just functional evaluations.
The most popular a posteriori ROM is Proper Orthogonal Decomposition (POD). POD is extensively used in the framework of dimensionality reduction of boundary value problems to identify linear spaces of lower dimension including the manifold of solutions [14,15,18]. The linear space is identified by doing a Principal Component Analysis (PCA) on a set of solutions of the parametric problem (snapshots). Thus, the number of degrees of freedom is readily reduced to the dimension of this linear space using a Reduced Basis technique.
Often, the manifold of solutions is curved, and therefore the linear subspace embedding the manifold has a dimension much larger than the intrinsic dimension of the parametric family of solutions. The kernel PCA (kPCA) [19] technique aims at reducing the large-dimensional full-order solution space (also denoted as input space, for the role it plays in kPCA) into a nonlinear manifold. Following this idea, a kernel POD (kPOD) strategy is introduced here to solve the multidimensional problem in the kPCA nonlinear manifold. Contrary to other nonlinear dimensionality reduction techniques (e.g. Locally Linear Embedding, Isomap... [21,17,2]), kPCA is an algorithmically simple approach, providing an explicit and analytical forward mapping from the full-order space to the reduced-order space.
The driving force of this paper is to take advantage of the kPCA nonlinear dimensionality reduction to solve the parametric problem in a space of a dimension as low as possible, ideally equal to the intrinsic dimension of the problem (the number of parameters). Devising a viable a posteriori ROM methodology based on this notion requires developing novel strategies that are presented along the paper. The major concepts introduced to complement the kPOD strategy are:
• using local linear spaces generated by the neighbouring snapshots;
• enriching the approximation space with the cross-products of the snapshots, introducing a quadratic description;
• defining an ad-hoc kernel for kPCA, based on the nature of the problem to be solved;
• introducing a novel iterative algorithm based on a Delaunay tessellation of the cloud of snapshots to solve the nonlinear problem in the reduced space.
The resulting strategy is interpretable as a local POD approximation, in the sense that for every subsequent query the approximation is based on a small set of neighbouring snapshots. The idea of selecting a local subset of the snapshots, based on heuristic or automatic clustering techniques, is exploited by different authors, see for instance [16,1,10,13]. Here, the local approach is devised together with the exploration of the nonlinearly reduced space, and is naturally associated with the kPCA reverse mapping.
The remainder of the paper is structured as follows. First, in section 2, the parametric problem is presented, already in discrete form, and the standard POD is recalled. Next, in section 3, the kPCA is briefly described, emphasizing the explicit nature of the forward mapping and the different alternatives for the backward mapping. Also in section 3, the possibility of enriching the family of snapshots with quadratic terms is introduced. Section 4 introduces the kPOD concept, with emphasis on the approximation criterion given a local tangent space (subsection 4.1) and the strategy to explore the reduced space (subsection 4.2). Finally, section 5 presents some numerical tests demonstrating the efficiency of the proposed methodology in two advection-diffusion problems, and section 6 draws some concluding remarks.
The conceptual structure of the methodology proposed in the paper is summarized in Figure 1. The problem to be solved is a parametric linear system of equations, corresponding to the discretization of some parametric boundary-value or initial-value problem. Solving this problem maps points from the parametric space µ ∈ IR^{n_P} into full-order solutions x(µ) ∈ IR^d (the input space for the dimensionality reduction). Thus, n_S samples µ_i, i = 1, 2, . . . , n_S (red points in the left panel of Figure 1) correspond to n_S full-order solutions computed offline in IR^d: the snapshots to train the ROM (red dots in the center panel of Figure 1). The kPCA technique recalled in section 3 defines the reduced dimension k ≪ d and maps the training set into the reduced snapshots z_i, i = 1, 2, . . . , n_S (red dots in the right panel of Figure 1, the reduced space). Then, the kPOD presented in section 4 introduces a methodology to compute the solution associated with a new value µ*. This requires exploring the reduced space, seeking the optimal z* and the corresponding x* that is the best possible approximation to x(µ*). It is worth noting that each candidate z* defines a local linear space V_U (represented as a sphere in the center panel of Figure 1) where the solution is computationally affordable.
2 Problem statement, linear dimensionality reduction and linear Reduced Order Model

2.1 Problem statement
The discretization of some parametric Boundary Value Problem results in the following parametric linear system of equations:

K(µ) x(µ) = f(µ),    (1)

where the unknown x ∈ IR^d corresponds typically to the vector of nodal values in a grid, having d degrees of freedom. The input data, both matrix K and vector f, depend on the vector of parameters µ ∈ IR^{n_P} (being n_P the parametric dimension). The solution x also depends on µ, as explicitly marked in the notation. Thus, the set of solutions x(µ) lies in a manifold of dimension n_P (at most) inside IR^d. In this type of problems the number of parameters is much lower than the number of degrees of freedom of the discretization, that is n_P ≪ d. Therefore, the computational efficiency is expected to improve drastically if the set of solutions in the manifold is described with a substantial reduction of the number of variables, ideally n_P if some intrinsic parametrization of the manifold is available. In practice, if the manifold is described by k variables (n_P ≤ k ≪ d), the size of the system of equations (1) is reduced from d to k.

2.2 Linear dimensionality reduction: PCA
The first idea is to characterize the manifold in IR^d containing all the solutions x(µ) by a low-dimensional linear space. Obviously, the manifold is not expected to be linear, but it may be contained in a linear subspace of dimension much lower than d (in a way, a sort of bounding box).
In order to characterize the manifold, it is populated by a number of samples or snapshots. In practice, the parametric space is sampled selecting n_S points in IR^{n_P}, say µ_i for i = 1, 2, . . . , n_S. The corresponding solutions x_i = x(µ_i) are duly computed solving the full-order system (1) n_S times. These solutions constitute the training set that is going to be used to construct the reduced-order model.
The snapshots in the training set are collected in a matrix, namely

X = [x_1 x_2 . . . x_{n_S}] ∈ IR^{d×n_S},    (2)

which generates the linear subspace V_X = span{x_1, x_2, . . . , x_{n_S}} ⊂ IR^d. If the sampling is representative of the parametric space, the linear subspace generated by the samples is expected to contain all the solutions, also the ones corresponding to the values of µ that have not been sampled. Thus, for a new value of µ, the original problem (1) is to be solved for x in V_X instead of for x in IR^d. As indicated in section 2.3 below, the size of the final system to be solved is the dimension of the reduced subspace where the solution is sought, in this case V_X.
In order to guarantee that the sampling is representative, the number of samples n_S is taken as large as possible. This not only increases the dimension of V_X (and therefore the size of the system to solve); for large n_S the family of snapshots generating V_X may contain linear dependencies (redundancies), resulting in an ill-conditioned reduced system of equations.
In order to eliminate the redundancies and to further reduce the dimension of the linear subspace where the solution is sought, the well-known Principal Component Analysis (PCA) technique is used, based on the Singular Value Decomposition (SVD) of X. The SVD of X ∈ IR^{d×n_S} produces two orthonormal matrices U ∈ IR^{d×d} and V ∈ IR^{n_S×n_S} and a diagonal matrix Σ ∈ IR^{d×n_S} such that

X = U Σ V^T,    (3)

where, for d > n_S, matrix Σ reads

Σ = [ diag(σ_1, σ_2, . . . , σ_{n_S}) ; 0_{(d−n_S)×n_S} ].    (4)

The singular values are sorted in descending order, σ_1 ≥ σ_2 ≥ . . . ≥ σ_{n_S} ≥ 0. The columns of U ∈ IR^{d×d}, denoted as u_1, u_2, . . . , u_d, are an orthonormal basis of IR^d. Noting that (3) is rewritten as

X = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + . . . + σ_{n_S} u_{n_S} v_{n_S}^T,    (5)

it is clear that, if the dimension of V_X is n_S (that is, the snapshots are linearly independent and σ_{n_S} ≠ 0), then the first n_S columns of U are an orthonormal basis of V_X. If the actual dimension of V_X is lower than n_S, the lower singular values are zero. In practice, this is equivalent to the case in which they are negligible with respect to the larger ones. In order to keep only the relevant modes, a tolerance ε is selected and the lower singular values are neglected, retaining only the k largest values such that

σ_{k+1}^2 + . . . + σ_{n_S}^2 ≤ ε (σ_1^2 + . . . + σ_{n_S}^2).    (6)

Thus, taking U* = [u_1 u_2 . . . u_k] ∈ IR^{d×k} as the matrix of the first k columns of U, the space

V_U = span{u_1, u_2, . . . , u_k}    (7)

is an approximation to V_X collecting most of the information (up to the tolerance ε). For instance, if ε = 0.01, taking k singular values such that (6) is satisfied guarantees that V_U is a subspace collecting at least 99% of the amount of information (of the variance of the model, to be precise).
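For illustration, criterion (6) can be implemented in a few lines. The following Python/numpy sketch (function name and interface are illustrative, not part of the formulation) returns the truncated basis U* and the retained dimension k:

```python
import numpy as np

def pca_basis(X, eps=1e-2):
    """Truncated SVD of the snapshot matrix X (d x nS): keep the
    smallest k such that the discarded squared singular values are
    below eps times the total, as in criterion (6)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)   # retained fraction of variance
    k = int(np.searchsorted(energy, 1.0 - eps)) + 1
    return U[:, :k], s[:k], k
```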

2.3 Linear Reduced Order Model: Reduced Basis and Proper Orthogonal Decomposition
The Reduced Basis (RB) idea consists in solving the original problem (1) in a subspace of IR^d of much smaller dimension but representative of the solutions to be computed.
The crude RB version is to assume that the family of snapshots X is a basis, and therefore to solve (1) taking x ≈ x_RB = Xα ∈ V_X, where α ∈ IR^{n_S} is the vector of unknowns. Thus, system (1) is transformed into

X^T K(µ) X α = X^T f(µ),    (8)

which is a system of n_S equations and n_S unknowns.
Remark 1 (Residual minimization) In the case K is symmetric positive definite, Eq. (8) is equivalent to finding α minimizing

J_E(α) = ‖x(µ) − Xα‖_E^2,    (9)

being ‖·‖_E the discrete form of the standard energy norm, defined as

‖v‖_E^2 = v^T K v.    (10)

Note that minimizing the energy norm is consistent with the underlying approximation criterion for the Galerkin formulation. For non-symmetric matrices this equivalence does not hold, but the solution of system (8) is all the same an approximation of (1) in V_X.
Remark 2 (Minimizing residual in Euclidean norm) If, instead, the standard Euclidean norm of the residual is selected, that is ‖x‖^2 = x^T x, and the discrepancy form is defined as

J_2(α) = ‖f(µ) − K(µ) Xα‖^2,    (11)

the resulting reduced system of equations is

X^T K^T K X α = X^T K^T f,    (12)

which is valid also for non-symmetric K, provided that the symmetric part of K is positive definite.
Remarks 1 and 2 are pertinent to realize that the reduced system presented in equation (8) is valid in any case, and consistent with the standard Galerkin strategy if K is symmetric positive definite. The reduced system (12) uses a least-squares criterion to project from the full-order space to the reduced-order space, which is not in agreement with the energy projection used from the continuous space to the full-order discrete space. In that sense, the reduced equation (8) is preferred to (12) because it is consistent with the residual minimization for symmetric matrices, and it is adopted also for the case in which K is non-symmetric. Both options were implemented in the examples of section 5, resulting in solutions which are equivalent for all practical purposes.
The family of snapshots becomes a proper basis of V_X if the snapshots are generated guaranteeing that they do not include redundancies. This is the case when a greedy strategy guided by some error assessment is used to generate the basis, see [7,20,12]. If the sampling is performed arbitrarily (or even randomly), the family of snapshots typically includes the linear redundancies associated with any exhaustive sampling. This results in a rank-deficient matrix X (of rank lower than n_S) and therefore a singular matrix in system (8) (singular if there are null singular values, or extremely ill-conditioned if some are very small with respect to the largest).
The PCA, described in section 2.2, provides an orthonormal basis of dimension k ≤ n_S and the associated approximation space V_U. Combining this with the RB idea leads to the Proper Orthogonal Decomposition concept, which consists in taking x ≈ x_POD = U* z ∈ V_U, where z ∈ IR^k is the new vector of unknowns. Analogously to (8), the POD reduced system reads

(U*)^T K(µ) U* z = (U*)^T f(µ),    (13)

which is a system of k equations and k unknowns. The POD has eliminated in U* the possible redundancies in X. Thus, the reduced system (13) does not spoil the well-conditioned character of matrix K (i.e. if K is well-conditioned, then also (U*)^T K U* is well-conditioned). Actually, the most disadvantageous degradation of the condition number in the reduced system is controlled by the tolerance ε. Recall that the PCA is a dimensionality reduction technique representing the vectors x ∈ IR^d (assumed to be in the set of parametric solutions) in a subspace V_U of dimension k. Thus, a vector x is mapped into a reduced vector z ∈ IR^k. It is important to note that in PCA both the forward mapping x → z = (U*)^T x and the backward mapping z → x = U* z are explicitly defined. In the following, the kPCA technique is described as a generalization of PCA to nonlinear dimensionality reduction, and the difficulties associated with properly defining the backward mapping are discussed in detail.
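In matrix form, the reduced solve (13) amounts to assembling and solving a k × k system. A minimal sketch, assuming the truncated basis U* computed by the pca_basis helper above (names are illustrative):

```python
def pod_solve(K, f, Ustar):
    """Solve the POD reduced system (13), (U*)^T K U* z = (U*)^T f,
    and return the full-order reconstruction x = U* z."""
    Kr = Ustar.T @ K @ Ustar   # k x k reduced matrix
    fr = Ustar.T @ f           # reduced right-hand side
    z = np.linalg.solve(Kr, fr)
    return Ustar @ z, z
```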

3 Nonlinear dimensionality reduction with kernel-PCA
Kernel Principal Component Analysis (kPCA) is a nonlinear dimensionality reduction technique, based on applying PCA to a transformed training set, the transformation being defined by a kernel form κ(·, ·) defined in IR^d × IR^d, see [8] for details. The kernel κ(·, ·) is not necessarily bilinear, but it has to guarantee that the matrix resulting from evaluating the kernel on all the elements of a basis (see below) is symmetric positive definite (this matrix is intended to stand for a Gram matrix). Once a kernel κ is selected, the Gram matrix G ∈ IR^{n_S×n_S} is introduced as having components

G_{ij} = κ(x_i, x_j), for i, j = 1, 2, . . . , n_S.    (14)

An application g(·) from IR^d to IR^{n_S} is introduced, associated with kernel κ:

g(x) = [κ(x, x_1), κ(x, x_2), . . . , κ(x, x_{n_S})]^T.    (15)

Note that the image by g(·) of the j-th snapshot, g(x_j), coincides with the j-th column of G, denoted as g_j.
Being G symmetric (recall that the kernel κ has to guarantee that G is symmetric positive definite), its SVD decomposition reads

G = V Σ V^T,    (16)

where in (16) the n_S × n_S square diagonal matrix Σ has as diagonal entries the squared singular values σ_1^2 ≥ σ_2^2 ≥ . . . ≥ σ_{n_S}^2 ≥ 0.

3.1 kPCA forward mapping
The dimensionality reduction is performed setting a tolerance ε and selecting the reduced dimension k with the criterion given in (6). Matrix V* ∈ IR^{n_S×k} is taken as containing the first k columns of V. Then, the forward mapping into the reduced space IR^k is readily defined for the snapshots,

z_i = (V*)^T g_i, for i = 1, 2, . . . , n_S,

see [8] for details. The images of the snapshots in the reduced space are collected in matrix Z = [z_1 z_2 . . . z_{n_S}] ∈ IR^{k×n_S}. Using the definition of g(·) in (15), an exhaustive forward mapping F applied to every x ∈ IR^d reads

F(x) = (V*)^T g(x).    (17)

Note that this forward mapping is explicit, similarly as with PCA.
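The whole kPCA training and forward mapping can thus be sketched as follows. The names are illustrative; the kernel is passed as a callable, and, as in the presentation above, no centering of the Gram matrix is applied:

```python
def kpca_train(X, kernel, eps=1e-8):
    """Assemble the Gram matrix (14), factorize it as in (16) and
    truncate: returns V* (first k columns of V) and k."""
    nS = X.shape[1]
    G = np.array([[kernel(X[:, i], X[:, j]) for j in range(nS)]
                  for i in range(nS)])
    V, s2, _ = np.linalg.svd(G)               # G = V diag(s2) V^T
    energy = np.cumsum(s2) / np.sum(s2)
    k = int(np.searchsorted(energy, 1.0 - eps)) + 1
    return V[:, :k], k

def kpca_forward(x, X, kernel, Vstar):
    """Forward mapping (17): F(x) = (V*)^T g(x), with g as in (15)."""
    g = np.array([kernel(x, X[:, j]) for j in range(X.shape[1])])
    return Vstar.T @ g
```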

3.2 kPCA backward mapping
For the kPCA technique, the question of the backward mapping (or pre-image, i.e. the determination of F^{-1}) is not as simple as for PCA.
Recall that in PCA the k-dimensional linear manifold in IR^d is naturally defined as V_U in (7), and therefore the backward mapping from IR^k to V_U maps z into x = U* z.
For nonlinear dimensionality reduction techniques, the definition of the pre-image is not as obvious, because it requires choosing the description of the nonlinear k-dimensional manifold in IR^d. Often, see [11,8], the low-dimensional manifold is defined locally as a tangent space linearly generated by the neighbouring snapshots.

Remark 3 (Tangent space)
The concept of tangent space refers, here and in the following, to any local linear approximation of the manifold. The dimension of this linear space is typically larger than the dimension of the manifold (here, equal to k). In that sense, it differs from the standard geometrical definition of tangent space, which has the same dimension as the manifold: here, the tangent space is a locally defined linear space that includes the geometrical tangent space.
In order to define the inverse of the forward mapping F described in (17), one aims at mapping back any z* ∈ IR^k into some x* ∈ IR^d such that z* = F(x*). Function F defined in (17) is at best surjective (it reaches all the elements in IR^k), but it cannot in any case be injective (because different elements x ∈ IR^d have the same image z). Thus, the inverse is not properly defined unless the target space is a proper low-dimensional manifold V ⊂ IR^d. The same applies for standard PCA, but in this case the linear subspace V_U is implicitly selected as target manifold.
Once the low-dimensional manifold V ⊂ IR^d is chosen, the backward mapping is taken as

x* = F^{-1}(z*) = arg min_{x ∈ V} ‖F(x) − z*‖,    (18)

or, in other words, the pre-image is the element x* ∈ V such that F(x*) is equal to (or is the best approximation in V to) z*. Thus, the definition of the backward mapping depends on the selection of both the manifold V and the approximation criterion, that is the norm ‖·‖.

3.2.1 Manifold embedded in the linear subspace
The first idea is to select V ⊂ V_X (see section 2.2), that is, a subset of the space generated by the snapshots. This is equivalent to saying that x* ∈ V is such that

x* = Xw = w_1 x_1 + w_2 x_2 + . . . + w_{n_S} x_{n_S},    (19)

namely x* ∈ V is a weighted average of the snapshots. The general approach is to select the weights based on the minimization of a discrepancy functional involving the forward mapping, say

J(w) = ‖F(Xw) − z*‖^2,    (20)

and finding w as

w = arg min_{w ∈ IR^{n_S}} J(w).    (21)

Having as many as n_S unknown weights would make the minimization problem ill-conditioned. This is because different combinations of weights (in particular when they correspond to weights associated with snapshots distant from the point of interest) may result in very similar values of x* = Xw that are transformed by F into indistinguishable points z ∈ IR^k. It is therefore necessary to restrict the number of weights to those related with the closest neighbouring snapshots: that is, replacing X by X* and n_S by n*_S, as introduced next. Even so, solving the minimization problem (21) is not easy and is typically associated with a number of numerical difficulties (identifying a proper first guess, getting trapped in local minima...). Thus, a more straightforward criterion is often adopted to compute the w corresponding to z*.
In practice, the weights are selected locally (only the weights corresponding to the closer snapshots in IR^d are taken as non-zero). Thus, for a given z* only the closest neighbours are taken into consideration. The set of indices corresponding to these snapshots is denoted by η (the dependence on z* is not explicit in the notation), and the number of elements in η is denoted by n*_S. Correspondingly, the locally reduced matrix of snapshots is denoted by X* (the columns of X in η). The number of unknowns (weights to compute), n*_S, is significantly lower than n_S. This is to say that, for a given z*, the manifold V is defined locally as V ⊂ V_{X*}, and there is some w ∈ IR^{n*_S} such that x* = X* w. The linear subspace V_{X*} of dimension n*_S is seen as a tangent space to V in the vicinity of F^{-1}(z*), in the sense of Remark 3.
The most classical strategy to compute the weights is using a radial basis interpolation. This consists in computing the distances from z* to the images of the snapshots in the reduced space, d_i = ‖z* − z_i‖, for i = 1, 2, . . . , n*_S, and defining the weights inversely proportional to the squared distances, i.e. w_i ∝ 1/d_i^2. The constant of proportionality is taken such that the sum of the weights is unity:

w_i = (1/d_i^2) / Σ_{j=1}^{n*_S} (1/d_j^2).    (22)

The selection of the squared distances is somehow arbitrary and may be replaced by any monotonic function of the distances (typically with different exponents). Generally, the criterion to select the set of neighbouring samples η is based on the distances d_i: i is in η if d_i is lower than some threshold distance. For the radial basis interpolation, this is equivalent to setting to zero the weights below some tolerance.
An alternative to compute the weights w is proposed here following a classical idea of least-squares fitting. Note that typically k ≪ n_S, and even if only n*_S neighbouring snapshots are selected, k is also lower than n*_S. Thus, a linear representation of z* in terms of the reduced snapshots,

Z* w = z*,    (23)

is an underdetermined system of k equations for n_S (or n*_S) unknowns, Z* ∈ IR^{k×n*_S} being the matrix collecting the reduced images of the selected snapshots. The typical least-squares-inspired strategy is to use the Moore-Penrose pseudoinverse of Z*, which is an n*_S × k matrix usually denoted by (Z*)†. Then, the solution of (23) is

w = (Z*)† z*.    (24)

Note that the solution proposed in (24) is valid both for overdetermined systems of equations (k > n*_S, which does not hold here; in this case the equality in (23) would be an approximation) and for underdetermined systems. In the latter case, the pseudoinverse selects, among all possible solutions of (23), the one having the lowest norm.
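Both options admit a direct implementation. A minimal sketch, where Zloc stands for Z* (the reduced images of the selected snapshots) and zstar for z*, both illustrative names:

```python
def rbf_weights(zstar, Zloc):
    """Radial-basis weights (22): inversely proportional to the squared
    distances in the reduced space, normalized to sum to one."""
    d2 = np.sum((Zloc - zstar[:, None])**2, axis=0)
    w = 1.0 / d2   # assumes zstar does not coincide with a snapshot image
    return w / w.sum()

def pinv_weights(zstar, Zloc):
    """Least-squares weights (24): minimum-norm solution of the
    underdetermined system Z* w = z* (23), via the Moore-Penrose
    pseudoinverse."""
    return np.linalg.pinv(Zloc) @ zstar
```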

3.2.2 Manifold including quadratic terms
The main advantage of kPCA with respect to PCA is the possibility of describing a nonlinear manifold V of dimension k ≪ d. Actually, the backward mapping strategies proposed in subsection 3.2.1 are built as linear tangent approximations, see equation (19), constituting a globally curved manifold. Nevertheless, the local linear approximation may represent a limitation to capture the local curvature of the actual manifold V.
Thus, a new local approximation scheme is introduced accounting for a quadratic representation. The idea is replacing (19) by

x* = Bw,    (25)

where the basis includes the Hadamard products (component by component, denoted by ⊙) of the snapshots. The new vector of weights w ∈ IR^{n_Q} is of length n_Q = n_S + ½ n_S(n_S − 1) (in practice, n_S is to be replaced by n*_S to compute n_Q, accounting for the local character of the approximation), and the basis matrix B ∈ IR^{d×n_Q} is

B = [x_1 . . . x_{n_S}, x_1 ⊙ x_2, x_1 ⊙ x_3, . . . , x_{n_S−1} ⊙ x_{n_S}],    (26)

or the local counterpart B*, including only the x_i, i ∈ η, and their different cross-products. The same strategies defined in subsection 3.2.1 (viz. minimization of a discrepancy functional and least-squares fitting) are to be used here, just replacing matrix X (or its restriction X* to the selected neighbouring snapshots) by matrix B (discarding also the columns involving the non-selected snapshots). Note that this is equivalent to taking a larger tangent space to V, including also the quadratic terms. In the case of the least-squares fitting, the matrix Z* has to be augmented including the images by F of the quadratic terms.
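A sketch of the construction of B in (26), using component-wise (Hadamard) products; the helper name is illustrative:

```python
from itertools import combinations

def quadratic_basis(Xloc):
    """Enriched basis (26): local snapshots plus all Hadamard
    cross-products x_i * x_j (component-wise), i < j."""
    n = Xloc.shape[1]
    cols = [Xloc[:, i] for i in range(n)]
    cols += [Xloc[:, i] * Xloc[:, j] for i, j in combinations(range(n), 2)]
    return np.column_stack(cols)   # d x nQ, with nQ = n + n(n-1)/2
```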

4 A kPCA-based ROM: the kPOD
The POD strategy leads to a reduced system of equations (13) by seeking, for a given value of µ*, an approximate solution x(µ*) ≈ U* z of equation (1) in the linear subspace V_U.
The kPCA proposes a nonlinear manifold of reduced dimension k, which is expected to be lower than the one proposed by PCA and POD. The idea of devising a kPCA-based ROM is to seek the solution x* of (1) in the nonlinear manifold V containing the pre-images F^{-1}(z) of every z ∈ IR^k. The approach we present here, labelled as kernel Proper Orthogonal Decomposition (kPOD), aims at finding in the reduced space the best approximation to the solution of (1) corresponding to a given µ* ∈ IR^{n_P}. That is, exploring all possible z (ranging over a space of dimension k) to pick up z* ∈ IR^k such that

K(µ*) F^{-1}(z*) ≈ f(µ*).    (27)

Note that the dependence of z* on µ* is omitted in the notation for the sake of clarity, the different alternatives to define the backward mapping F^{-1} being discussed in section 3.2.
The overdetermined system of equations (27) (with d equations and k ≪ d unknowns) is to be solved using some fitting criterion, expressed in a general fashion as the minimization of a discrepancy functional,

z* = arg min_{z ∈ IR^k} J(z), with J(z) = ‖f(µ*) − K(µ*) F^{-1}(z)‖^2.    (28)

Alternatively, the criterion is stated directly for x* = F^{-1}(z*) ∈ V, defining a discrepancy functional analogous to (9), namely

J_E(x) = ‖x(µ*) − x‖_E^2.    (29)

According to expression (28), the space to explore in the minimization is of dimension k, associated with the kPCA reduction, which is expected to be lower than the one provided by POD (the kPCA nonlinear dimensionality reduction is likely closer to the intrinsic dimension of the problem). The price to pay for reducing k is, in this case, the fact that problem (28) is nonlinear, while the POD reduced system (13) keeps the linear character of the original problem (1). The advantage of using formulation (29) is that, once the set of neighbouring snapshots η (see section 3.2.1) is defined, the local neighbourhood of manifold V is embedded in a linear space similar to V_X (or V_B if the quadratic terms are also included) and the problem to be solved is analogous to the reduced system of the POD, see equation (13).

4.1 Optimal solution for a given point z* in the reduced space
The given point z* in the reduced space is associated with a set of neighbouring snapshots, whose indices are collected in η. Thus, the characterization of the backward mapping F^{-1}(·) is, in practice, a matter of selecting a proper distribution of weights such that F^{-1}(z*) = x* = Bw, as described in section 3.2. This is equivalent to determining an element in the tangent space V_B. In this context, to improve the local approximation properties, it is relevant to center the selected set of samples and generate the quadratic terms correspondingly. Namely, to compute the local average

x̄ = (1/n*_S) Σ_{i∈η} x_i,    (30)

and to express x* in the tangent space as (including quadratic terms)

x* = x̄ + Bw, with B = [. . . , x_i − x̄, . . . , (x_i − x̄) ⊙ (x_j − x̄), . . .], i, j ∈ η.    (31)

In order to eliminate possible redundancies in B ∈ IR^{d×n_Q}, following equation (3), the SVD is performed and reads B = U Σ V^T. The matrix is reduced selecting k̃ with a criterion similar to (6). Then, matrix Ũ ∈ IR^{d×k̃}, containing the first k̃ columns of U, is used to define the reduced approximation

x* ≈ x̄ + Ũ w̃, with w̃ ∈ IR^{k̃}.    (32)

The manifold V (locally approximated by the affine subspaces {x̄} + V_B or {x̄} + V_U) contains the solutions of (1). The optimal value of w̃ is therefore readily computed solving the k̃ × k̃ linear system analogous to (13),

Ũ^T K(µ*) Ũ w̃ = Ũ^T (f(µ*) − K(µ*) x̄).    (33)
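Putting the pieces of this subsection together, a minimal sketch of the local solve (30)-(33), reusing the quadratic_basis helper above (names and interface are illustrative):

```python
def kpod_local_solve(K, f, Xloc, eps=1e-8):
    """Local tangent-space solve of section 4.1: center the selected
    snapshots (30), assemble the quadratic basis B of (31), truncate
    its SVD, and solve the ktilde x ktilde system (33)."""
    xbar = Xloc.mean(axis=1)
    B = quadratic_basis(Xloc - xbar[:, None])
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    kt = int(np.searchsorted(energy, 1.0 - eps)) + 1
    Ut = U[:, :kt]                               # Utilde, d x ktilde
    wt = np.linalg.solve(Ut.T @ K @ Ut, Ut.T @ (f - K @ xbar))
    return xbar + Ut @ wt                        # x* as in (32)
```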

4.2 Exploration of the reduced space
The strategy proposed in section 4.1 procures the solution according to the criterion expressed in (29). The only difference lies in the set η, associated with the space where x* is selected for the minimization problem: instead of the nonlinear manifold V, the minimization is performed in the tangent affine space {x̄} + V_U, which depends on the point z* selected in the reduced space IR^k.

Figure 2 The cloud of snapshots (left) defines a Voronoi tessellation of cells associated with each snapshot (center). The right panel illustrates the levels of connectivity associated with some snapshot z_i, based on the Voronoi diagram: the first-level patch includes z_i itself and all the snapshots whose Voronoi cells are adjacent to it; the second-level patch includes also the cells adjacent to those.
As is clear from (28), the reduced space has to be explored to find the point z* producing the best solution. The quality of the different approximations is assessed using some residual-based discrepancy functional, see (28).
It is also clear from section 4.1 that the role of z* in the computation of x* is only to define the closest neighbours, that is, the set η containing the indices of the n*_S neighbouring snapshots. Accordingly, it makes sense to explore IR^k discretely, browsing the zones in which the neighbouring points in the training set are the same, that is, patches around each snapshot.
Thus, the strategy proposed here to explore the different z boils down to defining patches in IR^k, associated with the cloud of points in the training set mapped into the reduced space. These patches in fact characterize the neighbours connected to each snapshot.
Here, this is readily performed using a generalized Delaunay tessellation (and the equivalent Voronoi diagram) of the training set in IR^k. Note that the Voronoi cells Ω_i ⊂ IR^k are centered in the points z_i of the training set (snapshots), for i = 1, 2, . . . , n_S. A straightforward criterion is to consider as neighbours of z ∈ Ω_i all the points connected to z_i in the Delaunay tessellation, see Figure 2. This can be taken with one level of connectivity (being vertices of the same simplex) or two. Figure 2 illustrates how the cloud of points in IR^k (in the illustration k = 2) is tessellated, establishing both the Voronoi cells and the connectivity collected in the set of indices η. For a given snapshot z_i, the set η may account for one level of surrounding cells or two. The use of the Delaunay algorithm is an excellent option to select the neighbouring points because it is available in many platforms and is generalized to arbitrary dimensions (no limit in the value of k), see [9].
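A sketch of the patch construction with scipy.spatial.Delaunay (illustrative names; the scipy tessellation requires k ≥ 2, and for a one-dimensional reduced space, as in the first example of section 5, a simple sort of the z_i plays the same role):

```python
from scipy.spatial import Delaunay

def neighbour_patches(Z, levels=1):
    """Connectivity sets eta from a Delaunay tessellation of the reduced
    snapshots Z (nS x k, one row per point). Two points are first-level
    neighbours if they share a simplex; further levels grow the patch
    through the same relation."""
    tri = Delaunay(Z)
    first = [set() for _ in range(Z.shape[0])]
    for simplex in tri.simplices:
        for i in simplex:
            first[i].update(int(j) for j in simplex)
    patches = first
    for _ in range(levels - 1):      # grow the patch one more level
        patches = [set().union(*(first[j] for j in p)) for p in patches]
    return [sorted(p) for p in patches]
```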
The strategy described above defines the set η associated with any candidate point z in the reduced space. Once η is determined (having chosen whether the neighbourhood takes one or two levels of Voronoi cells), the corresponding x* is computed as indicated in section 4.1. The question is whether this solution is acceptable or a further iteration in the reduced space is needed. The idea presented here, and illustrated in Figure 3, is to devise an iterative algorithm generating a sequence of points in IR^k, z̃_0, z̃_1, . . . , z̃_ν, z̃_{ν+1}, . . . The iteration scheme starts with some initial guess z̃_0, taking just one level of Voronoi cells around the cell Ω_i where z̃_0 is located. This characterizes η and therefore allows computing the x* associated with z̃_0. Then, x* is mapped into the reduced space using the kPCA forward mapping, see equation (17), and z̃_1 is obtained. In the case z̃_1 is in a Voronoi cell Ω_j different from the previous one (j ≠ i), the computation is performed again, selecting η using a one-level Voronoi connectivity around Ω_j. The operation is repeated until two successive iterates z̃_ν and z̃_{ν+1} belong to the same Voronoi cell, say Ω_j. In order to avoid a possible local stagnation and to eventually increase the accuracy of the proposed solution, in this case the computation is performed again selecting a two-level connectivity around Ω_j. If, as observed in all the examples in section 5, the next iterate, say z̃_{ν+2}, is also in the same cell Ω_j, the iterative process is stopped. Else, if z̃_{ν+2} is in Ω_ℓ with ℓ ≠ j, the iterative process continues. Besides being illustrated in Figure 3, this process is algorithmically described in the next section.

Figure 3 Illustration of the optimal path strategy to iteratively explore the z space. The patch of the initial guess z̃_0 produces a local solution z̃_1 using the first connectivity level (left panel). If z̃_1 is not in the same Voronoi cell as z̃_0, the next iterate z̃_2 is computed as the local solution in the patch of z̃_1 with one connectivity level (center panel). If z̃_2 is in the same cell Ω_j as z̃_1, the next iterate z̃_3 is computed increasing the level of connectivity (right panel). The iterative algorithm is stopped if z̃_3 is in the same cell as z̃_2 (and hence, as z̃_1).
One important point is properly selecting the initial guess z̃_0. A very effective possibility is to compute a POD solution and map it to the reduced space using the standard forward kPCA mapping (17).

4.3 Algorithmic description
The proposed methodology is summarized here in algorithmic form. The process is split into an offline phase and an online phase; the latter corresponds to the iterative procedure described in section 4.2. The offline phase includes the computation of the snapshots (full-order problems corresponding to selected points in the parametric space), the kPCA analysis and the preprocessing of the training set in the reduced space (Delaunay tessellation). The two phases are described next and summarized in Algorithms 1 and 2.
The offline phase consists in:
1. sampling the parametric space in points µ_i for i = 1, 2, . . . , n_S, and computing the corresponding snapshots x_i (training set X ∈ IR^{d×n_S}) solving equation (1) n_S times;
2. performing the kPCA dimensionality reduction: compute matrix G, perform the SVD as in equation (16), and find the reduced dimension k and matrix V* based on the tolerance ε;
3. mapping the snapshots into IR^k and computing Z ∈ IR^{k×n_S};
4. building a Delaunay tessellation in IR^k with the reduced snapshots Z, characterizing the cells Ω_i and the snapshots connected to them.
The input data for Algorithm 1 are the sampled parameters µ_i for i = 1, 2, . . . , n_S, the parametric matrix K(µ) and vector f(µ), the tolerance ε, and the kernel κ. In order to guarantee the quality of the local approximation, for snapshots located close to the boundary of the z-domain, where the number of natural neighbours is small, a minimum number of surrounding snapshots is enforced according to the dimension of the reduced space, k, and the selected number of levels.
The online phase (Algorithm 2), for a new value µ*, consists in:
1. computing the initial guess z̃_0 (the forward kPCA image of the POD solution);
2. locating the Voronoi cell Ω_i containing the current iterate;
3. selecting the set of neighbours η from the Delaunay connectivity around Ω_i, computing the local average x̄ and assembling the local basis B with the centered snapshots and their cross-products, as in (31);
4. performing the SVD of B, finding the reduced dimension k̃ and matrix Ũ, and computing x* solving equation (33) (a k̃ × k̃ system) and (32);
5. computing the next iterate as the forward kPCA image of x*, z̃_{ν+1} = (V*)^T g(x*);
6. finding the cell Ω_j where z̃_{ν+1} belongs. If Ω_j ≠ Ω_i (and Ω_j has not been visited yet): go to step 3. Else, increase the level of connectivity and repeat steps 3 to 6, stopping when the iterate remains in the same cell.
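A condensed sketch of the online iteration, reusing the helpers introduced above (illustrative names). For simplicity it keeps a single connectivity level and stops on stagnation, whereas Algorithm 2 switches to two levels before stopping:

```python
def kpod_online(K, f, X, Z, kernel, Vstar, patches, z0, max_iter=20):
    """Sketch of Algorithm 2: locate the Voronoi cell of the current
    iterate, solve in the local tangent space (section 4.1), re-map
    with F (17), and stop when the iterate stays in the same cell."""
    cell = int(np.argmin(np.sum((Z - z0)**2, axis=1)))   # cell of z0
    visited, xstar = set(), None
    for _ in range(max_iter):
        visited.add(cell)
        eta = patches[cell]                              # neighbour set
        xstar = kpod_local_solve(K, f, X[:, eta])
        znew = kpca_forward(xstar, X, kernel, Vstar)
        new_cell = int(np.argmin(np.sum((Z - znew)**2, axis=1)))
        if new_cell == cell or new_cell in visited:
            break      # stagnation: stop (or retry with two levels)
        cell = new_cell
    return xstar
```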

5 Numerical examples
This section presents two numerical examples involving advection-diffusion problems. The first example is a transient one-dimensional problem, while the second one solves a steady-state problem in a two-dimensional domain. In the two examples, the physical properties of the problem are exploited to build an ad-hoc kernel function for kPOD, which successfully identifies the intrinsic dimension of the manifold of solutions. Throughout the section, kPOD includes quadratic terms in the approximation space. Errors are measured in the Euclidean norm with respect to a reference solution, which is taken as the full-order FE solution.

5.1 Transient advection-diffusion problem in 1D
We consider the transient boundary value problem

∂u/∂t + v ∂u/∂x − ν ∂²u/∂x² = 0 in (0, 4), with u(0, t) = u(4, t) = 0,    (34)

where the diffusivity is ν = 5 · 10^{-3} and the advection velocity v takes values in the range [1, 2]. For some v, the expected evolution of a representative solution in time is sketched in Figure 4. The problem is to be solved when varying two parameters: the time and the advection velocity. Nevertheless, due to the small diffusivity, the solution depends almost exclusively on the product vt as it evolves in time, and the solutions lie in (or very close to) a one-dimensional manifold.
The spatial discretization is uniform with 2000 intervals, leading to nodal solutions in IR^{2001}. In time, we take 250 steps with an increment of ∆t = 0.005.
Discretizing equation (34) using centered finite differences in space and Crank-Nicolson in time, the nodal solution u^{n+1} at time step n + 1 is computed by solving a system of the form

A u^{n+1} = D u^n,    (35)

where u^n is the nodal solution from time step n. Accounting for the Dirichlet values at the extremes of the interval, A and D are matrices of size 1999 × 1999.
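A sketch of the construction of A and D under these assumptions (uniform grid, homogeneous Dirichlet conditions eliminated from the unknowns; the helper name is illustrative):

```python
import scipy.sparse as sp

def cn_matrices(n, h, dt, v, nu):
    """Interior-node matrices of (35), A u^{n+1} = D u^n: centered
    finite differences in space, Crank-Nicolson in time."""
    e = np.ones(n)
    adv = sp.diags([-e[:-1], e[:-1]], [-1, 1]) * (v / (2 * h))
    dif = sp.diags([e[:-1], -2 * e, e[:-1]], [-1, 0, 1]) * (nu / h**2)
    L = dif - adv                    # semi-discrete form du/dt = L u
    I = sp.identity(n)
    return (I - 0.5 * dt * L).tocsr(), (I + 0.5 * dt * L).tocsr()

# Setting of the text: 2000 intervals on (0, 4) -> 1999 interior nodes
A, D = cn_matrices(n=1999, h=4 / 2000, dt=0.005, v=1.5, nu=5e-3)
```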
Our goal is to reduce the size of these systems by means of POD and kPOD, in order to obtain a more efficient computation at every time iteration. The training set for this problem consists of 501 snapshots, including the initial condition and the full-order solutions of the problem for 10 equispaced velocities in [1, 2], for 50 time steps corresponding to n = 5, 10, 15, . . . , 250.
Within the POD formulation, system (35) becomes

(U*)^T A U* z^{n+1} = (U*)^T (D u^n − A ū),    (36)

where ū is the mean of the snapshots and U* defines the reduced-order approximation u^{n+1} ≈ ū + U* z^{n+1}, as indicated in (32). Taking a tolerance ε = 10^{-8} in the dimensionality reduction criterion (6), the resulting system of equations is of size 74 × 74.
With kPOD, it is necessary to solve, for each time step, as many systems as determined by the optimal path algorithm described in section 4.3. Thus, the system for each iteration in Algorithm 2 reads

Ũ^T A Ũ w̃ = Ũ^T (D u^n − A x̄),    (37)

with a k̃ × k̃ system matrix, Ũ and k̃ chosen as described in section 4.1. The dimension k̃ depends on the particular neighbourhood of snapshots under study, also taken with a tolerance ε = 10^{-8}. Here, the algorithm at time step n + 1 is initialized at z̃_0 = F(u^n).
The choice of an appropriate kernel function is relevant to guarantee the efficiency and the accuracy of the method. Often, the kernel is selected agnostic to the problem, based on the Euclidean distance of the snapshots in the full-order space IR^d. Here, the kernel is selected such that the distance between samples corresponds to physical considerations, taking advantage of the a priori knowledge on the nature of the solutions. The expected evolution of the solutions is depicted in Figure 4: the initial profile propagates to the right with time, due to advection, and decreases in height and increases in width, keeping the area constant, due to diffusion. Thus, the solution u(x; t, v) is mainly characterized by the position of its centroid,

C_u = ( ∫ x u dx / ∫ u dx , ∫ u² dx / (2 ∫ u dx) ),    (38)

whose horizontal and vertical coordinates describe the progress of the initial configuration, accounting for both advection (horizontal position) and diffusion (vertical position). Thus, a sensible choice for the kernel in this problem results from replacing, in a classical Gaussian kernel, the Euclidean distance of the solutions by the centroid distance, namely

κ(u, v) = exp(−‖C_u − C_v‖² / β),    (39)

u, v being two solutions (snapshots) and β a constant, here set to 10^{-4}. The value of β is chosen after numerical experimentation, trying to accumulate a significant amount of information for k = 1. In fact, with the proposed kernel the first principal component collects 99.73% of the accumulated σ. Therefore, it is sufficient to take the reduced space of dimension k = 1. Note that in this 1D case, exploring the z space consists in taking sets of 3 neighbouring snapshots for the first level of connectivity, 5 snapshots for the second level, etc.
Figure 5 shows the (z_1, z_2) components of the snapshots both in the PCA and the kPCA reduced spaces. It is clear how kPCA (with the proposed kernel) is able to identify the intrinsic dimension k = 1, all the values of z_2 lying in a very narrow band. On the contrary, with PCA the range of the second component z_2 is similar to the range of z_1 (associated with only 8.90% of the accumulated σ).
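A sketch of the kernel (39), with the centroid (38) evaluated by numerical quadrature on the nodal values (illustrative names; to feed kpca_train above, one would wrap it as kernel = lambda u, v: centroid_kernel(u, v, xgrid)):

```python
def centroid(u, xgrid):
    """Centroid (38) of the area under the profile u(x): horizontal
    and vertical coordinates of the region below the graph."""
    area = np.trapz(u, xgrid)
    cx = np.trapz(xgrid * u, xgrid) / area
    cy = 0.5 * np.trapz(u**2, xgrid) / area
    return np.array([cx, cy])

def centroid_kernel(u, v, xgrid, beta=1e-4):
    """Gaussian kernel (39) measured on the centroid distance instead
    of the Euclidean distance between full snapshots."""
    d2 = np.sum((centroid(u, xgrid) - centroid(v, xgrid))**2)
    return np.exp(-d2 / beta)
```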
The accuracy of the two formulations in obtaining new solutions is analysed next. We use the two methods to integrate in time for an advection velocity v = 1.5, which is not in the training set, taking increments ∆t = 0.005 in the temporal scheme until a final time t = 1. The solutions at some intermediate time steps are shown in Figure 6. The POD solutions oscillate unphysically for this particular problem. Even though the obtained profile manages to follow the peak of the reference solution as time evolves, the relative errors with respect to the full-order solution are over 10^{-1}, see Table 1. With kPOD, on the other hand, solutions do not oscillate and the relative errors decrease significantly. As listed in Table 2, errors for kPOD are of the order of 10^{-4}. Also, notice that the average number of steps in the optimal path is below 4, with an average k̃ around 7 during the whole process.
Higher accuracy is obtained with kPOD if increasing the levels of connectivity, accounting for more snapshots in the approximation of the solution. Table 3 shows the errors at time t = 1 for levels of connectivity 1, 2 and 3 (corresponding to levels 2, 3 and 4, respectively, in the final solution of Algorithm 2). The results suggest that, as expected, k̃ increases with the level of connectivity, while the number of iterations in the optimal path tends to decrease.

Table 3 1D problem. kPOD: relative errors, average number of steps in the optimal path and average k̃ at time t = 1.00, for different levels of connectivity.

5.2 Steady advection-diffusion problem in 2D
Figure 7 shows some representative solutions when varying the two parameters, µ and α. The pollution propagates across the domain according to µ and v(α); therefore, it seems logical to think that the solutions lie in a two-dimensional manifold. We can distinguish three different regimes, depending on whether the wake of pollution is above, crossing or below the centered island in the domain. Crossing solutions are identified as those with u > 0.1 at some node on ∂B_{0.3}(0, 0). The lack of similarity between solutions in different regimes hints at the need of a proper sampling to obtain accurate results in all of them when using a reduced-order formulation.
We first consider a training set with 60 snapshots corresponding to randomly distributed values for (µ, α) ∈ [−0.8, 0.8] × [10, 80], see Figure 8. Note that most of the samples correspond to propagating paths below the island, while only one of them propagates above it.
With a tolerance ε = 10^{-8} in criterion (6), the PCA reduced space is of dimension k = 58. As in the previous example, PCA fails to identify the intrinsic dimension of the manifold, capturing 14.15% of the accumulated σ in the training set for k = 2. For kPOD, we define the kernel function

κ(u, v) = exp(−(‖C_{Γ_D}^u − C_{Γ_D}^v‖² + ‖C_out^u − C_out^v‖²) / β),    (41)

where C_{Γ_D} and C_out denote the centroid of the solution on Γ_D and on the outlet ∂Ω ∩ ({y = −1} ∪ {x = 1}), respectively, as defined in (38). Taking β = 10^{-3}, the kernel explains 76.01% of the accumulated σ for k = 1, and 99.96% for k = 2. We reduce to the intrinsic dimension k = 2, and the optimal path search is performed in the Voronoi tessellation in Figure 9. As an initial approximation for the algorithm, we take z̃_0 = F(u_POD) (the image through the forward mapping of the POD solution).
In order to analyse the limit accuracy of kPOD with a quadratic approximation, we also consider the case in which all the snapshots in the domain are identified as neighbours. This is equivalent to POD with a quadratic approximation instead of the usual linear approach. The solution from quadratic POD is expected to overcome the other ones in accuracy, since all the snapshots are involved. With the same tolerance ε = 10^{-8}, the resulting systems of equations have dimension k̃ = 1216. Next, POD, kPOD and quadratic POD are used to approximate the solutions for three new points in the parametric space, see Figure 8. We take (µ, α) = (0, 50), (0.5, 30) and (0.7, 20), belonging to the below, crossing and above regimes, respectively. Figure 10 shows the reduced-order solutions.

Table 4 2D problem, 60 snapshots: POD relative errors.
Parameters | Relative error
µ = 0, α = 50 | 4.00 · 10^{-1}
µ = 0.5, α = 30 | 4.21 · 10^{-1}
µ = 0.7, α = 20 | 1.04
POD leads to results with oscillations and poor accuracy in the three cases. As listed in Table 4, all relative errors with respect to the FE reference solution are over 0.40, with a relative error over 1 for the solution corresponding to the above regime.
The accuracy of kPOD strongly depends on the number of snapshots in each one of the regimes, see Table 5. In the most-populated regime, the reduced solution for (µ, α) = (0, 50) has a relative error of 7.97 · 10^{-4} if taking 1/2-connectivity levels to explore the reduced space, and the error decreases to 3.92 · 10^{-5} if 2/3-connectivities are considered. Also, note that the optimal path only takes two iterations to converge. For the solution in the crossing regime, the kPOD solution slightly oscillates, and oscillations are more evident for the new solution above the island, where the sampling is extremely poor, with only one snapshot. However, results are more accurate than those from POD in all cases.

Table 6 2D problem, 60 snapshots: quadratic POD relative errors.
Parameters | Relative error
µ = 0, α = 50 | 7.87 · 10^{-6}
µ = 0.5, α = 30 | 1.91 · 10^{-2}
µ = 0.7, α = 20 | 4.31 · 10^{-1}
Results with quadratic POD, in Table 6, set the limit accuracy that we can reach with kPOD and a quadratic approximation for the given training set. Errors decrease and oscillations disappear (for the crossing-regime case) or become more subtle (for the above-regime one). Quadratic POD is an appealing option for its simple implementation and accuracy, but it is limited to datasets with a small number of snapshots. For a larger training set, for instance with 200 snapshots, we may have memory limitations when performing the SVD of matrix B in (31).
Finally, we repeat the POD and kPOD computations when enriching the training set up to 200 snapshots, see Figure 8. Again for ε = 10^{-8}, the reduced dimension of PCA is k = 198. The kernel function (41) enables kPCA to capture 99.97% of the accumulated σ for k = 2. Figure 11 shows the POD and kPOD solutions for the same three cases as before. With POD, the errors decrease with the more populated training set, see Table 7, but we still observe oscillations in the tests on the crossing and above regimes. kPOD leads to stable solutions in all cases. The errors decrease significantly, especially for the 2/3-connectivities, see Table 8.

6 Concluding remarks
The numerical strategy presented in this paper, the kPOD, aims at alleviating the computational complexity of the parametric model by invoking nonlinear dimensionality-reduction techniques. More precisely, the kPCA is selected as an alternative to the linear PCA used in POD.
The kPCA backward mapping (from the reduced space to the input space) is determined locally, using the snapshots surrounding the point to be mapped. This induces the local character of the kPOD algorithm, and has suggested devising an iterative exploration of the reduced space, based on a Delaunay tessellation of the cloud of snapshots (in the low-dimensional space).
Another novel idea introduced here is to enlarge the basis of snapshots in the input space by including the cross-products of the snapshots, that is, the quadratic terms. This introduces new features of the solutions that contribute to a better resolution in the kPOD. In practice, this is only affordable in a kPOD setting where the solutions are computed locally (with a reduced number of neighbouring snapshots): the number of quadratic terms arising from a global representative family of snapshots is too large and compromises the use of the standard POD.
Moreover, the selection of the kernel to define the kPCA reduction is carried out using considerations based on the nature of the problem. The kernel is taken such that it extracts features of the solution that discriminate the differences in the associated parameters (e.g. the location of the barycenter of the distribution as an indicator of the time and the diffusion).
The combination of the different novel elements presented in the paper enlarges the portfolio of numerical tools to devise reduced-order strategies dealing with complex problems. Taking advantage of the nonlinear manifold allows increasing the accuracy of the solution with respect to the standard POD, noting that this methodology does not necessarily reduce the computational cost.
The performance of the presented strategies in two parametric advection-diffusion problems (one transient, one steady-state) shows a considerable improvement with respect to the standard POD.