A hybrid finite element/neural network solver and its application to the Poisson problem

We analyze a hybrid method that enriches coarse grid finite element solutions with fine scale fluctuations obtained from a neural network. The idea stems from the Deep Neural Network Multigrid Solver (DNN-MG) (Margenberg et al., A neural network multigrid solver for the Navier-Stokes equations, J Comput Phys 460:110983, 2022), which embeds a neural network into a multigrid hierarchy by solving coarse grid levels directly and predicting the corrections on fine grid levels locally (e.g. on small patches consisting of several cells) by a neural network. Such local designs are appealing, as they allow for very good generalizability. In this work, we formalize the method and describe the main components of the a priori error analysis. Moreover, we numerically investigate how the size of the training set affects the solution quality.


Introduction
Recent advancements in employing neural networks to approximate solutions to partial differential equations (PDEs) mostly focus on Physics Informed Neural Networks (PINNs) [2] such as the Deep Ritz method [3]. They leverage the expressive power of neural networks while incorporating physical principles and promise substantial efficiency increases for high-dimensional or parameter-dependent partial differential equations. One main drawback of PINNs is that they need to be re-trained when the problem parameters change. Moreover, for classical problems, such as three-dimensional fluid dynamics problems, highly sophisticated and well-established discretization methods are available whose efficiency and accuracy beat neural network approaches by far.
The method of this paper was introduced as the main component of DNN-MG [1] for the instationary Navier-Stokes equations. At each time step, a coarse solution is obtained by a classical finite element solver and corrections on finer grids are predicted locally via neural networks. Here, we focus on a simpler linear problem and aim to understand the mechanism of such hybrid approaches by discussing their a priori errors and via numerical experiments.
Let Ω ⊂ R^d, d ∈ {2, 3}, be a domain with polygonal boundary. We are interested in the weak solution u ∈ H^1_0(Ω) of Poisson's equation with a given force term f ∈ H^{-1}(Ω):

(∇u, ∇φ) = ⟨f, φ⟩  for all φ ∈ H^1_0(Ω).   (1)
Moreover, let V_h^{(r)} be the space of continuous piecewise polynomials of degree r ≥ 1 satisfying the homogeneous Dirichlet condition on the boundary ∂Ω, i.e.

V_h^{(r)} := {φ ∈ C(Ω̄) : φ|_T ∈ P^{(r)}(T) for all T ∈ T_h, φ = 0 on ∂Ω},

where P^{(r)}(T) is the space of polynomials of degree r on a cell T ∈ T_h. We assume that there is a hierarchy of meshes

T_0 ≺ T_1 ≺ ... ≺ T_L = T_h,

where we denote by T_{l-1} ≺ T_l that each element of the fine mesh T ∈ T_l originates from the uniform refinement of a coarse element T' ∈ T_{l-1}, for instance the uniform splitting of a quadrilateral or triangular element into four and of a hexahedral or tetrahedral element into eight smaller ones, respectively. Accordingly we have the nesting

V_0 ⊂ V_1 ⊂ ... ⊂ V_L = V_h,

where V_l is the space defined on mesh level l. With a patch P we refer to a polyhedral subdomain of Ω; for simplicity we assume that each patch corresponds to a cell of T_H(Ω). By V_h(P) we denote the local finite element subspace, and R_P : V_h → V_h(P) denotes the restriction to the local patch space, defined via (R_P v)(x) = v(x) for all nodes x of P. The prolongation P_P : V_h(P) → V_h is defined by

(P_P v)(x) := (1/n) v(x) if x is a node of P, and (P_P v)(x) := 0 otherwise,   (2)

with n ∈ N being the number of patches containing the node x.
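The averaging in the prolongation (2) can be sketched in code. The following is an illustrative stand-in, not the paper's implementation: patches are represented as dictionaries from global node ids to local values, and a node shared by n patches receives the weight 1/n from each patch, so that prolongating the restrictions of a function reproduces the function.

```python
from collections import defaultdict

def prolongate(patches):
    """Sum patch-local values into a global vector with 1/n averaging.

    patches: list of dicts {global_node_id: local_value}.
    """
    counts = defaultdict(int)            # n = number of patches containing a node
    for patch in patches:
        for node in patch:
            counts[node] += 1
    global_vals = defaultdict(float)
    for patch in patches:
        for node, val in patch.items():
            global_vals[node] += val / counts[node]   # weight 1/n per patch
    return dict(global_vals)

# Two patches sharing node 2; restrictions of a global function u.
u = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
p1 = {k: u[k] for k in (0, 1, 2)}        # R_P1 u
p2 = {k: u[k] for k in (2, 3, 4)}        # R_P2 u
print(prolongate([p1, p2]))              # reproduces u exactly
```

Since consistent patch values are averaged back to themselves, the composition of restriction and prolongation acts as the identity on globally continuous data.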
The classical continuous Galerkin finite element solution u_h ∈ V_h of problem (1) fulfills

(∇u_h, ∇φ) = (f, φ)  for all φ ∈ V_h,   (3)

with the L^2 inner product (·, ·). We are interested in the scenario where one prefers not to solve (3) on the finest level V_h, due to lacking hardware resources or too long computational times, but in V_H with H ≫ h. The so-called coarse solution u_H ∈ V_H fulfills

(∇u_H, ∇φ) = (f, φ)  for all φ ∈ V_H.

The key idea of our method is to obtain the fine mesh fluctuations u_h − u_H in the form of neural network updates w_N corresponding to the inputs u_H and f. Hence, the neural network updated solution has the form u_N := u_H + w_N in the case where the network operates globally on the whole domain. A more appealing setting is where these updates are obtained locally, such that the network acts on the data not on the whole domain at once, but on small patches P. In this case, while the training is performed in a global manner, the updates are patch-wise and the network updated solution has the form

u_N := u_H + Σ_P P_P w_N^P.
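The gap between coarse and fine solutions that the network is meant to close can be illustrated in a one-dimensional analogue (the paper works in two dimensions; the solver below is a minimal sketch, not the code used in the experiments): piecewise linear elements for −u'' = f on (0, 1) with homogeneous Dirichlet conditions, assembled on a coarse and a fine mesh.

```python
import numpy as np

def solve_poisson_p1(n_cells, f):
    """P1 FEM for -u'' = f on (0,1), u(0)=u(1)=0, uniform mesh."""
    h = 1.0 / n_cells
    x = np.linspace(0.0, 1.0, n_cells + 1)
    # Tridiagonal stiffness matrix for the interior nodes.
    A = (np.diag(2.0 * np.ones(n_cells - 1))
         - np.diag(np.ones(n_cells - 2), 1)
         - np.diag(np.ones(n_cells - 2), -1)) / h
    b = h * f(x[1:-1])                   # lumped load vector
    u = np.zeros(n_cells + 1)
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

f = lambda x: np.pi**2 * np.sin(np.pi * x)   # exact solution: sin(pi x)
xH, uH = solve_poisson_p1(8, f)              # coarse solution u_H
xh, uh = solve_poisson_p1(64, f)             # fine solution u_h
errH = np.max(np.abs(uH - np.sin(np.pi * xH)))
errh = np.max(np.abs(uh - np.sin(np.pi * xh)))
print(errH, errh)                            # the fine error is much smaller
```

The fine-scale fluctuation u_h − u_H, here directly computable, is exactly the quantity the network is trained to predict from (u_H, f).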
Hybrid finite element neural network discretization

Neural network
In this section we introduce the neural network we use and formalize the definition of the finite element/neural network solution.

Definition 1 (Multilayer perceptron). Let L ∈ N be the number of layers and let N_i be the number of neurons on layer i ∈ {1, ..., L}. Each layer i ∈ {1, ..., L − 1} is associated with a nonlinear function l_i : R^{N_i} → R^{N_{i+1}},

l_i(x) := σ(W_i^T x + b_i),   (4)

with an activation function σ : R → R applied componentwise, where W_i ∈ R^{N_i × N_{i+1}} denote the weights and b_i ∈ R^{N_{i+1}} the biases. The network realizes the map N := l_{L-1} ∘ ... ∘ l_1.
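A minimal sketch of Definition 1: each layer applies an affine map followed by a componentwise activation (here tanh, as used later in the experiments). The layer sizes and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights, biases, sigma=np.tanh):
    """Evaluate the composition l_{L-1} o ... o l_1 of Definition 1."""
    for W, b in zip(weights, biases):
        x = sigma(W.T @ x + b)           # l_i(x) = sigma(W_i^T x + b_i)
    return x

sizes = [4, 8, 8, 2]                     # neurons per layer (illustrative)
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]
y = mlp(rng.standard_normal(4), weights, biases)
print(y.shape)                           # output dimension equals the last layer
```

Note that storing W_i with shape (N_i, N_{i+1}) and applying W_i^T matches the convention of (4).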

Hybrid solution
On a patch P, the network receives the tuple (R_P u_H, R_P f) of restrictions of the coarse solution and of the source term, and it returns an approximation to the fine-scale update,

w_N^P := N(R_P u_H, R_P f) ≈ R_P(u_h − u_H).

In order to obtain a globally continuous function, the prolongation (2) is employed.

Definition 2 (Hybrid solution). The hybrid solution is defined as

u_N := u_H + Σ_P P_P N(R_P u_H, R_P f).   (5)

For simplicity we will mostly write w_N for the global network update and u_N = u_H + w_N in place of (5).
Let {φ_h^i} be the basis of the fine finite element space V_h and let U_Hh be the coefficient vector of the interpolation of u_H into V_h. As we update the coarse solution u_H on fine mesh nodes, this procedure can be considered as a simple update of the coefficients U_Hh^i, i.e.

U_N^i := U_Hh^i + W_N^i,

or simply U_N := U_Hh + W_N, with U_N being the coefficient vector of u_N.
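The coefficient update can be sketched as follows. This is an illustrative 1D stand-in (linear interpolation replaces the 2D mesh prolongation, and the network update W_N is synthetic), not the paper's implementation.

```python
import numpy as np

xH = np.linspace(0.0, 1.0, 9)            # coarse mesh nodes
xh = np.linspace(0.0, 1.0, 17)           # fine mesh nodes (one refinement)
U_H = np.sin(np.pi * xH)                 # coarse coefficients (illustrative)
U_Hh = np.interp(xh, xH, U_H)            # interpolation of u_H into V_h
W_N = 0.01 * np.sin(2 * np.pi * xh)      # stand-in for the network update
U_N = U_Hh + W_N                         # nodewise update U_N = U_Hh + W_N
print(U_N.shape)
```

Since the update acts directly on fine-mesh coefficients, no additional solve is needed once the network has been evaluated.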

Training
The neural network is trained using fine finite element solutions obtained on the mesh T_h(Ω) and with the loss function

L(F) := (1 / (N_T N_P)) Σ_{i=1}^{N_T} Σ_P ‖R_P(u_h^{f_i} − u_H^{f_i}) − w_N^{f_i}‖_P^2,   (6)

where N_T is the size of the training set and N_P is the number of patches. Here, w_N^{f_i} stands for the finite element function defined by the network update N(f_i) on the patch P. The training set F = {f_1, ..., f_{N_T}} consists of N_T ∈ N source terms f_i together with the corresponding coarse and fine mesh finite element solutions u_H^{f_i} and u_h^{f_i}, respectively.
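A minimal sketch of evaluating the loss (6), assuming the patch data has been gathered into arrays of shape (N_T, N_P, dofs_per_patch). The norm here is a plain Euclidean norm; the paper's patch norm may differ.

```python
import numpy as np

def loss(fluctuations, updates):
    """Mean over samples and patches of the squared patch mismatch.

    fluctuations: target values R_P(u_h^{f_i} - u_H^{f_i}).
    updates:      network predictions w_N^{f_i} on each patch.
    Both have shape (N_T, N_P, dofs_per_patch).
    """
    N_T, N_P, _ = fluctuations.shape
    return np.sum((fluctuations - updates) ** 2) / (N_T * N_P)

rng = np.random.default_rng(1)
target = rng.standard_normal((5, 16, 9))     # illustrative patch targets
perfect = loss(target, target)
print(perfect)                               # 0.0 for an exact network
```

A network reproducing the fine-scale fluctuation exactly drives (6) to zero, which motivates interpreting the achieved loss as the tolerance in the error analysis below.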

On the a-priori error analysis
The difference between the exact solution u ∈ H^1_0(Ω) of (1) and the hybrid solution u_N from (5) can be split as

u − u_N = (u − u_h) + (u_h − u_h^{f_i}) + (u_h^{f_i} − u_N^{f_i}) + (u_N^{f_i} − u_N),   (7)

with u_h^{f_i}, u_N^{f_i} ∈ V_h being the finite element solution and the neural network updated solution corresponding to the source term f_i, respectively. Let us discuss the individual terms in (7).
• u − u h is the fine mesh finite element error. Estimates of this error are well-known in the literature and are of O(h r ) in the H 1 semi-norm.
• (u_h − u_h^{f_i}) is a data approximation error; in the H^1 semi-norm it can be bounded by ‖f − f_i‖_{-1} due to the stability of the finite element method.
• (u_h^{f_i} − u_N^{f_i}) is a network approximation error introduced by the approximation properties of the network architecture. It is bounded by a tolerance which depends on the accuracy reached in the minimization of (6).
• (u_N^{f_i} − u_N) consists of a generalization error of the network and a further error term depending on the richness of the data set. While the term u_H^{f_i} − u_H can be handled via the stability of the finite element method, the remaining term requires a stability estimate of the neural network.
Overall, an estimate of the form

‖∇(u − u_N)‖ ≤ c ( h^r + (1 + c_N) ‖f − f_i‖_{-1} + ε_N ),   (8)

with ε_N denoting the tolerance of the network training and c_N a stability constant of the network, can be obtained for a sufficiently smooth source term f and domain Ω. Improvements of this estimate that take the patch-wise updates into account are part of ongoing work.

Stability of the neural network
The network dependent term of (8) is linked with the stability of the network. For a study of the importance of Lipschitz regularity in generalization bounds we refer to [4].

Lemma 1. Let N be a multilayer perceptron (Definition 1) and let σ : R → R satisfy |σ(y) − σ(y')| ≤ c_0 |y − y'| with c_0 > 0. Then, on each patch P, for the inputs y and y_i and the corresponding finite element functions N(f) and N(f_i) (uniquely defined by the network updates) it holds

‖N(f) − N(f_i)‖_P ≤ c · c_0^{L-1} · c_W · ‖y − y_i‖_2,   c_W := Π_{j=1}^{L-1} ‖W_j‖_2.   (9)

Proof. The definition of the network gives

N(y) − N(y_i) = z_{L-1}(y) − z_{L-1}(y_i),   (10)

where z_j = l_j ∘ ... ∘ l_1 and the l_j are as defined in (4). By using the definition of z_j and the Lipschitz constant of σ(·) we obtain for an arbitrary layer j

‖z_j(y) − z_j(y_i)‖_2 ≤ c_0 ‖W_j‖_2 ‖z_{j-1}(y) − z_{j-1}(y_i)‖_2.   (11)

Then, by applying (11) recursively from the second to the last layer we obtain

‖z_{L-1}(y) − z_{L-1}(y_i)‖_2 ≤ c_0^{L-1} c_W ‖y − y_i‖_2.   (12)

Hence, by applying (12) to (10) and using the equivalence between the norm of a finite element function on the patch and the norm of its coefficient vector, we arrive at the claim.
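The bound of Lemma 1 can be checked numerically. Since tanh is 1-Lipschitz (c_0 = 1) and ‖W^T z‖_2 ≤ ‖W‖_2 ‖z‖_2 with the spectral norm, the product of the spectral norms of the weight matrices bounds the Lipschitz constant of the network. The sketch below uses a random MLP, not the trained network of the experiments.

```python
import numpy as np

rng = np.random.default_rng(2)

def mlp(x, weights, biases):
    """Forward pass with tanh activations, as in Definition 1."""
    for W, b in zip(weights, biases):
        x = np.tanh(W.T @ x + b)
    return x

sizes = [6, 10, 10, 3]
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]
# c_W = product of spectral norms; c_0 = 1 for tanh.
c_W = np.prod([np.linalg.norm(W, 2) for W in weights])

y, yi = rng.standard_normal(6), rng.standard_normal(6)
lhs = np.linalg.norm(mlp(y, weights, biases) - mlp(yi, weights, biases))
rhs = c_W * np.linalg.norm(y - yi)       # Lipschitz bound of Lemma 1
print(lhs <= rhs)                        # True
```

The biases cancel in the difference, which is why they do not appear in the constant.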

Corollary 1. Lemma 1 leads to

‖∇(N(f) − N(f_i))‖ ≤ c_1 · c_inv · c_Ω · ‖f − f_i‖_{-1},

with the constant c_1 = c · c_0^{L-1} · c_W arising from the Lemma above and c_inv and c_Ω arising from inverse and Poincaré estimates, respectively.
Proof. The definition of the inputs together with the triangle inequality and Lemma 1 yields a patch-wise bound in terms of the restrictions of u_H − u_H^{f_i} and f − f_i. Summing over all patches, this leads, with the Poincaré inequality, to the corresponding bound on the whole domain. The stability of the coarse discrete solution and the inverse estimate then show the claim.

Remark 1.
A different network architecture may include several layers that perform convolutions; such networks are called convolutional neural networks. In the two-dimensional setting, this corresponds to replacing l_i of Definition 1 with a nonlinear function

l_i^c(x) := σ(W_i ∗ x + b_i),

where ∗ is the matrix convolution operator. While N_i^∗ stands for the dimension of the kernel W_i of the corresponding convolution, we assume N_i = N_i^c · N_i^c and N_{i+1} = N_{i+1}^c · N_{i+1}^c. The embedding into the multilayer perceptron is usually performed with the reshape_N (R^{N^2} → R^{N×N}) and flatten_N (R^{N×N} → R^{N^2}) operators, so that the dimensions of the convolutional layers match those of the dense layers. In a scenario where a dense layer j of the MLP is replaced by a convolutional layer, estimate (11) must be modified as

‖z_j(y) − z_j(y_i)‖_F ≤ c_0 c(W_j) ‖z_{j-1}(y) − z_{j-1}(y_i)‖_F,

where c(W_j) denotes the Lipschitz constant of the convolution with kernel W_j. Hence, for a neural network with an index set S_d of dense layers and S_c of convolutional layers, the result (9) holds with the modified constant

c_W := Π_{j∈S_d} ‖W_j‖_2 · Π_{j∈S_c} c(W_j),

by taking into account that ‖reshape(v)‖_F = ‖v‖_2 and ‖flatten(A)‖_2 = ‖A‖_F.
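The norm identities used in Remark 1 are immediate to verify: reshape and flatten only rearrange entries, so the Euclidean norm of a vector equals the Frobenius norm of its matrix reshape. The example below checks this on random data.

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(16)
A = v.reshape(4, 4)                      # reshape_4 : R^16 -> R^{4x4}

# ||reshape(v)||_F = ||v||_2 and ||flatten(A)||_2 = ||A||_F:
print(np.linalg.norm(A, 'fro'), np.linalg.norm(v))
print(np.linalg.norm(A.flatten()), np.linalg.norm(A, 'fro'))
```

This is why mixing dense and convolutional layers introduces no extra constants from the reshape/flatten operators in the Lipschitz estimate.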

Numerical experiments
We consider the two-dimensional Poisson equation on the unit square Ω = (0, 1)^2 with homogeneous Dirichlet boundary conditions. The training data is picked randomly from a set of source terms together with the corresponding coarse and fine finite element solutions u_H and u_h, respectively. We employ a multilayer perceptron as described in Definition 1 with 4 hidden layers, each with 512 neurons, and σ(·) = tanh(·) as activation function. We train it using the Adam optimizer [5] and the loss function L from (6). Figure 1 shows the mean error of the proposed method w.r.t. a reference solution, which is one level finer than the target one.
Here we consider the error on training and testing datasets of different sizes. We also consider different refinement levels, i.e. h = H/2, H/4 and H/8. The x-axis corresponds to the fine step size h and the y-axis to the mean error. The two topmost lines (blue) show the error of the coarse solution, which is used as input to the neural network. The two bottom-most lines (green) show the error of the fine solution, used for the computation of the loss. The remaining lines depict the errors of the proposed method for training data of different sizes. We observe that, given enough data, one is able to get arbitrarily close to the fine solutions used for training. Figure 2 shows an example of how the loss function behaves during training. Here we have trained a network for 400 epochs and used learning rate decay with a factor of 0.5 every 100 epochs; due to this, one can observe significant drops in the value of the loss function at epochs 100, 200 and 300. Figure 3 shows an example of the coarse, fine and network solutions for a particular right-hand side from the test data.
Here we observe that the quality of the network solution is significantly better than the quality of the original coarse solution.
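The step-decay schedule described above (decay factor 0.5 every 100 epochs over 400 epochs) can be written compactly as follows; the base learning rate 1e-3 is an assumption, as the text does not state it.

```python
def lr_at(epoch, base=1e-3, factor=0.5, step=100):
    """Step-decay learning rate: base * factor^(epoch // step)."""
    return base * factor ** (epoch // step)

# Rates just before and after each decay point over 400 epochs.
schedule = [lr_at(e) for e in (0, 99, 100, 200, 300, 399)]
print(schedule)
```

The halvings at epochs 100, 200 and 300 correspond to the drops visible in the loss curve of Figure 2.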
(Figure 3, panels left to right: coarse solution, fine solution, hybrid NN solution.)