Data‐driven diffusion with uncertainties

In a data-driven finite element analysis the experimental data are employed directly as input for the computational analysis, thus bypassing any constitutive modeling of the material. The essential physical principles, such as balance laws and continuity, remain unchanged, as do the numerical schemes used in their discretization. In addition, uncertainties and fluctuations of the experimentally measured data enter the simulation directly.


Governing equations
The diffusion of particles fulfills the continuity equation

$$\dot{u} + \nabla \cdot \boldsymbol{j} = f(u) \tag{1}$$

for a particle source $f(u)$ and a flux vector $\boldsymbol{j}$. In other words, Eq. (1) follows from the universal principle of mass balance (or mass conservation, if we disregard the source term $f(u)$ for a moment) and is therefore model-free. The flux, however, is usually derived under the assumption of a uniform and spatially homogeneous material. With a diffusion coefficient $D$ it is usually modeled as

$$\boldsymbol{j} = -D \nabla u \tag{2}$$

and is, at best, empirical. In the proposed data-driven approach, the flux modeling is replaced by a time-dependent data set $\mathcal{D}_t$, which is composed of $n$ measured values for the spatial flux vector $\boldsymbol{j}$ and the descent of the particle concentration, $\boldsymbol{i} = -\nabla u$. Spatial heterogeneity can be accounted for by multiple data sets assigned to different locations. In total, the resulting data-driven problem consists of the minimization of a distance function to the current data set subject to continuity and kinematic constraints.

Specifically, we illustrate the character of the data-driven boundary-value problem by recourse to the stationary solution of Eq. (1). It leads to a classical Poisson problem for $u(\boldsymbol{x})$,

$$\nabla \cdot \boldsymbol{j} = f, \qquad \boldsymbol{j} = D\,\boldsymbol{i}, \qquad \boldsymbol{i} = -\nabla u, \tag{3}$$

where the first equation describes equilibrium (mass balance), the second equation plays the role of a constitutive equation, and the last relation is the kinematic constraint. Figure 1 (left) illustrates that the corresponding Poisson problem has a solution at the intercept of the two $i$–$j$ curves. In the data-driven problem, however, the constitutive relation is replaced by the data set

$$\mathcal{D}_t = \{(\boldsymbol{i}, \boldsymbol{j})_l\}_{l=1}^{n}, \tag{4}$$

and we cannot presume to find an intercept with the equilibrium equation $(3)_1$. The aim is therefore to find the minimal distance between the physical equilibrium set $\mathcal{E}$, which includes all constraints of the system, and the constitutive data set $\mathcal{D}$; the data points which realize this minimum are understood as the solution of the data-driven problem, $(\boldsymbol{i}^*, \boldsymbol{j}^*)$. The norm $\|\cdot\|_{\mathbb{D}}$ is defined as

$$\|(\boldsymbol{i}, \boldsymbol{j})\|_{\mathbb{D}}^2 = \frac{1}{2}\left(\boldsymbol{i} \cdot \mathbb{D}\,\boldsymbol{i} + \boldsymbol{j} \cdot \mathbb{D}^{-1}\boldsymbol{j}\right). \tag{5}$$

Norm (5) weights the different magnitudes (and units) of $\boldsymbol{i}$ and $\boldsymbol{j}$ by means of a numerical diffusion tensor $\mathbb{D} = D\,\boldsymbol{I}$. We remark that this tensor can be chosen arbitrarily; a non-isotropic form is also possible. Correspondingly, we define a global penalty function for the distance between numerical values $(\boldsymbol{i}, \boldsymbol{j})$ and data set values $(\boldsymbol{i}', \boldsymbol{j}') \in \mathcal{D}_t$,

$$F(\boldsymbol{i}, \boldsymbol{j}) = \int_\Omega \min_{(\boldsymbol{i}', \boldsymbol{j}') \in \mathcal{D}_t} \|(\boldsymbol{i} - \boldsymbol{i}', \boldsymbol{j} - \boldsymbol{j}')\|_{\mathbb{D}}^2 \, \mathrm{d}V. \tag{6}$$
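As a concrete illustration of the distance computation, the following minimal sketch evaluates the weighted norm (5) and the closest-point search that underlies the penalty function (6) for a single local state. It assumes the quadratic form of Eq. (5); the function names and the toy data are illustrative and not part of the original implementation.

```python
import numpy as np

def weighted_norm_sq(di, dj, D_num):
    # squared distance 1/2 (di . D di + dj . D^{-1} dj), cf. Eq. (5)
    return 0.5 * (di @ D_num @ di + dj @ np.linalg.solve(D_num, dj))

def closest_data_point(i_loc, j_loc, data_i, data_j, D_num):
    # brute-force search for the data point (i', j') in D_t that minimizes
    # the D-weighted distance to the local state (i, j), cf. Eq. (6)
    dists = [weighted_norm_sq(i_loc - ip, j_loc - jp, D_num)
             for ip, jp in zip(data_i, data_j)]
    l = int(np.argmin(dists))
    return data_i[l], data_j[l], dists[l]

# toy usage with an isotropic numerical diffusion tensor D = D I
D_num = 0.04 * np.eye(2)
rng = np.random.default_rng(0)
data_i = rng.uniform(0.0, 10.0, size=(100, 2))   # sampled descent values i'
data_j = data_i @ D_num.T                        # matching fluxes j' = D i'
i_opt, j_opt, dist = closest_data_point(
    np.array([1.0, 2.0]), np.array([0.05, 0.08]), data_i, data_j, D_num)
```

For large data sets the linear search is typically replaced by a spatial search structure such as a k-d tree.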

Finite element discretization
The diffusion equation is solved in weak form and integrated in time using a backward Euler method. At a time step $t_k \in \{t_0 = 0, t_1, t_2, \ldots, t_{k-1} + \Delta t = t_k, \ldots\}$ the constrained penalty functional (6) has the form

$$F_k = \sum_e w_e\, \|(\boldsymbol{i}_k - \boldsymbol{i}'_k,\ \boldsymbol{j}_k - \boldsymbol{j}'_k)\|_{\mathbb{D}}^2 + \boldsymbol{\lambda}_k^{\top} \Big(\frac{1}{\Delta t}\, \boldsymbol{N}^{\top}\boldsymbol{N}\,(\boldsymbol{u}_k - \boldsymbol{u}_{k-1}) - \boldsymbol{B}^{\top}\boldsymbol{j}_k - \boldsymbol{f}_k + \bar{\boldsymbol{j}}_k\Big), \tag{7}$$

with quadrature weights $w_e$ and the assigned data points $(\boldsymbol{i}'_k, \boldsymbol{j}'_k)$. We remark that the kinematic constraint $(3)_3$ is implicitly fulfilled by a conforming finite element discretization, and so only condition $(3)_1$ needs to be enforced by Lagrange multipliers $\boldsymbol{\lambda}$. For spatial discretization a standard finite element method is used. Taking the variations and applying the usual calculus, the fluxes follow from stationarity as $\boldsymbol{j}_k = \boldsymbol{j}'_k + \mathbb{D}\,\boldsymbol{B}\,\boldsymbol{\lambda}_k$ and can be eliminated, and we arrive at the finite element system

$$\begin{bmatrix} \boldsymbol{B}^{\top}\mathbb{D}\,\boldsymbol{B} & \frac{1}{\Delta t}\,\boldsymbol{N}^{\top}\boldsymbol{N} \\ \frac{1}{\Delta t}\,\boldsymbol{N}^{\top}\boldsymbol{N} & -\boldsymbol{B}^{\top}\mathbb{D}\,\boldsymbol{B} \end{bmatrix} \begin{bmatrix} \boldsymbol{u}_k \\ \boldsymbol{\lambda}_k \end{bmatrix} = \begin{bmatrix} -\boldsymbol{B}^{\top}\mathbb{D}\,\boldsymbol{i}'_k \\ \boldsymbol{f}_k - \bar{\boldsymbol{j}}_k + \frac{1}{\Delta t}\,\boldsymbol{N}^{\top}\boldsymbol{N}\,\boldsymbol{u}_{k-1} + \boldsymbol{B}^{\top}\boldsymbol{j}'_k \end{bmatrix}, \tag{8}$$

where $\boldsymbol{N}$ is the matrix of shape functions, $\boldsymbol{B}$ the matrix of their spatial derivatives, $\boldsymbol{f}$ is the source vector, $\bar{\boldsymbol{j}}$ are the prescribed boundary fluxes, and $\boldsymbol{u}, \boldsymbol{\lambda}$ are the vectors of unknowns.
To find the optimal data points $(\boldsymbol{i}^*_k, \boldsymbol{j}^*_k)$ an iterative algorithm is used, which starts from a random initialization or from the values of the last time step, $(\boldsymbol{i}^*_{k-1}, \boldsymbol{j}^*_{k-1})$. Then the finite element equations (8) are solved. Afterwards new data points $(\boldsymbol{i}_k, \boldsymbol{j}_k)$ are assigned, which are used in the next step of the iteration. The algorithm stops when the change in the global penalty function falls below a certain threshold, i.e., when $(\boldsymbol{i}^*_k, \boldsymbol{j}^*_k)$ is reached. More details can be found in [3, 7].
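A sketch of this staggered iteration is given below. The callables solve_fe_system and assign_closest_data are placeholders, assumed here to solve system (8) for a fixed data assignment and to perform the closest-point search of Eq. (6), respectively; they are not the authors' implementation.

```python
import numpy as np

def data_driven_time_step(states_init, solve_fe_system, assign_closest_data,
                          tol=1e-8, max_iter=100):
    """One time step of the staggered data-driven algorithm.

    states_init:         data assignment (i', j') per element, initialized
                         randomly or with the optimum of the last time step
    solve_fe_system:     returns (u, lam, local_states) for the assignment,
                         i.e. the solution of the linear system (8)
    assign_closest_data: returns the closest data points and the value of
                         the global penalty function for the local states
    """
    states, F_old = states_init, np.inf
    for _ in range(max_iter):
        u, lam, local_states = solve_fe_system(states)
        states, F = assign_closest_data(local_states)
        if abs(F_old - F) < tol:   # penalty change below threshold: converged
            return u, lam, states
        F_old = F
    return u, lam, states
```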

Numerical examples
Here we perform numerical simulations on the two-dimensional domain $\Omega = [0, 100]^2$. The finite element mesh is regular and consists of $2 \cdot 25^2$ linear triangular elements (P1 elements). The boundaries are free, i.e., the prescribed flux $\bar{\boldsymbol{j}}$ is zero. In all simulations the initial state is a stamp-like cluster of concentration in the center of the domain, see Fig. 2 (left).
Prerequisites for a data-driven simulation are the data sets, Eq. (4). Here the data are artificially sampled and dimensionless. The descent data cover the square $\boldsymbol{i} \in [0, 10]^2$ on an equidistant grid. The corresponding flux values are generated using artificial diffusion tensors $\mathbb{D}^{\mathrm{art}}$. One data set then comprises the data points $\mathcal{D} = \{(\boldsymbol{i}, \boldsymbol{j})_l\}_{l=1}^{n}$ with $n = 100^2$ entries. A first data set $\mathcal{D}_1$ is based on a simple diagonal diffusion tensor,

$$\mathbb{D}^{\mathrm{art}}_1 = D_0\,\boldsymbol{I}, \quad D_0 = 0.04, \tag{9}$$

and recreates isotropic diffusion. To represent real-life data, we add an uncertainty term for the second data set $\mathcal{D}_2$. The artificial diffusion tensor $\mathbb{D}^{\mathrm{art}}_2$ is afflicted by two stochastic terms $\mathbb{N}$ and $\mathbb{U}$,

$$\mathbb{D}^{\mathrm{art}}_2 = \mathbb{D}^{\mathrm{art}}_1 + \mathbb{N} + \mathbb{U}, \tag{10}$$

where both stochastic variables are $2 \times 2$ matrices, $\mathbb{N}$ with diagonal entries $N_1, N_2 \sim \mathcal{N}(0, 0.1)$ and $\mathbb{U}$ with an off-diagonal entry $U_1 \sim \mathcal{U}(0, 0.016)$. In this way the diffusion is still almost isotropic but, as is observed in common experiments, the resulting flux data scatter in a normally distributed manner. Moreover, $\mathbb{U}$ puts a certain weight onto the off-diagonal element, which results in a small bias of the flow.

In our first simulation, with data set $\mathcal{D}_1$, we show that the data-driven solution approximates a classical diffusion problem. Using 100 data points per dimension we obtain results which are close to the expected solution. In Fig. 2 (right) a profile through the center is shown for different time steps. Due to the sampling with an equidistant grid the profile is symmetric.
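A sketch of the artificial data generation described above is given below. It assumes that the stochastic terms of Eq. (10) are drawn independently for every data point, which reproduces the described scatter of the flux data; the exact placement of the off-diagonal entry $U_1$ is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# equidistant grid of descent values i in [0, 10]^2, 100 points per dimension
g = np.linspace(0.0, 10.0, 100)
I1, I2 = np.meshgrid(g, g)
data_i = np.column_stack([I1.ravel(), I2.ravel()])   # n = 100^2 data points

# data set D_1: isotropic diffusion, Eq. (9)
D_art1 = 0.04 * np.eye(2)
data_j1 = data_i @ D_art1.T

# data set D_2: stochastically perturbed diffusion tensor, Eq. (10), with
# N1, N2 ~ N(0, 0.1) on the diagonal and U1 ~ U(0, 0.016) off the diagonal
def sample_D_art2():
    N = np.diag(rng.normal(0.0, 0.1, size=2))
    U = np.array([[0.0, rng.uniform(0.0, 0.016)],
                  [0.0, 0.0]])
    return D_art1 + N + U

data_j2 = np.array([sample_D_art2() @ i_vec for i_vec in data_i])
```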
In a second simulation, with data set $\mathcal{D}_2$, we observe the effect of the stochastic terms. A snapshot after 1000 time steps of evolution is shown in Fig. 3 (left). In Fig. 3 (right) the concentration profiles along the domain's diagonals are displayed; the profile from the lower-left to the upper-right corner is spread wider than the one from the top-left to the bottom-right corner. This variation is caused by the direction-dependent data.
Since we cannot make any statement regarding uncertainties from only one simulation, we now consider the scenario of multiple measurements. Let us assume that several data sets have been measured, all of which show deviations as modeled by Eq. (10). Here this results in 200 different data sets of type $\mathcal{D}_2$, which are now used for the data-driven finite element analysis. We use one of those $\mathcal{D}_2$ data sets for the complete domain $\Omega$ but remark that it would also be possible to use different sets for subdomains. Such assumptions, however, would need a physical justification. Here we assume uncertain data of type (10) for the full domain and show the resulting histograms of the concentration value distribution at three different points in Fig. 4.
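Organizationally, the repeated-measurement study amounts to the following sketch, where run_data_driven_simulation is a placeholder for one complete data-driven analysis with a given data set; the recorded concentration values at the observation points yield the histograms of Fig. 4.

```python
def concentration_samples(data_sets, observation_points,
                          run_data_driven_simulation):
    # one data-driven simulation per measured data set (here: 200 sets of
    # type D_2), recording the final concentration u at selected nodes
    samples = {p: [] for p in observation_points}
    for data_set in data_sets:
        u = run_data_driven_simulation(data_set)   # vector of nodal values
        for p in observation_points:
            samples[p].append(u[p])
    return samples   # histogram these values per point, cf. Fig. 4
```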

Conclusion
Concluding, we state that data-driven finite element analysis is a promising approach to model-free simulation. It allows experimental data to be incorporated directly in the simulation, including the uncertainties of the underlying state space. Multiple measurements, with the corresponding data sets employed in multiple data-driven simulations, allow us to derive probabilistic densities of important simulation response variables.