Error estimation for second‐order partial differential equations in nonvariational form

Second‐order partial differential equations (PDEs) in nondivergence form are considered. Equations of this kind typically arise as subproblems for the solution of Hamilton–Jacobi–Bellman equations in the context of stochastic optimal control, or as the linearization of fully nonlinear second‐order PDEs. The nondivergence form in these problems is natural. If the coefficients of the diffusion matrix are not differentiable, the problem cannot be transformed into the more convenient variational form. We investigate tailored nonconforming finite element approximations of second‐order PDEs in nondivergence form, utilizing finite‐element Hessian recovery strategies to approximate second derivatives in the equation. We study both approximations with continuous and discontinuous trial functions. Of particular interest are a priori and a posteriori error estimates as well as adaptive finite element methods. In numerical experiments our method is compared with other approaches known from the literature.


INTRODUCTION
Many boundary value problems feature linear, second-order partial differential equations (PDEs) in divergence form. That is, the differential operator may be written as u ≔ div(Ã∇u) +b ⊤ ∇u +cu (1) with coefficientsÃ ∶ Ω → R × ,b ∶ Ω → R ,c ∶ Ω → R. Here and in the following Ω ⊂ R d , d ∈ N, is a bounded domain. Although formulation (1) covers a wide range of applications, there are some linear problems involving operators in nondivergence form Here, A : B denotes the Frobenius inner product ∑ i,j=1 a ij b ij of two matrices A, B ∈ R d × d , and A : Ω → R d × d , b : Ω → R d and c : Ω → R are given coefficients. The matrix A is assumed to be almost everywhere positive definite and symmetric.
Classical and strong solutions of problems in nondivergence form with Hölder-regular or continuous coefficients, respectively, have been analyzed in [1,Chapter 6,9]. In the applications of interest here, however, coefficients are only bounded and measurable. Under even higher smoothness assumptions on the coefficient A, a nondivergence form operator (2) can be transformed into an operator in divergence form (1) withÃ = A andb = b − Div A, where Div A denotes the row-wise divergence of the matrix A. Even if A is smooth, however, this transformation may lead to convection dominated problems which induce further challenges.
Our aim in this paper is to investigate the boundary value problem u = f in Ω, for some source term f ∈ L 2 (Ω). Let us briefly mention some applications where problems of this kind are of interest. Naturally, linear problems with operators in nondivergence form arise in the context of stochastic differential equations due to the Itô formula, see [2][3][4][5]. Such problems play a central role in financial mathematics, for example, the valuation of financial products. A closely connected area is the numerical solution of second-order Hamilton-Jacobi-Bellman (HJB) equations [6,7], where the existence of an operator in nondivergence form also follows due to the stochastic influence. In addition to the nonvariational nature of the linear operator, these problems possess further numerical challenges due to nonlinearities introduced by a pointwise minimization. A further application is the solution of highly nonlinear second-order PDEs. A linearization used, for example, in a Newton method, leads to a problem of the form (3) in the general case. Typical examples include the Monge-Ampère equation [8][9][10][11][12][13][14] which reads det(∇ 2 u) = ( xx u) ( yy u) − ( xy u) 2 = f in case of d = 2. The linearization at a function u 0 leads to a differential operator of the form (2) with .
For the solution of problems in the form (3) several different approaches avoiding the transformation into a divergence form PDE have recently been studied. Many approaches aim at approximating strong solutions, that is, A discrete approximation of the solution u ∈ H 2 (Ω) ∩ H 1 0 (Ω) of (4) in the case  = A : ∇ 2 is usually obtained by solving a problem of the form where  h,0 ≔  h ∩ H 1 0 (Ω) is a finite-dimensional trial and test space and H(u h ) is an approximation of the Hessian ∇ 2 u, also sought in a finite-dimensional space  h (R × ) with discretization parameter h > 0. Several approaches have been studied in the literature and most discretization strategies differ in the choice of the discrete spaces  h ,  h , the approximation H of the Hessian and the realization of the test function h ∶  h → L 2 (Ω).
Let us briefly summarize the most prominent approaches. The first article discussing a direct treatment of a nonvariational problem, to the best of the authors' knowledge, is Lakkis and Pryer [15]. Therein,  h and  h consist of continuous Lagrange finite element functions of order p ≥ 1, the choice h = id is used and the finite-element Hessian H(u h ) ∈  h (R × ) is obtained by a discrete version of the integration-by-parts formula, that is, Here, n Γ : Γ → R d denotes the outer normal vector on Γ ≔ Ω. A closely related approach using a discontinuous Galerkin approximation for the finite-element Hessian H(u h ) is studied by Neilan [16].
There are other approaches that avoid the coupling with an additional variational formulation used for the computation of a Hessian approximation. This is possible when using the cell-wise exact Hessian H ≔ ∇ 2 h but additional jump penalty terms over the interior cell edges/faces have to be added to the bilinear form. This idea is first studied by Smears and Süli [17], under the weak assumption that A belongs to L ∞ (Ω; R d × d ) and fulfills a so-called Cordes condition. In that work, discontinuous Galerkin approximations and the choice h (v h ) = Δ h v h are used and appropriate jump penalty terms are added to the bilinear form so that discrete coercivity is guaranteed. Quite similar is the approach of Neilan, Salgado, and Zhang [18] who use continuous Lagrange elements. In both approaches the coercivity is shown via a discrete Miranda-Talenti estimate. In a related line of research, Feng, Hennings, and Neilan [19] employ continuous Lagrange finite elements using the choice h =. They show well-posedness of the discrete scheme ensuing via a discrete inf-sup condition. For this approach, at least continuity of the coefficients of A has to be assumed as a localization argument by freezing the coefficients of A is applied in the proofs. Analogous results are presented in Feng, Neilan, and Schnake [20] for a discontinuous Galerkin approximation. Finally, an extension of the technique of Smears and Süli [17] to curved domains can be found in [21].
Before continuing, it is worth pointing out that the respective discrete linear systems and the techniques employed to prove their well-posedness differ in the references above and have far-reaching implications for computational practice. In particular, [19,20] rely on a discrete Calderon-Zygmund estimate and therefore on a continuity assumption for the leading coefficients of the differential operator, as well as sufficiently fine meshes. Unfortunately, the former is typically not satisfied for HJB equations, which we have in mind as future applications. Moreover, the requirement of sufficiently fine initial meshes obstructs the utility of an adaptive mesh refinement strategy, which we develop here. Such limitations are not present in discretization approaches relying on the Cordes condition, including [17,18,21] and the present work.
A further method, which is proposed by Gallistl [22], is based on a stabilized mixed finite element discretization involving an approximation of the gradient This is, to the best of our knowledge, the first contribution proving also a posteriori error estimates and the convergence of an adaptive finite element method for the solution of nondivergence form PDEs.
In order to complete our survey, we want to mention that there are many further approaches that do not directly fit into the framework (5). This includes for instance regularization approaches like the vanishing moment method studied in [23] and the references therein, the primal-dual weak Galerkin method [24], the Alexandroff-Bakelman-Pucci (ABP) method [25], or certain finite element schemes based on a very weak formulation of the model problem [26].
In the present paper we discuss a new method combining multiple ideas of the previously outlined approaches. To be more precise, we consider the discrete formulation (5) with a finite-element Hessian obtained either by continuous finite elements as in (6) or by a discontinuous ansatz that we specify later. For the test functions we The main results of this article include a rigorous proof of the well-posedness of the discrete scheme, which is also based on a discrete Miranda-Talenti estimate following from a Cordes condition. Moreover, preconditioning strategies for the resulting system of linear equations are studied and we observe in experiments that the preconditioner is robust with respect to the mesh parameter. Furthermore, we study a priori and reliable a posteriori error estimates in the energy norm. Based on the a posteriori error estimates we implement an adaptive finite element method and confirm by experiments that the convergence rate is optimal in all test cases, even for less smooth solutions.
Our method combines several advantages of the previously mentioned approaches. First, it is applicable to problems with discontinuous coefficients A and hence allows an extension to HJB equations. Among the approaches presented in our survey, only [17,18] possess this property as well. Second, under additional assumptions, our discretization can be realized without the addition of stabilization terms, which would involve jump penalties at the cell interfaces. In numerical experiments we observed that all approaches which do use stabilization terms do not converge with an optimal rate in the L 2 (Ω)-norm. This surprising observation deserves further investigation. In addition to our approach, only the methods from [16,27] likewise exhibit optimal L 2 (Ω) rates. It should be noted that the computational cost for the approaches using a Hessian recovery strategy, including the proposed scheme, is naturally higher than the cost for schemes relying on the broken exact Hessian. However, the advantages mentioned above may justify this additional effort.

THE CONTINUOUS PROBLEM
Throughout this article Ω ⊂ R d , d ∈ N, is a bounded and convex domain. We consider the boundary value problem with a second-order differential operator in nondivergence form with f ∈ L 2 (Ω). The coefficient matrix A is assumed to belong to L ∞ (Ω; R d × d ), to be symmetric and uniformly positive definite, that is, there exists a constant E > 0 such that almost everywhere in Ω.
As the coefficient matrix A is not necessarily differentiable, one can at most ask for strong solutions of (7), that is, Since the Laplacian Δ : X → L 2 (Ω) is bijective due to the convexity of Ω ⊂ R d , the latter equation is equivalent to Existence of strong solutions follow for instance under the slightly stronger assumption A ∈ C(Ω; R × ) and when Γ ≔ Ω is of class C 1, 1 , even for nonconvex domains, see [1,Theorem 9.15].
Another idea, which implies well-posedness even for general convex domains and which allows for discontinuous coefficients, is to impose a Cordes condition, that is, the existence of a constant ∈ (0, 1] such that In the two-dimensional case, this assumption follows from (8). As has been discussed in the recent literature, for example [17], a rescaling of equation (10) with the normalization coefficient becomes advantageous in the analysis of the problem. This can be explained with the following result, whose proof is stated in [17,Lemma∼1].

Lemma 1
Assume that A belongs to L ∞ (Ω; R d × d ) and satisfies (11). Then the inequality

holds.
Obviously, (11) guarantees that the rescaled matrix A is close to the identity matrix, and consequently, the differential operator A : ∇ 2 is close to the elliptic Laplace operator. Thus, if the Cordes condition is fulfilled one can consider instead of (10) a variational problem with the bilinear form a : X × X → R defined by and the linear form F ∈ X ′ defined by The variational problem we are going to study in this article is defined by Under the assumption (11) the bilinear form a is elliptic in X = H 2 (Ω) ∩ H 1 0 (Ω) and with the Lax-Milgram Lemma one can immediately prove the following result.

Lemma 2
Assume that the coefficient matrix A belongs to L ∞ (Ω; R d × d ) and fulfills the Cordes condition (11) with ∈ (0, 1]. Then, the problem (12) possesses a unique solution u ∈ H 2 (Ω) ∩ H 1 0 (Ω). Moreover, the a priori estimate Proof. See [17,Theorem 3]. ▪ Notice that we restrict our discussion to problem (7) mainly for notational simplicity. For related investigations of equations involving also drift and potential terms, that is, the differential operator of the PDE is of the form (2) with b ≢ 0 and/or c ≢ 0, we refer the reader, for example, to [6].

DISCRETIZATION
We decompose our domain Ω into a family of feasible triangulations  h (triangular for d = 2, tetrahedral for d = 3) with discretization parameter h = max T∈ h h T , h T ≔ diam(T). The diameter of the largest inscribed ball in a cell T ∈  h is denoted by T . Throughout this article we assume that { h } h>0 is shape-regular, that is, there holds where the maximal aspect ratio is independent of h. Moreover, meshes are considered which have a limited variation in the element size of neighboring elements, that is, there is a constant Cv > 0 with By  h we denote the set of facets of  h and by n F a unit normal vector on F ∈  h . The normal vectors n F are chosen to point outwards if F is a boundary facet and it has arbitrary but fixed orientation for interior facets. The diameter of a facet F ∈  h is denoted by h F . Moreover, we denote the set of facets in the interior by   h . This includes all facets in the intersection of two elements in  h . Entities on either side of an interior facet are denoted by ⋅ + and ⋅ − , respectively, chosen in such a way that for F = T + ∩ T − , n F points towards T + .
By  p (T), T ∈  h , we denote the set of polynomials on T of degree not larger than p ∈ N. Throughout this article generic constants are denoted by c v 1 ,v 2 , … where v 1 , v 2 , … , are the quantities they depend on.
Furthermore, we introduce the following average and jump operators. The average operators are defined by In a similar way, we define the jump operators for matrix-valued functions u ∈ H 1 ( h ; R × ) and for vector-valued functions v ∈ H 1 ( h ; R ) by with n + and n − the outward unit normal vectors on T + and T − . For scalar-valued functions we simply set [[u] We will frequently use inverse inequalities and trace theorems in our analysis. These results are summarized in the following lemma.

Lemma 3 The following inequalities hold:
a. For given 0 ≤ ≤ k ≤ 1 and s, t ∈ [1, ∞] there exists some C tr > 0 depending on k, , s, t, p, d and such that the inequality b. For given s ∈ [1, ∞] there exists some C tr > 0 depending on s, p, d and such that the inequality is fulfilled for all v h ∈  p (T).
Note that we use the same notation for both constants in the previous lemma as they depend on the same quantities.
For our analysis we need the following broken Sobolev spaces Moreover, we introduce a mesh-dependent norm for the space

Approximation of the Hessian
Our discretization approach relies on a finite element approximation of the Hessian of u also referred to as Hessian recovery. For related ideas we refer to [28] and the references therein. In this article we study two different approaches. The first approach uses an approximation with C 0 -conforming finite elements. To illustrate the idea of the construction, consider the integration-by-parts formula for the second derivatives, that is, which is valid for all u ∈ H 2 (Ω) and i, j = 1, … , d. Here, n(x) = (n 1 (x), … , n d (x)) ⊤ denotes the outer unit normal vector on Γ. Alternatively, one can use the more compact equivalent formulation The Hessian approximation is sought in the finite-dimensional space with polynomial degree p ∈ N. To shorten the notation we will omit the superscript (p), except when a different polynomial degree is used. The previous integral identity motivates the following definition.

Definition 1 (Continuous Galerkin Hessian) For each
A further strategy is an approximation by piecewise polynomial but discontinuous functions. To this end, we define the space We obtain a Hessian approximation by discretizing the element-wise integration-by-parts formula . This motivates the following definition: Many results in this article are independent of the choice of the Hessian approximation. In this case we drop the superscript and simply write H and  h which means either H CG and  CG h or H DG and  DG h . We conclude this section with the following approximation result: Moreover, there holds the stability estimate Proof. The desired result follows from the definition (14) and the integration-by-parts formula which yields To show the stability result we exploit the cell-wise integration-by-parts formula, taking into account the equality ∇u ⋅ (w h n) = (∇u ⊗ n) : w h , the Cauchy-Schwarz inequality and the discrete trace Lemma 3. We obtain for each With a simple computation taking into account that u h is continuous at the facets Finally, we test the previous inequality with w h = H CG (u) and divide the left-and right-hand side by ||H CG (u)|| L 2 (Ω) to conclude (16).
With similar arguments one can conclude the same results for the DG Hessian H DG . The proof can be found in [16,Lemma 2.1]. ▪

A finite element scheme
The finite element approximations of our problem (10) are sought in the space of continuous Lagrange finite elements of order p ∈ N, that is, and moreover, we define  h,0 =  h ∩ H 1 0 (Ω) to incorporate essential boundary conditions. The polynomial degree p ≥ 2 is the same as for the space  h . Later, we will see that this choice leads to an optimal balance of the approximation errors for the Hessian ∇ 2 u and the solution u. Motivated by the strong formulation of the continuous problem (10) we test the discrete equations with the finite element Laplacian The bilinear and linear forms we are going to use in the discrete scheme are defined by The discrete problem reads The bilinear form J h ∶  h,0 ×  h,0 → R may contain several stabilization terms in order to guarantee discrete coercivity. The specific form of the stabilization terms will be introduced later.
The nodal basis functions of  h,0 and  h are denoted by To realize our algorithm with H = H CG we first assemble the matrices and load vector In the case H = H DG the matrices C ij have to be modified according to the right-hand side of (15). Moreover, the dimension of the matrices increases as the number of degrees of freedom N W is higher for the function space  DG h . Obviously, equations (14) or (15) with u replaced by u h can be expressed by means of The application of and can be computed by means of The right-hand side of (17) can be evaluated by means of A representation for the left-hand side follows after insertion of (18)-(20) into (17). This yields Consequently, problem (17) is equivalent to Although the system matrix cannot be assembled explicitly, one can compute matrix-vector products, each of which requires the solution of d 2 + 1 linear equation systems for the mass matrix M W . In our numerical experiments we precomputed an LU factorization of M W . Each evaluation of M −1 W then corresponds to an inexpensive forward-backward substitution. The nonsymmetric system (22) can be efficiently solved by a preconditioned algorithm. As a preconditioner we utilize the matrix whereM −1 W is the inverse of the main diagonal of M W andB ij is the main diagonal of B ij . This allows us to assemble P explicitly. EmployingB ij instead of B ij yields a sparser preconditioner and in case of a problem with vanishing off-diagonal entries of A, that is, A ij = 0 for i ≠ j, a symmetric preconditioner P. A direct solver is then used to solve the systems of linear equations involving P. Note that it is not appropriate to use the lumped mass matrix as this might yield a singular matrix whenever the polynomial degree of the space  h is larger than one. The numerical experiments conducted in Section 4 indicate that the preconditioned method for (22) is robust with respect to mesh refinement. For a more sophisticated preconditioning strategy for nondivergence form PDEs we refer to [29], where a domain decomposition preconditioner is studied.
An alternative viable strategy is the solution of a block system equivalent to (22). This becomes particularly useful if, in addition to the solution vector u, one is interested in the finite-element Hessian, for example, for the solution of HJB equations. To this end, we use the substitution from (18) as well as and arrive (in case d = 2) at the equation system (22). The modification for the three-dimensional case is obvious.

Well-posedness of the discrete scheme
The scheme (17) can be interpreted as a nonconforming discretization of the variational problem (12) as the usage of approximate Hessians and Laplacians implies a ≠ a h and F ≠ F h , and there also holds

Lemma 5 The inequality
Proof. To show (24) we merely have to discuss the jump terms in the definition (13).
To this end, we apply the triangle inequality ∑ and a trace theorem on a reference setting ). Using also the assumed shape regularity, which implies The main ingredient for the proof of the existence result for strong solutions (Lemma 2) is a Miranda-Talenti estimate of the form |u| H 2 (Ω) ≤ ||Δu|| L 2 (Ω) which is valid, for example, if the underlying domain Ω is convex. To show well-posedness of our discrete scheme we first have to prove a discrete Miranda-Talenti estimate. A similar result, but for a discretization using the element-wise exact Hessian and Laplacian, is proved in [30,Theorem 1]. We begin with the following auxiliary result.
is the nodal basis of (T,  p , Σ), that is, T, i ( T, j ) = ij holds for all i, j = 1, … , s. The precise definition of our lifting operator is Next, we derive local estimates for the lifting error on a single element T ∈  h . From the definition of E  CG h and the triangle inequality we conclude We distinguish several cases: if x T, i is a Lagrange point in the interior of T or in the interior of a boundary facet F ∈  h with F ⊂ Γ, then  T,i = {T} holds and consequently ∑ If x T, i is located in the interior of an inner facet F ∈   h , there holds  T,i = {T, T ′ } with F = T ∩ T ′ and we obtain together with the inverse inequality from Lemma 3 If x T, i coincides with a vertex of T or, in the case d = 3, is located at an edge of T, we choose a sequence of simplices T = T 1 , T 2 , … , T = T ′ such that T j and T j + 1 , j = 1, … , − 1, share a common facet F j ∈  T, i . With the triangle inequality and similar arguments like in the previous case we deduce We summarize the previous cases and infer Insertion into (26) yields together with the discrete Cauchy-Schwarz inequality are fulfilled.

Proof.
We first introduce a further lifting operator E h ∶  h →  h,conf which maps u h into an H 2 -conforming finite element space  h,conf . In the case d = 2, we will make use of the space generated by the Hsieh-Clough-Tocher (HCT) element [33] or some higher-order analog. The lifting operator E h fulfills the estimate with a constant C E > 0 depending on Cv, C tr , , d and on the geometry of Ω. In particular, if an opening angle at a sharp corner (d = 2) or edge (d = 3) tends to , then C E → ∞.
A proof of (29) in the two-dimensional case can be found in [ [30], where a 3D HCT element for polynomial degrees p ∈ {2, 3} is studied and to [37], where a different function space based on virtual elements of arbitrary order is used.
We setũ h ≔ E h (u h ) and obtain with the triangle inequality It remains to bound the two last terms on the right-hand sides of (30) and (31). From the error estimate (29) we infer In order to prove a bound for the approximations To bound the first term on the right-hand side of (33) in case of H = H CG we test (14) with the function , apply the orthogonality of P  CG h , the definition of H CG , the integration-by-parts formula and Lemma 3 to arrive at Note that we used the relation w h : (n F ⊗ ∇ u h ) = ∇ u h ⋅ (w h n F ) as well as the fact that the jumps in tangential direction [[∇u h ⋅ t F ]] vanish as u h is continuous along the facets F. In the case H = H DG we use similar arguments, in particular the integration-by-parts formula and (15), to obtain Next, we discuss the second term on the right-hand side of (33). In case of H = H CG we obtain an estimate from Lemma 6 and the property ∇ 2 Note that the jump operator for matrix-valued functions involves only jumps in normal direction. In order to confirm the last step in the previous estimate, one just has to take into account that u h is continuous along the element facets so that the tangential components of the jumps vanish. Finally, one observes that the second term on the right-hand side of (33) vanishes in case of H = H DG , that is, which is due to the fact that ∇ 2 h u h ∈  DG h (R × ) holds. The discrete Miranda-Talenti estimates follow after inserting (34) and (36) in case of H = H CG , or (35) and (37) in case of H = H DG , into (33), and combining the resulting estimates with (30) and (31). ▪ The following result is needed in order to treat the fact that we work with different norms for the spaces  h which have to be bounded by each other.

Proof.
We start with the case H = H DG . We apply the definition (15) and the integration-by-parts formula and obtain for arbitrary Testing this equation with w h = H DG Δ (u h ) ∈  DG h yields together with a further application of the integration-by-parts formula Together with the identity With the Cauchy-Schwarz and the Young inequality using also the discrete trace inequality from Lemma 3, we then deduce for arbitrary > 0. We use the choice = C 2 tr ( + 1)∕2 and after rearrangement of the above inequality we arrive at In a similar way we derive the estimate for H Δ (v h ) = H CG Δ . First, we use the definition (6) and the integration-by-parts formula and obtain for each where we exploited that w h is continuous across the element facets. We choose the test function w h = H CG Δ (u h ), use the orthogonality of the L 2 (Ω)-projection P  CG h onto  CG h and get with a further application of the definition (6) and the integration-by-parts formula From this we infer with the properties of P  CG h and similar arguments as in (38)

Furthermore, with the property ||P
Next, we want to mimic the proof of Lemma 2 for the continuous setting in order to show well-posedness of our discrete scheme. However, due to the jump terms on the right-hand sides of the estimates (27) and (28) the proof of the coercivity of the bilinear form a h will fail if is too small, see (11). To this end, stabilization terms in the discrete scheme are needed and we define to be inserted into (17). The penalty parameters 1 , 2 ≥ 0 have to be chosen appropriately to guarantee the coercivity of a h + J h . For the stabilized scheme one can show the following well-posedness result.
are fulfilled, provided that the penalty parameters in J h fulfill the inequalities The constants 0 and 0 depend on , d, p, C tr , Cv and the geometry of Ω, but not on and h. As a consequence, problem (17) possesses a unique solution u h ∈  h,0 for each f ∈ L 2 (Ω).
Proof. First, we show the boundedness of a h + J h . With the Cauchy-Schwarz inequality and (16) we obtain With Lemma 1 and the assumed Cordes condition (11) we moreover conclude To derive a similar estimate for the stabilization term J h we apply the Cauchy-Schwarz inequality on each inner facet F ∈   h , and for the second term in J h we additionally employ the discrete trace theorem from Lemma 3 from which we deduce The inequalities (43) and (45) lead to (40). The coercivity follows from Lemma 7, taking into account the Cordes condition (11) with the estimate from Lemma 1 and Young's inequality with weight > 0, that is, The jump terms in (46) can be canceled by the stabilization terms from (39). We use the choice = (1 − ) −1 and insert the estimate from Lemma 8 to arrive at Taking into account the assumptions (42) we may further estimate The right-hand side forms a norm on H 2 h (Ω) ∩ H 1 0 (Ω) which is equivalent to the norm defined in (13). This is a consequence of a Miranda-Talenti estimate for the broken Hessian, see [30].
The Lax-Milgram lemma finally implies the existence and uniqueness of a discrete solution u h ∈  h,0 of (17). ▪

Remark 1
The assumption (42) can be relaxed such that the choice 1 = 2 = 0 is also feasible. This requires the following modification in the proof of the previous theorem.
As ||H Δ (⋅)|| L 2 (Ω) is also a norm in the finite-dimensional space  h,0 , there exists a constant C H independent of such that the estimate ||H Δ (u h )|| 2 is valid for all u h ∈  h,0 . Using this estimate and (44) we can modify the last step in (47) to arrive at One observes that coercivity of a h + J h can be guaranteed without the presence of the penalty terms, provided that 4 (1 − ) C 1 ≤ C H and 4 C tr (d + 1) (1 − ) C 2 ≤ C H are fulfilled. Note that C 1 , C 2 , C tr and C H are independent of . Thus, these inequalities are valid when is sufficiently close to 1.
In the numerical experiments we observed that neglecting the penalty terms J h is in most situations feasible, but has negative influence on the robustness of preconditioned iterative solvers. However, the experimental convergence rates are better when the penalty terms are omitted, see Section 4.1.

A priori and a posteriori error estimates
This section is devoted to the a priori and a posteriori error analysis of the finite element approximation (17).

Theorem 1 (A priori error estimate). Let Ω ⊂ R d be a bounded and convex domain.
Assume that the penalty parameters 1 , 2 in J h satisfy (42) and that the solution u of (7) belongs to H s (Ω) with The approximate solutions u h ∈  h,0 of (17) with p ≥ 2 fulfill the a priori error estimate

Proof.
We introduce the nodal interpolant I  h (u) as an intermediate function and deduce with Lemma 5 and standard interpolation error estimates Next, we derive an estimate for the norm of the discrete function w h ≔ u h − I  h (u). Therefore, we apply the discrete ellipticity (41), the definition of u h and the strong formulation (9) taking into account H Δ (v h )w h ∈  h ⊂ L 2 (Ω) as well as J h (u, w h ) = 0 which holds under the assumption (48). These arguments imply With the triangle inequality, Lemmas 4 and 5 and standard interpolation error estimates we conclude . Note again that || A|| L ∞ (Ω) ≤ 2 holds due to (11). Analogously, we derive the following estimate for the jump terms in (50) . The latter step follows from the trace theorem ||v|| H 2 (T) ≤ c || v|| H s (T) on the reference elementT and the polynomial approximation results in fractional-order Sobolev spaces from [38]. After insertion of the previous two estimates into (50) we arrive, together with (49), at the assertion.
with c 1 ≔ 4 C 2 a and c 2 ≔ 1 2 (1 + (2 + 16 C 2 a ) C 2 E ), where  T denotes the set of facets of the element T ∈  h . The constants C a and C E are defined in Lemma 2 and (29).
Proof. As in the proof of Lemma 7 we introduce the lifting operator E h which maps functions from  h into the H 2 (Ω)-conforming HCT or virtual finite element space  h,conf . With this operator at hand we introduce a further approximation of the finite element With the triangle inequality, the definition of the norm in H 2 h (Ω) ∩ H 1 0 (Ω) and the fact that the jump terms vanish for u ∈ H 2 (Ω) we may represent the error term under consideration by We start by proving an estimate for the first term on the right-hand side of (52). We define the error functional J ∈ X ′ (recall that and easily confirm This functional forms the right-hand side of a dual equation a( , z) = J( ) ∀ ∈ X and from Lemma 2 we conclude the existence of a unique solution z ∈ X satisfying The definition of the lifting operator E h guarantees u −ũ h ∈ X and thus, The right-hand side of the previous equation is treated as follows. We apply (12), insert the intermediate function A ∶ ∇ 2 h u h , apply the Cauchy-Schwarz inequality as well as (54) to obtain Finally, using || A|| L ∞ (Ω) ≤ 2, insertion of (55) into (52) and applying the estimate (32) for the lifting error terms leads to the desired result. ▪ The error estimate from the previous lemma provides a local a posteriori error estimator, namely and a global estimator which is a reliable bound for the error ||u − u h || H 2 h (Ω) .
Theorem 3 Let the assumptions of Theorem 1 be fulfilled. The a posteriori error estimate (51) is sharp in the sense that Proof. The jump terms from the left-hand side of the desired estimate appear also in the norm of the right-hand side. We merely have to take into account that [[∇u ⋅ n F ]] = 0 a.e. on all interior facets F ∈   h . The volume residuals are bounded by the element-wise H 2 (Ω)-seminorm due to and || A|| L ∞ (Ω) ≤ 2. ▪

A method using the piecewise Hessian
Instead of using Hessian recovery techniques for the realization of our method, as investigated in the previous sections, it is also possible to use the cellwise exact Hessian, that is, H ≔ ∇ 2 h . This idea is proposed in [18]. As the resulting bilinear form is not coercive additional jump penalty terms have to be added. The resulting equation reads for all v h ∈  h,0 . Under the assumption that 1 > 0 is sufficiently large ( 1 = 0 is not allowed here) and that the Cordes condition (11) is fulfilled with some ∈ (0, 1], it has been proved in [18,Lemma 4.3] that the bilinear form a h (⋅, ⋅) is uniformly coercive on  h,0 and hence, (58) possesses a unique solution u h ∈  h,0 . This is a direct consequence of a discrete Miranda-Talenti estimate similar to Lemma 7 and the techniques applied in the proof of Lemma 9.
Due to the consistency of this scheme, one can easily conclude the a priori estimate , s ∈ [2, p + 1], provided that u belongs to H s (Ω).
A posteriori error estimates can be derived with the same argument as in Theorem 2.
To be more precise, one can show by a slight modification of the proofs from the previous section that the estimator from (57) is a reliable and sharp bound for ||u − u h || H 2 h (Ω) .
An advantage of the direct scheme (58) is that the computational effort is less than for our system (17) since no additional equations for the computation of the Hessian approximation are needed. As we will observe in our numerical experiments, the approximation properties for the error u − u h in the H 2 h (Ω)-norm as well as in the H 1 (Ω)-norm will be the same for both approaches. However, it turns out that the convergence rate in the L 2 (Ω)-norm is higher for the approach studied in the previous sections.

NUMERICAL EXPERIMENTS
In this section, we perform different numerical experiments. All implementations were done in using the finite element library 2019.1 [39,40]. Our code is residing in the public GitHub repository; https:// github.com/janblechschmidt/NonvarFEM. It is our purpose to compare four discretization approaches, that is, • the method using a finite-element Hessian with continuous and discontinuous trial functions (denoted by CG and DG in the following) discussed in the present article (Section 3.2), • the Petrov-Galerkin scheme (N) proposed by Neilan [16], which likewise utilizes a DG finite-element Hessian but with h = id, that is, there is no Laplacian acting on the test function, • and the method using the piecewise Hessian proposed by Neilan, Salgado, and Zhang (NSZ) [18] that we discussed briefly in Section 3.5.

A problem with almost violated Cordes condition
We choose a problem on the unit square with matrix and determine the source term f such that the smooth, exact solution of (7) is given by The matrix A fulfills the Cordes condition if ∈ (−1, 1). If is sent to 1, and hence the coercivity constant from Lemma 9 will tend to zero so that the problem is harder to solve with an iterative method like. This behavior is also observed in our numerical experiments. The iteration numbers required to realize our method with a CG Hessian for piecewise quadratic trial functions (p = 2) for different stabilization parameters in J h and different values of are reported in Table 1. Obviously, with the preconditioner proposed in (23) and the stabilization term J h , we observe that the iteration numbers mildly increase when the mesh parameter decreases or when approaches 1. The incorporation of an additional jump term for the second derivatives in J h , that is, the choice 2 > 0 in (39), did not lead to an improvement of the computational results.
In a further numerical test, we computed the discretization error for different polynomial degrees. Here, we used the choice = 1/2. As the Cordes condition for this example is fulfilled with a sufficiently large we dropped the stabilization terms, that is, we set 1 = 2 = 0. For comparison, we also present computational results for the piecewise Hessian approach (NSZ). The error plots in different norms and for varying polynomial degrees are shown in Figure 1. All convergence rates in the H 2 h (Ω)-norm coincide with the ones predicted by Theorem 1. It is also observed that both approaches behave quite similarly. In the H 2 h (Ω)-norm the errors decay almost identically. However we observe two advantages for our approach using a Hessian recovery strategy. First, it even converges in the L 2 (Ω)-and H 1 (Ω)-norm if the polynomial degree p = 1 is used. This coincides with the observations from [27], where the case p = 1 is allowed as well. Second, the convergence rate in the L 2 (Ω)-norm is higher for the Hessian recovery approach in case of quadratic elements. This is caused by the fact that a stabilization term is not needed in the present situation. In a last test for this example we check how the methods studied in the present article compare with the approaches (N) and (NSZ) mentioned at the beginning of this section. The error curves for different norms and different polynomial degrees can be found in Figure 2. Although all approaches behave quite similarly, we observe a difference in the convergence rates in L 2 (Ω) for quadratic elements. Obviously, the approaches taking into account stabilization terms (these are our approaches with 1 ≠ 0 and [NSZ]) converge only with order 2, while the remaining approaches (these are our approach with 1 = 2 = 0 and [N]) converge with order 3. A proof of this conjecture is subject of future research.

A problem with singular solution
In this example we consider the Poisson problem, that is, the diffusion matrix is chosen as A = I 2×2 , in the domain Ω = (0, 1) 2 . Emphasis is put on problems whose solutions have reduced regularity.
To this end, we construct the right-hand side f in such a way that is the exact solution. Here, (r(x), (x)) are polar coordinates centered in the origin. A simple computation shows that u ∈ H s (Ω) holds for all s < 1 + . In the present experiment we choose the value = 3/2 and expect the regularity of almost H 5/2 (Ω), and thus, as predicted by Theorem 1, the convergence rate in the H 2 h (Ω)-norm should be 1/2 − for arbitrary > 0. We would also expect that an adaptive finite element method will retain the optimal convergence rate. The adaptive strategy we implemented uses the local error estimator (56), the Dörfler marking strategy in such a way that those elements contributing 90% to the globally estimated error are marked, and the bisection refinement strategy provided by the library. The results shown in Figure 3 confirm the optimality of the adaptively generated finite element meshes. It is also observed that the convergence rates in the H 1 (Ω)-and L 2 (Ω)-norm are optimal.

A problem with discontinuous coefficient matrix
This example illustrates the capability of the method to handle discontinuous diffusion coefficients. Problems of this type are of particular interest as a transformation to a PDE in divergence form is not possible. The coefficient matrix in the present example is sgn(x 1 x 2 ) 2 ) , and f is chosen in such a way that u(x 1 , x 2 ) = x 1 x 2 (1 − e 1−|x 1 | ) (1 − e 1−|x 2 | ) is the exact solution. The computational domain is Ω ≔ (−1, 1) 2 . This example is also used in the numerical experiments from [6,20,24], where different discretization approaches are studied. Here, we apply our finite element scheme from Section 3.2 and investigate the behavior of an adaptive finite element method based on the error estimator derived in Theorem 2. The adaptively generated mesh as well as the error curves can be found in Figure 4. Finally we illustrate the convergence behavior for different choices for the polynomial degree in Figure 5 and compare again our method without stabilization and the piecewise Hessian approach (NSZ). In the H 2 h (Ω)-norm both approaches behave similarly and the convergence rate predicted in Theorem 1 is also confirmed. Our approach performs even slightly better when comparing the error in weaker norms.

A problem with anisotropic and discontinuous coefficient matrix
In this example we consider a problem in Ω ≔ (−1, 1) 2 with the input data ) .
The diffusion in x 1 -direction is very small so that the solution exhibits a boundary layer at the boundary edges x 2 = − 1 and x 2 = 1. Moreover, the coefficient A 22 is discontinuous. The computational results for our adaptive finite element method are illustrated in Figure 6. We observe that the discontinuity and the boundary layer are both resolved by the mesh. Furthermore, the propagation of the error is illustrated and one observes that the adaptive refinement retains the optimal convergence rate. Note that we used the value of the global estimator as an error measure since an explicit solution is not available for this example.

A three-dimensional problem with reduced regularity
In this numerical experiment we show the applicability of our procedure to the three-dimensional case. We choose the diffusion matrix to be A = I 3×3 in the domain Ω = (0, 1) 3 . Similar as in Section 4.2, the solution u(x) = r(x) with r(x) = √ ∑ 3 i=1 (x i − 0.5) 2 possesses a reduced regularity, that is, u ∈ H s (Ω) for s < 1.5 + . The results for various choices of are shown in Figure 7 and confirm the expected behavior. For example, the selection = 1 results in a regularity of almost H 5/2 (Ω), which in turn yields an expected convergence rate in the H 2 h (Ω)-norm of 1/2 − for arbitrary > 0 and 3/2 − and 5/2 − in the H 1 (Ω)-and L 2 (Ω)-norms, respectively. The corresponding error plot in Figure 7 confirms this. An adaptive refinement strategy using the local error estimator (56) with a refinement threshold of 95% is capable of confirming the convergence rate of the errors in the L 2 (Ω)-norm and improving the convergence rates in the H 1 (Ω) and H 2 h (Ω)-norm.

CONCLUSION AND OUTLOOK
The proposed method can be extended to parabolic problems. Given a regular solution and an appropriate time stepping scheme one can observe the same convergence rates as in the elliptic case. Numerical tests have been performed to confirm this and they are included in the GitHub repository accompanying the paper. A detailed analysis as in [41] is left to future research. Another subject, and this was the authors' original motivation to study this topic, is the application of the proposed discretization to Hamilton-Jacobi-Bellman equations. A preliminary implementation is also available in the repository and the related theoretical foundation will be examined in further publications. Besides these two extensions there are further interesting questions left. An obvious question is the proof for error estimates in lower-order norms. Note that one advantage of the approach proposed in this article is that optimal convergence in L 2 (Ω) is observed. However, a proof of this observation is still missing. To the best of the authors' knowledge the only article dealing with estimates in lower-order norms is [42], where an H 1 (Ω)-norm estimate for the Petrov-Galerkin approach using h = id is shown. Related studies for methods exploiting the Cordes condition are not available in the literature. A proof based on the usual duality argument is likely not expedient as, for instance, the dual equation of a nondivergence form PDE is a PDE in double-divergence form whose solutions possess insufficient regularity. In the special case A = id and for the method proposed in Section 3.5, the discretization coincides with a  0 -interior penalty discretization of the biharmonic equation and error estimates in lower-order norms can be directly concluded from [43]. However, an extension to approaches using recovered Hessians is not straightforward and requires further investigations.