A conforming auxiliary space preconditioner for the mass conserving stress‐yielding method

Summary We are studying the efficient solution of the system of linear equations stemming from the mass conserving stress‐yielding (MCS) discretization of the Stokes equations. We perform static condensation to arrive at a system for the pressure and velocity unknowns. An auxiliary space preconditioner for the positive definite velocity block makes use of efficient and scalable solvers for conforming Finite Element spaces of low order and is analyzed with emphasis placed on robustness in the polynomial degree of the discretization. Numerical experiments demonstrate the potential of this approach and the efficiency of the implementation.


INTRODUCTION
Let Ω ⊂ R d be a bounded domain with d = 2 or 3 with Lipschitz boundary Γ ∶= Ω. Let u and p be the velocity and the pressure, respectively. Given an external body force f ∶ Ω → R d and the double of kinematic viscosity denoted by , the velocity-pressure formulation of the Stokes system is given by where (u) = 1 2 (∇u + (∇u) T ). By introducing additional matrix valued variables ∶= − (u) for the stress and ∶= where dev( ) denotes the deviatoric part (see Section 2), and (2a) is motivated by the fact that for the solution of (1) we have = − (u) = − dev( (u)) = dev( ). The introduction of as a Lagrange multiplier enables the derivation of discrete methods that enforce the symmetry constraint (2c) weakly, see also References 1-3. As boundary conditions, we consider Dirichlet ones for the velocity u, homogenous purely for clarity of the presentation, and two kinds of outlet conditions, where I is the d × d identity matrix and u t is the tangential part of u. We assume that both Γ D and at least one of Γ N or Γ Ñ have positive measure. As usual, when Γ N = Γ Ñ = ∅, an additional condition must be imposed on the pressure to make it unique. In recent years, divergence-free and pressure-robust Finite Element discretizations, that is those whose solutions fulfill (2d) strongly, and allow for pressure-independent a-priori error estimates respectively, have been of great interest. 4 For the velocity-pressure formulation (1), one class of such methods are certain Hybrid Discontinuous Galerkin (HDG) methods that take the velocity in H(div, Ω) and the pressure in L 2 (Ω), that is, they only build normal continuity into the Finite Element space while the tangential continuity of the solution is enforced via Lagrange parameters. To make the resulting system for the velocity positive definite, a consistent stabilization term has to be added, often involving either a parameter that has to be sufficiently large or a lifting of the jump, see References 5,6. In References 7,8, the authors presented a novel variational formulation for the Stokes equations that still takes the velocity in H(div, Ω) and pressure in L 2 (Ω), retaining the property of yielding exactly divergence-free and pressure-robust solutions, but is based on (2) instead of (1). This mass conserving stress-yielding (MCS) method features a normal-tangential continuous stress space and requires no stabilizing term. It was already remarked in the original work 7 that static condensation can be performed to eliminate certain degrees of freedom (dofs). Later, in Reference 9, the normal-tangential continuity of was broken and instead an additional Lagrange parameter û was introduced. This technique makes it possible to eliminate altogether and to reduce the problem to one for the velocity in H(div, Ω), the pressure in L 2 (Ω), and the newly introduced û which approximates the tangential velocity trace on the mesh facets. The velocity unknowns u, û take the place of as primal variables in the condensed saddle point system, with the pressure remaining the Lagrange parameter enforcing (2d). The condensed system involves the same variables, and has the same structure as the above HDG methods, but does not require a stabilization term. The first contribution of this work is a characterization of the velocity block of the condensed system based on its relation to the velocity block resulting from an HDG method with optimal stabilization. In particular, we prove that the velocity block, as previously claimed in Reference 9 for that of a related low order MCS method, is indeed positive definite.
We then move on to the question of how to efficiently solve the condensed system and consider preconditioned Krylov space methods. Preconditioning techniques for saddle point systems based on separate preconditioners for the primal (velocity) and Lagrange (pressure) unknowns are a well studied subject, see Reference 10, and the pressure Schur complement is easily preconditioned, see Reference 11. Therefore, our focus is on identifying and analyzing suitable preconditioners for the condensed velocity block.
The literature on preconditioners for conforming methods is vast and includes, among others, domain decomposition, see Reference 12, as well as Geometric, see Reference 13, and Algebraic, see Reference 14, Multigrid methods and an even somewhat comprehensive review would be beyond the scope of this work. We will take as given that efficient and scalable solvers for conforming methods exist and are available.
Preconditioners for HDG methods are not quite as well studied in the literature, one recurring theme is the attempt to reuse conforming preconditioners for these non-conforming spaces. For example, a non-nested Multigrid method with conforming coarse grid spaces was studied in Reference 15, and auxiliary space preconditioners (ASP, see Reference 16) that also feature a conforming sub-space were considered in Reference 17. The idea at the heart of both approaches is to decompose functions in the non-conforming space into a conforming component plus a (small) remainder and to address them separately with some pre-existing conforming preconditioner and a simple, computationally inexpensive method such as (Block-)Jacobi, respectively.
The principal focus in this work is on the introduction and analysis of ASPs for the MCS method. The main improvement over the theory in Reference 17 is that the analysis of the velocity preconditioners extends techniques from Reference 18 and is explicit in the polynomial degree of the discretization. In particular, the main result, Theorem 3, states that the condition number of a particular ASP is bounded by ⋅ (log(k)) 3 , where k is the polynomial degree of the discretization and is a constant stemming from the relation between condensed MCS and HDG norms.
We close out the discussion with numerical experiments that demonstrate the robustness and scalability of the proposed preconditioners. It is a testament to the elegance and simplicity of the ASP method that we were able to scale the computations to a relatively large scale by leveraging existing, scalable and highly Performant software. Outline We gather notation used throughout this work in Section 2 and introduce various Finite Element spaces and norms in Section 3 which also contains some useful technical results. Section 4 reviews the MCS method itself and contains a thorough discussion of static condensation as well as results on the obtained condensed systems. Approaches for preconditioning saddle point matrices with separate preconditioners for the primal unknowns and Lagrange multipliers as well as the method of auxiliary space preconditioning are recalled in Section 5. The main results can be found in Section 6, where different variations of ASPs for the velocity block of the Stokes system are discussed. In Section 7, we sketch the treatment of the lowest order case which is not covered by the theory developed in previous sections. Finally, numerical experiments are performed in Section 8.

NOTATION
With M denoting the vector space of real d × d matrices, we define the subsets of skew-symmetric and skew-symmetric trace-free matrices by where (⋅) T denotes the transpose and I ∈ M the identity matrix. To differentiate between scalar-, vector-and matrix-valued functions on some subset D ⊆ Ω we include the range in the notation for the latter two while we omit it for the former one, i.e. where L 2 (D, R) = L 2 (D) denotes the space of square integrable R-valued scalar functions, the spaces L 2 (D, R d ) and L 2 (D, M) denote the analogous vector-and matrix-valued spaces. Similarly, P k (D, R) = P k (D), and so forth, denote the set of scalar-, vector-or matrix-valued polynomials up to degree k on D. We use the notation (⋅, ⋅) D for the L 2 -inner product on D and set || ⋅ || 2 D = (⋅, ⋅) D . The L 2 -orthogonal projection onto P k (D, ⋅) (the range should be clear from context) is denoted by Π k D and we will occasionally omit the subscript. Similarly, the L 2 -orthogonal projector onto the (restrictions to D of) the rigid body modes In the following, let , Φ, and Ψ be smooth scalar-, vector-, and matrix-valued functions, respectively. The operator ∇ is to be understood from context as resulting in either in a vector whose components are i ∶= ∕ x i or a matrix with components ( i Φ j ). For vector-valued functions in three dimensions the operator curl is defined as curl Φ ∶= ∇ × Φ and in two we understand it to refer to the scalar-valued curl ∶= − 2 1 + 1 2 . The divergence operator div is understood as divΦ ∶= ∑ d j=1 j Φ j for vectors and is applied row-wise to matrices, that is, Besides the well known trace operator tr(Ψ) ∶= ∑ d j=1 Ψ ii and the deviatoric part dev(Ψ) ∶= Ψ − 1 d tr(Ψ)I we further introduce the operator Based on these differential operators, we use standard notation for the Sobolev spaces H m (Ω, R) = H m (Ω), H(div, Ω) and H(curl, Ω) with m ≥ 0. Further, for some Γ * ⊆ Ω, a subscript "0, Γ * " indicates that the corresponding natural traces vanish on Γ * , and we use only the zero subscript if Γ * = Ω.
We denote by  h a quasi-uniform and shape regular triangulation of the domain Ω into simplices. Let h denote the maximum of the diameters of all elements in  h . The set of element interfaces and boundaries, or facets, is denoted by  h and the set of facets of a particular element T ∈  h is  T ∶= {F ∈  h ∶ F ⊆ T}. By an abuse of notation, we shall also use  h to denote the domain formed by union of all F ∈  h . We assume that the mesh resolves the domain boundary parts in the sense that According to this mesh we also introduce the "broken" spaces where, as before, we include the range explicitly, for example, as in P k ( h , R d ). On each F ∈  h we denote by ⟦⋅⟧ and {{⋅}} the standard jump and mean value operators and take them to be the identity on boundary facets. On each element boundary and each facet F ∈  h we denote by n the outward unit normal vector. The scalar normal and vector-valued tangential traces of a sufficiently smooth function v are given by v n ∶= v ⋅ n and v t ∶= v − v n n. Similarly, the normal-normal and normal-tangential traces of a smooth matrix-valued function Ψ are Ψ nn ∶= Ψ ∶ (n ⊗ n) = n T Ψn and Ψ nt = Ψn − Ψ nn n. We write functions in general Sobolev spaces as u, û, , and so forth, discrete functions with a subscript h as u h , û h , h , and so forth, and their via Galerkin isomorphism identified coefficient vectors w.r.t to some given Finite Element basis as u,û, , and so forth. For readability of the presentation we make no difference between row and column vectors and, for example, write (u, û) for the coefficient vector of (u h , û h ) which should strictly speaking be the column vector (u T , û T ) T . Similarly, operators are capital letters A, B, and so forth, their discrete counterparts A h , B h , and so forth, and the corresponding Finite Element matrices A, B, etc. Occasionally, when it is useful to emphasize the Galerkin isomorphism we use ∼ G , for example, Finally, throughout this work we write A ≲ B when there exists a constant c > 0 independent of the mesh size h and the viscosity such that cA ≤ B and A ∼ B ⇔ A ≲ B ∧ B ≲ A. For example, due to quasi-uniformity we have h ∼ diam(T) ∀T ∈  h . For two elliptic operators A, B (or symmetric and positive definite matrices A, B) we take A ≲ B to mean that the maximum eigenvalue of the generalized eigenvalue problem Ax = Bx is bounded by a constant C similarly independent of h and . Note that in inequalities related to discrete functions or operators, unless explicitly stated otherwise, these constants can depend on the polynomial degree. Henceforth we assume that is a constant.

FINITE ELEMENTS AND NORM EQUIVALENCES
Reminding our self that the lowest order case is addressed separately in Section 7, we define the following approximation spaces for k ≥ 2: See Reference 19 for a detailed discussion of the H(div)-conforming Brezzi-Douglas-Marini (BDM) space appearing in the definition of V h . Note that, restricted to a single element T, in addition to P k−1 (T, D), the stress space Σ h also includes functions in P k (T, D) with vanishing normal tangential trace ("nt-bubbles"). We further define the space of divergence where Π −1 F ∶= 0 and the equivalence was shown in [Theorem 2]. 20 Where it is clear from the context which volume element T is meant, we omit it from the subscript and simply write || ⋅ || j,F . We define Hybrid Discontinuous Galerkin (HDG) norms on  h and U h by In (10), the terms for F ∈ Γ Ñ , where û h = 0, weakly enforce u t = 0 from (2g). There holds the equivalence (see Reference 9)

Technical results
For readability, the technical details of this Section are moved to the Appendix.

Interpolation operators
where p is the set of all elements that share the vertex p, and | p | is the number of such elements. Bounds for the approximation error of  f in H 1 -like norms are very standard and well known, and with a Korn inequality for broken H 1 spaces like derived in Reference 21, it can easily be bounded by an || ⋅ || ,h like one. However, as the kernel of is controlled only by the  D h terms, C k can degenerate depending on the shape of Ω and Γ D . As it would otherwise later on enter into condition number estimates, the following Lemma 1 bounds the approximation error of  f independent of C K .

Lemma 1. There holds
Proof. See Appendix A. ▪

Trace norms
For F ∈  h and an arbitrary element T ∈  h with F ∈  T we define for all û ∈ P k (F, R) discrete versions of the H 1∕2 (F, R) and the H 1∕2 00 (F, R)-norm for (scalar) HDG spaces as In Reference 20 the authors proved the inverse estimate A similar estimate can be derived for the hybrid, vector-valued velocity space  h and norms involving the symmetric gradient, The difference lies not only in the appearance of instead of ∇ but also, and more importantly, in the fact that, as V h ⊆ H(div), the normal trace is enforced strongly, and one has to slightly modify the strategy from Reference 20.
Proof. See Appendix B. ▪

THE MCS METHOD
The method considered in this work is based on formulation (2), where is used as a Lagrange multiplier to weakly enforce the symmetry constraint (2c), see also References 1-3. In Reference 7, a novel variational formulation of (2) without the symmetry constraint was presented where the velocity and pressure spaces were H(div, Ω) and L 2 (Ω) and the stress space for the variable was defined as H(curl div) ∶= { ∈ L 2 (Ω, D) ∶ div( ) ∈ H(div, Ω) * }, where the superscript * denotes the classical dual space. The variational version of (2b) then became where ⟨⋅, ⋅⟩ div denotes the duality pairing on H(div, Ω). The authors showed that Finite Element approximation of in H(curl div) demands normal-tangential continuity. The method described in the following is based on this variational formulation and in many ways is a variation of previous MCS methods from 22. Like in the method from Reference 9, we incorporate the normal-tangential continuity of h via a Lagrange multiplier inV h , similar to approaches taken in hybridized mixed methods for the Poisson problem, see Reference 23-25. For a detailed discussion on this hybridization technique see also section 7.2.2 in Reference 19. The main motivation for breaking the normal-tangential continuity by hybridization is that it enables local, element-wise elimination, or static condensation, of all Σ h and W h dofs. The resulting, condensed, system is the one we actually have to solve and are interested in preconditioning. It will therefore be discussed in great detail in Section 4.2.
The hybridized mass conserving stress-yielding method with weakly imposed symmetry finds with the bilinear form The first two integrals in b can be interpreted as a discrete version of the duality pair given in (22) and the third weakly enforces the symmetry constraint. The last terms incorporate the normal-tangential continuity of h and the tangential part of (2f). Since On Γ D and Γ Ñ the integrals vanish together with (v h ) t = 0 and on Γ N the remaining integrals weakly incorporate the tangential part of (2f), For more details on boundary conditions see Remark 2.
We also define the sub problem where we leave out the divergence constraint and solve only for the velocity and stress: with Note the addition of the term d −1 (div(u h ), div(v h )) in (23b) which is not present in any of the previous MCS methods. It is consistent since the solution is exactly divergence-free (by (23c) and div(V h ) = Q h ) and guarantees the inf-sup stability of  by compensating for the missing trace of h which can only approximate the deviatoric part of (u). The choice of the constant d −1 is motivated by the identity (u) = dev( (u)) + d −1 div(u)I. It makes (24) a discretization of −div( (u)) = f and the velocity Schur complement introduced in Section 4.2 equivalent to a linear elasticity problem, see (31) in the proof of Lemma 5. We want to stress again that preconditioners for such problems (although only for conforming methods) are standard in the literature and widely available. A different choice of constant would be possible, but changes the norm represented by the velocity Schur complement. For example, a large (penalty) parameter −1 (i.e., with a small > 0) instead of d −1 would be a valid choice but would make the resulting velocity Schur complement extremely difficult to precondition as it then has the divergence constraint essentially built in. Remark 1. The MCS method here, like the one from Reference 8 to which it is most closely related to, features insufficient inter-element coupling for the lowest order case k = 1 and is only stable for k ≥ 2. This is discussed in detail in Reference 9, where a stable minimal order MCS method for k = 1 was introduced. We treat the lowest order case separately in Section 7.
Remark 2. Reference 7 contains an exhaustive discussion of possible boundary conditions for the standard MCS method. One type of boundary condition for which it is particularly suitable are non-homogenous viscous stress ones, where ( h ) nt = g t can be demanded strongly for a given tangential traction g t , and is incorporated as an essential boundary condition in Σ h . In the hybridized MCS method, where the normal-tangential continuity in Σ h is no longer given, this boundary condition is instead realized by an additional right-hand side term ∫ Γ N −g tvh in (23). Since ( h ) nt ∈V h , this still strongly yields ( h ) nt = g t , as testing (23b) with (0,v h , 0) shows.

Stability analysis
In the following we summarize the stability results for the discrete method defined above. We only prove solvability of (24), all other results follow with the same techniques and steps as in References 7-9,22. Lemma 4, which can, just as Lemma 3, be found in the stated literature, is an inf-sup stability result for the constraint given by the bilinear form b. It is posed in the semi-norm | ⋅ | U h , * as, since all elements in Σ h are trace-free, the divergence of functions in U h can not be controlled. Theorem 1 states that with the addition of the term (24) is solvable independently of the divergence constraint. Finally, Corollary 2, which is again already proven in the literature, provides solvability of (23) including the divergence constraint. Note that the stability analysis in this section is only robust in terms of the mesh size h. Particularly the inf-sup constant in Corollary 2 and Theorem 1 is not known to be independent of the polynomial order k.

Lemma 3. There hold the continuity estimates
Proof. This follows with standard techniques, that is, using Lemma 4, Young's and Cauchy Schwarz's inequality and the norm equivalence (13). ▪ For a smooth exact solution we can further derive the following optimal error estimate.
be the exact solution of the Stokes problem (2) and set û = u| F for all F ∈  h . Let (23) and let s = min(m − 1, k), then there holds the error estimate ) .

Static condensation of local variables
We now discuss the structure of the Finite Element matrix directly obtained from the MCS method (23) and that of various Schur complements thereof. Writing , u , û , and p for the basis functions of Σ h , V h ,V h , W h and Q h respectively and, complying with the notation for the Galerkin isomorphism introduced in Section 2, u for the coefficients of u h with respect to the basis given by u , and so forth, (23) in matrix form is The right hand side vector F is given by The system matrix with its entries is a saddle point matrix with Lagrange multipliers , u, û and p enforcing (2c), (2a), the nt-continuity of , and (2d) respectively.

Static condensation of ,
The diagonal block for , does not couple with the pressure via the incompressibility constraint and, thanks to the introduction of û h as additional multiplier, is block diagonal. It is also invertible since every block represents the simple projection problem of finding where Σ h (T), W h (T) are the restrictions of the corresponding (discontinuous) global spaces to T and g T is some right hand side. Standard arguments and the Brezzi theorem prove that (26) is inf-sup stable, that is, writing

M is invertible and the Schur complement
T is well defined and, as M is block diagonal, can be computed element-wise. Eliminating , from (25) in this way leaves us with the system The symmetry of K is obvious and in the next Lemma 5 we show that the upper left block A is also positive definite and we are now in the very standard setting of a saddle point problem with symmetric and positive definite (SPD) "A-block". The velocity unknowns u h , û h move to the position of primal variables, while the pressure p h remains the Lagrange parameter for the divergence constraint. After solving (27) to get u h , û h , p h , we can recover h and h by solving the local problems (26).

Lemma 5. The Schur complement A is symmetric positive definite and with
where we used an element-wise integration by parts for b.

into (26a) and test with
We can use (26b) again to see It remains to prove the other direction. By Lemma 4 there exists a h ∈ Σ h with || || 0 ≲ |u h , û h , h | U h , * such that, once again inserting (30) into (26a), we see that and therefore there holds and the distributional terms in (30) vanish. The solutions of (26) are then simply given by h = − dev( (u h )) and h = (curl(u h )) and (31) states In A, we have a discretization of div( (u)) with degrees of freedom u h and û h only. This is less reminiscent of a mixed method like MCS than of a HDG method and it is interesting to further elaborate on the relationship between the MCS method and DG and HDG methods. In general, DG and HDG methods require a stabilizing term to assure solvability. An example is the well known interior penalty method where the L 2 -norm of jumps, some sufficiently large is used. Any dependence on such a parameter is avoided here, however this is not an unique feature of the MCS method. Other DG and HDG methods that also avoid this parameter feature a lifting h of the jump similar to (9) instead of its L 2 norm, see Reference 5. That lifting has to be explicitly computed and is then condense out. A final class of DG methods, for example the one in Reference 26, see also References 5,27, features a simultaneous lifting of the jump and the fluxes. This is similar to what happens here, where h both approximates the flux − (u) and automatically and canonically stabilizes the condensed system through its interaction with the tangential jumps.

Static condensation of high order velocity functions
The basis functions of V h can be split into two different types, see Reference 28. We write u,• for the high order "element bubble" basis functions whose support is entirely within some element T ∈  h and whose normal trace on T vanishes. The span of these basis functions is denoted by The remaining basis functions u, of V h have support entirely within the patch of some facet F and their normal trace on all other facets in the patch vanishes. As the supports of u,• belonging to different elements do not overlap, in the upper left block A •• is block diagonal and invertible. This lets us form a second, "double" Schur complement In the bigger system (27), all V h degrees of freedom couple with the divergence constraint and we cannot perform this static condensation independently of the pressure variables. However, for higher order problems, implementing multiplication with A via the exact factorization is still advantageous. Both the left and right factors as well as A •• are block diagonal and only A instead of the larger A needs to be assembled as a proper sparse matrix. We will revisit the idea of also preconditioning A via this factorization in Section 6.1. Splitting the coordinate vector u of the V h component of (u h , û h ) ∈  h into u • and u , the norm induced by A on (u , û) is That is, the norm induced by A is just the one induced by A on the energy minimal extension to  • h dofs. The lifting operator, or (discrete) harmonic extension,  ∶  h →  h maps (u h , û h ), to the minimizer in (33): The range of  is and for such "discrete harmonic" or "lifted" functions Here we encounter a slight complication of notation: Per default, (u h , û h ) ∈  harm h is associated with its coordinate vector (u • , u , û), but A only takes the (u , û) coordinates (which determine u • according to (35)). The space  harm h is spanned by lifted, discrete harmonic, basis functions, where Φ u, is the set of all u, and Φ û the one of all û basis functions. The induced Galerkin Isomorphism ∼ G defines the natural operator A associated with A and identifies , u , û).
Analogously, we define the Schur complement like norm and the associated lifting operator  ∶  h →  h such that Note that both || ⋅ || A and || ⋅ || ,h, can be defined on the entirety of  h but are only semi-norms as they vanish on ker  = ker  =  • h . Restricted to  harm h they are proper norms and equivalent.

Corollary 4. There holds
for some (k) > 0 discussed in more detail shortly in Remark 4.
Proof. Follows immediately from Lemma 5. ▪ Remark 4. An interesting question is whether, and if so, how strongly, both the constant in the lower bound in (28) and depend on k. Note that only the latter constant enters into condition number estimates in Section 6. Numerical experiments on the unit tetrahedron suggest that the former depends linearly on k while there holds = (1) or possibly = (log (k) l ) with some moderate l > 0. We have not further pursued a rigorous proof for this latter, admittedly crucial, fact. Such a proof would essentially require a k-explicit version of Lemma 4 for functions in  harm h .

PRECONDITIONING FRAMEWORK
A final Schur complement can be formed with respect to the pressure unknowns, however this involves the inverse A −1 .
With the resulting (negative) pressure Schur complement S p ∶= BA −1 B T , we have the exact factorization for the saddle point matrix K. Solving (27) could in principle be reduced to solving separate problems for the pressure and velocity. While this is not feasible due to the appearance of A −1 in the pressure Schur complement, this line of thought still takes a prominent role in common preconditioning techniques for K based on separate preconditionersÂ for A and S p for S p . See Reference 10 and the references therein for an overview of such methods. Motivated by (39), here we usê Note that unlike suggested by (40), the operation x  →K −1 x can be implemented such that it requires only two applications ofÂ −1 instead of three. A rigorous analysis ofK for the generic saddle point case as well as a number of other, similar, preconditioners built fromÂ andŜ p can be found in Reference 29.

Pressure Schur preconditioner
From the standard Stokes-LBB condition on  h using the norm || ⋅ || ,h , see for example in Reference 9, and the equivalence result Lemma 5, we can conclude the MCS Stokes-LBB condition with L > 0 independent of h. It is generally well known that given this LBB condition, S p is equivalent to the scaled mass matrix (M p p, q) ∶= −1 (p h , q h ) 0 for p h , q h ∈ Q h . The bounds in that equivalence depend on a continuity constant (Lemma 3) and, more importantly, L , see References 11,30. As Q h is completely discontinuous across elements in the MCS discretization, inverting the block-diagonal matrix M p is feasible and we useŜ p ∶= M p . Since therefore L is the primary constant determining the quality of the pressure preconditioner, it is very relevant what else it might depend on. For norms similar to the ones used here, L was proven to be independent of k in two dimensions in Reference 31 and numerical experiments carried out in the same work strongly suggest that the independence also holds in three dimensions. The k-robustness is further supported by numerical experiments carried out in this work, see Section 8. There remains, however, a dependence of the domain Ω itself, and in fact L even tends to zero for certain domain shapes, for example, channels with increasingly higher aspect ratios, see Reference 32. This imposes a fundamental limitation on all methods that, like the one here, precondition the velocity problem independently of the divergence constraint.

Auxiliary space preconditioning
We give the fictitious space Lemma 6 below in the compact form it takes, for example, in Reference 14 [Theorem 6.3].

Lemma 6. Let H,H be two real Hilbert spaces equipped with norms induced by A ∶ H → H * andÃ ∶H →H * and let there exist a linear operator Π ∶H → H such that the continuity condition
and the stability condition, ∀v ∈ H ∃ṽ ∈H such that v = Πṽ and ||ṽ|| 2 hold. Then, for the preconditioner Â a defined by Â −1 a ∶= ΠÃ −1 Π * , there holds the spectral estimate The term auxiliary space method, as coined in Reference 16, refers to the case where the titular fictitious spaceH is a product space that contains H itself as a component,H = H × V 1 × … × V n , so Π takes the form Π = (I|Π 1 | … |Π n ), and Ã ∶= diag(M,Ā 1 The stability condition (42) then demands the existence of a stable decom- The underlying idea is that the remainder v 0 ∈ H in this composition is small and somehow localized and M can be a computationally cheap method (or "smoother"). Often, M is given by some form of additive or multiplicative Schwarz method such as (Block-)Jacobi or (Block-)Gauss-Seidel. In the only relevant case here, where all involved spaces are finite dimensional and n = 1, the ASP Â a is just As alluded to by the subscript, Â a is an additive preconditioner in that, given some right hand side vector b and intermediate approximation x 0 with residual r 0 ∶= b − Ax 0 , one Richardson iteration with preconditioner Â a is to perform that is, to perform two updates additively. The multiplicative ASP Â m is implicitly defined by performing these updates successively instead, and then, performing another smoothing step with the adjoint smoother M * Then Â m is self-adjoint and positive definite and there holds Proof. This is proven within the framework of space decomposition and subspace correction, see Reference 33. The analysis there rests on a strengthened Cauchy-Schwarz type inequality and a stable decomposition. The former is implied by limited overlap of subspaces and the additional requirements posed on M and Ã 1 and the latter is directly related to (42). See also the discussion in Reference 14 [section 6], where convergence bounds for multiplicative two-grid Algebraic Multigrid methods are derived from the fictitious space lemma. ▪

PRECONDITIONERS FOR A
From the point of view of Section 5.2, a straightforward approach to preconditioning A is to use the conforming low order space V h , where preconditioning is well understood and efficient and scalable software is widely available, as basis for an ASP. A slight complication in the analysis arises due to the non-conformity in boundary conditions between  h = V h ×V h , where tangential Dirichlet conditions on Γ Ñ are imposed inV h , and V h , where Γ Ñ does not feature any Dirichlet conditions. While imposing strong tangential Dirichlet conditions in V h would sidestep the issue and be convenient for theory, in practice this is only a simple matter when the outflow lies in an axis-aligned plane and we can impose Dirichlet conditions in the x, y or z component. Therefore, we for now assume that Γ Ñ = ∅ and address the case Γ Ñ ≠ ∅ separately in Lemma 9 at the end of this Section. On V h , we define the bilinear formā(⋅, ⋅) (as usual, with associated operatorĀ and Finite Element matrix A) bȳ To define the operator Π in (41) we need the embedding operator with associated Finite Element matrix E.

Corollary 5. Forū h ∈ V h there holds
Proof. Forū h ∈ V h and Γ Ñ = ∅ there holds Eū h ∈  c h from Lemma 5 and (48) follows from (29). ▪ To establish the stable decomposition (42) we use  from Lemma 2 and define Proof. Per definition of E there holds Π k−1 (w h −ŵ h ) t = Π k−1 (u h − û h ) t and the facet terms are bounded trivially. The volume terms are bounded with Lemma 2 and the identity for the jump terms in (9). ▪ With V h being of low order, robustness in the polynomial degree k has to be achieved by the smoother.

Theorem 2. Let M be the overlapping Block-Jacobi preconditioner for A that has one block per facet F ∈  h that contains all  h degrees of freedom associated to either F or any T ∈  h such that F ∈  T . Let C be an SPD preconditioner for A such that C ∼ A. Then, conditions (41) and (42) of Lemma 6 are fulfilled for H
We postpone the proof of Theorem 2 to Section 6.1, where we discuss preconditioning of A , as obtaining the logarithmic bound in k is more natural in that context.
Remark 5. If one is satisfied with a polynomial bound in k, Theorem 2 can be shown only using standard Finite Element inverse estimates and Corollaries 5 and 6.

Preconditioning via the condensed system
Using the factorization in (32) to implement multiplication with A also opens up a way to precondition it. Replacing A by some preconditionerÂ yieldsÂ ext ∶= as a preconditioner for A. From the factorization (32) it clearly follows that and we are left with the task to precondition the "double" Schur complement A . Analogues for A of the preconditionerŝ A a andÂ m can be constructed straightforwardly with the modified embedding operator Note that the matrix E ∼ G E is just a sub-matrix of E as  does not change (u , û) coefficients, see (34), with .
We could modify the bilinear form in V h and use A ∶= E ,T A E , which would be computable element-wise. In that case, the exact analogue of Corollary 5 would hold. However, as we now show, this is not strictly necessary and for ease of implementation we opt to keep A defined by (46).
At any vertex p of T, d linearly independent components of (u • h ) |T (p) vanish, and therefore (u • h ) |T (p) and u • h also vanish as a whole. This concludes the proof as ,  only add some Proof. The sharp upper bound is a consequence of the energy minimization (33) and Corollary 5, With (u h , û h ) ∶=  Eū h and (52) we conclude the identity u h =  Eū h = Eū h =ū h . Now, (17) and (9) show and the rest follows form the lower bound in (38). ▪ Proof. With the readily apparent  V h E V h =  V h and (53) we see This lets us insert a zero into (w h ,ŵ h ) to obtain an expression without  in front, The proof is concluded by the energy minimization of  , where  1 BDM is the standard BDM 1 interpolator, see Reference 19, that is, Proof. The continuity condition (41) For the second term, C ≲Ā and Corollary 7 bound it by ||E  V h (u h , û h )|| 2 ,h, which is then further bounded by the || ⋅ || A norm with the continuity of E  V h in || ⋅ || ,h, , as implied by Corollary 8,and (38) where we incur the factor . The other bound requires a more careful approach. For general (v h ,v h ) ∈  harm h , and therefore also for In the first step we bounded the facet terms in the sum, where an infimum is taken over functions with arbitrary traces on neighboring faces, by ||(v h ,v h )|| ,h, , where these traces are fixed. On the other hand, the upper inequality in (38) shows where (v h ,v h ) (F) denotes the element of  harm h that has the same coordinates as (v h ,v h ) for degrees of freedom associated to F and whose degrees of freedom are zero otherwise (Galerkin isomorphism ∼ G ). Given the continuity of E  V h in || ⋅ || ,h, (see Corollary 8), the crucial step is therefore to bound || ⋅ || ,F,0 by || ⋅ || ,F , as in Corollary 1. However, Corollary 1 is only applicable This does not pose a problem for low-order functions or, crucially, their harmonic extensions, where an alternative path via an inverse inequality bypasses the trace estimate. For a low order , a standard, and necessarily k-independent, inverse estimate is Because of the energy minimization in || ⋅ || ,h, , the estimate holds with the same constant also for discrete har- The right hand side can then further be bounded using Corollary 8 or Corollary 9 if (v h ,v h ) takes the form of an approximation error, as required there.
Therefore, the strategy is to use the operator Π lo as defined in (56) to split the || ⋅ || M term in (58) into low and high order components. The former can then be bounded via the inverse estimate and the latter via the trace inequality, we have and the high order term can be simplified, Note that we can apply Corollary 1 not only to (I − Π lo )(u h , û h ), which is apparent from the definition of Π lo , but also to (I − Π lo )(u h , û h ) because again, as argued above,  does not change the relevant traces. Therefore, (60), (21) and then (59) shows where the continuity of Π lo used in the last estimate follows from the Bramble Hilbert Lemma as in the proof of Corollary 9. Finally, the bound follows with (38). As for the low order term, with Π lo E  V h = E V h we see and applying (61) to the (harmonic extension of) the low order function .

Corollary 8 and (38) bound the former two terms,
and Corollary 9 and (38) the latter two, ( Similarly to the proof of Theorem 3, the continuity condition follows from Corollary 5, limited overlap of basis functions and this time also limited overlap of the Jacobi blocks themselves. Also similarly, the stability condition is proven by settingṽ ∶= ((I − E V h )(u h , û h ),  V h (u h , û h )) ∈H and using Corollary 6. The bound ||( which was already shown in the proof of Theorem 3, and the estimate The latter holds because (I − )E V h (u h , û h ) is a normal bubble, that is all its coupling degrees of freedom are zero, and A restricted to such functions is block diagonal. ▪ Corollary 10. LetÂ m andÂ m be the multiplicative versions ofÂ andÂ , respectively, with the Block-Jacobi smoothers M, M replaced by Block-Gauss-Seidel sweeps and let A ≤ C. Then there holds Proof. The former result (62) follows from Theorem 2 and Lemma 7, where condition (44) is fulfilled due to A ≤ C and Corollary 5. The latter one (63) follows along the same lines with Theorem 3 and the strict upper bound in (54) for Lemma 7. ▪ Remark 6. Although we have only experimental evidence that the constant in Theorems 2 and 3 is benign, the proofs of these theorems show that in the || ⋅ || ,h and || ⋅ || ,h, norm they hold independently of . That is, we have results for ASPs for HDG methods with optimal stabilization that are explicit and robust in k.

Non-conformity in boundary conditions
We now return to the case of Γ Ñ ≠ ∅. Instead of enforcing zero tangential Dirichlet conditions on Γ Ñ in V h , it suffices to add a tangential penalty toĀ and for E to zero outV h degrees of freedom on Γ Ñ .

Lemma 9.
For some C > 0, letĀ be defined by the modified bilinear form and 0 ∶  h →  h be the operator that zeros outV h degrees of freedom on Γ Ñ . Then, for C large enough there holds These estimates are robust in k.
Proof. With the upper bound in (28) and (9) there holds A similarly follows from the lower bound in (28) and the fact that, asū h ∈ P 1 ( h ) is of low order, the high order terms in (9) vanish and there holds with a k-robust constant. The estimates for the || ⋅ || A -norm follow from the ones for the || ⋅ || A -norm with energy minimization as in the proof of Corollary 7. ▪ ModifyingĀ and the embedding operators like this, one shows the proofs of Section 6 also for the case Γ Ñ ≠ ∅.

THE LOWEST ORDER CASE
The MCS method of Section 4 is, as already mentioned there, not stable in the lowest order case k = 1. Stability of the method is recovered when a simplified stress tensor = − ∇(u) is used in (1a), but we are interested in treating the full symmetric stress tensor = − (u). The five coupling degrees of freedom per facet we have for k = 1, that is three in V h ⊆ BDM 1 and two enforced byV h , are too few to capture the six rigid body modes. In Reference 9, this was remedied by using a vector-valued W h instead of the K-valued one here, which just means that all occurrences of h have to be replaced by ( h ) everywhere, and taking it as a subset of H(div), providing the missing coupling degree of freedom per facet. Motivated by the fact that the divergence of h = curl(u) ∈ H(div, Ω) vanishes for the true solution u ∈ H 1 (Ω, R d ), a consistent stabilizing term (div( h ), div( h )) 0 was added to the bilinear form. We only briefly sketch how to adapt the preconditioners and their analysis developed here. Since W h ⊆ H(div, Ω) has a coupling degree of freedom per facet, h remains after static condensation and A is a system for this is justified by the discrete Korn inequality We only need to change the "embedding" operator E which now has a W h component and projects into theV h component as for The analysis also needs to be only slightly modified using the equivalence introduced together with the Korn inequality in Reference 9 [lemma 3.1].

NUMERICAL RESULTS
We now present numerical results that were achieved using the Netgen/NGSolve meshing and Finite Element software, 34,35 and the Algebraic Multigrid extension library NgsAMG, 36 available from References 37,38. The computations were performed on the Vienna Scientific Cluster (VSC4). We considered two problems, the first of which is a standard benchmark problem from the literature (see Reference 39) where we investigate the relative performance of different ASP variations and demonstrate robustness in the polynomial degree. The second problem is a flow around an airplane model and is meant to demonstrate the effectiveness of the method even in less academic situations. For both cases, the viscosity is fixed to = 10 −3 , the preconditioner in the conforming auxiliary spaceV h was given by a single Algebraic Multigrid V-cycle. Instead of the difficult to parallelize Block-Gauss-Seidel smoothers inÂ m andÂ m , we use block versions of the scalable semi-multiplicative 1 -smoothers from Reference 40.
AlthoughK −1 is symmetric (see Equation 40) and would be suitable for MINRES, we use GMRES as the Krylov space method since in our experience it performs better, perhaps due to the explicit orthogonalization in every iteration. We chose to solve with a relative tolerance of 10 −6 as is used commonly in the AMG literature (e.g., References 41,42); Figure 1 demonstrates that we generally do not suffer notable loss of convergence until the error has been reduced by more than a factor of 10 −12 .
We show weak scaling results and therefore aim to keep the number of elements per core constant, however are only able to ensure this approximately because of the unstructured simplicial meshes we use.  Table 1 on the right. Only every other computation is shown.

TA B L E 1
Comparison of multiplicative ASPs for the full system A and the condensed system A for the channel problem with k = 2.

Full system
Condensed system The obtained results, listed in Tables 1-3 will be discussed in detail below. For every computation we list the number of elements in the mesh | h | and the number of cores #P. With the Σ h dofs condensed out of the system, the relevant number of dofs is that of  h × Q h which we list as #D. We give the number of iterations of GMRES needed as #IT and the total time to solve the problem (excluding loading of the mesh) as t tot . The latter is further subdivided into the setup time t sup , which includes the assembly of all finite element matrices and the setup of the preconditioner, and the time t sol taken for GMRES iterations.

Full versus condensed system
We first discuss whether preconditioning A via A as described in Section 6.1 is purely convenient for theory or also advantageous in practice. For that, we compare the multiplicative ASPs over a range of problem sizes and fixed polynomial degree k = 2. As can be clearly seen in Table 1, preconditioning via the condensed system leads to considerably better performance and is the approach we take from here on out.

Additive versus multiplicative ASP
The second choice is between additive and multiplicative ASPs, we again fix the polynomial degree to k = 2 for the comparison in Table 2. From the results it is once again clear that the multiplicative preconditioner is superior and our method of choice going forward. For the lowest order case k = 1, we cannot use the MCS method (23) from Section 4 since it is simply not stable, see also Remark 1. Instead, we employ the minimal order MCS method from Reference 9, see Section 7, where, analogous to (23), we added the consistent term d −1 (div(u h ), div(v h )).
Our choice of preconditioner, informed by previous results, is the multiplicative ASP for the condensed system, this time with two smoothing steps. Due to considerably increased memory requirements, different meshes were used for k = 4 than for k = 1, 2.

8.2
Flow around an airplane model The computational domain Ω here is the "air" in a cuboid-shaped box surrounding an airplane model Ω p depicted in The results can be found in Table 4.

CONCLUSIONS
In this work, we introduced and analyzed a series of auxiliary space preconditioners for certain mass conserving stress-yielding discretizations of the Stokes equations. In the norm induced by these MCS methods, the analysis is mostly explicit in the polynomial degree and even yields completely explicit results in the norm induced by certain hybrid discontinuous Galerkin methods that feature optimal stabilization. Numerical experiments demonstrate the robustness of the preconditioners in the polynomial degree. Any two elements in  h,T are connected via a path over a bounded number of other elements in  h,T , and we can bound this last sum by one over facet terms (see Figure A1): where T F,L and T F,R denote the two elements that share the facet F. Any facet is only summed up over a bounded number of times after reordering of the sum and due to the shape regularity of  h , (A4) holds with a single constant for all patches in  h . These jump terms can be bounded by an expansion of ETu at x F ∶= Π 0 F (x), ETu(x) = Π 0 F (ETu) + (curl(ETu)) ⋅ (x − x F ) for x ∈ F, and the elementary estimate ||x − x F || 2 F ∼ h 2 |F| ∼ h d+1 . Writing  F ∶= {T F,L , T F,R } we see Finally, an H 1 trace inequality and (A2) let us bound that is, (A3) holds for our specific choice of R which finishes the proof. ▪ Proof of Lemma 2. An inverse estimate for the piecewise linear  f u ∈ V f h shows As (I − 0 ) f u =  f u on F ∈  D h , and  f u ∈ P 1 ( h , R d ) We bound the second one with an H 1 trace inequality and a scaling argument, where T is the unique element such that F ∈  T . An explicit expansion of the piecewise linear  f u at the facet center of mass x F ∶= Π 0 F x shows The other volume terms ||∇(u − u)|| T in (17) are bounded analogously. ▪

APPENDIX B. TRACE ESTIMATES
The crucial step in the proof of (18) in Reference 20 was to construct for a given w ∈ P k (T, R) aw ∈ P k (T, R) that approximates it in || ⋅ || j,F and is bounded in the H 1 semi-norm and yetw | F = 0. That is, implicitly, for F ∈  h and T ∈  h with F ∈  T , an operator  s,F 0 ∶ P k (T, R) → P k (T, R) with ( s,F 0 u) | F = 0 and ||∇ s,F 0 u|| 2 T + || s,F 0 u − u|| 2 j,F ≲ log k||u|| 2 Proof. To simplify the notation we only show a proof in three dimensions, the two-dimensional case works analogously. It also suffices to show the estimate on a reference tetrahedron, for general tetrahedra it follows from scaling arguments. First, we use  div to get aũ 1 which fulfills (ũ 1 ) n = u n on F and (ũ 1 ) n = 0 onF ∈  T ⧵ {F} with a bounded H 1 norm, We have no control over the tangential traces ofũ 1 and have to add correction terms. For allF ∈  T we pick two arbitrary normalized, orthogonal tangent vectors tF andtF and write the "errors" we need to compensate for on F andF ∈  T ⧵ {F} as We write  s,F for  s applied to the extension by zero to T of functions that vanish onF. This defines an H 1∕2 00 stable extension and construct a correctedũ as where we understand  s,F to be applied to the respective trace onF. The added corrections are normal bubbles because on their associated facet they are a scalar times a tangential and their trace vanishes on all others, that is,ũ n = (ũ 1 ) n ∀F ∈  T andũ is admissible and we need to show that if fulfills (B2