Fast Iterative Solver for the Optimal Control of Time-Dependent PDEs with Crank-Nicolson Discretization in Time

In this article, we derive a new, fast, and robust preconditioned iterative solution strategy for the all-at-once solution of optimal control problems with time-dependent PDEs as constraints, including the heat equation and the non-steady convection--diffusion equation. After applying an optimize-then-discretize approach, one is faced with continuous first-order optimality conditions consisting of a coupled system of PDEs. As opposed to most work in preconditioning the resulting discretized systems, where a (first-order accurate) backward Euler method is used for the discretization of the time derivative, we employ a (second-order accurate) Crank--Nicolson method in time. We apply a carefully tailored invertible transformation for symmetrizing the matrix, and then derive an optimal preconditioner for the saddle-point system obtained. The key components of this preconditioner are an accurate mass matrix approximation, a good approximation of the Schur complement, and an appropriate multigrid process to apply this latter approximation---these are constructed using our work in transforming the matrix system. We prove the optimality of the approximation of the Schur complement through bounds on the eigenvalues, and test our solver against a widely-used preconditioner for the linear system arising from a backward Euler discretization. These demonstrate the effectiveness and robustness of our solver with respect to mesh-sizes, regularization parameter, and diffusion coefficient.

1. Introduction. PDE-constrained optimization is a subject area which has attracted significant interest in recent years (see [19,45] for a comprehensive overview of the field). Due to the complex structure and high dimensionality of these problems when an accurate discrete solution is sought, preconditioned iterative solvers have been employed for the 'all-at-once' resolution of such formulations; see for example [37,39] for early work in this direction. In this paper, we consider fast and robust preconditioned solvers for matrix systems arising from the optimize-then-discretize approach applied to time-dependent PDE-constrained optimization problems. We examine the optimal control of the heat equation with Dirichlet boundary conditions, to allow benchmarking of our method against a widely-applied preconditioned iterative method for its solution, and time-dependent convection-diffusion control problems.
When preconditioners are sought for certain time-dependent problems, it is typical to apply a (first-order accurate) backward Euler method in time, as this leads to particularly convenient structures within the matrix and facilitates effective preconditioning; see [33] for a mesh- and β-robust preconditioner for the heat control problem, and [9,32,42,44,47] for applications to different problems. However, the required discretization strategy results in slower convergence in time than in space: if a method is second-order accurate in space, it is reasonable to choose τ = O(h^2), where τ is the time step and h is the mesh-size in space. This results in a matrix system of huge dimension, and hence a high CPU time is required for its solution. The question we wish to investigate here is whether applying a higher-order Crank-Nicolson method in time is beneficial, due to the reduced dimension of the matrix system required to obtain a similar discretization error. The challenge here is that the much more complex structure of the resulting matrix system makes preconditioning a highly non-trivial task, so a more sophisticated numerical method and preconditioning strategy need to be devised to achieve fast and robust convergence of the iterative solver.
The remainder of this article is structured as follows. In Section 2 we introduce the problems we consider, that is, time-dependent convection-diffusion control and heat control, and outline the matrices arising upon discretization of such problems as well as a stabilization technique used for convection-diffusion control problems. In Section 3 we describe the matrix systems obtained upon discretization of the first-order optimality conditions, and in particular demonstrate a suitable transformation which allows symmetrization of the matrix system obtained from the Crank-Nicolson method. This enables us to apply a symmetric iterative solver such as MINRES [30], which is highly desirable from the perspective of proving convergence of the iterative method. In Section 4 we derive our proposed new preconditioner, using saddle-point theory along with suitable approximations of the (1,1)-block and Schur complement, and provide eigenvalue results for the preconditioned matrix system. In Section 5 we benchmark our method against the widely-used backward Euler method, coupled with the preconditioner derived in [33], and demonstrate our new preconditioner's efficiency and robustness with respect to all parameters involved in the convection-diffusion control problem.
2. Problem Formulation. In this article we consider the fast and robust numerical solution of time-dependent PDE-constrained optimization problems. In particular, we examine distributed convection-diffusion control problems of the form:

min_{y,u} J(y, u) = (1/2) ∫_0^{t_f} ∫_Ω (y − ŷ)^2 dΩ dt + (β/2) ∫_0^{t_f} ∫_Ω u^2 dΩ dt   (2.1)

subject to

∂y/∂t − ε∇^2 y + w · ∇y = u + f   in Ω × (0, t_f),
y = g_D on ∂Ω_D × (0, t_f),   ∂y/∂n = g_N on ∂Ω_N × (0, t_f),
y(x, 0) = y_0(x)   in Ω,   (2.2)

where the variables y, ŷ, and u are the state, desired state, and control variables, respectively, β > 0 is a regularization parameter, ε > 0 is the diffusion coefficient, and w is a divergence-free wind vector (i.e., ∇ · w = 0). The problem is solved on a spatial domain Ω ⊂ R^d with boundary ∂Ω = ∂Ω_D ∪ ∂Ω_N, ∂Ω_D ∩ ∂Ω_N = ∅, and time interval (0, t_f). Here ∂y/∂n (x, t) represents the normal derivative of y on ∂Ω_N. The functions f, g_D, g_N, and y_0 are known.
In (2.2), the term −ε∇^2 y denotes the diffusive element, and the term w · ∇y represents convection. In physical (real-world) problems, as pointed out for example in [10], convection typically plays a more significant physical role than diffusion, so ε ≪ ‖w‖ for many practical problems. However, this in turn makes the problem more difficult to solve [10,36], as the solution procedure will need to be robust with respect to the direction of the wind w and any boundary or internal layers that form. The presence of boundary or internal layers is also an issue that we have to deal with when discretizing the convection-diffusion differential operator. Indeed, when solving a convection-diffusion problem, a stabilization procedure is often utilized in order to 'capture' all the layers.
Though we will test our solver on the convection-diffusion control problem (2.1)-(2.2), a particular case of interest arises when ε = 1, w = 0, ∂Ω_D = ∂Ω, and ∂Ω_N = ∅, giving the following heat control problem with Dirichlet boundary conditions:

min_{y,u} J(y, u) = (1/2) ∫_0^{t_f} ∫_Ω (y − ŷ)^2 dΩ dt + (β/2) ∫_0^{t_f} ∫_Ω u^2 dΩ dt   (2.3)

subject to

∂y/∂t − ∇^2 y = u + f in Ω × (0, t_f),   y = g_D on ∂Ω × (0, t_f),   y(x, 0) = y_0(x) in Ω.   (2.4)

This problem provides a valuable benchmark of our method against a widely-used preconditioned iterative solver based on a backward Euler discretization in time [33]. We note that our method can be readily generalized to heat control problems with Neumann and mixed boundary conditions.

2.1. Discretization matrices and stabilization.
To illustrate the matrices involved in the finite element discretizations of optimal control problems, consider a standard Galerkin finite element discretization for the (steady-state) convection-diffusion problem:

−ε∇^2 y + w · ∇y = u + f in Ω.   (2.5)

Letting {φ_i}_{i=1}^{n_x} be the same finite element basis functions for y and u, we seek approximations y_h = Σ_{i=1}^{n_x} Y_i φ_i and u_h = Σ_{i=1}^{n_x} U_i φ_i, so that a discretized version of (2.5) is

(εK + N + W) Y = M U + f,

where (excluding the effect of boundary conditions):

K = [∫_Ω ∇φ_i · ∇φ_j dΩ],   M = [∫_Ω φ_i φ_j dΩ],   N = [∫_Ω (w · ∇φ_j) φ_i dΩ],

and the matrix W denotes a possible stabilization matrix for the convection operator. For a heat control problem of the form (2.4), ε = 1, and N = W = 0. We note that K is generally referred to as a stiffness matrix, and is symmetric positive semi-definite (positive definite unless ∂Ω_D = ∅), and M is referred to as a mass matrix, which is symmetric positive definite. The matrix N is skew-symmetric (meaning N + N^T = 0) in the case of Dirichlet problems; otherwise we obtain that

N + N^T = [∫_{∂Ω_N} (w · n) φ_i φ_j ds],

using Green's theorem. In this latter case the spectral properties of the matrix H := N + N^T, which can be indefinite, will be useful for our subsequent analysis. We note that w · n = ‖w‖ cos θ, where θ denotes the angle between the wind w and the (outer) unit normal vector n on the boundary ∂Ω_N, so setting c = max_x ‖w‖, we have that |w · n| ≤ c on ∂Ω_N. Defining the matrix M_{∂Ω_N} := [∫_{∂Ω_N} φ_i φ_j ds], we therefore obtain

−c M_{∂Ω_N} ⪯ H ⪯ c M_{∂Ω_N},   (2.8)

where the notation Ψ_1 ⪰ Ψ_2 means Ψ_1 − Ψ_2 is positive semi-definite. Hence, any negative eigenvalues of H are dominated by a constant factor of those of the mass matrix related to the basis functions on ∂Ω_N. This observation will be useful when discussing our approach for problems with Neumann or mixed boundary conditions.
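To make the roles of these matrices concrete, the following sketch assembles 1D P1 analogues of K, M, and N on a uniform mesh with homogeneous Dirichlet conditions (a simplified stand-in for the Q1 matrices used in this paper; the function name and setup are illustrative only) and checks the symmetry properties stated above:

```python
import numpy as np

def p1_matrices(n, w=1.0):
    """Assemble 1D P1 mass, stiffness and convection matrices on a uniform
    mesh with n interior nodes and homogeneous Dirichlet conditions -- a
    simplified 1D stand-in for the Q1 matrices K, M, N described above."""
    h = 1.0 / (n + 1)
    I = np.eye(n)
    E = np.eye(n, k=1)                         # superdiagonal shift
    K = (1.0 / h) * (2 * I - E - E.T)          # stiffness matrix
    M = (h / 6.0) * (4 * I + E + E.T)          # mass matrix
    N = (w / 2.0) * (E - E.T)                  # convection matrix (constant wind w)
    return K, M, N

K, M, N = p1_matrices(50)
assert np.allclose(N + N.T, 0)                 # N skew-symmetric (Dirichlet case)
assert np.linalg.eigvalsh(K).min() > 0         # K symmetric positive definite
assert np.linalg.eigvalsh(M).min() > 0         # M symmetric positive definite
```

The same qualitative properties (K, M symmetric positive definite, N skew-symmetric for pure Dirichlet problems) carry over to the Q1 discretization in higher dimensions.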
Concerning the stabilization scheme used for solving the forward convection-diffusion problem, a popular stabilized finite element method is the Streamline Upwind Petrov-Galerkin (SUPG) method [20]. However, as pointed out in [8,16], applying this scheme to the (steady-state) control problem gives rise to extra difficulties. Specifically, applying the scheme to the control problem with the discretize-then-optimize strategy leads to symmetric discrete equations in which the discrete adjoint problem is not a consistent discretization of the continuous adjoint problem, whereas the optimize-then-discretize approach gives rise to a different, non-symmetric discretized system which therefore does not possess the structure of a discrete optimization problem. In this work, we thus employ the adjoint-consistent Local Projection Stabilization (LPS) approach described in [4,5,6,34], for which the discretization and optimization steps commute in the stationary case. Further, this stabilized finite element method leads to an order of convergence of O(h^{3/2}) for the L^2-error [5], which is optimal on general quasi-uniform meshes for the forward problem; see e.g., [49]. In the LPS formulation, the stabilization matrix W is defined as

W = δ [∫_Ω (w · ∇φ_i − π_h(w · ∇φ_i)) (w · ∇φ_j − π_h(w · ∇φ_j)) dΩ].   (2.9)

Here, δ > 0 denotes a stabilization parameter, and π_h is an orthogonal projection operator. From (2.9), the matrix W can be viewed as a shifted discrete diffusion operator associated with the streamline direction defined by w. It is symmetric and positive semi-definite, as shown by following the working in [10, p. 17]: letting 0 ≠ v ∈ R^{n_x}, and setting π^i = π_h(w · ∇φ_i), we have

v^T W v = δ ∫_Ω ( Σ_{i=1}^{n_x} v_i (w · ∇φ_i − π^i) )^2 dΩ ≥ 0.

For the convergence of the method, we require π_h to be an L^2-orthogonal (discontinuous) projection operator defined on patches of the domain Ω that satisfies the approximation and stability properties specified in [5], where by a patch we mean the union of elements of our finite element discretization; the projection operator π_h is left free to be discontinuous on the edges of the patches.
In our implementation we will make use of Q1 elements, so the domain is divided into patches consisting of 2 elements in each dimension. To ensure the aforementioned properties are satisfied, as in [4] we define π_h patchwise as the local average

π_h(v)|_P = (1/|P|) ∫_P v dΩ,

where π_h(v)|_P is the restriction of π_h(v) to the patch P, and |P| is the (Lebesgue) measure of the patch. We refer again to [5] for the theoretical proof of the convergence of this method with this definition of the local projection operator. As in [10, p. 253], we choose δ locally on each patch P_k as δ_k, with

δ_k = (h_k / (2‖w_k‖)) (1 − 1/P_k) if P_k > 1, and δ_k = 0 otherwise,

where ‖w_k‖ is the 2-norm of the wind at the patch centroid, h_k is a measure of the patch length in the direction of the wind, and P_k = ‖w_k‖ h_k / (2ε) is the patch Péclet number. We observe that, with this (non-constant) choice of δ, the matrix W is still positive semi-definite. To prove this, it is sufficient to define π_h^{i,k} = π_h(w · ∇φ_i)|_{P_k} and π^k := Σ_{i=1}^{n_x} v_i π_h^{i,k} locally on each patch and then proceed as above, obtaining

v^T W v = Σ_k δ_k ∫_{P_k} ( Σ_{i=1}^{n_x} v_i (w · ∇φ_i) − π^k )^2 dΩ ≥ 0.

The spectral properties of the matrices K, M, and W (whether Dirichlet, Neumann or mixed boundary conditions are imposed) will be useful later, when discussing the optimality of our preconditioning approach. Informed by the definitions of this section, we make the following assumption when carrying out the theoretical analysis in the remainder of this paper, and later discuss how our methodology could be applied if the assumption is relaxed:

Assumption 1. When a convection-diffusion control problem is being solved we assume w is such that w · n = 0 on ∂Ω_N. We allow heat control problems to satisfy any boundary conditions of the form (2.2), and for convection-diffusion control we remove this restriction on the wind w for a pure Dirichlet problem (i.e., ∂Ω_N = ∅).
When Assumption 1 holds, the matrix H = 0. There are a number of wind vectors w that satisfy the property above on the whole of ∂Ω, see for example the well-known 'recirculating wind' example described in [10, p. 240].
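One standard patchwise choice of the stabilization parameter, in the spirit of [10, p. 253], switches the stabilization on only where the patch Péclet number exceeds one. The sketch below is illustrative only (the function name is ours, and the precise constants used in the text may differ):

```python
def lps_delta(w_norm, h_k, eps):
    """Patchwise stabilization parameter in the spirit of [10, p. 253]:
    active only where the patch Peclet number exceeds one.  Illustrative
    sketch; the exact constants in the text may differ."""
    P_k = w_norm * h_k / (2.0 * eps)           # patch Peclet number
    if P_k > 1.0:
        return (h_k / (2.0 * w_norm)) * (1.0 - 1.0 / P_k)
    return 0.0

# Convection-dominated patch: stabilization is active.
assert abs(lps_delta(1.0, 0.1, 0.01) - 0.04) < 1e-12
# Diffusion-dominated patch: no stabilization needed.
assert lps_delta(1.0, 0.1, 0.5) == 0.0
```

Note that δ_k ≥ 0 for every patch, which is what the positive semi-definiteness argument for W requires.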

3. First-Order Optimality Conditions and Discretizations in Time.
We now describe the strategy used for obtaining an approximate solution of (2.1)-(2.2). We apply an all-at-once approach coupled with the optimize-then-discretize scheme, in which the continuous Lagrangian is used to arrive at first-order optimality conditions, which are then discretized. For simplicity the working of this section considers Dirichlet boundary conditions, that is ∂Ω_D = ∂Ω, but it may be readily extended to problems where ∂Ω_D ⊂ ∂Ω. Introducing the adjoint variable (or Lagrange multiplier) p, we consider the Lagrangian associated with (2.1)-(2.2) as in [5]. Then, by deriving the Karush-Kuhn-Tucker (KKT) conditions, the solution of (2.1)-(2.2) satisfies:

∂y/∂t − ε∇^2 y + w · ∇y − (1/β) p = f   in Ω × (0, t_f),
−∂p/∂t − ε∇^2 p − w · ∇p + y = ŷ   in Ω × (0, t_f),   (3.1)

together with the boundary conditions of (2.2), the initial condition y(x, 0) = y_0(x), and the final-time condition p(x, t_f) = 0 in Ω, where we have substituted the gradient equation βu − p = 0 into the state equation. Problem (3.1) is a coupled system of (time-dependent) PDEs, consisting of a forward PDE combined with a backward problem for the adjoint. Due to this structure, when trying to find a numerical approximation of the solution, an A-stable method is usually applied for discretizing the time derivative, since this will allow the user to choose a time step that is independent of the spatial mesh-size. Further, in order to obtain a consistent system of linear equations, both functions y and p are approximated at the same time points.
We now introduce methods for approximating the time derivative when solving (3.1), and describe the resulting matrix systems, starting with the widely-used backward Euler method in Section 3.1, followed by the Crank-Nicolson method in Section 3.2. For the remainder of the paper, we discretize the interval (0, t_f) into n_t subintervals of length τ = t_f/n_t, and we use the notation y^n ≈ y(x, t_n), p^n ≈ p(x, t_n) for our approximations for all x ∈ Ω, with t_n = nτ.

3.1. Backward Euler discretization.
Many widely-used preconditioned iterative methods for the solution of time-dependent PDE-constrained optimization problems of type (2.1)-(2.2) involve a backward Euler discretization for the time variable [9,32,33,42,44,47]. Applying this scheme to (3.1) gives the following system of equations:

L_E y^{n+1} − M y^n − (τ/β) M p^{n+1} = τ f^{n+1},   n = 0, ..., n_t − 1,
L_E^T p^n − M p^{n+1} + τ M y^n = τ M ŷ^n,   n = 1, ..., n_t,   (3.2)

where L_E = τ L + M, y^0 is the discretization of the initial condition for y, and M p^{n_t+1} := 0. With this notation, we obtain the following (symmetric) linear system:

[ τ D_M   B_E^T ;  B_E   −(τ/β) D_M ] [ ȳ ; p̄ ] = [ b_1^E ; b_2^E ],   (3.3)

where D_M is a block diagonal matrix with blocks given by the mass matrix M, B_E is block lower bidiagonal with diagonal blocks L_E and subdiagonal blocks −M, and the right-hand side vectors b_1^E and b_2^E take into account the initial and final-time conditions on y and p, as well as information from the desired state ŷ and the force function f. Note that we have symmetrized the system by replacing the discretized initial and final-time conditions with suitably scaled equivalents. The structure of the previous system is very convenient from the point of view of numerical linear algebra, because it facilitates effective preconditioning.
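As a concrete illustration of this structure, the sketch below assembles all-at-once backward Euler blocks via Kronecker products, under the assumption that B_E is block lower bidiagonal with L_E = τL + M on the diagonal and −M on the subdiagonal, and that the (1,1)- and (2,2)-blocks are block diagonal mass-matrix terms (the boundary-block scalings of the actual system (3.3) may differ):

```python
import numpy as np

def be_blocks(L, M, nt, tau, beta):
    """All-at-once backward Euler blocks (a sketch of the usual layout;
    the boundary-block scalings of the actual system (3.3) may differ).
    B_E is block lower bidiagonal, with L_E = tau*L + M on the diagonal
    and -M on the subdiagonal."""
    I = np.eye(nt)
    S = np.eye(nt, k=-1)                       # subdiagonal shift
    L_E = tau * L + M
    B_E = np.kron(I, L_E) - np.kron(S, M)
    A_E = tau * np.kron(I, M)                  # (1,1)-block: observation term
    C_E = (tau / beta) * np.kron(I, M)         # (2,2)-block, from eliminating u
    return A_E, B_E, C_E

nx = 3
L = 2 * np.eye(nx) - np.eye(nx, k=1) - np.eye(nx, k=-1)   # stiffness stand-in
M = np.eye(nx)                                            # mass stand-in
A_E, B_E, C_E = be_blocks(L, M, nt=4, tau=0.1, beta=0.01)
Kmat = np.block([[A_E, B_E.T], [B_E, -C_E]])
assert np.allclose(Kmat, Kmat.T)               # saddle-point system is symmetric
assert np.allclose(B_E[:nx, nx:], 0)           # B_E is block lower triangular
```

The Kronecker-product layout makes the block bidiagonal time-coupling of the constraint block explicit, which is precisely the structure the preconditioner of [33] exploits.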
3.2. Crank-Nicolson discretization and symmetrization of the system. Despite the convenient structure of the matrix system arising from the backward Euler method, the essential drawback is that the method is only first-order accurate in time. For instance, if a second-order accurate method is used for the spatial discretization, the numerical error is of order O(τ) + O(h^2), given sufficient smoothness properties of the solution. Therefore, it is reasonable to choose τ = O(h^2), resulting in a matrix system of huge dimension, and hence an extremely high CPU time is required for its solution. In recent years, significant effort has been invested in improving the accuracy of the discretized solution of time-dependent PDE-constrained optimization problems involving the backward Euler method: see [14] for an application of deferred correction to time-dependent PDE-constrained optimization problems, [9,41,42,43] for low-rank tensor approaches to speed up the convergence of backward Euler, [17] for a multigrid technique applied to optimal flow control problems, and [48] for a preconditioned iterative solver for problems of the type (2.3)-(2.4) that uses a reduction in dimensionality of the arising system.
We also highlight recent works in which higher-order time discretizations are considered. For example, in [13] the authors derive a parallel-in-time algorithm coupled with a gradient-based method, using a Lobatto IIIA scheme in the time variable, and in [9] the authors note that their low-rank method can also be adapted to the Crank-Nicolson approach. Other valuable approaches for addressing time-dependent PDE-constrained optimization problems include multiple shooting methods [15], parareal schemes [23,24], and ideas from instantaneous control [7,18]. However, to the best of our knowledge, the crucial question of preconditioning PDE-constrained optimization methods by exploiting the precise matrix structures arising from higher-order discretization schemes in time has not been resolved, perhaps due to the increased complexity in the structure of the matrix systems, an issue pointed out in [17] for instance.
Remark 1. We note here that both backward Euler and Crank-Nicolson are A-stable, that is, no restriction on τ in terms of h is required for stability. There are often other stability properties to consider when selecting an appropriate time-stepping scheme: for instance Crank-Nicolson, as opposed to backward Euler, is not L-stable, which is an important consideration when carrying out long-range integrations. In this work, we do not address specific questions about stability properties of the two methods, but rather whether it is possible in principle to take advantage of a higher-order discretization scheme in the time variable for increasing the rate of convergence of solvers for the optimal control of time-dependent PDEs. It is likely that our proposed method could be extended to time-stepping methods that achieve even faster convergence than Crank-Nicolson, including L-stable methods. We highlight the reference [2], which considers the question of preconditioning matrix systems arising from L-stable methods for forward evolutionary PDEs.
Remark 2. We note that, in order for Crank-Nicolson to achieve a second-order convergence rate, we require the state y and the adjoint variable p in (3.1) to have sufficient smoothness. Therefore, in the following analysis, we assume that the regularity assumptions, and hence the results, of [1, Sec. 3] hold for the problem considered.
Considering again (3.1), we now discretize the time derivative using the Crank-Nicolson method. Denoting

L^± := ±M + (τ/2) L,

we have that the numerical solution of (3.1) satisfies

(τ/2) M (y^n + y^{n+1}) + (L^+)^T p^n + (L^−)^T p^{n+1} = (τ/2) M (ŷ^n + ŷ^{n+1}),
L^− y^n + L^+ y^{n+1} − (τ/(2β)) M (p^n + p^{n+1}) = (τ/2) (f^n + f^{n+1}),

for n = 0, ..., n_t − 1, with M p^{n_t} = 0 and M y^0 = M y_0 an appropriate discretization of the final and initial conditions on p and y, and f^n defined as in (3.2). In matrix form, we write

[ Λ̄_11   Λ̄_12 ;  Λ̄_21   Λ̄_22 ] [ ȳ ; p̄ ] = [ b̄_1 ; b̄_2 ],   (3.4)

where the vectors ȳ and p̄ are the numerical solutions for the state and adjoint variables, as before the right-hand side accounts for the initial and final-time conditions on y and p, as well as the desired state ŷ and force function f, and the matrices Λ̄_ij, i, j = 1, 2, are block bidiagonal, built from the matrices M and L^± above. We immediately realize that the matrices Λ̄_11 and Λ̄_22 are not symmetric, and therefore no iterative method for symmetric matrices, such as MINRES, may be applied to (3.4). We therefore wish to apply a transformation to (3.4) to convert it to a symmetric system. Partitioning the matrix in (3.4) and eliminating the initial and final-time conditions on y and p, we can rewrite the system as

[ Λ_11   Λ_12 ;  Λ_21   Λ_22 ] [ y ; p ] = [ b_1 ; b_2 ],   (3.5)

with the vectors y, p, b_1, b_2 modified accordingly, and the matrices Λ_ij, i, j = 1, 2, the corresponding interior blocks of the Λ̄_ij. In order to symmetrize the system, we now apply the linear transformation

blkdiag(T_1, T_2),   (3.6)

where T_1, T_2 ∈ R^{(n_t n_x)×(n_t n_x)} are block triangular matrices with blocks built from the identity matrix I ∈ R^{n_x×n_x}, and T_2 = T_1^T. This yields the symmetric saddle-point system

𝒜 [ y ; p ] := [ A   B^T ;  B   −C ] [ y ; p ] = [ b_1 ; b_2 ],   (3.7)

where

A = T_1^T A_D T_1,   C = T_2^T C_D T_2,   (3.8)

and

A_D = D_A ⊗ M,   C_D = D_C ⊗ M,   (3.9)

with D_A, D_C ∈ R^{n_t×n_t} diagonal matrices with positive diagonal entries (incorporating the time step τ and, for D_C, the regularization parameter β). We have thus transformed (3.4) to a symmetric system using matrices T_1, T_2 which are easy and cheap to apply and invert, so we may work with the matrices A and C cheaply, using T_1, T_2, A_D, C_D. Further, since both A_D and C_D are symmetric positive definite, the same holds for A and C. So, in order to find an approximate solution to (3.1), we may now consider the saddle-point system (3.7), to which we can apply a preconditioned Krylov subspace method for symmetric indefinite matrices, such as MINRES.
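The effect of the symmetrizing transformation can be illustrated numerically. The sketch below assumes a plausible form for T_1 (block upper bidiagonal with identity blocks; the text only specifies that T_1 is block triangular, cheap to apply and invert, with T_2 = T_1^T) and verifies that a congruence transform T_1^T A_D T_1 of a block diagonal matrix with positive multiples of an SPD mass matrix is symmetric positive definite:

```python
import numpy as np

nx, nt = 4, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((nx, nx))
M = X @ X.T + nx * np.eye(nx)                  # stand-in SPD mass matrix

# Assumed form for T_1: block upper bidiagonal with identity blocks.
T1 = np.kron(np.eye(nt) + np.eye(nt, k=1), np.eye(nx))
T2 = T1.T                                      # T_2 = T_1^T, block lower triangular

# A_D: block diagonal, each block a positive multiple of M.
A_D = np.kron(np.diag(rng.uniform(0.5, 1.5, nt)), M)

A = T1.T @ A_D @ T1                            # congruence transform of A_D
assert np.allclose(A, A.T)                     # A is symmetric
assert np.linalg.eigvalsh(A).min() > 0         # A is positive definite
```

Since T_1 is invertible (unit block triangular), the congruence transform preserves symmetric positive definiteness, which is exactly the property needed to apply MINRES to the transformed system.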
4. Preconditioning Approach. In this section we describe an optimal preconditioner for the system (3.7), making use of saddle-point theory as well as suitable approximations for the blocks of the system matrix.

4.1. Saddle-point systems.
We know that if we wish to solve an invertible system of the form (3.7), with invertible A, a good candidate for a preconditioner is the block diagonal matrix:

P = [ A   0 ;  0   S ],   (4.1)

where S denotes the (negative) Schur complement S = C + B A^{-1} B^T. Indeed, if C = 0 and S is also invertible, the preconditioned matrix P^{-1} 𝒜 will have only 3 distinct eigenvalues [21,25]:

λ(P^{-1} 𝒜) = { 1, (1 + √5)/2, (1 − √5)/2 },

where we denote the set of eigenvalues of a matrix by λ(·). If instead A and C are symmetric positive definite, it holds that (see [3,40] and [31, Theorem 4]):

λ(P^{-1} 𝒜) ⊂ [−1, (1 − √5)/2] ∪ [1, (1 + √5)/2].

In these cases, a preconditioned Krylov subspace method for symmetric indefinite matrices with preconditioner (4.1) would converge in a few iterations (at most 3 if C = 0). We further observe that, if A and C are symmetric positive definite, the preconditioner P also has that property, so a natural choice of iterative solver is the MINRES algorithm. Of course, the preconditioner P as in (4.1) is not practical, as the exact inverses of A and S would need to be applied in each case. Applying S^{-1} is particularly problematic, as even when A and C are sparse, S is generally dense. For this reason, we wish to find a suitable approximation P̂ of P, with

P̂ = [ Â   0 ;  0   Ŝ ],

or, more precisely, a cheap application of the effect of P̂^{-1} on a generic vector. In the following section we commence by finding a good approximation of the inverse of the matrix A, then in Section 4.3 we describe the approach used for approximating the Schur complement S. Since we will benchmark our new preconditioning strategy against the preconditioner derived in [33] for heat control (i.e., L = K) after applying the optimize-then-discretize approach with backward Euler time-stepping, we briefly describe the preconditioner for the system (3.3), itself also a saddle-point system. In this case the (1,1)-block is not invertible; a good preconditioner is found to be

P_E = [ Â_E   0 ;  0   Ŝ_E ],   (4.2)   with   Ŝ_E := (B_E + M̂_E) Â_E^{-1} (B_E + M̂_E)^T,   (4.3)

where Â_E is an invertible perturbation of the (1,1)-block A_E depending on a parameter ξ, and M̂_E is a scaled mass-matrix term chosen by a 'matching strategy' of the type described below. Here, 0 < ξ ≪ 1 is chosen such that the (invertible) matrix Â_E is 'close enough' to the matrix A_E in some sense.
The (1,1)-block is approximated by a 'perturbed' matrix Â_E, whose inverse is then applied within the matrix Ŝ_E. A term of the form (4.3) is justified in Section 4.3.
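The three-eigenvalue result for C = 0 quoted above can be verified numerically on a small random saddle-point system (the matrices here are arbitrary stand-ins, not those of (3.7)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5
X = rng.standard_normal((n, n))
A = X @ X.T + n * np.eye(n)              # SPD (1,1)-block (a stand-in)
B = rng.standard_normal((m, n))          # full-rank constraint block

S = B @ np.linalg.solve(A, B.T)          # Schur complement (here C = 0)
Amat = np.block([[A, B.T], [B, np.zeros((m, m))]])
P = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), S]])

evals = np.linalg.eigvals(np.linalg.solve(P, Amat)).real
targets = np.array([1.0, (1 + np.sqrt(5)) / 2, (1 - np.sqrt(5)) / 2])
# every eigenvalue of the preconditioned matrix coincides with one of
# the three values 1, (1 + sqrt(5))/2, (1 - sqrt(5))/2
assert np.abs(evals[:, None] - targets[None, :]).min(axis=1).max() < 1e-6
```

This is the classical result of [21,25]: with the exact block diagonal preconditioner and C = 0, a Krylov method terminates in at most three iterations.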

4.2. Approximation of the (1,1)-block. We now focus on devising a preconditioner for the matrix system (3.7), arising from a Crank-Nicolson discretization, starting with a cheap and effective approximation of the matrix A. We recall from (3.8) that A can be written as A = T_1^T A_D T_1, with A_D defined as in (3.9). We observe that A_D is a block diagonal matrix with each diagonal block a multiple of M. Therefore, in order to obtain a cheap approximation of A, we require a suitable way to approximate the inverse of a mass matrix. As discussed in [11,12,46], the Chebyshev semi-iterative method represents a cheap and effective way to do this. From the discussion above, a good approximation of A is therefore given by

Â := T_1^T Â_D T_1,

where Â_D is obtained from A_D upon replacing each solve with M by an application of M_c, with M_c denoting a fixed number of steps (20, in our tests) of Chebyshev semi-iteration applied to M. Further, since T_1 and T_2 = T_1^T are (block) upper- and lower-triangular matrices respectively (which makes them very easy to invert), we use backward and forward (block-)substitution in order to apply their inverses. It is trivial to prove the following result on the effectiveness of our approximation of A:

Lemma 4.1. The minimum (maximum) eigenvalue of Â^{-1} A is bounded below (above) by the minimum (maximum) eigenvalue of M_c^{-1} M.

4.3. Approximation of the Schur complement. We now find a suitable approximation for the Schur complement S of (3.7). In the forthcoming theory, we assume that Assumption 1 holds, and later discuss the case where this is relaxed. Recalling how the matrices A, B, and C can be written as linear transformations involving the matrices T_1 and T_2, we first rewrite S in the following way:

S = T_2^T S_int T_2,   (4.4)

where we set

B_D := (T_2^T)^{-1} B T_1^{-1}.   (4.5)

It is hence clear that if we find a symmetric positive definite approximation Ŝ_int of

S_int := C_D + B_D A_D^{-1} B_D^T,   (4.6)

then Ŝ := T_2^T Ŝ_int T_2 is a symmetric positive definite approximation of S. We may therefore use the (generalized) Rayleigh quotient in order to find upper and lower bounds for the eigenvalues of the matrix Ŝ^{-1} S as follows.
Let λ be an eigenvalue of Ŝ^{-1} S with x the corresponding eigenvector; then

λ = (x^T S x)/(x^T Ŝ x) = (x̂^T S_int x̂)/(x̂^T Ŝ_int x̂),   (4.7)

where we set x̂ = T_2 x ≠ 0 and use (4.4) together with the definition of Ŝ. Thus upper and lower bounds for the eigenvalues of Ŝ^{-1} S are given by the maximum and the minimum eigenvalues of Ŝ_int^{-1} S_int, respectively. From (4.4), and due to the convenient structure of the matrices A_D and C_D in (3.9), we can devise an approximation Ŝ_int of S_int using the 'matching strategy' discussed in [33,35] as follows. We seek an approximation of the form:

Ŝ_int := (B_D + M̂_D) A_D^{-1} (B_D + M̂_D)^T.   (4.8)

We therefore wish that

M̂_D A_D^{-1} M̂_D^T = C_D.   (4.9)

Using the definitions of C_D and A_D from (3.9), we obtain that for this to hold the matrix M̂_D = M̂ (T_1^T)^{-1} is given by

M̂ (T_1^T)^{-1} = (D_A D_C)^{1/2} ⊗ M,   (4.10)

and therefore

M̂ = ((D_A D_C)^{1/2} ⊗ M) T_1^T.   (4.11)

Finally, our approximation of S is given by

Ŝ := T_2^T (Λ_21 + M̂) A_D^{-1} (Λ_21 + M̂)^T T_2 = (Λ_21 + M̂) A^{-1} (Λ_21 + M̂)^T,   (4.12)

with M̂ as defined in (4.11); the two expressions are equivalent since T_2 commutes with both Λ_21 and M̂. To understand the effectiveness of this approximation, we recall (4.7), telling us that we only need to study the spectrum of the matrix Ŝ_int^{-1} S_int. We next rewrite S_int as follows:

S_int = (B_D + M̂_D) A_D^{-1} (B_D + M̂_D)^T − (M̂_D A_D^{-1} B_D^T + B_D A_D^{-1} M̂_D^T),   (4.13)

where we have used (4.9), (3.8), (4.5), and (4.10) in turn.
Since S_int and Ŝ_int are symmetric positive definite, we again consider the generalized Rayleigh quotient

R := (x̂^T S_int x̂)/(x̂^T Ŝ_int x̂),   (4.14)

whose numerator and denominator may be expanded using (4.6) and (4.8), noting from (3.9) and (4.10) that C_D = M̂_D A_D^{-1} M̂_D^T. In order to find an upper bound for the Rayleigh quotient (4.14), we return to (4.13). Following the reasoning in [33], we can prove that R ≤ 1; from (4.13)-(4.14), this holds if

x̂^T (M̂_D A_D^{-1} B_D^T + B_D A_D^{-1} M̂_D^T) x̂ ≥ 0,

where, using (4.5) and setting z = T_2^{-1} x̂, it suffices to show that the matrix B̃ := Λ_21 T_2 + (Λ_21 T_2)^T is positive semi-definite. We easily obtain that B̃ may be bounded from below by terms involving the matrices 𝕃 := T ⊗ L̃ and M̃, with L̃ := L + L^T = 2(εK + W), since both K and W are symmetric, and N is skew-symmetric due to Assumption 1. Furthermore, since K is positive definite in this case, with W positive semi-definite, L̃ is also positive definite. Moreover, it is clear that z^T M̃ z ≥ 0, since M̃ is a block diagonal matrix with n_t − 1 zero-blocks and the last block consisting of a positive definite matrix. To prove the inequality z^T 𝕃 z ≥ 0, we first observe that 𝕃 = T ⊗ L̃, where T ∈ R^{n_t×n_t} is a Gram matrix built from the scalar analogue of the block matrix T_1, and therefore T is symmetric positive definite (since T_1 has full rank). Next, we state and use [22, Theorem 13.12] to prove that 𝕃 is positive definite:

Theorem 4.2. Let X ∈ R^{m×m} have eigenvalues λ_i, i = 1, ..., m, and let Y ∈ R^{r×r} have eigenvalues μ_j, j = 1, ..., r. Then, the mr eigenvalues of X ⊗ Y are λ_1 μ_1, ..., λ_1 μ_r, λ_2 μ_1, ..., λ_2 μ_r, ..., λ_m μ_r.
Since both T and L̃ are symmetric positive definite, applying Theorem 4.2 with X = T and Y = L̃ tells us that the matrix 𝕃 = T ⊗ L̃ is symmetric positive definite. We therefore infer that B̃ is symmetric positive definite, and hence that the relevant cross terms in the denominator of (4.14) are non-negative. Finally, this guarantees that the Rayleigh quotient R in (4.14) satisfies R ≤ 1.
We have hence proved the following result:

Theorem 4.3. Let S_int and Ŝ_int be defined as in (4.6) and (4.8) respectively, with the matrices A_D, B_D, C_D, Λ_21, Â, M̂ defined as in (3.9), (4.5), (3.5), (3.8), and (4.11). Then, given Assumption 1:

λ(Ŝ_int^{-1} S_int) ⊂ [1/2, 1].

In Figure 4.1 we report the eigenvalue distribution of Ŝ_int^{-1} S_int for a range of values of β with diffusion coefficient ε = 1/100, for a particular Dirichlet test problem on an evenly spaced space-time grid (−1, 1)^2 × (0, 2) with τ = h = 1/8. Further, using Theorem 4.3 and (4.7), we can prove the following:

Corollary 4.4. Let S and Ŝ be defined as in (4.4) and (4.12), with the matrices defined as in Theorem 4.3, and T_1, T_2 as in (3.6). Then, given Assumption 1:

λ(Ŝ^{-1} S) ⊂ [1/2, 1].

By Corollary 4.4, the matrix Ŝ in (4.12) is an effective approximation of the Schur complement S defined in (4.4). We highlight that solving (exactly) a system involving the matrix Ŝ is costly, so we look for a cheap approximation of the effect of Ŝ^{-1} on a generic vector. From (4.12), it is clear that the bulk of the work involves approximately
applying the inverse of Λ_21 + M̂ and its transpose. From (3.5) and (4.11), we have that Λ_21 + M̂ is block lower triangular, and a cheap application of its inverse to a vector is given by block-forward substitution, with each diagonal block approximated using a fixed number of V-cycles of a multigrid routine, for example. The matrix (Λ_21 + M̂)^T can be handled analogously using block-backward substitution. Remark 3. Theorem 4.3 and Corollary 4.4 also hold if no stabilization is applied to the convection-diffusion control problem.
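A sketch of this block-forward substitution, with the inner solver left as a parameter standing in for a few multigrid V-cycles (names and setup are illustrative), is as follows:

```python
import numpy as np

def block_forward_solve(diag_blocks, sub_blocks, b, inner_solve):
    """Block-forward substitution for a block lower bidiagonal system:
    diag_blocks[k] x_k = b_k - sub_blocks[k-1] x_{k-1}.  inner_solve(A, r)
    stands in for a few multigrid V-cycles on each diagonal block."""
    nt = len(diag_blocks)
    nx = b.size // nt
    x = np.zeros_like(b)
    for k in range(nt):
        r = b[k * nx:(k + 1) * nx].copy()
        if k > 0:
            r -= sub_blocks[k - 1] @ x[(k - 1) * nx:k * nx]
        x[k * nx:(k + 1) * nx] = inner_solve(diag_blocks[k], r)
    return x

# Check against a direct solve, using an exact inner solver.
rng = np.random.default_rng(3)
nt, nx = 3, 4
diag = [rng.standard_normal((nx, nx)) + 5 * np.eye(nx) for _ in range(nt)]
sub = [rng.standard_normal((nx, nx)) for _ in range(nt - 1)]
A = np.zeros((nt * nx, nt * nx))
for k in range(nt):
    A[k*nx:(k+1)*nx, k*nx:(k+1)*nx] = diag[k]
    if k > 0:
        A[k*nx:(k+1)*nx, (k-1)*nx:k*nx] = sub[k - 1]
b = rng.standard_normal(nt * nx)
x = block_forward_solve(diag, sub, b, lambda Ak, r: np.linalg.solve(Ak, r))
assert np.allclose(A @ x, b)
```

With an approximate inner solver (e.g., multigrid V-cycles), the routine returns an approximation to the action of the inverse, which is all the preconditioner requires.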
Remark 4. Let us now briefly discuss the applicability of our method if Assumption 1 does not hold, that is, w · n ≠ 0 on a portion of ∂Ω_N. As above, this results in the matrix N not being skew-symmetric. Therefore, assuming (non-trivially) that a discretization of the adjoint operator has been performed such that one is examining a symmetric matrix system, from (2.8) we can derive that

L̃ = L + L^T = 2(εK + W) + H ⪰ 2(εK + W) − c M_{∂Ω_N},

so a sufficient condition for the matrix L̃ to be positive semi-definite is that 2(εK + W) ⪰ c M_{∂Ω_N}. Even if the matrix L̃ were not positive semi-definite, one could also argue that the upper bound on the eigenvalues in Theorem 4.3 (and hence that of Corollary 4.4) would be only slightly larger than 1 for moderate ‖w‖ and large diffusion coefficient ε, as the contribution (in terms of eigenvalues) of the matrix H to L̃ is no greater than that of the (very sparse) mass matrix on ∂Ω_N multiplied by ‖w‖.
If the discretization of the adjoint operator is such that the resulting matrix system is not symmetric, we would need to apply a non-symmetric Krylov solver such as GMRES [38]; however, it is reasonable to suppose that the system will not be highly non-normal. Therefore, we would expect a preconditioner of similar form to that derived in this section to be potent for such problems, but we highlight that the convergence of the Krylov method would not be guaranteed by the clustering of the eigenvalues of Â^{-1} A and Ŝ^{-1} S, as it is for symmetric problems.
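Before moving to the numerical results, the eigenvalue interval [1/2, 1] delivered by the matching strategy can be illustrated on the simpler stationary analogue from [33,35], where S = K M^{-1} K^T + M/β is approximated by Ŝ = (K + M/√β) M^{-1} (K + M/√β)^T (random SPD stand-ins for M and K; this is an analogue, not the matrices of Theorem 4.3):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 30, 1e-4
X = rng.standard_normal((n, n))
M = X @ X.T + n * np.eye(n)              # SPD stand-in for the mass matrix
Y = rng.standard_normal((n, n))
K = Y @ Y.T                              # symmetric PSD stand-in stiffness

Minv = np.linalg.inv(M)
S = K @ Minv @ K.T + M / beta            # "exact" Schur complement analogue
Khat = K + M / np.sqrt(beta)
Shat = Khat @ Minv @ Khat.T              # matched approximation

# Generalized eigenvalues of (Shat, S) lie in [1/2, 1].
lam = np.linalg.eigvals(np.linalg.solve(Shat, S)).real
assert lam.min() > 0.5 - 1e-6 and lam.max() < 1.0 + 1e-6
```

The bound follows since Ŝ − S = (K + K^T)/√β ⪰ 0 and 2S − Ŝ = (K − M/√β) M^{-1} (K − M/√β)^T ⪰ 0 for symmetric K, mirroring the argument used for Theorem 4.3.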
5. Numerical Results. We now provide numerical evidence of the effectiveness of our preconditioning strategy. In Section 5.1, we show how this preconditioner results in more rapid convergence than the state-of-the-art backward Euler solver with the existing preconditioner of [33], for the heat control problem. In Section 5.2, we show the robustness of our strategy with respect to all the parameters involved, for the time-dependent convection-diffusion control problem.
In all our tests we consider only Dirichlet boundary conditions (i.e., ∂Ω_N = ∅), but we emphasize again that the method is easily generalized to Neumann and mixed boundary conditions (with the caveats previously outlined for convection-diffusion control). We implement a finite element method, using Q1 basis functions for state, control, and adjoint variables. As discussed in Section 4.2, when approximating the (1,1)-block it is trivial to invert both matrices T_1 and T_2 = T_1^T, and we apply 20 steps of Chebyshev semi-iteration to each mass matrix on the diagonal of A_D. For the approximation of the Schur complement, we employ block-forward and block-backward substitution to solve for the matrix Λ_21 + M̂ and its transpose, and approximate each block on the diagonal with 3 V-cycles of the AGMG algebraic multigrid routine [26,27,28,29]. All tests are run on MATLAB R2018b, using a 1.70GHz Intel quad-core i5 processor on an Ubuntu 18.04.1 LTS operating system.
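For completeness, here is a minimal sketch of a Jacobi-preconditioned Chebyshev semi-iteration of the kind used for the mass-matrix solves. The spectral bounds are hard-coded for the 1D P1 case for illustration (the Q1 bounds differ), and the routine follows the standard three-term recurrence rather than the paper's exact implementation:

```python
import numpy as np

def chebyshev_semi_iteration(M, b, its=20, lmin=0.5, lmax=1.5):
    """Chebyshev semi-iteration for M x = b, preconditioned by the diagonal
    D of M (Jacobi).  lmin/lmax must bound the spectrum of D^{-1} M; the
    values [1/2, 3/2] are the classical bounds for the 1D P1 mass matrix
    (the Q1 bounds differ).  Standard three-term recurrence -- a sketch,
    not the paper's exact implementation."""
    d_inv = 1.0 / np.diag(M)
    theta = 0.5 * (lmax + lmin)                # centre of spectral interval
    delta = 0.5 * (lmax - lmin)                # half-width of interval
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    z = d_inv * b                              # preconditioned residual
    p = z / theta
    for _ in range(its):
        x = x + p
        z = z - d_inv * (M @ p)
        rho_new = 1.0 / (2.0 * sigma - rho)
        p = rho_new * rho * p + (2.0 * rho_new / delta) * z
        rho = rho_new
    return x

# 1D P1 mass matrix: M = (h/6) tridiag(1, 4, 1); 20 steps suffice.
n = 100
h = 1.0 / (n + 1)
M = (h / 6.0) * (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
b = np.ones(n)
x = chebyshev_semi_iteration(M, b)
assert np.linalg.norm(M @ x - b) < 1e-9 * np.linalg.norm(b)
```

Because the eigenvalue bounds of the diagonally scaled mass matrix are known a priori, a fixed number of Chebyshev steps yields a fixed-quality, linear approximation M_c to the action of M^{-1}, which is exactly what Lemma 4.1 exploits.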

5.1. Heat control.
We first benchmark our method against the backward Euler method coupled with the bespoke, mesh- and β-independent preconditioner derived in [33], for the heat control problem (2.3)-(2.4). Here, d = 2 (so x = [x_1, x_2]ᵀ), Ω = (−1, 1)², t_f = 2, f = 0, and s(x_1, x_2) = cos(πx_1/2) cos(πx_2/2). The analytic solutions to this problem are known in closed form, with the initial condition y_0 and Dirichlet boundary condition g obtained from this y. We observe that in this case the discretized system obtained after applying Crank-Nicolson is (3.7), with L = K. In our tests, given a (spatial) uniform grid of mesh-size h = 2^{1−l}, with l the level of grid refinement, we set τ = h² for the backward Euler scheme and τ = h for Crank-Nicolson, motivated by the predicted convergence rates of these methods. We test both schemes with a range of values of h and β, running preconditioned MINRES to a tolerance of 10⁻⁶ on the relative preconditioned residual norm. We take ξ = 10⁻³ in (4.2)-(4.3) for the backward Euler implementation. Tables 5.1-5.3 present the number of MINRES iterations (it) required by each method for a range of β, the CPU time taken in seconds, and the relative errors y_error and p_error (in the scaled vector ℓ∞-norm) obtained for the state and adjoint variables.
We see from Tables 5.1-5.3 that the Crank-Nicolson approach with our new preconditioner achieves more accurate solutions than the backward Euler method, in lower CPU time. Comparing the results for grid refinements l = 4, 5, 6, that is, for h = 2⁻³, 2⁻⁴, 2⁻⁵, this can occur in orders of magnitude lower CPU time for the tests presented here. This is to be expected, because the size of the system required to obtain a fixed accuracy should grow like O(h⁻⁴) for backward Euler as opposed to O(h⁻³) for Crank-Nicolson, and the effectiveness of our new preconditioner allows this to materialize in terms of CPU time. For instance, with this Ω and t_f, the choice h = 2⁻⁵ leads to a (Schur complement) system of dimension 8,132,481 for backward Euler and 254,016 for Crank-Nicolson; as our preconditioner is optimal for all values of β, h, and τ, this leads to a substantial speed-up. Further, our preconditioned Crank-Nicolson method is also able to obtain a solution for levels of refinement (e.g., l = 7) for which preconditioned backward Euler runs out of memory. We can therefore conclude that the Crank-Nicolson method, coupled with our new preconditioner, is significantly more potent than the widely-used preconditioned backward Euler method for all values of β.
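As a sanity check on the dimensions quoted above, the following arithmetic sketch reproduces them from h, t_f, and the two time-step choices; one consistent reading (an assumption of this sketch) is one spatial vector of interior Q1 nodes per time point for backward Euler, and one per time step for Crank-Nicolson.

```python
# Schur-complement system dimensions for Omega = (-1,1)^2, t_f = 2,
# mesh-size h = 2^(-5): 63 interior nodes per spatial direction.
h = 2.0**-5
n_space = (round(2 / h) - 1) ** 2        # 63^2 = 3969 interior nodes
n_be = n_space * (round(2 / h**2) + 1)   # tau = h^2: 2049 time points
n_cn = n_space * round(2 / h)            # tau = h:   64 time steps
print(n_be, n_cn)  # 8132481 254016
```

The ratio n_be / n_cn ≈ 32 = h⁻¹ is exactly the O(h⁻⁴) versus O(h⁻³) growth gap noted above.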
5.2. Convection-diffusion control. We again run preconditioned MINRES to a tolerance of 10⁻⁶, constructing a (spatial) uniform grid of mesh-size h = 2^{1−l} at level l, and setting τ = h in all the tests presented. In Figures 5.1-5.2 we show the numerical solutions for the state y and the adjoint variable p at time t = 1, for β = 10⁻² and l = 5, with both ε = 1/20 and ε = 1/100. In Tables 5.4-5.6 we report the number of iterations required to achieve convergence with ε = 1/20, 1/100, and 1/500, for a range of l and β.

Table 5.5: Convection-diffusion control problem: MINRES iterations required with ε = 1/100, for a range of l and β.

        β
  l   10⁻¹  10⁻²  10⁻³  10⁻⁴  10⁻⁵  10⁻⁶
  3    23    26    23    21    15    10
  4    25    26    26    25    21    15
  5    24    27    26    26    23    18
  6    24    25    26    26    26    22
  7    23    25    26    26    26    25

Table 5.6: Convection-diffusion control problem: MINRES iterations required with ε = 1/500, for a range of l and β.

        β
  l   10⁻¹  10⁻²  10⁻³  10⁻⁴  10⁻⁵  10⁻⁶
  3    23    26    23    21    15    10
  4    25    26    26    25    21    15
  5    25    27    27    26    25    19
  6    26    27    27    27    27    22
  7    26    27    27    27    25    25

As can be seen from Tables 5.4-5.6, our new preconditioner is highly effective and robust, leading to convergence for all tests in at most 27 iterations. For β = 10⁻⁵ or 10⁻⁶ and larger values of h, convergence is achieved in a lower number of iterations: this is not surprising, as for these values the Schur complement is spectrally 'close' to a mass matrix, making the problem easier to solve. Apart from this, we notice that the number of iterations is independent of the parameters involved. We therefore deduce that our method is a potent one for the resolution of time-dependent convection-diffusion control problems, a class of problems which presents substantial numerical difficulties. The number of iterations required to solve these problems is independent of the mesh-size h, the regularization parameter β, and the diffusion coefficient ε.
6. Conclusions. We have applied an optimize-then-discretize strategy to tackle the optimal control of time-dependent PDEs, coupled with a Crank-Nicolson scheme in time. We have devised an invertible linear transformation that symmetrizes the resulting matrix system, and derived a new, fast, and robust preconditioner for the saddle-point matrix, which possesses a complex structure. We have also proved, through bounds on the eigenvalues, that the Schur complement approximation used is optimal with respect to all parameters involved, and therefore that the preconditioner is optimal and scales linearly in CPU time with respect to matrix dimension. Finally, we have presented numerical results to demonstrate the effectiveness and speed of our preconditioned Crank-Nicolson method. Future work includes extending this new approach to problems with more complex PDE constraints, boundary control problems, and problems with additional algebraic constraints on the state and control variables.