Low‐rank solution of an optimal control problem constrained by random Navier‐Stokes equations

We develop a low-rank tensor decomposition algorithm for the numerical solution of a distributed optimal control problem constrained by two-dimensional time-dependent Navier-Stokes equations with a stochastic inflow. The goal of optimization is to minimize the flow vorticity. The inflow boundary condition is assumed to be an infinite-dimensional random field, which is parametrized using a finite- (but high-) dimensional Fourier expansion and discretized using the stochastic Galerkin finite element method. This leads to a prohibitively large number of degrees of freedom in the discrete solution. Moreover, the optimality conditions in a time-dependent problem require solving a coupled saddle-point system of nonlinear equations on all time steps at once. For the resulting discrete problem, we approximate the solution by the tensor-train (TT) decomposition and propose a numerically efficient algorithm to solve the optimality equations directly in the TT representation. This algorithm is based on the alternating linear scheme (ALS), but in contrast to the basic ALS method, the new algorithm exploits and preserves the block structure of the optimality equations. We prove that this structure preservation renders the proposed block ALS method well posed, in the sense that each step requires the solution of a nonsingular reduced linear system, which might not be the case for the basic ALS. Finally, we present numerical experiments based on two benchmark problems with uncertain inflow: the simulation of a flow around an obstacle (the von Kármán vortex street) and of a flow over a backward-facing step. The experiments demonstrate a significant complexity reduction achieved using the TT representation and the block ALS algorithm. Specifically, we observe that the high-dimensional stochastic time-dependent problem can be solved with the asymptotic complexity of the corresponding deterministic problem.



INTRODUCTION
We consider the numerical simulation of a stochastic optimal control problem (SOCP). More precisely, our goal in this work is to efficiently solve the optimal control of an incompressible flow problem governed by time-dependent Navier-Stokes equations with random inputs. This problem can be computationally challenging due to the nonlinear constraint. 1 The computational complexity associated with the problem further stems from the uncertain inputs. 2,3 A viable solution approach to optimization problems with stochastic constraints employs the spectral stochastic Galerkin finite element method (SGFEM). In particular, the steady-state Navier-Stokes equations were solved with SGFEM in References 4-7. However, this intrusive approach * leads to the so-called curse of dimensionality, in the sense that the number of expansion coefficients of the discrete solution grows exponentially in the number of random variables and quickly becomes intractable for direct calculation. 2,8,9 It is worth pursuing computationally efficient ways to simulate optimization problems with stochastic constraints using SGFEMs, since the Galerkin approximation yields the best approximation with respect to the energy norm, as well as a favorable framework for error estimation. 10 In order to cope with the curse of dimensionality, we exploit the underlying mathematical structure of the discretized optimality system. Since the weak solution of the parametrized PDE is often a holomorphic function with respect to the parameters, 11 a suitable family of orthogonal polynomials is the most commonly used ansatz. One approach to reduce the curse of dimensionality is to explicitly sparsify the set of polynomials, by bounding, for example, the total degree. [4][5][6][12][13][14] Alternatively, one can consider the full Cartesian space of polynomials with bounded individual degree, but never store the expansion coefficients explicitly. Instead, one approximates the tensor of coefficients by a low-rank decomposition.
We develop a low-rank technique based on recent advances in numerical tensor methods 15,16 for the efficient solution of the SOCP. Our aim in this article is to lift the curse of dimensionality inherent in the SOCP and allow for the efficient simulations of the model on an average desktop computer. Such simulations would enhance the understanding of the underlying physical model as the computed data can then be used for the quantification of the statistics of the system response.
In the last decade, the use of SGFEM discretization with low-rank approximations to solve high-dimensional and parametrized problems has been studied extensively (see References 2,7,[17][18][19] and the references therein). In particular, References 18 and 19 approximate the solution in the canonical tensor format. Later on, compression with the tensor-train and hierarchical tensor formats was carried out. Reference 19 also focuses on the convergence rate of approximations of the solution by finite sums of rank-1 tensors. The adaptive tensor-train method presented in Reference 17 aims at minimizing the error resulting from the numerical approximation of the solution of the SGFEM discretization of parameterized PDEs. In Reference 2, the authors consider the numerical solution of a Stokes-Brinkman optimal control problem with random inputs. By employing the SGFEM discretization and the tensor-train format, that article efficiently approximates the numerical solution of the high-dimensional linear system, significantly reducing the storage requirements of the original problem. In the present article, we extend the ideas developed in Reference 2 to the more difficult case of optimization problems constrained by Navier-Stokes equations with random inputs. Moreover, we prove well-posedness of the approach suggested in Reference 2 for the problem formulation under consideration.
Alternative approaches to tackle optimization problems with stochastic constraints include stochastic collocation schemes, [20][21][22] as well as Monte Carlo methods. 23 These methods are essentially sampling based and nonintrusive. However, for optimization problems, the SGFEM exhibits superior performance compared with the stochastic collocation method. 9 This is due to the fact that, unlike SGFEM, the nonintrusivity property of the stochastic collocation method is lost when moments of the state variable appear in the cost functional or when the control is a deterministic function. On the other hand, Monte Carlo methods are relatively straightforward to implement. However, they generally converge rather slowly and do not exploit the regularity with respect to the parameters that the solution might have. 22 Low-rank tensor methods for optimal control problems with uncertain inputs have also been considered in Reference 24, where the authors focus mainly on a solver for a stochastic optimal control problem with a low-dimensional but pointwise constrained control. Such problems can be tackled via, for instance, semismooth Newton algorithms. [25][26][27] In our work, however, we do not consider the case of state- or control- or mixed control-state-constrained problems. [28][29][30] The rest of the article is organized as follows. First, we present in Section 2 the PDE-constrained optimization problem that we would like to solve, as well as the necessary mathematical concepts and notation on which we shall rely in the rest of our discussion. Next, we proceed to Section 3 to discuss the SGFEM discretization of the problem.
* SGFEM techniques are "intrusive" in the sense that the software packages for the associated deterministic problems cannot be directly reused, since all spatial and stochastic degrees of freedom are coupled.
Section 4 introduces the TT decomposition, the (block) ALS algorithms, and the preconditioner, which we use to tackle the high-dimensional saddle point systems arising from the SGFEM discretization of the optimization problem. Finally, in Section 5, we present numerical experiments to illustrate the performance of the low-rank approach.

PROBLEM STATEMENT AND MATHEMATICAL DESCRIPTION
This article investigates the distributed optimal control of an incompressible flow problem constrained by time-dependent Navier-Stokes equations with uncertain inflow in an open bounded spatial domain D ⊂ R² with a piecewise-Lipschitz boundary Υ = ∂D and an outward normal vector n⃗. We denote the spatial coordinates as x = (x₁, x₂) ∈ D. Let (Ω, F, P) be a complete probability space, where Ω is the set of outcomes, F ⊆ 2^Ω is the σ-algebra of events, and P : F → [0, 1] is a uniform probability measure. Moreover, we consider a finite time interval (0, T) and seek a control function u(x, ω, t) minimizing the cost functional

𝒥(u, v) = (1/2) E[‖∇ × v‖²] + (β/2) E[‖u‖²],   (1)

with the norm defined as ‖w‖² = ∫₀ᵀ ∫_D |w(x, t)|² dx dt, and u, v, and p are constrained, P-almost surely, through the nonstationary Navier-Stokes equations:

∂ₜv − ν∇²v + (v ⋅ ∇)v + ∇p = u   in D × Ω × (0, T),
∇ ⋅ v = 0   in D × Ω × (0, T),   (2)
v = h(x, ω, t)   on Υ_in,   v = 0   on Υ_w,   ν ∂v/∂n − p n⃗ = 0   on Υ_out.

Here, we let Υ be decomposed into disjoint parts corresponding to inflow Υ_in, solid wall Υ_w, and outflow Υ_out. The time derivative is denoted by ∂ₜ. Naturally, we assume that ‖v‖ < ∞, ‖u‖ < ∞, and ‖p‖ < ∞. Each of the functions v, u, p is a random field, since so is the inflow boundary condition h(x, ω, t) : Υ_in × Ω × (0, T) → R², specified below. The regularization parameter β in (1) balances between the minimization of the vorticity and the penalization of the control magnitude. The cost 𝒥(u, v) is a deterministic functional with stochastic arguments u, v, and the parameter ν is a deterministic kinematic viscosity.
Our domains of interest are the flow around a cylinder illustrated in Figure 1 and the backward-facing step shown in Figure 2 (cf. Reference 31). These are standard benchmarks and illustrate many of the difficulties that arise when solving Stokes and especially Navier-Stokes control problems. 32,33 Furthermore, in our experiments, we use a zero initial condition and apply instead an exponentially growing stochastic inflow h(x, ω, t) = (1 − e^{−λt}) h(x, ω) (see Figures 1 and 2), where h(x, ω) is a random field on Υ_in × Ω and λ > 0 is the growth rate. The mean field of h is set to the corresponding benchmark function associated with each of the domains in Figures 1 and 2. The (square-integrable) remainder is assumed to have a convergent Fourier expansion (3), parametrized by independent identically distributed (i.i.d.) random variables {ξ₁, ξ₂, …}. For concreteness, we assume uniformly distributed ξ_ℓ ∼ U(−1, 1), although we can generalize this to any zero-mean i.i.d. probability measure. In particular, for the obstacle domain (Figure 1) we assume a parabolic mean inflow (4). The parameter σ in (3) governs the rate of decay of the stochastic Fourier expansion (SFE), and hence it can be referred to as the smoothness of the random field. The offset 1/2 is chosen to make the decay rate of the expansion (3) similar to that of the Karhunen-Loève expansion of typical one-dimensional covariance functions, for example, Matérn. 17 In computational practice, we truncate (3) after m ∈ N terms such that the truncation error is sufficiently small. In the backward-step domain problem (Figure 2), we use a similar benchmark inflow field (5). 33 In both Equations (4) and (5), the random vector ξ := (ξ₁, ξ₂, …, ξ_m) : Ω → Γ ⊂ R^m is characterized by the joint probability density function ρ(ξ) = 1/2^m on the Cartesian joint image Γ = [−1, 1]^m.
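To illustrate how the truncation parameter m interacts with the decay of the expansion, here is a small sketch of a truncated expansion of this type; the sine basis, the decay law (ℓ + 1/2)^(−(σ + 1/2)), and the parabolic mean profile are illustrative assumptions rather than the exact terms of (3) and (4):

```python
import numpy as np

# Hypothetical 1D analogue of the stochastic Fourier expansion (3):
# h(x, xi) = h0(x) + sum_{l=1}^m c_l(x) * xi_l, with coefficients decaying
# algebraically as c_l ~ (l + 1/2)^(-(sigma + 1/2)).  The concrete basis
# functions and decay law are illustrative assumptions, not the paper's
# exact formulas.
def inflow_sample(x, xi, sigma=2.0):
    """Evaluate a truncated expansion at points x for one draw xi ~ U(-1,1)^m."""
    m = len(xi)
    h = 4.0 * x * (1.0 - x)                  # parabolic mean inflow profile
    for l in range(1, m + 1):
        c_l = (l + 0.5) ** (-(sigma + 0.5))  # decay governed by sigma
        h += c_l * np.sin(np.pi * l * x) * xi[l - 1]
    return h

def tail_bound(m, sigma=2.0, n_terms=100000):
    """Tail of the coefficient series bounds the truncation error uniformly in xi."""
    ls = np.arange(m + 1, m + 1 + n_terms)
    return np.sum((ls + 0.5) ** (-(sigma + 0.5)))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 101)
h = inflow_sample(x, rng.uniform(-1, 1, size=10))
print(tail_bound(10) < tail_bound(5))  # larger m => smaller truncation error
```

Increasing the smoothness parameter accelerates the decay, so fewer random variables are needed for the same truncation accuracy.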
The expectation of any measurable function f(ξ) is then given by the integral

E[f] = ∫_Γ f(ξ) ρ(ξ) dξ.

The assumption that h(x, ω) admits a stochastic representation as above allows us to transform the stochastic optimal control problem into a deterministic problem that now depends on the parameter vector ξ := (ξ₁, ξ₂, …, ξ_m) ∈ R^m. For brevity, since we consider m as an a priori fixed model parameter, we suppress the dependence on m in the rest of the article. Parametrization of (1) and (2) using (4) and (5) leads to the finite-dimensional deterministic optimal control problem (6) with constraints (7) and the corresponding boundary and initial conditions from (2). For the numerical simulation of the SOCP given by (6) and (7), we adopt the so-called optimize-then-discretize (OTD) strategy, where we first build an infinite-dimensional Lagrangian and then consider its variation with respect to state, pressure, control, and two Lagrange multipliers that can be identified as the adjoint velocity and pressure. 34,35 We observe that the optimization problem under consideration is nonlinear due to the nonlinearity of the convective term (v ⋅ ∇)v. Both Newton and Picard iterations have been shown to be good iterative solvers for these nonlinear equations. 31 Since the Picard iteration has a larger radius of convergence compared with the Newton iteration, our choice in this contribution is the Picard iteration. We apply the so-called Karush-Kuhn-Tucker procedure (Reference 31, chapter 8.2) to (6) and (7) to obtain the linear optimality system (8)-(10), 36 where v̄ denotes the velocity from the previous Picard iteration. Having solved this system, we update v̄ := v and repeat until convergence.
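The Picard linearization above can be sketched on a small algebraic analogue of a quadratic nonlinearity; K, C, and g below are generic stand-ins for the discretized operators, not the actual Navier-Stokes matrices:

```python
import numpy as np

# Minimal sketch of the Picard (fixed-point) linearization used for the
# convective term: solve K v + N(v) v = g by freezing the velocity in the
# nonlinear term and solving a linear system per iteration.
rng = np.random.default_rng(1)
n = 5
K = np.eye(n) * 4.0 + 0.1 * rng.standard_normal((n, n))  # dominant "diffusion"
C = 0.1 * rng.standard_normal((n, n))                    # "convection" factor
g = rng.standard_normal(n)

def N(vbar):
    # Linearized convection matrix, frozen at the previous iterate vbar.
    return np.diag(vbar) @ C

v = np.zeros(n)
for it in range(50):
    v_new = np.linalg.solve(K + N(v), g)   # one linear solve per Picard step
    if np.linalg.norm(v_new - v) < 1e-12:
        v = v_new
        break
    v = v_new

residual = np.linalg.norm(K @ v + N(v) @ v - g)
print(residual < 1e-8)
```

The fixed-point map is contractive here because the linear part dominates; Newton's method would converge faster near the solution, but, as noted above, from a smaller neighborhood.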

STOCHASTIC GALERKIN DISCRETIZATION OF THE SOLUTION
A finite-dimensional system is obtained by making (8)-(10) Galerkin-orthogonal to a finite polynomial subspace in the parameters ξ, the so-called generalized polynomial chaos, 37 as well as a finite element basis in x. In many variables, the polynomial chaos can be introduced in different ways. The most straightforward approach is to use a Cartesian product of univariate subspaces. Choosing a maximal individual degree d, we construct a space

𝒫_d = span{ψ_j(ξ) = ψ_{j₁}(ξ₁) ⋯ ψ_{j_m}(ξ_m)},   (11)

where ψ_{j_ℓ} is a univariate polynomial of degree j_ℓ, and j is the total index of the multivariate polynomial, constructed from the individual indices of the univariate polynomials,

j = j₁ + j₂ d + ⋯ + j_m d^{m−1},   j_ℓ = 0, …, d − 1.   (12)

Note that j = 0, …, d^m − 1, where d^m is indeed the cardinality of the Cartesian space. We assume that the polynomials ψ_j(ξ) are orthogonal with respect to the given probability measure, that is, E[ψ_j ψ_{j′}] = 0 for any j ≠ j′. In particular, we use Legendre polynomials, which are orthogonal with respect to the uniform measure. Similarly, Jacobi polynomials can be used for beta-distributed random variables, or Hermite polynomials for the Gaussian measure, as introduced originally by Norbert Wiener 38 in the context of turbulence modeling. The convergence with respect to the degree d can be established by the standard analysis of univariate polynomial approximation. 39 Note that this requires sufficient smoothness of the solution. It holds specifically for a flow with minimized vorticity as intended by (1), but the convergence rate may deteriorate for a turbulent solution.
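Assuming the natural mixed-radix (base-d) packing of the individual degrees into the total index, the enumeration of the d^m Cartesian basis polynomials can be sketched as follows (the concrete ordering convention in the paper may differ; this only illustrates the counting):

```python
# Total index j packs the univariate degrees (j_1, ..., j_m), each in
# {0, ..., d-1}, in mixed-radix (base-d) form:
# j = j_1 + j_2*d + ... + j_m*d^(m-1), so j runs over 0, ..., d^m - 1.
def multi_to_total(j_multi, d):
    """Pack a list of univariate degrees into the total index."""
    j = 0
    for l, jl in enumerate(j_multi):
        j += jl * d**l
    return j

def total_to_multi(j, d, m):
    """Unpack the total index back into the m univariate degrees."""
    j_multi = []
    for _ in range(m):
        j_multi.append(j % d)
        j //= d
    return j_multi

d, m = 3, 4
assert multi_to_total([2, 0, 1, 2], d) == 2 + 0*3 + 1*9 + 2*27
assert total_to_multi(multi_to_total([2, 0, 1, 2], d), d, m) == [2, 0, 1, 2]
print("cardinality:", d**m)  # 81 basis polynomials for d=3, m=4
```

The exponential growth of d^m is exactly the curse of dimensionality that the TT decomposition below is designed to avoid.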
In this article, we cope with the exponential growth of the cardinality d^m using low-rank tensor decompositions (see Section 4) for approximating directly the discrete expansion coefficients of the solution in a given full Cartesian basis. This allows us to avoid the curse of dimensionality, since we never store the full tensor of coefficients explicitly, but only its low-rank factors. Another choice, which partially alleviates the curse of dimensionality, is the total degree polynomial space. The cardinality of the total degree space is (m + d)!/(m! d!), which, for a fixed d, depends polynomially on m. However, this cardinality might still be rather large, which, together with a complicated sparsity pattern of the Galerkin matrix, makes the numerical solution expensive. By contrast, iterative tensor product methods (see Algorithm 1) require the subsequent solution of small reduced systems of equations on each tensor factor, which can be more efficient than assembling and solving a single but large Galerkin system for the total degree space. 40 For the spatial discretization, we use the stable P2-P1 Taylor-Hood finite element ansatz, 33 consisting of biquadratic elements {φ_k(x)}_{k=1}^{N_v} for the velocity and bilinear elements {χ_k(x)}_{k=1}^{N_p} for the pressure. Specifically, we approximate the velocity functions v, u and the adjoint velocity in a subspace V_h ⊂ H¹₀(D)², where H¹₀(D) is the Sobolev space of functions with derivatives in L²(D), satisfying the homogeneous Dirichlet boundary conditions on Υ_w. The pressure p and the adjoint pressure are approximated in a subspace Q_h. Finally, the time derivative is discretized using the implicit Euler scheme on a uniform time grid {t_n}_{n=1}^{N_t}, t_n = nτ, τ = T/N_t. The solutions can then be approximated by the expansions

v(x, ξ, t_n) ≈ v_h = Σ_{k=1}^{N_v} Σ_{j=0}^{d^m−1} v̂(k, j, n) φ_k(x) ψ_j(ξ),   p(x, ξ, t_n) ≈ p_h = Σ_{k=1}^{N_p} Σ_{j=0}^{d^m−1} p̂(k, j, n) χ_k(x) ψ_j(ξ),   (13)

and similarly for u ≈ u_h and the adjoint variables, where v̂ and p̂ are the sought expansion coefficients. The weak variational formulation is obtained by projecting (8)-(10) onto the same spaces V_h, Q_h, combined with the polynomial subspace 𝒫_d defined in (11).
This gives the discrete optimality equations (14)-(16).
To write (14)-(16) in a matrix form, we collect the expansion coefficients from (13) into total vectors, as defined in (17). Moreover, we need to introduce the following matrices resulting from the weak bilinear forms with the basis functions.
• The Laplace matrix for V_h:

K_h[k, k′] = ∫_D ∇φ_k : ∇φ_{k′} dx.

• The partial discrete divergence operator:

B[k, k′] = −∫_D χ_k (∇ ⋅ φ_{k′}) dx.

• The convection matrix depending on the given velocity coefficient vector v̄̂:

N(v̄)[k, k′] = ∫_D ((v̄_h ⋅ ∇)φ_{k′}) ⋅ φ_k dx.   (18)

• The weak derivative (adjoint) matrix depending on the given velocity:

W(v̄)[k, k′] = ∫_D ((φ_{k′} ⋅ ∇)v̄_h) ⋅ φ_k dx.   (19)

Now the system (14)-(16) can be expressed in the compact matrix form A ŷ = ĝ (20) on the total vectors (17) of solution coefficients. Since we use tensor product basis functions, each of the block matrices in A can be represented via Kronecker products. In particular, the forward and adjoint operators for the time-stochastic-space Navier-Stokes equations can be written in the Kronecker form (21), and the mixing terms read as in (22), where τ² I_{N_t} and I_{d^m} are identity matrices of the corresponding sizes, the spatial factors are the full spatial Stokes and mass matrices, respectively, and the remaining factors are matrices derived from the nonlinear terms, evaluated at the coefficients v̄̂ of the previous Picard iterate v̄. These matrices can also be constructed in a Kronecker product form using (18) and (19), provided a compatible decomposition of v̄̂ is available. This will be introduced in the next section. The right-hand side vector ĝ depends on the boundary conditions. The stochastic expansion of h(x, ω) yields the tensor form (23) for the inflow function, where the vector ĥ₁ is the discretized spatial mean function of the inflow (4) or (5), nonzero for x_k ∈ Υ_in and 0 otherwise, the stochastic factors are the unit vectors resulting from expanding a constant and ξ_ℓ as given by (4) in 𝒫_d, and the vector g_t is the discretized time profile of the inflow, [g_t]_n = 1 − exp(−λ t_n), with λ > 0 the growth rate of the inflow. Now we partition the PDE matrix into blocks corresponding to inner and boundary degrees of freedom; more precisely, (B) denotes the nodes associated with the centers x_k of biquadratic elements from V_h belonging to the inflow boundary, x_k ∈ Υ_in, and (I) denotes all interior and outflow nodes.
Then the usual finite element approach is employed: we eliminate the blocks A_IB and A_BI in the left-hand side of (20), replacing A by A_II, and construct the right-hand side from (23) as ĝ = −A_IB ĥ.
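The Kronecker-product blocks above are never assembled explicitly; a matrix-vector product with A ⊗ B can be computed by reshaping, as in this small sketch with generic factors standing in for the temporal/stochastic and spatial matrices:

```python
import numpy as np

# Sketch of the basic primitive behind the Kronecker forms (21)-(22):
# (A kron B) y computed via the identity (A kron B) vec(Y) = vec(A Y B^T)
# in row-major convention, without forming the large Kronecker matrix.
def kron_matvec(A, B, y):
    """Apply (A kron B) to y using only the small factors."""
    mA, mB = A.shape[1], B.shape[1]
    Y = y.reshape(mA, mB)        # row-major pairing of the two indices
    return (A @ Y @ B.T).reshape(-1)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 6))
y = rng.standard_normal(4 * 6)

direct = np.kron(A, B) @ y       # feasible only for tiny sizes
fast = kron_matvec(A, B, y)
print(np.allclose(direct, fast))
```

The cost of `kron_matvec` is O(nA·mA·mB + nB·mB·nA) instead of O(nA·nB·mA·mB) for the assembled Kronecker matrix, which is what makes the TT-structured matvec feasible for d^m-sized systems.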

TENSOR TRAIN DECOMPOSITION
Separation of the total index enumerating all degrees of freedom of, for example, the discrete solution in (20) back into individual indices corresponding to independent variables can be extended from (12) to the space and time variables as well. Each of the total vectors (17), ŷ = [ŷ(i)], where y stands for w, u, or the adjoint velocity, can also be seen as an (m + 2)-dimensional tensor with elements ŷ(k, j₁, …, j_m, n), provided that the total index i traverses the tuples (k, j₁, …, j_m, n) consistently. The particular order of indices is not important for the concept in general. However, the actual numerical complexity of the low-rank decompositions introduced below might depend on the order, and certain approaches exist for determining it adaptively. 41,42 For our problem, the particular ordering chosen above is good enough due to the decay of the stochastic expansion of h(x, ω), and the corresponding decrease of the "influence" of ξ_ℓ for larger ℓ, as well as the largest dimensions corresponding to x and t being enumerated by the first and the last indices. To alleviate the curse of dimensionality, we represent ŷ by the so-called tensor train (TT) decomposition: 43

ŷ(k, j₁, …, j_m, n) = Σ_{s₀, …, s_m} y^{(0)}_{s₀}(k) y^{(1)}_{s₀, s₁}(j₁) ⋯ y^{(m)}_{s_{m−1}, s_m}(j_m) y^{(m+1)}_{s_m}(n).   (24)

In practice, the exact equality (24) holds very rarely, and one seeks an approximation of ŷ by the right-hand side of (24). The auxiliary summation indices s₀, …, s_m are called rank indices, and their ranges r₀, …, r_m are called TT ranks. The factors on the right-hand side of (24) are called TT blocks. The trailing TT blocks are matrices, that is, y^{(m+1)} ∈ R^{r_m × N_t}, and y^{(0)} can be a matrix of size (2N_v + N_p) × r₀ or 2N_v × r₀, depending on which vector of (17) is approximated. The middle TT blocks can be seen in three equivalent forms:

• y^{(ℓ)} ∈ R^{r_{ℓ−1} × d × r_ℓ} is a three-dimensional tensor.

• For a fixed j_ℓ, y^{(ℓ)}(j_ℓ) ∈ R^{r_{ℓ−1} × r_ℓ} is a matrix. In fact, we can omit s_ℓ in (24) and consider the right-hand side as a product of matrices, parametrized by k, j_ℓ, and n. For this reason, the TT decomposition is also known as the matrix product states 44,45 in quantum physics.

• For fixed s_{ℓ−1}, s_ℓ, the fiber y^{(ℓ)}_{s_{ℓ−1}, s_ℓ} ∈ R^d is a vector. Similarly, we can say that y^{(0)}_{s₀} and y^{(m+1)}_{s_m} are vectors. This allows us to rewrite (24) using Kronecker products:

ŷ = Σ_{s₀, …, s_m} y^{(0)}_{s₀} ⊗ y^{(1)}_{s₀, s₁} ⊗ ⋯ ⊗ y^{(m)}_{s_{m−1}, s_m} ⊗ y^{(m+1)}_{s_m}.   (25)

Note that the left-hand side is now a vector too, compatible with (17).

Introducing the maximal TT rank r := max_{ℓ=0,…,m} r_ℓ, we can conclude that the TT blocks contain at most mdr² + (2N_v + N_p + N_t)r elements, that is, a linear amount w.r.t. the dimensionality m. The TT ranks depend on the particular tensor and the approximation accuracy. In numerical practice, we aim for applications that allow a sufficiently accurate TT approximation with r being much smaller than the O(d^m) cardinality of the original tensor.
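A minimal sketch of (24) with toy sizes, evaluating a single tensor entry as a product of TT-block slices and comparing it against the fully contracted tensor:

```python
import numpy as np

# TT representation (24): each element y(k, j_1, ..., j_m, n) is a product
# of small matrix slices, one per dimension.  Shapes follow the text:
# y0 is n0 x r0, middle blocks are r_{l-1} x d x r_l, the last is r_m x nt.
rng = np.random.default_rng(3)
n0, d, m, nt = 7, 3, 4, 5
ranks = [2, 3, 3, 2, 2]                      # r_0, ..., r_m
y0 = rng.standard_normal((n0, ranks[0]))
mid = [rng.standard_normal((ranks[l - 1], d, ranks[l])) for l in range(1, m + 1)]
ylast = rng.standard_normal((ranks[m], nt))

def tt_element(k, js, n):
    """Evaluate one tensor entry as a product of TT-block slices."""
    v = y0[k, :]
    for l, j in enumerate(js):
        v = v @ mid[l][:, j, :]              # contract the rank index
    return v @ ylast[:, n]

# Contract the full tensor only for verification (infeasible for large m).
full = y0
for l in range(m):
    full = np.tensordot(full, mid[l], axes=([-1], [0]))
full = np.tensordot(full, ylast, axes=([-1], [0]))

tt_storage = y0.size + sum(b.size for b in mid) + ylast.size
print(full.size, tt_storage)                 # full tensor vs TT parameters
print(np.isclose(full[2, 1, 0, 2, 1, 3], tt_element(2, [1, 0, 2, 1], 3)))
```

Even at these toy sizes the TT storage is a small fraction of the 7·3⁴·5 entries of the full tensor, and the gap widens exponentially as m grows.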
The latter notation (25) can be easily generalized to break the curse of dimensionality of matrices acting on ŷ, such as those in (20). Namely, we introduce a matrix TT decomposition

A = Σ_{s₀, …, s_m} A^{(0)}_{s₀} ⊗ A^{(1)}_{s₀, s₁} ⊗ ⋯ ⊗ A^{(m+1)}_{s_m}   (26)

with some matrix TT ranks R₀, …, R_m. For example, each of the mixing matrices (22) is a particular case of the matrix TT decomposition with R₀ = ⋯ = R_m = 1 and A^{(m+1)} = I_{N_t}, A^{(ℓ)} = I_d for ℓ = 1, …, m, and A^{(0)} being one of the leftmost block matrices in (22). However, (26) can be extended also to (21) with larger R₀, …, R_m, as we show in Section 4.1.
Equipped with (25) and (26), we can implement the so-called tensor arithmetic 43 efficiently. Linear combinations of vectors in the form (25), as well as scalar, Hadamard, or matrix-vector products Aŷ, can be constructed in the TT format exactly, block by block, without ever expanding the Kronecker products explicitly. However, the TT ranks of such an exact representation can still be unnecessarily large for the desired accuracy; for example, the TT ranks of Aŷ are equal to the products R_ℓ r_ℓ. In this case, the TT decomposition can be recompressed down to quasi-optimal TT ranks for a given threshold ε > 0 using truncated singular value decompositions (SVD) applied to the matrix forms of the TT blocks, with the total complexity being linear in m and polynomial in r and R := max_ℓ R_ℓ. We refer to References 43 and 44 for details on these procedures.
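The recompression step can be sketched on a single TT bond: a sum of two rank-r terms is stored with rank 2r, and a truncated SVD of the bond matrix restores the quasi-optimal rank (full TT rounding applies this sweep bond by bond):

```python
import numpy as np

# One-bond analogue of TT rounding: y = U V^T has rank r; adding y to
# itself doubles the *stored* rank to 2r, but a truncated SVD with
# threshold eps reveals that rank r suffices.
rng = np.random.default_rng(4)
n1, n2, r = 50, 40, 3
U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2, r))

# Exact sum in factored form: stacked factors, stored rank 2r.
U2 = np.hstack([U, U])
V2 = np.hstack([V, V])
assert U2.shape[1] == 2 * r

# Truncated SVD of the bond matrix recompresses back to rank r.
W, s, Zt = np.linalg.svd(U2 @ V2.T, full_matrices=False)
eps = 1e-10
r_new = int(np.sum(s > eps * s[0]))
Ur = W[:, :r_new] * s[:r_new]     # scale columns by singular values
Vr = Zt[:r_new, :].T
print(r_new, np.allclose(Ur @ Vr.T, 2 * U @ V.T))
```

In the actual TT rounding one never forms the bond matrix of the full tensor; the SVDs are applied to the small matricized TT blocks, which is what keeps the complexity linear in m.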

TT format of the convection matrix
Since we can represent any tensor in a TT decomposition with sufficiently large TT ranks, we assume now that the expansion coefficients (13) of the velocity v̄_h of the previous Picard iterate are written similarly to (25) as

v̄̂ = Σ_{s₀, …, s_m} v^{(0)}_{s₀} ⊗ v^{(1)}_{s₀, s₁} ⊗ ⋯ ⊗ v^{(m+1)}_{s_m}.   (27)

Testing the convection term in (14) with the basis functions, that is, replacing v_h by φ_k(x) ψ_j(ξ) and v′ by φ_{k′}(x) ψ_{j′}(ξ), we can write the matrix element of the discrete convection operator as in (28), where n is the current time step. Due to the linearity of the TT format (27) and using (18), we obtain (29). Due to the Cartesian structure of (11), we can introduce a matrix of the trilinear form such that (30) holds. Finally, we notice that F^n_{v̄} applies to each individual time step n, and hence the entire matrix is block diagonal w.r.t. the time dimension, as in (31). Note that this is exactly the matrix TT decomposition (26) with TT ranks R₀ = r₀, …, R_m = r_m. Adding the two linear terms in (21), we can conclude that if the velocity coefficients are represented in the TT format (27) with the maximal TT rank r, the forward Navier-Stokes matrix can be represented in the matrix TT format with the maximal TT rank R = r + 2.
The right-hand side ĝ = −A_IB ĥ in (20) must be recomputed in every Picard iteration, since A_IB carries the corresponding (new) convection part evaluated at v̄. This can be achieved using the fast TT-structured matrix-vector product. 43 The adjoint convection matrix can be constructed similarly, using (19).

Alternating linear solver
The discretized Oseen equation (20) is a large linear system in ŷ and needs to be solved keeping all the components in the TT format in order to keep the storage requirements low. A state-of-the-art approach to this problem is to use alternating tensor product algorithms. 44,46,47 Given the linear system Ay = b, we iterate over ℓ = 0, …, m + 1 and seek only the elements of y^{(ℓ)} in each step, while the other TT blocks are fixed. Again, since the TT format (24) is linear w.r.t. the elements of each TT block, there exists a frame matrix Y_ℓ of the form (37) such that y = Y_ℓ ⋅ vec(y^{(ℓ)}), where vec(y^{(ℓ)}) is a vector of all elements of the TT block y^{(ℓ)}. This renders Ay = b an overdetermined system A Y_ℓ ⋅ vec(y^{(ℓ)}) = b w.r.t. the elements of y^{(ℓ)}. This system is resolved via a projection onto Y_ℓ, such that y^{(ℓ)} is computed from the smaller system

(Y_ℓ^⊤ A Y_ℓ) vec(y^{(ℓ)}) = Y_ℓ^⊤ b.

Due to the linearity of the TT format, this method was called the alternating linear scheme (ALS). 47 However, the simple ALS algorithm can converge very slowly, and the TT ranks of the representation are fixed and cannot be adapted if it turns out that they were underestimated. More efficient extensions reduce the problem to computing two TT blocks y^{(ℓ)}, y^{(ℓ+1)} simultaneously (the density matrix renormalization group [DMRG] 46,48 ) or augment the projection matrices Y_ℓ with a partial TT approximation of the residual (the alternating minimal energy [AMEn] algorithm 49 ).
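The alternating projection idea can be sketched on a Kronecker-structured system with a rank-1 ansatz; the SPD matrices below are generic stand-ins, and for a rank-1 right-hand side one sweep already recovers the exact solution:

```python
import numpy as np

# Minimal ALS sketch: solve (A1 kron A2) y = b1 kron b2 with the rank-1
# ansatz y = u kron w.  Each half-step fixes one factor and solves the
# reduced (Galerkin-projected) system for the other.
rng = np.random.default_rng(5)

def spd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)   # symmetric positive definite

A1, A2 = spd(6), spd(4)
b1, b2 = rng.standard_normal(6), rng.standard_normal(4)
A, b = np.kron(A1, A2), np.kron(b1, b2)   # assembled only for checking

u, w = rng.standard_normal(6), rng.standard_normal(4)
for sweep in range(2):
    # Fix w: frame Y = I kron w, reduced system (w^T A2 w) A1 u = (w^T b2) b1.
    u = np.linalg.solve((w @ A2 @ w) * A1, (w @ b2) * b1)
    # Fix u: frame Y = u kron I, reduced system (u^T A1 u) A2 w = (u^T b1) b2.
    w = np.linalg.solve((u @ A1 @ u) * A2, (u @ b1) * b2)

y = np.kron(u, w)
print(np.linalg.norm(A @ y - b) / np.linalg.norm(b) < 1e-10)
```

Note how each reduced system is small (the size of one factor), which is the source of the complexity reduction; for higher TT ranks and indefinite matrices the projection must be done more carefully, which motivates the block variant below.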
However, the standard ALS method is not immediately applicable to our matrix A from (20). First, the matrix A is indefinite, and hence its Galerkin projection can be degenerate: for some orthogonal frame matrices one can readily verify that Y_ℓ^⊤ A Y_ℓ = 0.

Example 1. Consider the indefinite matrix A = [0 1; 1 0] and the orthogonal frame Y = (1, 0)^⊤. Then Y^⊤ A Y = 0, even though A itself is nonsingular, so the reduced ALS system is degenerate.
Instead of projecting the entire matrix A, we employ the so-called block TT format 50 and project each submatrix of A separately, which will be explained below.
Second, the velocity and pressure vectors have different sizes. Fortunately, this issue can be circumvented by decoupling the momentum equations from the kinematic (divergence-free) constraint, which is a well-established technique in computational fluid dynamics. 51 This stems from the Lagrange multiplier role of the pressure in the divergence-free constraint for the velocity. Moreover, the empirical Galerkin projection of the pressure gradient (which is implicitly realized in the ALS method) is often exactly zero for closed flows. 52,53 In general, this is not the case, and the pressure correction should be chosen judiciously 52 in the standard model reduction framework with a precomputed basis. However, the ALS algorithm, carrying out several iterations, can be seen as an "iterative" model reduction, where the basis v^{(0)} is refined in the course of the computations. This procedure can also be combined with the outer Picard iteration. This allows us to design a simpler Chorin-type approach 36 that is more suitable for the TT approximations.
First, fixing the velocity coefficients, the first row of (20) becomes an overdetermined equation (32) for the pressure coefficients, where ĝ_v is the subvector of ĝ corresponding to the velocity elements only. Solving this system in the least-squares sense gives the pressure Poisson equation 51 for p̂, and similarly for the adjoint pressure (33), where K_* and M₁ are the "velocity" submatrices of the corresponding operators, constructed similarly to (31). Now we cast these pressure components to the right-hand sides of the corresponding velocity equations and obtain a KKT system (34) with blocks of the same size. The consequence is twofold: all solution components are now of the same size 2N_v ⋅ d^m N_t, and hence are suitable for the block storage introduced below; moreover, the matrices K, K_*, coming from the discretization of diffusion-convection equations, are positive definite.
Let us now focus on the block AMEn algorithm for solving (34). We denote the components of the solution by ŷ_α, that is, ŷ₁ = v̂, ŷ₂ = û, and ŷ₃ is the adjoint velocity. We approximate all components simultaneously by a TT format with all the TT blocks shared, except the ℓth block for some ℓ = 0, …, m + 1, which actually carries the enumerator α = 1, 2, 3, as shown in (35). Without loss of generality, we can let s_{−1} = s_{m+1} = 1 to make ŷ^{(ℓ)}_{s_{ℓ−1}, s_ℓ, α} well defined for ℓ = 0 and ℓ = m + 1. The block TT decomposition (35) can be initialized with α placed in any TT block, but α can also be moved to a different TT block in the course of the computations. 50 For example, suppose we need to move α into y^{(ℓ+1)}. We can introduce a matrix Y^{(ℓ)} with elements as in (36), with j_ℓ replaced by k or n as appropriate, and compute its SVD, truncated up to a rank r′ and/or an accuracy tolerance ‖Σ − Σ_{r′}‖ < ε‖Σ‖. Next, we construct new TT blocks from the truncated singular vectors, where s′ = 1, …, r′. Note that these TT blocks constitute a block TT decomposition similar to (35), but with ℓ replaced by ℓ + 1. Similarly, we can move α into y^{(ℓ−1)}. Thus, in the ℓth step of the ALS algorithm, we can always ensure that the ℓth TT block carries α as shown in (35). We can construct the so-called frame matrix (37), which can be seen as the block TT format (35) with ŷ^{(ℓ)} replaced by the identity. Note that (37) is a particular case of the matrix TT format (26), and hence it can be multiplied with the TT representations of K, M₁, M₂, or M₃ efficiently. 47 This allows us to generalize the ALS algorithm to a block ALS algorithm (Algorithm 2) by projecting the KKT system of momentum equations (34) block by block, which defines the elements of ŷ^{(ℓ)} via the reduced system (38). Here, Â = Y_ℓ^⊤ A Y_ℓ for A ∈ {K, M₁, M₂, M₃} are the submatrices projected onto the frame matrix (37) in the Galerkin sense. Since K, K_* are discretized diffusion-convection operators, we can prove that the block-reduced system (38) is nonsingular, and hence that, in contrast to the simple ALS algorithm, the block ALS method is well posed.

Theorem 1.
Suppose that the symmetric parts of K and K_* are positive definite, that is, K + K^⊤ > 0 and K_* + K_*^⊤ > 0. Then the reduced matrix in (38) is invertible.
Proof. By the Poincaré separation theorem, the eigenvalues of an orthogonal projection of a symmetric matrix interlace with the eigenvalues of the original matrix. In particular, λ_min(K̂ + K̂^⊤) ≥ λ_min(K + K^⊤) > 0, where λ_min(⋅) denotes the minimal (real) eigenvalue of a matrix. Here we use the fact that Y is orthogonal, which is ensured by the SVD (36) in the course of the ALS iteration.† Hence, the symmetric part of K̂ (as well as of K̂*) is positive definite, and in particular K̂ is invertible. Moreover, by the same interlacing theorem, we have M̂_1 ≥ 0 and M̂_2 > 0. Now we employ a standard two-condition invertibility criterion for saddle-point matrices, where ker(⋅) denotes the kernel of a matrix. The first condition is fulfilled since K̂ is invertible. To verify the second condition, consider a vector w in the kernel of the constraints. Its second term satisfies u^⊤ M̂_2 u > 0, while the first term is nonnegative. Together with the positive semidefiniteness of M̂_1, this yields that w is not in the kernel. ▪

† Indeed, the new TT block ŷ^{(ℓ)} in (36) is produced from an orthogonal matrix of singular vectors U, and hence inherits its orthogonality.

As Theorem 1 shows, the block ALS algorithm avoids mixing of different blocks in (34) and hence excludes the pathology of Example 1. A rigorous convergence proof of ALS is a much more difficult problem, and usually only local convergence can be established, 55 while in practice the algorithm converges rapidly from a fairly general initial guess. Nevertheless, we believe that the consistency provided by Theorem 1 is an important part of the justification of the approach.

Remark 1. In principle, the Chorin iteration (32)-(34) can be performed as is, but we observed faster convergence if the TT blocks v̂^{(0)}, λ̂^{(0)} of the velocity variables are made divergence free. This can be achieved efficiently by modifying only the reduced KKT system (38) for ℓ = 0: we solve the augmented system (39), in which the projected divergence matrix enters the constraint blocks and ĝ_p is the subvector of ĝ corresponding to the pressure elements only, and assemble the new TT block ŷ^{(0)} = [v̂ û λ̂] without the auxiliary pressure vectors (they vanish as the iteration converges).
All other ALS steps, that is, ℓ = 1, …, m + 1, are carried out solving only the momentum equations (38).
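The key step in the proof of Theorem 1, namely that Galerkin projection onto an orthonormal frame preserves the positive definiteness of the symmetric part, is easy to check numerically. A minimal sketch, with a random stand-in for the diffusion-convection matrix K rather than the actual discretization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 8
# Nonsymmetric stand-in for K whose symmetric part is positive definite
K = np.eye(n) + 0.02 * rng.standard_normal((n, n))
assert np.linalg.eigvalsh(K + K.T).min() > 0

# Orthonormal frame, as delivered by the SVD (36) during the ALS sweep
Y, _ = np.linalg.qr(rng.standard_normal((n, r)))

# Galerkin projection; by the Poincare separation theorem,
# lambda_min(K_hat + K_hat^T) >= lambda_min(K + K^T) > 0
K_hat = Y.T @ K @ Y
assert (np.linalg.eigvalsh(K_hat + K_hat.T).min()
        >= np.linalg.eigvalsh(K + K.T).min() - 1e-12)
```

Hence the projected operator K̂ is invertible at every ALS step, which is exactly the well-posedness asserted by Theorem 1.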

Preconditioning
Although the reduced system is much smaller than the original problem, it may still be rather large for a "naive" treatment; for example, (39) is of size (3N_v + 2N_p)r_0. We solve it using GMRES, together with a preconditioner introduced in References 2 and 56, which is based on a Schur complement approximation. The reduced matrix in (39) admits a (straightforwardly verifiable) block decomposition (40), where Ŝ denotes the corresponding Schur complement. We precondition (39) by the second (antitriangular) factor of (40). However, both factors are still large and must be approximated.
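The rationale behind Schur complement preconditioning can be illustrated on a small generic saddle-point system: with the exact Schur complement, the preconditioned operator T satisfies (T − I)² = 0, so GMRES converges in at most two iterations (a classical Murphy-Golub-Wathen-type result); the factorized approximation of Ŝ below trades this exactness for tractability. A sketch with random stand-in matrices, not the actual KKT blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 10
G = rng.standard_normal((n, n))
A = G @ G.T / n + np.eye(n)        # SPD stand-in for the (1,1) block
B = rng.standard_normal((m, n))    # full-rank constraint block

# Saddle-point matrix and a block-triangular Schur-complement preconditioner
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
S = B @ np.linalg.solve(A, B.T)    # exact Schur complement
P = np.block([[A, B.T], [np.zeros((m, n)), -S]])

T = np.linalg.solve(P, K)          # preconditioned operator
R = T - np.eye(n + m)
print(np.linalg.norm(R @ R))       # (T - I)^2 = 0 up to round-off
```

In the actual algorithm the exact solves with K̂ and Ŝ are replaced by the approximations described next, so GMRES needs more than two iterations, but the clustering argument remains the guiding principle.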
The first matrix has the Kronecker form (41), where Ĉ and Ĥ_s are partial projections of the Euler and convection matrices (29) onto the frame matrix. We approximate this matrix by a Sylvester operator in which tr(Ĥ_s)/r_0, the average of the eigenvalues of Ĥ_s, replaces the full matrix Ĥ_s. The resulting operator can be inverted by the Bartels-Stewart method, since Ĉ is small and can easily be Schur factorized. The Schur complement Ŝ is approximated in a factorized form, 56 as in (42), where c > 0 is a normalization constant. Traditionally, 2,56 c is set to the square root of the regularization parameter, and this has proved to be the optimal choice for the tracking objective function. However, for the vorticity norm objective, the optimal c may differ. We therefore look for the c that minimizes the residual, after five GMRES iterations, of the preconditioned system with matrix Ŝ and a random right-hand side f. Since it is difficult to differentiate the GMRES residual with respect to c, we employ the zero-order golden section optimization algorithm, 57 initialized with an interval of half-width 6 in log_10 c centered at the traditional value. It is sufficient to carry out this procedure only in the first Picard iteration, since the optimal c changes little in the later iterations. We also approximate the factors in the brackets in (42) similarly to (41). The full procedure is summarized in Algorithm 1.
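The zero-order search for the scaling constant c fits in a few lines. The sketch below implements golden section search on the log_10 c axis; the objective is a placeholder for the actual residual after five GMRES iterations (here a smooth unimodal model with a hypothetical minimizer, so the snippet is self-contained):

```python
import numpy as np

def golden_section_min(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0            # 1/phi ~ 0.618
    x1, x2 = b - invphi * (b - a), a + invphi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                                # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:                                      # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# Placeholder for the GMRES residual as a function of log10(c); assumed
# unimodal with a (hypothetical) minimum at log10(c) = 0.7
def residual_model(log10_c):
    return (log10_c - 0.7) ** 2 + 1e-3

log10_c0 = 0.0   # log10 of the traditional choice of c
best = golden_section_min(residual_model, log10_c0 - 6.0, log10_c0 + 6.0)
c_opt = 10.0 ** best
```

Each iteration shrinks the bracket by the factor 1/φ ≈ 0.618 and costs one new objective evaluation, so localizing c to six digits over the 12-decade interval takes a few dozen GMRES runs, done once in the first Picard iteration.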

Flow around an obstacle
In the first experiment, we benchmark Algorithm 1 on the von Kármán vortex problem of simulating a flow in the obstacle domain shown in Figure 1.

Estimation of errors
The model is influenced by several sources of error; however, one is typically interested in the total error and the corresponding numerical cost. We begin by investigating the different types of error, in order to equilibrate them in the complexity experiment. Throughout these experiments, the model parameters are fixed to the following values: viscosity 10⁻³, final time T = 20, regularization parameter 10⁻², and decay rate of the stochastic expansion 2.5. Moreover, the polynomial degree in the stochastic Galerkin basis (11) is set to d = 8.

Spatial discretization error.
In order to determine how the solution converges under spatial grid refinement, we additionally fix the number of stochastic variables to m = 0 (ie, we solve the deterministic problem), the number of time steps to N_t = 2048, and the TT truncation threshold to 10⁻⁵ in Algorithm 1. We compute the mean square vorticity ‖∇ × v‖²_{L²} and the mean square mass of the velocity ‖v‖²_{L²} for different numbers of spatial grid refinements n_ref.
The results are shown in Table 1 (left). Assuming an error model ε_{n_ref} = C ⋅ 2^{−p⋅n_ref}, we deduce the empirical convergence rate using Runge's rule applied to the quantities computed at consecutive refinement levels n_ref. We see that the largest relative error is attained for the vorticity, with an observed rate p ≈ 2. This agrees with the second order of approximation of the P1 finite elements for twice differentiable functions. The number N_v of P2 basis functions for each velocity component, the number N_p of P1 basis functions for the pressure, as well as the total number of spatial degrees of freedom in the KKT system, N_x = 6N_v + 2N_p, are shown in Table 2.
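Runge's rule estimates the convergence order from three consecutive refinement levels without knowing the exact solution. A self-contained sketch with a synthetic second-order sequence (the actual inputs would be the vorticity and mass values from Table 1):

```python
import numpy as np

def runge_rate(q):
    """Empirical order p for values q[n] ~ q* + C * 2**(-p*n) on nested grids:
    p = log2( (q[n-2] - q[n-1]) / (q[n-1] - q[n]) )."""
    q = np.asarray(q, dtype=float)
    return np.log2((q[-3] - q[-2]) / (q[-2] - q[-1]))

# Synthetic data mimicking a second-order quantity with limit q* = 1.5
levels = np.arange(1, 6)
q = 1.5 + 0.3 * 2.0 ** (-2.0 * levels)
p = runge_rate(q)   # recovers p = 2, the order of the P1 elements
```

The rule cancels the unknown limit q* by differencing, which is why three levels suffice; noise in q shows up amplified in p, so in practice one reports the rate from the finest available triple.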

Time discretization error.
Now we fix n_ref = 2 and instead vary the number of time steps in the Euler scheme. From Table 1 (right), we confirm the first order of convergence of the implicit Euler scheme.

TT approximation error.
In this experiment, we fix n_ref = 2 and N_t = 2048, and vary the TT approximation threshold in Algorithm 1 from 10⁻³ to 10⁻⁷. We compare the velocity v and the control u, computed for a given tolerance, with the reference solutions v*, u*, computed with threshold 10⁻⁸. The results are shown in Figure 3 (left). We see that the dominant error occurs in the control component; the least squares log-linear fit quantifies this dependence.

Stochastic parametrization error.
The exact stochastic inflow is an infinite-dimensional random field. For computational purposes, we parametrize it by a truncated stochastic expansion. This, however, introduces an additional error, corresponding to the truncated tail of the expansion, which we investigate in Figure 3.
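The least squares log-linear fit of error against truncation threshold is itself a one-line least squares problem. A sketch with synthetic data (the real inputs are the measured control errors behind Figure 3, left):

```python
import numpy as np

# Thresholds and synthetic control errors obeying err ~ C * tol**b
tols = 10.0 ** -np.arange(3, 8)                               # 1e-3 ... 1e-7
errs = 0.5 * tols ** 0.9 * (1 + 0.01 * np.sin(np.arange(5)))  # mild noise

# Fit log10(err) = log10(C) + b * log10(tol) in the least squares sense
b, logC = np.polyfit(np.log10(tols), np.log10(errs), 1)
```

Fitting in log-log coordinates turns the power law into a straight line, so the exponent b is read off directly as the slope.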

Total error-performance test
Equipped with the quantitative error estimates (43), we can equilibrate all error contributions. In Figure 4, we show the total CPU times vs the equilibrated error (realized by different n_ref), as well as the maximal TT rank of the solution.
We can observe that the CPU time grows proportionally to ε⁻², where ε is the total error. Qualitatively, this is the complexity of the deterministic problem: due to the second order of approximation in space, the number of spatial degrees of freedom in two dimensions, N_x = O(h⁻²) = O(ε⁻¹), is of the order of the reciprocal of the discretization error, whereas the first order of approximation in time implies N_t = O(ε⁻¹). The total cost of the deterministic time-dependent problem is therefore O(N_x N_t) = O(ε⁻²). This demonstrates that tensor product methods can solve the stochastic time-dependent problem with the asymptotic complexity of the deterministic one. On the other hand, the cost of a straightforward solution of the stochastic problem is O(N_x N_t N_ξ), where N_ξ is the number of degrees of freedom introduced by the stochastic variables. A traditional Monte Carlo approach would require N_ξ = O(ε⁻²), and hence a total cost of the order ε⁻⁴. Quasi Monte Carlo or sparse polynomial chaos techniques can reduce N_ξ to O(ε⁻¹), and the total cost to O(ε⁻³), which is still asymptotically larger than the cost of the TT approach.
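The comparison can be made concrete with a two-line cost model. Under the assumptions above (N_x, N_t of order ε⁻¹ and the stated stochastic sampling rates; all constants set to one, purely illustrative):

```python
# Asymptotic cost model for a target total error eps (unit constants)
def costs(eps):
    N_x = eps ** -1        # spatial DOFs: second-order FEM in 2D
    N_t = eps ** -1        # time steps: first-order implicit Euler
    return {
        'TT (observed)':   N_x * N_t,               # ~ eps**-2
        'Monte Carlo':     N_x * N_t * eps ** -2,   # N_xi ~ eps**-2
        'QMC / sparse PC': N_x * N_t * eps ** -1,   # N_xi ~ eps**-1
    }

c = costs(1e-3)   # at eps = 1e-3: 1e6 vs 1e12 vs 1e9 cost units
```

At ε = 10⁻³ the model already separates the approaches by three to six orders of magnitude, which is the gap visible in the slopes of Figure 4.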

Influence of the control
The reduction power of tensor decompositions stems from their low storage requirements, in particular, low TT ranks at the desired accuracy. The tensor ranks are strongly related to the structure of the solution. In particular, one can hardly expect low TT ranks for a turbulent flow, which develops at high Reynolds numbers in the absence of control. It is the inverse problem, designed to impose additional structure, that allows us to keep the TT ranks of the controlled solution low. In Figure 5, we compare mean flow streamlines in the forward problem (uncontrolled flow) and in the inverse problem. We see that the vorticity-minimizing control can efficiently enforce a laminar structure of the flow. The variance of the controlled flow velocity is shown in Figure 6, and the mean control is shown in Figure 7. As expected, the control is localized around the obstacle.
To investigate the influence of the regularization parameter on the flow, we increase the viscosity to 10⁻², such that the optimal control problem with a large regularization parameter 10² becomes feasible. The velocity magnitudes for the regularization parameter values ∞, 10², and 10⁻² are shown in Figure 8. Notice how the flow becomes more laminar as more control is applied.

Influence of the model parameters
Finally, we investigate how the performance of the TT algorithm is affected by the model parameters. We vary the regularization parameter, the rate of decay in the stochastic expansion of the inflow, and the viscosity. The computational times and TT ranks are shown in Figures 9, 10, and 11, respectively. The remaining parameters are set to the default values listed at the beginning of Section 5.1.1. Moreover, we set the spatial discretization level n_ref = 2, and the other approximation parameters accordingly, as derived in Section 5.1.2. We see that the lowest cost is attained at intermediate values of the regularization parameter. For smaller values, the velocity is driven closer to the stationary field. However, our scheme computes the difference between the stationary field and the actual velocity; similarly to the solution error in iterative methods, the smaller this difference is in magnitude, the more complicated and oscillatory its structure. This leads to larger TT ranks in the limit of small regularization, and consequently to a larger computing cost.
On the other hand, when the regularization parameter is large, the control becomes too weak to drive the solution away from the uncontrolled flow. The solution starts to exhibit turbulent behavior, which increases the TT ranks as well. Moreover, a stronger influence of the nonlinear term leads to more Picard iterations, so the total computational time increases further. The best scenario for the TT solver is therefore a balanced regularization, corresponding to sufficient, but not excessive, control.
The rate of decay in the Fourier expansion (4) is governed by the covariance function of the random field. A highly correlated field can be approximated by fewer independent random quantities, which corresponds to a faster decay, that is, a larger decay rate. A smaller decay rate means that more stochastic variables have a strong influence on the solution. Figure 10 shows that the computational cost is inversely proportional to the decay rate.
On the other hand, from Figure 11 we can observe that the TT scheme is quite robust with respect to the Reynolds number. In particular, the solution has the same TT ranks for viscosities ranging from 10⁻⁴ to 10⁻². The computational time grows logarithmically with the Reynolds number. This is mainly due to the GMRES solver for (39), which needs more iterations for more convection-dominated problems.

Backward step domain with uncertain inflow
Additionally, we test our solver on another benchmark problem, the backward step domain (see Figure 2). As most results for this problem are qualitatively very similar to those for the flow around an obstacle, we report only a small number of test results. The inflow field at the leftmost boundary Υ_in is given in (5). We choose the same remaining parameters as in the first experiment: viscosity 10⁻³, final time T = 20, regularization parameter 10⁻², and stochastic expansion decay rate 2.5. From the vorticity and the mass of the deterministic solution, computed for different spatial mesh refinement levels and numbers of time steps, we infer the empirical convergence rates. Notice that the spatial discretization order is lower than 2, which can be attributed to the reentrant corner of the domain. The errors of the TT approximation and the stochastic parametrization are shown in Figure 12, together with the fitted error expressions. We can now equilibrate all errors to the spatial discretization error for n_ref = 1, 2, 3 and measure the computational complexity with respect to the total error (see Figure 13).
We see that the slope is higher than in the obstacle example. This is due to the slower convergence of the spatial discretization, whereas the TT rank still depends logarithmically on the accuracy. Higher-order finite elements in space might make the scheme more efficient. Nevertheless, even this simple discretization demonstrates that low-rank decompositions can perform uncertainty quantification at the same asymptotic cost as the solution of a deterministic problem. The mean velocities are shown in Figure 14. Again, we can see stabilization of the flow when a stronger control is applied.

CONCLUSIONS AND OUTLOOK
We demonstrated the applicability of low-rank tensor decompositions to the solution of two benchmark optimal control problems constrained by two-dimensional unsteady Navier-Stokes equations with stochastic inputs. This problem poses a threefold challenge: a nonlinear time-dependent PDE, an optimization problem treated via a Lagrangian approach, and random inputs. As particular model problems, we consider the minimization of the vorticity of a von Kármán flow around an obstacle, as well as a backward step domain problem. Each of the two model problems has an uncertain inflow condition. We discretize the problem using the tensor product stochastic Galerkin FEM, in which case the numbers of degrees of freedom coming from space, time, and all individual random variables multiply. The largest problem size to solve for n_ref = 2, m = 8, and N_t = 2048 in the full representation would be N_x ⋅ 8^m ⋅ N_t ≈ 10¹⁵, which exceeds our memory capacity by several orders of magnitude.
We avoid storing or computing such excessive amounts of data by approximating the solution in the TT decomposition. A crucial part of the scheme is the alternating linear scheme solver, which computes the TT factors directly. This algorithm required substantial modifications in order to be applicable to the SOCP: we preserve the saddle-point structure in the reduced model and split the pressure and velocity components in order to equilibrate the sizes of the tensors involved in the alternating algorithm. The largest reduced problem we actually needed to solve in Algorithm 1, for TT rank r = 60, was of size N_x ⋅ r ≈ 5 ⋅ 10⁶. This is still rather large and requires preconditioning. Nevertheless, it is significantly smaller than the complexity one could expect from sampling-based methods, let alone the full discrete problem. We observed an empirical complexity rate of O(ε⁻²) with respect to the total error ε, which indicates that the stochastic time-dependent problem can be solved at the cost of a deterministic problem.
For future research, we plan to tackle more general nonlinear problems, as well as boundary and constrained controls. We expect difficulties in applying the tensor decompositions considered in this article to discontinuous functions, such as the indicator function of an active set. However, nonlinear problems involving smooth functions seem suitable for the low-rank approach. 60