Exterior products of operators and superoptimal analytic approximation

We give a new algorithm for the construction of the unique superoptimal analytic approximant of a given continuous matrix-valued function on the unit circle, making use of exterior powers of operators in preference to spectral or {\em Wiener-Masani} factorizations.


Introduction
In this paper we put forward a new algorithm for the computation of the superoptimal analytic approximation of a continuous matrix-valued function on the circle, a notion that arises naturally in the context of the classical "Nehari problem", and also in the "robust stabilization problem" in control engineering.
To explain the term "superoptimal", let us start from the elementary observation that a measure of the "size" of a compact operator T between Hilbert spaces is provided by the operator norm T of T . However, a single number can only ever provide a coarse measure of the size of a multi-dimensional object, and there is a well-developed classical theory [10] of 's-numbers' or 'singular values' of an operator or matrix, which provides much more refined information about an operator than the operator norm. Consider Hilbert spaces H, K and an operator T : H → K, and let j ≥ 0. The quantity s j (T ) is defined to be the distance, with respect to the operator norm, of T from the set of operators of rank at most j: In the setting of matrices T (that is, in the case that H and K are finite-dimensional), s j (T ) is often called the jth singular value of T . In this setting one can show that the singular values of T are precisely the eigenvalues of √ T * T . The largest singular value of T is the spectral radius of √ T * T , that is, T , and so clearly the set of all singular values of T contains much more information than the norm T alone. The use of s-numbers immediately gives rise to a measure of the error in an approximation of an operator-or matrix-valued function. Consider, for example, an m × n-matrix-valued function G on the unit circle T, and suppose we wish to approximate G by a matrix-valued function Q of a specified form (such as a rational function of a prescribed McMillan degree). It is natural to regard the difference G − Q as the "error" in the approximation, and to regard the quantities ) for j ≥ 0 as measures of how good an approximation Q is to G. We set , . . . , s ∞ j (G − Q), . . . ), and say that Q is a superoptimal approximation of G in a given class F of functions if s ∞ (G − Q) minimises s ∞ (G − Q) over Q in F with respect to the lexicographic ordering of the set of sequences of non-negative numbers.
The notion of superoptimality pertains to matricial or operator-valued functions, and is therefore particularly relevant to control engineering and electrical networks more generally, since in these fields one must analyse engineering constructs whose mathematical representations are typically matrix-valued functions on the circle or the real line. In particular, a primary application is to the problem of designing automatic controllers for linear time-invariant plants with multiple inputs and outputs. Such design problems are often formulated in the frequency domain, that is, in terms of the Laplace or z−transform of signals. By this means the problem becomes to construct an analytic matrix-valued function in a disc or half-plane, subject to various constraints. An important requirement is usually to minimize, or at least to bound, some cost-or penalty-function. In practical engineering problems a wide variety of constraints and cost functions arise, and the engineer must take account of many complications, such as the physical limitations of devices and the imprecision of models. Engineers have developed numerous ways to cope with these complications [9,7]. One of them, developed in the 1980s, is H ∞ control theory [8]. It is a wide-ranging theory, that makes pleasing contact with some problems and results of classical analysis; a seminal role is played by Nehari's theorem on the best approximation of a bounded function on the circle by an analytic function in the disc. Also important in the development of the theory was a series of deep papers by Adamyan, Arov and Krein [1], [2] which greatly extend Nehari's theorem and which apply to matrix-valued functions.
In this context the notion of a superoptimal analytic approximation arose very naturally. Simple diagonal examples of a 2 × 2-matrix-valued function G on T show that the set of best analytic approximants to G in the L ∞ norm typically comprises an entire infinitedimensional ball of functions, and so one is driven to ask for a stronger optimality criterion, and preferably one which will provide a unique optimum. The very term "superoptimal" was coined by engineers even before its existence had been proved in generality. The paper [24] proved that the superoptimal approximant does indeed exist, and moreover is unique, as long as the approximand G is the sum of a continuous function and an H ∞ function on the circle. In engineering examples G is usually rational and so continuous on the circle.
Let us first provide some preliminary definitions and then formulate the problem. Throughout the dissertation, C m×n denotes the space of m × n complex matrices with the operator norm and D, T denote the unit disc and the unit circle respectively.
L ∞ (T, C m×n ) is the space of essentially bounded Lebesgue measurable matrix-valued functions on the unit circle with essential supremum norm Also, C(T, C m×n ) is the space of continuous matrix-valued functions from T to C m×n . Naturally engineers need to be able to compute the superoptimal approximant of G.
Problem 1.2 (The superoptimal analytic approximation problem). Given a function G ∈ L ∞ (T, C m×n ), find a function Q ∈ H ∞ (D, C m×n ) such that the sequence s ∞ (G − Q) is minimized with respect to the lexicographic ordering.
In general, the superoptimal analytic approximant may not be unique. However, it has been proved that if the given function G belongs to H ∞ (D, C m×n ) + C(T, C m×n ), then Problem 1.2 has a unique solution. The following theorem, which was proved by V.V. Peller and N.J. Young in [24], asserts what we have just stated. The existence proof in [24] can in principle be turned into an algorithm, but into a very computationally intensive one. The construction is recursive, and at each step of the recursion one must augment a column-matrix function to a unitary matrix-valued function on the circle with some special properties. Computationally this step requires a spectral factorization of a positive semi-definite matrix-valued function on the circle. There are indeed algorithms for this step, but they involve an iteration which may be slow to converge and badly conditioned, especially if some function values have eigenvalues on or close to the unit circle.
It is certainly desirable to avoid the matricial spectral factorization step if it is possible to do so. Our aim in this project was to devise an algorithm in which the iterative procedures are as few and as well-conditioned as possible. Iteration cannot be completely avoided; even in the scalar case, optimal error is the norm of a certain operator, and the best approximant is given by a simple formula involving the corresponding Schmidt vectors. Thus one has to perform a singular value decomposition. In the case that the approximand G is of type m × n one must expect to solve min(m, n) successive singular value problems. However, from the point of view of numerical linear algebra, singular value decomposition is regarded as a fast, accurate and well-behaved operation. In this paper we describe an algorithm that is, in a sense, parallel to the construction of [26] and that in addition to the spectral factorisation of scalar functions, requires only rational arithmetic and singularvalue decompositions. Several engineers have developed alternative approaches [14], [28] based on state-space methods. These too are computationally intensive.
We believe that the present method, which makes use of exterior powers of Hilbert spaces and operators, provides a conceptual approach to the construction of superoptimal approximants which is a promising basis for computation. The theoretical justification of the algorithm we present in this paper is lengthy and elaborate. However, the implementation of the algorithm should be straightforward. It will be very interesting to see whether it leads to an efficient numerical method in the future.
Here is our algorithm. Full notation and details will follow in Section 4.2. This algorithm provides a solution AG to Problem 1.2. By computing the value of each t k at every step, we obtain each term s ∞ k (G− AG) of the sequence s ∞ (G− AG). First we need the notion of a Hankel operator and the definitions of some long-established standard function spaces; for a more detailed account of these spaces see [15,Chapter V].
the left hand side of this inequality defining a norm on H 2 (D, E), with respect to which H 2 (D, E) is a Hilbert space. If E is separable then every function f ∈ H 2 (D, E) has a radial limit at almost every point of T, by a theorem of Fatou [15,Chapter V], and the map that takes a function f ∈ H 2 (D, E) to its radial limit function embeds H 2 (D, E) isometrically in L 2 (T, E). In this paper we shall only envisage the case that E is separable, and so we can always regard H 2 (D, E) as a closed subspace of L 2 (T, E). The operators P + , P − on L 2 (T, E) are the operators of orthogonal projection onto the closed subspaces H 2 (D, E) and H 2 (D, E) ⊥ of L 2 (T, E).
The following lemma is elementary.
Lemma 1.8. Let T ∈ L(H, K) be a compact operator and let x ∈ H, y ∈ K be such that (x, y) is a Schmidt pair for T corresponding to s = T . Then x is a maximizing vector for T, y is a maximizing vector for T * , and x H = y K . is called: (i) inner if Θ(e it ) is an isometry from C n to C m for almost every e it on T; (ii) outer if ΘH 2 (D, C n ) = {Θf : f ∈ H 2 (D, C n )} is dense in H 2 (D, C m ) with respect to the L 2 (T, C m ) norm; (iii) co-outer if its transpose Θ T is outer.
The following is a brief summary of our algorithm. A full account of all the steps, with definitions and justifications will be given in Section 4.
Algorithm: For a given G ∈ H ∞ (D, C m×n ) + C(T, C m×n ), the superoptimal analytic approximant AG ∈ H ∞ (D, C m×n ) can be constructed as follows. i) Step 0. Let T 0 = H G be the Hankel operator with symbol G. Let t 0 = H G . If t 0 = 0, then H G = 0, which implies G ∈ H ∞ (D, C m×n ). In this case, the algorithm terminates, we define r to be zero and the superoptimal approximant AG is given by AG = G. Let t 0 = 0. The Hankel operator H G is a compact operator and so there exists a Schmidt pair (x 0 , y 0 ) corresponding to the singular value t 0 = H G of H G . By the definition of a Schmidt pair (x 0 , y 0 ), x 0 ∈ H 2 (D, C n ), y 0 ∈ H 2 (D, C m ) ⊥ are non-zero vector-valued functions such that H G x 0 = t 0 y 0 , H * G y 0 = t 0 x 0 . The functions x 0 ∈ H 2 (D, C n ) andzȳ 0 ∈ H 2 (D, C m ) admit the inner-outer factorizations for some scalar outer factor h 0 ∈ H 2 (D, C) and column matrix inner functions ξ 0 ∈ H ∞ (D, C n ), η 0 ∈ H ∞ (D, C m ). Then, x 0 (z) C n = |h 0 (z)| = y 0 (z) C m almost everywhere on T. (1.2) We write equations (1.1) as Then ξ 0 (z) C n = 1 = η 0 (z) C m almost everywhere on T.
(1.4) There exists a function Q 1 ∈ H ∞ (D, C m×n ) which is at minimal distance from G; any such function satisfies (1.7) X 1 is a closed linear subspace of H 2 (D, ∧ 2 C n ). Y 1 is a closed linear subspace of H 2 (D, ∧ 2 C m ) ⊥ . Choose any function Q 1 ∈ H ∞ (D, C m×n ) which satisfies the equations (1.5). Define the operator for all x ∈ H 2 (D, C n ), (1.8) where P Y 1 is the projection from L 2 (T, ∧ 2 C m ) on Y 1 . We show that T 1 is well-defined.
Definition 1. 10. Let E be a Hilbert space. We say that a collection {γ j } of elements of L 2 (T, E) is pointwise orthonormal on T if, for almost all z ∈ T with respect to Lebesgue measure, the collection of vectors {γ j (z)} is orthonormal in E.
Theorem 1.11. Let G ∈ H ∞ (D, C m×n ) + C(T, C m×n ). Let T i , x i , y i , h i , for i ≥ 0, be defined by the algorithm above. Let r be the least index j ≥ 0 such that T j = 0. Then r ≤ min(m, n) and the superoptimal approximant AG is given by the formula Wedge products, and in particular pointwise wedge products, along with their properties are studied in detail in Section 3.

History and recent work
The Nehari problem of approximating an essentially bounded Lebesgue measurable function on the unit circle T by a bounded analytic function on the unit disk D, has been attracting the interest of both pure mathematicians an engineers since the middle of the 20th century. The problem was first formulated and studied from the viewpoint of scalarvalued functions, and, in the years that followed, from the operator-valued perspective also, which motivated research into the superoptimal approximation problem.
The Nehari problem in the scalar case first appeared in the paper of Nehari [16]. Given an essentially bounded complex valued function g on T, one seeks its distance from H ∞ with respect to the essential supremum norm, and wishes to determine for which elements of H ∞ this distance is attained. It is also of interest to know whether the distance is attained at a uniquely determined function. Such problems have been studied in detail by Nehari [16], Sarason [31] and Adamjan, Arov and Krein in [1] and [2]. These authors proved that the distance of g from H ∞ is equal to the norm of the Hankel operator H g with symbol g. Moreover, if H g has a maximizing vector in H 2 , then the bounded analytic complex-valued function q that minimizes the essential supremum norm g − q L ∞ is uniquely determined and can be explicitly calculated (see, for example, [35, p. 196]). Furthermore, if the essential norm H g e is less than H g , or if H g has a maximizing vector, then g has a unique best approximant.
Pure mathematicians and engineers started seeking analogues of those results for matrixand operator-valued functions. These generalizations are not only mathematically interesting, but are essential for applications in engineering, and especially in control theory.
There has accordingly been an explosion of research in this field since 1980, on the part of both pure mathematicians and engineers.
Page [17] and Treil [32] gave various extensions of the results of Adamjan, Arov and Krein to operator-valued functions. Page proved that for operator-valued mappings T ∈ Hilbert spaces and L(E 1 , E 2 ) denotes the Banach space of bounded linear operators from E 1 to E 2 . Treil extended the Adamjan, Arov and Krein theorem in [2] to an operatorvalued analogue.
However, in the matrix-valued setting there are typically infinitely many functions that minimize the L ∞ norm of the error function. This fact is simply illustrated by the following example. Let G(z) = diag{z, 0}, for z ∈ T. The norm of H G in this case is easily seen to be 1, and hence all matrix-valued functions Q ∈ H ∞ (D, C 2×2 ) of the form Q(z) = diag{0, q(z)}, where q ∈ H ∞ and q H ∞ ≤ 1, minimize the norm G − Q ∞ , yielding the error 1. However, if one goes on to minimize in turn the essential suprema of both singular values of G(z) − Q(z) over Q ∈ H ∞ (D, C 2×2 ), one finds that such a minimum occurs uniquely when q(z) is equal to 0. This type of example suggests that the enhanced approximation criterion based on successive singular values generates the "very best" amongst the best approximants to G by an element of H ∞ (D, C 2×2 ).
Such reflections led to the formulation of a strengthened approximation problem, the superoptimal approximation problem. For G ∈ L ∞ (T, C m×n ) one defines, for j = 0, 1, 2, . . . , and , where s j (G(z)) denotes the j-th singular value of the matrix G(z). In [34] N.J. Young introduced a stregthened notion of optimal analytic approximation, subsequently called superoptimal approximation. Given a G as above, find a Q ∈ H ∞ (D, C m×n ) such that the sequence s ∞ (G − Q) is lexicographically minimised. This criterion obviously constitutes a considerable strengthening of the notion of optimality, as one needs to determine a Q ∈ H ∞ (D, C m×n ) that not only minimizes G − Q L ∞ , but minimises the L ∞ norm of all the subsequent singular values s j (G(z) − Q(z)) for j ≥ 0.
A good starting point for the superoptimal approximation problem of matrix functions is [24]. As we have said, the problem is to find, for a given is lexicographically minimized. Peller and Young proved some requisite preparatory results on "thematic factorizations", on the analyticity of the minors of unitary completions of inner matrix columns and on the compactness of some Hankel-type operators with matrix symbols. These results provided the foundation for their main theorem, namely that if G belongs to H ∞ (D, C m×n ) + C(T, C m×n ), then there exists a unique Q ∈ H ∞ (D, C m×n ) such that the sequence s ∞ (G − Q) is lexicographically minimized as Q varies over H ∞ (D, C m×n ); moreover for this Q, the singular values s j (G(z) − Q(z) are constant almost everywhere for z ∈ T, for j = 0, 1, 2, . . . .
Later, in [26] Peller and Young presented a conceptual algorithm for the computation of the superoptimal approximant. Their algorithm is based on the theory developed in [24]. Also in [26], the algorithm was applied to a concrete example of a rational 2 × 2 matrixvalued function G in H ∞ (D, C 2×2 ) + C(T, C 2×2 ) and the superoptimal approximant AG was calculated by hand.
Additionally, Peller and Young in [27] studied superoptimal approximation by meromorphic matrix-valued functions, that is, matrix-valued functions in H ∞ that have at most k-poles for some prescribed integer k. They modified the results of [24] and established a uniqueness criterion in the case that the given matrix-valued function is in H ∞ + C and has at most k-poles. In addition, they provided an algorithm for the calculation of the superoptimal approximant.
One can extend the above results to operator-valued functions on the circle; the operatorvalued superoptimal approximation problem was studied by Peller in [22]. He generalized the notions of [24] and proved that there exists a unique superoptimal approximant in H ∞ (B) for functions that belong to H ∞ (B) + C(C), where B denotes the space of bounded linear operators and C(C) denotes the space of continuous functions on the circle taking values in the space of compact operators.
Very badly approximable functions, that is, functions that have the zero function as a superoptimal approximant, were studied in the years that followed and a considerable amount of work was published. Peller and Young's paper [24] provided the motivation for the study of this problem, where they were able to algebraically characterise the very badly approximable matrix functions of class H ∞ (D, C m×n ) + C(T, C m×n ). Their results were extended in [23] to the case of matrix functions G for which H G e is less than the smallest non-zero superoptimal singular value of G. Very badly approximable matrix functions with entries in H ∞ + C were completely characterised in [25].
Recent work in [4] by Baratchart, Nazarov and Peller explores the analytic approximation of matrix-valued functions in L p of the unit circle by matrix-valued functions from H p of the unit disk in the L p norm for p ≤ 2. They proved that if a given matrixvalued function Ψ ∈ L p (T, C m×n ) is a respectable matrix function, then its distance from H p (D, C m×n ) is equal to H Ψ , and they obtained a characterisation of that distance also in the case Ψ is a weird matrix-valued function. Furthermore, they established the notion of p-superoptimal approximation and illustrated the fact that every n × n rational matrix function has a unique p-superoptimal approximant for 2 ≤ p < ∞. For the case p = ∞ they provided a counterexample.
In a more recent paper of Condori [6], the author considered the relation between the sum of the superoptimal singular values of admissible functions in L ∞ (T, C m×n ) and the superoptimal analytic approximation problem in the space L ∞ (T, S m,n p ), where S m,n p denotes the space of m × n matrices endowed with the Schatten-von Neumann norm · S m,n p . He illustrated the fact that if Φ ∈ L ∞ (T, C n×n ) is an admissible matrix function of order k, then Q ∈ H ∞ (D, C n×n ) is a best approximant function under the L ∞ (T, (S n 1 ))norm and the singular values s j ((ϕ − Q)(z)) are constant almost everywhere on T for j = 0, 1, . . . , k − 1 if and only if Q is a superoptimal approximant to Φ, ess sup z∈T s j ((Φ − Q)(z)) = 0 for j ≥ k, and the sum of the superoptimal singular values of Φ is equal to where m, n > 1, 1 ≤ k ≤ min(m, n) and the supremum is taken over all Ψ ∈ H 1 0 (D, C n×m ) for which Ψ L 1 (T,C n×m ) ≤ 1 and rankΨ(ζ) ≤ k almost everywhere on T.

Exterior powers of Hilbert spaces and operators
In this section we recall the well-established notion of the wedge product of Hilbert spaces. It is surprisingly hard to find a description of this theory in standard text-books, and so we present a succinct account of the theory. We define an inner product on the p-fold wedge product of Hilbert spaces, introduce the notion of pointwise wedge product of matrix-valued functions on D, we study its properties and we formulate a concise theory specifically for multiplication, block diagonal and creation operators. Towards the end of the section, we examine in detail the characteristics of the pointwise orthogonal complement and pointwise linear span. Basic definitions and properties of exterior products can be found in a S. Winitzki's book [33].
3.1. Exterior powers. In this subsection, we first present some results concerning the action of permutation operators on tensors, then we recall the definition of antisymmetric tensors and we define an inner product on the space of all antisymmetric tensors. In the following E denotes a Hilbert space. We shall assume known the notion of the algebraic tensor product of vector spaces, which is precisely explained in [11].
Definition 3.1. ⊗ p E is the algebraic p-fold tensor product of E and is spanned by tensors of the form Definition 3.2. An inner product on ⊗ p E is defined on elementary tensors by x 1 ⊗ x 2 ⊗ · · · ⊗ x p , y 1 ⊗ y 2 ⊗ · · · ⊗ y p ⊗ p E = p! x 1 , y 1 E · · · x p , y p E , for any x 1 , . . . , x p , y 1 , . . . , y p ∈ E, and is extended to ⊗ p E by sesqui-linearity.
Definition 3.4. Let S p denote the symmetric group on {1, . . . , p}, with the operation of composition. For σ ∈ S p we define for any x i j ∈ E and λ i ∈ C.
Remark 3.5. (S p , •) is a group so, for every permutation σ ∈ S p , there exists σ −1 ∈ S p such that σ • σ −1 = id = σ −1 • σ, where id ∈ S p is the identity map on {1, . . . , p}. Then, if ǫ σ denotes the signature of the permutation σ, Proposition 3.6. Let E be a Hilbert space, and let p be any positive integer. Then, for any σ ∈ S p , S σ is a linear operator on the normed space (⊗ p E, · ), which extends to an isometry S σ on (⊗ p H E, · ). Furthermore, S σ is a unitary operator on ⊗ p H E. Proof. It is easy to check that S σ is linear. For any elementary tensors w = x 1 ⊗ x 2 ⊗ · · · ⊗ x p , v = y 1 ⊗ . . . y p by the definition of the inner product on ⊗ p E, Hence S * σ = S σ −1 , and therefore S * σ S σ = S σ −1 S σ = I, the identity operator on ⊗ p E, and therefore S σ is an isometric linear self-map of ⊗ p E.
Definition 3.10. Let E be a Hilbert space. For x 1 , . . . , x p ∈ E, define x 1 ∧ x 2 ∧ · · · ∧ x p to be the orthogonal projection of the elementary tensor x 1 ⊗ x 2 ⊗ · · · ⊗ x p onto ∧ p E, that is Remark 3.11. For any tensor of the form the orthogonal projection of u onto ∧ p E is given by Theorem 3.12. For all u ∈ ⊗ p H E, Proof. Let u ∈ ⊗ p H E. Then, for any σ ∈ S p , u = ǫ σ S σ (u) + (u − ǫ σ S σ (u)), and so Since the first sum on the right hand side is clearly antisymmetric, it suffices to show that is orthogonal to the set of antisymmetric tensors, in other words, that if v ∈ ∧ p E then v, For every w = σ∈Sp ǫ σ S σ (u) ∈ ⊗ p H E and for every τ ∈ S p , we have Proof. By Theorem 3.12, we have Corollary 3.15. Let E be a Hilbert space. Suppose x, y ∈ E, and x,y are orthogonal in E, that is, x, y E = 0. Then Proof. By Proposition 3.14, If x is orthogonal to y in E, the off-diagonal entries are zero and thus Lemma 3.16. Suppose {u 1 , · · · , u n } is an orthonormal set in C n . Then for j = 1, . . . , n−1 and for every x ∈ C n , Proof. Let x ∈ C n . We may write By Proposition 3.14, By assumption, and hence If, for k = 1, · · · , j we multiply the k-th column of the determinant by u k , x C n and subtract it from the (j + 1)-th column, we find that x, u i C n u i 2 C n , the latter equality by Pythagoras' theorem.
Moreover, we define a norm on Definition 3.18. Let E be a Hilbert space. We define the multilinear operator and extend to E × · · · × E p−times by multilinearity.
Proposition 3.20. Let E be a Hilbert space. Then the multilinear mapping Observe that the matrix is Hermitian, and so its determinant is real. By Hadamard's inequality, Moreover, by the Cauchy-Schwartz inequality, , which proves that Λ is a bounded multilinear operator, hence is continuous.
Suppose E is a separable Hilbert space with an orthonormal basis (e i ). In what follows, we describe an orthonormal basis for the space ∧ p E.
Next we study some properties of wedge products of bounded linear operators.
is a bounded linear operator for i = 1, . . . , p. We define the operator on algebraic tensor products T 1 ⊗ · · · ⊗ T p : H 1 ⊗ · · · ⊗ H p → K 1 ⊗ · · · ⊗ K p on elementary tensors by and we extend T 1 ⊗ · · · ⊗ T p to H 1 ⊗ · · · ⊗ H p by linearity. Alternative notations for T 1 ⊗ · · · ⊗ T p are ⊗ p i=1 T i , and (in the case that T i = T for each i) ⊗ p T . Proposition 3.23. Let (H i , ·, · H i ) and (G i , ·, · G i ) be Hilbert spaces, and let T i : H i → G i be bounded linear operators for all i = 1, . . . , p. Then, the operator T 1 ⊗ · · · ⊗ T p above has a continuous extension Proofs of this fact are given in [ Then u = ǫ σ S σ u for all σ ∈ S p . Therefore, for any σ ∈ S p , Thus, for u ∈ ∧ p H, (⊗ p T )u is an antisymmetric tensor in ⊗ p H K, that is, a member of ∧ p K.
Proposition 3.27. Let E be a Hilbert space. Then the multilinear mapping Λ p : ⊗ p E → ∧ p E is continuous for the norms of E and ∧ p E.
Proof. It suffices to show that Λ p is bounded. Let x 1 , . . . , x p ∈ E and suppose x j ≤ 1 for j = 1, . . . , p. Then Now combine equation (3.1) with the inequality of Hadamard which states that the modulus of a determinant is no greater than the product of the norms of the columns. Since | x i , x j | ≤ 1 for each i, j, by the Cauchy-Schwarz inequality, each column of the determinant in equation (3.1) has norm at most p 1 2 . We therefore obtain Hence the p-linear operator Λ p is bounded. Thus Λ p is a continuous operator.
Lemma 3.28. Let H, K be Hilbert spaces and S, T : H → K be bounded linear operators. Then, for any positive integer p, Proof.

3.2.
Pointwise wedge products. For the purposes of this paper we need to consider the wedge product of mappings defined on the unit circle or in the unit disk that take values in Hilbert spaces. To this end we introduce a notion of pointwise wedge product and we study its properties.
ii) H p (D, E) to be the normed space of analytic E-valued maps f : D → E such that s inequality] Let f ∈ L p (T, C) and let g ∈ L q (T, C), where p, q > 1 are such that 1 Proposition 3.34. Let E be a Hilbert space and let 1 p Proof. By Proposition 3.14, for all z ∈ T, (x∧y)(z) 2 3) By Definition 3.32, Now by Hölder's inequality, (3.5) By inequalities (3.4) and (3.5), x∧y ∈ L 1 (T, ∧ 2 E) and the inequality (3.2) holds.
The proof is straightforward. It follows from Proposition 3.14 and Hadamard's inequality.
Remark 3.37. Let E be a finite-dimensional Hilbert space. For 1 ≤ p ≤ ∞, we will regard x ∈ H p (D, E) as a column-vector valued function on D or T and x * as the row-vector valued function, x * (z) = x(z) * , for all z ∈ D.
Example 3.38. If E = C n , and if for all z ∈ T, then x * (z) = x 1 (z) · · · x n (z) .
Definition 3.39. Let E be a Hilbert space. If x ∈ H 2 (D, E) and y ∈ H ∞ (D, E), then y * x ∈ L 2 (T, E) is given by (y * x)(z) = x(z), y(z) E almost everywhere on T.
Definition 3.43. Let E be a separable Hilbert space. Let ξ ∈ H ∞ (D, E). We define the pointwise creation operator exist almost everywhere on T and define functionsξ ∈ L ∞ (T, E) andf ∈ L 2 (T, E) respectively, which satisfy the relations Then the radial limits lim r→1 (ξ(re iθ ) ∧ f (re iθ )) exist for almost all e iθ ∈ T and define functions in L 2 (T, ∧ 2 E).
Proof. By Proposition 3.27, the bilinear operator Λ : E × E → ∧ 2 E is a continuous operator for the norms of E and ∧ 2 E. By Remark 3.44, the functions ξ ∈ H ∞ (D, E) and f ∈ H 2 (D, E) have radial limit functionsξ ∈ L ∞ (T, E) andf ∈ L 2 (T, E). Also, by Proposition 3.41, ξ∧f ∈ H 2 (D, ∧ 2 E). Hence lim r→1 ξ(re iθ ) ∧ f (re iθ ) −ξ(e iθ ) ∧f (e iθ ) ∧ 2 E = 0 almost everywhere on T and we conclude that This shows that the radial limits exist almost everywhere on T and, by Lemma 3.34, define functions in L 2 (T, ∧ 2 E). Hence one can consider (C ξ f )(z) = (ξ∧f )(z) to be defined for either all z ∈ D or for almost all z ∈ T.  In what follows we are going to use the same notation for both f andf .
Definition 3.48. Let E be a separable Hilbert space. Let F be a subspace of L 2 (T, E) and let X be a subset of L 2 (T, E). We define the pointwise linear span of X in F to be the set We define the pointwise orthogonal complement of X in F to be the set Our next aim is to show that POC(X, F ) is a closed subspace of F. We are going to need the following Lemmas.
Lemma 3.49. Let E be a Hilbert space and let x ∈ L 2 (T, E). The function ϕ : L 2 (T, E) → C given by Proof. Consider any function g 0 ∈ L 2 (T, E). Given ǫ > 0, we are looking for a δ > 0 such that Note that For each e iθ ∈ T, by the reverse triangle inequality, the integrand satisfies By the Cauchy-Schwarz inequality, For the given ǫ > 0, choose δ equal to ǫ x L 2 (T,E) + 1 , and consider g ∈ L 2 (T, E) such that By equations (3.10) and (3.11), Hence ϕ is a continuous function. Proof. V is a linear subspace of zH 2 (D, C m ) since for λ, µ ∈ C, ψ, k ∈ V and for almost all z ∈ T, Now suppose that the sequence of functions (g n ) ∞ n=1 in V converges to a function g. We need to show that g ∈ V. Since g n ∈ V for all n ∈ N, we have g n (z), η 0 (z) C m = 0 for almost all z ∈ T. (3.12) Consider the function ϕ : H 2 (D, C m ) → C given by Then, by equation (3.12), we have ϕ(g n ) = 1 2π 2π 0 | g n (e iθ ), η 0 (e iθ ) C m | dθ = 0 for almost all e iθ ∈ T.
Note that by Fatou's theorem, for each function f ∈ H 2 (D, C m ), the radial limit exists almost everywhere and defines a function in L 2 (T, C m ). This way, H 2 (D, C m ) can be identified with a closed subspace of L 2 (T, C m ). Hence by Lemma 3.49, ϕ is a continuous function on H 2 (D, C m ), and so lim n→∞ 1 2π for almost all e iθ ∈ T. Thus | g(e iθ ), η 0 (e iθ ) C m | = 0 for almost all e iθ ∈ T, and, hence, g ∈ V. We have proved that V is a closed subspace of zH 2 (D, C m ).

Superoptimal analytic approximation
In this section we present our main result, which is an algorithm for the superoptimal analytic approximation of a matrix-valued function on the circle. In Subsection 4.1 we recall certain known results and Peller and Young's algorithm (Theorem 4.19). In Sub-     In other words this class consists of functions on the circle which belong to H ∞ + C and have the property that their complex conjugates belong to H ∞ + C as well.
We shall also need the class of functions of vanishing mean oscillation, as described, for example, in [20, Appendix 2, Section 5].  It is therefore not surprising that the spaces QC and VMO are closely related. In fact   Next we describe some properties that a space X of equivalence classes of scalar functions on the circle may possess [24,Page 330]. Define the non-linear operator A = A (m,n) on the space of m × n functions G ∈ H ∞ (D, C m×n ) + C(T, C m×n ) by saying that A (m,n) G is the unique superoptimal approximation in H ∞ (D, C m×n ) to G.
We say that X is hereditary for A if, for every scalar function g ∈ X, the best analytic approximation Ag of g belongs to X.
(α1) X contains all polynomial functions and X ⊂ VMO; (α2) X is hereditary for A; (α3) if f ∈ X thenzf ∈ X and P + f ∈ X; (α4) if f, g ∈ X ∩ L ∞ then f g ∈ X ∩ L ∞ ; (α5) if f ∈ X ∩ H 2 and h ∈ H ∞ then Thf ∈ X ∩ H 2 . The relevance of these properties is contained in the following statement, which is [24,Lemma 5.3]. Recall that a function f ∈ L ∞ is said to be badly approximable if the best analytic approximant to f is the zero function. In view of Nehari's Theorem, f is badly approximable if and only if f ∞ = H f . Lemma 4.10. Let X satisfy (α1) to (α5)and let ϕ ∈ X be an n × 1 inner function. Let ϕ c be an n × (n − 1) function in H ∞ such that [ϕφ c ] is unitary-valued a.e. on T and has all its minors on the first column in H ∞ . Then ϕ c ∈ X.
for some scalar outer function h, some scalar inner ϕ, and column-matrix inner functions v 0 , w 0 . Moreover there exist unitary-valued functions V, W of types n × n, m × m respectively, of the form where α, β are inner, co-outer functions, quasi-continuous functions of types n × (n − 1), m × (m − 1) respectively, and all minors on the first columns of V, for some F ∈ H ∞ (D, C (m−1)×(n−1) )+C(T, C (m−1)×(n−1) ) and some quasi-continuous function u 0 given by with |u 0 (z)| = 1 almost everywhere on T.
In the statement of the lemma, in saying that an m × n matrix-valued function α is co-outer we mean that each column of α is in H ∞ m and α T H 2 m is dense in H 2 n . (In [15,Page 190], such a function α is said to be *-outer).
Proof of Lemma 4.11. We shall use a modified version of [24, Theorem 0.2].  Proof. By Nehari's Theorem, ϕ − Q L ∞ = t and, by hypothesis, , so that ϕ = Q and the statement of the theorem is trivially true. We may therefore assume t > 0. Thus H * ϕ H ϕ v = t 2 v, and so v is a maximising vector for H ϕ . We can assume that v is a unit vector in H 2 (D, C n ), and then w is a unit vector in H 2 (D, C m ) ⊥ and is a maximising vector for H * ϕ . We have The inequlities must hold with equality throughout, and therefore Again, the inequalities hold with equality throughout, and in particular First we construct V and W with the properties (4.1) to (4.4). By equation (4.6), v(z) = w(z) almost everywhere, and so the column-vector functions v,zw in H 2 have the same (scalar) outer factor h. This property yields the inner-outer factorizations Next we show that u 0 given by equation By Theorem 4.12, (G − Q)v = t 0 w and by the factorizations (4.2) we have and by equations (4.3) and (4.5) Because t 0 = H G , it follows that |u 0 | = 1 almost everywhere, and from Nehari's Theorem Hence which implies that u 0 is badly approximable. The (1, 1) entries of equation (4.4) are we have (assuming, as we may, that v and w are unit vectors), It follows that the inequalities hold with equality, and so Taking complex conjugates in the last equation we have Thus, by equation (4.2), Recall that u 0 =zφh/h, and sov To complete the proof of Lemma 4.11, all that remains is to show that α, β are quasicontinuous and F ∈ H ∞ + C. This will follow from Lemma 4.10 above.

Definition 4.13. We say that a unitary-matrix-valued function V is a thematic completion of a column-matrix inner function
is a unitary matrix for almost all z ∈ T and such that all minors on the first column of V are analytic.
Remark 4.14. By Theorem 1.1 of [24], every column-matrix inner function has a thematic completion. Thematic completions are not unique, for if V = v 0ᾱ is a thematic completion of v 0 , then so is v 0ᾱ U for any constant (n − 1)-square unitary matrix U. However, by Corollary 1.6 of [24], the thematic completion of v 0 is unique up to multiplication on the right by a constant unitary matrix of the form diag{1, U } for some constant (n − 1)-square matrix U, and so it is permissible to speak of "the thematic completion of v 0 ". Furthermore, by Theorem 1.2 of [24], thematic completions have constant determinants almost everywhere on T, and hence α, β are inner matrix functions. Observe that, as we showed above, if the column v 0 belongs to VMO, then the thematic completion of v 0 is quasi-continuous. Similarly, if the column w 0 belongs to VMO, then the thematic completion of w 0 is quasi-continuous. Thus α, β are inner, co-outer, quasi-continuous functions of types n × (n − 1) and m × (m − 1) respectively. . Let m, n > 1, let G ∈ H ∞ (D, C m×n ) + C(T, C m×n ), let H G = t 0 and let Q 1 ∈ H ∞ (D, C m×n ) be at minimal distance from G, so that in the notation of Lemma 4.11, The latter equation combined with equation (4.9), yields which proves the inclusion. Conversely, suppose q ∈ H ∞ (D, C m−1×n−1 ) and By ( [24], Lemma 1.5), there exists a function Q ∈ H ∞ (D, C m×n ) such that Then Hence equality holds in equation (4.10).
is the scalar outer factor of x 0 ∈ H 2 (D, C n ), and let  Lemma 4.11, that is, 24], p. 337). Let α ∈ QC of type m × n, where m ≥ n, be inner and co-outer. There exists A ∈ H ∞ (D, C n×m ) such that Aα = I n . Here I n denotes the n × n identity matrix. Theorem 4.19 gives the algorithm for the superoptimal analytic approximant constructed in [26].

) respectively and all minors on the first columns of
where P N j is the orthogonal projection onto N j . If λ j = 0 set r = j and terminate the construction. Otherwise let χ j , ψ j be a Schmidt pair for Γ j corresponding to the singular value λ j . Let K j+1 be the range of the orthogonal projection of K j onto the pointwise orthogonal complement of χ 0 , · · · , χ j in L 2 (T, C n ). Let N j+1 be the projection of N j onto the pointwise orthogonal complement of ψ 0 , · · · , ψ j in L 2 (T, C m ). Let Q j+1 ∈ H ∞ (D, C m×n ) be chosen to satisfy, for 0 ≤ k ≤ j, (4.14) Then each Γ j is a compact operator, Q j with the above properties does exist, the construction terminates with r ≤ min(m, n) and We shall derive a similar formula for the superoptimal analytic approximant AG, by making use of exterior products of Hilbert spaces.

4.2.
Algorithm for superoptimal analytic approximation. In this section we consider the superoptimal analytic approximation problem for a function G ∈ H ∞ (D, C m×n )+ C(T, C m×n ). We first state the algorithm for the solution of Problem 1.2; later we shall prove the claims that are made in this description of the algorithm. We will assume here the result of Peller and Young [24] that Problem 1.2 has a unique solution (see Theorem 1.3). For convenience, we give citations of the steps in this paper where the corresponding claims are proved.
In this subsection we shall give a fuller and more precise statement of the algorithm for AG outlined in the Introduction, Section 1, in preparation for a subsequent formal proof of Theorem 10.12, which asserts that if entities r, t i , x i , y i , h i for i = 0, . . . , r − 1, are generated by the algorithm, then the superoptimal approximant is given by equation The proof will be by induction on r, which is the least index j ≥ 0 such that T j = 0, where T 0 = H G , T 1 , T 2 , . . . is a sequence of operators recursively generated by the algorithm.
Step 0. Let t 0 = H G . If t 0 = 0, then H G = 0, which implies G ∈ H ∞ (D, C m×n ). In this case, the algorithm terminates, we define r to be zero and the superoptimal approximant AG is given by AG = G, in agreement with the formula contained in the statement of Theorem 10.12 (since the sum on the right hand side of equation (4.16) is empty, and therefore by convention is interpreted as being zero). Otherwise, t 0 > 0. By Theorem 4.2 and Lemma 4.11, H G is a compact operator and so there exists a Schmidt pair (x 0 , y 0 ) corresponding to the singular value t 0 of H G . By the definition of the Schmidt pair (x 0 , y 0 ) corresponding to t 0 for the Hankel operator are non-zero vector-valued functions such that for some scalar outer factor h 0 ∈ H 2 (D, C) and column matrix inner . Then We write equations (4.17) as By equations (4.18) and (4.19), Step 1. Let and thereforeη where P Y 1 is the projection from L 2 (T, ∧ 2 C m ) on Y 1 . By Corollary 7.2 and Proposition 8.1, T 1 is well-defined. If T 1 = 0, then the algorithm terminates, we define r to be 1 and, in agreement with Theorem 10.12, the superoptimal approximant AG is given by the formula

and the solution is
is a Schmidt pair for T 1 corresponding to t 1 . Let h 1 be the scalar outer factor of ξ 0∧ v 1 and let where I C n and I C m are the identity operators in C n and C m respectively. Then, by Proposition 5.1, (4.27) Define By equations (4.26) and (4.28), ξ 1 (z) C n = 1 = η 1 (z) C n almost everywhere on T.
Step 2. Define Note that, by Proposition 6.2, X 2 is a closed linear subspace of H 2 (D, ∧ 3 C n ), and, by Proposition 7.3, Y 2 is a closed linear subspace of H 2 (D, ∧ 3 C m ) ⊥ . Now consider the operator T 2 : X 2 → Y 2 given by where P Y 2 is the projection from L 2 (T, C m ) on Y 2 . By Corollary 7.4 and Proposition 8.1, T 2 is well defined, that is, it does not depend on the choice of Q 2 ∈ H ∞ (D, C m×n ) satisfying equations (4.27). If T 2 = 0, then the algorithm terminates, we define r to be 2 and, according to Theorem 10.12, the superoptimal approximant AG is given by the formula If T 2 = 0, then let t 2 = T 2 . By Theorem 9.25, T 2 is a compact operator and hence there is a Schmidt pair for T 2 corresponding to T 2 = t 2 .
Recursive step. Suppose we have constructed (4.36) Note that, by Proposition 6.1, X j+1 is a subset of H 2 (D, ∧ j+2 C n ), and, by Proposition If T j+1 = 0, then the algorithm terminates, we define r to be j + 1, and, according to Theorem 10.12, the superoptimal approximant AG is given by the formula Otherwise, we define t j+1 = T j+1 > 0. By Theorem 9.1, T j+1 is a compact operator and hence there exist v j+1 ∈ H 2 (D, is a Schmidt pair for T j+1 corresponding to the singular value t j+1 .
This completes the recursive step. The algorithm terminates after at most min(m, n) steps, so that r ≤ min(m, n) and, in accordance with Theorem 10.12 the superoptimal approximant AG is given by the formula almost everywhere on T These orthonormality properties will be needed for the justification of the main algorithm.
are orthonormal in C n and C m respectively for almost every z ∈ T.
Since the function G belongs to H ∞ (D, C m×n ) + C(T, C m×n ), by Hartman's theorem, the Hankel operator with symbol G, denoted by H G , is a compact operator, and so there exist functions x 0 ∈ H 2 (D, C n ), y 0 ∈ H 2 (D, C m ) ⊥ such that (x 0 , y 0 ) is a Schmidt pair corresponding to the singular value t 0 = H G = 0. By Lemma 4.11, x 0 ,zȳ 0 admit the inner-outer factorizations for column matrix inner functions ξ 0 ∈ H ∞ (D, C n ), η 0 ∈ H ∞ (D, C m ) and some scalar outer factor h 0 ∈ H 2 (D, C). By Theorem 4.12, x 0 (z) C n = |h 0 (z)| = y 0 (z) C m almost everywhere on T. (5.1) Hence (iii) of Proposition 5.1 holds for {ξ i (z)} j i=0 in the case that j = 0. Let T 1 be given by equation (4.24). By the hypothesis (4.33), T 1 is a compact operator, and if T 1 = 0, then there exist v 1 ∈ H 2 (D, C n ) and Let h 1 be the scalar outer factor of ξ 0∧ v 1 . We define and Then, for z ∈ D, Note that by equation (5.2), almost everywhere on T. Note that, by equation (5.3), for almost every z ∈ T, the last equality following from the pointwise linear dependence of the vectors ξ 0 and ξ 0 ξ * 0 v 1 on T. Moreover, since h 1 is the scalar outer factor of ξ 0∧ v 1 , for almost every z ∈ T, we have By Lemma 3.16, Hence, for almost every z ∈ T, and thus Consequently, {ξ 0 (z), ξ 1 (z)} is an orthonormal set in C n for almost every z ∈ T. Hence (iii) of Proposition 5.1 holds for {ξ i (z)} j i=0 in the case that j = 1. Recursive step: Suppose the entities in equations (4.33) have been constructed and have the stated properties. Since by the inductive hypothesis T j is a compact operator, is a Schmidt pair for T j corresponding to T j = t j . By Proposition 6.1, ξ 0∧ · · ·∧v j is an element of H 2 (D, ∧ j+1 ). Let h j be the scalar outer factor of ξ 1∧ ξ 2∧ · · ·∧ξ j−1∧ v j . We define and so, for i = 0, . . . , j − 1, almost everywhere on T. Note that by the inductive hypothesis, for i, k = 0, 1, · · · , j − 1 and for almost all z ∈ T, Thus, for i = 0, . . . , j − 1, almost everywhere on T, and hence, by induction on j and for all integers j = 0, . . . , r − 1, {ξ 0 (z), · · · , ξ j−1 (z), ξ j (z)} is an orthogonal set in C n for almost all z ∈ T. Let us show that almost everywhere on T. Notice that, for i = 0, · · · , j−1, the vectors ξ i (z) and ξ i (z) v j (z), ξ i (z) C n are pointwise linearly dependent in C n almost everywhere on T. Thus for i = 0, · · · , j − 1, Next, we shall show that ξ j (z) C n = 1 for almost all z ∈ T. Recall that h j is the scalar outer factor of ξ 1∧ ξ 2∧ · · ·∧ξ j−1∧ v j , and therefore By the inductive hypothesis, {ξ 0 (z), · · · , ξ j−1 (z)} is an orthonormal set in C n for almost all z ∈ T, hence, by Lemma 3.16, almost everywhere on T, and hence, by induction on j, {ξ 0 (z), · · · , ξ j−1 (z), ξ j (z)} is an orthonormal set in C n for almost all z ∈ T, and for all integers j = 0, . . . , r − 1.
Next, we will prove inductively that the set Let T 1 be given by equation (4.24). T 1 is assumed to be a compact operator, and if Suppose h 1 is the scalar outer factor of ξ 0∧ v 1 . Let and let almost everywhere on T. Then,η By equation (5.11), η 0 (z) C m = 1 almost everywhere on T. Hence = 0 almost everywhere on T.
Recall that h 1 is the scalar outer factor of ξ 0∧ v 1 . By equation (5.6) and Proposition 9.14, Consequently, {η 0 (z),η 1 (z)} is an orthonormal set in C m for almost every z ∈ T. Hence (iii) of Proposition 5.1 holds for {η i } j i=0 in the case j = 1. Recursive step: Suppose the entities in equations (4.33) have been constructed and have the stated properties. Since by the inductive hypothesis T j is a compact operator, is a Schmidt pair for T j corresponding to T j = t j . By Proposition 6.1, Let h j be the scalar outer factor of ξ 0∧ ξ 1∧ · · ·∧ξ j−1∧ v j . We define Let us show that {η 0 (z), . . . ,η j (z)} is an orthonormal set in C m almost everywhere on T.
To complete the proof, we have to prove that η j (z) C m = 1 for almost all z ∈ T. Recall that h j is the scalar outer factor of ξ 0∧ ξ 1∧ · · ·∧ξ j−1∧ v j . By Proposition 10.10, almost everywhere on T, and hence, {η 0 (z), . . . ,η j (z)} is an orthonormal set in C m almost everywhere on T.
, and let j ≤ n − 1. Let the vectorvalued functions ξ 0 , ξ 1 , · · · , ξ j be constructed after applying steps 0, . . . , j of the algorithm above and be given by equations ( 4.41). Then is a Schmidt pair for the Hankel operator H G corresponding to the singular value H G . By Lemma 4.11, x 0 , y 0 admit the inner-outer factorizations Let us now consider the case where j = 1. By definition, and, by the inductive hypothesis, T 1 : X 1 → Y 1 given by equation (4.24) is a compact operator. Suppose T 1 = 0 and let (ξ 0∧ v 1 ,η 0∧ w 1 ) be a Schmidt pair corresponding to We define Note that ξ 0 and ξ 0 ξ * 0 v 1 are pointwise linearly dependent on D, since ξ * 0 v 1 is a mapping from D to C. Thus, for all x ∈ H 2 (D, C n ) and z ∈ D, we have and by substituting the value of x 1 , we get is analytic on D. By Proposition 3.42, since ξ 0 and ξ 1 are pointwise orthogonal on T, Hence, Recursive step: suppose we have constructed vector-valued functions ξ 0 , . . . , ξ j−1 , η 0 , . . . , η j−1 , spaces X j , Y j and a compact operator T j : X j → Y j after applying steps 0, . . . , j of the algorithm from Subsection 3.2.1 satisfying is a Schmidt pair for T j corresponding to T j . Define Then, for all x ∈ H 2 (D, C n ) and all z ∈ D, Recall that, for i = 0, . . . , j − 1, by the algorithm from Subsection 3.2.1, By equation (6.3), for all z ∈ D, .
is analytic on D. By Proposition 3.42, since ξ 0 , ξ 1 , . . . , ξ j are pointwise orthogonal on T, Thus, for every x ∈ H 2 (D, C n ), and the claim has been proved.
Proposition 6.2. In the notation of Propostition 6.1, Consider a vector-valued function w ∈ H 2 (D, C n ). For all z ∈ D, we may write w as Then, for all w ∈ H 2 (D, C n ) and for all z ∈ D, C m = 1 for almost every e iθ ∈ T. Therefore, for any w ∈ Ξ 0 , we have , since w is pointwise orthogonal to ξ 0 almost everywhere on T. Thus the mapping We may write ψ as Then, for all ψ ∈ H 2 (D, C n ) and for almost all z ∈ T, The reverse inclusion holds by the definition of Ξ j , hence ξ 0∧ · · ·∧ξ j∧ H 2 (D, C n ) = ξ 0∧ · · ·∧ξ j∧ Ξ j . Consequently, in order to prove the proposition it suffices to show that ξ 0∧ · · ·∧ξ j∧ Ξ j is a closed subspace of H 2 (D, ∧ j+2 C n ). By Corollary 3.50, Ξ j is a closed subspace of H 2 (D, C n ), being a finite intersection of closed subspaces. For any f ∈ Ξ j , we get Note that f and ξ i are pointwise orthogonal almost everywhere on T, and, by Proposition 5.1, {ξ 0 (z), . . . , ξ j (z)} is an orthonormal set for almost every z ∈ T. Hence is a surjective mapping, thus Ξ j and ξ 0∧ · · ·∧ξ j∧ Ξ j are isometrically isomorphic. Therefore, since Ξ j is a closed subspace of H 2 (D, C n ), the space ξ 0∧ · · ·∧ξ j∧ Ξ j is a closed subspace of H 2 (D, ∧ j+2 C n ). Hence Proof. As in Proposition 6.1, one can show that By virtue of the fact that complex conjugation is a unitary operator on L 2 (T, C m ), an equivalent statement is that η 0∧ zH 2 (D, C m ) is a closed subspace of zH 2 (D, for almost all z ∈ T} be the pointwise orthogonal complement of η 0 in zH 2 (D, C m ). Consider g ∈ zH 2 (D, C m ). We may write g as for every z ∈ D. Then, for all g ∈ zH 2 (D, C m ) and for all z ∈ D, The reverse inclusion is obvious, hence To prove the proposition, it suffices to show that Consider the mapping Notice that, by Proposition 5.1, η 0 (e iθ ) 2 C m = 1 for almost every e iθ ∈ T. Then, for any υ ∈ V, we have , since υ is pointwise orthogonal to η 0 almost everywhere on T. Thus the mapping C η 0 : V → η 0∧ V is an isometry. Note that by Corollary 3.50, V is a closed subspace of zH 2 (D, C m ). Furthermore, is a surjective mapping, thus V and η 0∧ V are isometrically isomorphic. Therefore, since V is a closed subspace of zH 2 (D, C m ), the space η 0∧ V is a closed subspace of zH 2 (D, ∧ 2 C m ).
Proof. By Proposition 3.46, H 2 (D, ∧ 2 C m ) can be identified with a closed subspace of L 2 (T, ∧ 2 C m ), thus we have Now the assertion follows immediately from Proposition 7.1.
Recall that because of pointwise linear dependence of η i and η i η * izw j+1 on D. By Proposition 10.10, Observe that, by Proposition 3.35, for every x ∈ H 2 (D, C m ), is analytic on D. By Proposition 3.42, for all x ∈ H 2 (D, C m ), since η 0 , · · · , η j are pointwise orthogonal on T, Hence, for every x ∈ H 2 (D, C m ), Taking complex conjugates, we infer that Let us prove that Y j+1 is a closed linear subspace of H 2 (D, ∧ j+2 C m ) ⊥ . Since complex conjugation is a unitary operator on L 2 (T, C m ), an equivalent statement to the above is that be the pointwise orthogonal complement of η 0 , · · · , η j in zH 2 (D, C m ). Consider f ∈ zH 2 (D, C m ). We may write f as Then, for all f ∈ zH 2 (D, C m ) and for almost all z ∈ T, .
The reverse inclusion holds by the definition of V j , hence Consequently, in order to prove the proposition it suffices to show that η 0∧ η 1∧ · · ·∧η j∧ V j is a closed subspace of zH 2 (D, ∧ j+2 C m ). By Corollary 3.50, V j is a closed subspace of zH 2 (D, C m ), being a finite intersection of closed subspaces. For any f ∈ V j , we get Note that f and η i are pointwise orthogonal almost everywhere on T and, by Proposition 5.1, {η 0 (z), . . . , η j (z)} is an orthonormal set for almost every z ∈ T. Hence is a surjective mapping, thus V j and η 0∧ · · ·∧η j∧ V j are isometrically isomorphic. Therefore, since V j is a closed subspace of zH 2 (D, C m ), the space η 0∧ · · ·∧η j∧ V j is a closed subspace of zH 2 (D, ∧ j+2 C m ). Hencē Corollary 7.4. Let 0 ≤ j ≤ m − 2. The orthogonal projection Proof. By Proposition 3.46, H 2 (D, ∧ j+2 C m ) can be identified with a closed subspace of L 2 (T, ∧ j+2 C m ), thus we have Now the assertion follows immediately from Proposition 7.3.
8. T j is a well-defined operator Let the functions ξ i , η i be defined by equations ( 4.41), that is, for i = 0, · · · , j and let Then, the operators T i : X i → Y i , i = 0, · · · , j, given by are well-defined and are independent of the choice of Q i ∈ H ∞ (D, C m×n ) satisfying equations (8.2).
Proof. By Corollary 7.4, the projections P Y i are well-defined for all i = 0, · · · , j. Hence it suffices to show that, for all i = 0, 1, · · · , j, T i maps a zero from its domain to a zero in its range and that T i does not depend on the choice of Q i , which satisfies equations (8.2). For i = 0, the operator T 0 is the Hankel operator H G . If f 0 ≡ 0, then H G f 0 = 0 and, moreover, H G is independent of the choice of any Q ∈ H ∞ (D, C m×n ) as H G−Q = H G . Thus, T 0 is well-defined.
For i = 1, let (x 0 , y 0 ) be a Schmidt pair for the compact operator H G corresponding to t 0 = H G , where x 0 ∈ H 2 (D, C n ) and y 0 ∈ H 2 (D, C m ) ⊥ . By Lemma 4.11, x 0 ,zȳ 0 admit the inner-outer factorisations are inner vector-valued functions and h 0 ∈ H 2 (D, C) is scalar outer. The spaces X 1 and Y 1 are given by the formulas The operator T 1 : X 1 → Y 1 is given by for all x ∈ H 2 (D, C n ), where Q 1 ∈ H ∞ (D, C m×n ) satisfies equations (8.2).
Suppose that ξ 0∧ x = 0 for some x ∈ H 2 (D, C n ). Then x and ξ 0 are pointwise linearly dependent in C n on D. Therefore there exist non-zero maps β, λ : D → C such that for all z ∈ D. By assumption, Q 1 ∈ H ∞ (D, C m×n ) satisfies equations (8.2). Thus, for all z ∈ D, By equations (8.1) and (8.3), for all z ∈ D. By equations (8.4) and (8.5), we get, for all z ∈ D, Therefore, by equations (8.1), for all z ∈ D, Hence, by Definition 3.30,η 0 and (G − Q 1 )x are pointwise linearly dependent in C m on z ∈ D, and soη Consequently T 1 maps zeros to zeros.
Recall our initial assumption was that Q 1 , Q 2 satisfy equations (8.6) and (8.7), consequently, To conclude with, we have proved that, if Q 1 , Q 2 ∈ H ∞ (D, C m×n ) satisfy equations (8.6) and (8.7), then that is, T 1 is independent of the choice of Q 1 . Thus T 1 is a well-defined operator.
Recursive step: suppose that functions , spaces X i , Y i and compact operators T i : X i → Y i are constructed inductively by the algorithm for all i = 0, . . . , j.
Let us prove that T j : X j → Y j , given by equation (4.37), is well-defined for all 0 ≤ j ≤ min(m, n) − 2. Note, by Corollary 7.4, the projection P Y j is well-defined. We will prove that T j maps zeros from its domain to zeros to its range and T j is independent of the choice of Q j that satisfies equations (8.2).
is a Schmidt pair for T j corresponding to t j = T j . Suppose ξ 0∧ ξ 1∧ · · ·∧ξ j−1∧ x = 0. Then x(z) is pointwise linearly dependent on ξ 0 (z), ξ 1 (z), · · · , ξ j−1 (z) in C n for almost all z ∈ T. This means there exist maps λ i , ν : T → C , i = 0, · · · , j − 1 which are non-zero almost everywhere on T and are such that By Theorem 1.3, there exists a function Q j ∈ H ∞ (D, C m×n ) that lexicographically minimizes By equations (8.1), Then, for almost all z ∈ T, Therefore for all x ∈ H 2 (D, C n ), η 0 (z), · · · , η j−1 (z) and ((G − Q j )x)(z) are pointwise linearly dependent in C m almost everywhere on T. Hencē Consequently, T j maps a zero from its domain to a zero in its range.
For the operator T j to be well-defined, it remains to prove T j is independent of the choice of Q j ∈ H ∞ (D, C m×n ) which satisfies equations (8.9). Let Q 1 , Q 2 ∈ H ∞ (D, C m×n ) satisfy We would like to prove that, for all x ∈ H 2 (D, C n ), The latter equality holds if and only if, for all x ∈ H 2 (D, C n ), which is equivalent to the assertion thatη 0∧ · · ·∧η j−1∧ (Q 2 −Q 1 )x is orthogonal toη 0∧ · · ·∧η j−1∧ q for all x ∈ H 2 (D, C n ) and for all q ∈ H 2 (D, C m ) ⊥ . Equivalently η 0∧ · · ·∧η j−1∧ (Q 2 − Q 1 )x,η 0∧ · · ·∧η j−1∧ q L 2 (T,C m ) = 0 for all x ∈ H 2 (D, C n ) and for all q ∈ H 2 (C m ) ⊥ . Set Ax = (Q 2 − Q 1 )x, x ∈ H 2 (D, C n ). By Proposition 3.14, Notice that Ax and q are orthogonal in L 2 (T, C m ) and, by Proposition 5.1, {η i (z)} j−1 i=0 is an orthonormal sequence in C n almost everywhere on T. Also, for all i = 0, · · · , j − 1, by equations (8.10), and so T j is independent of the choice of Q j that satisfies equations (8.9). Thus we have proven that the operator T j is well-defined. 9. Compactness of the operators T 1 and T 2 Here we use notations from the algorithm of Section 4.2 to prove the compactness of the operator T j given by equation (4.37). The proof requires several steps. Let us first prove that the operator T 1 is compact.
Recall that, since $G\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})+C(\mathbb{T},\mathbb{C}^{m\times n})$, by Hartman's theorem the operator $T_0=H_G$ is compact, and hence there exist $x_0\in H^2(\mathbb{D},\mathbb{C}^n)$ and $y_0\in H^2(\mathbb{D},\mathbb{C}^m)^\perp$ such that $(x_0,y_0)$ is a Schmidt pair for $H_G$ corresponding to the singular value $t_0=\|H_G\|$.
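For intuition, here is a numerical sketch (ours, not taken from the paper) of how a Schmidt pair of a Hankel operator can be approximated in the scalar case: truncate the matrix $\big(\hat G(-(j+k+1))\big)_{j,k\ge0}$ of $H_G$ and take the leading singular vectors. The truncation order \texttt{N}, the grid size \texttt{M} and the sample symbol are illustrative assumptions.

\begin{verbatim}
import numpy as np

N = 64                                      # truncation order (assumption)
M = 8 * N                                   # grid size on the circle
theta = 2 * np.pi * np.arange(M) / M
z = np.exp(1j * theta)
G = np.conj(z) / (1.0 - 0.5 * np.conj(z))   # sample symbol with H_G nonzero
coeffs = np.fft.fft(G) / M                  # coeffs[m % M] ~ G_hat(m)
# Hankel matrix H[j, k] = G_hat(-(j + k + 1)):
H = np.array([[coeffs[-(j + k + 1) % M] for k in range(N)]
              for j in range(N)])
u, s, vh = np.linalg.svd(H)
t0 = s[0]                                   # estimate of t_0 = ||H_G||
x0_coeffs = vh[0].conj()                    # Taylor coefficients of x_0 in H^2
y0_coeffs = u[:, 0]                         # coefficients of y_0 in (H^2)-perp
print(t0)                                   # ~ 4/3 here: this Hankel matrix has rank one
\end{verbatim}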
By Lemma 4.11, $x_0$, $\bar z\bar y_0$ admit the inner-outer factorisations $x_0=\xi_0h_0$, $\bar z\bar y_0=\eta_0h_0$, and there exist unitary-valued functions
$$V_0=\begin{pmatrix}\xi_0 & \bar\alpha_0\end{pmatrix},\qquad W_0^T=\begin{pmatrix}\eta_0 & \bar\beta_0\end{pmatrix},$$
where $\alpha_0,\beta_0$ are inner, co-outer, quasi-continuous functions of types $n\times(n-1)$, $m\times(m-1)$ respectively, and all minors on the first columns of $V_0$, $W_0^T$ are in $H^\infty$. Furthermore, every $Q\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})$ at minimal distance from $G$ satisfies
$$G-Q=W_0^*\begin{pmatrix}t_0u_0 & 0\\ 0 & F_1\end{pmatrix}V_0^*$$
for some $F_1\in H^\infty(\mathbb{D},\mathbb{C}^{(m-1)\times(n-1)})+C(\mathbb{T},\mathbb{C}^{(m-1)\times(n-1)})$ and some quasi-continuous function $u_0$ with $|u_0(z)|=1$ almost everywhere on $\mathbb{T}$.
Recall the factorisations and spaces introduced above. Our first endeavour in this subsection is to prove the following theorem.

Theorem 9.1. In the notation above, let
$$K_1=\bar\alpha_0H^2(\mathbb{D},\mathbb{C}^{n-1}),\qquad L_1=\bar\beta_0H^2(\mathbb{D},\mathbb{C}^{m-1})^\perp,\tag{9.3}$$
and let the maps $U_1\colon H^2(\mathbb{D},\mathbb{C}^{n-1})\to K_1$, $U_2\colon H^2(\mathbb{D},\mathbb{C}^{m-1})^\perp\to L_1$ be given by $U_1\chi=\bar\alpha_0\chi$ and $U_2\psi=\bar\beta_0\psi$. The following diagram is commutative:
$$\begin{array}{ccccc}
H^2(\mathbb{D},\mathbb{C}^{n-1}) & \xrightarrow{\;U_1\;} & K_1 & \xrightarrow{\;\xi_0\wedge\cdot\;} & X_1\\[2pt]
\Big\downarrow H_{F_1} & & \Big\downarrow\Gamma_1 & & \Big\downarrow T_1\\[2pt]
H^2(\mathbb{D},\mathbb{C}^{m-1})^\perp & \xrightarrow{\;U_2\;} & L_1 & \xrightarrow{\;\bar\eta_0\wedge\cdot\;} & Y_1
\end{array}\tag{9.4}$$
In particular, $T_1$ is unitarily equivalent to the Hankel operator $H_{F_1}$, and is therefore compact. To prove Theorem 9.1, the following steps are needed.
Lemma 9.2. In the notation of Theorem 9.1, the Hankel operator $H_G$ has a maximizing vector $x_0$ of unit norm such that $\xi_0$, which is defined by $x_0=\xi_0h_0$, is a co-outer function.

Proof. Choose any maximizing vector $x_0$. By Lemma 4.11, $x_0$ has the inner-outer factorisation $x_0=\xi_0h_0$, where $h_0$ is a scalar outer factor. Then the closure of $\xi_0^TH^2(\mathbb{D},\mathbb{C}^n)$, denoted by $\operatorname{clos}(\xi_0^TH^2(\mathbb{D},\mathbb{C}^n))$, is a closed shift-invariant subspace of $H^2(\mathbb{D},\mathbb{C})$, so, by Beurling's theorem, $\operatorname{clos}(\xi_0^TH^2(\mathbb{D},\mathbb{C}^n))=\varphi H^2(\mathbb{D},\mathbb{C})$ for some scalar inner function $\varphi$. Hence $\bar\varphi\,\xi_0^TH^2(\mathbb{D},\mathbb{C}^n)\subseteq H^2(\mathbb{D},\mathbb{C})$. Thus, if $\xi_0^T=(\xi_{01},\dots,\xi_{0n})$, we have $\bar\varphi\,\xi_{0j}\in H^\infty(\mathbb{D},\mathbb{C})$ for $j=1,\dots,n$, and so $\bar\varphi\,\xi_0\in H^\infty(\mathbb{D},\mathbb{C}^n)$. Hence $\bar\varphi x_0=(\bar\varphi\xi_0)h_0\in H^2(\mathbb{D},\mathbb{C}^n)$ is a maximizing vector for $H_G$, and $\bar\varphi x_0$ is co-outer. Then $\bar\varphi x_0/\|x_0\|_{L^2}$ is a co-outer maximizing vector of unit norm for $H_G$.

Lemma 9.3. In the notation of Theorem 9.1, $\xi_0$ is quasi-continuous, and there exists a function $A\in H^\infty(\mathbb{D},\mathbb{C}^n)$ such that $A^T\xi_0=1$.

Proof. Let us first show that $\bar\xi_0\in H^\infty+C$. Let $Q$ be a best $H^\infty$ approximation to $G$. Then the function $Q$ satisfies the equations
$$(G-Q)x_0=t_0y_0,\qquad (G-Q)^*y_0=t_0x_0.\tag{9.1}$$
Taking complex conjugates in equations (9.1), we get $(G-Q)^T\bar y_0=t_0\bar x_0$, that is, $z(G-Q)^T\eta_0h_0=t_0\bar\xi_0\bar h_0$. Recall that, by equation (4.5) (with $\varphi=1$), $u_0=\bar z\bar h_0/h_0$. By Lemma 4.11, $u_0\in QC$, hence $u_0\in H^\infty+C$. Note that $\bar u_0=zh_0/\bar h_0$, and hence $\bar\xi_0=\tfrac{1}{t_0}\,\bar u_0\,(G-Q)^T\eta_0$. Since $H^\infty+C$ is an algebra and $\bar u_0$, $(G-Q)^T$ and $\eta_0$ all belong to $H^\infty+C$, it follows that $\bar\xi_0\in H^\infty+C$, thus $\xi_0\in QC$. The conclusion that there exists a function $A\in H^\infty(\mathbb{D},\mathbb{C}^n)$ such that $A^T\xi_0=1$ now follows directly from Lemma 4.18.
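The scalar outer factor figuring in these factorisations can be computed numerically from its modulus alone: the outer function $h$ with $|h|=w$ on $\mathbb{T}$ is the exponential of the analytic completion of $\log w$. The sketch below is ours (the helper \texttt{outer\_from\_modulus} is hypothetical) and assumes a smooth, strictly positive weight.

\begin{verbatim}
import numpy as np

def outer_from_modulus(w):
    # Boundary values of the outer function h with |h| = w on the circle:
    # take log w, keep the analytic (non-negative frequency) part with the
    # positive frequencies doubled, and exponentiate.
    logw = np.log(w)                    # requires w > 0 on the grid
    c = np.fft.fft(logw) / len(w)
    c[1:len(c) // 2] *= 2.0             # double positive frequencies
    c[len(c) // 2:] = 0.0               # discard negative frequencies
    return np.exp(np.fft.ifft(c) * len(w))

theta = 2 * np.pi * np.arange(256) / 256
w = 2.0 + np.cos(theta)                 # a sample positive weight
h = outer_from_modulus(w)
print(np.max(np.abs(np.abs(h) - w)))    # ~ 0: |h| matches w on T
\end{verbatim}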
Lemma 9.4. In the notation of Theorem 9.1, let $\xi_0\in H^\infty(\mathbb{D},\mathbb{C}^n)$ be a vector-valued inner, quasi-continuous function and let $V_0=\begin{pmatrix}\xi_0 & \bar\alpha_0\end{pmatrix}$ be a thematic completion of $\xi_0$ as described in Lemma 4.11, where $\alpha_0$ is an inner, co-outer, quasi-continuous function of order $n\times(n-1)$ and all minors on the first column of $V_0$ are analytic. Then $\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)=H^2(\mathbb{D},\mathbb{C}^{n-1})$.

Proof. Let $g\in H^2(\mathbb{D},\mathbb{C}^{n-1})$. Then there exists $x\in H^2(\mathbb{D},\mathbb{C}^n)$ such that $\alpha_0^Tx=g$, which implies that $g\in\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)$. Hence $H^2(\mathbb{D},\mathbb{C}^{n-1})\subseteq\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)$. For the reverse inclusion, note that, since $\alpha_0$ is co-outer, $\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)$ is dense in $H^2(\mathbb{D},\mathbb{C}^{n-1})$, and, since the entries of $\alpha_0$ lie in $H^\infty$, we have $\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)\subseteq H^2(\mathbb{D},\mathbb{C}^{n-1})$. Thus $\alpha_0^TH^2(\mathbb{D},\mathbb{C}^n)=H^2(\mathbb{D},\mathbb{C}^{n-1})$.
Proof. Let $g\in V_0^*\,\mathrm{POC}(\{\xi_0\},L^2(\mathbb{T},\mathbb{C}^n))$. Equivalently, $g$ can be written as $g=V_0^*f$ for some $f\in L^2(\mathbb{T},\mathbb{C}^n)$ such that $f(z)\perp\xi_0(z)$ for almost all $z\in\mathbb{T}$. This in turn is equivalent to the assertion that $g=V_0^*f$ for some $f\in L^2(\mathbb{T},\mathbb{C}^n)$ such that $(V_0^*f)(z)\perp(V_0^*\xi_0)(z)$ for almost all $z\in\mathbb{T}$, since $V_0(z)$ is unitary for almost all $z\in\mathbb{T}$. Note that, by the fact that $V_0$ is unitary-valued almost everywhere on $\mathbb{T}$, we get
$$(V_0^*\xi_0)(z)=\begin{pmatrix}1\\ 0_{(n-1)\times1}\end{pmatrix}\quad\text{almost everywhere on }\mathbb{T},\tag{9.5}$$
and so $g(z)\perp\begin{pmatrix}1\\ 0_{(n-1)\times1}\end{pmatrix}$ almost everywhere on $\mathbb{T}$, where $0_{(n-1)\times1}$ denotes the zero vector in $\mathbb{C}^{n-1}$.
The latter condition, holding for almost every $z\in\mathbb{T}$, is equivalent to the statement that $g\in L^2(\mathbb{T},\mathbb{C}^n)$ and the first component of $g(z)$ is $0$ for almost all $z\in\mathbb{T}$, or equivalently, $g\in\begin{pmatrix}0\\ L^2(\mathbb{T},\mathbb{C}^{n-1})\end{pmatrix}$.
(iii) We have to prove that diagram (9.4) commutes. Recall that, by Lemma 4.17, the left-hand square commutes, so it suffices to show that the right-hand square, namely
$$\begin{array}{ccc}
K_1 & \xrightarrow{\;\xi_0\wedge\cdot\;} & X_1\\[2pt]
\Big\downarrow\Gamma_1 & & \Big\downarrow T_1\\[2pt]
L_1 & \xrightarrow{\;\bar\eta_0\wedge\cdot\;} & Y_1
\end{array}\tag{9.9}$$
also commutes. That is, we would like to prove that, for all $x\in K_1$,
$$T_1(\xi_0\wedge x)=\bar\eta_0\wedge\Gamma_1x=\bar\eta_0\wedge P_{L_1}(G-Q_1)x.$$
Now, for $x\in K_1$, $T_1(\xi_0\wedge x)=P_{Y_1}\big(\bar\eta_0\wedge(G-Q_1)x\big)$. Hence, to prove the commutativity of diagram (9.9), it suffices to show that, for all $x\in K_1$,
$$P_{Y_1}\big(\bar\eta_0\wedge(G-Q_1)x\big)=\bar\eta_0\wedge P_{L_1}(G-Q_1)x,\tag{9.10}$$
and so, for all $x\in K_1$, $\bar\eta_0\wedge P_{L_1}(G-Q_1)x\in Y_1$. Let us show that equation (9.10) holds for $x\in K_1$ and for any $\tilde x\in H^2(\mathbb{D},\mathbb{C}^n)$ such that $\xi_0\wedge\tilde x=\xi_0\wedge x$. By Lemma 9.11, equation (9.10) is equivalent to equation (9.11) for any $x\in K_1$. By Lemma 9.7, equation (9.11) holds if and only if the function
$$z\mapsto\big[(G-Q_1)\tilde x\big](z)-\big\langle\big[(G-Q_1)\tilde x\big](z),\bar\eta_0(z)\big\rangle_{\mathbb{C}^m}\,\bar\eta_0(z)\tag{9.12}$$
belongs to $H^2(\mathbb{D},\mathbb{C}^m)$.
By Lemma 9.8, there exists a function $\psi\in L^2(\mathbb{T},\mathbb{C}^m)$ such that the function defined by equation (9.12) is equal to $\beta_0\beta_0^*\psi$. Hence, to prove that the function defined by equation (9.12) belongs to $H^2(\mathbb{D},\mathbb{C}^m)$, we have to show that $\beta_0\beta_0^*\psi\in H^2(\mathbb{D},\mathbb{C}^m)$. Since $\beta_0^*\psi\in H^2(\mathbb{D},\mathbb{C}^{m-1})$ and the entries of $\beta_0$ lie in $H^\infty$, we deduce that $\beta_0\beta_0^*\psi\in H^2(\mathbb{D},\mathbb{C}^m)$ as required, proving that diagram (9.9) commutes.
(v) Since diagram (9.4) is commutative and $U_1$, $U_2$, $(\xi_0\wedge\cdot)$ and $(\bar\eta_0\wedge\cdot)$ are unitaries, $T_1$ is unitarily equivalent to the Hankel operator $H_{F_1}$; since $F_1\in H^\infty+C$, the operator $H_{F_1}$ is compact by Hartman's theorem, and therefore $T_1$ is compact. In what follows we shall prove a statement analogous to Theorem 9.1 for $T_2$. To this end, we need the following results.
Lemma 9.12. In the notation of Theorem 9.1, suppose $v_1\in H^2(\mathbb{D},\mathbb{C}^n)$ and $w_1\in H^2(\mathbb{D},\mathbb{C}^m)^\perp$ are such that $(\xi_0\wedge v_1,\bar\eta_0\wedge w_1)$ is a Schmidt pair for the operator $T_1$ corresponding to $\|T_1\|$. Then (i) there exist $x_1\in K_1$ and $y_1\in L_1$ such that $(x_1,y_1)$ is a Schmidt pair for the operator $\Gamma_1$; (ii) the pair $(\xi_0\wedge x_1,\bar\eta_0\wedge y_1)$ is a Schmidt pair for $T_1$ corresponding to $\|T_1\|$, for any $x_1\in K_1$ and $y_1\in L_1$ such that the pair $(x_1,y_1)$ is a Schmidt pair for $\Gamma_1$ corresponding to $\|\Gamma_1\|$.
Lemma 9.13. Suppose $(\xi_0\wedge v_1,\bar\eta_0\wedge w_1)$ is a Schmidt pair for $T_1$ corresponding to $t_1$. Let $x_1\in K_1$, $y_1\in L_1$ be given by Lemma 9.12(i), and set $\hat x_1=U_1^*x_1$, $\hat y_1=U_2^*y_1$.

(ii) Recall that, by Lemma 4.17, the maps $U_1\colon H^2(\mathbb{D},\mathbb{C}^{n-1})\to K_1$ and $U_2\colon H^2(\mathbb{D},\mathbb{C}^{m-1})^\perp\to L_1$, given by $U_1\chi=\bar\alpha_0\chi$ and $U_2\psi=\bar\beta_0\psi$ for all $\chi\in H^2(\mathbb{D},\mathbb{C}^{n-1})$ and all $\psi\in H^2(\mathbb{D},\mathbb{C}^{m-1})^\perp$, are unitaries. By the commutativity of diagram (9.4), $\Gamma_1=U_2H_{F_1}U_1^*$. By Part (i), $x_1\in K_1$ and $y_1\in L_1$, and, by Proposition 5.1 and Lemma 9.12, $(x_1,y_1)$ is a Schmidt pair for the operator $\Gamma_1$ corresponding to $t_1=\|\Gamma_1\|$, that is, $\Gamma_1x_1=t_1y_1$ and $\Gamma_1^*y_1=t_1x_1$. To prove that the pair $(\hat x_1,\hat y_1)$ is a Schmidt pair for $H_{F_1}$ corresponding to $\|H_{F_1}\|=t_1$, we need to show that $H_{F_1}\hat x_1=t_1\hat y_1$ and $H_{F_1}^*\hat y_1=t_1\hat x_1$. By equations (9.20) and (9.17), we have $H_{F_1}\hat x_1=U_2^*\Gamma_1U_1\hat x_1=U_2^*\Gamma_1x_1=t_1U_2^*y_1=t_1\hat y_1$. Let us show that $H_{F_1}^*\hat y_1=t_1\hat x_1$. By equations (9.20) and (9.17), we have $H_{F_1}^*\hat y_1=U_1^*\Gamma_1^*U_2\hat y_1=U_1^*\Gamma_1^*y_1=t_1U_1^*x_1=t_1\hat x_1$. Therefore $(\hat x_1,\hat y_1)$ is a Schmidt pair for $H_{F_1}$ corresponding to $\|H_{F_1}\|=t_1$.
Moreover, $\langle\,\cdot\,,\varphi(e^{i\theta})\rangle_{\mathbb{C}^{n-2}}=0$ almost everywhere on $\mathbb{T}$, and the assertion follows. Let us now state certain identities that are useful for the next statements.
(iii) the following diagram commutes:
$$\begin{array}{ccccc}
H^2(\mathbb{D},\mathbb{C}^{n-2}) & \xrightarrow{\;M_{\bar\alpha_0\bar\alpha_1}\;} & K_2 & \xrightarrow{\;\xi_0\wedge\xi_1\wedge\cdot\;} & X_2\\[2pt]
\Big\downarrow H_{F_2} & & \Big\downarrow\Gamma_2 & & \Big\downarrow T_2\\[2pt]
H^2(\mathbb{D},\mathbb{C}^{m-2})^\perp & \xrightarrow{\;M_{\bar\beta_0\bar\beta_1}\;} & L_2 & \xrightarrow{\;\bar\eta_0\wedge\bar\eta_1\wedge\cdot\;} & Y_2
\end{array}\tag{9.48}$$
Proof. (i) follows from Lemma 4.16. (ii) follows from Propositions 9.21 and 9.23. (iii) By Proposition 8.1, $T_2$ is well defined and is independent of the choice of $Q_2\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})$ satisfying equations (9.47). We may choose a $Q_2$ which minimises $\big(s_0^\infty(G-Q),s_1^\infty(G-Q)\big)$, and which therefore satisfies equations (9.47). By Lemma 4.17 and Theorem 4.12, the left-hand squares of diagram (9.48) commute. Let us show that the right-hand square also commutes. A typical element of $K_2$ is of the form $\bar\alpha_0\bar\alpha_1x$, where $x\in H^2(\mathbb{D},\mathbb{C}^{n-2})$; by equation (9.46), its images under $(\xi_0\wedge\xi_1\wedge\cdot)$ and $T_2$ can be computed explicitly. Note that, by equation (9.28), every such $Q_2$ satisfies a factorisation of the form (9.49) for some $F_2\in H^\infty(\mathbb{D},\mathbb{C}^{(m-2)\times(n-2)})$. Hence, by equation (9.49), in order to prove the commutativity of diagram (9.48), we need to show that $\bar\eta_0\wedge\bar\eta_1\wedge P_{L_2^\perp}(\bar\beta_0\bar\beta_1F_2x)$ is orthogonal to $Y_2$, and this holds if and only if
$$\big\langle\bar\eta_0\wedge\bar\eta_1\wedge P_{L_2^\perp}(\bar\beta_0\bar\beta_1F_2x),\ \bar\eta_0\wedge\bar\eta_1\wedge g\big\rangle_{L^2(\mathbb{T},\wedge^3\mathbb{C}^m)}=0\quad\text{for every }g\in H^2(\mathbb{D},\mathbb{C}^m)^\perp.\tag{9.50}$$
By Proposition 9.24, there exists a $\Phi\in L^2(\mathbb{T},\mathbb{C}^m)$ in terms of which $P_{L_2^\perp}(\bar\beta_0\bar\beta_1F_2x)$ can be expressed. Then, by Proposition 3.14, assertion (9.50) is equivalent to the vanishing of an integral $\frac{1}{2\pi}\int_0^{2\pi}\det(\cdot)\,d\theta$ of a Gram determinant for every $g\in H^2(\mathbb{D},\mathbb{C}^m)^\perp$, which in turn, by Proposition 5.1, is equivalent to the assertion that
$$\frac{1}{2\pi}\int_0^{2\pi}\big\langle\Phi(e^{i\theta}),g(e^{i\theta})\big\rangle_{\mathbb{C}^m}\,d\theta=0\quad\text{for every }g\in H^2(\mathbb{D},\mathbb{C}^m)^\perp,$$
that is, the relevant function is orthogonal to $H^2(\mathbb{D},\mathbb{C}^m)^\perp$, which occurs if and only if it belongs to $H^2(\mathbb{D},\mathbb{C}^m)$. Observe that, since $\beta_0$ is inner, equation (9.42) applies; moreover, since $W_0$ is unitary-valued, we have $I_{\mathbb{C}^m}-\bar\eta_0\eta_0^T=\beta_0\beta_0^*$. Hence, to prove the commutativity of diagram (9.48), it suffices to show that $\beta_0\beta_1\beta_1^*\beta_0^*\Phi\in H^2(\mathbb{D},\mathbb{C}^m)$. Note that, by assertions (9.51), $\beta_1^*\beta_0^*\Phi\in H^2(\mathbb{D},\mathbb{C}^{m-2})$, and so $\beta_0\beta_1\beta_1^*\beta_0^*\Phi\in H^2(\mathbb{D},\mathbb{C}^m)$. Thus diagram (9.48) commutes.
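The finite-dimensional mechanism behind these wedge-product manipulations is the compound matrix: the $k$-th exterior power of a matrix acts on $\wedge^k\mathbb{C}^n$, and its norm is the product of the $k$ largest singular values, which is how exterior powers give access to the successive numbers $t_j$. The sketch below (ours; \texttt{compound} is a hypothetical helper) illustrates this for $k=2$.

\begin{verbatim}
import numpy as np
from itertools import combinations

def compound(A, k):
    # k-th compound matrix: entries are the k x k minors of A, indexed by
    # k-subsets of rows and columns; it represents A acting on wedges.
    rows = list(combinations(range(A.shape[0]), k))
    cols = list(combinations(range(A.shape[1]), k))
    return np.array([[np.linalg.det(A[np.ix_(r, c)]) for c in cols]
                     for r in rows])

A = np.random.default_rng(0).standard_normal((4, 4))
s2 = np.linalg.svd(compound(A, 2), compute_uv=False)
sa = np.linalg.svd(A, compute_uv=False)
print(np.isclose(s2[0], sa[0] * sa[1]))   # ||wedge^2 A|| = s_0(A) s_1(A)
\end{verbatim}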
Consider some $Q_3\in\Omega_1$, so that, according to equation (9.74), $Q_3$ satisfies equation (9.75). Observe that, rewriting equation (9.75) by means of the factorisations above, we obtain an equivalent pair of equations, (9.76) and (9.77). By Theorem 4.12 applied to $H_{F_2}$, if $(\hat x_2,\hat y_2)$ is a Schmidt pair for $H_{F_2}$ corresponding to $t_2=\|H_{F_2}\|$, then, for any $\hat Q_2$ which is at minimal distance from $F_2$, we have
$$(F_2-\hat Q_2)\hat x_2=t_2\hat y_2,\qquad (F_2-\hat Q_2)^*\hat y_2=t_2\hat x_2.$$
By equations (9.76) and (9.77), these relations yield equation (9.78). Recall that, by equations (9.59) and (9.63), $\hat x_2$ and $\hat y_2$ are expressed through $x_2$ and $y_2$ by means of the multiplication operators $M_{\bar\alpha_0\bar\alpha_1}$ and $M_{\bar\beta_0\bar\beta_1}$. Hence, by equation (9.78), we obtain equation (9.79). Since, by Theorem 9.25, $M_{\bar\beta_0\bar\beta_1}$ is unitary, the latter equation yields equation (9.80). Moreover, in view of equations (9.76), (9.77) and (9.80), equation (9.79) implies the companion relation for $\hat x_2$. By Theorem 9.25, $M_{\bar\alpha_0\bar\alpha_1}$ is unitary, hence the latter equation yields the second of the required equations, and therefore the assertion has been proved.

10. Compactness of the operator $T_{j+1}$
At this point the reader will be able to discern the method of proof of the compactness of the operators $T_1$ and $T_2$. We would like to apply a similar method to show that the operator $T_j$, as given in equation (4.37), is compact. Suppose we have applied steps $0,\dots,j$ of the superoptimal analytic approximation algorithm from Subsection 4.2 to $G$, and that we have constructed the functions $x_i$, $y_i$, $h_i$, $\xi_i$, $\eta_i$ for $i=0,\dots,j$. Let the spaces $X_j$, $Y_j$ be given by
$$X_j=\operatorname{clos}\big(\xi_0\wedge\cdots\wedge\xi_{j-1}\wedge H^2(\mathbb{D},\mathbb{C}^n)\big),\qquad Y_j=\operatorname{clos}\big(\bar\eta_0\wedge\cdots\wedge\bar\eta_{j-1}\wedge H^2(\mathbb{D},\mathbb{C}^m)^\perp\big),$$
and consider the compact operator $T_j\colon X_j\to Y_j$ given by
$$T_j\big(\xi_0\wedge\cdots\wedge\xi_{j-1}\wedge x\big)=P_{Y_j}\big(\bar\eta_0\wedge\cdots\wedge\bar\eta_{j-1}\wedge(G-Q_j)x\big)$$
for all $x\in H^2(\mathbb{D},\mathbb{C}^n)$. Let $(\xi_0\wedge\cdots\wedge\xi_{j-1}\wedge v_j,\ \bar\eta_0\wedge\cdots\wedge\bar\eta_{j-1}\wedge w_j)$ be a Schmidt pair for the operator $T_j$ corresponding to $t_j=\|T_j\|$, let $h_j\in H^2(\mathbb{D},\mathbb{C})$ be the scalar outer factor of $\xi_0\wedge\cdots\wedge\xi_{j-1}\wedge v_j$, and let $x_j\in K_j$, $y_j\in L_j$ be the corresponding vectors. For $i=0,1,\dots,j-1$, let $\tilde V_i$, $\tilde W_i^T$ be unitary-valued functions, as described in Lemma 4.11 (see also Proposition 9.29 for $\tilde V_2$ and $\tilde W_2^T$), and let $u_i=\bar z\bar h_i/h_i$ be the associated quasi-continuous unimodular functions.

There exist unitary-valued functions $\tilde V_j$, $\tilde W_j$ of the form
$$\tilde V_j=\begin{pmatrix}\alpha_{j-1}^T\cdots\alpha_0^T\xi_j & \bar\alpha_j\end{pmatrix},\qquad \tilde W_j^T=\begin{pmatrix}\beta_{j-1}^T\cdots\beta_0^T\eta_j & \bar\beta_j\end{pmatrix},$$
such that every $\tilde Q_j\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j)\times(n-j)})$ which is at minimal distance from $F_j$ satisfies
$$F_j-\tilde Q_j=\tilde W_j^*\begin{pmatrix}t_ju_j&0\\0&F_{j+1}\end{pmatrix}\tilde V_j^*,$$
where $F_{j+1}\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j-1)\times(n-j-1)})+C(\mathbb{T},\mathbb{C}^{(m-j-1)\times(n-j-1)})$ and $u_i=\bar z\bar h_i/h_i$ are quasi-continuous unimodular functions for all $i=0,\dots,j$.

Proof. Suppose we have applied steps $0,\dots,j$ of the algorithm from Subsection 4.2 and we have proved that the following diagram commutes:
$$\begin{array}{ccccc}
H^2(\mathbb{D},\mathbb{C}^{n-j}) & \xrightarrow{\;M_{\bar\alpha_0\cdots\bar\alpha_{j-1}}\;} & K_j & \xrightarrow{\;\xi^{(j-1)}\wedge\cdot\;} & X_j\\[2pt]
\Big\downarrow H_{F_j} & & \Big\downarrow\Gamma_j & & \Big\downarrow T_j\\[2pt]
H^2(\mathbb{D},\mathbb{C}^{m-j})^\perp & \xrightarrow{\;M_{\bar\beta_0\cdots\bar\beta_{j-1}}\;} & L_j & \xrightarrow{\;\bar\eta^{(j-1)}\wedge\cdot\;} & Y_j
\end{array}$$
where $\xi^{(j-1)}=\xi_0\wedge\cdots\wedge\xi_{j-1}$, $\bar\eta^{(j-1)}=\bar\eta_0\wedge\cdots\wedge\bar\eta_{j-1}$, and the maps $M_{\bar\alpha_0\cdots\bar\alpha_{j-1}}$, $M_{\bar\beta_0\cdots\bar\beta_{j-1}}$, $(\xi^{(j-1)}\wedge\cdot)\colon K_j\to X_j$ and $(\bar\eta^{(j-1)}\wedge\cdot)\colon L_j\to Y_j$ are unitaries. Let $(\xi^{(j-1)}\wedge v_j,\ \bar\eta^{(j-1)}\wedge w_j)$ be a Schmidt pair for the compact operator $T_j$. Then there exist $x_j\in K_j$, $y_j\in L_j$ such that $(x_j,y_j)$ is a Schmidt pair for $\Gamma_j$ corresponding to $t_j=\|\Gamma_j\|$, and $(\hat x_j,\hat y_j)$ is a Schmidt pair for $H_{F_j}$ corresponding to $t_j=\|H_{F_j}\|$, where
$$\hat x_j=M_{\bar\alpha_0\cdots\bar\alpha_{j-1}}^*x_j,\qquad \hat y_j=M_{\bar\beta_0\cdots\bar\beta_{j-1}}^*y_j.$$
We would like to apply Lemma 4.11 to $H_{F_j}$ and the Schmidt pair $(\hat x_j,\hat y_j)$ to find unitary-valued functions $\tilde V_j$, $\tilde W_j$ such that, for every $\tilde Q_j\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j)\times(n-j)})$ which is at minimal distance from $F_j$, a factorisation of the form displayed above holds. For this purpose we find the inner-outer factorisations of $\hat x_j$ and $\bar z\bar{\hat y}_j$. By the inductive hypothesis (see Lemma 9.28 for $j=2$), we have
$$\|\hat x_j(z)\|_{\mathbb{C}^{n-j}}=|h_j(z)|\quad\text{and}\quad\|\hat y_j(z)\|_{\mathbb{C}^{m-j}}=|h_j(z)|\tag{10.8}$$
almost everywhere on $\mathbb{T}$. Equations (10.8) imply that $h_j\in H^2(\mathbb{D},\mathbb{C})$ is the scalar outer factor of both $\hat x_j$ and $\bar z\bar{\hat y}_j$. By Lemma 4.11, $\hat x_j$, $\bar z\bar{\hat y}_j$ admit the inner-outer factorisations
$$\hat x_j=\hat\xi_jh_j,\qquad \bar z\bar{\hat y}_j=\hat\eta_jh_j,$$
where $\hat\xi_j\in H^\infty(\mathbb{D},\mathbb{C}^{n-j})$ and $\hat\eta_j\in H^\infty(\mathbb{D},\mathbb{C}^{m-j})$ are vector-valued inner functions. By equations (10.7) and (10.9), we deduce that $\hat\xi_j=\alpha_{j-1}^T\cdots\alpha_0^T\xi_j$ and $\hat\eta_j=\beta_{j-1}^T\cdots\beta_0^T\eta_j$. We would like to show that $\alpha_{j-1}^T\cdots\alpha_0^T\xi_j$ and $\beta_{j-1}^T\cdots\beta_0^T\eta_j$ are inner, in order to apply Lemma 4.11 and obtain $\tilde V_j$ and $\tilde W_j$ as required. We have $\hat x_j=\alpha_{j-1}^T\cdots\alpha_0^T\xi_jh_j$, so $\alpha_{j-1}^T\cdots\alpha_0^T\xi_j$ is analytic. Further, since the $\beta_i(z)$ are isometries for all $i=0,\dots,j-1$,
$$\|\beta_{j-1}^T(z)\cdots\beta_0^T(z)\bar z\bar y_j(z)\|_{\mathbb{C}^{m-j}}=\|\beta_{j-1}^T(z)\cdots\beta_0^T(z)\bar z\bar w_j(z)\|_{\mathbb{C}^{m-j}}=|h_j(z)|$$
almost everywhere on $\mathbb{T}$, and therefore $\|\beta_{j-1}^T(z)\cdots\beta_0^T(z)\eta_j(z)\|_{\mathbb{C}^{m-j}}=1$ almost everywhere on $\mathbb{T}$; that is, $\beta_{j-1}^T\cdots\beta_0^T\eta_j$ is inner.
By Lemma 4.11, there exist inner, co-outer, quasi-continuous functions $\alpha_j,\beta_j$ of types $(n-j)\times(n-j-1)$, $(m-j)\times(m-j-1)$ respectively such that
$$\tilde V_j=\begin{pmatrix}\alpha_{j-1}^T\cdots\alpha_0^T\xi_j & \bar\alpha_j\end{pmatrix},\qquad \tilde W_j^T=\begin{pmatrix}\beta_{j-1}^T\cdots\beta_0^T\eta_j & \bar\beta_j\end{pmatrix}$$
are unitary-valued and all minors on the first columns of $\tilde V_j$, $\tilde W_j$ are in $H^\infty$. Moreover, every function $\tilde Q_j\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j)\times(n-j)})$ which is at minimal distance from $F_j$ satisfies
$$F_j-\tilde Q_j=\tilde W_j^*\begin{pmatrix}t_ju_j&0\\0&F_{j+1}\end{pmatrix}\tilde V_j^*$$
for some $F_{j+1}\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j-1)\times(n-j-1)})+C(\mathbb{T},\mathbb{C}^{(m-j-1)\times(n-j-1)})$ and for the quasi-continuous unimodular function $u_j=\bar z\bar h_j/h_j$. By Lemma 4.15, the set $\Omega_j$ of level-$j$ minimisers admits a parametrisation in terms of $B(t_j)$, where $B(t_j)$ is the closed ball of radius $t_j$ in $L^\infty(\mathbb{T},\mathbb{C}^{(m-j)\times(n-j)})$. By the inductive hypothesis we have proved that the set of all level-$j$ superoptimal error functions $E_j$ satisfies the analogous parametrisation (10.10). The following statement asserts that any function $Q_{j+1}\in\Omega_j$ necessarily satisfies equations (4.34).
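For orientation, the following pointwise sketch (ours) completes a given unit vector in $\mathbb{C}^n$ to a unitary matrix by a QR step. It is only the pointwise shadow of the thematic completions furnished by Lemma 4.11: no attempt is made here to choose the remaining columns analytically in $z$, let alone inner and co-outer, which is the substantive content of that lemma.

\begin{verbatim}
import numpy as np

def complete_to_unitary(xi):
    # Returns a unitary matrix whose first column is the unit vector xi.
    n = len(xi)
    Mat = np.zeros((n, n), dtype=complex)
    Mat[:, 0] = xi
    Mat[:, 1:] = np.eye(n, n - 1, dtype=complex)
    Q, _ = np.linalg.qr(Mat)
    Q[:, 0] *= Q[:, 0].conj() @ xi      # fix the phase of column 0
    return Q

xi = np.array([0.6, 0.8j, 0.0])
U = complete_to_unitary(xi)
print(np.allclose(U.conj().T @ U, np.eye(3)), np.allclose(U[:, 0], xi))
\end{verbatim}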
Proof. By the recursive step of the algorithm from Subsection 4.2, every $Q_{j+1}\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})$ that minimises $\big(s_0^\infty(G-Q),\dots,s_j^\infty(G-Q)\big)$ satisfies equations (4.34) up to level $j-1$. Hence it suffices to show that $Q_{j+1}$ satisfies the level-$j$ equations
$$(G-Q_{j+1})x_j=t_jy_j,\qquad (G-Q_{j+1})^*y_j=t_jx_j.$$
Notice that, by the inductive step, diagram (10.11) (the diagram displayed in the preceding proof) commutes, where the maps $M_{\bar\alpha_0\cdots\bar\alpha_{j-1}}$, $M_{\bar\beta_0\cdots\bar\beta_{j-1}}$, $(\xi^{(j-1)}\wedge\cdot)\colon K_j\to X_j$ and $(\bar\eta^{(j-1)}\wedge\cdot)\colon L_j\to Y_j$ are unitaries, and $F_j\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j)\times(n-j)})+C(\mathbb{T},\mathbb{C}^{(m-j)\times(n-j)})$. By equation (10.10), the set of all level-$(j-1)$ superoptimal error functions is parametrised as in equation (10.12), in which $u_i=\bar z\bar h_i/h_i$ are quasi-continuous unimodular functions for all $i=0,\dots,j-1$. Consider some $Q_{j+1}\in\Omega_{j-1}$, so that, according to equation (10.12), $Q_{j+1}$ satisfies equation (10.13), where $\tilde Q_j\in H^\infty(\mathbb{D},\mathbb{C}^{(m-j)\times(n-j)})$ is at minimal distance from $F_j$. Let $B_j=\beta_0\cdots\beta_j$ and let $A_j=\alpha_0\cdots\alpha_j$. By equations (10.3), combined with equation (10.13), we obtain equation (10.14). Since $\tilde Q_j$ is at minimal distance from $F_j$, and since, if $(\hat x_j,\hat y_j)$ is a Schmidt pair for $H_{F_j}$ corresponding to $t_j$, then, by Theorem 4.12,
$$(F_j-\tilde Q_j)\hat x_j=t_j\hat y_j,\qquad (F_j-\tilde Q_j)^*\hat y_j=t_j\hat x_j,$$
in view of equation (10.14) the latter equations imply the corresponding relations for $x_j$ and $y_j$. By the commutativity of diagram (10.11), $(G-Q_{j+1})x_j-t_jy_j=0$, and since, by the inductive hypothesis, $M_{B_{j-1}}$ is a unitary map, we also obtain $(G-Q_{j+1})^*y_j-t_jx_j=0$. Therefore $Q_{j+1}$ satisfies the required equations.
Theorem 10.12. Let $G\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})+C(\mathbb{T},\mathbb{C}^{m\times n})$. Let $T_i$, $x_i$, $y_i$, $h_i$, for $i\ge0$, be defined by the algorithm from Subsection 4.2. Let $r$ be the least index $j\ge0$ such that $T_j=0$. Then $r\le\min(m,n)$ and the superoptimal approximant $AG$ is given by the formula
$$AG=G-\sum_{i=0}^{r-1}\frac{t_iy_ix_i^*}{|h_i|^2}.$$
Proof. First observe that, if $T_0=H_G=0$, then $G\in H^\infty(\mathbb{D},\mathbb{C}^{m\times n})$, and so $AG=G$.
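A practical by-product of the theory is a diagnostic: the superoptimal error $G-AG$ has constant pointwise singular values $t_j$ almost everywhere on $\mathbb{T}$. The sketch below (ours; \texttt{superoptimality\_defect} is a hypothetical helper) measures how far a candidate error, sampled on a grid of the circle, is from this property.

\begin{verbatim}
import numpy as np

def superoptimality_defect(E_samples):
    # E_samples: array of shape (N, m, n), pointwise values of G - Q on T.
    # Returns the mean singular-value levels and their spread over the grid;
    # for the superoptimal Q the spread should vanish.
    s = np.linalg.svd(E_samples, compute_uv=False)
    return s.mean(axis=0), s.max(axis=0) - s.min(axis=0)

# A constant error function trivially passes the test:
E = np.tile(np.diag([0.5, 0.2]), (128, 1, 1)).astype(complex)
levels, defect = superoptimality_defect(E)
print(levels, defect)                   # levels ~ (t_0, t_1), defect ~ 0
\end{verbatim}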

11. Application of the algorithm
Let us now apply the new algorithm to the example that Peller and Young solved in [26]. Carrying out steps $0$ and $1$ of the algorithm for their function $G$ produces the Schmidt pairs $(x_0,y_0)$, $(x_1,y_1)$, the scalar outer factors $h_0,h_1$ and the singular values $t_0,t_1$, from which the $(1,1)$ entry of $t_1y_1x_1^*/|h_1|^2$ is computed. The algorithm stops after at most $\min(m,n)$ steps, hence in this case after $2$ steps. The unique superoptimal analytic approximant $AG$ is given by the formula of Theorem 10.12,
$$AG=G-\frac{t_0y_0x_0^*}{|h_0|^2}-\frac{t_1y_1x_1^*}{|h_1|^2}.$$
Calculation then recovers the approximant obtained in [26], which is the unique superoptimal analytic approximant of the function $G$ in Problem 9.1.
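In the scalar case the whole algorithm collapses to a single AAK step, which yields a compact end-to-end check of the formula of Theorem 10.12: $G-AG=t_0y_0\bar x_0/|h_0|^2=t_0y_0/x_0$, with $|G-AG|=t_0$ almost everywhere on $\mathbb{T}$. The sketch below (ours) reuses the Hankel-matrix construction from the earlier sketch; the symbol is again a sample, not the Peller-Young example.

\begin{verbatim}
import numpy as np

N, M = 64, 512
theta = 2 * np.pi * np.arange(M) / M
z = np.exp(1j * theta)
G = np.conj(z) / (1.0 - 0.5 * np.conj(z))
coeffs = np.fft.fft(G) / M
H = np.array([[coeffs[-(j + k + 1) % M] for k in range(N)]
              for j in range(N)])
u, s, vh = np.linalg.svd(H)
x0 = np.polyval(vh[0].conj()[::-1], z)                   # x_0(z) in H^2
y0 = np.conj(z) * np.polyval(u[:, 0][::-1], np.conj(z))  # y_0 in (H^2)-perp
E = s[0] * y0 / x0                                       # error G - AG
print(np.ptp(np.abs(E)))                                 # ~ 0: |E| = t_0 a.e. on T
\end{verbatim}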