Consistent Inversion of Noisy Non-Abelian X-Ray Transforms

For $M$ a simple surface, the non-linear and non-convex statistical inverse problem of recovering a matrix field $\Phi: M \to \mathfrak{so}(n)$ from discrete, noisy measurements of the $SO(n)$-valued scattering data $C_\Phi$ of a solution of a matrix ODE is considered ($n\geq 2$). Injectivity of the map $\Phi \mapsto C_\Phi$ was established by [Paternain, Salo, Uhlmann; Geom. Funct. Anal. 2012]. A statistical algorithm for the solution of this inverse problem based on Gaussian process priors is proposed, and it is shown how it can be implemented by infinite-dimensional MCMC methods. It is further shown that as the number $N$ of measurements of point-evaluations of $C_\Phi$ increases, the statistical error in the recovery of $\Phi$ converges to zero in $L^2(M)$-distance at a rate that is algebraic in $1/N$, and approaches $1/\sqrt N$ for smooth matrix fields $\Phi$. The proof relies, among other things, on a new stability estimate for the inverse map $C_\Phi \to \Phi$. Key applications of our results are discussed in the case $n=3$ to polarimetric neutron tomography, see [Desai et al., Nature Sc. Rep. 2018] and [Hilger et al., Nature Comm. 2018].

1. Introduction 1.1. Non-Abelian X-ray transforms. Our object of study is the non-abelian Xray transfom, a mapping from a matrix-valued field Φ defined in the interior of a Riemannian surface with boundary (M, g, ∂M ), to its scattering data C Φ , defined at the influx boundary ∂ + SM of M , given by where T M is the tangent bundle of M , and ν x denotes the outward unit normal at x ∈ ∂M .
We will assume that the surface M is simple in the sense that it is (topologically) a disk, it has no conjugate points, and a strictly convex boundary. Strictly convex domains in the plane (and small perturbations of them) are examples of simple surfaces. In this context, all unit-speed geodesics 1 in M exit M in finite time. This fact allows us to identify ∂ + SM with the space of geodesics on M , by associating to any (x, v) ∈ ∂ + SM the unique geodesic γ passing through (x, v).
Let Φ ∈ C c (M, C n×n ) be a continuous map from M → C n×n that is compactly supported in the interior M int of M . Given a unit-speed geodesic γ : [0, T ] → M with endpoints γ(0), γ(T ) ∈ ∂M , we may define the scattering data of Φ on γ to be C Φ (γ) := U (0), where U : [0, T ] → C n×n satisfies the linear system of ODE'ṡ U + Φ(γ(t))U = 0, U (T ) = id.
This problem, backward in time for convention here, is well-posed and leads to a unique definition of U (0), containing cumulated information about Φ along the geodesic γ. Note that when Φ is scalar, we obtain log U (0) = T 0 Φ(γ(t)) dt, which is the classical X-ray/Radon transform of Φ along the curve γ. Considering the collection of all such data makes up the scattering data (or non-Abelian X-ray transform) of Φ, viewed here as a map C Φ : ∂ + SM → C n×n , and we are concerned with the problem of recovering Φ from C Φ . Inverting Abelian and non-Abelian X-ray transforms are examples of inverse problems in integral geometry, an active field permeating several tomographic imaging methods, see e.g. the recent topical review [20].
While the problem of inverting the non-linear mapping Φ → C Φ remains open in this generality, injectivity results have recently been obtained, either by adding curvature conditions on the manifold, or by fixing a Lie group G (realised as matrices, for simplicity) and its Lie algebra g, in turn asking whether a g-valued field Φ can be recovered from its G-valued scattering data C Φ . In this paper, we will mainly use the Lie groups SO(n) = {U ∈ R n×n , U T U = id}, U (n) = {U ∈ C n×n , U * U = id} and SU (n) = U (n) ∩ {det = 1}, and their Lie algebras so(n) = {A ∈ R n×n , A T + A = 0}, u(n) = {A ∈ C n×n , A * + A = 0} and su(n) = u(n) ∩ {tr = 0}. Above, 'T ', ' * ', 'det' and 'tr' refer to matrix 'transpose', 'conjugate transpose', 'determinant' and 'trace', respectively. Note the inclusions SO(n) ⊂ SU (n) ⊂ U (n). (1.1) The state of the art on this question can be written as follows: Theorem 1.1. Let (M, g) be a simple surface. The map Φ → C Φ is injective in the following cases: (a) G = U (n) [30]; (b) G = GL(n, C) and M has negative curvature [31].
Earlier injectivity results have been obtained by several authors, cf. [12,28,29] and references therein, particularly when (M, g) is a domain in the Euclidean plane.
The absence of concrete reconstruction formulas for the inverse map C Φ → Φ when n ≥ 2, and the challenge of dealing with physical experiments such as those arising in polarimetric neutron tomography (see Section 1.2), where N discrete and noisy measurements D N ∼ P N Φ of C Φ ∈ SO(3) are made (see Section 1.3 for details), motivate the main contribution of this article, which is to present a rigorous statistical algo-rithmΦ(D N ) that allows to recover Φ. The implementation ofΦ(D N ) is detailed in Section 4, and our main theoretical result is the statistical analogue of the injectivity result Theorem 1.1, namely the frequentist consistency of the algorithm in the large sample limit, which somewhat informally can be stated as follows: Theorem 1.2. Suppose the data D N is generated from the probability distribution P N Φ 0 where Φ 0 : M → so(n) is any smooth and compactly supported matrix field Φ 0 . Then we have that, as sample size N → ∞, and in P N Φ 0 -probability, See Theorem 3.2 in Section 3 for a fully rigorous statement of this result, which in fact requires significantly weaker hypotheses on Φ 0 , and also specifies an explicit rate of convergence in the last limit.
The proof of the previous theorem relies on ideas from Bayesian nonparametric statistics [38,14] and on new 'quantitative versions' of the injectivity result in Theorem 1.1 which are of independent interest and stated in Section 2.
1.2. Polarimetric neutron tomography (PNT). The basic problem in PNT consists in finding a magnetic field from spin measurements of neutrons [21,8,9,19]. In this case the explicit relation is is the magnetic field. In the case of PNT one assumes that the underlying surface M is just the disc in the plane (by slicing with 2D discs one can solve the 3D problem).
The details of the experiment of polarimetric neutron tomography may be found, e.g., in [9]. Here we give a description that is suitable for our purposes. The data produced by the experiment is the orthogonal matrix (3), where C Φ (x, v) is the scattering data described above. The significance of this in terms of spin, is a follows: if a neutron travelling along the ray determined by (x, v) enters the magnetic field with a spin s in ∈ S 2 (S 2 denotes the Euclidean unit sphere in R 3 ), it exits the field with spin s out = C −1 Φ (x, v)s in ∈ S 2 (for an ensemble of polarized neutrons in a magnetic field it can be shown that they behave like a particle with a classical magnetic moment). The magnetic field B is defined in 3D space, but the experiment makes measurements on a 2D plane and produces a global reconstruction by slicing. The geometry of the experiment is thus a 2D parallel beam geometry which is easily converted into fan-beam geometry as considered above. The question is then how to manipulate the spin to produce the orthogonal matrix. This is done with an ingenious sequence of spin flippers and rotators placed before and after the magnetic field being measured. The material containing the magnetic field can also be rotated so as to produce parallel beams from different angles. After the spin has been manipulated it goes through an analyser; this device is essentially a spin filter that only lets those neutrons with vertically aligned spin go through. The neutron count is then measured with a detector that produces an intensity reading. The spin of the entering beam is perfectly aligned with the spin of the analyser, so that the intensity measurement is actually a measurement of the angle of rotation of the spin due to the magnetic field. The key relation is given by [21,Equation 1] (1.2) where A is the attenuation of the medium, I 0 is the intensity of the incoming beam and ϕ is the angle by which the spin has rotated. The use of the spin flipper allows the measurement of and from this one deduces that cos ϕ = I − I I + I which then becomes an entry of our matrix C T Φ (x, v). By rotating by π/2 and flipping (rotation by π) one can thus produce the entire orthogonal matrix as data. In other words, if {e 1 , e 2 , e 3 } is the canonical basis of 3-space, cos ϕ gives C T Φ (x, v)e i · e j for all i, j and hence all the entries. In some situations, where the attenuation of the medium is known, the use of spin flippers is not necessary and can be calibrated out. Assuming an additive Gaussian noise in the intensities I, equation (1.2) approximately produces an additive Gaussian noise in the entries of the matrix C Φ which is precisely the noise model we adopt below.
As in the articles [9,10] our approach reconstructs 3D magnetic fields of arbitrary direction and distribution. This provides a method able to investigate samples without imposing any a priori knowledge of the magnetic field orientation, and requires understanding of the full non-linear inverse problem. The recent preprint [10] introduces a modified Newton-Kantorovich type algorithm for the solution of the nonlinear problem, a Newton-type algorithm where the inversion of the Jacobian at each iteration only uses the differential of the map Φ → C Φ at the base point Φ 0 ≡ 0.
As pointed out in [10], the algorithm appears to work well for small enough fields (or large enough velocities of neutrons), but may fail due to "phase wrapping" when the field is large enough. Our approach does not exhibit this problem.
1.3. The statistical observation scheme. Consider a simple surface M as above with influx boundary ∂ + SM , and a matrix valued map Φ : M → g and scattering data Here we take G = SO(n) for some n ≥ 2, with corresponding Lie algebra g = so(n), the set of skew-symmetric matrices. Recall that in the key application to PNT from the previous subsection, M is the flat disk and n = 3. We could take G = SU (n) and g = su(n) just as well, but for sake of conciseness prefer to avoid a complex-valued statistical noise model in what follows.
To describe the statistical observation setting, let λ be the uniform distribution (volume element) on ∂ + SM (see (1.5) below for a precise definition), and consider 'design' random variables These draws represent a randomised choice of the geodesics for which experiments are performed -they have to be 'equally spaced' throughout 'geodesic space' ∂ + SM in a statistical sense. For each resulting measurement of C Φ ((X i , V i )) the statistical observational error arising in the experiment is modelled by independent Gaussian matrix noise. More precisely let random variables that are independent of the (X i , V i )'s, and let E i = (ε i,j,k ) be the random n × n noise matrix which adds a Gaussian noise variable in each matrix entry to C Φ ((X i , V i )). Our observations then consist of the sequence of N random n × n matrices The variables Y i,j,k are all independent, and even i.i.d. for j, k fixed. Conditionally on (X i , V i ) = (x i , v i ) they are multivariate normal random variables with diagonal covariance and (vectorised) mean C φ (x i , v i ) j,k . Note that while C Φ (x, y) takes values in SO(n), the Y i are not in SO(n) (or even U (n)) as we have not constrained E i at all -this is in line with the physical experiments for PNT described in Section 1.2 where statistical errors arise from noisy measurements of each matrix entry of C Φ (x, v). For the theory we will assume that the noise variance σ 2 > 0 is fixed and known -in practice it can be replaced by the estimated sample variance of the Y i,j,k 's. To fix notation: The joint law of the random variables ( for the full data vector. The corresponding expectation operators are obtained by replacing 'P ' by 'E' in the preceding expressions. The dependence on σ 2 will be suppressed in the notation.
1.4. Some geometric background and basic notation. We conclude this section by introducing some more basic notation that will be used throughout.
Our background geometry is a simple surface with boundary (M, g, ∂M ). By 'simple', we mean (i) M is non-trapping (in the sense that every maximal geodesic in M has finite length), (ii) M has no conjugate points and (iii) ∂M is strictly convex (i.e. ∂M has positive definite second fundamental form). We denote by SM the unit tangent bundle of M , namely Its boundary ∂SM := {(x, v) ∈ SM : x ∈ ∂M } can be split into 'influx' and 'outflux' boundary, depending on whether the tangent vector points inside or outside, namely we define, for ν x is the outer unit normal at x ∈ ∂M , The manifolds M , ∂M , SM and ∂ + SM all carry natural volume elements, allowing us to define L 2 spaces below. Specifically, the Riemannian metric g induces an area form dx on M and restricts to a metric on ∂M . The unit sphere bundle SM carries the volume element dΣ 3 = dx dv where dv is the length element in the unit circle S x ⊂ T x M . Finally the boundary ∂SM of SM carries the area form dΣ 2 = ds dv where dv is as above and ds is the arclength (w.r.t. the metric g) along the boundary. Its restriction to ∂ + SM will be denoted by The spaces C n and C n×n will be equipped with the canonical Hermitian inner product ·, · and induced norm | · |. For elements in C n×n , this corresponds to the Frobenius norm |A| 2 F := tr(A * A) = n i,j=1 |A i,j | 2 , which is U (n)-invariant in the sense that for any U ∈ U (n) and A arbitrary, |AU | F = |U A| F = |A| F . Given (N, h) a d-dimensional Riemannian manifold (either M , ∂M , SM , ∂ + SM , or ∂SM as explained above), one may adapt the usual function spaces to C n -or C n×n -valued functions as follows: L 2 (N, C n×n ), L ∞ (N, C n×n ) with norms One may differentiate functions using partial derivatives {∂ y j } d j=1 in coordinate charts, or equivalently, using {T j } d j=1 a global basis of smooth vector fields on N which pairwise commutes (it will be useful to adopt the latter viewpoint in later sections). Given a d-index α = (α 1 , . . . , α d ), one may define |α| = α 1 + · · · + α d and T α = T α 1 1 · · · T α d d . The metric h equips N with a distance function d h (x, y), and for β ≥ 0, we can thus define Hölder spaces C β (N, C n×n ) with norm with the second term removed when β is an integer. We will also use L 2 -based Sobolev spaces H s (N, C n×n ) with norm for s ∈ N, and defined by interpolation otherwise (see, e.g., [1, Ch. 7]). As above, when clear from the context, the domain and/or codomain will be dropped from the notation. In the following sections, spaces of functions with codomain SO(n), SU (n) or their Lie algebras will make use of the same topology of the corresponding spaces of C n×n -valued functions. The c-subscript attached to a space of maps defined on M denotes the linear subspace of those maps that vanish identically outside of a compact subset of the interior M int of M .

Theoretical results for the deterministic inverse problem
When discrete measurements of the forward data C Φ are corrupted by statistical noise, the injectivity result Theorem 1.1 is not useful to reconstruct Φ from the observations, and we will discuss in the next section how to develop statistical methods that consistently solve this statistical inverse problem. The proofs that substantiate these methods are based on quantitative versions of Theorem 1.1 -stability estimates -as well as continuity properties of the forward map, and we describe in this section the analytical results we obtain.
The results to follow hold when the codomain of the matrix fields is the largest of the three compact Lie groups introduced before Theorem 1.1, namely U (n) (with Lie algebra u(n)), see Eq. (1.1).
Theorem 2.1. Let (M, g) be a simple surface. Given two matrix fields Φ and Ψ in

1)
and where the constants C 1 , C 2 only depend on (M, g).
The proof of Theorem 2.1 initially follows the approach for obtaining L 2 → H 1 stability estimates for the geodesic X-ray transform I as presented in [34,Theorem 3.4.3]. Our starting point is the pseudo-linearisation formula is a geodesic X-ray transform with suitable weights, see Lemma 5.5. To prove Theorem 2.1 it suffices to show that To this end, we use the energy identity (Pestov Identity) developed in [30] for matrix weights arising for connections and matrix fields. The presence of the weights produces additional terms in the identity that need to be controlled to obain the estimate above and this is where most of the work lies. The main idea for controlling them comes from [30] where a connection with the right curvature is artificially introduced to control these terms. The connection is later removed by using (scalar) holomorphic integrating factors whose existence is guaranteed by the microlocal properties of the normal operator associated to the geodesic X-ray transform acting on functions. Taming these integrating factors has a cost which is reflected in the constant c(Φ, Ψ) given in (2.1).
The proof of Theorem 3.2 requires also direct estimates in all Sobolev scales. These are less sophisticated in nature than the stability estimate above, and hold under less restrictive assumptions. Recall that (M, g) is said to be non-trapping if there is no geodesic with infinite length (any simple manifold is non-trapping). We show Theorem 2.2. Let (M, g) be a non-trapping surface with strictly convex boundary. For any integer k ≥ 0 and for every Φ, Ψ ∈ C k c (M, u(n)), the following continuity estimates hold: where by we mean that the inequality holds with some constant that only depends on M , g and k.
In fact in the proof of Theorem 3.2 we shall use instead of Theorem 2.1 the following corollary of the previous two results: where C is independent of Φ or Ψ.

Bayesian inversion of non-Abelian X-ray transforms
The main goal of this section is to introduce a method to infer the matrix field Φ ∈ C(M, so(n)) from discrete observations D N of the scattering data C Φ described in Section 1.3. We follow the general paradigm of Bayesian inverse problems advocated by A. Stuart [35,7] which is also related to the paradigm of Bayesian numerical analysis [11,3] in the noiseless case (σ = 0). The idea is to start from a Gaussian process prior Π for the parameter Φ and to use Bayes' theorem to infer the best posterior guess for Φ given data D N .
We will state a theorem that shows that the posterior mean fieldsΦ N = E Π [Φ|D N ] corresponding to a flexible class of Lie-algebra valued Gaussian process priors Π for Φ consistently recover the 'true' Φ 0 in the frequentist large sample limit as N → ∞, when noisy experiments have been performed under P N Φ 0 in the model (1.3). In fact we will provide a stochastic convergence rate to zero of the recovery error that is algebraic in inverse sample size 1/N . Note that probabilistic consistency under P N Φ 0 entails approximate uniformity of the design (X i , V i ) and rules out adversarial designs. Fixed (non-random) design (x i , v i ) that is sufficiently 'equally spaced' could be considered as well in the theory that follows, but the random design case allows for a slightly cleaner, probabilistic treatment of the numerical discretisation error.
The proof of Theorem 3.2 below provides a template to establish rigorous statistical guarantees for the Bayesian approach to other non-linear inverse problems as well. See Section 5.4 for more discussion.
To introduce the Bayesian approach more concisely, consider a prior Π for a vector field (B 1 , . . . , Bn) by prescribing a Borel probability measure on the space ×n j=1 C(M ) wheren = n(n − 1) 2 = dim (so(n)).
The natural isomorphism between ×n j=1 C(M ) and the space C(M, so(n)) of continuous functions from M to so(n) in turn generates a prior Π for Φ by forming a so(n)-valued field from the B i 's. For instance in the case n = 3 so that alson = 3, relevant in PNT, we construct Π from Then we make the Bayesian model assumption that The posterior distribution arises from a dominated family of probability measures (see (5.33) below) and is hence given by for any Borel set A in C(M, so(n)). Here is, up to additive constants, the log-likelihood function of the observations.
While what precedes was not specific to the choice of a particular prior, the main theorem to follow will hold for priors arising from certain so(n)-valued Gaussian processes. These will be constructed from a Gaussian base prior Π from which the coordinates B j of ×n j=1 C(M ) will be drawn independently. In fact we will require draws from Π to have β-Hölder continuous sample paths on M almost surely. We refer, e.g., to [16, Sections 2.1 and 2.6] for the basic definitions of Gaussian measures and processes and their reproducing kernel Hilbert spaces (RKHS). We will assume it is known that Φ 0 is supported in a given compact subset M 0 of the interior M int of M . Consider a random function f ∼ Π and let ζ be a C ∞function that equals one on M 0 and is compactly supported in M int . Define a new random function and denote its law in C(M ) by Π B = Π B,N . Then let B 1 , . . . , Bn be random functions on M drawn as i.i.d. copies from Π B , and let the prior Π = ⊗n j=1 Π B for Φ be the resulting centred Gaussian product probability measure in the space C(M, so(n)) ×n j=1 C(M ) (see (3.1) for n = 3). Shrinking the prior towards the origin in a Ndependent way as in (3.4) is crucial in our proofs, see Remark 3.5 for discussion. The following theorem gives a bound for the convergence rate of the posterior mean ] towards the true field Φ 0 in L 2 (M )-loss, under the law P N Φ 0 of the observations. Note that this mean (expected value) is understood in the usual sense of Bochner integrals and henceΦ takes values in C(M, so(n)) -for fixed data vector Y i , (X i , V i ) and since for C Φ ∈ SO(n) the norms C Φ L ∞ are bounded by a fixed constant, this expected value exists almost surely by (3.2) and a basic application of Fernique's theorem (see [ The proof is given in Section 5.4. We note that in the proof we establish in particular that the random posterior measure Π(·|(Y i , (X i , V i )) N i=1 ) on C(M, so(n)) concentrates with probability approaching one in a N −η -diameter L 2 (M )-ball centred at Φ 0 , see Theorem 5.16.
We conclude this section with several remarks.
In the proof (see (5.55)) we show that any is permitted in the previous theorem.
with support in M 0 , and if we take priors Π which verify Condition 3.1 for large enough α, β and H = H α (M ) (possible by Remark 3.4), then we can make η as close to 1/2 as desired, and it is easy to show that η = 1/2 cannot be improved upon by any algorithm. So at least for smooth Φ 0 the recovery guarantee from Theorem 3.2 is (near-) optimal. In the 'low regularity case' where α is not large, our bound for η may not be optimal. A conjecture for the optimal value for η can be obtained from the much simpler linear and Abelian case (n = 1) corresponding to the classical Radon transform, which is treated in [27, Example 2.5], where the exponent η = α 2α + 3 is attained, which can be shown to be optimal in this special case.
flat' (Euclidean) geometry, relevant in PNT. For arbitrary α > 0 we can then take for Π the restriction to D of a stationary Gaussian process on R 2 with appropriate Matérn covariance function k α (see [14, p.313] and Section 4 below). This gives a Gaussian prior on C(D) with RKHS H equal to the space of restrictions to D of elements of H α (R 2 ). This space is well known to co-incide with H α (D), and the sample paths of this process lie in the separable subspace The preceding construction works for any smooth bounded domain D in R 2 . In particular a simple surface M is diffeo-morphic to a disc and the Sobolev spaces H α (D) and H α (M ) co-incide with equivalent norms -the Matérn prior can thus be used even when M equals D equipped with a different Riemannian metric. Alternatively one can embed M isometrically into a larger closed compact (boundary-less) manifold S and use the orthonormal basis of eigenfunctions {e k } of the Laplace-Beltrami operator on S to generate Gaussian random series f S (x) = k σ k g k e k (x), g k ∼ i.i.d. N (0, 1), x ∈ S, which after restriction to M and for suitable choice of σ k > 0, generate Gaussian priors Π with any prescribed Sobolev space H α (M ) as RKHS.

Remark 3.5. [Rescaled Gaussian
Priors.] While the use of Gaussian process techniques [4,13,22] in the proof of Theorem 3.2 is inspired by previous work in [38,37] and also [15] for 'direct' problems, the inverse setting poses several challenges, particularly in the non-linear case. In our proofs we show how these challenges can be overcome by shrinking common Gaussian process priors towards the origin as in (3.4) -the shrinkage enforces the necessary additional 'a-priori' regularisation of the posterior distribution to permit the use of our stability estimates. While similar re-scaled priors have been shown to work in some 'direct' settings before (they appear as special cases of the rescaled priors studied in [37], see their Theorem 3.2), in our setting they play a crucial role: Without re-scaling the exponential growth in the C 1 -norms of Φ of the constant (2.1) would render our stability estimate useless in the proofs.
Remark 3.6. [Related literature on Bayesian non-linear inverse problems.] There are only very few results about statistical guarantees for the Bayesian approach to non-linear inverse problems currently available. We mention [39,26,25] where nonlinear inverse problems of elliptic and parabolic type are studied. The results therein however only hold for specific 'uniformly bounded wavelet' type priors -while these are useful to develop a first theoretical understanding of Bayesian inversion algorithms, they posit very strong a priori assumptions on the parameter of interest and the efficient computability of the resulting posterior distribution is also unclear.
The recent reference [27] obtains convergence rate results for optimisation based MAP-estimates (see Section 4.2 for a brief discussion of those) in a general class of nonlinear inverse problems. For non-convex forward maps as the ones relevant here, these MAP-estimates cannot reliably be computed, and at any rate they behave possibly quite differently than the posterior mean, which is a Bochner integral with respect to an infinite-dimensional and possibly highly non-Gaussian posterior distribution that cannot be studied by ideas from variational analysis or optimisation. In the proof of Theorem 3.2 we develop new techniques that allow to prove convergence rates for such algorithms -see Section 5.4 for more details.
Remark 3.7. [Towards Uncertainty Quantification.] Theorem 3.2 also serves as a starting point to prove more refined Bernstein-von Mises theorems that entail that the posterior distribution is approximated in a suitable infinite-dimensional space by a canonical Gaussian measure. For a non-linear elliptic inverse problem a first result of this kind was recently proved in [25], and for the linearisation of the non-linear problem considered here, such results were obtained in [24]. In principle, joining the ideas of [25,24] with the techniques of the present paper, one can conjecture that Bernstein-von Mises theorems should also hold true for the case of non-Abelian X-ray transforms, but this question will be pursued elsewhere.

Implementation of the algorithm
In this section, we present some numerical reconstructions of an su(2)-valued matrix field Φ from its noisy scattering data C Φ ∈ SU (2). In this case, Φ is generated by three real-valued components where we have defined for basis of su(2) The approach presented easily adapts to any so(n)-, su(n)or u(n)-valued field (including the so(3)valued case of polarimetric neutron tomography, a close cousin of the present case), with some minor Lie group specific modifications to be made for an accurate computation of forward data.  The metric is isotropic, written as g = e 2λ(x,y) id, with scalar function λ given by Such an example can be seen to be non-trapping, have no conjugate points and a strictly convex boundary, i.e. (M, g) is simple. The case of Euclidean geometry would correspond to λ ≡ 0. Geodesic (data) space, modelled as ∂ + SM is parameterised in fan-beam coordinates (β, α) ∈ (0, 2π) × (−π/2, π/2) (with uniform probability measure dλ = dβ dα/(2π 2 )). Below we will draw N geodesics uniformly at random, characterised by N initial conditions (α i , β i ) ∈ ∂ + SM , 1 ≤ i ≤ N , and our statistical algorithm will require numerical evaluation of the forward data C Φ (α i , β i ) which we now describe: Out of each data point (α i , β i ), we first compute a geodesic using a forward scheme with stepsize h to solve a discretisation of the systeṁ with initial condition x(0) = cos β i , y(0) = sin β i and θ(0) = β i + π + α i , until the geodesic exits the domain. This produces a discretised geodesic Once such a geodesic is computed, we must then discretise the matrix ODĖ (The problem here is forward in time unlike that given in the introduction, though since Φ is u(n)-valued, this amounts to computing the conjugate transpose of C Φ , which leads to the same problem.) To discretise the above ODE, we denote U (i,j) := U (γ i (t j ),γ i (t j )) and implement the scheme ). In fact the code implements a predictor-corrector variant of this scheme for improved accuracy on the computation of the exponentials.
The use of matrix exponentials in (4.1) (compared to standard forward-marching schemes) ensures that the matrix solution U numerically remains in SU (2), and the computation of these exponentials can be done via an explicit formula, namely: for A = a σ 1 + b σ 2 + c σ 3 and denoting |a| : (Note that the formula above would need to be adapted if a Lie algebra g different from su (2) is of interest.) The evaluation of Φ (i,j−1) is done by barycentric combination of the values of Φ at the three vertices of the triangle containing (x i (t j−1 ), y i (t j−1 )). After implementing (4.1), the scattering data C Φ (γ i ) is nothing but U (i,J i ) (in fact, the other values U (i,j) for j < J i are not kept in memory after computation). The magnetic field Φ we will use in the experiments below as well as its noiseless scattering data C Φ are visualised Fig. 2.
As we will use Monte-Carlo Markov Chains (MCMC) in the following section, let us mention that once the mesh is fixed, some computations are done prior to the MCMC, namely, all geodesics as well as the triangle indices and barycentric weights along them.

4.2.
Statistical estimation through MCMC. Given data as in (1.3), a common approach to inverse problems would be to compute a Tikhonov regulariser which minimises a penalised least squares fit functional (with, e.g., Sobolev-norm penalty) over the space of all matrix fields Φ : M → g where g is the Lie algebra describing the constraint on the co-domain of Φ. The map Q N is not convex, and efficient computation of the global minimiser will generally not be possible.
The optimiser of the functional (4.2) can be shown to correspond to a posterior mode, or 'maximum a posteriori estimate (MAP)', of a Gaussian process prior Π on C(M, g) with RKHS equal to H α (see [6] for a general result of this kind). Instead , which in our non-linear setting is different from the MAP estimate.
For Gaussian priors, MCMC algorithms such as the preconditioned Crank-Nicholson (pCN) method (see [5]) are available to sample from the posterior distribution. To introduce the algorithm, note that as in (3.3), the log-likelihood function given the data (Y i , (X i , V i )) N i=1 equals, up to additive constants, One then approximates the posterior mean Ns Ns n=0 Φ n of a Markov chain (Φ n ) of length N s as follows: Let Π be a Gaussian prior for Φ; initialise Φ n = 0 for n = 0, then repeat: (1) Draw Ψ ∼ Π and for δ > 0 define the proposal p Φn : The algorithm is terminated at n = N s and requires evaluation of (Φ n ) and thus of the scattering data C Φn (X i , V i ) for every Φ n and (X i , V i ). For g = su(2) relevant in the simulations that follow, this can be done as described in Section 4.1.
The invariant measure of the Markov chain {Φ n } equals the posterior distribution Π(·|D N ), and under certain conditions that are compatible with our setting, [18] derived dimension-free spectral gaps which imply that the distribution of Φ n mixes rapidly towards Π(·|D N ). The approximation of E Π [Φ|D N ] by Φ = 1 Ns Ns n=0 Φ n can thus be expected to compare to the one of the standard central limit theorem, with corresponding non-asymptotic error guarantees, see Section 4 in [18]. (2) as in Section 4.1 and for each B i choose an independent Matérn prior (cf. Remark 3.4) with parameters (ν, ), which on functions on the mesh (i.e., vectors in R Nv ) uses the with K ν the modified Bessel function of the second kind. The constant ν controls the Sobolev regularity while controls the characteristic lengthscale of the samples. We draw N geodesics at random according to the uniform law for (α, β) (some samples on ∂ + SM of size N = 200, 400, 800 are visualised Fig. 4), and then generate synthetic data as explained in Section 4.1 for the magnetic field Φ 0 displayed in Fig. 2, adding Gaussian noise N (0, σ 2 ) to each matrix entry of C Φ 0 . We then implement the pCN algorithm to approximately compute the posterior The stepsize δ is adjusted so that after 'burn-in', the acceptance rate of proposals stabilises around 25%. Once the chain is computed we visualise Φ = 1 Ns Ns n=0 Φ n -examples of outcomes corresponding to increasing data set are given in Fig. 5, illustrating the improvement in 'reconstructions' as the number N of measurement points increases.

Proofs
5.1. Geometric preliminaries. Let (M, g) be a compact oriented two dimensional Riemannian manifold with smooth boundary ∂M . As before SM will denote the unit circle bundle which is a compact 3-manifold with boundary given by We let X be the geodesic vector field, i.e. the infinitesimal generator of the geodesic flow of M . Since M is assumed oriented there is a circle action on the fibers of SM with infinitesimal generator V called the vertical vector field. It is possible to complete the pair X, V to a global frame of T (SM ) by considering the vector field X ⊥ := [X, V ]. There are two additional structure equations given by X = [V, X ⊥ ] and [X, X ⊥ ] = −κV where κ is the Gaussian curvature of the surface. Using this frame we can define a Riemannian metric on SM by declaring {X, X ⊥ , V } to be an orthonormal basis and the volume form of this metric will be denoted by dΣ 3 . The fact that {X, X ⊥ , V } are orthonormal together with the commutator formulas implies that the Lie derivative of dΣ 3 along the three vector fields vanishes.
Given functions u, v : SM → C n we consider the inner product for (x, v) ∈ ∂SM , the following formula (known as Santaló's formula) holds for any f ∈ L 1 (SM ): where ϕ t is the geodesic flow.
We now discuss the manifold ∂ + SM and its geometry. One may define a natural frame on ∂ + SM , given by V := V | ∂ + SM and T := (µ ⊥ X + µX ⊥ )| ∂ + SM (horizontal differentiation along the tangent direction) where µ ⊥ := V µ. It is easily seen that [V, T ] = 0 and that these two vector fields are orthonormal for the metric on ∂SM induced by the metric defined on SM . In particular (T, V ) is an orthonormal frame for ∂ + SM and we may define H s (∂ + SM ; ·) with respect to that frame. We now prove a useful lemma that will simplify later calculations. Proof of Lemma 5.1. For (x, v) ∈ ∂ + SM \S∂M and t ∈ (0, τ (x, v)), we define two vector fields on SM int Since the map (x, v, t) → ϕ t (x, v) is smooth and injective for (x, v) ∈ ∂ + SM \S∂M and t ∈ (0, τ (x, v)), this defines global, smooth sections of T (SM int ), and so that X, P T , P V pairwise commute. Via direct computation of the differential of the flow (see e.g. [23,Sec. 4.2]), one may obtain the following expressions on SM int where a, b : SM → R satisfy Xa Xb ] | ∂ + SM = id, and where for h : ∂ + SM → C, one defines h ψ : SM → C though the relation One further notices that the definition of P V , P T extends by continuity to ∂(SM ), with the appropriate restrictions claimed in the statement of the lemma.
We start with the following basic estimates.
Lemma 5.2 (Work-horse lemma). Let (M, g) be a non-trapping surface with strictly convex boundary and Φ ∈ C(M, u(n)). Suppose F ∈ C(SM, C n×n ) and consider the unique continuous solution G : SM → C n×n to XG + ΦG = F on SM with G| ∂ − SM = 0. Then there exists a constant C 1 (M, g) such that The constant C 1 can be chosen as C 1 = max(τ ∞ , √ C 0 ), with τ ∞ the diameter of M and C 0 the constant given in (5.3).
Proof. It is easy to check that where U Φ is the unique solution U to XU +ΦU = 0 on SM with U | ∂ + SM = id. Taking Frobenius norm, using U (n)-invariance and the fact that that U Φ is unitary, we get Upon bounding the right-hand side crudely by τ ∞ F L ∞ , this immediately implies (5.4). On to the L 2 estimates, applying Cauchy-Schwarz yields for all (x, v) ∈ SM where γ x,v is the maximal geodesic passing through (x, v). Now fix (x, v) ∈ ∂ + SM and integrate the inequality above along the geodesic flow ϕ t (x, v) to arrive at Multiplying both sides by µ, integrating w.r.t. dΣ 2 and using Santaló's formula yields (5.5).
We now prove the main result on forward estimates, Theorem 2.2. We shall follow the model proof of [33,Theorem 4.2.1] which shows that the standard X-ray transform I maps H s to H s .
Proof of Theorem 2.2. As a preliminary identity, given Φ and Ψ two skew hermitian matrix fields, consider the two U (n)-valued solutions U Φ , U Ψ such that XU Φ +ΦU Φ = 0 with boundary condition U Φ | ∂ − SM = id. It is immediate to find that the relation Similarly, combining the observation with (5.6) yields (2.2), and we can also obtain, using (5.5), To prove the C 1 continuity estimate, consider the function W : The following identity is immediate: In addition, since Φ and Ψ are compactly supported in M int , the functions U Φ , U Ψ equal the identity matrix in a neighbourhood of ∂ − SM and in particular, W | ∂ − SM = 0.
Using estimates (5.4)-(5.5)-(5.6) and U (n)-invariance of Frobenius norms gives: Combining this fact with (5.8), we arrive at and a similar bound for P V (U Φ − U Ψ ) L 2 . Obtaining a similar estimate for T (C Φ − C Ψ ), we arrive at Similar arguments using sup norms everywhere yield To proceed to higher-order derivatives, if P α = P α 1 V P α 2 T is a derivative of order |α|, where the right-hand-side involves derivatives of Φ of order at most |α|, and derivatives of U Φ −U Ψ of order at most |α|−1. Combining the estimates of Lemma 5.2 and an induction on k (whose formulation also involves control on P α 1 V P α 2 T (U Φ − U Ψ ) L 2 (SM ) for all α 1 + α 2 ≤ k, and where the commuting frame {X, P V , P T } avoids the proliferation of terms due to non-trivial commutators) proves the Proposition for higher-order derivatives.

5.3.1.
Setting, main results and proofs of Theorem 2.1 and Corollary 2.3. Before considering the non-linear inverse problem, we must establish a stability estimate for a linear inverse problem, that of reconstructing a function f ∈ C ∞ c (M, C n ) from its attenuated X-ray transform, where the attenuation is matrix-valued. Namely, given Φ a smooth skew-hermitian matrix field with compact support on M , we define I Φ f := u| ∂ + SM , where u : SM → C n is the unique solution to the problem Note that in this context, the function u is smooth on SM .
The injectivity of such a transform was proved in [30], and we now provide a stability estimate for it. Theorem 5.3. Let (M, g) be a simple Riemannian surface with boundary and Φ a smooth, skew-hermitian matrix field with compact support in M int . Then for any f ∈ C ∞ c (M ), we have the following stability estimate Remark 5.4 (Dependence of C 1 , C 2 ). The constants C 1 , C 2 only depend on the geometry of (M, g). The constant C 1 blows up like (β − 1) −1 , where β is the terminator constant of (M, g). This is one of the ways that this stability estimate ceases to hold as one approaches non-simplicity. The main other quantity appearing in C 1 , C 2 is w ∞ , the sup norm of the integrating factor defined below. The behavior of such a quantity, while finite on any simple surface, remains to be better understood.
On to the non-linear stability estimate, injectivity of the operator Φ → C Φ restricted to u(n)-valued fields was initially proved in [30], and Theorem 2.1 upgrades this result with a stability estimate. While the remaining sections will focus on the proof of Theorem 5.3, we now explain how this result implies Theorem 2.1. The main additional ingredient needed is a pseudolinearization identity, relating scattering data to attenuated X-ray transforms: Lemma 5.5 (Pseudo-linearization). Let (M, g) be a non-trapping surface with strictly convex boundary. For any Φ, Ψ ∈ C(M, C n×n ), the following relation holds where I Θ(Φ,Ψ) : L 2 (M, C n×n ) → L 2 (∂ + SM, C n×n ) is an attenuated X-ray transform with matrix field Θ(Φ, Ψ), an endomorphism of C n×n with pointwise action and thus by the definition of the attenuated X-ray transform, W | ∂ + SM = I Θ(Φ,Ψ) (Φ − Ψ). Since we also have by construction W | ∂ + SM = C Φ C −1 Ψ − id, identity (5.10) follows.
Proof of Theorem 2.1. Appealing to the pseudo-linearization (5.10), one may notice that if Φ, Ψ are skew-hermitian, then the field Θ(Φ, Ψ) is skew-hermitian when viewed as an endomorphism of C n×n . Moreover, since the entries of Θ(Φ, Ψ) are linear in the entries of Φ and Ψ, we directly have that with C a universal constant. Then relation (5.10), together with Theorem 5.3 immediately implies . This shows Theorem 2.1 when Φ, Ψ ∈ C ∞ c (M int , u(n)). Since all quantities involved above do not depend on derivatives of Φ, Ψ of order higher than 1, and C 1 , C 2 are independent of Φ, Ψ, approximating Φ, Ψ ∈ C 1 c (M int , u(n)) by sequences in C ∞ c (M int , u(n)) (and using Theorem 2.2) will yield the same stability estimate for C 1 matrix fields.
We also cover the proof of Corollary 2.3, based on the previous result and the forward estimate Theorem 2.2.

Proof of Corollary 2.3. It is enough to show that
To show this, we write at the pointwise level: To control first derivatives, take P = V or T , we have using triangle inequality and submultiplicativity. Squaring, taking the sup norm of |P C Ψ | F and integrating on ∂ + SM , we obtain . Combining the estimates for P = V and P = T we arrive at . Now using the forward estimate (2.3) with k = 1 and Φ ≡ 0 (thus C Φ = id), we deduce that H 1 , and taking squareroots yields (5.11) (using that √ 1 + x 2 /(1 + x) is uniformly bounded for x ∈ [0, ∞)).

5.3.2.
Proof of Theorem 5.3 -Main outline. As in [30], the main method of proof involves an energy identity (or Pestov identity), based on integrations by parts on SM . To do this, let us recall that with the inner product (u, v) defined in (5.1), and upon also denoting (u, v) ∂SM := ∂SM uvdΣ 2 , the following integrations by parts formulas holds for u, v ∈ C ∞ (SM, C n ): We will also use extensively the harmonic decomposition on the fibers of SM . Namely, the space L 2 (SM, C n ) decomposes orthogonally as a direct sum where H k is the eigenspace of −iV corresponding to the eigenvalue k. A function u ∈ L 2 (SM, C n ) has a Fourier series expansion Of special interest are the operators η ± := 1 2 (X ± iX ⊥ ), (5.13) with the property that η ± (Ω k ) ⊂ Ω k±1 for all k ∈ Z. For more details on the operators η ± and the Fourier expansion we refer to [17] where these tools were first introduced.
Definition 5.6. A function u : SM → C n is said to be holomorphic if u k = 0 for all k < 0. Similarly, u is said to be antiholomorphic if u k = 0 for all k > 0.
To control the terms involving the matrix field, one must introduce an artificial connection as we will see below. This first requires that we derive a Pestov identity for X-ray transforms with connection A and matrix 2 field Φ. Namely, given a skew hermitian pair (A, Φ) on the bundle M × C n and f ∈ C ∞ (M, C n ), we define I A,Φ f = u| ∂ + SM , where u is the unique solution to the problem While previous Pestov identities have been derived in [30], the present one accounts for nonzero boundary terms, and in particular reflects more precisely how the stability constant degrades as (M, g) approaches non-simplicity. This is captured by the concept of terminator constant β Ter : given a simple surface (M, g), there exists a number β Ter > 1 such that for any β ∈ (1, β Ter ], there exists a smooth function r = r β : SM → R, solution to the Ricatti type equation Xr + r 2 + βκ = 0.
Theorem 5.7. Let (M, g) a simple surface with boundary, with terminator constant β Ter > 1, and (A, Φ) a skew-hermitian pair on the bundle M × C n . Then for any u ∈ C ∞ (SM, C n ) and β ∈ (1, β Ter ], the following identity holds: (5.14) In the identity above, (5.15) and r is a smooth function on SM which only depends on the surface. The quantity F A is the curvature of the connection A, which upon a judicious choice of connection, can have a controlled sign. To achieve this, consider the scalar Hermitian connection a := iϕid, where ϕ is a smooth 1-form such that dϕ = ω g (the area form of the metric g). We choose a specific ϕ of the form ϕ = dh for h a real-valued function satisfying d dh = 1 with Neumann condition dh(ν) = 0 at the boundary. The latter condition implies that ∇ T,sa u = T u for any real s. Then we have with η ± defined in (5.13), and i F a = −1.
By [30], we can construct a holomorphic scalar function w ∈ C ∞ (SM ) satisfying Xw = −iX ⊥ h. Without loss of generality, w can be chosen even. The condition on w 0 reads η − (w 0 − h) = 0, for which it is sufficient to use w 0 = h. With this choice of a and s ∈ R, in what follows, we will denote G s := X + sa + Φ and G = G 0 . With w as above, we have G s u = e sw G(e −sw u). Moreover, w (the complex-conjugate of w) is antiholomorphic and solves Xw = +iX ⊥ h, so also G s u = e −sw G(e sw u).
Lastly, we will denote Π ± the projection onto positive and negative harmonics. Namely, Π ± u = ±k>0 u k . We have the following commutators formulas, for any u ∈ C ∞ (SM ): The following lemma will help us controlling u by versions of u which are conjugated by special integrating factors.
Lemma 5.8. With the holomorphic function e sw and antiholomorphic function e −s w and any s, s ∈ R, we have in particular we get the equality u = u 0 + Π − (e −sw Π − (e sw u))) + Π + (e s w Π + (e −s w u)).

(5.16)
Proof. We only prove Π − u = Π − (e −sw Π − (e sw u))), and the rest is similar. It is enough to notice that for any holomorphic function f , we have Π − (f u) = Π − (f Π − u), as this amounts to saying that the negative harmonics of f u do not depend on the nonnegative harmonics of u. This is immediate since Then we compute immediately Π − (e −sw Π − (e sw u))) = Π − Π − (e −sw e sw u) = Π − u, hence the result.

Outline of proof of Theorem 5.3
The initial transport equation, projected onto the harmonic term of degree 0, reads so that, in particular, The crux is then to find how to bound the quantities on the right by the boundary values of u. Using a Pestov identity with a special connection sa defined as above (and its holomorphic integrating factor e sw ), we show how to control the first term using control over Π − (e sw u)) for s > 0. Similar work can be done, to control the second term using control over Π + (e −s w u) for s < 0.
We first derive in Sec. 5.3.3 the identity: Since (η + − sa 1 )(e −sw ) only has strictly positive harmonic terms, the first term in the right-hand side of (5.18) only depends on Π − (e sw u). Upon defining v s := Π − (e sw u), the identity (5.18) reads Denoting w ∞ = sup SM |w|, we straightforwardly obtain the estimate 20) and control on η + u −1 + 1 2 Φu 0 2 will be obtained after controlling each term in the last right hand side. We first control (e s(w−w 0 ) u) 0

(5.21)
We then control v s 2 and G s V v s 2 by boundary terms via Pestov identity and setting up an appropriate threshold on s. To do this, we consider the transport problem for v s , written as: We then use the Pestov identity (5.14) for v s , with F sa = isid and d sa Φ = X ⊥ Φ: Before choosing s appropriately, we need additional work (tedious as in [30]) on the term (Φv s , G s v s ). Taking into account boundary terms, and upon defining B ±1 := η ± Φ, we prove in Sec. 5.3.3 that with e x (v) defined in (5.30). The last term in the sum will move to the right-hand side of (5.22), while the other two need to be controlled with a large s. To achieve this, we prove in Sec. 5.3.3 the following: Lemma 5.9. There exists a universal constant C > 0 such that for all s ≥ C|Φ| C 1 , In particular, for s = C|Φ| C 1 + 1, identity (5.22) becomes In fact, we can throw out the first and third terms of the left-hand side and obtain The second term in the left-hand side controls v s L 2 directly, and we can write with C some constant independent of Φ.

Remaining ingredients. Pestov identity with boundary term for ray transforms with skew-hermitian pairs
Let A and Φ a skew-hermitian pair, and define We have the following structure equations In what follows, we will need to integrate by parts with boundary terms, and using (5.12), we obtain for G: Proof of Theorem 5.7. We first write a differential identity using the structure equations (5.26): where GΦf := G(Φf ). We record this here as Now, considering u smooth and supported up the boundary, we write We now arrange the four boundary terms using integration by parts in V and the formulas First notice that We then obtain We now simplify, using that V (A)(x, v) = A(x, v ⊥ ) and µ ⊥ X + µX ⊥ = T , The boundary terms then simplify into With this notation, the full Pestov identity takes the form To recover [30,Eq. (8)], we take the real part of the equality above, and notice that ( Since the last term is purely imaginary, the real parts of the other terms agree, and upon taking the real part of (5.28), we obtain (5.29) (Note that the second boundary term is purely real so the is just ornamental) We finally explain how the index form term GV u 2 −(V u, κV u) can be rewritten as the sum of a non-negative term and a boundary term. With β Ter as in the statement, and the function r = r β : SM → R solving Xr + r 2 + βκ = 0, we now compute, for any ψ ∈ C ∞ (SM, C n ) We arrive at and we may rearrange this as β( Gψ 2 − (κψ, ψ)) = Gψ − rψ 2 + (β − 1) Gψ 2 + (µ rψ, ψ) ∂SM .

Remaining estimates and lemmata
Proof of equality (5.18). We write, using Lemma 5.8 To rewrite the last term, from the equation G s (e sw u) = −e sw f , note the relation Then we have, for k > 0, where we used the transport equation in the last line. For k = 0, (G s V Π − (e sw u)) 0 = −i(η + + sa 1 )(e sw u) −1 .
Plugging this back into the equation for η + u −1 , we get We now write Using the last two computations, we arrive at (5.18).
Proof of estimate (5.21). The transport equation for e sw u projected onto the harmonic term of degree −1 reads: For our choice of connection, a −1 = −η − w 0 so the left side can be rewritten as (η − + sa −1 )(e sw u) 0 = e sw 0 η − (e −sw 0 (e sw u) 0 ) = e sw 0 η − (e s(w−w 0 ) u) 0 , hence we obtain We then rewrite the latter right-hand side in terms of G s V v s . Notice that and thus Upon deriving an estimate of the form and (5.21) follows.
Proof of (5.23). We first need to write an integration by parts for µ ± defined in (5.13). Using integrations by parts (5.12) we first derive an integration by parts for X ⊥ = XV − V X: for any u, w smooth on SM , (X ⊥ u, w) + (u, X ⊥ w) = (XV u, w) − (V Xu, w) + (u, XV w) − (u, V Xw) = (XV u, w) + (V u, Xw) + (Xu, w) + (u, XV w) We now compute, using that µ * Similarly, for the skew-hermitian connection considered, Now, using the fact that we now prove by induction the following claim: (5.32) The case n = 1 is proved above, and the induction step (n =⇒ n + 1) follows from the calculation Putting this equality back into (5.32) proves the induction. Now since v s ∈ H 1 (SM ), we have that lim n→∞ p n = 0, and thus (5.23) follows.
Proof of Lemma 5.9. The term that ultimately controls everything is The infinite sum in (5.24) can then be controlled by with C 1 a universal constant. As for the last term of the left-hand side of (5.24), we write where C 2 is a universal constant. Lemma 5.9 follows upon taking C = C 1 + C 2 .
The overall strategy we pursue here, which has also been used in some form in [39,26,25,27], is to show first that the Bayesian algorithm recovers the 'regression function' C Φ consistently in a natural statistical distance function, and to combine this with quantitative stability estimates for the inverse map C Φ → Φ in appropriate metrics. This exploits crucially that the estimated Bayesian regression outputs lie in the (non-linearly constrained) range of the forward map C Φ , so that the stability estimate applies to them. To make this approach work with 'unbounded' Gaussian priors is challenging, and our proofs proceed as follows: We first establish the posterior contraction Theorem 5.10 under general conditions, borrowing from Bayesian nonparametric theory (e.g., [14,Theorem 8.19] or [16,Theorem 7.3.3]), slightly strengthening the usual statement of such theorems to give explicit exponential bounds for the convergence rate to zero of certain posterior probabilities. When the regression functions C Φ are uniformly bounded, the usual Hellinger distance occurring in such contraction theorems is Lipschitz-equivalent to the standard L 2 -distance, and a version for SO(n)-valued matrix fields of such a result is given in Lemma 5.11. Then Lemma 5.13 uses results of [22] to show that the key small ball condition in Theorem 5.10 can be verified for the Gaussian priors from Condition 3.1 even after they have been shrunk towards zero, if the true matrix field Φ 0 belongs to the RKHS H. Next, Lemma 5.14 exploits fine properties [4,13] of infinite-dimensional Gaussian measures to show that such 'shrunk' priors charge 'sufficiently regular' matrix fields (effectively C β -balls) with probability close enough to one that the posterior distributions inherits these regularity properties. This is crucial to apply the 'forward' estimate Theorem 2.2 and the 'stability' estimate (2.4) in the proof of Theorem 5.16 -effectively the specific structure of our inverse problem enters only in this theorem and only through these two estimates. Finally, the exponential convergence to zero of the order e −(C+3)N δ 2 N obtained in Theorem 5.16 permits a 'quantitative uniform integrability argument' in Section 5.4.5 to deduce convergence of the whole posterior (Bochner-) mean towards the true matrix field Φ 0 .

5.4.1.
A general contraction theorem. Consider a collection P of probability density functions on some measurable space (X , A) with respect to a dominating measure µ, specifically in our measurement model (1.3) we take where X is equipped with its natural product Borel-σ algebra A, where dµ = dy × dλ with dy equal to Lebesgue measure on R n×n and λ given in (1.5). By the Gaussianity of the ε 1,j,k 's these probability densities are (5.33) Since the map (Φ, y, (x, v)) → p Φ (y, (x, v)) is jointly Borel-measurable from C(M ) × X to R (using (2.3) and that point-evaluation is · ∞ -continuous), the posterior distribution exists by standard arguments ( [14], p.7) and has form (3.2). In the proof of the following theorem we show in particular that the marginal density is positive on events of P N Φ 0 -probability approaching one, so that (3.2) is well-defined also in the frequentist setting where D N ∼ P N Φ 0 . We also define the Hellinger distance h on such densities by Denote by N (F, h, δ) the minimal number of Hellinger-balls of radius δ required to cover a set F of µ-densities on X . We then have the following Theorem 5.10. Consider a prior for Φ arising from a sequence Π = Π N of Borel probability measures on F ⊆ C(M, so(n)) and let Π( Suppose for some constant C > 0 the prior Π satisfies for all N large enough and that for some sequence F N ⊂ F of approximating sets for which we have the complexity bound The proof proceeds as in the proof of [16,Theorems 7 Moreover using (5.38) and [16,Theorem 7.1.4] with choices ε 0 = m δ N , any m < m, and log N (ε) = cN δ 2 N constant in ε > ε 0 , we deduce that for every k > 1 there exists m , m large enough such that we can find 'tests' (random indicator functions) Ψ N = Ψ N (D N ) for which for the event whose posterior probability we want to bound. Then by (3.2) and as N → ∞, By Markov's inequality, decomposinḡ , and using Fubini's theorem as well as we further bound the last probability as where we have used (5.37) and (5.41) with k and then m large enough.
The 'information-theoretic distance' h arises naturally in such posterior contraction theorems, see [14]. The following lemma, which adapts a result due to Birgé [2] to the setting of SO(n)-valued functions, shows that the Hellinger distance is Lipschitz equivalent to the standard L 2 -metric Lemma 5.11. For Φ ∈ C(M, so(n)), let C Φ : ∂ + SM → SO(n) be its non-Abelian X-ray transform. Then there exist positive constants c 0 = c 0 (n), c 1 = c 1 (n) such that Proof. Write for the 'Hellinger affinity'. By (5.33) and using the standard formula for the moment generating function of N (0, 1)-variables, the quantity ρ(p Φ , p Ψ ) equals By Jensen's inequality the last integral is greater than or equal to exp{− C Φ − C Ψ 2 L 2 /8} and using standard inequalities for 1 − e −z , z > 0, the right hand side of (5.43) follows. Next we notice that since C Φ (x, v), C Ψ (x, v) ∈ SO(n), their matrix entries are all bounded by one and we hence have We can thus proceed exactly as in the proof of [2, Proposition 1] to also deduce the left hand side inequality in (5.43).

5.4.2.
Verification of the prior mass condition. We now verify condition (5.36) in the last theorem for an explicit constant C > 0 and the Gaussian prior from Theorem 3.2. To do this we first show that one can reduce to checking small ball conditions for · L 2 (M ) -norms on the level of the original matrix parameter Φ.
Therefore, since E 1 ε ε 1,j,k = 0 and λ is the unit volume measure on ∂ + SM , where we have used the forward estimate (2.2). Thus if κ ≥ 2/C 2 1 the first inequality defining B N is verified for Φ ∈ B N (κ). To verify the second, note that all C Φ (x, v) ∈ SO(n) are bounded in · L ∞ (∂ + SM ) -norm by some fixed constant B = B(n). Thus for some constant c(n) > 0, where we have used that ε j,k ∼ i.i.d. N (0, 1) implies and again (2.2), so that the overall result follows from appropriate choice of κ > 0 We now turn to lower bound the small ball probabilities Π(B N (κ)) for the prior Π featuring in Theorem 3.2 where for the given α > 2 we will choose (5.45) δ N = N −α/(2α+2) so that Note that √ N δ N precisely equals the rescaling of the prior in (3.4). Let us recall the base RKHS H from Condition 3.1.
Lemma 5.13. Let Π = ⊗n j=1 Π B be the prior for Φ from Theorem 3.2, assume Φ 0 ∈ H is supported in M 0 and choose δ N as in (5.45). Let B N (κ) be as in Lemma 5.12. Then for every κ > 0 there exists a constant C = C (κ, α, Φ 0 H , n) such that for every N ∈ N, Π(B N (κ)) ≥ exp{−C N δ 2 N }. In particular, for B N as in (5.35) in Theorem 5.10, there exists a finite constant such that for every N ∈ N, where we have used that Φ 0 and hence B 0,j is supported in M 0 where ζ = 1 so that Indeed, since the simple surface M is diffeo-morphic to the unit disk, we can extend elements of H α M (M ) by zero to the torus (0, 1] 2 to define elements of H α ((0, 1] 2 ) with Sobolev-norm increased by at most a fixed multiplicative constant. The required bound is then given in [16, (4.184)].
To proceed, we can now use (5.45) and [22,Theorem 1.2] (with the value of α there equal to our 2/α) to lower bound the last small ball probability by (5.48) for constants c = c(α), c 0 = c 0 (κ, n, α). Combining what precedes proves the first inequality of the lemma with The second inequality (5.46) follows from the first and Lemma 5.12.
We note that the proof in fact shows that the constant C depends only on upper bounds for Φ 0 ∞ , Φ 0 H .

5.4.3.
Excess mass and complexity condition. Having determined the constant C in (5.36) for the Gaussian prior in Theorem 3.2, we now turn to verifying the remaining conditions (5.37) and (5.38) in Theorem 5.10 for a suitable choice of F N that will provide sufficient regularity of the posterior distribution to combine it with our stability estimates for the map Φ → C Φ .
Lemma 5.14. Let Π be the prior from Theorem 3.2 with α > β > 2 and let M be a compact subset of the interior of M such that ξ = 0 outside of M . Let δ N be as in (5.45) and assume N δ 2 N ≥ 1. For m > 0 define subsets of C(M, so(n)) as We first turn to F N,2 . By Condition 3.1 and definition of ξ, the vector field (f 1 , . . . ,fn) defines a Gaussian Borel random variable in a separable linear subspace S of ⊗n j=1 C β (M ). By the Hahn-Banach theorem its ⊗n j=1 C β (M )-norm can then be represented as a countable supremum (f 1 , . . . ,fn) ⊗n j=1 C β (M ) = sup t∈T |t(f 1 , . . . ,fn)| of bounded linear real functionals T = (t m : m ∈ N) defined on (S, · ⊗n j=1 C β (M ) ). We then apply a version of Fernique's theorem [13], concretely [16,Theorem 2.1.20], to the centred Gaussian process (X(t) := t(f 1 , . . . ,fn) : t ∈ T ) to deduce that for some fixed constant D > 0, where OH is the unit ball in ⊗n j=1H and where we definē By Borell's [4] isoperimetric inequality (see [16,Theorem 2.6.12]) the last probability is bounded below by (5.50) Φ Φ −1 (Πn(Ā N )) + m N where Φ = Pr(Z ≤ ·) is the cumulative distribution function of a N (0, 1) random variable Z. By the same arguments as those leading to (5.48) above, we havē Πn(Ā N ) ≥ exp{−c 2 2 N δ 2 N } for c 2 = c 2 (n, α), and using the basic inequality Φ −1 (u) ≥ − 2 log − u, 0 < u < 1, (see [14,Lemma K.6]) and monotonicity of Φ we can further lower bound the last quantity by b) To prove Part b), note first that to construct a δ N -covering of F N in · L 2 (M )distance it suffices, by definition of F N , to construct such a covering for a H α (M )-ball of radius m, so that (5.47) gives (5.51) log N (F N , · L 2 (M ) , δ N ) ≤ (A /δ N ) 2/α ≤ bN δ 2 N for some b = b(m, α) > 0. Lemma 5.11 and (2.2) imply that such a covering induces a (C 1 √ c 1 )δ N -covering of F N in the Hellinger distance h of log-cardinality at most bN δ 2 N . Since · L 2 (M ) is a norm and hence homogeneous, we can increase the constant from b to c = c(b, c 1 , C 1 , α) in (5.51) and obtain a δ n -covering for h. The desired inequality in Part b) follows.
Remark 5.15. We note that the introduction of the set F N,1 and the use of Borell's inequality in the previous Lemma could be avoided if one wishes to prove Theorem 3.2 only for any η > 0 (in this case a minor adaptation of Theorem 5.10 and of (5.51) can be shown to give a slightly worse rate δ N = N −β/(2α+2) in (5.52) below). We give this argument however to obtain our sharper bound for η in (5.55).

5.4.4.
Final contraction theorem. We now put everything together to establish a posterior contraction theorem for Φ and subsequently deduce Theorem 3.2 from it.
Then, to prove the second limit we will apply the stability estimate from Theorem 2.1 in the form (2.4) with Ψ = Φ 0 . By hypothesis we have Φ 0 C 1 (M ) Φ 0 C α (M ) < ∞; as a consequence for all Φ contained in the event in (5.52), the constants c(Φ, Φ 0 ) from (2.1) are uniformly bounded by a fixed constant that depends on m , Φ 0 C 1 (M ) and hence for those Φ's in view of Theorem 2.2 and since Φ 0 ∈ C α for α >β by hypothesis. Hence for such Φ's the combination of (5.53) and (5.54) gives Then by the inequalities of Jensen and Cauchy-Schwarz and it suffices to show that the second summand is stochastically O(η N ) as N → ∞.
Arguing as in the proof of Theorem 5.10 and using Lemma 5.13 implies that the sets A N from (5.40) with C from (5.46) satisfy P N Φ 0 (A N ) → 1 as N → ∞. Now Theorem 5.16, (3.2) and Markov's inequality imply where we have also used Fubini's theorem, (5.42), and that the Gaussian measure Π is supported in L 2 (M ) and hence integrates Φ 2 L 2 to a finite constant (see, e.g., [16, Exercise 2.1.5]).