On the extremal points of the ball of the Benamou-Brenier energy

In this paper we characterize the extremal points of the unit ball of the Benamou--Brenier energy and of a coercive generalization of it, both subjected to the homogeneous continuity equation constraint. We prove that extremal points consist of pairs of measures concentrated on absolutely continuous curves which are characteristics of the continuity equation. Then, we apply this result to provide a representation formula for sparse solutions of dynamic inverse problems with finite dimensional data and optimal-transport based regularization.


Introduction
The classical theory of Optimal Transport deals with the problem of efficiently transporting mass from a probability distribution into a target one. In the last thirty years, great advances in the understanding of the underlying theory have been achieved [2,46,49]. However, only recently these techniques are starting to be applied in order to solve computational problems in a great variety of fields, with logistic problems [8,18,19,20], crowd dynamics [37,38], image processing [29,35,39,41,44,47,48], inverse problems [16,32] and machine learning [5,27,28,40,45,52] being a few examples. In this paper we focus on the so-called Benamou-Brenier formula, which provides an equivalent dynamic formulation of the classical Monge-Kantorovich transport problem [31]. Introduced by Benamou and Brenier in [6], such formula allows to compute an optimal transport between two probability measures ρ 0 and ρ 1 on a closed bounded domain Ω ⊂ R d through the minimization of the kinetic energy among all the pairs (ρ t , v t ), where t → ρ t is a curve of probability measures on Ω, v t : Ω → R d is a time-dependent vector field and the pair (ρ t , v t ) satisfies distributionally the continuity equation (2) ∂ t ρ t + div(ρ t v t ) = 0 subjected to ρ t=0 = ρ 0 , ρ t=1 = ρ 1 .
The interest around the Benamou-Brenier formulation is motivated by its remarkable properties. First, it allows to compute an optimal transport in an efficient way [6] by means of a convex reformulation of (1), by introducing the momentum m t = ρ t v t . More precisely, denoting by X := (0, 1) × Ω the time-space cylinder, the Benamou-Brenier energy (1)  In addition, the dynamic nature of the Benamou-Brenier reformulation of optimal transport is at the core of many recent developments in the fields of PDEs, optimal transport and inverse problems. Indeed, the dynamic formulation allows to endow the space of probability measures with a differentiable structure [2,49], making possible the characterization of differential equations as gradient flows in spaces of measures [4,30,43] or the derivation of sharp inequalities [23,36,42]. Moreover, it motivated recent developments in unbalanced optimal transport theory [21,22,33,34], i.e., when the marginals are arbitrary positive measures. Finally, as the Benamou-Brenier energy provides a description of the optimal flow of the transported mass at each time t, which is a valuable information in applications, it was recently employed as a regularizer for variational inverse problems [12,13,16,29,35,50].
The goal of this paper is to characterize the extremal points of the unit ball of the Benamou-Brenier energy B at (3), and of a coercive version of it, which is obtained by adding the total variation of ρ to B. Both functionals are constrained via the continuity equation (2). Precisely, we introduce the functional (5) J α,β (ρ, m) := βB(ρ, m) + α ρ M(X) subjected to ∂ t ρ + div m = 0 , defined for all (ρ, m) ∈ M and α ≥ 0, β > 0. We then characterize the extremal points of the subset of M defined by We emphasize that we do not enforce initial conditions to the continuity equation in (5). To be more specific, we prove the following result (see Theorem 6).
Theorem. Let α ≥ 0, β > 0. The extremal points of the set C α,β are exactly given by the zero measure (0, 0) and the pairs of measures (ρ, m) such that where γ : [0, 1] → Ω is an absolutely continuous curve with weak derivativeγ ∈ L 2 , and such that a γ < +∞. If α = 0 the condition a γ < +∞ is satisfied if and only if γ is not constant.
We therefore show that the extremal points of the set C α,β are pairs of measures (ρ, m), with ρ concentrated on some absolutely continuous curve γ in Ω, and the density of m with respect to ρ is given byγ . Notice that such conditions are equivalent to the existence of a measurable field v : X → R d such that (6)γ(t) = v(t, γ(t)) for a.e. t ∈ (0, 1) , thus showing that γ is a characteristic associated to the continuity equation at (2) with respect to the field v. We prove the above Theorem in Section 3, with the aid of a probabilistic version of the superposition principle for positive measure solutions to the continuity equation (2) on the domain (0, 1) × Ω (see Theorem 7). We mention that the ideas behind such superposition principle are not new, and they were originally introduced in [3] for positive measures on (0, 1) × R d (see also [2,7,51]). The result of Theorem 7 allows to decompose any measure solution (ρ, m) of the continuity equation (4) with bounded Benamou-Brenier energy, as superposition of measures concentrated on absolutely continuous characteristics of (4), that is, curves solving (6) with v = dm/dρ. As a consequence, we show any pair of measures that is not of such a form can be written as a proper convex combination of elements of C α,β and thus it is not an extremal point. The opposite inclusion follows from the convexity of the energy and the properties of the continuity equation.
The interest in characterizing extremal points of the Benamou-Brenier energy is not only theoretical. It has been recently shown in [15] and [11] that in the context of variational inverse problems with finite-dimensional data, the structure of sparse solutions is linked to the extremal points of the unit ball of the regularizer. In the classical theory of variational inverse problems one aims to solve where U is the target space, R is a convex regularizer, A is a linear observation operator mapping to a finite-dimensional space and y is the observation. It has been empirically observed that the presence of the regularizer R is promoting the existence of sparse solutions, namely minimizers that can be represented as a finite linear combination of simpler atoms. While this effect has been well-understood in the case when U is finite dimensional, the infinite-dimensional case has been only recently addressed [11,15,24,25,54,53,55]. In particular, in [11,15], it has been shown that, under suitable assumptions on R and A, there exists a minimizer of (7) that can be represented as a finite linear combination of extremal points of the unit ball of R; namely the atoms forming a sparse solution are the extremal points of the ball of the regularizer. In view of the above discussion, in Section 4 we apply our characterization of the extremal points of the energy J α,β at (5) to understand the structure of sparse solutions for inverse problems with such transport energy acting as regularizer. We mention that the analysis is carried out for the case α > 0, as the functional J 0,β , corresponding to the rescaled Benamou-Brenier energy, lacks of compactness properties (see Remark 1). We verify that the assumptions needed to apply the representation theorems in [15] and [11] are satisfied by J α,β , and consequently we deduce the existence of a minimizer that is given by a finite linear combination of measures concentrated on absolutely continuous curves in Ω (see Theorem 10). As a specific application of Theorem 10 we consider the setting introduced in [16], where the regularizer J α,β is coupled with a fidelity term that penalizes the distance between the unknown measure ρ t computed at t 1 , . . . , t N ∈ (0, 1), and the observation at such times (see Section 4.2). This setting is relevant for applications, such as variational reconstruction in undersampled dynamic MRI. Employing the previous results we are able to prove the existence of a sparse solution represented with a finite linear combination of measures concentrated on absolutely continuous curves in Ω (see Corollary 12).
To conclude, we mention that characterizing the extremal points for a given regularizer has important consequences in devising algorithms able to compute a sparse solution. Notable examples have been proposed for the total variation regularizer in the space of measures [10,17] using so-called generalized conditional gradient methods (or Frank-Wolfe-type algorithms [26]). Inspired by the previous methods, and building on the theoretical results obtained in the present paper, we plan to develop numerical algorithms to compute sparse solutions of dynamic inverse problems with the optimal transport energy J α,β as a regularizer [12,13], effectively providing a numerical counterpart to the theoretical framework established in [16]. Finally, we remark that similar results to the ones presented in this paper can be obtained for unbalanced optimal transport energies. This has been recently achieved in [14], by introducing a novel superposition principle for measure solutions to the inhomogeneous continuity equation.

Mathematical setting and preliminaries
In this section we give the basic notions about the continuity equation, the Benamou-Brenier energy, and its coercive version J α,β anticipated in the introduction. We refer to [2,6,16] for a more detailed overview. For measure theoretical notions, we refer to the definitions in [1]. Given a metric space where the solutions of the continuity equation are intended in a distributional sense, that is, We remark that the above weak formulation includes no-flux boundary conditions for the momentum m on ∂Ω. Also, no initial and final data is prescribed in (8). Moreover, by standard approximation arguments, we can consider in (8) test functions in C 1 c (X) (see [2, Remark 8.1.1]). We now introduce the Benamou-Brenier energy. For this purpose, define the convex, lower semicontinuous and one-homogeneous map Ψ : Since Ψ is one-homogeneous, the above representation of B does not depend on λ. For some fixed α ≥ 0, β > 0, we consider the following functional where · M(X) denotes the total variation norm in M(X).
Remark 1. Note that in the definition of J α,β we add the total variation of ρ to the Benamou-Brenier energy. If α > 0 this choice enforces the balls of the energy J α,β to be compact in the weak* topology of M (see Lemma 4). As a consequence, the functional J α,β is a natural regularizer for dynamic inverse problems when the initial and final data are not prescribed [16]. We remark that, although in the case α = 0 the unit ball of the energy J 0,β is not compact, we can still characterize its extremal points. However, in this case, due to the lack of coercivity, J 0,β has limited use as a regularizer for dynamic inverse problems.
For a measure ρ ∈ M(X), we say that ρ disintegrates with respect to time if there exists a Borel family of measures We denote such disintegration with the symbol ρ = dt ⊗ ρ t . Further, we say that a curve of is continuous for each fixed ϕ ∈ C(Ω). The family of narrowly continuous curves will be denoted by C w ([0, 1]; M(Ω)). We also introduce C w ([0, 1]; M + (Ω)), as the family of narrowly continuous curves with values into the positive measures on Ω. We now recall several results about B, J α,β and measure solutions of the continuity equation (8), which will be useful in the following analysis. For proofs of such results, we refer the interested reader to Propositions 2.2, 2.4, 2.6 and Lemmas 4.5, 4.6 in [16].
Lemma 2 (Properties of B). The functional B defined in (9) is convex, positively one-homogeneous and sequentially lower semicontinuous with respect to the weak* topology on M. Moreover it satisfies the following properties: Lemma 3 (Properties of the continuity equation). Assume that (ρ, m) ∈ M satisfies (8) and Lemma 4 (Properties of J α,β ). Let α ≥ 0, β > 0. The functional J α,β is non-negative, convex, positively one-homogeneous and sequentially lower semicontinuous with respect to weak* convergence on M.

Characterization of extremal points
The aim of this section is to characterize the extremal points of the unit ball of the functional J α,β at (10) for all α ≥ 0, β > 0, namely, of the convex set To this end, let us first introduce the following set.
We remind that AC 2 ([0, 1]; R d ) denotes the space of absolutely continuous curves having a weak derivative in L 2 . We point out that by definition a γ > 0 for all choices of α ≥ 0, β > 0. Moreover the condition a γ < +∞ is always satisfied if α > 0. When α = 0 we instead have a γ < +∞ if and only if 1 0 |γ(t)| 2 dt > 0, that is, the set C 0,β does not contain constant curves. For the extremal points of C α,β we have the following characterization.
The proof of Theorem 6 is postponed to Section 3.2. In order to show the inclusion Ext(C α,β ) ⊂ {(0, 0)}∪C α,β we will make use of a superposition principle for measure solutions of the continuity equation (8). This result is not new, and it is proved in [2,Ch 8.2] for the case Ω = R d . In Section 3.1 we show that it also holds for bounded closed domains.
3.1. The superposition principle. Before stating the superposition principle in Ω, we introduce the following notation. Let Γ := γ : [0, 1] → R d : γ continuous be equipped with the supremum norm, i.e., γ ∞ := max t∈[0,1] |γ(t)|. For every fixed t ∈ [0, 1] let e t : Γ → R d be the evaluation at t, that is, e t (γ) := γ(t). Notice that e t is continuous. For a measurable vector field v : (0, 1) × R d → R d , we define the following subset of Γ consisting of AC 2 curves solving the ODE (6) in the sense of Carathéodory: Moreover define the set of solutions to the ODE which live inside Ω for all times: The superposition principle for probability solutions to (8) states as follows.
Theorem 7. Let t ∈ [0, 1] → ρ t ∈ P(Ω) be a narrowly continuous solution of the continuity equation in the sense of (8), for some measurable v : (0, 1) × Ω → R d such that Then there exists a probability measure σ ∈ P(Γ) concentrated on Γ v (Ω) and such that ρ t = (e t ) # σ for every t ∈ [0, 1], that is, Proof. Letv : (0, 1) × R d → R d be the extension to zero of v to the whole R d . Similarly, for each t ∈ [0, 1], letρ t ∈ P(R d ) be the extension to zero of ρ t in R d . Note that the pair (ρ,vρ) is a solution of the continuity equation in (0, 1) × R d in the sense of (8). Moreoverρ andv satisfy (14) in (0, 1) × R d . Therefore we can apply Theorem 8.2.1 in [2] and obtain a probability measure σ ∈ P(Γ) concentrated on Γv(R d ) and such thatρ t = (e t ) # σ for all t ∈ [0, 1], that is, We claim that σ is concentrated on Γ v (Ω). In order to show that, partition Γv(R d ) into Notice that, since Ω c is open and v ≡ 0 in Ω c , the curves in A are constant, so that we can write From this, it follows that A ⊂ e −1 0 (Ω c ). Moreover, (16) impliesρ 0 (Ω c ) = σ(e −1 0 (Ω c )). Therefore, using thatρ t is concentrated on Ω, we conclude that σ(A) = 0, showing that σ is concentrated on Γv(Ω). Finally, (16) implies (15) sinceρ t is supported in Ω and it coincides with ρ t in Ω. Also Γv(Ω) = Γ v (Ω) by definition ofv, thus concluding the proof.
As (ρ j , m j ) are solutions of the continuity equation and J α,β (ρ j , m j ) = 1, from Lemma 3 it follows that the maps t → (e t ) # σ j are narrowly continuous. In particular, (35) holds for each t ∈ [0, 1]. However, by (27) and by definition of A, σ 1 , σ 2 , we have which contradicts (35). Therefore (ρ 1 , m 1 ) = (ρ 2 , m 2 ), which shows that the decomposition (31) is non-trivial. This is a contradiction, since we are assuming that (ρ, m) is an extremal point for C α,β . Thus the claim follows. Suppose now that α = 0 and define the set Notice that Z is measurable, due to the measurability of the map L at (28). We claim that σ(Z) = 0. In order to prove that, let Z c := Γ Z and define the measures σ Z := σ Z, σ Z c := σ Z c , so that σ = σ Z + σ Z c . Recalling that ρ t = (e t ) # σ for all t ∈ [0, 1], we can decompose Let j = 1, 2. Notice that ρ j ∈ M + (X) since σ is a positive measure concentrated on Γ v (Ω) and a > 0. Following similar computation as (33)- (34) we infer that (ρ j , m j ) solves the continuity equation in the sense of (8). Moreover, by definition of Z and the fact that σ is concentrated on Γ v (Ω) we obtain where we employed Fubini's Theorem, which holds thanks to the measurability of the map L and the identity (30), the latter implying boundedness of the last term in (37). By (37) and arguing as in (29)- (30), it is immediate to check that J 0,β (ρ j , m j ) = J 0,β (ρ, m). Recalling (25) we then obtain (ρ j , m j ) ∈ C 0,β . As (ρ, m) is an extremal point of C 0,β , from (36) we deduce that (ρ 1 , m 1 ) = (ρ 2 , m 2 ) and thus dt ⊗ (e t ) # σ Z = 0. In particular, there existst ∈ [0, 1] such that (et) # σ Z = 0. Hence for every E ⊂ Γ measurable, by the positivity of σ, we have (27) and the definition of λ 1 , λ 2 , we conclude that λ 1 , λ 2 > 0. With this property established, the claim that supp ρ t is a singleton for each t ∈ [0, 1] follows by repeating the same arguments of the case α > 0, employing the decomposition of (ρ, m) as in (31).
4. Application to sparse representation for inverse problems with optimal transport regularization In this section we deal with the problem of reconstructing a family of time-dependent Radon measures given a finite number of observations. To be more specific, let H be a finite dimensional Hilbert space and A : C w ([0, 1]; M(Ω)) → H be a linear continuous operator, where continuity is understood in the following sense: given a sequence (t → ρ n t ) in C w ([0, 1]; M(Ω)), we require that where, with a little abuse of notation, we will denote by ρ n both the curve t → ρ n t , as well as the measure ρ n := dt ⊗ ρ n t . For some given data y ∈ H, we aim to reconstruct a solution ρ ∈ C w ([0, 1]; M(Ω)) to the dynamic inverse problem (41) Aρ = y .
For α > 0 and β > 0 we regularize the above inverse problem by means of the energy J α,β defined in (10), following the approach in [16]. In practice, upon introducing the space we consider the Tikhonov functional G : M → R ∪ {+∞} defined as where F : H → R ∪ {+∞} is a given fidelity functional for the data y, which is assumed to be convex, lower semicontinuous and bounded from below. Additionally, we assume that G is proper. We then replace (41) by Remark 8. Two common choices for the fidelity term F in the case H = R k are, for example, i) F (x) = I {y} (x) for a given y ∈ R k that forces the constraint Aρ = y, ii) F (x) = 1 2 x − y 2 2 that recovers a classical l 2 penalization. Remark 9. Under the above assumptions on A and F , problem (43) admits a solution. Indeed, since G is proper, any minimizing sequence {(ρ n , m n )} n is such that {G(ρ n , m n )} n is bounded. As F is bounded from below and J α,β ≥ 0, we deduce that {J α,β (ρ n , m n )} n is bounded. Therefore, Lemma 4 implies that (ρ n , m n ) converges (up to subsequences) to some (ρ, m) ∈ M, in the sense of (13). By weak* lower semicontinuity of J α,β in M (see Lemma 4) and by (40) together with the lower-semicontinuity of F , we infer that (ρ, m) solves (43).
It is well-known that the presence of a finite-dimensional constraint in an inverse problem, such as (41), promotes sparsity in the reconstruction. This observation has been recently made rigorous in [15] and [11], where it has been shown that the atoms of a sparse minimizer are the extremal points of the ball of the regularizers. In Theorem 6, we provided a characterization for the extremal points of the ball of J α,β . Therefore, specializing the above-mentioned results to our setting yields the following characterization theorem for sparse minimizers to (43). Theorem 10. Let α, β > 0. There exists a minimizer (ρ,m) ∈ M of (43) that can be represented as (44) ( ∈ Ω for each t ∈ [0, 1], and a −1 γ i := β 2 1 0 |γ i | 2 dt + α. In other words, the above theorem ensures the existence of a minimizer of (43) which is a finite linear combination of measures concentrated on the graphs of AC 2 -trajectories contained in Ω. In Section 4.1 we give a proof of Theorem 10, and we conclude the paper with Section 4.2, where we apply the sparsity result of Theorem 10 to dynamic inverse problems with optimal transport regularization, following the approach of [16].
4.1. Proof of Theorem 10. As already mentioned, the proof is an immediate consequence of Theorem 6 and a particular case of [11,Corollary 2] (see also [11,Theorem 1]). Before proceeding with the proof, for the reader's convenience, we recall Corollary 2 from [11]. The definitions appearing in the statement below will be briefly recalled in the proof of Theorem 10.
Suppose that the set of minimizers of (45), denoted by S, is non-empty. Additionally, assume that there existsû ∈ Ext(S) such that the set is linearly closed, the lineality space of C is {(0, 0)} and inf u∈U R(u) < R(û). Then, exactly one of the following conditions holds: i)û is a convex combination of at most dim(H) extremal points of C, ii)û is as a convex combination of at most dim(H) − 1 points, which are either extremal points of C, or belong to an extreme ray of C.
Proof of Theorem 10. We just need to verify that we can apply Theorem 11 to the variational problem (43). So, we choose U = M , R = J α,β and F and A satisfying the assumptions stated above. Let S be the set of solutions to (43). First, notice that in Remark 9 we have already shown that the set of minimizers for (43) is non-empty, so that S = ∅. Moreover S is compact with respect to the weak* topology. Indeed, given a sequence (ρ n , m n ) in S we can use Lemma 4 to extract a subsequence (not relabelled) such that (ρ n , m n ) * ⇀ (ρ, m) in M and ρ n t * ⇀ ρ t in M(Ω) for every t ∈ [0, 1]. Using the sequential lower semicontinuity of J α,β with respect to weak* convergence combined with the continuity of A (according to (40)) and the lower semicontinuity of F , we obtain (ρ, m) ∈ S. We conclude that S is sequentially weakly* compact and hence weakly* compact, thanks to the metrizability of the weak* convergence on bounded sets. Finally note that S is convex thanks to the convexity of F and J α,β (Lemma 4). By Krein-Milman's Theorem, we then infer the existence of a (ρ,m) ∈ Ext(S). The lineality space of C is defined as lin(C) = rec(C) ∩ (−rec(C)), where rec(C) is the recession cone of C defined as the set of all (ρ, m) ∈ U such that C + R + (ρ, m) ⊂ C. Hence, from the coercivity of J α,β in Lemma 4 it is immediate to conclude that lin(C) = {(0, 0)}. Moreover, C is linearly closed if the intersection of C with every line is closed. It is easy to verify that, as C is weakly* closed (Remark 9), it is also linearly closed. Finally, the assumption inf (ρ,m)∈ M J α,β (ρ, m) < J α,β (ρ,m) is satisfied whenever (ρ,m) = 0, as in this case J α,β (ρ,m) > 0, while inf (ρ,m)∈ M J α,β (ρ, m) = 0. Hence, the hypotheses of Theorem 11 for the functional (42) are verified. Notice also that C does not contain extreme rays. In order to prove that, we first recall that a ray of C is any set of the form r p,v = {p + tv : t > 0} for p, v ∈ C, v = 0. An extreme ray of C is a ray r p,v such that for every segment intersecting r p,v , the whole segment is contained in r p,v . Thanks to the coercivity of J α,β in Lemma 4, it is immediate to see that C contains no rays and thus no extreme rays. Hence, from either of the conclusions i), ii) in Theorem 11, we deduce that there exists a minimizer (ρ,m) ∈ M of (43) that can be represented as (47) (ρ,m) = where (ρ i , m i ) ∈ Ext(C α,β ), p ≤ dim(H), c i > 0 and p i=1 c i = J α,β (ρ,m). We remark that if (ρ,m) = 0, the assumption inf (ρ,m)∈ M J α,β (ρ, m) < J α,β (ρ,m) in Theorem 11 is not satisfied, but the representation (47) holds trivially. Using the characterization of extremal points in Theorem 6 and (47), we obtain an explicit sparse representation for solutions of (43) and the proof is achieved.

4.2.
Dynamic inverse problems. Theorem 10 provides a representation formula for sparse solutions of (43) that holds for every A and F satisfying the above-stated hypotheses. A relevant choice for A and F is proposed in [16] as a model for dynamic inverse problems: in particular, the authors apply their framework to variational reconstruction in undersampled dynamic MRI. In what follows we make an explicit choice of F and A in order to apply Theorem 10 to a special case of the framework in [16], namely the case of discrete time sampling, and finite-dimensionality of the data for each sampled time.
To be more specific, consider a discretization of the interval [0, 1] in N points t 1 < t 2 < . . . < t N and assume that we want to reconstruct an element of C w ([0, 1]; M(Ω)), by only making observations at the time instants t 1 , . . . , t N . To this aim, let H t i be a family of finite-dimensional Following [16], we regularize the above problem by Notice that A is continuous in the sense of (40), thanks to the assumptions on A t i . We can then equivalently rewrite (48) as In this way, we recover a problem of the type of (43), where F (x) := 1 2 x − y 2 H . Notice that F is convex, lower semicontinuous and bounded from below. Moreover, the functional in (49) is proper, since J α,β (0, 0) = 0. Hence, we can apply Theorem 10 to conclude the following result. Corollary 12. Let α, β > 0. There exists a minimizer (ρ,m) ∈ M of (48) that can be represented as (50) (ρ,m) = where γ i ∈ AC 2 ([0, 1]; R d ) with γ(t) ∈ Ω for each t ∈ [0, 1], and a −1 γ i := β 2 1 0 |γ i | 2 dt + α.
Remark 13. The upper bound p ≤ N i=1 dim(H i ) in the representation formula (50) might not be optimal. However, a careful analysis of the faces of the ball of the Benamou-Brenier energy, possibly under additional assumptions on the operator A and fidelity term F , could be needed to substantiate such conjecture. We leave this question open for future research.