A survey of Optimal Transport for Computer Graphics and Computer Vision

Optimal transport is a long‐standing theory that has been studied in depth from both theoretical and numerical point of views. Starting from the 50s this theory has also found a lot of applications in operational research. Over the last 30 years it has spread to computer vision and computer graphics and is now becoming hard to ignore. Still, its mathematical complexity can make it difficult to comprehend, and as such, computer vision and computer graphics researchers may find it hard to follow recent developments in their field related to optimal transport. This survey first briefly introduces the theory of optimal transport in layman's terms as well as most common numerical techniques to solve it. More importantly, it presents applications of these numerical techniques to solve various computer graphics and vision related problems. This involves applications ranging from image processing, geometry processing, rendering, fluid simulation, to computational optics, and many more. It is aimed at computer graphics researchers desiring to follow optimal transport research in their field as well as optimal transport researchers willing to find applications for their numerical algorithms.


Introduction
Optimal transport is a theory that has been mainly used to compare probability distributions and interpolate between them.The optimal transport problem was first stated by Monge in 1781 [Mon81] as the problem of moving a pile of earth from one location to another with minimum effort.While Monge did not succeed in solving this problem, Kantorovich did more than 150 years later, with his new linear programming formalism, in 1942 [Kan42].While the problem was formally solved, fast algorithmical solutions remained limited.For decades this problem has thus remained computationally intractable even for moderately coarse discretizations.With recent advances in numerical methods and solvers, optimal transport has finally found its way to solve realistic computer graphics and vision related problems.It is now routinely used to solve problems in image processing, geometry processing, rendering, and animation, as well as many other fields -problems we will cover in breadth throughout this survey.
While the mathematical theory behind optimal transport and numerical techniques are largely presented in recent books [Vil09,San15,PC17] and the state-of-the-art in numerical solvers is extensively developed in a recent survey [PC19], our state-of-the-art report focuses on computer graphics and vision applications.Applications in image processing are more exhaustively covered in the Habilitation of Papadakis [Pap15].The aim of this survey is twofold.First, it gives a brief and gentle introduction to optimal transport for computer graphics and computer vision researchers who are not necessarily comfortable with heavy mathematical developments but desiring to follow optimal transport research in their own field.Second, it gives optimal transport researchers a set of potentially exciting applications to their research.To our knowledge, these aspects are not covered by existing surveys.Researchers familiar with optimal transport can directly skip to the application section they are interested in.These applications are organized into sections corresponding to different fields of computer graphics and computer vision: image processing (sec.4), rendering (sec.5), geometry processing (sec.6), topological data analysis (sec.7), and simulation and animation (sec.8).Table 6 is a good starter for an overview of the possible applications of optimal transport.

Principles of optimal transport
While this report does not aim at fully exposing optimal transport theory and numerical techniques, we briefly present the problem in layman's terms as well as broad classes of solvers.This will involve sloppy notations and definitions that might hurt mathematically inclined researchers -proper definitions and introductions to optimal transport can be found in other work [San15,PC19].For instance, our exposition will not require knowledge of measure theory, and we will merely refer to the optimal transport of nonnegative functions or, at worst, set of Dirac distributions, seen as point sets.

T f g
x Figure 1: The Monge problem seeks the minimum cost transport map T allowing to reshape a function f to another function g of the same total integral (mass).This figure illustrates the 1-d case, but the theory extends to functions over arbitrary dimensional spaces, and even Riemannian manifolds.

The Monge problem
Monge was interested in moving a pile of earth from one location to another with minimum amount of effort (Fig. 1).He formulated the problem as an energy minimization problem, considering the cost of moving the pile of earth (or of sand) as the sum of the costs of moving each of its individual particles, each particle of earth being moved from location x ∈ X to location T (x) ∈ X.The cost of moving each particle from a location x to a location y is denoted c(x, y), and is also called ground distance.Monge initially used c(x, y) = ∥x−y∥, the distance traveled by the particle of sand, though this renders the problem much more complicated.For many practical applications in computer graphics, the quadratic cost c(x, y) = ∥x − y∥ 2 is used, i.e., the squared distance, which has nice properties and physical interpretation.We assume that the pile of earth to move has a shape that can be represented by a function f over domain X, and we want to give it a new shape g elsewhere in the domain X.The Monge problem defines the transport cost W ( f , g) between f and g as follows: W ( f , g) = min T X c(x, T (x)) f (x)dx (1) The optimization is performed over all possible ways to reshape the function f using all possible warps T .Here, X is here an ndimensional Euclidean space, and the formulation extends to surfaces or Riemannian manifolds, replacing distances with geodesic distances).Denoting x = (x 0 , x 1 , . . ., xn) the space variable, the optimization constraint 2 involves the Jacobian J T of the transformation T = (T 1 , . . ., Tn) defined at x as: and enforces "mass preservation", i.e., that after reshaping f by the warp T , the pile of sand matches g.One can see y = T (x) as a change of variable, and constraint 2 as the well-known "change of variable formula" for integration.
Of course, way beyond moving earth, this problem allows to reshape actual functions.The transport cost W ( f , g) gives the optimal cost of moving f towards g, assuming that the total mass (i.e., the integral of) f equals that of g, and that these functions are never negative.A particular case of "non-negative functions having the same integral" is non-negative functions of integral 1.This arises notably when dealing with probability density functions.This framework has thus met with large success in the probabilistic setting, to transport probability density functions and compare them with this transport cost W ( f , g) (it also generalizes to probability distributions).
If c(x, y) = c(y, x), we have W ( f , g) = W (g, f ), and if c(x, y) is a distance between points x and y, then W ( f , g) is also a distance between functions f and g.In this case, this guarantees that W ( f , g) = W (g, f ), that W ( f , g) = 0 iff f = g, and that W respects the triangle inequality W ( f , g) ≥ W ( f , h) + W (h, g) ∀h.In fact, when c(x, y) = ∥x − y∥ p for p ≥ 1, the function Wp( f , g) = W ( f , g) 1/p is called Wasserstein-p distance and is also a distance between functions f and g.This distance is also called Earth Mover's Distance (EMD), in reference to the Monge problem.

The Kantorovich problem
The Monge problem is non-linear, extremely difficult to solve in general, and may not have solutions in contexts useful to the computer scientist.For instance, replacing the function f by a Dirac distribution δx and function g by a sum of two half Dirac distributions 1 2 δy 0 + 1 2 δy 1 , it becomes impossible to find a one-to-one warp T that sends the mass at a single x to two different locations y 0 and y 1 .Dirac distributions are often encountered when dealing with discrete problems, e.g., attempting to discretize a continuous problem or when sampling from a probability distribution.Fortunately, Kantorovich came up with a solution that both simplifies the resolution of the optimal transportation problem, and generalizes the formulation to extend it to a broader class of objects.Now, one does not attempt to find a warp (more often called transport map, see Fig. 2 right) T : X → X that maps each particle to its new location, but a transport plan P : X × X → R that indicates how much mass is transported between two locations.This led to the (sloppy) formulation † : where the optimization is now performed over all transport plans.The constraints on the transport plans express the fact that the total mass moved from each location x corresponds to the mass f (x) present at location x (eq.5), that the total mass received at location y corresponds to what is needed at that location, that is g(y) (eq.6), † Beware that more formally, P is a measure and requires more care in writing the integrals which should read X×X c(x, y)dP(x, y).Similarly, constraints 5,6 would require the notion of push-forward, which is beyond the scope of our survey.Right.The Kantorovich solution yields a transport plan P that tells how much mass is moved between all pairs (x, y) of locations.This transport plan is very sparse.When there is no Dirac or oddities, both solutions coincide, and the transport plan is the graph of the transport map.In 1-d like here, this graph is monotonically increasing.More generally, in higher dimension, it is the gradient of a convex function.
and that the mass transport from each location and received by another location cannot be negative (eq. 7), all these constraints enforce mass conservation.
One advantage of this formulation is that it is more flexible, and when Monge's solution T exists, it coincides with that of Kantorovich by taking P = (Id, T ) with Id the identity function (Fig. 2).Another advantage is that it yields a linear program.This is more easily seen, from a computer scientist perspective, if one discretizes the function f at m discrete locations (x i ) i=1..m ∈ X m and function g at n locations (y j ) j=1..n ∈ X n : where the optimization is now performed over all m×n matrices P ∈ R m×n .This indeed corresponds to minimizing a linear combination of variables, under both m + n linear equality constraints and m n positivity constraints.
Since it corresponds to a linear program, it admits a dual formulation with the same optimal cost, which, in its continuous formulation, gives: where ψ : X → R and φ : X → R are dual variables (the plus sign for ψ depends on conventions, you may find it with a minus).Duality can probably again be better understood from a computer scientist f g x Figure 3: Using the map Tt = (1 − t)Id + t T , one can smoothly interpolate between the two functions shown in Fig. 1.This is called displacement interpolation (here with t = 0.25, 0.5, 0.75, from red to blue.
point of view in the discrete setting: which now optimizes vectors φ ∈ R n and ψ ∈ R m under m n inequality constraints.Intuitively, this corresponds to the problem of an transport company who charges the consumer an amount of money ψ(x) for loading their truck at location x with 1 unit of mass and charging again φ(y) for unloading their truck at location y of 1 unit of mass.The company's goal is to maximize their profit, under the constraint that the cost of loading plus unloading one unit of mass is not greater than the cost c(x, y) defined in the primal problem.The dual formulation is sometimes used in linear program solvers, and for solving the semi-discrete optimal transport problem (see sec.3.2).

Wasserstein barycenters
As mentioned previously, the optimal transport framework produces a distance between non-negative functions that have the same integral when the cost of moving a particle from x to y is c(x, y) = ∥x − y∥ p .This notably allows to produce interpolations between functions.The case p = 2 is particularly interesting.When displacing particles of mass from x to their target T (x) (e.g., assuming the Monge formulation), one may stop the motion of all particles simultaneously, i.e., replacing the map T by Tt = (1 − t)Id + t T for some time parameter t.Doing so continuously warps f towards g and the resulting interpolation, ft (x) = f (Tt (x))| det J Tt (x)|, is called displacement interpolation (Fig. 3).Displacement interpolation can also be interpreted as moving along a geodesic between the two functions with respect to the Wasserstein metric: any point on this geodesic corresponds to an intermediate function during the interpolation.
This interpolation can also be obtained by solving the following optimization problem that resembles the one used for obtaining a linear interpolation between any two points (replacing W 2 by ∥.∥): For instance, for t = 0.5, this amounts to finding a function h that is as equidistant as possible to the functions f and g in term of Wasserstein-2 distance.
The formulation as an energy minimization problem allows to © 2023 The Author(s) generalize the notion of displacement interpolation to the case where one wants to interpolate between more than two non-negative functions of the same integral.This leads to the notion of Wasserstein barycenter of a set of functions { f k } k with their corresponding barycentric weights {λ k } k as: For instance, the Wasserstein centroid of three functions f 0 , f 1 , f 2 would be the Wasserstein barycenter of

Particular cases
There are simple cases which can be trivially solved.

The 1-d case
Computing the optimal transport map T between two 1-d functions f and g can be easily achieved when the cost c(x, y) is a convex function of the distance ∥x−y∥ (e.g., when computing Wp for p > 1).We consider the cumulative sums: The optimal transport is then: Here G −1 denotes the inverse function.However, since g may be sometimes zero, it means G(x) may be constant for a certain range of x values, thus, G may not be injective, and G −1 may be ill defined.
It is thus necessary to define a more general inverse function: In the discrete setting, F and G are obtained using discrete cumulative sums (e.g., Matlab's cumsum) or higher order integration schemes (e.g., Newton-Cotes), and the inverse function can be obtained using a line search.Computing the map for consecutive discrete values of x allows to start the line search at the previously found location, producing an O(m + n) cost algorithm.
In the case where f and g actually consist in a sum of the same number of uniformly weighted Diracs, a transport map merely assign each Dirac of f to a Dirac of g.It can be shown that in the convex cost setting above, one can get the optimal transport map simply by sorting the Diracs along the line from left to right, and associating the k th Dirac of f with the k th Dirac of g.
In 1-d with convex cost, Wasserstein barycenters can also be easily achieved by considering: This allows to obtain B −1 , the inverse cumulative sum of the desired barycenter b, as a function of all {F −1 k } k , the inverse cumulative sums of each input distribution.In the discrete setting, once the inverse cumulative sum of the result is numerically computed, one can obtain the Wasserstein barycenter b by numerical inversion followed by numerical differentiation.
Costs that are concave functions of the ℓ 2 distance between particles leads to much more complex algorithms [DSS12], and other 1-d extensions have been proposed (e.g., transporting on the circle [RDG11] or transporting Diracs towards a larger number of Diracs [BC19]).
To simply compute the optimal transport cost, for more general costs of the form c(x, y) = c(x − y), we can use [Vil09]: One dimensional transport is the basis for higher-dimensional techniques that rely on 1-d projections.

Transporting Gaussians
In some cases, and in particular high-dimensional datasets, the distribution of data is well approximated by a Gaussian distribution, and one needs to merely transport Gaussians.This can be easily achieved.Interestingly, one can show that Wasserstein barycenters of Gaussians are also Gaussians.
For the quadratic cost, the optimal transport cost between two d-dimensional Gaussians N 0 = N (µ 0 , Σ 0 ) and N 1 = N (µ 1 , Σ 1 ) of means µ 0 ∈ R d and µ 1 ∈ R d and covariance matrices Σ 0 ∈ R d×d and Σ 1 ∈ R d×d is obtained by: The square roots 1/2 are understood as matrix square roots (involving square roots of eigenvalues).
The displacement interpolation between two Gaussians N 0 and with with Σ obtained by solving a fixed point equation: 1/2 (26)

Numerical methods
Again, exhaustive coverage of numerical solvers can be found in the recent survey of Peyré and Cuturi [PC19], and this section is aimed at briefly exposing broad classes of solvers.

LP solvers
Linear programming (LP) solvers provide exact solutions to the discrete optimal transport problem.The optimal transport linear program can be seen as a graph, where each of the m and n discrete locations of the input and target functions are nodes, and moving mass from location i to location j (i.e., when the entry (i, j) of the transport plan matrix P is non-zero) amounts to connecting node i with node j with an arc.An interesting property of this LP is that it possess at most m + n − 1 non-zero variables, i.e., the graph of the optimal transport plan has at most m + n − 1 arcs with nonzero flow, among m n possible arcs, which makes it suitable for sparse graph representations.Displacement interpolation between two distributions can be obtained in this way by advecting particles along arcs [BvdPPH11].When the particles are uniformly weighted and m = n, the optimal transport problems amounts to a Linear Assignment Problem (LAP), i.e., the transport plan is actually a one-to-one mapping, and there are only m non-zero arcs.Note that while an LP formulation of Wasserstein barycenters (for more than two distributions) is possible, existing solvers are limited to very small problems [ABM16].
For the most general problem, with arbitrary cost functions, the network simplex algorithm has shown to work fastest [BvdPPH11].However, memory storage can still become a bottleneck for values of m = n ≈ 40, 000 (problems that typically take minutes to solve, though this heavily depends on the cost function).
For problems structured on a grid with quadratic ground distance, it becomes possible to benefit from the separability of the quadratic cost and split the optimal transport problem along each axis, leading to faster solutions [ABGV18] (about 50x faster than default network simplex for 128x128 grids, and works with 512x512 grids).Multiscale approaches may also be used in the Cartesian grid setting [Sch16,OR20].
The special case of ℓ 1 ground distance has also received a lot of attention.To alleviate the computational complexity of LP solvers, in the case of 2-d histogram to 2-d histogram mass transportation, Ling and Okada [LO07] proposed to split the particle flow into sequences of motions of magnitude 1.This drastically reduces the number of variables and allows to define a network flow interpretation to solve the problem efficiently.They further reduce the complexity by introducing a tree-based algorithm extending the transportation simplex to the ℓ 1 case yielding the EMD-ℓ 1 algorithm.
A state-of-the-art general purpose solver is the Network Simplex of the LEMON C++ library [DJK11] adapted by Bonneel et al. [BvdPPH11] and found at https://github.com/nbonneel/network_simplex.

Semi-discrete solvers
In the special case where one is interested in transporting a continuous function towards a set of Dirac distributions (of the same total mass), the Monge optimal transport problem leads to a solution that can be expressed with a geometric construction called Laguerre's diagrams [AHA92].If in addition the ground distance is quadratic, this corresponds to a power diagram, i.e., a Voronoï diagram for which the size of each cell can be controlled with a weighting parameter.An intuitive example of the semi-discrete optimal transport Here, a set of 10 bakeries can produce either 5 units of bread (orange dots) or 1 unit of bread (pink dots).Inhabitants of a uniform density population are divided among these bakeries such that the total transportation effort is minimized and their need for bread is exactly satisfied.This leads to a cell decomposition where each inhabitant in one cell will go to its associated bakery (arrows) -bakeries producing more bread lead to bigger cells (in blue).Note that the bakery sites are not necessarily inside their corresponding cells (here, all bakeries producing 1 unit of bread are located outside of their cell).
problem consists in a density of population desiring to get their morning bread from a set of bakeries at specific locations.If the bread supplies of all bakeries is unlimited, the optimal choice for each inhabitant would be to go to the nearest bakery to buy its bread -this leads to a Voronoï diagram partition of the space.However, if the sum of the bread supplies of all bakeries exactly equals the sum of the bread demands of all inhabitants, each inhabitant would not necessarily go to its nearest bakery as its bread supply might be exhausted by other customers.Instead, each bakery would attract customers from a connected and convex polygonal neighborhood (in the case of the quadratic ground distance), that may not necessarily contains the bakery location (Fig. 4).The diagram produced by other costs can be much more complex -e.g., for the ground distance c(x, y) = ∥x − y∥, this leads to the Apollonius diagram, that has curved edges.This lends itself to efficient implementations, as power diagrams are well studied geometric constructions that can be obtained by computing a Voronoï diagram in a higher dimensional space.Typical semi-discrete solvers optimize an energy designed so that, at optimality, the area of each power cell (i.e., the polygonal part of the city going towards its optimal bakery, in the above example) matches the mass of each Dirac (i.e., the amount of bread available at that bakery).This leads to fast, multiscale, implementations in 2-d [Mér11] and 3-d [Lév15], or using second order optimization schemes [Lév15, SWS * 15, KMT19], and extends to optimal transport on the sphere [CQW * 19] or when the masses in the two distributions are different [Lév22].This is also relatively easy to implement -e.g., as a student project or lab.This however suffers from the difficulty in implementing these constructions in higher than 3-dimensional spaces.
Typical sizes of distributions handled by this setting by a stateof-the-art solver involves up to millions of particles, in 3-d, on the GPU [vHLM22].Insights on semi-discrete optimal transport solvers and their implementation can be found in the paper of Lévy and Schwindt [LS18].

Entropy regularized solver
By adding the entropy of the transport plan as a regularization term in the Kantorovich minimization problem (eq. 4 and 8), it turns out that the discrete optimal transport problem can be solved with the help of iterative matrix multiplications and elementwise divisions.The resulting algorithm is called the Sinkhorn algorithm.It is extremely easy to implement (a couple of lines of Matlab or Python code), can be very fast, works with arbitrary costs and geometric settings, and extends to the construction of Wasserstein barycenters [CD14, SdGP * 15].In the special case of grids and quadratic ground distance, these matrix multiplications can advantageously be replaced by fast Gaussian convolutions [SdGP * 15], and the implementation can be even faster and simpler to implement (also lending itself well to labs or projects for students).
The downside is that the entropic regularization adds a small amount of blur to the transport plans: the amount of mass leaving location i will spread a little bit over its target location j (technically, it is spread infinitely, but the mass traveling far away is negligible -see Fig. 5).It will spread even more as the amount of added entropy increases.While it is tempting to make the amount of added entropy tend to zero to recover sharp transport plans or Wasserstein barycenters, this approach leads to numerical instabilities as the amount entropy decreases too much.Performing computations in the log-domain stabilizes these computations, but also makes them much slower [CPSV18].
Another downside of entropy regularized distances, denoted Wε( f , g), is that the resulting optimal energy is not a distance anymore, while the corresponding non-regularized problem leads to a distance.Specifically, with the entropic term, we do not have Wε( f , f ) = 0 and the triangle inequality also largely breaks.An interesting approach is to debias the result energy, and instead con- . This enforces that Wε( f , f ) = 0 by construction, and in practice, reduces the number of cases where triangle inequality is not satisfied.The resulting energy is called the Sinkhorn divergence [FSV * 19].This energy can be used in an energy optimization routine to produce (approximate) Wasserstein barycenters.In particular, using a single descent step already produces plausible results, and amounts to linearizing the problem [Fey20].
Nowadays, a state-of-the-art Python solver for Sinkhorn divergences for scattered data, implemented on the GPU, with an additional multiscale strategy, is that of GeomLoss [FSV * 19].It supports discrete problems of about a million of data points within seconds and is currently available at https://www.kernel-operations.io/geomloss/.
Replacing entropy by other types of regularization is possible, such as regularizing by the quadratic norm of the transport plan [ES18], but are far less used in practice for computer vision and graphics applications.

Sliced optimal transport
An alternative functional for high-dimensional optimal transport relies on optimal transport over projections on random onedimensional lines [RPDB12,Bon13].The corresponding distance between two functions is thus merely the average over all 1-d projections of 1-d optimal transport problems over these projections (the projection of a function onto a straight line is the integral of that function over the hyperplane orthogonal to that line passing through each point on that line).This can be computed efficiently using Radon transform in the case of densities stored on a grid, or by projecting points in the case of sums of Dirac distributions [RPDB12,BRPP15].They lead to either Radon or Sliced optimal transport problems, and can be used to compute Sliced and Radon Wasserstein barycenters.They are also extremely easy to implement and very fast.
Contrary to entropic regularization or Sinkhorn divergences, this does not rely on a parameter that makes the problem converge to the "true" optimal transport problem as it tends to zero.Projecting on 1-d lines makes the problem fundamentally easier to solve, but it does not produce the same result, and does not converge to the classical optimal transport problem, even as the number of 1-d projection increases.However, when the cost writes as a power of the 1-d distance, c(x, y) = |x − y| p , it also produces a distance between functions, similarly to the Wasserstein-p distance.In this case, a (loose) error bounds exist.Denoting SWp( f , g) the corresponding sliced Wasserstein-p distance in d dimension, one has [Bon13]: (27) for f and g supported in a ball of radius R and some constant C d,p .
While sliced transport is easily implemented, reference Matlab code for sliced and Radon barycenters can be found from the work of Bonneel et al. [BRPP15] at https://github.com/gpeyre/2014-JMIV-SlicedTransport.

Dynamic formulation
A quantity ft is preserved when it is advected by a vector field v if it satisfies the conservation PDE: It can be shown that for the quadratic ground distance, a function ft such that f 0 = f and f 1 = g satisfying this conservation PDE for a vector field v(x,t) minimizing: is the displacement interpolation between f and g.Intuitively, one is looking for the motion of a gas shaped as f at t = 0 and g at In computer graphics, the dynamic formulation has been used for various problems such as computing geodesics on surfaces [SRGB14a], or for other 2-d problems [NG18], using dedicated solvers.

Other solvers
Another recent trend is to solve optimal transport problems using deep learning.It includes learning the optimal transport plan [CDC22], the Wasserstein embedding [CFD18], or Wasserstein barycenters [LDCB22].
Generalizations of the optimal transport problems have also been investigated.Notably, when the two functions being compared do not have the same mass, this leads to the unbalanced and partial optimal transportation problems [CPSV18, BC19, Lév22], with similar solutions (entropy regularized, sliced, semi-discrete or dynamical approaches).Similarly, when the two functions live on two different spaces and one only has access to pairwise distances within each of these spaces that need to be matched, the corresponding problem amounts to the Gromov-Wasserstein problem [Mém11].
Many different solvers are also available within the Python Optimal Transport library "POT" [FCG * 21], available at https: //pythonot.github.io/.Among others, it includes linear program, entropy-regularized, and sliced solvers.In Fig. 6, we summarize the various solvers and problem settings encountered in computer graphics and vision applications, which we detail next.

Image retrieval
Probably the earliest use of optimal transport within the computer graphics and vision literature consists in using the CIE-Lab color distributions of images as a feature for image retrieval [RTG00], and matching it using EMD.Due to low efficiency of optimal transport solvers in the years 2000s (their LP solver took less than a second for m = n = 100, and minutes for m = n = 1000), image colors were first quantified.On average they used less than 9 color clusters and the clusters spanned less than 25 units in any of the L, a and b axes.This allows to retrieve images of the same color distribution within a database -though they also proposed extensions for querying other features with spatial and textural components.To alleviate the linear search complexity of retrieval algorithms and general speed issue, Indyk and Thaper [IT03] proposed an approximate embedding of the optimal transport cost into R k (with relatively large k) equipped with the Manhattan distance in the case of the cost c(x, y) = ∥x − y∥.This allows for the use of Locality-Sensitive Hashing and thus, sublinear query times.Ling and Okada [LO07] also applied their EMD-ℓ 1 formulation for shape retrieval and shape recognition using histograms of SIFT descriptors [Low04], shape contexts [BMP02] and spin images [JH99].
For image retrieval, via histogram matching, a key feature is that histograms might be non-normalized.Pele and Werman [PW08] proposed to modify the original EMD formulation into ÊMD by adding a term accounting for the difference in total mass, which under mild assumption is a metric even for non-normalized histograms.By further thresholding ground distances above 2, and using their modified ÊMD, they turn the LP formulation into a linear-time algorithm, since the maxflow iterations in the simplex algorithm are efficiently reduced through ground distance thresholding.This proved very efficient to match SIFT features [Low04] for which the histogram magnitude conveys critical information.They later extended this method to any threshold value, leading to flow network with drastically reduced number of edges and faster computations [PW09] with applications to SIFT matching, Lab histogram matching, and a combination of both.

Color grading
Altering colors of images is certainly one of the most widely spread application to illustrate one's optimal transport algorithm.Specifically, the goal is to transform the color distribution of an input photograph or video frame so that the color distribution of the result matches that of an exemplar image (Fig. 7).
To achieve this effect, the color distribution of an input image in a given color space is first stored, discarding any spatial information.The distribution can be stored either as a density on a grid, or directly considering individual pixels as Diracs in that color space, or using color clusters, or even considering the color distribution as a Gaussian distribution.The color distribution of the exemplar image is stored similarly.An optimal transport map between the  input and exemplar distributions is then computed.Finally, the color of each pixel of the input image is matched to its target color using the optimal transport map or plan (in case of a plan, one color can be matched with multiple targets -in that case, a weighted average can be used to obtain a single target [SdGP * 15]).
Sliced [PKD05, PKD07, BRPP15] and unbalanced sliced [BC19] formulations are efficient for solving the optimal transport problem in that case, but entropy regularization can be used as well [SdGP * 15] or even using a simplex-based solver on color clusters [MS03,BvdPPH11,RFP14,FSDH14].Even faster is to consider the closed-form expression obtained for Gaussian distributions in the color space [PK07].To illustrate 2-d optimal transport solvers, the color distribution is sometimes stored in the CIE-Lab color space, and the color transfer problem is separated into matching a 1-d luminance channel and 2-d chrominance channels [SdGP * 15, BSPP13].
Directly using the optimal transport result to match colors may lead to quantization artifacts, since similar neighboring colors maybe be stretched to visibly different colors.To alleviate these artefacts, Rabin et al. proposed to regularize the transport plan directly by filtering it using a nonlocal means-like filter [BCM05] adapted to the space of transport maps [RDG10].Later, Rabin et al. [RFP14] directly add regularization within the linear program by relaxing the mass preservation constraints.Similar regularization for color transfer can be obtained via unbalanced or partial optimal trans-port [CPSV18,BC19].Pitié et al. instead propose a variational formulation that enforces the resulting image's gradient to be regularized towards the original image gradients [PKD07] in order to reduce grain.Solving an optimal transport between the graphs of two functions leads to the transportation L p -distance, that can be used to match colors accounting for spatial correlations [TPK * 17] and can more generally be used to compare general functions (that can be negative, and do not have the same mass).
Note that these methods often produce convincing results (see Fig. 7 where we compared standard optimal transport based approaches to direct Adobe Photoshop color matching), but quantitative perceptual evaluation of the color transformed images is challenging, and in practice, to our knowledge, not done.In this context, it is hard to assess which technique in general works best.
Instead of matching one input image color distribution to one exemplar, another goal can be to harmonize the colors of a set of input images, e.g., to make them all look as if they had been shot with the same camera (or with the same camera settings).In that case, it can be useful to transfer the color of each input image to the Wasserstein barycenter of the set of images [BRPP15].A single image can also be matched to multiple exemplars.In that case, its color distribution can be projected to the nearest Wasserstein barycenter of the exemplar images [BPC16], which is formulated as an optimization problem, or Wasserstein barycenters can be used to obtain  In 2016, Gatys et al. [GEB16] defined the style of an image as the Gram matrix of its CNN features, and used a gradient descent to transfer style from an image to another.The core idea to match the distribution of features of two images can be seen as an optimal transport problem, and many subsequent methods have explored this path.Kolkin et al. [KSS19] proposed to use relaxed optimal transport for their style loss.This relaxation called Word Mover Distance [KSKW15] drops the target mass conservation constraint.Hence, an optimal solution moves the total mass of each point to its nearest point in the target distribution.By inverting source/target roles in this computation, and taking the maximum of the two approximations one gets the final relaxed EMD.The result is more akin to a closest-point assignment than an optimal transportation, but it has been presented and used as an optimal transport surrogate.The relighting problem is finally solved by a gradient descent minimizing this relaxed EMD combined with a content loss and a moment matching loss leading to an efficient content-preserving style transfer.
Leclaire and Rabin [LR21] addressed style transfer using a stochastic approximation of semi-discrete optimal transport in patch space.For the continuous measure, they consider the Gaussian random field synthesized patches distribution, and for the discrete measure they consider the discrete empirical distribution of the sample patches.It works by computing multiple restricted optimal transport problems between the source measure and a hierarchical simplification of the target measure, given by multi-resolution clustering.A blending step mixes the style feature with geometric features of the source image.
Style transfer is intrinsically linked with texture synthesis.This is exemplified by the seminal works of Gatys et al. [GEB15,GEB16], who addressed both in a similar gradient descent on extracted features and Gram matrix of the features.Many methods tackling one of the problems can be adapted to the other [HVCB21, LR21], and the reader should refer to Section 4.7 for more details.

Image interpolation
Optimal transport has also been considered as a tool to morph images, notably with images of low structures such as fluids in which classical feature detection+matching algorithms might fail [HZTA04].This however supposes a notion of "mass" to be matched with pixel color or intensity, and that this mass should somewhat be preserved along the morphing, which is not intuitive in the context of image morphing.Optimal transport may also split mass (e.g., one eye on a face could be divided into two eyes during the interpolation!), but solutions to enforce rigidity have been proposed [HMP15].To interpolate between sketches Courty et al. proposed to learn a Wasserstein embedding of sketches, allowing then to interpolate in the embedding space [CFTR17], however the results remain blurry.For the same problem, Lacombe et al. proposed to learn directly the barycenter prediction without building an embedding [LDCB22].Acknowledging the non-natural artefacts caused by Wasserstein-based interpolation, Simon and Aberdam [SA20] proposed to constrain the barycenters to follow an image prior given by a GAN or a sparse prior.In practice this is done through an ADMM optimization, by computing the Wasserstein barycenter and projecting it on the image prior manifold.The resulting interpolation avoids unnatural artefacts (mass splitting...) but is mostly demonstrated on background-less pictures of objects.
Nowadays, many other tools -notably based on deep learningachieve much better results, and optimal-transport based image interpolation is mostly used as illustrative of optimal transport algorithms performance or behavior [Mér11,PPO14].

Optical flow and image matching
Similarly to image interpolation, optimal transport has been used to estimate optical flow and match images.Within a differentiable renderer, Xing et al. [XLY * 22] uses a differentiable Sinkhorn divergence to match pixels between the current rendering estimate and the reference rendering, where each pixel has a unit mass and the ground distance accounts for color and positional distances.Liu et al. [LZYY20] uses the Sinkhorn algorithm with a cosine similarity ground distance between feature maps to establish semantic correspondences between images.In a more specific setting, Kolesov et al. [KKTH10] computes an optical flow of image sequences of fire and smoke by considering that fire and smoke approximately conserve their pixel intensity, using a dynamic formulation.

Image segmentation
Optimal transport has also been applied to the image segmentation problem.In the regional active contour framework Chan et al. [CEN07,NBCE09] proposed to combine shape features with an optimal transport approach to model regional statistics.It strives to find an optimal image segmentation into 2 regions Σ, Σ c , such that the local histograms inside Σ (resp.Σ c ) can be well approximated by a single histogram representing Σ (resp.Σ c ).This approximation is captured by a W 1 distance.The segmentation functional also includes a more traditional term aiming at reducing the region perimeter.Following this trend, Peyré et al. [PFR12] proposed Wasserstein Active Contours, based on similar ideas but with an explicit derivation of the active contour velocity.In addition, to make the computation tractable they relied on sliced Wasserstein distances.Papadakis [PR17] proposed a segmentation energy based on the minimization of the Wasserstein-p optimal transport distance between the histograms of the 2 segmented regions and two example histograms.While the segmentation results are not on par with other image segmentation techniques (even old graph cuts based methods), the mathematical derivation for these algorithms is very interesting and provides new insight on optimal transport driven functionals.

Generative Models and texture synthesis
Texture synthesis is a widely spread application of optimal transport approaches, for both synthesizing new textures from one exemplar image (either patch-based or using statistical models), or for interpolating between different texture models.A related application is the synthesis of natural, non-stochastic, images.
Synthesizing textures.Multiple approaches exist to synthesize new textures from an exemplar image.This includes pixel-based, patch-based, or stochastic models.
The widely used Heeger and Bergen texture synthesis algorithm [HB95] decomposes an exemplar image into steerable pyramid coefficients, and then, from a Gaussian white noise image, iteratively matches each sub-band of its own steerable pyramid decomposition to the exemplar's one using cdf-based 1-d optimal transport.Tartavel et al. [TPG16] instead replace this matching with an LBFGS-based optimization of a sliced optimal transport functional.
Optimal transport is also instrumental for spot noise-based texture synthesis.Discrete spot noise is defined as the addition of n random translations of an input image -as n grows this can serve to synthesize textures.It can also be seen as convolving a random point set with the input image.This process yields a Gaussian Random Field, a R 2 field such that any vector of field values anywhere in R 2 are distributed as a multivariate Gaussian.Galerne et al. [GLR18] proposed to use optimal transport to constrain a spot-noise texture synthesis method to match an exemplar texture.It starts by sampling a spot noise texture, and then solves a semi-discrete optimal transport problem between the set of possible 3×3 generated patches (as a continuous distribution) and the set of exemplar patches (as a discrete distribution).Each spot noise patch can then be transformed to match a texture patch and the final image is obtained by averaging.In a way it is mixing parametric models given by spot noise, generating a texture as a stationary Gaussian Random Field and nonparametric methods, such as patch-based texture synthesis.Leclaire and Rabin's fast semi-discrete optimal transport approximation using hierarchical decomposition of the target measure allowed to improve this method numerically, allowing to use larger image patches [LR21] (see Section 4.3).
Another way to include optimal transport into texture synthesis is to use it as a loss for deep learning models.Heitz et al [HVCB21] proposed to replace the features Gram matrix loss in texture synthesis task [GEB15] with a sliced Wasserstein loss between features (Fig. 8).Houdard et al. [HLPR22] later leveraged the semi-discrete idea in a deep learning synthesis framework.Their method uses the semi-discrete optimal transport map between the texture features continuous distribution synthesized by a generator and the target texture features discrete distribution as a loss to learn to synthesize a stationary texture from a single example.
Optimal transport was also used as a preprocessing for texture patches.Heitz and Neyret perform patch-based texture synthesis by blending patches taken from an exemplar image [HN18].The color distribution within each patch is considered Gaussian, and a Gaussian distribution blending formula is used (without resorting to optimal transport).However, to conform to this Gaussian distribution hypothesis, in a precomputation step, patch color distributions are made Gaussian by computing an optimal transport map between the actual color distribution and a sampled Gaussian distribution, using a linear program solver.
Interpolating between textures.Matusik et al. [MZD05] interpolate between multiple textures by first computing a (non-transport based) warping between pairs of textures, warping them halfway, and blending them using 1-d optimal transport (Fig. 9).More precisely, after alignment using the precomputed warp, textures are blended by decomposing them into steerable pyramid subbands [HB95], and adjusting the bands' histograms by using the 1-d cdf-based formula for Wasserstein barycenters.Rabin et al. [RPDB12] proposed to interpolate between textures by computing sliced Wasserstein barycenters between the distributions of the texture statistical textures.This method can be seen as a way to generalize [HB95]: instead of considering only 1d distribution it considers higher order distributions.Ferradans et al. [FXPA13] proposed an optimal transport-based texture interpolation method in the spot noise model.Static (resp.dynamic) textures are represented as 2-d (resp.3-d) stationary Gaussian processes.Optimal transport is computed between these Gaussian random fields, and texture mixing is achieved by sampling along the geodesics.To synthesize textures that interpolate between more than two input textures, Bonneel et al. use the Radon Wasserstein barycenter of input texture power spectra, and apply a random phase [BRPP15].In a similar setting, Peyré et al.
[PCVS19] better account for correlations between colors by interpolating 3x3 Gaussian covariance matrices instead of scalar values (see Sec. 6.6).
By considering the mean and covariance of a texture CNN features, and using closed-form formulas for the Wasserstein-2 distance between the estimated Gaussians, Vacher et al. [VDKCC20] interpolate textures along geodesics defined by the Wasserstein metric.The results are reported to be better in terms of visual perception.
Synthesizing natural images.Optimal transport was also used in the more general setting of natural image synthesis, in the context of Generative Adversarial Networks (GAN).Arjovsky et al. [ACB17] introduced Wasserstein GAN, which replaces the Jensen-Shannon divergence with a Wasserstein-1 distance.Given a generator G, a discriminator (called critic in the WGAN setting) D, µ G the generated distribution, and µ re f the reference distribution one wants to learn, the WGAN loss writes: D tries to maximize this loss which G tries to minimize in turn.By using the Kantorovich-Rubinstein dual formulation, one gets that for a given generator G, the optimal D can be constrained to produce a measurable Lispchitz function using weight clipping so that L W GAN ∝ W 1 (µ G , µ re f ).Then G in turn tries to minimize this Wasserstein-1 Distance, leading to more realistic samples.Wasserstein GAN perform better than regular GANs, avoiding vanishing gradients and mode collapse problems, but they are still unstable and slow to converge.

Other image processing applications
Optimal transport has been also applied to other image processing tasks, more marginally.Tartavel et al. use a sliced Wasserstein term in a Total Variation (TV) based image restoration framework, and applies it for denoising and super-resolution tasks [TPG16], with the

Image stippling and sampling
Semi-discrete optimal transport provides a way to measure the distance between a set of Diracs and a density.As such, it easily found its place for problems related to the approximation of a density with a set of Diracs, that is, the energy optimization problem consisting in minimizing the optimal transport distance between a density and a sum of Diracs.This approach finds applications for two related problems.First, that of stippling an image, for instance in the context of artistic image stylization or for dithering an image in the context of printing grayscale (or color) images with black (or CMYK) ink droplets.Second, that of sampling a density (most often the uniform distribution on a unit hypercube domain) for Monte Carlo integration, and in particular for physically-based rendering.While the first application mainly looks for visually pleasing point distributions, the second application is more interested in reducing the error or variance in integral estimators.Optimal transport offers benefits for both, as it produces interesting blue noise properties [dG-BOD12, PSC * 15].
The general idea behind these approaches is to initialize with a random point set, and to iterate the computation of a semi-discrete optimal transport computation step between the current point set and a density (uniform or not), and a centering step that moves each sample towards or at the location of the centroid of its power cell, in a spirit similar to Lloyd's algorithm [Llo82].
Stippling an image has been proposed as an application for several semi-discrete approaches [XLC * 16, dGBOD12] and state-of-theart results were obtained by the "BNOT" approach [dGBOD12] (Fig. 10, left, only recently outperformed by a non optimal transport based approach [ARW22]).A different, much faster but approximate, solver on 2-d grids has been proposed [NG18] and allows to generate optimal transport stippling patterns visually similar to that of de Goes et al. [dGBOD12].Qin et al. instead uses entropyregularized optimal transport for sampling multiple classes (e.g., for color stippling), but also to sample meshes [QCHC17].Their algorithm should be general but is demonstrated in two dimensions (or on surfaces embedded in 3-d).A sliced approach also allows for multi-class color stippling [SGSS22].An interesting generalization relates to the optimal transport approximation of a density by other non punctual measures [MM99], such as a single (long) continuous curve [LdGKW19] (Fig. 10, right).
Sampling for rendering involves different constraints.While image stippling usually restricts the problem to the approximation of non-uniform 2-d densities, sampling for rendering generally requires higher-dimensional densities, albeit generally uniform (they are usually non-linearly transformed to match the desired distribution for importance sampling afterwards).This adds difficulties as the semi-discrete optimal transport problem is much easier in the low-dimensional setting.Even in the case of 2-d or 3-d integration problems, semi-discrete optimal transport sampling has shown benefits in term of integration error because the resulting point distribution exhibits blue noise properties [PSC * 15].In the higher dimensional setting (up to 20-d), a sliced approach has notably been proposed for physically-based rendering applications [PBC * 20] by enforcing blue noise properties both in the high-dimensional space and in its lower dimensional projections.This approach also works for the task of multi-class blue noise sampling.Sliced multi-class sampling has also been proposed by Salaün et al. [SGSS22] with a custom energy that tends to produce blue noise spatial error distribution in the image plane, by producing one point set per pixel such that neighboring pixel samples are well interleaved in a blue noise fashion.

Reflectance manipulation
Displacement interpolation can be used to obtain intermediate Bidirectional Reflectance Distribution Functions (BRDFs) between two input BRDFs.While for parameterized BRDF models, it is often more intuitive to directly interpolate BRDF parameters, this is not the case for BRDFs that are captured by gonioreflectometers and

Computational optics and imaging
Optimal transport has a relatively long history in solving nonimaging optic problems [GO03].Specifically, one problem is to redirect some incoming light distribution via a mirror or lens so as to produce a specific reflected or refracted light distribution, e.g., an image.Intuitively, mass conservation translates into light energy preservation, and a transport plan would dictate the shape of the mirror or lens, which makes optimal transport an appropriate tool for solving this kind of problem.
In the work of Schwartzburg et al. [STTP14] a lens is designed by first computing a set of normals using semi-discrete optimal transport.Here, a flat light emitting surface is discretized into centroidal Voronoi cells, and the semi-discrete optimal transport plan between this discretization and the target light distribution on the flat receiver surface produces one-to-one correspondences between the Voronoi cells of the emitter and the power diagram cells produced by the semi-discrete optimal transport map on the receiver.Using Snell's law, the distribution of normals over the lens surface is then computed, and an optimization reconstructs the corresponding surface (Fig. 12, left).
In the far-field reflector design problem of Andre et al. [AAMT15], a light distribution at infinity (on the sphere of directions, as opposed to a near-field receiver surface) is prescribed and should be produced by a point light reflecting off a mirror whose shape needs to be found.The reflector is designed as a set of intersecting paraboloids whose focal point correspond to the point light location and the focal distances need to be determined, and the paraboloids axis cover the set of desired directions.They solve a linear program formulation of optimal transport with cost c(x, y) = − log(1 − x.y) for an incoming light direction x and a paraboloid of axis y, between a discretization of the point light directional intensity distribution and a discretization of the desired light distribution on the sphere.The resulting transport plan indicates how much light from a incoming direction reaches any outgoing direction after being reflected.It also turns out that the solution to the dual linear program directly gives the desired focal distances, and thus solves the problem.
Meyron et al. [MMT18b] generalize the non-imaging optic problem (Fig. 12, right).They show that all the problems of obtaining a target near-field or far-field image from a given collimated or punctual light source whose light is refracted in a lens or reflected over a mirror that can be concave or convex, can all be solved via a semi-discrete optimal transport problem with quadratic cost, up to a change of variable in the power diagram weights.They solve the semi-discrete optimal transport using a damped Newton scheme, with automatic differentiation.

Shape Comparison and retrieval
Mesh surface comparison can be performed using optimal transport [LD09,LD11], by mapping simply connected Riemann surfaces to unit disks and transporting the domains onto each other using a ground distance invariant to Möbius coordinate changes.By discretizing the two domains, the optimal transport problem can be solved using a linear program solver.This method is limited to simply connected surfaces and inherits linear program solver limits, i.e., the sampling of the unit disks should remain relatively small for the transport to be computed.
Similarly to image retrieval, 2-d and 3-d shape retrieval can also benefit from optimal transport tools, by computing shape signatures and matching these signatures using optimal transport.Rabin et al. [RPC10] compute a shape signature by sampling points on the surface and computing quantiles of geodesic distances distributions.They then match these descriptors using the sliced Wasserstein-1 distance.

Shape Interpolation
Optimal transport based image interpolation is only partially satisfying because features can get split in a non semantic way, while other methods would match image features and yield much more natural interpolations.For 2-d or 3-d shapes, the problem is however slightly different.Take for instance two spheres or two icosahedra.Matching features would be impossible (a sphere has no meaningful feature) or error-prone (an icosahedron has repeated features).By considering the shapes as boundaries of indicator functions, and normalizing the mass, one can match the shapes as 2-d or 3-d densities using optimal transport, and perform displacement interpolation.
Rabin et al. [RPDB12] already proposed interpolating between shapes using sliced Wasserstein barycenters.The volume of the shape is first sampled uniformly and the sliced barycenter is computed between these discrete distributions.Lévy [Lév15] proposed to use semi-discrete optimal transport for interpolating between two tetrahedral meshes M and M ′ , given an additional number of desired vertices k.The algorithm is strikingly simple: one starts by sampling M ′ with a set Y = (y i ) i=1•••k of points.The semi-discrete transport between Y and M is then computed as a power diagram restricted to M, from which one can extract the dual regular triangulation.A correspondence between the triangles of this triangulation and the ones of the restricted Delaunay of Y on M ′ can be established since they are related to the same y i , and the common topology can be found by looking for the triangles that exist in both (i.e., the corresponding cells do not degenerate).The vertices positions are linearly interpolated between the barycenters of the power cells on M and the y i on M ′ .This provides an efficient way to interpolate between meshes without resorting to feature point computation and matching (Fig. 13).
Instead of sampling the shape volume, Solomon et al. [SdGP * 15] considered shapes a indicator functions discretized on 2-d or 3d grids and, proposed to compute interpolation between several voxelized shapes using fast convolutional Wasserstein distances.
Although problems are less obvious than for image interpolation, optimal transport-based shape interpolation still faces issues: the interpolated shapes might split and have more connected components than what would be expected, which is due to optimal transport being topology agnostic.

Shape Registration
The optimal transportation problem consists in matching data.As such, a natural application is to look for reliable and consistent pointwise correspondences between shapes.
Point Cloud to Point Cloud Discrete formulation of optimal transport naturally leads to solutions of point clouds (i.e., sets of Dirac) processing problems.In the work of Bonneel and Coeurjolly [BC19], a gradient flow of a sliced partial optimal transport energy allows to advect a set of points in 3-d towards a larger set of points.They introduce a 1-d partial optimal transport algorithm and use it in a sliced context to propose a variant of the Iterative Closest Point (ICP) point registration algorithm [BM92].In classical ICP, a rigid transformation between the input point cloud and the nearest neighbor of each input point in the target point cloud is estimated -a possible issue is that many input points can be matched to the same nearest neighbor.Using partial optimal transport, injectivity is guaranteed and this issue cannot happen.In the proposed approach, the input point cloud is advected towards the target point cloud via the sliced partial optimal transport gradient flow, and the rigid (or similarity) transform is computed between the input point cloud and the advected point cloud (Fig. 14).

Shen et al. 2021 [SFL
* 21] add an entropy-regularized unbalanced optimal transport [CPSV18] layer to a deep learning architecture as a drop-in replacement of closest-point matching, along with confidence weights.Specifically, an optimal transport plan leads to a one-to-many point matching that is not suitable for registration problems, but one-to-one matching can be approximately recovered by a weighted average of the many matches of soft transport plans [SdGP * 15].They apply this to rigid, affine or deformable transformation estimation to register point sets, and use a deep learning architecture to directly predict model parameters.Optimal transport has also been used for scene flow estimation.Li et al. [LLX21] uses the Sinkhorn algorithm to match consecutive point clouds in time using a ground distance accounting for differences in point position, colors, and estimated normals.
Point Cloud to Mesh.Mérigot et al. [MMT18a] introduced semidiscrete optimal transport to compute distances between a point cloud and a mesh.As done traditionally [AHA98], the problem is cast into a power diagram weight optimization, by considering the intersection of the power diagram with the mesh.A damped Newton algorithm is proposed to for finding the power diagram weights, with proven convergence in linear time under mild conditions.They propose two applications to this algorithm, remeshing and Iterative Closest Point.For remeshing, they use the continuous measure on the mesh as a target triangle density measure and the mesh vertices as the point set, which produces a remeshing as the dual of the computed Laguerre diagram.Optimal transport-based Iterative Closest Point is another interesting application of this algorithm, since it allows to drop the costly assignment step and replace it by a semi-discrete optimal transport computation between a constant measure on the mesh and a constant discrete measure on the point set.The alorithm then associates each point of the point set with the barycenter of the corresponding power cell, yielding faster empirical convergence.Dropping the nearest neighbor search also permits to better handle far-away initial poses.
Mesh to Mesh To register a mesh onto another mesh, Mandad et al. [MCSK * 17] relied on Sinkhorn iterations to compute a transport map and deducing a variance-minimizing correspondence.To do so, they rely on a surface discretization and aim at minimizing the variance of transported neighborhoods, by alternating between relocating samples to local barycenters and solving an optimal transport problem.Optimal transport is computed via Sinkhorn iterations, relaxed to allow for small changes of mass, and a coarse-to-fine strategy.Eisenberger [ETLTC20] introduced Deep Shells extending the Smooth Shells [ELC20] correspondence framework.It replaces the non-differentiable costly correspondence search step with an entropy-regularized optimal transport computation.This change makes all steps of the smooth shells approach differentiable, and thus trainable in an end-to-end manner.This approach interestingly mixes optimal transport with the functional map approach.The development of entropy-regularized tools for optimal transportation led to other breakthrough for functional maps shape registration via Sinkhorn filters [PRM * 21].While this is not directly an optimal transport approach, it benefited from advances in this field.

Shape Reconstruction
The shape reconstruction problem can be seen in a similar light as the point cloud to mesh registration problem.
De Goes et al. [GCSAD11] proposed to compute the distance between a 2-d mesh and a (noisy) 2-d pointset where points are located on curves.In this setting, optimal transport is computed by first assigning each point to the nearest mesh edge and distributing its mass regularly on the edge.The corresponding cost can be computed in closed form.This principle is applied to curve reconstruction: starting from an initial Delaunay triangulation of the point set, vertices are iteratively removed by choosing the point whose removal increases the optimal transport distance least, and the triangulation is adapted accordingly.Final edges are filtered out by removing edges without any mass, and vertex positions are optimized to minimize the transport cost (with frozen transport plan).Digne et al. [DCSA * 14] extended the problem to the 3-d setting.They discretize a constant measure on regularly sampled bins on the simplices, and formulate the optimal transport problem as a linear program between points and simplices, with additional constraints enforcing the piecewise constant measure condition.To make the problem tractable, local problems are considered iteratively.Applications to mesh reconstruction, similary to de Goes et al. [GCSAD11], and sharp edge recovery (e.g., after a Marching Cube-based reconstruction) are proposed by alternating between the relocation of vertices to minimize transport cost with frozen transport plan, and transport plan update.Arguably the setting of both methods is not a "true" optimal transport since the continuous Figure 14: A sliced partial optimal transport gradient flow can be used instead of nearest neighbor queries in the ICP algorithm to register point sets.Left.Initial configuration.Middle.Classical ICP to register the point sets using rotation, translation and scaling.Since many points are matched to the same nearest neighbor, this leads to a degenerate configuration in the scaling factor.Right.The sliced optimal transport solution enforces injectivity and leads to a correct registration.measure on the simplicial complex has to be optimized as well: we only know its total mass and that it is piecewise constant on the simplices.

Shape Parameterization
Shape parameterization corresponds to the task of unfolding a mesh on a planar or spherical domain, and has also attracted optimal transport applications.Parameterization methods usually target some quantity (areas, angles, distances) preservation.The most common mapping is conformal mapping which preserves angles but can exhibit large area distortions (Fig. 15).To reduce this distortion, Dominitz and Tannenbaum [DT09] first compute a conformal parameterization of a mesh onto a sphere, and then compute the optimal transport map between the conformal factor and a uniform density over the sphere using a PDE-based gradient flow formulation and multiresolution solver using Discrete Exterior Calculus (DEC).The UV coordinates are then advected along this flow.This allows to produce area-preserving mapping of low angle distortion.Zhao et al. [ZSG * 13] use a similar intuition but rely on a semi-discrete optimal transport solver to transport the conformal factor of the embedding onto a disk to the uniform density over that disk.UV coordinates are then moved towards the centroid of their power cells.
Instead of transporting the conformal factor, Su et al. [SWS * 15] directly transports the area of neighboring triangles in a conformal map parameterization onto the disk or sphere towards the uniform measure using semi-discrete optimal transport.They again move UVs coordinates towards the centroid of their corresponding power cells to obtain an area-preserving map.This further serves to compute a Wasserstein-2 distance between surfaces by comparing their mapped domains and then be used for mesh retrieval [SWS * 15].

Transport on Surfaces
Instead of transporting mass on Euclidean domains, it is also possible to transport densities defined on Riemann surfaces.Solomon et al. [SRGB14a] considered the problem of computing the optimal transport between densities on a manifold surface M represented as a mesh and linked it with geodesics computation.It uses a variational formulation casting the Wasserstein-1 distance between two densities defined on M as solving a flow with constraints on the boundaries of M and divergence of the flow.More precisely, the method looks for a tangential vector field V on M: where n(x) is the normal to the boundary ∂M at point x ∈ ∂M.
The flow lines of J are known to be geodesics on M. The vector field can be easily decomposed using the Helmoltz-Hodge decomposition, and projected on a spectral basis which allows for an efficient Finite Elements discretization.This formulation is both fast and accurate, and yields a family of geodesic distances, depending on the size of the spectral basis, ranging from a purely spectral distance to the "canonical" geodesic distance.However this method cannot be applied for quadratic costs, and does not support smooth interpolation.
An alternative to using a dynamical optimal transport formulation is to rely on entropic regularization, and use a heat kernel instead of a distance [SdGP * 15], since both are related by Varadhan's for-mula [Var67].However the resulting interpolations remain blurry, as expected when using entropy regularization.
Lavenant et al. [LCCS18] proposed a dynamic formulation for quadratic costs.It is based on a similar discretization of the Benamou-Brenier definition on a manifold, but it is strictly convex and more amenable to interpolation computation than the Wasserstein 1 method [SRGB14a].It is also less blurry than entropy-regularized variants or convolutional Wasserstein distances [SdGP * 15].Solomon and Vaxman [SV19] proposed to interpolate between two tangent vector fields defined on a mesh.It uses optimal transport as a way of matching the vector field singularity in an interesting way.Singularities have -possibly negative -indices (i.e.classification) that sum to the same value for two smooth tangent vector fields on the surface.Therefore these singularity indices can be considered as mass to be transported.A variant of discrete optimal transport is presented, allowing for negative mass, and it is solved using a linear program solver, which is completely tractable, since the number of singularities is low.The rest of the vector field is constructed using other dedicated optimization schemes.Going higher in dimension, Peyré et al. [PCVS19] considered the transport of tensor-valued distributions on a mesh surface using quantum-entropy regularized optimization, which in practice leads to a modified Sinkhorn algorithm.They applied their algorithm to the interpolation of orientation fields (as opposed to vector fields) and metric tensors for anisotropic remeshing.

Applications in Topological Data Analysis
In topological data analysis, data can be analyzed via their persistence diagram, an object accounting for birth and death of features along a filtration [EH08].A natural question when dealing with such objects is to compute distances and persistence diagrams, and optimal transport is particularly well suited for these tasks.Turner et al. [TMMH14] proposed a Wasserstein barycenter computation based on the auction algorithm for point matching, but it was extremely slow.Lacombe et al. [LCO18] proposed to use the Sinkhorn algorithm to compute distances between persistence diagrams discretized on a grid.To avoid the discretization step and the rasterization effects it can cause, Vidal et al. [VBT19] proposed a progressive Wasserstein barycenter computation technique.It iteratively decreases the Fréchet energy while increasing the accuracy of the computation, leading to more accurate results and better computation times.

Fluid simulation
Brenier's polar factorization theorem [Bre91] indicates that optimal transport provides a projection of any map to the closest volumepreserving map.By considering the incompressible Euler equations as a flow under a volume-preserving map (i.e., one starts with the identity map, and at each time step, this map is transformed by a divergence-free velocity field), Gallouët and Mérigot proposed a Lagrangian fluid simulation scheme based on semi-discrete optimal transport [GM18] inspired by the discrete optimal transport scheme of Brenier [Bre00], which they illustrate in 2-d.The idea is to start with a set of particles covering the unit square along with initial velocities.Then, at each time step, one advects the particles according to their velocities, computes a semi-discrete optimal transport map between a uniform density on the unit square and the advected particles thus defining power cells (see Sec. 3.2), and finally pushes the particles towards the barycenter of their power cells in order to recover incompressibility.This formalizes the optimal-transport based fluid simulation heuristic presented earlier by de Goes et al. [dGWH * 15].Then, Lévy extended the approach to handle freeboundary fluids [Lév22], by considering a partial optimal transport problem, and illustrate it in 3-d (Fig. 16), with viscosity and surface tension added.Considering the weights of the power cell of each fluid particle as a Lagrange multiplier of mass preservation constraint, an additional Lagrange multiplier is introduced to receive the entire mass of the empty space unoccupied by fluid.The fluid interface is shown to be the intersection between power cells and a balls, which makes the modeling of the interface accurate.To make the fluid simulation faster, Qu et al. [QLDGJ22] leverage semidiscrete entropy-regularized optimal transport to avoid computing costly power diagrams, by applying Sinkhorn algorithm between particles and a uniform density grid.

Animation
Shape animation through optimal transport is a difficult task since optimal transport can break shapes and does not have any topological preservation guarantees.However it is extremely powerful at modeling unstructured data evolution.Zhang et al. [ZSS22] propose to interpolate unstructured freeform objects between user-provided keyframes by combining optimal transport with Continuous Normalizing Flows (CNF) [CRBD18].Here, the Sinkhorn divergence is used to replace the Kullback-Leibler divergence in the CNF framework.It is also used to ensure that the trajectories adhere to the keyframes by computing Wasserstein barycenters in between ODEadvected keyframes.Webanck et al. [WCGG18] proposed to interpolate between clouds keyframes using pointwise density matching.In a different context, optimal transport was used to interpolate between point clouds captured from plants at several timesteps [GKK * 20]: plant scans are segmented and the segments are matched across time.A Sinkhorn-regularized optimal transport is then computed in-between the matched segments, yielding interpolated point cloud sequences.Interestingly this method copes with the necessary topology preservation by matching segments, hence ensuring that optimal transport does not break the plant structure.
Another way to use optimal transport is for parameterizing shapes inside a shape space.Ma et al. [MDZ * 21] used Wasserstein barycenters to model soft underwater swimmers, that are optimized via differentiable swimming simulation, yielding better swimming performances.Here optimal transport serves to parameterize the shape space and optimize easily the shape geometry, since it is fully differentiable using the formulation of Bonneel et al. [BPC16].

Other applications
Optimal transportation has been applied to a wide variety of other problems far beyond computer graphics and vision, from economics [Gal18] to operations research.We shall give a few words Figure 16: Free-boundary 3d fluid simulation including viscosity and surface tension, by projecting velocities on divergence-free velocity fields using semi-discrete optimal transport, with 500k power cells [Lév22] on notable applications, either related to the computer graphics and vision communities, or widely adopted by their communities.
For music interpretation, optimal transport has been applied to music transcription, i.e. recovering the notes from a music spectrum [FFCE16].Henderson and Solomon [HS19] used 1-d optimal transportation to match pitches between two audio signals yielding a natural portamento (i.e., a transition) even between different polyphonic instruments.
In cosmology, optimal transport has been used to reconstruct accurate acoustic baryon variations, using a dedicated algorithm based on semi-discrete optimal transport [vHLM22].The idea is to compute a displacement interpolation between a uniform density (the universe at time zero) and the current distribution of mass in the universe.
In text retrieval and analysis, document distances can be computed by computing words embedding and comparing the words distribution in the Euclidean space, using Relaxed EMD [KSKW15].
Optimal transport has also been used extensively in the broader machine learning literature, as a way to cope with semisupervision [SRGB14b] or to perform transfer learning [CFTR17].
In genomics, genes or gene signatures can be extracted from different cells at several times, and unbalanced optimal transport can be used to recover developmental trajectories by coupling these data over time [SST * 19].The Gene Mover's Distance [BCG * 21] computes the similarity between two cells by computing an earth mover's distance with a ground distance consisting of the ℓ 2 distance in a 200-dimensional euclidean embedding of genes.

Discussion and conclusion
In this survey of optimal transport applications, we encountered various uses of optimal transport, notably for interpolation purposes.For such applications, care must be taken as optimal transport is agnostic to topology changes and semantics.For instance, interpolating between two sketches will produce a result which is well defined from an optimal transport perspective, but might not be interesting for the application: connected components might break and merge throughout the process, and semantics may be lacking.For unorganized matter, such as clouds, fluids, smoke, this is an acceptable behavior, but it is not the case for structured data such as human, animals or object shapes.When the input data do not directly represent probability distributions, one issue for this type of application is the interpretation of the input data (e.g., pixel intensities) as mass that ought to be preserved during interpolation.This survey explored the many facets of optimal transport applications, from archetypal applications such as image color grading to more advanced or niche albeit interesting applications, or even applications whose main purpose is to illustrate one's optimal transport algorithm behavior.While optimal transport's main downside is its computational cost, many fast relaxations or approximations have been proposed and continue to be developed.Still, scaling to millions of variables remain to this day a challenge in time critical applications, preventing its wide integration in many computational problems.

Figure 2 :
Figure 2: Solution to the problem of Fig. 1.Left.The Monge solution yields a transport map T that tells where the mass of f located at x should move.Right.The Kantorovich solution yields a transport plan P that tells how much mass is moved between all pairs (x, y) of locations.This transport plan is very sparse.When there is no Dirac or oddities, both solutions coincide, and the transport plan is the graph of the transport map.In 1-d like here, this graph is monotonically increasing.More generally, in higher dimension, it is the gradient of a convex function.

Figure 4 :
Figure4: Semi-discrete optimal transport leads to power diagrams.Here, a set of 10 bakeries can produce either 5 units of bread (orange dots) or 1 unit of bread (pink dots).Inhabitants of a uniform density population are divided among these bakeries such that the total transportation effort is minimized and their need for bread is exactly satisfied.This leads to a cell decomposition where each inhabitant in one cell will go to its associated bakery (arrows) -bakeries producing more bread lead to bigger cells (in blue).Note that the bakery sites are not necessarily inside their corresponding cells (here, all bakeries producing 1 unit of bread are located outside of their cell).

Figure 5 :
Figure 5: Solution to the problem of Fig. 1 with increasing amount of entropy.From left to right, the Gaussian standard deviation used in the convolutional approach of Solomon et al. [SdGP * 15] is σ = 2, 5, 10, 20, 50.As σ increases, the transport plan converges towards the trivial transport plan P(x, y) = f (x)g(y).

Figure 6 :
Figure6: We summarize the solvers or problem settings used for the applications detailed in this STAR.Note that a solver can be used in different settings -e.g., an entropy-regularized or sliced solver for a semi-discrete OT problem.This classification is thus only indicative. Photoshop

Figure 7 :
Figure 7: From top to bottom.Example-based color grading using Photoshop "Match Color", optimal transport-based color grading using a Gaussian approximation[PK07], sliced optimal transport using a continuous cdf-based transfer plus TV-based reconstruction to reduce grain[PKD07] and sliced optimal transport using a discrete 1-d matching[BRPP15] plus filtering to reduce grain[RDG10]

Figure 8 :
Figure 8: Replacing the Gram matrix distance by a sliced Wasserstein loss in the approach of Gatys et al. [GEB15] produces better texture synthesis and transfer results ; here, a texture synthesis guided by a label mask [HVCB21].
Later improvements have been proposed to stabilize the training, replacing weight clipping by a discriminator gradient penalty [GAA * 17].Wasserstein GANs are a broad image generation technique, which can be used in a wide variety of applications.Lei et al. [LSC * 19] revisit the Wasserstein GAN architecture with a semi-discrete optimal transport perspective.They consider that G optimizes a transport map from the latent space to the true data manifold, while D estimates the Kantorovich potential permitting to compute the Wasserstein-1 distance between the synthesized continuous distribution and the discrete target distribution.Using this geometric view, they show that if the cost c(x, y) = h(x − y) is a strictly convex function, the generator can be written in closed form from the optimal D. While this is shown mainly theoretically, some promising experiments are also shown.GAN (and WGAN) are now superseded by Diffusion Model approaches [HJA20], for which Optimal Transportation can also provide interesting insights.In that context, De Bortoli [DB22] provided bounds on the Wasserstein-1 distance between the distribution synthesized by a diffusion model and the true distribution.More generally, since the goal of any generative model is to learn a distribution of samples (images, shapes, or data in general), then Optimal Transportation is a good way of measuring the performance of the model, by computing the Wasserstein-p distance between the generated distribution and the target distribution.

Figure 10 :Figure 11 :
Figure10: Semi-discrete optimal transport allows to solve the problem of approximating a density with discrete distributions.This finds applications in image stippling when the target distribution is a set of Diracs[dGBOD12], or stylization when the target distribution is a continuous curve[LdGKW19]

Figure 12 :
Figure 12: Left.A brain caustic obtained from a light panel and optimized lens by Schwartzburg et al. [STTP14].Right three images.A lens producing an image of a train from a collimated light by Meyron et al. [MMT18b].

Figure 13 :
Figure13: 3-d mesh interpolation between 6 volume shapes and a ball, performed by Lévy [Lév15] using semi-discrete optimal transport.Optimal transport allows to interpolate between shapes of different topology, and for which feature points would be hard to compute.