Classification and image processing with a semi-discrete scheme for fidelity forced Allen--Cahn on graphs

This paper introduces a semi-discrete implicit Euler (SDIE) scheme for the Allen-Cahn equation (ACE) with fidelity forcing on graphs. Bertozzi and Flenner (2012) pioneered the use of this differential equation as a method for graph classification problems, such as semi-supervised learning and image segmentation. In Merkurjev, Kosti\'c, and Bertozzi (2013), a Merriman-Bence-Osher (MBO) scheme with fidelity forcing was used instead, as the MBO scheme is heuristically similar to the ACE. This paper rigorously establishes the graph MBO scheme with fidelity forcing as a special case of an SDIE scheme for the graph ACE with fidelity forcing. This connection requires using the double-obstacle potential in the ACE, as was shown in Budd and Van Gennip (2020) for ACE without fidelity forcing. We also prove that solutions of the SDIE scheme converge to solutions of the graph ACE with fidelity forcing as the SDIE time step tends to zero. Next, we develop the SDIE scheme as a classification algorithm. We also introduce some innovations into the algorithms for the SDIE and MBO schemes. For large graphs, we use a QR decomposition method to compute an eigendecomposition from a Nystr\"om extension, which outperforms the method used in e.g. Bertozzi and Flenner (2012) in accuracy, stability, and speed. Moreover, we replace the Euler discretisation for the scheme's diffusion step by a computation based on the Strang formula for matrix exponentials. We apply this algorithm to a number of image segmentation problems, and compare the performance of the SDIE and MBO schemes. We find that whilst the general SDIE scheme does not perform better than the MBO special case at this task, our other innovations lead to a significantly better segmentation than that from previous literature. We also empirically quantify the uncertainty that this segmentation inherits from the randomness in the Nystr\"om extension.


Introduction
In this paper, we investigate the Allen-Cahn gradient flow of the Ginzburg-Landau functional on a graph, and the Merriman-Bence-Osher (MBO) scheme on a graph, with fidelity forcing. We extend to the case of fidelity forcing the definition of the semi-discrete implicit Euler (SDIE) scheme introduced in [1] for the graph Allen-Cahn equation (ACE), and prove that the key results of [1] hold true in the fidelity forced setting, i.e.
• the MBO scheme with fidelity forcing is a special case of the SDIE scheme with fidelity forcing; and
• the SDIE solution converges to the solution of Allen-Cahn with fidelity forcing as the SDIE time step tends to zero.
We then demonstrate how to employ the SDIE scheme as a classification algorithm, making a number of improvements upon the MBO-based classification in [2]. In particular, we have developed a stable method for extracting an eigendecomposition or singular value decomposition (SVD) from the Nyström extension [3,4] that is both faster and more accurate than the previous method used in [2,5]. Finally, we test the performance of this scheme as an alternative to graph MBO as a method for image processing on the "two cows" segmentation task considered in [2,5].
Given an edge-weighted graph, the goal of two-class graph classification is to partition the vertex set into two subsets in such a way that the total weight of edges within each subset is high and the weight of edges between the two subsets is low. Classification differs from clustering by the addition of some a priori knowledge, i.e. for certain vertices the correct classification is known beforehand. Graph classification has many applications, such as semi-supervised learning and image segmentation [5,6].
All programming for this paper was done in Matlab R2019a. Except within algorithm environments and URLs, all uses of typewriter font indicate in-built Matlab functions.

Contributions of this work
In this paper we have:
• Defined a double-obstacle ACE with fidelity forcing (Definition 2.6), and extended the theory of [1] to this equation (Theorem 2.7).
• Defined an SDIE scheme for this ACE (Definition 2.8) and following [1] proved that this scheme is a generalisation of the fidelity forced MBO scheme (Theorem 2.9), derived a Lyapunov functional for the SDIE scheme (Theorem 2.12), and proved that the scheme converges to the ACE solution as the time-step tends to zero (Theorem 2.17).
• Described how to employ the SDIE scheme as a generalisation of the MBO-based classification algorithm in [2].
• Developed a method, inspired by [7], using the QR decomposition to extract an approximate SVD of the normalised graph Laplacian from the Nyström extension (Algorithm 1), which avoids the errors that can arise in the method from [2,5] from taking the square root of a non-positive-semi-definite matrix, and which empirically produces much better performance than the [2,5] method (Fig. 4) in accuracy, stability, and speed.
• Developed a method using the quadratic error Strang formula for matrix exponentials [8] for computing fidelity forced graph diffusion (Algorithm 2), which empirically incurs a lower error than the error incurred by the semi-implicit Euler method used in [2] (Fig. 6), and explored other techniques with the potential to further reduce error (Table 1).
• Demonstrated the application of these algorithms to image segmentation, particularly the "two cows" images from [2,5], compared the quality of the segmentation to those produced in [2,5] (Fig. 11), and investigated the uncertainty in these segmentations (Fig. 14), which is inherited from the randomisation in Nyström.
This work extends the work in [1] in four key ways. Firstly, introducing fidelity forcing changes the character of the dynamics, e.g. making graph diffusion affine, which changes a number of results/proofs, and it is thus of interest that the SDIE link continues to hold between the MBO scheme and the ACE. Secondly, this work for the first time considers the SDIE scheme as a tool for applications. Thirdly, in developing the scheme for applications we have made a number of improvements to the methods used in the previous literature [2] for MBO-based classification, which result in a better segmentation of the "two cows" image than that produced in [2] or [5]. Fourthly, we quantify the randomness that the segmentation inherits from the Nyström extension.

Background
In the continuum, a major class of techniques for classification problems relies upon the minimisation of total variation (TV), e.g. the famous Mumford-Shah [9] and Chan-Vese [10] algorithms. These methods are linked to Ginzburg-Landau methods by the fact that the Ginzburg-Landau functional Γ-converges to TV [11,12] (a result that continues to hold in the graph context [13]). This motivated a common technique of minimising the Ginzburg-Landau functional in place of TV, e.g. in [14] two-class Chan-Vese segmentation was implemented by replacing TV with the Ginzburg-Landau functional; the resulting energy was minimised by using a fidelity forced MBO scheme.
Inspired by this continuum work, in [5] a method for graph classification was introduced based on minimising the Ginzburg-Landau functional on a graph by evolving the graph Allen-Cahn equation (ACE). The a priori information was incorporated by including a fidelity forcing term, leading to the equation
$$ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}W'\circ u - \mu P_Z(u - f), $$
where u is a labelling function which, due to the influence of a double-well potential (e.g. $W(x) = x^2(x-1)^2$), will take values close to 0 and 1, indicating the two classes. The a priori knowledge is encoded in the reference f, which is supported on Z, a subset of the node set with corresponding projection operator P_Z. In the first term, ∆ denotes the graph Laplacian, and ε, µ > 0 are parameters. All these ingredients will be explained in more detail in Sections 1.3 and 2.
In [2] an alternative method was introduced: a graph Merriman-Bence-Osher (MBO) scheme with fidelity forcing. The original MBO scheme, introduced in a continuum setting in [15] to approximate motion by mean curvature, is an iterative scheme consisting of diffusion alternated with a thresholding step. In [2] this scheme was discretised for use on graphs and the fidelity forcing term −µP Z (u − f ) was added to the diffusion. Heuristically, this MBO scheme was expected to behave similarly to the graph ACE as the thresholding step resembles a "hard" version of the "soft" double-well potential nonlinearity in the ACE.
In [1] it was shown that the graph MBO scheme without fidelity forcing could be obtained as a special case of a semi-discrete implicit Euler (SDIE) scheme for the ACE (without fidelity forcing), if the smooth double-well potential was replaced by the double-obstacle potential defined in (1.1), and that solutions to the SDIE scheme converge to the solution of the graph ACE as the time step converges to zero. This double-obstacle potential was studied for the continuum ACE in [16,17,18] and was used in the graph context in [19]. In [20] a result similar to that obtained in [1] was obtained for a mass-conserving graph MBO scheme. In this paper such a result will be established for the graph MBO scheme with fidelity forcing.
In [21] it was shown that the graph MBO scheme pins (or freezes) when the diffusion time is chosen too small, meaning that a single iteration of the scheme will not introduce any change, as the diffusion step will not have pushed the value at any node past the threshold. In [1] it was argued that the SDIE scheme for graph ACE provides a relaxation of the MBO scheme: the hard threshold is replaced by a gradual threshold, which should allow for the use of smaller diffusion times without experiencing pinning. The current paper investigates what impact this has on practical problems.

Groundwork
We briefly summarise the framework for analysis on graphs, following the summary in [1] of the detailed presentation in [21]. A graph G = (V, E) will henceforth be defined to be a finite, simple, undirected, weighted, and connected graph without self-loops with vertex set V, edge set E ⊆ V², and weights {ω_ij}_{i,j∈V} with ω_ij ≥ 0, ω_ij = ω_ji, ω_ii = 0, and ω_ij > 0 if and only if ij ∈ E. We define the following function spaces on G (where X ⊆ R, and T ⊆ R an interval):
$$ \mathcal{V} := \{u : V \to \mathbb{R}\}, \quad \mathcal{V}_X := \{u : V \to X\}, \quad \mathcal{E} := \{\varphi : E \to \mathbb{R}\}, \quad \mathcal{V}_{t\in T} := \{u : T \to \mathcal{V}\}, \quad \mathcal{V}_{X,t\in T} := \{u : T \to \mathcal{V}_X\}. $$
Defining $d_i := \sum_{j\in V}\omega_{ij}$ to be the degree of vertex i ∈ V, we define inner products on V (or V_X) and E (where r ∈ [0, 1]):
$$ \langle u, v\rangle_\mathcal{V} := \sum_{i\in V} u_i v_i d_i^r, \qquad \langle\varphi, \psi\rangle_\mathcal{E} := \frac12\sum_{i,j\in V}\varphi_{ij}\psi_{ij}\omega_{ij}, $$
and define the inner product on V_{t∈T} (or V_{X,t∈T})
$$ \langle u, v\rangle_{t\in T} := \int_T \langle u(t), v(t)\rangle_\mathcal{V}\, dt. $$
These induce inner product norms || · ||_V, || · ||_E, and || · ||_{t∈T}. Next, we define the L² space
$$ L^2(T; \mathcal{V}) := \{u \in \mathcal{V}_{t\in T} : \|u\|_{t\in T} < \infty\} $$
and, for T an open interval, we define the Sobolev space H¹(T; V) as the set of u ∈ L²(T; V) with weak derivative du/dt ∈ L²(T; V) defined by
$$ \left\langle \frac{du}{dt}, \varphi\right\rangle_{t\in T} = -\left\langle u, \frac{d\varphi}{dt}\right\rangle_{t\in T} \quad\text{for all } \varphi \in C_c^\infty(T; \mathcal{V}), $$
where C_c^∞(T; V) is the set of ϕ ∈ V_{t∈T} that are infinitely differentiable with respect to time t ∈ T and are compactly supported in T. By [1, Proposition 2.1], u ∈ H¹(T; V) if and only if u_i ∈ H¹(T; R) for each i ∈ V. We define the local H¹ space on any interval T (and likewise define the local L² space L²_loc(T; V)):
$$ H^1_{loc}(T; \mathcal{V}) := \{u \in \mathcal{V}_{t\in T} : u \in H^1(S; \mathcal{V}) \text{ for all bounded open intervals } S \text{ with } \overline{S} \subseteq T\}. $$
Next, we introduce the graph gradient and Laplacian:
$$ (\nabla u)_{ij} := \begin{cases} u_j - u_i, & ij \in E, \\ 0, & \text{otherwise}, \end{cases} \qquad (\Delta u)_i := d_i^{-r}\sum_{j\in V}\omega_{ij}(u_i - u_j). $$
Note that ∆ is positive semi-definite and self-adjoint with respect to V. As shown in [21], these operators are related via
$$ \langle \Delta u, v\rangle_\mathcal{V} = \langle \nabla u, \nabla v\rangle_\mathcal{E}. $$
We can interpret ∆ as a matrix. Define D := diag(d) (i.e. D_ii := d_i, and D_ij := 0 otherwise) to be the diagonal matrix of degrees. Then, writing ω for the matrix of weights ω_ij, we get
$$ \Delta = D^{-r}(D - \omega). $$
From ∆ we define the graph diffusion operator
$$ e^{-t\Delta} : u \mapsto v(t), $$
where v(t) = e^{−t∆}u is the unique solution to dv/dt = −∆v with v(0) = u. Note that $e^{-t\Delta}\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the vector of ones. By [1, Proposition 2.2] if u ∈ H¹(T; V) and T is bounded below, then e^{−t∆}u ∈ H¹(T; V) with
$$ \frac{d}{dt}\left(e^{-t\Delta}u\right) = e^{-t\Delta}\frac{du}{dt} - e^{-t\Delta}\Delta u. $$
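As a concrete toy illustration of these operators, the following Matlab sketch (entirely our own, on a three-vertex graph with r = 1; not code from the paper) builds ∆ and applies the diffusion operator:

    omega = [0 1 2; 1 0 0; 2 0 0];          % symmetric weights, zero diagonal
    r = 1;
    d = sum(omega, 2);                      % degrees d_i
    D = diag(d);
    Delta = D^(-r) * (D - omega);           % Delta = D^{-r} (D - omega)
    u = [1; 0; 0];
    v = expm(-0.5 * Delta) * u;             % graph diffusion e^{-t*Delta}*u at t = 0.5
    disp(norm(expm(-0.5 * Delta) * ones(3,1) - ones(3,1)));  % e^{-t*Delta}*1 = 1, so ~0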
We recall from functional analysis the operator norm notation: for any linear operator F : V → V,
$$ \|F\| := \max_{u\neq 0}\frac{\|Fu\|_\mathcal{V}}{\|u\|_\mathcal{V}}, $$
and recall the standard result that if F is self-adjoint then ||F|| = ρ(F), the spectral radius of F.
Finally, we recall some notation from [1]: for problems of the form argmin_x f(x) we write f ≃ g and say f and g are equivalent when g(x) = af(x) + b for a > 0 and b independent of x. As a result, replacing f by g does not affect the minimisers.
Lastly, we define the non-fidelity-forced versions of the graph MBO scheme, the graph ACE and the SDIE scheme.
The MBO scheme is an iterative, two-step process, originally developed in [15] to approximate motion by mean curvature. On a graph, it is defined in [2] by the following iteration: for u_n ∈ V_{{0,1}}, and τ > 0 the time step,
1. v_n := e^{−τ∆}u_n, i.e. the diffused state of u_n after a time τ;
2. (u_{n+1})_i := 1 if (v_n)_i ≥ 1/2 and (u_{n+1})_i := 0 otherwise, i.e. u_{n+1} is v_n thresholded at 1/2.
To define the graph Allen-Cahn equation (ACE), we first define the graph Ginzburg-Landau functional as in [1] by
$$ GL_\varepsilon(u) := \frac12\|\nabla u\|_\mathcal{E}^2 + \frac1\varepsilon\langle W\circ u, \mathbf{1}\rangle_\mathcal{V}, $$
where W is a double-well potential and ε > 0 is a scaling parameter. Then the ACE results from taking the ⟨·, ·⟩_V gradient flow of GL_ε, which for W differentiable is given by the ODE (where ∇_V is the Hilbert space gradient on V):
$$ \frac{du}{dt} = -\nabla_\mathcal{V} GL_\varepsilon(u) = -\Delta u - \frac1\varepsilon W'\circ u. $$
To facilitate the SDIE link from [1] between the ACE and the MBO scheme, we will henceforth take W to be defined as
$$ W(x) := \begin{cases} \frac12 x(1-x), & x \in [0,1], \\ +\infty, & \text{otherwise}, \end{cases} \tag{1.1} $$
the double-obstacle potential studied by Blowey and Elliott [16,17,18] in the continuum and Bosch, Klamt, and Stoll [19] on graphs. As W is not differentiable, we redefine the ACE via the subdifferential of W. As in [1] we say that a pair (u, β) ∈ V_{[0,1],t∈T} × V_{t∈T} solves the double-obstacle ACE when, for almost every t ∈ T,
$$ \frac{du}{dt}(t) = -\Delta u(t) - \frac1\varepsilon\left(\frac12\mathbf{1} - u(t) + \beta(t)\right) \quad\text{and}\quad \beta(t)\in\mathcal{B}(u(t)), $$
where B(u) := ∅ for u ∉ V_{[0,1]}, and for u ∈ V_{[0,1]} it is the set of β ∈ V such that
$$ \beta_i \begin{cases} \le 0, & u_i = 0, \\ = 0, & u_i \in (0,1), \\ \ge 0, & u_i = 1. \end{cases} \tag{1.2} $$
Finally, the SDIE scheme for the graph ACE is defined in [1] by the formula
$$ \frac{u_{n+1} - e^{-\tau\Delta}u_n}{\tau} = -\frac1\varepsilon\left(\frac12\mathbf{1} - u_{n+1}\right), $$
or more accurately, given the above detail with the subdifferential,
$$ (1-\lambda)u_{n+1} = e^{-\tau\Delta}u_n - \frac\lambda2\mathbf{1} - \lambda\beta_{n+1}, $$
where λ := τ/ε and β_{n+1} ∈ B(u_{n+1}); solving for u_{n+1} gives a piecewise linear update, sketched in code after the following list. The key results of [1] are then that:
• When τ = ε, this scheme is exactly the MBO scheme.
• For ε fixed and τ ↓ 0, this scheme converges to the solution of the double-obstacle ACE (which is a well-posed ODE).
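The promised sketch: a minimal Matlab illustration (our own names, not code from [1]) of the update rule obtained by solving the second display above for u_{n+1} with β_{n+1} ∈ B(u_{n+1}). It is a piecewise linear clamp of the diffused state, and tends to the hard 1/2-threshold as λ → 1.

    % Piecewise linear SDIE update (our sketch): clamp of (v - lambda/2)/(1 - lambda)
    sdie_update = @(v, lambda) min(max((v - lambda/2) ./ (1 - lambda), 0), 1);
    v = linspace(0, 1, 11)';          % stand-in for the diffused state e^(-tau*Delta)*u_n
    u_next = sdie_update(v, 0.5);     % lambda = tau/eps = 0.5
    mbo_like = double(v >= 0.5);      % lambda -> 1 limit: MBO thresholding at 1/2
    disp([v, u_next, mbo_like]);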

Paper outline
The paper is structured as follows. In Section 1.3 we introduced important concepts and notation for the rest of the paper. Section 2 contains the main theoretical results of this paper. It defines the graph MBO scheme with fidelity forcing, the graph ACE with fidelity forcing, and the SDIE scheme for graph ACE with fidelity forcing. It proves well-posedness for the graph ACE with fidelity forcing and establishes the rigorous link between a particular SDIE scheme and the graph MBO with fidelity forcing. Moreover, it introduces a Lyapunov functional for the SDIE scheme with fidelity forcing and proves convergence of solutions of the SDIE schemes to the solution of the graph ACE with fidelity forcing. In Section 3 we explain how the SDIE schemes can be used for graph classification. In particular, the modifications to the existing MBO-based classification algorithms based on the QR decomposition and Strang formula are introduced. Section 4 presents a comparison of the SDIE and MBO schemes for image segmentation applications, and an investigation into the uncertainty in these segmentations. In Appendix A it is shown that the application of the Euler method used in [2] can be seen as an approximation of the Lie product formula.
2 The Allen-Cahn equation, the MBO scheme, and the SDIE scheme with fidelity forcing

2.1 The MBO scheme with fidelity forcing

Following [2,14], we introduce fidelity forcing into the MBO scheme by first defining a fidelity forced diffusion.
Definition 2.1 (Fidelity forced graph diffusion). For u ∈ H¹_loc([0, ∞); V) and u_0 ∈ V we define fidelity forced diffusion to be:
$$ \frac{du}{dt}(t) = -\Delta u(t) - \mu P_Z(u(t) - \bar f), \qquad u(0) = u_0, \tag{2.1} $$
where $\bar f \in \mathcal{V}_{[0,1]}$ is the reference, µ > 0 parameterises the strength of the fidelity to the reference, ∅ ≠ Z ⊆ V is the reference data we enforce fidelity on, and P_Z is the projection map
$$ (P_Z u)_i := \begin{cases} u_i, & i \in Z, \\ 0, & \text{otherwise}. \end{cases} $$
For the purposes of this section we shall treat µ, f̄, and Z as fixed and given. Moreover, since f̄ only ever appears in the presence of P_Z, we define f := P_Z f̄, which is supported only on Z and so P_Z f = f.
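As a quick illustration, the following Matlab sketch (dense linear algebra on a small toy graph; all names are ours) evaluates the resulting solution map directly from its closed form, using that µ∫₀ᵗ e^{−sA}f ds = µA^{−1}(I − e^{−tA})f when A := ∆ + µP_Z is invertible (as shown in the proposition below):

    omega = [0 1 2; 1 0 0; 2 0 0];            % toy weight matrix
    d = sum(omega, 2);
    Delta = diag(1./d) * (diag(d) - omega);   % random walk Laplacian (r = 1)
    mu = 1; chiZ = [1; 0; 0];                 % fidelity on vertex 1 only
    fbar = [1; 0; 0]; f = chiZ .* fbar;       % f := P_Z * fbar
    A = Delta + mu * diag(chiZ);              % A := Delta + mu*P_Z
    t = 0.5; u0 = [0; 1; 0];
    E = expm(-t * A);
    Stu0 = E * u0 + mu * (A \ ((eye(3) - E) * f));   % S_t u0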
Proposition 2.2. A := ∆ + µP_Z is self-adjoint and strictly positive definite, with ||A|| ≤ ||∆|| + µ.

Proof. For the lower bound, we show that A is strictly positive definite. Let u ≠ 0 be written u = v + α1 for v ⊥ 1. Then ⟨u, Au⟩_V = ⟨u, ∆u⟩_V + µ⟨u, P_Z u⟩_V, and note that both terms on the right-hand side are non-negative. Next, if v ≠ 0 then ⟨u, ∆u⟩_V = ⟨v, ∆v⟩_V > 0, since G is connected and so the kernel of ∆ is spanned by 1. Otherwise u = α1 for α ≠ 0, and ⟨u, P_Z u⟩_V = α²⟨1, P_Z 1⟩_V > 0, since Z is non-empty. For the upper bound: A is the sum of self-adjoint matrices, so is self-adjoint and hence has largest eigenvalue equal to ||A|| = ||∆ + µP_Z|| ≤ ||∆|| + µ||P_Z|| = ||∆|| + µ.
By standard linear ODE theory, (2.1) has for each u_0 a unique solution, given by the solution map
$$ S_t u_0 := e^{-tA}u_0 + \mu\int_0^t e^{-sA}f\, ds. \tag{2.2} $$
This solution map has the following properties:
i. If u_0 ≤ v_0 vertexwise, then for all t ≥ 0, S_t u_0 ≤ S_t v_0 vertexwise.
ii. If u_0 ∈ V_{[0,1]}, then S_t u_0 ∈ V_{[0,1]} for all t ≥ 0.
i. By definition, S_t v_0 − S_t u_0 = e^{−tA}(v_0 − u_0). Thus it suffices to show that e^{−tA} is a non-negative matrix for t ≥ 0. Note that the off-diagonal elements of −tA are non-negative: for i ≠ j, −tA_{ij} = −t∆_{ij} = t d_i^{−r}ω_{ij} ≥ 0. Thus for some a > 0, M := aI − tA is a non-negative matrix, and thus e^M is a non-negative matrix. It follows that e^{−tA} = e^{−a}e^M is a non-negative matrix.
ii. For the lower bound, suppose for contradiction that u_i(t) < 0 for some i ∈ V and some t > 0. Then min_{t′∈[0,t]} min_{i∈V} u_i(t′) < 0, and since each u_i is continuous this minimum is attained at some t* ∈ [0, t] and i* ∈ V. Fix such a t*. Then for any i* minimising u(t*), since u_{i*}(t*) < 0 we must have t* > 0, so u_{i*} is differentiable at t* with du_{i*}/dt(t*) ≤ 0. However, by (2.1)
$$ \frac{du_{i^*}}{dt}(t^*) = d_{i^*}^{-r}\sum_{j\in V}\omega_{i^*j}\left(u_j(t^*) - u_{i^*}(t^*)\right) + \mu\chi_Z(i^*)\left(f_{i^*} - u_{i^*}(t^*)\right). $$
We claim that we can choose a suitable minimiser i* such that this is strictly positive. First, since any such i* is a minimiser of u(t*), and f_i ≥ 0 for all i, it follows that each term is non-negative. Next, suppose such an i* has a neighbour j such that u_j(t*) > u_{i*}(t*); then it follows that (∆u(t*))_{i*} < 0 and we have the claim. Otherwise, all the neighbours of that i* are also minimisers of u(t*). Repeating this same argument on each of those, we either have the claim for the above reason or we find a minimiser i* ∈ Z, since G is connected. But in that case µ(f_{i*} − u_{i*}(t*)) ≥ −µu_{i*}(t*) > 0 and we again have the claim. Hence du_{i*}/dt(t*) > 0, a contradiction. Therefore u_i(t) ≥ 0 for all t.
The case for u i (t) ≤ 1 is likewise.
The MBO scheme with fidelity forcing is then defined by replacing the diffusion step with fidelity forced diffusion: for u_n ∈ V_{{0,1}},
$$ (u_{n+1})_i := \begin{cases} 1, & (S_\tau u_n)_i \ge 1/2, \\ 0, & (S_\tau u_n)_i < 1/2, \end{cases} \tag{2.3} $$
where S_τ is the solution map from (2.2). Note that (2.3) has variational form similar to that given for graph MBO in [21], which we can then re-write as in [1]:
$$ u_{n+1} \in \operatorname*{argmin}_{u\in\mathcal{V}_{[0,1]}} \langle\mathbf{1} - 2S_\tau u_n, u\rangle_\mathcal{V} = \operatorname*{argmin}_{u\in\mathcal{V}_{[0,1]}} \left(\langle\mathbf{1} - u, u\rangle_\mathcal{V} + \|u - S_\tau u_n\|_\mathcal{V}^2\right). \tag{2.4} $$

The Allen-Cahn equation with fidelity forcing
To derive the Allen-Cahn equation (ACE) with fidelity forcing, we re-define the Ginzburg-Landau energy to include a fidelity term (recalling the potential W from (1.1)):
$$ GL_{\varepsilon,\mu,\bar f,Z}(u) := \frac12\|\nabla u\|_\mathcal{E}^2 + \frac1\varepsilon\langle W\circ u, \mathbf{1}\rangle_\mathcal{V} + \frac\mu2\langle P_Z(u - \bar f), u - \bar f\rangle_\mathcal{V}. \tag{2.5} $$
Taking the gradient flow of (2.5) we obtain the Allen-Cahn equation with fidelity:
$$ \frac{du}{dt}(t) = -\Delta u(t) - \frac1\varepsilon\left(\frac12\mathbf{1} - u(t) + \beta(t)\right) - \mu P_Z(u(t) - \bar f), \qquad \beta(t)\in\mathcal{B}(u(t)), \tag{2.6} $$
where B(u(t)) is defined as in (1.2). Recalling that A := ∆ + µP_Z and f := P_Z f̄, we can rewrite the ODE in (2.6) as
$$ \frac{du}{dt}(t) = -Au(t) + \mu f - \frac1\varepsilon\left(\frac12\mathbf{1} - u(t) + \beta(t)\right). $$
As in [1], we can give an explicit expression for β given sufficient regularity on u.
Thus following [1] we define the double-obstacle ACE with fidelity forcing.
Definition 2.6 (Double-obstacle ACE with fidelity forcing). Let T be an interval. Then a pair (u, β) ∈ V_{[0,1],t∈T} × V_{t∈T} is a solution to double-obstacle ACE with fidelity forcing on T when u ∈ H¹_loc(T; V) ∩ C⁰(T; V) and, for almost every t ∈ T,
$$ \frac{du}{dt}(t) = -Au(t) + \mu f - \frac1\varepsilon\left(\frac12\mathbf{1} - u(t) + \beta(t)\right) \quad\text{and}\quad \beta(t)\in\mathcal{B}(u(t)). \tag{2.8} $$
We frequently will refer to just u as a solution to (2.8), since β is a.e. uniquely determined as a function of u by (2.7).
We now demonstrate that this has the same key properties, mutatis mutandis, as the ACE in [1].
vertexwise at a.e. t ∈ T. Next, take the inner product with w⁺ := max(w, 0), the vertexwise positive part of w. As in the proof of [1, Theorem B.2], the RHS is ≤ 0. For the rest of the proof to go through as in that theorem, it suffices to check that ⟨Aw(t), w⁺(t)⟩_V ≥ 0.

(e) Let u solve (2.8). Then for a.e. t ∈ T, β(t) ∈ B(u(t)), and at such t the ODE in (2.8) can be integrated explicitly. Next, let u be as in (2.12). Then it is easy to check that the corresponding β satisfies β(t) ∈ B(u(t)), and that this pair satisfies (2.8). Next, we check the regularity of u. The continuity of u is immediate, as it is a sum of two smooth terms and the integral of a locally bounded function. To check that u ∈ H¹_loc: u is bounded, so is locally L², and by the above du/dt is a sum of (respectively) two smooth functions, a bounded function, and the integral of a locally bounded function, so is locally bounded and hence locally L².
we have by (f) that the relevant derivative bound holds, and so, writing B_s := (e^{−sB} − I)/s for s > 0 (which we note commutes with B), the increment u(t₂) − u(t₁) can be controlled in terms of B_s. Note that B_s is self-adjoint, and as −B has largest eigenvalue less than ε^{−1} we have ||B_s|| < (e^{s/ε} − 1)/s, with the RHS monotonically increasing in s for s > 0. Since u(t) ∈ V_{[0,1]} for all t, for t₂ − t₁ < 1 we obtain the desired bound, completing the proof.
(h) We prove this as Theorem 2.21 for the solution given by Theorem 2.17, which by uniqueness is the general solution.
Note. Given the various forward references in the above proof, we take care to avoid circularity by not using the corresponding results until they have been proven.

The SDIE scheme with fidelity forcing and link to the MBO scheme
Definition 2.8 (SDIE scheme with fidelity forcing). For u_n ∈ V_{[0,1]} and λ := τ/ε ∈ [0, 1], the SDIE scheme with fidelity forcing is given by
$$ (1-\lambda)u_{n+1} = S_\tau u_n - \frac\lambda2\mathbf{1} - \lambda\beta_{n+1} \tag{2.14} $$
for a β_{n+1} ∈ B(u_{n+1}) to be characterised in Theorem 2.9.
As in [1], we can plot (2.16) to visualise the SDIE scheme (2.14) as a piecewise linear relaxation of the MBO thresholding rule. Next, we note that we have the same Lipschitz continuity property from [1].

A Lyapunov functional for the SDIE scheme
The functional
$$ J(u) := \langle\mathbf{1} - u, S_\tau u\rangle_\mathcal{V} $$
has the following properties:
i. It is strictly concave.
ii. It has first variation at u, obtained by expanding J around u.

Theorem 2.12. The functional H defined in (2.19) has a uniform lower bound and is a Lyapunov functional for (2.14), i.e. H(u_{n+1}) ≤ H(u_n), with equality if and only if u_{n+1} = u_n, for u_{n+1} defined by (2.14).

Proof. Rewriting H, we have by Proposition 2.2 that H is uniformly bounded below. Next we show that H is a Lyapunov functional. By the concavity of J, H(u_{n+1}) − H(u_n) is bounded above by a quantity that is non-positive, with equality in this bound (*) if and only if u_{n+1} = u_n, as the concavity of J is strict, and where the final step follows since β_{n+1} ∈ B(u_{n+1}).

Corollary 2.13. For λ = 1 (i.e. the MBO case) the sequence u_n defined by (2.14) is eventually constant.
Proof. For the first claim, the proof follows as in [1].

Convergence of the SDIE scheme with fidelity forcing
Following [1], we first derive the n-th term of the semi-discrete scheme.
Proposition 2.14 (Cf. [1, Proposition 5.1]). For the sake of notation, define B := A − ε^{−1}I and λ := τ/ε. Then for λ ∈ [0, 1) the sequence generated by (2.14) is given by the explicit formula (2.22).

Proof. We can rewrite (2.14) as a one-step affine recursion and then check (2.22) inductively. The n = 0 case is trivial, and the inductive step follows by direct computation, completing the proof.
Next, we consider the asymptotics of each term in (2.22).
Note that O(τ) is the same as O(λ); also, for bounded (in τ) invertible matrices X and Y with bounded (in τ) inverses, the corresponding products and inverses remain bounded.

ii. We next consider each term of w individually. In each case we reduce the desired estimate to an equivalent bound, and since B = A − ε^{−1}I the result follows.
iii. We follow [1, Proposition 5.1] and consider the difference of successive terms, which gives the result as desired.
Following [1] we define the piecewise constant function z_τ, constant on each interval [nτ, (n+1)τ) with value determined by β_n^{[τ]}, following the bookkeeping notation of [1] of using the superscript [τ] to keep track of the time step governing u_n and β_n. Next, we have weak convergence of z_τ, up to a subsequence, as in [1].
Since ||β_n^{[τ]}||_V < ε for all n, there exists a function z : [0, ∞) → V and a subsequence τ_n such that z_{τ_n} converges weakly to z in L²_loc. It follows that:
(C) Passing to a further subsequence of τ_n, we have strong convergence of the Cesàro sums.

Proof. Follows as in [1, Proposition 5.2] and [1, Corollary 5.3] mutatis mutandis.
We thus infer convergence of the SDIE iterates as in [1]. Taking τ to zero along the sequence τ_n, we can define for all t ≥ 0:
$$ \hat u(t) := \lim_{n\to\infty,\, m = \lceil t/\tau_n\rceil} u_m^{[\tau_n]}. \tag{2.24} $$
By the above discussion, we can rewrite this as (2.25). Next, note that mτ_n = τ_n⌈t/τ_n⌉ =: t + η_n, where η_n ∈ [0, τ_n). Therefore the limit can be computed explicitly, and we obtain an expression for û. Note the similarity between (2.25) and the explicit form for ACE solutions (2.12). Thus, to prove that û is a solution to (2.8) it suffices to show that û takes values in V_{[0,1]}, that û has the required regularity, and that z induces a valid β for û. These results follow as in [1]. Item (a) follows immediately from the fact that for all n, u_n^{[τ]} ∈ V_{[0,1]}. Towards (b), note that each term in (2.25) except for the integral is C^∞([0, ∞); V), and that ∫₀ᵗ z(s) ds is continuous since z is locally bounded as a weak limit of locally uniformly bounded functions. Hence û is continuous. By (a), û is bounded so is locally L². Finally, it is easy to check that û has weak derivative obtained by differentiating (2.25) term by term. This is locally L² since (for T a bounded interval) B and e^{−tB} are bounded operators from L²(T; V) to L²(T; V), z is a weak limit of locally L² functions so is locally L², and ∫₀ᵗ z(s) ds is continuous so is locally bounded.
Proof. Let x_n : t ↦ u^{[τ_n]}_{⌈t/τ_n⌉} and let τ_{n_k} be any subsequence of τ_n. Then by the theorem there is a subsubsequence τ_{n_{k_l}} such that x_{n_{k_l}} → ũ pointwise, where ũ is a solution to (2.8) with initial condition ũ(0) = u_0. By Theorem 2.7(c) such solutions are unique, so ũ = û. Thus there exists x (in particular, x = û) such that every subsequence of x_n has a convergent subsubsequence with limit x. It follows by a standard fact of topological spaces that x_n → û pointwise as n → ∞. (To see this standard fact: suppose x_n ↛ x. Then there exists U, open in the topology of pointwise convergence, such that x ∈ U and infinitely many x_n ∉ U. Choosing a subsequence x_{n_k} with x_{n_k} ∉ U for all k gives a subsequence with no further subsubsequence converging to x.)

Finally, we follow [1] to use Theorem 2.17 to deduce that the Ginzburg-Landau energy monotonically decreases along the ACE trajectories, by considering the Lyapunov functional H defined in (2.19). We also deduce well-posedness of the ACE.
Proof. Expanding and collecting terms in (2.5), we find that for u ∈ V_{[0,1]} the energy GL_{ε,µ,f̄,Z}(u) is equivalent (in the sense of Section 1.3) to an expression involving ||u − 2µA^{−1}f||_V.
Then by (2.19), and recalling that λ := τ/ε, the claimed convergence follows. To show the uniform convergence, note that ||u||_V and ||u − 2µA^{−1}f||_V are uniformly bounded in u for u ∈ V_{[0,1]}. Thus it suffices to prove that ||Q_τ|| is uniformly bounded in τ. But Q_τ is self-adjoint, and if ξ_k is an eigenvalue of A then Q_τ has a corresponding eigenvalue which is bounded uniformly in τ. Then the result follows by the above expression for H_τ, since the right-hand entry in the inner product is bounded uniformly in τ.

Theorem 2.21. Suppose ε^{−1} ∉ σ(A). Then the ACE trajectory u defined by Definition 2.6 has GL_{ε,µ,f̄,Z}(u(t)) monotonically decreasing in t. More precisely: for all t > s ≥ 0,
$$ GL_{\varepsilon,\mu,\bar f,Z}(u(t)) \le GL_{\varepsilon,\mu,\bar f,Z}(u(s)) - \frac12\int_s^t\left\|\frac{du}{dr}(r)\right\|_\mathcal{V}^2 dr. $$
Furthermore, this entails an explicit C^{0,1/2} condition for u:
$$ \|u(s) - u(t)\|_\mathcal{V} \le \sqrt{|t-s|}\,\sqrt{2\, GL_{\varepsilon,\mu,\bar f,Z}(u(0))}. $$

The SDIE scheme as a classification algorithm
As was noted in the introduction, trajectories of the ACE and of the MBO scheme on graphs can be deployed as classification algorithms, as was originally done in work by Bertozzi and co-authors in [2,5].
In the above, we have shown that the SDIE scheme (2.14) introduced in [1] generalises the MBO scheme into a family of schemes, all of the same computational cost as the MBO scheme, and which as τ ↓ 0 become increasingly more accurate approximations of the ACE trajectories. In the remainder of this paper, we will investigate whether the use of these schemes can significantly improve on the use of the MBO scheme to segment the "two cows" images from [2,5]. We will also discuss other potential improvements to the method of [2].

Groundwork
In this section, we will describe the framework for applying graph dynamics to classification problems, following [2,5].
We denote the set of individuals that we seek to classify by V, upon which we have some information x : V → R^q. For example, in image segmentation, V is the set of pixels of the image and x gives the greyscale/RGB/etc. values at each pixel. Furthermore, we have labelled reference data, which we shall denote as Z ⊆ V, with binary reference labels f supported on Z.
To build our graph, we first construct feature vectors z : V → R^ℓ. The philosophy behind these is that we want vertices which are "similar" (and hence should be similarly labelled) to have feature vectors that are "close together". What this means in practice will depend on the application, e.g. [23] incorporates texture into the features, and [5,6] give other options.
Next, we construct the weights on the graph by deciding on the edge set E and, for each ij ∈ E, computing ω_ij := Ω(z_i, z_j) (and for ij ∉ E, ω_ij := 0). There are a number of standard choices for the similarity function Ω, see for example [5,24,25,26]. The similarity function we will use in this paper is the Gaussian function
$$ \Omega(z_i, z_j) := \exp\left(-\|z_i - z_j\|^2/\sigma^2\right) $$
(a small sketch of this construction is given after Fig. 2). Finally, from these weights we compute the graph Laplacian so that we can employ the graph ODEs discussed in the previous sections. In particular, we compute the normalised (a.k.a. random walk) graph Laplacian, i.e. we will henceforth take r = 1 and so ∆ = I − D^{−1}ω. We will also consider the symmetric normalised Laplacian ∆_s := I − D^{−1/2}ωD^{−1/2}, though this does not fit into the above framework. This normalisation matters because, as discussed in [5], the segmentation properties of diffusion-based graph dynamics are linked to the segmentation properties of the eigenvectors of the corresponding Laplacian. As shown in [5, Fig. 2.1], normalisation vastly improves these segmentation properties. As that figure looked at the symmetric normalised Laplacian, we include Fig. 2 to show the difference between the symmetric normalised and the random walk Laplacian.

Figure 2: Second, third, and fourth eigenvectors of the random walk Laplacian (left) and symmetric normalised Laplacian (right) for the graph built on one of the "two cows" images from Section 4, computed using Algorithm 1.
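The promised sketch of this graph construction, in Matlab (dense and small-scale, entirely our own code; pdist2 is from the Statistics and Machine Learning Toolbox):

    z = rand(100, 27);                        % stand-in feature vectors
    sigma = 35;
    omega = exp(-pdist2(z, z).^2 / sigma^2);  % Gaussian similarity weights
    omega(1:101:end) = 0;                     % zero the diagonal: no self-loops
    d = sum(omega, 2);                        % degrees
    Delta = eye(100) - omega ./ d;            % random walk Laplacian I - D^{-1}*omega
    Delta_s = eye(100) - (omega ./ sqrt(d)) ./ sqrt(d)';  % symmetric normalised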

The basic classification algorithm
For some time step 0 < τ ≤ ε, note that S_τ u = e^{−τA}u + b, where
$$ b := \mu\int_0^\tau e^{-tA}f\, dt. $$
The basic classification algorithm is then:
1. Input: vector x : V → R^q, reference data Z, and labels f supported on Z.
2. Compute the feature vectors z from x.
3. Compute the weights ω and thence the graph Laplacian ∆.
4. Compute A and therefore b.
5. From some initial condition u_0, compute the SDIE sequence u_n until it meets a stopping condition at some n = N.
Unfortunately, as written this algorithm cannot be feasibly run. The chief obstacle is that in many applications A is too large to store in memory, yet we need to quickly compute e −τ A u, potentially a large number of times. We also need to compute b accurately. Moreover, in general A does not have low numerical rank, so it cannot be well approximated by a low-rank matrix. In the rest of this section we describe our modifications to this basic algorithm that make it computationally efficient.

Matrix compression and approximate SVDs
We will need to compress ∆ into something we can store in memory. Following [2,5], we employ the Nyström extension [3,4]. We choose K ≪ |V| to be the rank to which we will compress ∆, and choose non-empty Nyström interpolation sets X₁ ⊆ V \ Z and X₂ ⊆ Z at random such that |X| = K, where X := X₁ ∪ X₂. Then, using the function ij ↦ ω_ij, we compute ω_{VX} := ω(V, X) (i.e. ω_{VX} := (ω_ij)_{i∈V, j∈X}) and ω_{XX} := ω(X, X), and then the Nyström extension is the approximation
$$ \omega \approx \omega_{VX}\,\omega_{XX}^{-1}\,\omega_{VX}^T. $$
Note that this avoids having to compute the full matrix ω, which in many applications is too large to store in memory. We next compute an approximation for the degree vector d and degree matrix D := diag(d) of our graph (recall that the operator diag sends a vector to the diagonal matrix with that vector as diagonal, and also vice versa):
$$ \hat d := \omega_{VX}\,\omega_{XX}^{-1}\,\omega_{VX}^T\mathbf{1}, \qquad \hat D := \operatorname{diag}(\hat d). $$
We thus approximately normalise ω:
$$ \tilde\omega := \hat D^{-1/2}\,\omega_{VX}\,\omega_{XX}^{-1}\,\omega_{VX}^T\,\hat D^{-1/2} \approx D^{-1/2}\omega D^{-1/2}. $$
Following [2,5], we next compute an approximate eigendecomposition of ω̃. We here diverge from the method of [2,5]. The method used in those papers requires taking the matrix square root of ω_{XX}, but unless ω_{XX} is the zero matrix it will not be positive semi-definite (it is easy to see that a non-zero ω_{XX} has negative eigenvalues, as it has zero trace). Whilst this clearly does not prevent the method of [2,5] from working in practice, it is a potential source of error and we found it conceptually troubling. We here present an improved method, adapted from the method from [7] for computing a singular value decomposition (SVD) from an adaptive cross approximation (ACA) (see [7] for a definition of ACA).
First, we compute the thin QR decomposition (see [27, Theorem 5.2.2])
$$ \hat D^{-1/2}\omega_{VX} = QR, $$
where Q ∈ R^{|V|×K} is orthonormal and R ∈ R^{K×K} is upper triangular. Next, we compute the eigendecomposition
$$ R\,\omega_{XX}^{-1}R^T = \Phi\Sigma\Phi^T, $$
where Φ ∈ R^{K×K} is orthogonal and Σ ∈ R^{K×K} is diagonal. It follows that ω̃ has approximate eigendecomposition ω̃ ≈ QΦΣΦ^TQ^T = U_sΣU_s^T, where U_s := QΦ is orthonormal. This gives an approximate eigendecomposition of the symmetric normalised Laplacian
$$ \Delta_s = I - \tilde\omega \approx U_s(I_K - \Sigma)U_s^T = U_s\Lambda U_s^T, $$
where I_K is the K × K identity matrix and Λ := I_K − Σ, and so we get an approximate SVD of the random walk Laplacian
$$ \Delta \approx U_1\Lambda U_2^T, $$
where U_1 := D̂^{−1/2}U_s and U_2 := D̂^{1/2}U_s. As in [7], it is easy to see that this approach is O(K|V|) in space and O(K²|V| + K³) in time. We summarise this all as Algorithm 1.
Algorithm 1 QR decomposition-based Nyström method for computing an approximate SVD of ∆, inspired by [7].
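To make the linear algebra above concrete, here is a minimal Matlab sketch of Algorithm 1 (variable names are our own; omega_fun is an assumed handle returning blocks of the weight matrix, not something from the paper's code):

    function [U1, U2, Lambda] = nystrom_qr(omega_fun, V, X, K)
    % Nystrom-QR sketch: approximate SVD of the random walk Laplacian.
    % omega_fun(I, J) returns the block (omega_ij) for i in I, j in J.
    omega_VX = omega_fun(V, X);                     % |V| x K
    omega_XX = omega_fun(X, X);                     % K x K
    dhat = omega_VX * (omega_XX \ (omega_VX' * ones(numel(V), 1)));
    s = 1 ./ sqrt(dhat);                            % diagonal of Dhat^{-1/2}
    [Q, R] = qr(s .* omega_VX, 0);                  % thin QR of Dhat^{-1/2}*omega_VX
    M = R * (omega_XX \ R');
    M = (M + M') / 2;                               % symmetrise for numerical stability
    [Phi, Sigma] = eig(M);
    Us = Q * Phi;                                   % approx eigenvectors of omega~
    Lambda = eye(K) - Sigma;                        % approx eigenvalues of Delta_s
    U1 = s .* Us;                                   % Dhat^{-1/2} * Us
    U2 = Us ./ s;                                   % Dhat^{1/2}  * Us
    end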

Numerical assessment of the matrix compression method
We consider the accuracy of our Nyström-QR approach for the compression of the symmetric normalised Laplacian ∆_s built on the simple image in Fig. 3, containing |V| = 6400 pixels, which is sufficiently small that we can compute the true value of ∆_s to high accuracy. For K ∈ {50, 100, ..., 500}, we compare the rank K approximation U_sΛU_s^T with the true ∆_s in terms of the relative Frobenius distance, i.e. ||U_sΛU_s^T − ∆_s||_F/||∆_s||_F. Moreover, we compare these errors to the errors incurred by other low-rank approximations of ∆_s, namely the Nyström method suggested by [2,5], the Halko-Martinsson-Tropp (HMT) method [29] (a randomised algorithm), and the rank K approximation of ∆_s obtained by setting all but its K leading singular values to 0. By the Eckart-Young theorem [30] (see also [27, Theorem 2.4.8]) the latter is the optimal rank K approximation of ∆_s with respect to the Frobenius distance. In addition to the methods' accuracy, we measure their complexity in terms of the elapsed time used for their execution, obtained with an implementation in Matlab R2019a on the set-up described in Section 4.2. We report the relative Frobenius distance in the left of Fig. 4. As the Nyström (and HMT) methods are randomised, we repeat the experiments 100 times and plot the mean error as solid lines and the mean error ± standard deviation as dotted lines. To expose the difference between the methods for K ≥ 100, we subtract the SVD error from the other errors and show this difference in the central figure. In the right figure, we compare the complexity of the algorithms in terms of their average runtime. The SVD timing is constant in K as we always computed a full SVD and kept the largest K singular values.
We observe that the Nyström-QR method outperforms the Nyström method from [2,5]: it is faster, more accurate, and is stable for small K. In terms of accuracy, the Nyström-QR error is equal to only 4.6 × 10^{−5} plus the optimal low-rank error. Notably, this additional error is (almost) constant, indicating that the Nyström-QR method and the SVD converge at a similar rate. The Nyström-QR randomness has hardly any effect on the error; the standard deviation of the relative error ranges from 8.87 × 10^{−7} (K = 50) to 5.00 × 10^{−8} (K = 500). By contrast, for the Nyström method from [2,5] we see much more random variation.

Implementing the SDIE scheme: a Strang formula method
To compute the iterates of our SDIE scheme, we will need to compute an approximation for S_τ u_n. In [2], at each iteration S_τ u_n was approximated via a semi-implicit Euler method, which therefore incurred a linear (in the time step of the Euler method, i.e. δt in the below notation) error in both the e^{−τA}u_n and b terms (plus a spectrum truncation error). In Appendix A we show that the method from [2] works by approximating a Lie product formula approximation (see [31, Theorem 2.11]) of e^{−τA}; we therefore propose as an improvement a scheme that directly employs the superior Strang formula (and, as a higher-order variant, the Yoshida method, a composition of Strang steps) to approximate e^{−τA}u_n, with quadratic error (plus a spectrum truncation error). We also consider potential improvements to the accuracy of computing b: by expressing b as an integral and using quadrature methods; by expressing b as a solution to the ODE from (2.1) with initial condition 0, and using the Euler method from [2] with a very small time step (or a higher-order ODE solver); or by computing the closed form solution for b directly using the Woodbury identity. We thereby improve on the accuracy of computing S_τ u at low cost.
The Strang formula for matrix exponentials [8] is given, for k > 0 a parameter and O relative to the limit k → ∞, by
$$ e^{X+Y} = \left(e^{X/2k}\, e^{Y/k}\, e^{X/2k}\right)^k + O(k^{-2}). $$
Given ∆ ≈ U_1ΛU_2^T as in Algorithm 1 (the case for ∆_s is likewise), for any u ∈ V we compute (writing δt := τ/k)
$$ e^{-\tau A}u = e^{-\tau\Delta - \tau\mu P_Z}u \approx v_k + E, $$
where E is a spectrum truncation error and v_k is defined by v_0 := u, and v_r for r ∈ {1, ..., k} is defined iteratively by
$$ v_r := a_3(\delta t)\odot\left(w_r + U_1\left(a_2(\delta t)\odot\left(U_2^T w_r\right)\right)\right), \qquad w_r := a_3(\delta t)\odot v_{r-1}, $$
where ⊙ is the Hadamard (i.e. elementwise) product, a_1(δt) := exp(−µδtχ_Z), a_2(δt) := exp(−δt diag(Λ)) − 1_K, and a_3(δt) := exp(−½µδtχ_Z) is the elementwise square root of a_1(δt) (where exp is applied elementwise, and 1_K is the vector of K ones). In Fig. 6, we verify on a simple image that this method has quadratic error (plus a spectrum truncation error) and outperforms the [2] Euler method. Furthermore, writing F_1(t) := e^{−tA}f, we have
$$ b = \mu\int_0^\tau F_1(t)\, dt, $$
which can be approximated by standard quadrature rules in terms of evaluations of F_1, any of which we can approximate efficiently via the above methods. Furthermore, as we only need to compute b once, we can take a large value, k_b, for k in those methods. As is standard for quadrature methods, the accuracy can often be improved by subdividing the interval. For example, using Simpson's rule and subdividing into intervals of length h := τ/(2m) we get
$$ b \approx \frac{\mu h}{3}\left(F_1(0) + 4\sum_{j=1}^m F_1((2j-1)h) + 2\sum_{j=1}^{m-1} F_1(2jh) + F_1(\tau)\right) + E_2, $$
where E_2 is the spectrum truncation error. Finally, we can also let Matlab compute its preferred quadrature using the in-built integrate function, using either the Strang formula or Yoshida method to compute the integrand. However, we found this to be very slow.
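The following Matlab sketch (names ours, continuing the assumptions of the Nyström-QR sketch above) implements this Strang iteration for e^{−τA}u:

    function v = strang_diffusion(u, U1, U2, Lambda, chiZ, mu, tau, k)
    % One Strang-split approximation of exp(-tau*A)*u (our sketch), where
    % A = Delta + mu*P_Z and Delta ~= U1*Lambda*U2'; chiZ is the 0/1 indicator of Z.
    dt = tau / k;
    a2 = exp(-dt * diag(Lambda)) - 1;     % K x 1: the e^(-dt*Lambda) - I_K part
    a3 = exp(-0.5 * mu * dt * chiZ);      % |V| x 1: half-step fidelity factor
    v = u;
    for r = 1:k
        w = a3 .* v;                      % half step of e^(-dt*mu*P_Z/2)
        w = w + U1 * (a2 .* (U2' * w));   % e^(-dt*Delta) ~= I + U1*diag(a2)*U2'
        v = a3 .* w;                      % second half step
    end
    end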
Another method to compute b is to solve an ODE. We note that, by (2.2), b is the fidelity forced diffusion of 0 at time τ, i.e. the solution v of
$$ \frac{dv}{dt} = -Av + \mu f, \qquad v(0) = 0, $$
has v(τ) = b. Hence we can approximate b by solving this ODE via the semi-implicit Euler method from [2]. Since we only need to compute b once, we can choose a small time step, i.e. a time step of τ/k_b for k_b large, for this Euler method. One could also choose a higher-order ODE solver for this same reason; however, as [2] notes, this ODE is stiff, which we found causes standard solvers such as ode45 (i.e. Dormand-Prince-(4,5) [33]) to be inaccurate, and we ran into the issue of the Matlab stiff solvers requiring matrices too large to fit in memory.
Finally, we can try to compute the formula for b directly. By the Strang formula or Yoshida method, we can efficiently compute g := e^{−τA}f, and then
$$ b = \mu\int_0^\tau e^{-tA}f\, dt = \mu A^{-1}(f - g), $$
so it remains to apply A^{−1}. Writing A ≈ Y − U_1ΣU_2^T, where Y := I + µP_Z, we have by the Woodbury identity [34]
$$ A^{-1} \approx Y^{-1} + Y^{-1}U_1\left(\Sigma^+ - U_2^TY^{-1}U_1\right)^{-1}U_2^TY^{-1}, $$
where superscript + denotes the pseudoinverse, and recall that Σ = I_K − Λ. Then
$$ b \approx \mu\left(y\odot(f - g) + (y\odot U_1)h\right), $$
where y := (1 + µχ_Z)^{−1}, reciprocation applied elementwise, and h is given by solving
$$ \left(\Sigma^+ - U_2^T(y\odot U_1)\right)h = U_2^T\left(y\odot(f - g)\right), $$
where we define y ⊙ U_1 as columnwise Hadamard multiplication, i.e. (y ⊙ U_1)_{ij} := y_i(U_1)_{ij}. We compare the accuracy of these approximations of b in Table 1, and observe that no method is hands-down superior. Table 1 also indicates that the likely reason for methods like Simpson's rule not performing as well as expected is that the spectrum truncation error is dominating.
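A minimal Matlab sketch of this Woodbury computation (again under the assumptions and names of the sketches above):

    function b = b_woodbury(g, f, U1, U2, Lambda, chiZ, mu)
    % Computes b ~= mu * A^{-1} * (f - g) via the Woodbury identity (our sketch).
    K = size(Lambda, 1);
    Sigma = eye(K) - Lambda;
    y = 1 ./ (1 + mu * chiZ);                 % Y^{-1} = diag(y), Y := I + mu*P_Z
    yU1 = y .* U1;                            % columnwise Hadamard product
    x = y .* (f - g);
    h = (pinv(Sigma) - U2' * yU1) \ (U2' * x);
    b = mu * (x + yU1 * h);
    end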
Given these ingredients, it is then straightforward to compute the SDIE scheme sequence via Algorithm 2.

Numerical assessment of methods
In this section, we will build our graphs on the image in Fig. 5, which has sufficiently few pixels that we can compute the true values of e^{−τA}u (with A here given by ∆_s + µP_Z) and b to high accuracy. First, in Fig. 6 we investigate the accuracy of the Strang formula and Yoshida method vs. the [2] Euler method. We take |V| = 1600, τ = 0.5, u a random vector given by Matlab's rand(1600,1), µ = 1, and Z as the left two quadrants of the image. We consider two cases: one where K = |V| (i.e. full-rank) and one where K < |V|. We observe that the Strang formula and Yoshida method are more accurate than the Euler method in both cases, and that the Yoshida method is more accurate than the Strang formula, but only barely in the rank-reduced case. Furthermore, the log-log gradients of the Strang formula error and the Yoshida method error (excluding the outliers for small k values and the outliers caused by errors from reaching machine precision) in Fig. 6(b) are respectively 2.000 and 3.997 (computed using polyfit), confirming that these methods achieve their theoretical orders of error in the full-rank case.
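The order check is easy to reproduce in isolation; the following Matlab sketch (ours, on small random matrices rather than the paper's graph) estimates the Strang error exponent the same way:

    n = 30; rng(1);
    X = randn(n); X = X * X';                 % symmetric positive semi-definite
    Y = diag(rand(n, 1));
    ks = 2.^(2:8); err = zeros(size(ks));
    for j = 1:numel(ks)
        k = ks(j);
        S = expm(-X/(2*k)) * expm(-Y/k) * expm(-X/(2*k));
        err(j) = norm(S^k - expm(-(X + Y)));  % Strang formula error
    end
    p = polyfit(log(ks), log(err), 1);        % slope should be close to -2
    disp(p(1));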

Algorithm 2 The SDIE scheme via a Strang formula method. Computes the terminus of the SDIE scheme.

1: Input: u_0, U_1, U_2, Λ, χ_Z, f, µ, ε, τ, λ := τ/ε, k, k_b, and tolerance δ.
2: Compute b:
3: if using the quadrature method then
4:    Compute F_1 at the quadrature nodes using the Strang formula or Yoshida method, with parameter k_b.
5:    Approximate µ∫_0^τ F_1(t) dt by some quadrature method.
6: else if using the ODE method then
7:    Approximate b via the semi-implicit Euler method from [2] with time step τ/k_b.
8: else if using the Woodbury identity method then
9:    Compute g := e^{−τA}f via the Strang formula or Yoshida method with parameter k_b, and compute b = µA^{−1}(f − g) via the Woodbury identity.
10: end if
11: repeat
12:    v := u_n.
13:    for r = 1 to k do ▷ Strang formula iteration
14:       v := a_3(δt) ⊙ v; v := v + U_1(a_2(δt) ⊙ (U_2^T v)); v := a_3(δt) ⊙ v.
15:    end for
16:    v := v + b. ▷ Approximates v = S_τ u_n
17:    for i ∈ V do
18:       if v_i < λ/2 then (u_{n+1})_i := 0,
19:       else if v_i > 1 − λ/2 then (u_{n+1})_i := 1,
20:       else (u_{n+1})_i := (v_i − λ/2)/(1 − λ).
21:    end for
22: until the stopping condition with tolerance δ is met; output the final u_n.

Next, in Table 1 we compare the accuracy of the different methods for computing b. We take µ = 1, Z as the left two quadrants of the image, f as equal to the image on Z, and k_b = 1000 in the Strang formula/Yoshida method approximations for e^{−tA}f and in the [2] Euler scheme. We observe that the rank reduction plays a significant role in the errors incurred, and that no method is hands-down superior. In the "two cows" application (Example 4.1), we have observed that (interestingly) the [2] Euler method yields the best segmentation. A topic for future research is whether this is generally true for large matrices.

Table 1: Comparison of the relative ℓ² errors from the methods for approximating b on the image from Fig. 5. We did not compute integrate for K = 1600 as it ran too slowly. Bold entries indicate the smallest error in that column.

Examples
We consider three examples, all using images of cows from the Microsoft Research Cambridge Object Recognition Image Database. Some of these images have been used before by [2,5] to illustrate and test graph-based segmentation algorithms.
Example 4.1 (Two cows). We first introduce the two cows example familiar from [2,5]. We take the image of two cows in the top left of Fig. 7 as the reference data Z and the segmentation in the bottom left as the reference labels f , which separate the cows from the background. We apply the SDIE scheme to segment the image of two cows shown in the top right of Fig. 7, aiming to separate the cows from the background, and compare to the ground truth in the bottom right. Both images are RGB images of size 480 × 640 pixels, i.e. the reference data and the image are tensors of size 480 × 640 × 3.

Figure 7: Two cows: the reference data image, the reference f, the image to be segmented, and the ground truth segmentation associated to Example 4.1. We drew both segmentations by hand.
We will use Example 4.1 to illustrate the application of the SDIE scheme. Moreover, we will run several numerical experiments on this example. Namely, we will:
• study the influence of the parameters ε and τ, comparing non-MBO SDIE (τ < ε) and MBO SDIE (τ = ε);
• compare different normalisations of the graph Laplacian, i.e. the symmetric vs. random walk normalisation;
• investigate the influence of the Nyström-QR approximation of the graph Laplacian in terms of the rank K; and
• quantify the inherent uncertainty in the computational strategy induced by the randomised Nyström approximation.
Example 4.2 (Greyscale). This example is the greyscale version of Example 4.1. Hence, we map the images in Fig. 7 to greyscale using rgb2gray. We show the greyscale images in Fig. 8. We use the same segmentation of the reference data image as in Example 4.1. The greyscale images are matrices of size 480 × 640.

Figure 8: Two cows greyscale: the reference data image and the image to be segmented associated to Example 4.2. Note that the reference f and the ground truth segmentation are identical to those in Fig. 7.

The greyscale image is much harder to segment than the RGB image, as there is no clear colour separation. With Example 4.2, we aim to illustrate the performance of the SDIE scheme in a harder segmentation task.

Example 4.3 (Many cows).
In this example, we have concatenated four images of cows that we aim to segment as a whole. We show the concatenated image in Fig. 9. Again, we shall separate the cows from the background. As reference data, we use the reference data image and labels as in Example 4.1. Hence, the reference data is a tensor of size 480 × 640 × 3. The image consists of approximately 1.23 megapixels and is represented by a tensor of size 480 × 2560 × 3.

Figure 9: Many cows: the image to be segmented associated to Example 4.3. Note that the reference data image and labels are identical to those in Fig. 7 (left).

With Example 4.3, we will illustrate the application of the SDIE scheme to large scale images, as well as the case where the image and reference data are of different sizes.
Note. In each of these examples we took as reference data a separate reference data image. However, our algorithm does not require this, and one could take a subset of the pixels of a single image to be the reference data, and thereby investigate the impact of the relative size of the reference data on the segmentation, which is beyond the scope of this paper but is explored for the [2] MBO segmentation algorithm and related methods in [35, Fig. 4].

Set-up
Algorithms We here use the Nyström-QR method to compute the rank K approximation to the Laplacian, and we use the [2] semi-implicit Euler method (with time step τ /k b ) to compute b (as we found that in practice this worked best for the above examples).
Feature vectors Let N(i) denote the 3×3 neighbourhood of pixel i ∈ V in the image (with replication padding at borders performed via padarray) and let K be a 3×3 Gaussian kernel with standard deviation 1 (computed via fspecial('gaussian',3,1)). Thus x|_{N(i)} can be viewed as a triple of 3 × 3 matrices x_J|_{N(i)} for J ∈ {R, G, B} (i.e. one in each of the R, G, and B channels). Then in each channel we define the Gaussian-weighted patch K ⊙ x_J|_{N(i)}, and thus define z_i as the concatenation of these three weighted patches, which we reshaped (using reshape) so that z ∈ R^{|V|×27}.
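A Matlab sketch of this feature construction (our code, on a stand-in image; padarray is from the Image Processing Toolbox):

    x = rand(480, 640, 3);                    % stand-in RGB image in [0,1]
    Kg = fspecial('gaussian', 3, 1);          % 3x3 Gaussian kernel, std 1
    z = zeros(480 * 640, 27);
    col = 0;
    for J = 1:3                               % R, G, B channels
        xp = padarray(x(:, :, J), [1 1], 'replicate');
        for a = -1:1
            for b = -1:1
                col = col + 1;
                patch = xp(2+a:end-1+a, 2+b:end-1+b);     % shifted neighbour values
                z(:, col) = Kg(a+2, b+2) * patch(:);      % Gaussian-weighted entry
            end
        end
    end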
Interpolation sets For the interpolation sets X 1 , X 2 in Nyström, we took K/2 vertices from the reference data image and K/2 vertices from the image to be segmented, chosen at random using randperm.
We experimented with choosing interpolation sets using ACA (see [7]), but this showed no improvement over choosing random sets, and ran much slower.
Initial condition We took the initial condition, i.e. u 0 , to equal the reference f on the reference data vertices and to equal 0.49 on the vertices of the image to be segmented (where f labels 'cow' with 1 and 'not cow' with 0). We used 0.49 rather than the more natural 0.5 because the latter led to much more of the background (e.g. the grass) getting labelled as 'cow'. This choice can be viewed as incorporating the slight extra a priori information that the image to be segmented has more non-cow than cow.

Two cows
We begin with some examples of segmentations obtained from the SDIE scheme. Based on these, we illustrate the progression of the algorithm and discuss the segmentation output qualitatively. We will give a quantitative analysis in Section 4.3.1. Note that we give here merely typical realisations of the random output of the algorithm-the output is random due to the random choice of interpolation sets in the Nyström approximation. We investigate the stochasticity of the algorithm in Section 4.3.2.
(a) Reference data image, image to be segmented, and image masked with ground truth segmentation.

We consider three different cases: the MBO case τ = ε, and two non-MBO cases where τ < ε. We show the resulting reconstructions from these methods in Fig. 10. Moreover, we show the progression of the algorithms in Fig. 12. The parameters not given in the captions of Fig. 10 are µ = 30, σ = 35, k_b = 1, k = 1, δ = 10^{−10}, and K = 70.
Note. The regime τ > ε is not of much interest since, by [1, Remark 4.8] mutatis mutandis, in this regime the SDIE scheme has a non-unique solution for the update, of which one is just the MBO solution.
Comparing the results in Fig. 10, we see roughly equivalent segmentations and segmentation errors. Indeed, the cows are generally nicely segmented in each of the cases. However, the segmentation also labels as 'cow' a part of the wall in the background and small clumps of grass, while a small part of the left cow's snout is cut out. This may be because the reference data image does not contain these features and so the scheme is not able to handle them correctly.
In Fig. 11 we compare the result of Fig. 10(d) (our best segmentation) with the results of the analogous experiments in [2,5]. We observe significant qualitative improvement. In particular, we achieve a much more complete identification of the left cow's snout, complete identification of the left cow's eyes and ear tag, and a slightly more complete identification of the right cow's hind.

Figure 11: Comparison of our segmentation (using the set-up in Fig. 10(d)) with the analogous segmentations from the previous literature [2,5], both reproduced with permission from SIAM and the authors. Note that unfortunately in reproduction the colour balances and aspect ratios have become slightly inconsistent, but we can still make qualitative comparisons.
We measure the computational cost of the SDIE scheme through the measured runtime of the respective algorithm. We note from Fig. 10 that the MBO scheme (τ = ε) outperforms the non-MBO schemes (τ < ε); the SDIE relaxation of the MBO scheme merely slows down the convergence of the algorithm, without improving the segmentation. This can especially be seen in Fig. 12, where the SDIE scheme needs many more steps to satisfy the termination criterion. At least for this example, the non-MBO SDIE scheme is less efficient than the MBO scheme. Thus, in the following sections we focus on the MBO case.

Errors and timings
We now quantify the influence, on the accuracy and computational cost of the segmentation, of the Nyström rank K, the number of discretisation steps k_b and k in the Euler method and the Strang formula respectively, and the choice of normalisation of the graph Laplacian. To this end, we segment the two cows image using the following parameters: ε = τ = 0.003, µ = 30, σ = 35, and δ = 10^{−10}. We take K ∈ {10, 25, 70, 100, 250}, (k_b, k) ∈ {(1, 1), (10, 5)}, and use the random walk Laplacian ∆ and the symmetric normalised Laplacian ∆_s. The segmentations using the symmetric normalised Laplacian seem to deteriorate for large K. The random walk Laplacian has diametric properties in this regard: the segmentations are only reliably accurate when K is reasonably large.

Uncertainty in the segmentation
Due to the randomised Nyström approximation, our approximation of the SDIE scheme is inherently stochastic. Therefore, the segmentations that the algorithm returns are realisations of random variables. We now briefly study these random variables, especially with regard to K. We show pointwise mean and standard deviations of the binary labels in each of the left two columns of the four subfigures of Fig. 14.
In the remaining figures, we weight the original two cows image with these means (varying continuously between label 1 for 'cow' and label 0 for 'not cow') and standard deviations. For these experiments we use the same parameter set-up as Section 4.3.1.
We make several observations. First, as K increases we see less variation. This is what we expect, as when K = |V | the method is deterministic so has no variation. Second, the type of normalisation of the Laplacian has an effect: the symmetric normalised Laplacian leads to less variation than the random walk Laplacian. Third, the parameters k b and k appear to have no major effect within the range tested. Finally, looking at the figures with rather large K, we observe that the standard deviation of the labels is high in the areas of the figure in which there is indeed uncertainty in the segmentation, namely the boundaries of the cows and the parts of the wall with similar colour to the dark cow. Determining the exact position of the boundary of a cow on a pixel-by-pixel level is indeed also difficult for a human observer. Moreover, the SDIE scheme usually confuses the wall in the background for a cow. Hence, a large standard deviation here reflects that the estimate is uncertain. This is of course not a rigorous Bayesian uncertainty quantification, as for instance is given in [35,36] for graph-based learning. However the use of stochastic algorithms for inference tasks, and the use of their output as a method of uncertainty quantification, has for instance been motivated by [37].

Greyscale
We now move on to Example 4.2, the greyscale problem. We will especially use this example to study the influence of the parameters µ and σ. The parameter µ > 0 determines the strength of the fidelity term in the ACE. From a statistical point of view, we can understand a choice of µ as an assumption on the statistical precision (i.e. the inverse of the variance of the noise) of the reference f (see [36, Section 3.3] for details). Thus, a small µ should lead to a stronger regularisation coming from the Ginzburg-Landau functional, and a large µ leads to more adherence to the reference. The parameter σ > 0 is the 'standard deviation' in the Gaussian kernel Ω used to build the weight matrix ω. For our methods we must not choose too small a σ, as otherwise the weight matrix becomes sparse (up to some precision), and so the Nyström submatrix ω_{XX} has a high probability of being ill-conditioned. If σ is too large then the graph structure no longer reflects the features of the image. In the following, we set ε = τ = 0.00024, k_b = 10, k = 5, and δ = 10^{−10}. To get reliable results we choose a rather large K, K = 200, and therefore (by the discussion in Section 4.3.2) we use the random walk Laplacian. We will qualitatively study single realisations of the inherently stochastic SDIE algorithm. We vary µ ∈ {50, 100, 150, 200} and σ ∈ {20, 35, 50}. We show the best results from these tests in Fig. 15. Moreover, we give a comprehensive overview of all tests and the progression of the SDIE scheme in Fig. 16. We observe that this segmentation problem is indeed considerably harder than the two cows problem, as we anticipated after stating Example 4.2. The difference in shade between the left cow and the wall is less visible than in Example 4.1, and the left cow's snout is less identifiable as part of the cow. Thus, the segmentation errors we incur are about 3 times larger than before. There is hardly any visible influence from changing σ in the given range. However, µ has a significant impact on the result. Indeed, for µ = 50 the algorithm does not find any segmentation. For µ ≥ 100, we get more practical segmentations. Interestingly, for µ = 150 and µ = 200 we get almost all of the left cow, but misclassify most of the wall in the background; for µ = 100, we miss a large part of the left cow, but classify more accurately the wall. The interpretation of µ as the statistical precision of the reference explains this effect well. For µ = 100, we assume that the reference is less precise, leading us (due to the smoothing effect of the Ginzburg-Landau regularisation) to classify accurately most of the wall. With µ ≥ 150, we assume that the reference is more precise, leading us to misclassify the wall (due to its similarity to the cows in the reference data image) but classify accurately more of the cows. At µ = 200, this effect even leads to a larger total segmentation error. The runtimes are approximately equal across all choices of parameters, except for µ = 200 and σ = 20, for which the runtime is much higher.

Many cows
We finally study the many cows example, i.e. Example 4.3. The main differences to the earlier examples are the larger size of the image that is to be segmented and the variety within it. We first comment on the size. The image is given by a 480 × 2560 × 3 tensor, which is a manageable size. The graph Laplacian, however, is a dense matrix with 1.536 × 10^6 rows and columns. A matrix of this size requires 17.6 TB of memory to be constructed in Matlab R2019a. This image is also much more difficult to segment than the previous examples, in which the cows in the image to be segmented are very similar to the cows in the reference data. Here, we have concatenated images of cows that look very different, e.g. cows with a white blaze on their nose.
As the two cows image is part of the many cows image, we first test the algorithmic set-up that was successful at segmenting the former. We show the result (and remind the reader of the set-up) in Fig. 17(a). The segmentation obtained in this experiment is rather mediocre; even the two cows part of the image is not segmented as well as before.

Figure 16: Progression of the MBO SDIE scheme for the greyscale segmentation task. In each subfigure: the first row shows the reference data, the image to be segmented, and the ground truth segmentation. The middle rows, showing the reshaped label vector u_n and the image masked by the label, each represent one iteration of the considered algorithm, to be read from top to bottom. The last row gives the state returned by the scheme, i.e. the state satisfying the termination criterion.

Figure 17: Segmentations of the MBO scheme for the many cows segmentation task. In each subfigure: the top row shows the reference data image and twice the image that is to be segmented, and the bottom row shows the reference f, the segmentation returned by the respective algorithm, and the original image masked with the segmentation.
In the case where µ = 500, we see a good, but slightly noisy segmentation of the brown and black parts of the cows. In the case where µ = 400, we reduce the noise in the segmentation, but then also misclassify some parts of the cows. The blaze (and surrounding fur) is not recognised as 'cow' in any of the segmentations, likely because the blaze is not represented in the reference data image. The influence of the set-up on the runtimes is now much more pronounced. For the given segmentations, however, all the runtimes are at most a factor of eight larger than the smallest runtimes in the previous examples, despite the larger image size.

Conclusion
Extending the set-up and results in [1], we defined the continuous-in-time graph Allen-Cahn equation (ACE) with fidelity forcing as in [5] and a semi-discrete implicit Euler (SDIE) time discretisation scheme for this equation. We proved well-posedness of the ACE and showed that solutions of the SDIE scheme converge to the solution of the ACE. Moreover, we proved that the graph Merriman-Bence-Osher (MBO) scheme with fidelity forcing [2] is a special case of an SDIE scheme.
We applied this algorithm to a number of image segmentation examples concerning images of cows from the Microsoft Research Cambridge Object Recognition Image Database. We found that whilst the SDIE scheme yielded no improvement over the MBO scheme (and took longer to run in the non-MBO case), the other improvements that we made led to a substantial qualitative improvement over the segmentations of the corresponding examples in [2,5]. We furthermore investigated empirically various properties of this numerical method and the role of different parameters. In particular:
• We found that the symmetric normalised Laplacian incurred consistently low segmentation error when approximated to a low rank, whilst the random walk Laplacian was more reliably accurate at higher ranks (where 'higher' is still less than 0.1% of the full rank). Thus for applications that require scalability, and thus very low-rank approximations, we recommend using the symmetric normalised Laplacian.
Appendix A

Note. In fact, in [2] the authors use ∆_s, not ∆. It makes no difference to this analysis which Laplacian is used.
Given ∆ ≈ U_1ΛU_2^T, an approximate SVD of low rank, the authors approximate (A.1) by a semi-implicit Euler step of the form
$$ \hat v_{k+1} = U_1\left(\left(\mathbf{1}_K + \delta t\,\operatorname{diag}(\Lambda)\right)^{-1}\odot\left(U_2^T\left(\hat v_k + \delta t\,\mu P_Z(f - \hat v_k)\right)\right)\right), $$
where the reciprocal is applied elementwise. Therefore, the final approximation for S_τ u in [2] is computed by setting v̂_0 = u and S_τ u ≈ v̂_{τ/δt}. We can also relate this Euler method to a modified quadrature rule. It is easy to see from (2.2) that
$$ S_\tau u = e^{-\tau A}u + \mu\int_0^\tau e^{-tA}f\, dt. $$

A.1 Analysis
We understand the Euler approximation for the e^{−τA}u term by (A.6). By (A.3), we can write the Euler approximation for the integral term as a sum, recalling that kδt = τ. This can be seen to be a quadrature by the right-hand rule, multiplied by a factor of (1 − µδt)^{−1}. Likewise, we can rewrite (A.8), where going from (A.7) to (A.8) has incurred an extra error from the spectrum truncation alongside the quadrature and Lie product formula errors.

A.2 Conclusions from analysis
The key takeaway from these calculations (besides the verification that we have the usual O(δt) Euler error) concerns (A.6). That equation shows that the Euler approximation for the e −τ A term is in fact an approximation of a Lie product formula approximation for e −τ A . This motivates our method of cutting out the middleman and using a matrix exponential formula directly, and furthermore motivates our replacement of the linear error Lie product formula with the quadratic error Strang formula.
We have also shown how the Euler method approximation for b can be written as a form of quadrature, motivating our investigation of other quadrature methods as potential improvements for computing b.