The GHP scaling limit of uniform spanning trees of dense graphs

We consider dense graph sequences that converge to a connected graphon and prove that the GHP scaling limit of their uniform spanning trees is Aldous' Brownian CRT. Furthermore, we are able to extract the precise scaling constant from the limiting graphon. As an example, we can apply this to the scaling limit of the uniform spanning trees of the Erd\H{o}s--R\'enyi sequence $(G(n,p))_{n \geq 1}$ for any fixed $p \in (0,1]$, and to sequences of dense expanders. A consequence of GHP convergence is that several associated quantities of the spanning trees also converge, such as the height, the diameter and the law of the simple random walk.


Introduction
Uniform spanning trees (USTs) are fundamental objects in probability theory and computer science, with close connections to many other areas of mathematics including electrical network theory [20], loop-erased random walks [32] and random interlacements [18], to name but a few.
It was recently shown in [7], building on the work of [31], that the universal metric measure space scaling limit of USTs of a large class of graphs is Aldous' Brownian continuum random tree (CRT). The purpose of the present paper is to extend this result to sequences of dense graphs encoded by graphons. Due to a transitivity assumption in previous papers, these USTs are not covered by the results of [31] and [7], but here we establish that the CRT is nevertheless still the scaling limit. In addition we are able to express the precise scaling factor in terms of the encoding graphon, making the result more precise than that in [7] and demonstrating that the notion of graphon convergence is enough to fully determine the UST scaling limit.
The CRT, introduced by Aldous [1,2,3], is a well-known object in probability theory, and is perhaps best known as the scaling limit of critical finite variance Galton-Watson trees. We do not attempt to give a full introduction here; we will give a formal definition in Section 3 and we refer to the survey of Le Gall [23] for further background.
A weighted graph (G, w) is a graph G = (V, E) in which we assign to each edge e ∈ E a non-negative weight w_e. In this paper, we will work with sequences of weighted graphs with no loops or multiple edges in which w_e ∈ [0, 1] for each e ∈ E. In the case where all edge-weights are equal to 1, we say that the graph is simple. We extend the definition of vertex degree to weighted graphs by defining deg v to be the sum of the weights of the edges emanating from v.
The uniform spanning tree of a weighted graph (G, w) is a random spanning tree chosen from the set of all spanning trees of G, where each spanning tree t is chosen with probability proportional to ∏_{e∈t} w_e.
We will say that such a sequence (G_n)_{n≥1} of weighted graphs is dense if there exists δ > 0 such that ∆_n := min_{v∈G_n} deg(v) ≥ δ#V(G_n) for all n. The notion of convergence of dense graph sequences is naturally captured by objects known as graphons, introduced by Lovász and Szegedy [25] and also Borgs, Chayes, Lovász, Sós and Vesztergombi [10] for this purpose. See also [14] for a very quick introduction. A graphon W is a symmetric measurable function from [0,1]^2 to [0,1] and can be thought of as (roughly) the continuum analogue of an adjacency matrix. Using this viewpoint, there is a natural notion of distance between discrete graphs and graphons, known as the cut-distance, which we will define in Section 2.1. This allows us to consider the notion of convergence to a given graphon W.
Graphons are commonly used in combinatorics and computer science to analyze large dense graphs. For example, they have been used in extremal graph theory [12], mean-field games [11], analysis of large graphs [21], and to study the thermodynamic limit of statistical physics systems [27,13], to give a very non-exhaustive list.
Given a graphon W, define a constant

  α_W := ∫_0^1 ( ∫_0^1 W(x,y) dy )^2 dx / ( ∫_0^1 ∫_0^1 W(x,y) dx dy )^2.    (1)

Note it follows immediately from Jensen's inequality that α_W ≥ 1, with equality if and only if the degree function x ↦ ∫_0^1 W(x,y) dy is constant almost everywhere. We also say that a graphon W is connected if for all A ⊂ [0,1] of Lebesgue measure in (0,1), it holds that

  ∫_{A×([0,1]∖A)} W(x,y) dx dy > 0.

The main result of the present paper is the following. Below, the GHP distance refers to the Gromov-Hausdorff-Prohorov distance between metric measure spaces; we define it in Section 2.8.
Theorem 1.1. Let (G_n)_{n≥1} be a dense sequence of deterministic weighted graphs converging to a connected graphon W, where each G_n has n vertices. For each n ≥ 1, let T_n be a uniform spanning tree of G_n. Denote by d_{T_n} the corresponding graph-distance on T_n and by µ_n the uniform probability measure on the vertices of T_n. Then

  ( V(T_n), √(α_W/n) · d_{T_n}, µ_n ) −→ (T, d_T, µ),

where α_W is defined as in (1), (T, d_T, µ) is the CRT equipped with its canonical mass measure µ and −→ denotes convergence in distribution with respect to the GHP distance between metric measure spaces.
A single graphon can also encode sequences of random graphs (G(k, W))_{k≥1} and (H(k, W))_{k≥1} with k nodes, obtained by sampling k uniform vertices x_1, ..., x_k in [0,1], and either adding an edge of weight 1 between nodes i and j with probability W(x_i, x_j) (this is the sequence (G(k, W))_{k≥1}), or instead adding an edge of weight W(x_i, x_j) (this is the sequence (H(k, W))_{k≥1}). We will deduce the following as a consequence of Theorem 1.1.
Corollary 1.2. Let W be a connected graphon. Suppose that there exists δ > 0 such that the minimal degree of G(n, W) is at least δn with probability tending to 1 as n → ∞. For each n ≥ 1, let T_n be a uniform spanning tree of G(n, W). Denote by d_{T_n} the corresponding graph-distance on T_n and by µ_n the uniform probability measure on the vertices of T_n. Then

  ( V(T_n), √(α_W/n) · d_{T_n}, µ_n ) −→ (T, d_T, µ),

where α_W is defined as in (1), (T, d_T, µ) is the CRT equipped with its canonical mass measure µ and −→ denotes convergence in distribution with respect to the GHP distance between metric measure spaces.
Moreover, the same statement holds for H(n, W ) in place of G(n, W ).
For example, this applies to the Erdős–Rényi sequence (G(n, p))_{n≥1} for any fixed p ∈ (0, 1], which is the sequence (G(n, W))_{n≥1} when W is the graphon that is equal to p (almost) everywhere, in which case α_W = 1. Theorem 1.1 shows that graphons contain enough information to determine the scaling limit of USTs, or in other words that the GHP scaling limit is continuous with respect to the topology induced by the cut-distance. In [16], the authors show an analogous result for the Benjamini-Schramm local limit of the USTs appearing in Theorem 1.1, and show that the local limit can be characterized as a multi-type critical branching process conditioned to survive, where the offspring distributions are encoded by the limiting graphon. Additionally, the authors show that continuity also holds for the total number of spanning trees of G_n, after being properly renormalized. However, they also give an example to show that this is no longer true under weaker assumptions.
Note that convergence of a graph sequence to a connected graphon does not automatically imply that the graph sequence must be dense, and in fact the local limit result for USTs of dense graphs obtained in [16] does not require this assumption. There, the authors assume only that the limiting graphon is non-degenerate, meaning that the function x ↦ ∫_0^1 W(x,y) dy is defined and strictly positive for every x ∈ [0,1], and that the graph sequence is connected. In fact this implies that "most" vertices have high degree; see [16, Theorem 2.7 and Definition 2.6] for a precise statement. This is enough to prove a local limit statement since, with high probability, the local limit will not see the exceptional vertices of low degree. On the other hand, the GHP scaling limit is a global statement and therefore we require more uniform control of the underlying graphs. One can easily see this through a simple counterexample: let G_n denote the complete graph on n − n^{2/3} vertices, and attach a stick of length n^{2/3} to one vertex of the complete graph. The graphs still converge to the graphon that is 1 everywhere, and the local limit of UST(G_n) is once again the Poisson(1) Galton-Watson tree conditioned to survive. On the other hand, the only non-trivial compact scaling limit is a single stick, and not the CRT. One can also construct similar counterexamples with minimum degree at least n/γ_n for any sequence γ_n → ∞, meaning that the assumption of linear minimal degree is indeed necessary. Since the local limit of the CRT is well-known [1, Section 6] to be Aldous' self-similar CRT (SSCRT), one can also ask whether the operations of taking scaling limits and local limits commute. In general, answering this question seems quite non-trivial, as the multitype branching process appearing as the local limit is very non-homogeneous and the offspring distributions of successive generations are not independent. However, a special case arises when the sequence (G_n)_{n≥1} is regular. In this case the local limit is a Poisson(1) Galton-Watson tree conditioned to survive, which is well-known to rescale to the SSCRT; moreover we will show in Remark 7.3 that the constant α_W must be equal to 1, so that 1/α_W is equal to the variance of the Poisson(1) offspring distribution, from which we can deduce that the operations do indeed commute in this case.
For non-regular graph sequences, the question seems a bit more subtle. While the expected number of non-backbone neighbours of the root vertex of the local limit is indeed 1, the variance is not necessarily equal to 1/α_W. For example, for the complete bipartite graph K_{2n/3, n/3}, one can calculate using [16, Definition 1.2] that the variance of the offspring number of the root vertex is equal to 3/2, but 1/α_W is equal to 8/9. This does not preclude the possibility that the operations commute, since the variance in subsequent generations may converge to 1/α_W in the appropriate sense. For K_{2n/3, n/3} we can in fact apply results of Miermont [28] (the local limit in this case is in fact a Galton-Watson tree with two alternating types: Poi(2) and Poi(1/2)) to deduce that the operations do commute. However, in the general case the local limit is a Galton-Watson tree with uncountably many types, for which, to the best of our knowledge, scaling limits are not covered by the existing Galton-Watson tree literature.
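For concreteness, here is the computation behind the values 8/9 and 9/8 quoted above, with α_W as in (1), for the graphon encoding the complete bipartite graphs K_{2n/3, n/3}:
\[
W(x,y) = \mathbf{1}\{x \le \tfrac23 < y\} + \mathbf{1}\{y \le \tfrac23 < x\}, \qquad
d_W(x) := \int_0^1 W(x,y)\,\mathrm{d}y =
\begin{cases} 1/3, & x \le 2/3,\\ 2/3, & x > 2/3, \end{cases}
\]
\[
\alpha_W = \frac{\int_0^1 d_W(x)^2\,\mathrm{d}x}{\big(\int_0^1 d_W(x)\,\mathrm{d}x\big)^2}
 = \frac{\tfrac23\cdot\tfrac19 + \tfrac13\cdot\tfrac49}{\big(\tfrac23\cdot\tfrac13 + \tfrac13\cdot\tfrac23\big)^2}
 = \frac{2/9}{16/81} = \frac{9}{8},
\]
so that 1/α_W = 8/9. For the constant graphon W ≡ p the degree function is constant and the same formula gives α_W = 1.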
Finally, we note that in [6], the authors consider similar dense graph sequences, but do not assume that the sequence converges to a graphon. Under this weaker assumption, they prove that the diameter of UST(G_n) is of order √n with high probability. We cannot hope to prove a scaling limit result under the same hypotheses, since one can, for example, connect two copies of K_{n/2} by a single edge, in which case the diameter is still of order √n but the scaling limit is not the CRT. However, when the graphs are well-connected, we can obtain the scaling limit.
In this paper we in fact prove the following theorem. In what follows, for a given γ > 0 we say that a graph G is a γ-expander if for all U ⊂ V(G), the total weight of the edges between U and V(G)∖U is at least γ|U| · |V(G)∖U|.

Theorem 1.3. Take γ > 0 and δ > 0 and let (G_n)_{n≥1} be a dense sequence of connected γ-expanders, where each G_n has n vertices and minimal degree at least δn. For each n ≥ 1, let T_n be a uniform spanning tree of G_n. Denote by d_{T_n} the corresponding graph-distance on T_n and by µ_n the uniform probability measure on the vertices of T_n. Then there exists a sequence (α_n)_{n≥1}, satisfying 1 ≤ α_n ≤ δ^{−1} for all n ≥ 1, such that as n → ∞

  ( V(T_n), √(α_n/n) · d_{T_n}, µ_n ) −→ (T, d_T, µ),

where (T, d_T, µ) is the CRT equipped with its canonical mass measure µ and −→ denotes convergence in distribution with respect to the GHP distance between metric measure spaces.
In fact the theorem holds slightly more generally, see Remark 1.4, but the above assumptions make the proof more straightforward. Clearly one cannot hope for convergence of the parameter α_n without making stronger assumptions, since one can alternate graphs from sequences with different limiting values of α_n. For example, for the sequence of complete graphs α_n → 1, but if G_n is instead the complete bipartite graph K_{n/3, 2n/3}, then α_n → 9/8. As well as the convergence of the rescaled diameter, it follows directly from the GHP convergence of Theorem 1.3 that we also have convergence of the rescaled height and rescaled simple random walk on UST(G_n). More formally, the following three convergences hold in distribution.

1. The rescaled diameter √(α_n/n) · diam(T_n) converges in distribution to the diameter of the CRT.

2. The rescaled height √(α_n/n) · max_{u∈T_n} d_{T_n}(v, u), where v is a uniformly chosen vertex of T_n, converges in distribution to the height of the CRT measured from a uniform point.

3. If X^n is a simple random walk on T_n, then the quenched law of ( √(α_n/n) · X^n_{⌊t n^{3/2} α_n^{−1/2}⌋} )_{t≥0} converges in distribution to the quenched law of Brownian motion on the CRT. It also follows that the associated mixing times converge on the same time scale.
See [7, Section 1.3] for further details of why these three properties follow from GHP convergence. In the settings of Theorem 1.1 and Corollary 1.2, we can replace α_n with α_W in the above three statements.

Proof strategy
Clearly, in order to prove the main theorems, it suffices to first prove Theorem 1.3 and then show that the graph sequence is an expander sequence and that α n → α W under the additional assumption of Theorem 1.1.
We will prove Theorem 1.3 in two steps using the lower mass bound criterion of [8]. In particular, by [7, Theorem 6.5], in order to prove the GHP convergence of Theorem 1.3 it is enough to prove the following two statements.
(A) The convergence holds in a finite-dimensional sense (this will be formally stated in Theorem 3.1).
(B) The lower mass bound condition holds; that is, for every η > 0, if m_n(η) denotes the minimal µ_n-mass of a ball of radius η√n in the graph distance d_{T_n}, then the sequence (m_n(η)^{−1})_{n≥1} is tight (this will be formally stated in Claim 6.4).
The second condition will follow quite straightforwardly from minor adaptations of the arguments in [7]. The bulk of this paper is devoted to proving the first condition. In fact, this condition is equivalent to the joint convergence, for all k ≥ 1, of the set of k(k−1)/2 distances between k points chosen uniformly at random in UST(G_n). This type of convergence was previously proved for USTs of sequences of high-dimensional graphs in [31]. This is a different class of graphs and includes the assumption of transitivity. Their proof uses Wilson's algorithm, which is a method for sampling USTs one branch at a time by running loop-erased random walks (LERWs). In their proof, they couple Wilson's algorithm on G_n with Wilson's algorithm on the complete graph and prove that the set of k(k−1)/2 distances on the two graphs must have the same scaling limit. Our proof, by contrast, is more direct. We also use Wilson's algorithm, but we work directly with UST(G_n) and use the Laplacian random walk representation of LERWs to sample each branch. By tightly controlling the capacity of loop-erased random walks, we are able to directly compute the probability that a given branch exceeds a given length, and show that this converges to the analogous quantity for the CRT using Aldous' stick-breaking construction.
Remark 1.4. As demonstrated by the examples and discussion above Theorem 1.3, the assumption of linear minimal degree is necessary in order to obtain convergence in the GHP topology. In order to keep the exposition clean, we prove both conditions (A) and (B) above under these assumptions. However, the assumption is not really necessary for condition (A). The proof would work unchanged if we allow o(n) vertices to have degrees less than √n, for example (since the loop-erased random walk that we analyze in Section 5 will never hit this set, whp). In fact, we believe that it may be possible to adapt our proof of condition (A) (Theorem 3.1) to work under the original assumptions of [31], but this would require one to keep track of several additional messy details, and would not add further insight.

Organization of the paper
This paper is organized as follows. In Section 2 we give the necessary background, including an introduction to graphons, USTs and the topologies of interest. In Section 3 we introduce a general framework for stick-breaking constructions of trees, and state Aldous' stick-breaking construction of the CRT. In Section 4 we give some precise random walk estimates, and we apply these with the Laplacian random walk representation in Section 5 to obtain estimates for the first steps of Wilson's algorithm. In Section 6 we use these estimates to couple stick-breaking on the CRT with Wilson's algorithm and prove that the two processes are very similar when n is large enough. This proves condition (A) above. We also explain how (B) can be deduced from the results of [7], which in fact establishes Theorem 1.3. Finally, in Section 7 we prove Theorem 1.1 and Corollary 1.2.

Acknowledgments
We would like to thank Asaf Nachmias and Jan Hladký for suggesting that we look at graphons and for many helpful comments. This research is supported by ERC consolidator grant 101001124 (UniversalMap), and by ISF grant 1294/19. EA was partially supported by the ANR ProGraM grant.

Background

Graphons
As mentioned in the introduction, graphons were introduced by Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi [25,10] in order to characterize dense graph limits. To understand why this definition is natural, we define the graphon representation of a discrete graph G as follows. Suppose that G is a simple graph with n vertices. Number the vertices from v_1 to v_n, partition the interval [0,1] into a sequence of intervals (I_i)_{i=1}^n, where I_i = [(i−1)/n, i/n), and set

  W_G(x,y) = 1{ {v_i, v_j} ∈ E(G) }   for (x,y) ∈ I_i × I_j.

If G is a weighted graph, we instead define

  W_G(x,y) = w(v_i, v_j)   for (x,y) ∈ I_i × I_j,

where w(v_i, v_j) represents the weight of the edge joining v_i and v_j (and is zero if there is no such edge).
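As a small illustration, the following sketch (our own code and helper names, not part of the paper) builds the step function W_G from a weighted adjacency matrix and evaluates it at a point.

```python
import numpy as np

def empirical_graphon(A):
    """Return the step graphon W_G of a weighted graph given by its
    symmetric adjacency/weight matrix A (entries in [0, 1])."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    def W_G(x, y):
        # x lies in I_i = [(i-1)/n, i/n); floor(x*n) gives the 0-indexed block.
        i = min(int(np.floor(x * n)), n - 1)
        j = min(int(np.floor(y * n)), n - 1)
        return A[i, j]

    return W_G

# Example: the complete bipartite graph K_{2,1} on 3 vertices.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
W = empirical_graphon(A)
print(W(0.1, 0.9))  # 1.0: the first and third vertex blocks are joined by an edge
```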
Note that, given only G, this definition of W_G is not unique, since it depends on the ordering of the vertices. Therefore, in order to define a metric on the space of graphons, we will instead consider equivalence classes of graphons. In particular, given two graphons W_1 and W_2, the cut-distance between them is defined as (e.g. see [25, Equation (8.16)])

  δ_□(W_1, W_2) := inf_ϕ ‖W_1 − W_2^ϕ‖_□,

where the infimum is taken over all measure-preserving automorphisms ϕ of [0,1], where W^ϕ is defined by W^ϕ(x,y) = W(ϕ(x), ϕ(y)), and where the cut-norm of a measurable function U : [0,1]^2 → [−1,1] is

  ‖U‖_□ := sup_{S,T ⊂ [0,1]} | ∫_{S×T} U(x,y) dx dy |,

the supremum being over pairs of measurable subsets of [0,1]. We therefore say that a sequence of deterministic graphs (G_n)_{n≥1} converges to a graphon W if δ_□(W_{G_n}, W) → 0 as n → ∞.

Remark 2.1. Graphons can in fact be defined as functions from Ω × Ω to [0,1], where Ω is any probability space, see [25, Chapter 13], but since all probability spaces are isomorphic, this does not provide much greater generality.
We will make use of the following lemma.

Random graphs and graphons
A graphon W can be used to define a random graph with n vertices in two ways.
1. Sample x_1, ..., x_n i.i.d. uniformly on [0,1]. We define a random simple graph on {1, ..., n} by adding an edge of weight 1 between i and j with probability W(x_i, x_j), independently for each (unordered) pair (i, j). We denote the resulting random graph G(n, W).

2. Sample x_1, ..., x_n i.i.d. uniformly on [0,1]. We define a random weighted graph on {1, ..., n} by adding an edge between i and j of weight W(x_i, x_j) for each (unordered) pair (i, j). We denote the resulting random graph H(n, W).
In both constructions, note that we can use a single graphon to define a whole sequence of random graphs.
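A minimal sketch of the two sampling procedures (our own code, with hypothetical function names) is as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_G(n, W):
    """G(n, W): edges of weight 1, each present independently with prob W(x_i, x_j)."""
    x = rng.uniform(0.0, 1.0, size=n)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.uniform() < W(x[i], x[j]):
                A[i, j] = A[j, i] = 1.0
    return A

def sample_H(n, W):
    """H(n, W): a weighted graph, edge {i, j} carrying weight W(x_i, x_j)."""
    x = rng.uniform(0.0, 1.0, size=n)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = W(x[i], x[j])
    return A

# Erdos-Renyi G(n, p) corresponds to the constant graphon W = p:
A = sample_G(50, lambda x, y: 0.3)
```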
The following lemma tells us that in either case, the cut-distance between a random sample of G(k, W) or H(k, W) and W goes to zero w.h.p. as k → ∞. In particular this means that results we prove for USTs of deterministic sequences of graphs extend automatically to sequences of the form (G(k, W))_{k≥1} or (H(k, W))_{k≥1} under the assumptions of Corollary 1.2.
For example, the classical Erdős–Rényi graphs G(n, p) for n ≥ 1, p ∈ [0, 1] correspond to the graphs G(n, W_p) where W_p is the graphon that is equal to p everywhere.
For further background and applications of graphons, we refer to [25, Part 3].

Mixing times
Let G be a connected weighted graph with n vertices, with weights (w(x, y))_{x,y∈V(G)}, and with no loops or multiple edges. A random walk on G is the Markov chain (X_m)_{m≥0} such that, for all vertices x, y ∈ V(G), and all m ≥ 1,

  P(X_m = y | X_{m−1} = x) = w(x, y) / Σ_{z∼x} w(x, z),

where z ∼ x means that z is a neighbour of x. Due to periodicity considerations, it is sometimes more convenient to instead use the notion of a lazy random walk. This is defined by

  P(X_m = x | X_{m−1} = x) = 1/2   and   P(X_m = y | X_{m−1} = x) = w(x, y) / (2 Σ_{z∼x} w(x, z))  for y ≠ x,

for all m ≥ 1.
For each t ≥ 0 let p_t denote the t-step transition density of a lazy random walk, i.e. p_t(x, y) = P(X_t = y | X_0 = x) for all x, y ∈ V(G). We define the mixing time of G as (see [24, Equation (4.31)])

  t_mix = t_mix(G) := min{ t ≥ 0 : max_{x∈V(G)} ‖p_t(x, ·) − π‖_TV ≤ 1/4 },

where π denotes the stationary measure on G.
We will also need the notion of total variation distance between two probability measures µ and ν on a finite subset X ⊂ V(G). This is defined by

  ‖µ − ν‖_TV := max_{A⊂X} |µ(A) − ν(A)| = (1/2) Σ_{x∈X} |µ(x) − ν(x)|.

Furthermore, by [24, Section 4.5], we have for any k ≥ 1, any t ≥ k t_mix and any vertex x that

  ‖p_t(x, ·) − π‖_TV ≤ 2^{−k}.    (4)
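For concreteness, the following sketch (our own code) computes the worst-case total variation distance of the lazy walk and the resulting mixing time, with the threshold 1/4 used in the definition above, for a small weighted graph.

```python
import numpy as np

def lazy_transition_matrix(A):
    """Lazy random walk on a weighted graph with weight matrix A."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]                  # simple random walk kernel
    return 0.5 * (np.eye(len(A)) + P)

def mixing_time(A, eps=0.25, t_max=10**5):
    P = lazy_transition_matrix(A)
    deg = A.sum(axis=1)
    pi = deg / deg.sum()                  # stationary distribution
    Pt = np.eye(len(A))
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        # worst-case total variation distance over starting vertices
        d = 0.5 * np.abs(Pt - pi[None, :]).sum(axis=1).max()
        if d <= eps:
            return t
    return None

# Complete graph on 6 vertices (unit weights, no loops):
A = np.ones((6, 6)) - np.eye(6)
print(mixing_time(A))
```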

Expanders
We will use the following definition of an expander graph.

Definition 2.4. Let γ > 0. A weighted graph G on n vertices with no loops is called a γ-expander if for every U ⊂ V(G), the total weight of the edges between U and V(G)∖U is at least γ|U| · |V(G)∖U|.
Although we give the definition for loopless graphs, note that adding loops to a graph does not change the law of its UST, since loops can never appear in a UST. Note that often in the literature a slightly different definition of expander is used, involving the Cheeger constant. We are using the definition above as it fits more naturally into the framework of dense graphs (as we will later show in Claim 7.1) and is the same definition used to consider the local limit in [16].
The main property of expanders that we will use is as follows.
Claim 2.5. Let γ > 0 and let G be a γ-expander with n ≥ 2 vertices. Then, provided that n is large enough (depending only on γ), we have that t_mix(G) ≤ C log n, where C < ∞ is a constant depending only on γ.

Proof. Note that it follows from Definition 2.4 that G has minimal degree at least γn/2. First note that by [24, Theorem 12.4],

  t_mix ≤ log(4 / π_min) · t_rel,

where t_rel is the relaxation time of G and π_min := min_{v∈V(G)} π(v) ≥ γ/(2n). By the Cheeger inequality (see [4,5,19,22] for various proofs), t_rel is at most 2/Φ_*^2, where Φ_* is the Cheeger constant of G, which for a γ-expander is bounded below by a positive constant depending only on γ. Combining all the inequalities gives the result.

Loop-erased random walk and Wilson's algorithm
We now describe Wilson's algorithm [32], which is a widely-used algorithm for sampling USTs. A walk X of length L is a sequence of vertices X = (X_0, ..., X_L) in which consecutive vertices are adjacent in G. Given an interval J = [a, b], where a, b are integers, we write X[J] for {X_i}_{i=a}^b. Given a walk X, we define its loop erasure Y = LE(X) = LE(X[0, L]) inductively as follows. We set Y_0 = X_0 and let λ_0 = 0. Then, for every i ≥ 1, we set

  λ_i := 1 + max{ m ≤ L : X_m = Y_{i−1} }   and   Y_i := X_{λ_i}.

We halt this process once we have λ_i > L. When X is a random walk on the weighted graph G starting at some vertex v and terminated when hitting another vertex u (L is now random), we say that LE(X) is a loop-erased random walk (LERW) from v to u.
To sample a UST of a finite connected weighted graph G we begin by fixing an ordering of the vertices V = (v_1, ..., v_n). First let T_1 be the tree containing v_1 and no edges. Then, for each i > 1, sample a LERW from v_i to T_{i−1} and set T_i to be the union of T_{i−1} and the LERW that has just been sampled. We terminate this algorithm with T_n. Wilson [32] proved that T_n is distributed as UST(G). An immediate consequence is that the path between any two vertices in UST(G) is distributed as a LERW between those two vertices. This was first shown by Pemantle [30].
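A short sketch of loop erasure and Wilson's algorithm as just described, for a weighted graph given by its weight matrix (illustrative code of ours, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_walk_step(A, v):
    """One step of the weighted random walk from vertex v."""
    w = A[v]
    return rng.choice(len(w), p=w / w.sum())

def loop_erase(path):
    """Chronological loop erasure of a finite walk (list of vertices)."""
    erased = []
    for v in path:
        if v in erased:                             # a loop has been closed:
            erased = erased[:erased.index(v) + 1]   # erase it
        else:
            erased.append(v)
    return erased

def wilson_ust(A, order=None):
    """Sample a uniform spanning tree of the weighted graph A by Wilson's
    algorithm; returns the tree as a set of edges."""
    n = len(A)
    order = list(range(n)) if order is None else order
    in_tree = {order[0]}
    edges = set()
    for start in order[1:]:
        if start in in_tree:
            continue
        walk = [start]
        while walk[-1] not in in_tree:      # run until the walk hits the current tree
            walk.append(random_walk_step(A, walk[-1]))
        branch = loop_erase(walk)           # LERW from `start` to the tree
        edges.update(zip(branch[:-1], branch[1:]))
        in_tree.update(branch)
    return edges

A = np.ones((8, 8)) - np.eye(8)             # complete graph K_8
print(sorted(wilson_ust(A)))                # 7 edges forming a spanning tree
```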

Laplacian random walk
Here we outline the Laplacian random walk representation of the LERW (see [26, Section 4.1] for full details) and its application to Wilson's algorithm. Take a finite, weighted, connected graph G and suppose we have sampled T_j for some j ≥ 1 using Wilson's algorithm as described above. We now sample a LERW from v_{j+1} to T_j. Denote this LERW by (Y_m)_{m≥0}. Also let X denote a random walk on G. For a set A ⊂ G, let τ_A denote the hitting time of A by X, and τ^+_A denote the first return time to A by X. The Laplacian random walk representation of Y says that, conditionally on T_j and on the event {(Y_m)_{m=0}^i ∩ T_j = ∅}, we have for any i ≥ 0 and any vertex v that

  P( Y_{i+1} = v | Y_0, ..., Y_i ) = w(Y_i, v) · P_v( τ_{T_j} < τ_{{Y_0,...,Y_i}} ) / Σ_u w(Y_i, u) · P_u( τ_{T_j} < τ_{{Y_0,...,Y_i}} ),

with the convention that P_v(τ_{T_j} < τ_{{Y_0,...,Y_i}}) = 1 when v ∈ T_j.
Clearly this is only non-zero when v ∉ ∪_{m=0}^i {Y_m}. We can now extrapolate this to ask about the law of (Y_m)_{m=i+1}^{i+H} for some H ≥ 1: iterating the one-step formula expresses this law as a product of such hitting probabilities, normalized by a constant which we denote by C.
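To make the representation concrete, the sketch below (our own code and naming, not from the paper) computes the one-step law of the Laplacian walk on a small graph by solving the harmonic system for h(v) = P_v(τ_target < τ_path). On K_6 it recovers the fact that the probability that the LERW from 0 to 5 is the single edge {0, 5} equals 1/3, matching the probability that this edge belongs to UST(K_6).

```python
import numpy as np

def hitting_prob(A, target, avoid):
    """h(v) = P_v(walk hits `target` before `avoid`), computed exactly by
    solving (I - P_ff) h_f = P_ft 1 on the free vertices; h = 1 on target, 0 on avoid."""
    n = len(A)
    P = A / A.sum(axis=1, keepdims=True)
    h = np.zeros(n)
    h[list(target)] = 1.0
    free = [v for v in range(n) if v not in target and v not in avoid]
    if free:
        Pff = P[np.ix_(free, free)]
        b = P[np.ix_(free, list(target))].sum(axis=1)
        h[free] = np.linalg.solve(np.eye(len(free)) - Pff, b)
    return h

def laplacian_walk_step(A, path, target):
    """Conditional law of the next LERW vertex Y_{i+1}, given the current
    self-avoiding path and the target set: proportional to w(Y_i, v) h(v)."""
    h = hitting_prob(A, set(target), set(path))
    weights = A[path[-1]] * h            # h vanishes on the path, so it is never revisited
    return weights / weights.sum()

A = np.ones((6, 6)) - np.eye(6)          # complete graph K_6
print(laplacian_walk_step(A, path=[0], target={5}))   # last entry is 1/3
```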

Capacity and closeness
Recall that G is a connected weighted graph with n vertices with minimal degree at least δn. The capacity of a set of vertices of G quantifies how difficult it is for a random walk to hit the set. Let (X_i)_{i≥0} be a random walk on G and for U ⊂ V(G), let τ_U = inf{i ≥ 0 : X_i ∈ U}. For M ≥ 1 we define the M-capacity of U as Cap_M(U) := P_π(τ_U ≤ M), where the walk is started from the stationary distribution π. Here we collect some useful facts about the capacity.
Lemma 2.6. Provided that n is large enough, for every M ≥ 1 and every U ⊂ V(G) with M|U| ≤ δn/2, we have δM|U|/(2n) ≤ Cap_M(U) ≤ M|U|/(δn).

Proof. The upper bound follows from a union bound over the vertices of U and the first M steps of the walk, since π(v) ≤ 1/(δn) for every v. The lower bound follows from the Bonferroni inequalities and the lower bound on the degree, which imply that Cap_M(U) is at least Σ_{v∈U} Σ_{t=0}^{M−1} π(v) ≥ δM|U|/n minus a correction term of lower order, and hence at least δM|U|/(2n).

We will also use the following claim.
Then, provided n is large enough,

Proof. Let X be a random walk started at u ∈ G. Clearly, for any t ≥ 0, the first t steps of X can be coupled with the first t non-repeat steps of a lazy random walk X̃. Therefore, first run a lazy random walk started from u until time T = 2 log n · t_mix. Let N denote the total number of non-repeat jumps of this lazy random walk. The distribution of X̃_T is almost stationary by (4). Moreover, we have that 0 ≤ N ≤ T deterministically. To sample (X_t)_{t=0}^M, we first couple it with the first N steps of (X̃_t)_{t=0}^T as explained above, and then run X for a further M − N steps. Under this coupling, we therefore have from a union bound that

Similarly,

In order to obtain lower bounds on capacity, we define the k-closeness of two sets U and W by

  Close_k(U, W) := P_π( τ_U ≤ k and τ_W ≤ k ).

Corollary 2.8. For any disjoint sets U, W ⊂ G, we have that

  Cap_k(U ∪ W) ≥ Cap_k(U) + Cap_k(W) − Close_k(U, W).

In particular, Close_k(U, W) ≤ 2k^2 |U||W| / (δ^2 n^2).
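Assuming the definition recalled above, Cap_M(U) = P_π(τ_U ≤ M) for a walk started from stationarity, the capacity is easy to estimate by direct simulation; the sketch below (our own code) can be used to check the bounds of Lemma 2.6 on small examples.

```python
import numpy as np

rng = np.random.default_rng(2)

def capacity_MC(A, U, M, samples=20000):
    """Monte Carlo estimate of Cap_M(U) = P_pi(tau_U <= M) for the weighted
    random walk on A started from its stationary distribution pi ~ degrees."""
    n = len(A)
    deg = A.sum(axis=1)
    pi = deg / deg.sum()
    P = A / deg[:, None]
    U = set(U)
    hits = 0
    for _ in range(samples):
        v = rng.choice(n, p=pi)
        for _ in range(M + 1):           # times 0, 1, ..., M
            if v in U:
                hits += 1
                break
            v = rng.choice(n, p=P[v])
    return hits / samples

A = np.ones((60, 60)) - np.eye(60)       # complete graph
U, M = range(5), 4
print(capacity_MC(A, U, M), M * len(U) / len(A))   # estimate vs. the rough order M|U|/n
```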

Random variables
Here we present two elementary results that will be useful in Section 6.
Claim 2.9. Let ε > 0 and let

For any L > 0, let X_L be the random variable on (0, ∞) satisfying

Then for any δ > 0, there exists η = η(δ, L) > 0 such that the following holds. Let Y be another random variable on (0, ∞), and suppose that for all x > 0,

Then this implies that we can couple X_L and Y so that

Furthermore, for any δ, L_1 and L_2 with L_1 < L_2, there exists η = η(δ, L_1, L_2) such that we can couple X_L and Y as described above for every L ∈ [L_1, L_2].

Proof. Note that we can couple X_L and Y by first sampling U ∼ Uniform([0, 1]) and setting

Without loss of generality assume that δ < 1 and K_{δ,L} > 1, otherwise decrease or increase them if necessary. Note that, for all 0 ≤ x < K_{δ,L}, we have that

Now suppose that (8) holds and η < M_{δ,L}. Then, for any 0 ≤ x < K_{δ,L} we have that

Therefore, under the coupling, we have for any x < K_{δ,L} that

Similarly, {X_L ≥ x} ⇒ {Y ≥ x − δ}. Therefore, under this coupling we have that

as required.
For the second claim, note that for every L′ > L we also have that P(X_{L′} ≥ K_{δ,L}) < δ. Therefore for the interval [L_1, L_2] we can simply use K_{δ,L_1} and M_{δ,L_1} on the whole interval.

GHP topology
Here we define the GHP topology. We use the framework of [29, Sections 1.3 and 6] and work in the space X_c of equivalence classes of metric measure spaces (mm-spaces) (X, d, µ) such that (X, d) is a compact metric space and µ is a Borel probability measure on it, and we say that (X, d, µ) and (X′, d′, µ′) are equivalent if there exists a bijective isometry φ : X → X′ such that φ_*µ = µ′ (here φ_*µ is the pushforward measure of µ under φ). To ease notation, we will represent an equivalence class in X_c by a single element of that equivalence class. First recall that if (X, d) is a metric space, the Hausdorff distance d_H between two sets A, A′ ⊂ X is defined as

  d_H(A, A′) := inf{ ε > 0 : A ⊂ (A′)^ε and A′ ⊂ A^ε },

where, for ε > 0 and A ⊂ X, we let A^ε = {x ∈ X : d(x, A) < ε} be the ε-fattening of A in X. If µ and ν are two measures on X, the Prohorov distance between them is given by

  d_P(µ, ν) := inf{ ε > 0 : µ(A) ≤ ν(A^ε) + ε and ν(A) ≤ µ(A^ε) + ε for all closed A ⊂ X }.

Definition 2.11. Let (X, d, µ) and (X′, d′, µ′) be elements of X_c. The Gromov-Hausdorff-Prohorov (GHP) distance between (X, d, µ) and (X′, d′, µ′) is defined as

  d_GHP( (X, d, µ), (X′, d′, µ′) ) := inf { d_H( φ(X), φ′(X′) ) ∨ d_P( φ_*µ, φ′_*µ′ ) },

where the infimum is taken over all isometric embeddings φ : X → F, φ′ : X′ → F into some common metric space F.
Recall that our aim in this paper is to prove distributional convergence with respect to the GHP topology. Given an mm-space (X, d, µ) and a fixed m ∈ N we define a measure ν_m((X, d, µ)) on R^{m(m−1)/2} to be the law of the m(m−1)/2 pairwise distances between m i.i.d. points drawn according to µ. Each law P on X_c therefore defines random measures (ν_m)_{m≥2} and annealed measures (ν̄_m)_{m≥2} on R^{m(m−1)/2}, given by ν̄_m(P) := ∫_{X_c} ν_m((X, d, µ)) dP.
In [7] we rephrased a result of [8, Theorem 6.1] in the distributional setting to characterize GHP convergence in terms of convergence of the measures (ν̄_m)_{m≥2} and a volume condition. To state the version that we will use in this paper, given c > 0 and an mm-space (X, d, µ) we define

  m_c( (X, d, µ) ) := inf_{x∈X} µ( B(x, c) ),

the minimal mass of a ball of radius c. In the proof of the next proposition we will also make reference to the (coarser) Gromov-Prohorov (GP) topology, which is defined analogously to the GHP topology but using only the Prohorov distance between the embedded measures.
The key result is as follows.
Proposition 2.13. Let (X, d, µ) be an element of X_c with law P such that µ has full support almost surely. Let ((X_n, d_n, µ_n))_{n≥1} be a sequence in X_c with respective laws (P_n)_{n≥1} and suppose that:

(a) For all m ≥ 2, ν̄_m(P_n) → ν̄_m(P) as n → ∞.
(b) For any c > 0, the sequence ( m_c((X_n, d_n, µ_n))^{−1} )_{n≥1} is tight.

Then (X_n, d_n, µ_n) −→ (X, d, µ) with respect to the GHP topology.
Proof. First we show that parts (a) and (b) together imply that (X_n, d_n, µ_n) −→ (X, d, µ) with respect to the GP topology, by verifying the two conditions of [15, Corollary 3.1]. The second condition of [15, Corollary 3.1] is precisely (a). To verify the first condition we further use [15, Theorem 3] (recall that by Prohorov's Theorem the relative compactness of the measures is equivalent to their tightness) and verify conditions (i) and (ii) there (see also Proposition 8.1 in [15]). Condition (i) is just saying that ν̄_2(P_n) is a tight sequence of measures on R, which follows from (a). Lastly, (b) directly implies condition (ii).
Therefore, by [7, Theorem 6.5], the spaces (X_n, d_n, µ_n) also converge with respect to the GHP topology.

Stick-breaking construction of trees
Our first goal will be to prove condition (a) of Proposition 2.13, which is equivalent to the following statement.

Theorem 3.1. Take γ > 0 and δ > 0 and let (G_n)_{n≥1} be a dense sequence of γ-expanders, where each G_n has n vertices and minimal degree at least δn. Denote by d_{T_n} the graph distance on T_n and by (T, d, µ) the CRT. Then there exists a sequence (β_n)_{n≥1}, satisfying √δ ≤ β_n ≤ 1 for all n ≥ 1, such that for any fixed k ≥ 1, if {x_1, ..., x_k} are uniformly chosen independent vertices of G_n, then the distances

  ( d_{T_n}(x_i, x_j) / (β_n √n) )_{1 ≤ i < j ≤ k}

converge jointly in distribution to the k(k−1)/2 distances in T between k i.i.d. points drawn according to µ.
To prove this theorem, we will use Aldous' stick-breaking construction of the CRT, which is particularly well adapted to dealing with the pairwise distances between a set of k uniform points. Our strategy will be to show that the first k steps of Wilson's algorithm on G_n closely approximate those of this stick-breaking process when n is large. In this section we briefly recall the stick-breaking construction of the CRT and some of its key properties.
We start with a more general description of how one can construct a sequence of trees from sticks on the real line.

Definition 3.2 (Stick-breaking construction of a tree sequence). Set y_0 = z_0 = 0, and suppose that we have a sequence of points y_1, y_2, ... ∈ [0, ∞) and z_1, z_2, ... ∈ [0, ∞) such that y_{i−1} < y_i and z_i ≤ y_i for all i ≥ 1. Construct trees as follows. Start by taking the line segment [y_0, y_1) at time 1. This is T^(2) (as it contains two marked points). We proceed inductively. At time i ≥ 2, take the interval [y_{i−1}, y_i) and attach the base of the interval [y_{i−1}, y_i) to the point on T^(i) corresponding to z_{i−1}. This gives a new tree with i + 1 marked points (in bijection with the set (y_j)_{j=0}^i), which we call T^(i+1). Given two such sequences and any k ≥ 2 we define SB^(k)((y_0, y_1, y_2, ...), (z_0, z_1, z_2, ...)), or equivalently SB^(k)((y_0, y_1, ..., y_{k−1}), (z_0, z_1, ..., z_{k−2})), to be equal to the tree T^(k).
In general, the sequence of trees constructed in this way may not converge, but Aldous showed that by choosing the points in the right way, we can in fact construct the CRT via stick-breaking.
Let 0 = Y_0 < Y_1 < Y_2 < ... denote the ordered set of points of a non-homogeneous Poisson process on [0, ∞) with intensity t dt, and let Z_i be chosen uniformly on the interval [0, Y_i) for each i ≥ 1. Construct the sequence (T^(k))_{k=2}^∞ as in Definition 3.2. Then the closure of the limit of T^(k) is equal in distribution to the CRT. Moreover, if one stops the process after k − 1 steps, then the resulting tree T^(k) has the same distribution as the subtree spanned by k uniform points in the CRT, and the points corresponding to the set (Y_i)_{i=0}^{k−1} can be identified with k uniform points in the CRT.
In particular, the set of k(k−1)/2 pairwise distances between the points corresponding to (Y_i)_{i=0}^{k−1} is equal in distribution to the set of k(k−1)/2 pairwise distances between k uniform points in the CRT. The following proposition will be important for the comparison with Wilson's algorithm later on. It can be verified by a direct computation.
Proposition 3.4. Let (Y_i)_{i≥0} be as above. Then for every i ≥ 0 and every t ≥ 0,

  P( Y_{i+1} − Y_i > t | Y_i ) = exp( −t Y_i − t^2/2 ).

The following lemma will also be useful.
Lemma 3.5. For some k ≥ 1, let (y_i)_{i=0}^k, (z_i)_{i=0}^{k−1} and (y′_i)_{i=0}^k, (z′_i)_{i=0}^{k−1} be two pairs of sequences as in Definition 3.2, and let d and d′ denote the metrics on the corresponding trees SB^(k+1)((y_i), (z_i)) and SB^(k+1)((y′_i), (z′_i)). Fix some ε > 0 and suppose that the following holds.

(i) |y_i − y′_i| ≤ ε and |z_i − z′_i| ≤ ε for all i.

(ii) For all 0 ≤ i, j ≤ k − 1, y_j ≤ z_i ≤ y_{j+1} if and only if y′_j ≤ z′_i ≤ y′_{j+1}.

Then, for all 0 ≤ i, j ≤ k, it holds that |d(y_i, y_j) − d′(y′_i, y′_j)| ≤ 2kε.

Proof. When conditions (i) and (ii) hold, we have for all i ≤ k − 1, j ≤ k that y_j ≤ z_i ≤ y_{j+1} if and only if y′_j ≤ z′_i ≤ y′_{j+1}. We claim that this implies that |d(y_i, y_j) − d′(y′_i, y′_j)| ≤ 2kε for all i, j ≤ k + 1. Indeed, it follows by construction that d(y_i, y_j) is the sum of lengths of at most k branch segments in T^(k+1), and all of their lengths can be written in the form |y_j − y_{j−1}|, |z_j − y_ℓ| or |z_j − z_ℓ|. Moreover, by construction, when conditions (i) and (ii) hold, d′(y′_i, y′_j) can be written as the same sum but replacing each z_j with z′_j and each y_j with y′_j. It therefore follows from the triangle inequality that |d(y_i, y_j) − d′(y′_i, y′_j)| ≤ 2kε.
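The construction of Definition 3.2 with Aldous' choice of points is easy to simulate; the sketch below (our own code, not from the paper) samples the Poisson points of intensity t dt, and computes pairwise distances between the marked points. For instance, the distance between the first two marked points is Y_1, which is Rayleigh distributed.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_crt_skeleton(k):
    """Aldous' stick-breaking: cut points Y from the Poisson process of
    intensity t dt on [0, infinity), attachment points Z_i uniform on [0, Y_i)."""
    Y = np.zeros(k)
    for i in range(1, k):
        # (Y_i^2 - Y_{i-1}^2)/2 is a standard exponential variable.
        Y[i] = np.sqrt(Y[i - 1] ** 2 + 2.0 * rng.exponential())
    Z = np.array([rng.uniform(0.0, Y[i]) for i in range(1, k - 1)])
    return Y, Z

def branch(x, Y):
    """Index i >= 1 of the stick [Y_{i-1}, Y_i) whose closure contains x."""
    return max(int(np.searchsorted(Y, x, side="left")), 1)

def distance(p, q, Y, Z):
    """Tree distance between the points p, q of the stick-breaking tree."""
    d = 0.0
    while branch(p, Y) != branch(q, Y):
        if branch(p, Y) < branch(q, Y):
            p, q = q, p
        b = branch(p, Y)          # p sits on the later stick [Y_{b-1}, Y_b)
        d += p - Y[b - 1]         # walk down to the base of that stick,
        p = Z[b - 2]              # which is glued to the tree at Z_{b-1}
    return d + abs(p - q)

# Pairwise distances between the 5 marked points y_0 = 0, Y_1, ..., Y_4:
Y, Z = sample_crt_skeleton(5)
print([round(distance(Y[i], Y[j], Y, Z), 3) for i in range(5) for j in range(i + 1, 5)])
```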

Random walk properties
In this section we prove some results on random walk hitting probabilities and capacity, which we will later transfer to segments of LERW using the Laplacian random walk representation of Section 2.5.
Throughout the section we fix a small κ ∈ (0, 1/32) and for n ≥ 1 we set M_n := n^κ. In what follows we will simply write M instead of M_n.
Notational remark.For the statements in this section, we will take a sequence of graphs satisfying the assumptions of Theorem 1.3 which is therefore associated with two positive constants γ > 0 and δ > 0. In this section we will treat these constants as fixed, and therefore o(•) and O(•) quantities may also depend on γ and δ.
where the o_{δ,γ}(1) term is uniform over all A and B but may depend on δ and γ. Similarly, for an upper bound on P_π(τ_A < τ_B) we simply exchange the roles of A and B. We deduce that, uniformly over all permitted A and B,

We will also need the following minor adaptation.
Lemma 4.2. Take γ > 0 and δ > 0 and let (G_n)_{n≥1} be a dense sequence of γ-expanders, where each G_n has n vertices and minimal degree at least δn. Take κ and M as defined at the start of Section 4. Then, for any disjoint

Proof. We start by proving the first statement for a lazy random walk, since this is equivalent, and we denote such a lazy random walk by X. Throughout this proof, we will also use the following notation. For a set C ⊂ G_n and some time t ≥ 0 we write τ(C, t) for the first time s strictly larger than t such that X_s ∈ C. Furthermore, write t^+_mix for log_2^2(n) t_mix, so that by (4) we have that for every u ∈ G_n,

We start with a lower bound on P_u(τ_A < τ_B). We have that

Note that by (4), the first term can be lower bounded by

For the second term, let us upper bound the probability of the event {τ_{A∪B} < t^+_mix < τ(A, t^+_mix) < τ(B, t^+_mix)}. Using a union bound we obtain

Note that, by Lemma 2.6 and Lemma 4.1, we have that

so that, by Claim 2.5,

Substituting everything back into (11), we therefore deduce that

For an upper bound on P_u(τ_A < τ_B), we simply write

Using (12) again we obtain that

For the second statement, it is again enough to prove it for the lazy random walk, replacing τ^+_B with the first hitting time of B after making at least one non-lazy step, using the exact same proof.

Capacity
Here we prove some similar properties for the capacity and closeness of a random walk.
In this section we can also introduce the sequence (α_n)_{n≥1} appearing in Theorem 1.3. Given the graph sequence (G_n)_{n≥1}, take M = n^κ as defined at the start of Section 4, let X be a random walk on G_n started from stationarity, and for each n ≥ 1 set

  α_n := (n / M^2) · E[ Cap_M( X[0, M) ) ].

Proposition 4.3. Take γ > 0 and δ > 0 and let (G_n)_{n≥1} be a dense sequence of γ-expanders, where each G_n has n vertices and minimal degree at least δn. Let u ∈ G_n and let (X_i)_{i≥0} denote a random walk on G_n started at u. Take M = n^κ as defined at the start of Section 4. Then for all sufficiently large n,

Proof. The proof is a simplified version of that of [31, Lemma 5.3]. First recall from Lemma 4.2 that t^+_mix = log_2^2(n) t_mix. Let (T_j)_{j=1}^{n^{κ/2}} be a sequence of i.i.d. random variables with distribution Bin(t^+_mix, 1/2). Then, for all 1 ≤ j ≤ n^{κ/2} let

Note that by (4) we have that for all j ≤ n^{κ/2}, given X[0, j n^{κ/2}], the starting point of B_{j+1} is nearly stationary. Also let (X^{ind,j})_{j=1}^{n^{κ/2}} denote a sequence of independent random walk segments, each of length n^{κ/2} − t^+_mix, and each started from stationarity. Note that, by (4), the segments (X^{B_j})_{j=1}^{n^{κ/2}} can be coupled with the segments (X^{ind,j})_{j=1}^{n^{κ/2}} so that the segments coincide for all j ≤ n^{κ/2} with probability at least

Note that the segments (X^{ind,j})_j are i.i.d. and, by definition,

Moreover, by a union bound, we also have the deterministic bound

It therefore follows from a Hoeffding bound [17, Theorem 1] that there exist C < ∞, c > 0 such that for any t > 0,

In particular, since it follows from (14) that

we deduce that

We would like to approximate the capacity of the whole segment X[0, M) by the sum of the capacities of the smaller segments, but this is potentially a slight overestimate, since we are double-counting random walk trajectories that hit more than one smaller segment. To account for this, we use the concept of closeness defined in Section 2.6. For each J ≤ n^{κ/2}, note that conditionally on the segments (X^{ind,j})_{j≤J} all being disjoint, which happens with probability at least

we have by Corollary 2.8 that Close_M( X^{ind,J}, ∪_{j<J} X^{ind,j} ) ≤ 2M^3 n^{κ/2} / (δ^2 n^2).
Equally, approximating Cap_M(X[0, M)) by Σ_{j=1}^{n^{κ/2}} Cap_M(X^{B_j}) might be undercounting slightly, since there is also a contribution to the capacity from the set X[0, M) ∖ ∪_{j<n^{κ/2}} X^{B_j}.
We now combine the above estimates as follows. Note that, on the event that X^{B_j} = X^{ind,j} for all j ≤ J, we have (also using (15) and a union bound) that for all sufficiently large n:

Therefore, combining this with the estimates of (13), (16), (17) in a union bound, applying (18) and using that t^+_mix ≪ n^{κ/32}, we see that with probability at least 1 − 2M^2/(δn) we have that

Laplacian random walk representation and Wilson's algorithm
Throughout all of this section, we let (G_n)_{n≥1} be a sequence of graphs satisfying the assumptions of Theorem 1.3 with parameters δ > 0 and γ > 0. By Claim 2.5, this implies that t_mix = O(log n). For each n, k ≥ 1, T_n^{(k−1)} will denote the tree obtained after running Wilson's algorithm on G_n on the vertex set (v_1, ..., v_{k−1}). Given such a sequence (G_n)_{n≥0}, we set

where X is a random walk on G_n and κ is as defined at the start of Section 4. Lastly, if A ⊂ G_n, we will use the notation τ_A to denote the hitting time of A for a random walk on G_n.
The goal of this section is to prove the forthcoming Proposition 5.2 for such a sequence of graphs, for which we will need the following definition.

Definition 5.1. We say that a subgraph T ⊂ G_n is good if

1. T is a tree.

Proof. For each K < n^{2κ}, we will use the Laplacian random walk representation to bound the probability that P(|Y

To this end, we have by (20) for every v ∈ T and every simple path ϕ of length K with ϕ ∩ T = ∅ that

By Lemma 4.1 and Lemma 4.2, the second term can be bounded by estimating the capacities of T and of ϕ. We claim that, up to constants depending on the minimal degree, they can both be estimated by their sizes. Indeed, for any set A of size less than n^{1/2+κ}, we have that δM|A|/(2n) ≤ Cap_M(A) ≤ |A|M/(δn) when n is large enough, by Lemma 2.6.
Therefore, as ϕ is of size K < n^{2κ} < n^{1/2+κ} and T is of size at most n^{1/2+κ}, we have that

Therefore, by Lemma 4.1, Lemma 4.2 and summing over all v ∈ T, we have for all sufficiently large n that

By a union bound, we can thus conclude that

as required.
We will also need the following result to control the constant C defined by (21). The reader should have in mind that we will eventually apply the result with T = T_M. Then, for all simple paths {Y_m}_{m=0}^{iM}, all simple paths {u_m}_{m=0}^H such that u_0 = Y_{iM} and H ≤ M, and all connected subgraphs T ⊂ G_n such that {u_m}_{m=1}^{H−1}, {Y_m}_{m=0}^{iM} and T are disjoint and such that |T| ≤ n^{1/2+κ}, we have the following.
Proof. Fix some 1 ≤ h ≤ H. In case (a), in order to bound a term appearing in the product in (21), we would like to compare the probabilities

By Lemma 4.2 and by the triangle inequality, we have that

Lemma 5.7. Let X be a random walk on G_n. Let T ⊂ G_n be a subgraph such that |T| ≤ n^{1/2+κ}, and let A be a connected subset of T. Then for any 0 ≤ i ≤ n^{1/2+κ}/M, for any simple path y[0, iM] on G_n disjoint from T and any u ∉ T ∪ y[0, iM],

Proof. Upper bound. Recall that t^+_mix = log_2^2(n) t_mix. First note that by Claim 2.7, we have that

Here the final line follows from Lemma 2.6, which implies that Cap_M(A) ≥ δM|A|/(2n) on the event |T| ≤ n^{1/2+κ}.

Lower bound. We first note that

We will now bound all three terms on the left hand side. First, we lower bound

by Claim 2.7. For the second term, we upper bound it by the product of the probabilities for X to self-intersect in M steps, and then to hit A in another M steps. This is upper bounded by

The third term can be bounded by the probability of hitting y[0, iM] ∪ (T ∖ A) in at most M steps, and then hitting A in at most another M steps, which is upper bounded by

We conclude that Cap
We now have estimates for all the quantities appearing in (20). We combine these in the next corollary.

M and for any simple path (y_m)_{m=0}^{iM} not intersecting T, satisfying

such that the sum of their lengths is εβ_n √n (which is much larger than n^{3ε}). By discarding those that are of length less than n^{3ε} we can apply Proposition 5.2(2) to the remaining subsets (by decomposing them and I into smaller intervals if necessary) to deduce that

Now take N large enough such that the o(1) error is bounded by ε. Then, take some set A in [0, I_max] and let I_A be the set of intervals of the form [iε, (i + 1)ε) intersecting A. Now, we have that

Hence the Prohorov distance between these two measures is at most ε.
The main claim of this section is now as follows.
for all n. As remarked at the end of [7], it is straightforward to extend the proof of the lower mass bound to this setting by carrying the constant D through all of the computations in [7]; we do not provide the details as they are not illuminating. Under the assumptions of Theorem 1.3, we can take D = δ^{−1}, so this easily verifies Claim 6.4 and therefore Proposition 2.13(b). Moreover, Theorem 3.1 ensures that Proposition 2.13(a) is also fulfilled. Theorem 1.3 therefore follows directly.
Proof of Theorem 1.1 and Corollary 1.2

Recall from the introduction that a graphon W is non-degenerate if the function x ↦ ∫_0^1 W(x,y) dy is defined and strictly positive for every x ∈ [0,1], and that a non-degenerate graphon W is connected if for every measurable A ⊂ [0,1] of Lebesgue measure in (0,1) we have that ∫_{A×([0,1]∖A)} W(x,y) dx dy > 0.
In order to verify Theorem 1.1 as a consequence of Theorem 1.3, we need to verify that, under the assumptions of Theorem 1.1, the graph sequence is an expander sequence and that α_n → α_W.
We start with the first of these. Recall that the definition of a γ-expander sequence is given in Definition 2.4.

Claim 7.1. Let W : [0,1]^2 → [0,1] be a connected graphon and let (G_n)_{n≥1} be a sequence of weighted graphs with minimal degree at least δn converging in cut-distance to W. Then there exists γ = γ(W, δ) > 0 such that (G_n)_{n≥1} is a γ-expander sequence.
Proof. Take U ⊂ G_n. We split the proof into two cases depending on whether |U| ≥ δn/2 or not.

Case 1: |U| ≤ δn/2. Since G_n has minimal degree at least δn and the maximal weight of every edge is 1, it follows that there is a total weight of at least δn/2 emanating from every vertex of U leading to V(G)∖U, so that

Case 2: |U| > δn/2. In particular, since G_n converges to W, this implies that there exists N < ∞ such that for all such U and all n ≥ N,

We now turn to verifying the convergence α_n → α_W. Recall that in Section 4 we defined

  α_n = (n / M^2) · E[ Cap_M( X[0, M) ) ],

where X is a RW on G_n started from stationarity, and showed that under some assumptions, the sequence ( V(T_n), √(α_n/n) · d_{T_n}, µ_n ) converges to the CRT with respect to the GHP topology. We also let U_{π_n} denote a random stationary vertex of G_n, and define ᾱ_n = n E[π_n(U_{π_n})].
In fact it is more convenient to deal with ᾱ_n rather than α_n. This is sufficient, as we show in the following claim (we write the proof for completeness, but really it follows directly just from linearity of expectation and Corollary 2.8).
Claim 7.2. Let (G_n)_{n≥0} be a sequence of weighted graphs on n vertices with minimal degree at least δn. Let α_n and ᾱ_n be defined as above. Then α_n = ᾱ_n (1 + o(1)) as n → ∞.
Proof. By the Bonferroni inequalities and linearity of expectation, and letting Z denote an independent RW started from stationarity, we can write (recalling also from Lemma 2.6 that Cap_M(U_π) ≥ Mδ/(2n) deterministically):

Similarly, note that, since π(v) ≥ δ/n for all v ∈ G_n deterministically,

To conclude, we combine these to get that

It therefore follows that the convergence of (26) holds with the sequence (ᾱ_n)_{n≥1} in place of (α_n)_{n≥1}. To prove the main convergence theorem, it is therefore sufficient to show that, under the assumptions of Theorem 1.1,

  ᾱ_n → α_W    as n → ∞,    (27)

where α_W is as in (1).
Remark 7.3. Note that ᾱ_n is equal to 1 when G_n is regular, so clearly (27) will entail that α_W = 1 for a regular graph sequence.
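As a sanity check on ᾱ_n = n E[π_n(U_{π_n})] = n Σ_v π_n(v)^2, the following short sketch (our own code) evaluates it for the complete graph and for a complete bipartite graph, recovering the values 1 and 9/8 discussed in the introduction.

```python
import numpy as np

def alpha_bar(A):
    """alpha_bar_n = n * sum_v pi(v)^2 for the weighted graph with matrix A."""
    deg = A.sum(axis=1)
    pi = deg / deg.sum()
    return len(A) * np.sum(pi ** 2)

def complete_graph(n):
    return np.ones((n, n)) - np.eye(n)

def complete_bipartite(a, b):
    A = np.zeros((a + b, a + b))
    A[:a, a:] = 1.0
    A[a:, :a] = 1.0
    return A

print(alpha_bar(complete_graph(300)))           # -> 1.0 (regular graph)
print(alpha_bar(complete_bipartite(200, 100)))  # -> 1.125 = 9/8
```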
Our next goal is to show the following.