A down-up chain with persistent labels on multifurcating trees

In this article, we propose to study a general notion of a down-up Markov chain for multifurcating trees with n labelled leaves. We study in detail down-up chains associated with the (α, γ)-model of Chen et al.


Introduction
Let $[n] := \{1, \ldots, n\}$. An [n]-tree is a (non-planar) rooted, multifurcating tree with n ∈ N leaves labelled by 1, ..., n, an additional vertex of degree 1 called the root, and in which every other vertex, called a branch point, has degree at least 3. We will denote the space of such trees by $\mathbb{T}_{[n]}$. Occasionally, we will only allow binary trees, also called n-leaf cladograms, and will denote this space by $\mathbb{T}^{\mathrm{bin}}_{[n]}$. The Aldous chain [4,28] is a Markov chain on $\mathbb{T}^{\mathrm{bin}}_{[n]}$ with a transition kernel consisting of a down-step followed by an up-step defined as follows:
Down-step: Delete a leaf uniformly at random and contract the parent branch point away.
Up-step: Insert a new leaf with the same label by selecting an edge uniformly at random, inserting a new branch point on that edge and attaching the new leaf onto that branch point.
It is natural to extend the above type of Markov chain by introducing alternative ways of deleting and inserting leaves as well as by extending it to the space of multifurcating trees. We will refer to any Markov chain on $\mathbb{T}_{[n]}$ (or a subset thereof) as a down-up chain if its transition kernel can be decomposed into a down-step where a random (but not necessarily uniformly distributed) leaf is deleted, and an up-step where a new leaf is inserted into a randomly picked edge or branch point.
Noting that the up-step described above is identical to one step of Rémy's growth process [26], a specific problem is to define new down-up chains where the up-step is governed by other growth processes from the literature. For our purposes, the most important of these are Ford's α-model for binary trees [8], which extends Rémy's growth process [26], and the (α, γ)-growth rule for multifurcating trees [7], of which both Ford's α-model and Marchal's stable tree growth process [20] are special cases. The (α, γ)-growth process for 0 ≤ γ ≤ α ≤ 1 is characterized by, for each n ∈ N, obtaining an element of $\mathbb{T}_{[n+1]}$ from an element $t \in \mathbb{T}_{[n]}$ by inserting a new leaf labelled by n + 1 into an insertable part of t. The set of insertable parts is denoted by ins(t), and an insertable part x ∈ ins(t) is picked with probability proportional to an associated weight

$$
w_x = \begin{cases}
1 - \alpha & \text{if } x \text{ is an external edge,} \\
\gamma & \text{if } x \text{ is an internal edge,} \\
(c_x - 1)\alpha - \gamma & \text{if } x \text{ is a branch point with } c_x \text{ children.}
\end{cases}
\tag{1}
$$

It is rather natural to construct a down-step such that the stationary distribution of the down-up chain is that induced by the growth process that governs the up-step. To be more precise, let $(T_n)_{n \in \mathbb{N}}$ denote a growth process. We are then looking to devise a down-step such that, when carrying out the down-step from $T_n$, we obtain $T_n^{\downarrow}$ with the property that $T_n^{\downarrow} \overset{D}{=} T_{n-1}$ for each n ≥ 2. Combining the down-step with the up-step from the growth process yields a stationary down-up Markov chain, $(T_n(m))_{m \in \mathbb{N}_0}$, if $T_n(0) \overset{D}{=} T_n$. If the leaf labels are exchangeable, i.e. if for any permutation σ of [n] the probability of obtaining an [n]-tree, t, is the same as the probability of obtaining t where leaf i is labelled by σ(i) for each i ∈ [n], a down-step with a simple uniform deletion of the leaves will suffice. However, in the absence of exchangeability the situation is more complicated.
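To make the weights in (1) concrete, the following is a minimal Python sketch that computes the weight of every insertable part of a small tree and samples one part proportionally to its weight. The dictionary encoding of the tree, the vertex names and the function names are assumptions made for this illustration and do not come from the paper.

```python
import random
from fractions import Fraction


def insertable_weights(children, alpha, gamma):
    """Return {insertable part: weight} following the weights in (1).

    `children` maps every internal vertex to the list of its children.
    The key "root" has exactly one child, and leaves never appear as keys.
    An edge is identified by its lower endpoint (hypothetical encoding).
    """
    weights = {}
    for parent, kids in children.items():
        for v in kids:
            is_leaf = v not in children
            # External (leaf) edges get 1 - alpha, all other edges get gamma.
            weights[("edge", v)] = 1 - alpha if is_leaf else gamma
        if parent != "root":
            # A branch point with c children gets (c - 1) * alpha - gamma.
            weights[("bp", parent)] = (len(kids) - 1) * alpha - gamma
    return weights


def sample_insertable_part(weights, rng):
    """Pick an insertable part with probability proportional to its weight."""
    parts = list(weights)
    return rng.choices(parts, weights=[float(weights[p]) for p in parts])[0]


# A tree with four leaves: branch point "v" has children 1, 2 and "u",
# and branch point "u" has children 3 and 4.
EXAMPLE_TREE = {"root": ["v"], "v": [1, 2, "u"], "u": [3, 4]}
```

For a tree with n leaves the weights sum to n − α, which can be checked on `EXAMPLE_TREE` with, say, α = 2/3 and γ = 1/3 passed as `Fraction` objects.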
In [9,11] a modified version of the Aldous chain, the α-chain, was introduced, where the up-step is governed by Ford's α-model. For α = 1/2, Ford's α-model is the same as Rémy's growth process, and so the leaf labels are exchangeable, but this is not true for 0 < α < 1 with α ≠ 1/2, so a more complicated down-step was introduced:
(i) Down-step (selection): Select leaf i uniformly at random.
(ii) Down-step (local search): Defining a and b to be the smallest leaf labels in the first and second subtree on the ancestral line from i, respectively, let ĩ := max{i, a, b}.
(iii) Down-step (swap and delete): Swap i and ĩ, delete leaf ĩ, and contract away the parent branch point.
(iv) Down-step (relabel): Relabel the leaves using the increasing bijection [n] \ {ĩ} → [n − 1] to obtain an element of $\mathbb{T}^{\mathrm{bin}}_{[n-1]}$.
(v) Up-step: Insert leaf n according to Ford's α-growth rule [8] (the binary specialization of the (α, γ)-growth rule obtained for γ = α).
In the exchangeable case where α = 1/2, the relabelling in step (iv) is unnecessary and one can therefore adjust (v) accordingly and just insert the deleted leaf again, which yields the uniform chain from [11].
Irrespective of the exchangeability of the leaf labels, the above down-step serves a key purpose in a broader context, which we will now outline. It is well known that the stationary distribution of the Aldous chain, the uniform distribution on the space of binary trees [4], converges when suitably scaled to that of the Brownian Continuum Random Tree [1], as the number of leaves tends to infinity. Aldous and Schweinsberg [4,28] showed that the relaxation time of the Aldous chain is of order $n^2$. Aldous conjectured that running the chain $n^2$ times faster as the number of leaves tends to infinity would result in a "diffusion on continuum trees" [3].
In a sequence of papers [14,10,12,13] the authors have carried out a programme to construct this "Aldous diffusion". Parts of this construction arise by studying projections of the uniform chain onto decorated trees. To be more precise, consider for an [n]-tree the reduced subtree spanned by the leaves labelled by 1 and 2, where all vertices of degree 2 have been contracted away, and consider how many of the n leaves are sitting in subtrees of the branch point and on each of the edges of that reduced tree (for a more precise description of this, see Definition 6.1). This will, for a binary tree, yield three masses summing to n. For each $m \in \mathbb{N}_0$, let $T_n(m)$ denote the mth step of the uniform chain on $\mathbb{T}_{[n]}$, and let $(Y^0_n(m), Y^1_n(m), Y^2_n(m))$ denote the proportions of these masses when projecting $T_n(m)$ to a decorated [2]-tree of mass n. In [22] a notion of a Wright-Fisher diffusion with negative mutation rates was developed, and it was shown that as n → ∞,

$$
\left(Y^0_n(\lfloor n^2 t \rfloor), Y^1_n(\lfloor n^2 t \rfloor), Y^2_n(\lfloor n^2 t \rfloor)\right)_{t \ge 0} \xrightarrow{D} \left(Y^0(t), Y^1(t), Y^2(t)\right)_{t \ge 0},
\tag{2}
$$

stopped the first time one of the last two coordinates hits 0, where $\xrightarrow{D}$ denotes convergence in distribution and $(Y^0, Y^1, Y^2)$ is a Wright-Fisher diffusion with mutation rates 1/2, −1/2, −1/2. If we delete leaves uniformly at random, we will end up deleting the branch point when one of the masses associated with the leaf edges drops to 0. In the limit, the Wright-Fisher process does not provide any guidance as to how to continue the process. But with the down-step of the α-chain we have a mechanism by which we can identify another branch point and continue the process. The two key aspects that make this possible are the label swapping and the relabelling of the remaining leaf labels.
The aim of this paper is to study and generalize down-up chains with label swapping to the space of multifurcating trees.Specifically, we will study chains where the up-step is governed by the (α, γ)-growth process [7], which also encompasses the prime special case of the stable tree growth [20].Whilst any (α, γ)-growth process with 0 < γ ≤ α < 1 has a continuum random tree as its scaling limit, the stable tree growth is of particular motivational interest due to the universality of the stable continuum random tree [20], and the fact that the stable tree growth induces exchangeable leaf labels.Hence a down-up chain with an up-step governed by the (α, γ)-growth process will be analogous to the Aldous and uniform chains, in terms of devising a "diffusive limit on the continuum trees".To envisage the construction of such a process on continuum trees it is of key importance to devise a down-step that allows the construction beyond a disappearing branch point, when projecting it to the space of decorated trees.This is the key purpose of this paper.Definition 1.1.Fix n ∈ N. The (α, γ)-chain on T [n] , (T n (m)) m∈N0 , is a Markov chain on T [n] with a transition kernel characterized as follows: (i) Conditional on T n (m) = t, select leaf I = i uniformly at random.(ii) Let c v denote the number of children of v, the parent of i in t.
• If c v = 2, let Ĩ = max{i, a, b} where a = min leaf labels in the first spinal bush on the ancestral line from leaf i , b = min leaf labels in the second spinal bush on the ancestral line from leaf i .
• If c v > 2, let i 1 , . . ., i cv denote the smallest leaf labels in the c v subtrees of v, enumerated in increasing order, and say that i = i j for some j ∈ [c v ].Define the conditional distribution of Ĩ by (iii) Swap i and Ĩ, delete Ĩ.
Formal definitions of the terminology and operations used in Definition 1.1 can be found in Sections 2.4 and 2.5. The significance of Theorem 1.2 is that the unique invariant distribution for the (α, γ)-chain is exactly the distribution induced by the (α, γ)-growth process, proving that the down-step does indeed counteract the up-step governing the (α, γ)-growth process.
We now turn our focus to the investigation of projections of the (α, γ)-chain.Specifically, we will project the (α, γ)-chain on T [n] to the space of decorated [k]trees of mass n, a special case of which was introduced prior to (2).We will show that the projected chain is Markovian and characterize its transition kernel without referencing the (α, γ)-chain.There are several well-known results on when a function of a Markov chain is again Markovian.As in [9], we will utilize both the Kemeny-Snell criterion [18] and the intertwining criterion [27], and we will need to use them in conjunction with one another by considering an intermediary Markov chain.
In general, a decorated [k]-tree of mass n is a [k]-tree, called the tree shape, where all branch points and edges have an affiliated non-negative integer mass such that any leaf edge has mass at least 1 and the sum of all masses is n. An [n]-tree naturally gives rise to such a decorated tree, by letting the [k]-tree be the reduced subtree of the [n]-tree spanned by the leaves labelled by [k], where all vertices of degree 2 are contracted away. Comparing this reduced subtree to the original [n]-tree, the mass associated with an insertable part, i.e. an edge or a branch point, now arises by counting all the leaves for which the insertable part is the first insertable part of the reduced [k]-tree that is reached on the ancestral line from the leaf. We denote the surjective map that yields a decorated [k]-tree of mass n from an [n]-tree by $\pi^{\bullet n}_{[k]}$. We will discuss this in more detail in Section 6.
In the following, let K n denote the transition kernel of the (α, γ)-chain from Definition 1.1.Define the following natural Markov kernel from the space T where T n is the nth step of the (α, γ)-growth process.
In order to introduce a down-up chain on T , let us abuse notation so that K n and Π •n k denote both the kernel and the corresponding Markovian matrix, whilst π •n [k] is both the projection map and the matrix π .
The decorated (α, γ)-chain on T •n [k] can be described in an autonomous way without referencing the transition kernel of the (α, γ)-chain.This is done in Proposition 6.7.By combining the Kemeny-Snell and intertwining criteria mentioned earlier we obtain another main result of this paper: Theorem 1.4 makes it possible to vary k and obtain a consistent system of projected Markov chains all running in stationarity, provided that the (α, γ)-chain itself is running in stationarity.
In the context of the (α, γ)-chain projected to the decorated [k]-tree spanned by the leaves labelled by [k], let us for each n ≥ k consider a decorated (α, γ)- denote the proportion of mass associated with the insertable part x ∈ ins(s) of the decorated tree at time m, stopped the first time Y e n (•) hits 0 for any leaf edge e, and assume that (Y x n (0)) x∈ins(s) converges in distribution as n → ∞.Then where (Y x ) x∈ins(s) is a Wright-Fisher diffusion with weights ( wx ) x∈ins(s) , where wx = w x − 1 whenever x is an external edge, wx = w x otherwise, and w x is defined by (1).This follows exactly as in [22].
Let 0 < γ ≤ α < 1. Let $\mathbb{T}^{\bullet}_n$ denote the set of rooted multifurcating trees with n unlabelled leaves and consider the natural projection from $\mathbb{T}_{[n]}$ to $\mathbb{T}^{\bullet}_n$. Then the (α, γ)-chain satisfies the Kemeny-Snell criterion, and the projected $\mathbb{T}^{\bullet}_n$-valued Markov chain evolves by simply (i) selecting a uniform leaf, (ii) deleting it, and (iii) using the growth rule of the unlabelled (α, γ)-Markov branching model [7] to insert a new leaf. The stationarity of this can also be seen directly from the sampling consistency of this Markov branching model. In Corollary 3 of [7] it was shown that, when suitably represented as discrete R-trees with edge lengths $n^{-\gamma}$, these stationary distributions converge weakly, in the Gromov-Hausdorff topology, to the distribution of a continuum random tree $\mathcal{T}_{\alpha,\gamma}$. By Theorem 11 of [17] this can be strengthened to Gromov-Hausdorff-Prokhorov convergence of weighted R-trees. In the light of (5) we make the following conjecture:
Conjecture. The unlabelled (α, γ)-chain with $n^2$ steps per time unit and edges scaled by $n^{\gamma}$ has a scaling limit that is a path-continuous Markov process in the Gromov-Hausdorff-Prokhorov space of weighted R-trees, whose stationary distribution is that of $\mathcal{T}_{\alpha,\gamma}$.
In the binary case where γ = α, this conjecture strengthens results by Löhr et al. [19], in the case α = 1/2, and by Nussbaumer and Winter [21], who use a state space of algebraic trees with a weaker topology that, in some sense, disregards the edge lengths.
The main new technique of this paper is the semi-planar (α, γ)-growth process and a down-up chain on what we will call semi-planar trees, which will be two auxiliary processes that we use to derive the results stated above.Crucially, both the (α, γ)-growth process and the (α, γ)-chain arise as projections of their semiplanar counterpart.We define the space of semi-planar trees to be trees where branch points are equipped with a left-to-right ordering of its subtrees (as a planar tree), such that we cannot distinguish the order of the two leftmost subtrees, which we define to be the two subtrees of the branch point with the smallest leaf labels.The processes we define on the space of semi-planar trees are furthermore of mathematical interest in their own right.The semi-planar (α, γ)-growth process is defined similarly to the (α, γ)-growth process, but the weight (c − 1)α − γ in a branch point with c children is split up so that there is weight α to the left of any subtree not leftmost, and weight α − γ to the right of the rightmost subtree.(α, γ)growth processes are known to have intimate links to random partitions induced by Chinese restaurant processes [2,24] and ordered Chinese restaurant processes [25], see e.g.[5,7,25], and the semi-planar trees provide exactly the right structure to further hook onto this link efficiently.
Compared to a non-planar tree, the additional structure will allow us to define a local search (as in step (ii) of the α-chain) that coincides with the local search from the α-chain in the cases where the leaf i selected for deletion in step (i) is located in a binary branch point.The key aspect of the down-step is that projecting the (α, γ)-chain onto the space of (non-planar) [n]-trees using the natural surjective projection yields exactly the (α, γ)-chain of Definition 1.1, if the initial distribution is picked according to an appropriate Markov kernel similar to (3).
The paper is structured as follows: Section 2 contains a brief overview of notation and structures relied on throughout the exposition.Most notable are urn schemes, (un)ordered Chinese Restaurant Processes, basic tree growth processes and a general setup for tree-valued down-up chains.We will also cover notation for operations on trees such as deletion, insertion and swapping of leaves.These fundamental notions will be updated to various tree spaces in later sections.
In Section 3 we develop the notion of semi-planar trees and construct and investigate properties of a semi-planar (α, γ)-growth process, and in Section 4 we go on to define the semi-planar down-up chain and prove that it has a suitable stationary distribution.
The projection of the semi-planar down-up chain onto non-planar trees is carried out in Section 5, and Theorem 5.4 yields that this is the (α, γ)-chain of Definition 1.1.This section outlines the methodology that will be used in later sections.
In Section 6, an in-depth description of the behaviour of the semi-planar down-up chain projected to the space of decorated trees is provided.
Section 7 provides a selection of key results characterizing the link between projections of the semi-planar down-up chain and the semi-planar down-up chain itself, and Section 8 pulls together the threads from Sections 5, 6, and 7, and proves Theorem 1.4.
Acknowledgements.This research is funded by the Lundbeck Foundation (R263-2017-3677) and the Engineering and Physical Sciences Research Council (1930434).

Preliminaries
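2.1. Urn schemes. Urn schemes are intimately connected with Chinese Restaurant Processes and other exchangeable processes, and this connection will play a crucial role throughout this exposition. We will here give a brief account of properties of the generalized Pólya urn.
Definition 2.1 (Generalized Pólya urn). Fix k ∈ N and $w = (w_j)_{j \in [k]}$ where $w_j > 0$ for each j ∈ [k]. Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of [k]-valued random variables such that

$$
\mathbb{P}\left(X_{n+1} = j \mid X_1, \ldots, X_n\right) = \frac{w_j + N^n_j}{w + n}, \qquad j \in [k],
\tag{6}
$$

where $N^n_j := \#\{i \in [n] : X_i = j\}$, where $w := \sum_{j \in [k]} w_j$, and $n \in \mathbb{N}_0$. We call $(X_n)_{n \in \mathbb{N}}$ a generalized Pólya urn with initial weights w.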
More loosely, we will use the term urn scheme for any countable sequence of random variables governed by (6).Before summarizing some of the properties of the above urn scheme, let us briefly recall a few distributions related to urn schemes: we say that N has a Dirichlet-multinomial distribution with n trials and weights w, and denote this by N ∼ DM n (w).For k = 2, we will write N 1 ∼ DM n (w 1 , w 2 ), since N 2 = n−N 1 .This distribution is also known as the Beta-binomial distribution, denoted by BetaBin n (w 1 , w 2 ).
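The link between the two-colour urn and the Beta-binomial distribution can be checked by exact computation. The sketch below assumes the standard one-step rule for a generalized Pólya urn (colour j is drawn with probability proportional to its initial weight plus the number of earlier draws of colour j) and the usual rising-factorial form of the Beta-binomial probability mass function. Both formulas and all function names are assumptions of this illustration, since the corresponding displays are not reproduced above.

```python
from fractions import Fraction
from math import comb


def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1), with value 1 for m = 0."""
    result = Fraction(1)
    for i in range(m):
        result *= x + i
    return result


def beta_binomial_pmf(n, w1, w2, k):
    """P(N_1 = k) for N_1 ~ BetaBin_n(w1, w2) (standard form, assumed)."""
    return comb(n, k) * rising(w1, k) * rising(w2, n - k) / rising(w1 + w2, n)


def urn_count_distribution(n, w1, w2):
    """Exact law of the number of colour-1 draws among the first n draws.

    Assumed urn rule: given k earlier draws of colour 1 among `step` draws,
    the next draw has colour 1 with probability (w1 + k) / (w1 + w2 + step).
    """
    w1, w2 = Fraction(w1), Fraction(w2)
    dist = {0: Fraction(1)}
    for step in range(n):
        new = {}
        for k, p in dist.items():
            p_one = (w1 + k) / (w1 + w2 + step)
            new[k + 1] = new.get(k + 1, 0) + p * p_one
            new[k] = new.get(k, 0) + p * (1 - p_one)
        dist = new
    return dist
```

Iterating the urn dynamics and evaluating the closed form give the same probabilities, which is one way to see why urn counts are Beta-binomial when k = 2.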
We now summarize a couple of properties which will be of value later: Proposition 2.3.Fix k ∈ N and w = (w 1 , . . ., w k ) where w j > 0 for each j ∈ [k], and let X = (X n ) n∈N be a generalized Pólya urn with weights w.Furthermore, for each j ∈ [k] and n ∈ N let N n j be as in Definition 2.1.Then (i) X is exchangeable, (ii) Chinese Restaurant Processes.We briefly recall the concept of Chinese Restaurant Processes.Definition 2.4.Fix α and θ such that either 0 ≤ α ≤ 1 and θ > −α, or α < 0 and θ = l max α for some l max ∈ N. The seating plan for the customers in a Chinese Restaurant Process with parameters (α, θ), an (α, θ)-CRP, is as follows: • The first customer sits at the first table.
• If the first n customers are seated at L tables (where $L \le l_{\max}$ in the case where α < 0 and θ = l max α) with $N_1, \ldots, N_L$ customers seated at each table, customer n + 1 will sit at table l with probability $\frac{N_l - \alpha}{n + \theta}$, for each l ∈ [L], and at a new table, L + 1, with probability $\frac{L\alpha + \theta}{n + \theta}$.
The table-seating of the first n customers naturally induces a partition of [n] by saying that j and j′ belong to the same block of the partition if and only if customers j and j′ are seated at the same table in the Chinese Restaurant. It is well known that in an (α, θ)-CRP the number of customers seated at the first table, $N_1$, after having seated n customers in total, satisfies $N_1 \sim \mathrm{BetaBin}_n(1 - \alpha, \alpha + \theta)$. For a more extensive account of this process, see [24].
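The seating rule of Definition 2.4 translates directly into a short simulation. The following Python sketch is only an illustration: the function names and the representation of tables as lists of customer labels are assumptions made here.

```python
import random
from fractions import Fraction


def seating_probabilities(sizes, alpha, theta):
    """Seating probabilities for customer n + 1, given current table sizes.

    Entry l is the probability of joining existing table l, and the last
    entry is the probability of opening a new table.
    """
    n, L = sum(sizes), len(sizes)
    probs = [(Fraction(N) - alpha) / (n + theta) for N in sizes]
    probs.append((L * alpha + theta) / (n + theta))
    return probs


def seat_customers(n, alpha, theta, rng):
    """Seat customers 1, ..., n and return the tables in order of appearance."""
    tables = [[1]]  # the first customer sits at the first table
    for customer in range(2, n + 1):
        sizes = [len(table) for table in tables]
        probs = [float(p) for p in seating_probabilities(sizes, alpha, theta)]
        choice = rng.choices(range(len(tables) + 1), weights=probs)[0]
        if choice == len(tables):
            tables.append([customer])
        else:
            tables[choice].append(customer)
    return tables
```

The blocks of the induced partition of [n] are exactly the lists returned by `seat_customers`, for example `seat_customers(20, Fraction(1, 2), Fraction(1), random.Random(1))`.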
We can define an ordered Chinese Restaurant Process with parameters (α, θ) for 0 ≤ α ≤ 1 and θ ≥ 0, denoted by oCRP(α, θ), in a similar fashion. The seating dynamic is the same but we now associate a left-right ordering to the tables. For any number of tables L ∈ N, conditional on customer n + 1 sitting at a new table, L + 1, place this table
• to the left of the leftmost table with probability $\frac{\alpha}{L\alpha + \theta}$,
• between any pair of neighbouring tables with probability $\frac{\alpha}{L\alpha + \theta}$, and
• to the right of the rightmost table with probability $\frac{\theta}{L\alpha + \theta}$.
This construction not only gives rise to a random partition of [n] but also a random permutation of [L], Σ, corresponding to the ordering of the L tables seating n customers. It holds that for every permutation σ of [L], where is the number of record values in σ. Whenever we wish to refer to the distribution of a random partition induced by an (α, θ)-CRP or (α, θ)-oCRP with n customers, we will denote these distributions by $\mathrm{CRP}_n(\alpha, \theta)$ and $\mathrm{oCRP}_n(\alpha, \theta)$, respectively.
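The placement rule for a new table can be sketched in the same way as the seating rule. In the Python illustration below the tables are kept in a list ordered from left to right, and slot 0 means "left of the leftmost table" while slot L means "right of the rightmost table". This indexing and the function names are assumptions made for the example, and the sketch requires Lα + θ > 0 whenever a new table is opened.

```python
import random
from fractions import Fraction


def new_table_slot_probabilities(L, alpha, theta):
    """Probabilities of the L + 1 slots for a new table, from left to right.

    The first L slots (left of the leftmost table and the L - 1 gaps between
    neighbouring tables) each have weight alpha, the last slot has weight theta.
    """
    total = L * alpha + theta
    return [Fraction(alpha) / total] * L + [Fraction(theta) / total]


def ordered_seating(n, alpha, theta, rng):
    """Seat customers 1, ..., n and return the tables ordered left to right."""
    tables = [[1]]
    for customer in range(2, n + 1):
        L = len(tables)
        # Unnormalized seating weights: N_l - alpha per table, L*alpha + theta
        # for a new table (the common factor 1 / (n + theta) cancels).
        seat_weights = [len(t) - alpha for t in tables] + [L * alpha + theta]
        choice = rng.choices(range(L + 1), weights=[float(w) for w in seat_weights])[0]
        if choice < L:
            tables[choice].append(customer)
        else:
            slots = [float(p) for p in new_table_slot_probabilities(L, alpha, theta)]
            slot = rng.choices(range(L + 1), weights=slots)[0]
            tables.insert(slot, [customer])
    return tables
```

The returned list encodes both the random partition of [n] and the left-to-right order of its blocks.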
We have the following result: Proposition 2.5 (Proposition 6 of [25]).For each n ∈ N, let N n = (N n 1 , . . ., N n k ) denote the vector of the number of customers seated in an (α, θ)-oCRP with n customers, enumerated in the table order of appearance.It holds that the corresponding sequence (N n ) n∈N is regenerative, i.e. that given the first part of N n is n 1 , then the remaining parts of N n sum to n − n 1 and have the same distribution as N n−n1 .Furthermore, where the decrement matrix, q α,θ , is given by For more information regarding (α, θ)-oCRP and their role in regenerative tree growth processes, see [25].

2.3. Trees. Recall that $\mathbb{T}_A$ denotes the space of (non-planar) rooted, multifurcating trees with leaves labelled by the elements of A, and call an element of $\mathbb{T}_A$ an A-tree. For each j ∈ A we will refer to the leaf labelled by j as leaf j, whilst denoting the root vertex by ∅. Let edge(t) and vert(t) denote the set of edges and vertices of $t \in \mathbb{T}_A$, respectively.
Notation 2.6. Fix a finite set A and $t \in \mathbb{T}_A$.
(i) Let e ∈ edge(t) be an edge between u, v ∈ vert(t).If u is closer to the root, ∅, than v, in the graph distance, we say that u is the parent of v, and that v is the child of u.For brevity, we will sometimes write u ≺ t v if u is the parent of v in t.We will write e uv if there is any ambiguity about which edge we are discussing, and in cases where we do not wish to assign a name to the parent of v, we will simply denote the edge between v and its parent by e v↓ .For the unique edge connected to root we will write e ∅↑ , and, to ease notation, we shall simply refer to a leaf edge by the label of the associated leaf.(ii) If v 1 , v 2 ∈ vert(t) have the same parent, we say that they are siblings.
(iii) We say that a vertex that does not have any children is a leaf, and will refer to an edge connected to a leaf as a leaf edge or an external edge.It is worth noting that the root is not a leaf, and importantly, we do not consider the edge e ∅↑ external.We will refer to any vertex apart from the root that is not a leaf as a branch point, and refer to the collection of these by bp(t).To ease notation later on, we will define insertable parts of the tree as ins(t) := bp(t) ∪ edge(t).(iv) The ancestral line from a vertex v ∈ vert(t) is the collection of vertices v 0 , . . ., v l , and edges, e 1 , . . ., e l , such that v 0 = v, v l = ∅, whilst for all j ∈ [l] it holds that v j is the parent of v j−1 with e j = e vj−1vj .Now fix t ∈ T A and some i ∈ A. If v 0 , . . ., v l and e vj−1vj j∈[l] denotes the ancestral line from leaf i, this naturally splits t into the ancestral line and a collection of spinal bushes each consisting of a collection of subtrees.To make this precise, note how for each j ∈ [l − 1] there is a collection of subtrees, (t j,j ′ ) j ′ ∈[cv j −1] , rooted at v j , such that each of them only contains descendants of v j that are not also descendants of v j−1 .Here c vj denotes the number of children of v j .We call this collection of subtrees a spinal bush, and will refer to the spinal bush located at v j as the jth spinal bush on the ancestral line from leaf i for each j ∈ [l − 1].Note how the ancestral line from i and the collection of the spinal bushes are disjoint apart from the root-vertices of the subtrees in the spinal bushes, and exhaust t, and are thus referred to as the spinal decomposition of t from i.
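The spinal decomposition is what the local search ĩ := max{i, a, b} of the α-chain operates on, so a small sketch may help. The Python code below computes the spinal bushes along the ancestral line from a leaf and the result of the local search. The dictionary encoding of the tree and the function names are assumptions of this illustration, and so is the convention that a missing second spinal bush simply contributes nothing to the maximum.

```python
def parent_map(children):
    """Invert the children dictionary: vertex -> parent."""
    return {c: p for p, kids in children.items() for c in kids}


def leaf_labels(children, v):
    """All leaf labels in the subtree hanging below vertex v."""
    kids = children.get(v, [])
    if not kids:
        return [v]
    return [x for c in kids for x in leaf_labels(children, c)]


def spinal_bushes(children, i):
    """Spinal bushes on the ancestral line from leaf i, closest bush first.

    Each bush is the list of children of a vertex on the ancestral line
    that do not lie on the ancestral line themselves.
    """
    parent = parent_map(children)
    bushes = []
    below, v = i, parent[i]
    while v != "root":
        bushes.append([c for c in children[v] if c != below])
        below, v = v, parent[v]
    return bushes


def local_search(children, i):
    """Return max{i, a, b}, with a and b the smallest leaf labels in the
    first and second spinal bush (a missing bush is ignored, by assumption)."""
    mins = [min(x for c in bush for x in leaf_labels(children, c))
            for bush in spinal_bushes(children, i)[:2]]
    return max([i] + mins)


# Binary tree with leaves 1, ..., 4: "w" has children 1 and 2,
# "u" has children "w" and 3, and "v" has children "u" and 4.
EXAMPLE_TREE = {"root": ["v"], "v": ["u", 4], "u": ["w", 3], "w": [1, 2]}
```

In `EXAMPLE_TREE` the spinal bushes from leaf 1 contain the leaves 2, 3 and 4 in turn, so the local search from leaf 1 compares the labels 1, 2 and 3.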
2.4.Tree growth processes.In order to define tree growth processes we first need to introduce various operations on trees: Definition 2.7 (Insertion of a leaf).Let A ⊂ N be a finite set, fix s ∈ T A , j ∈ N \ A, and x ∈ ins(s).The operation of inserting leaf j to x in s is the map (s, x, j) ϕ → t ∈ T A∪{j} where t is defined as follows.
• If x = e vw ∈ edge(s), t is obtained by adding new vertices v ′ and j to vert(s), and replacing the edge e vw with the three edges e vv ′ , e v ′ w , and e jv ′ .• If x = v ∈ bp(s), t is obtained by adding a new vertex, j, to vert(s), and an edge, e jv , to edge(s).
Definition 2.8 (Deletion of a leaf). Let A be a finite set and fix $t \in \mathbb{T}_A$. The operation of deleting leaf j ∈ A from t is the map $(t, j) \overset{\rho}{\mapsto} s \in \mathbb{T}_{A \setminus \{j\}}$, where s is obtained in the following way. Let v be the parent of j and let $c_v$ be the number of children of v. If $c_v \ge 3$, s is obtained by removing j from vert(t) and $e_{jv}$ from edge(t). If $c_v = 2$, let w be the sibling of j and v′ the parent of v, and obtain s by removing j and v from vert(t) and $e_{wv}$, $e_{vv'}$, and $e_{jv}$ from edge(t), and subsequently adding $e_{wv'}$ to the latter. We denote by $\tilde{\rho}$ the map that, after applying ρ, subsequently relabels A \ {j} by [#A − 1] using the increasing bijection.
If we wish to delete all leaves of $t \in \mathbb{T}_A$ labelled by the elements of B ⊂ A, we will denote this by ρ(t, B) and $\tilde{\rho}(t, B)$, respectively, obtaining the resulting tree by successively applying the above definition to the elements of B.
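The insertion and deletion operations of Definitions 2.7 and 2.8 can be sketched on a simple dictionary encoding of a tree. Everything about the encoding below is an assumption of this illustration: internal vertices are dictionary keys, leaves are integers that never appear as keys, an edge is identified by its lower endpoint, and the new branch point created by an edge insertion is given the hypothetical name `("bp", j)`.

```python
import copy


def insert_leaf(children, x, j):
    """Insert leaf j into insertable part x, which is ("edge", w) for the
    edge above vertex w, or ("bp", v) for branch point v."""
    t = copy.deepcopy(children)
    kind, target = x
    if kind == "bp":
        t[target].append(j)
    else:
        parent = next(p for p, kids in t.items() if target in kids)
        new_bp = ("bp", j)  # hypothetical name for the new branch point
        t[parent][t[parent].index(target)] = new_bp
        t[new_bp] = [target, j]
    return t


def delete_leaf(children, j):
    """Delete leaf j and contract its parent if the parent was binary."""
    t = copy.deepcopy(children)
    v = next(p for p, kids in t.items() if j in kids)
    t[v].remove(j)
    if v != "root" and len(t[v]) == 1:
        sibling = t.pop(v)[0]
        parent = next(p for p, kids in t.items() if v in kids)
        t[parent][t[parent].index(v)] = sibling
    return t


def relabel_increasing(children):
    """Relabel the leaves by 1, ..., (number of leaves), preserving order."""
    leaves = sorted(c for kids in children.values() for c in kids
                    if c not in children)
    new = {old: k + 1 for k, old in enumerate(leaves)}
    return {p: [new.get(c, c) for c in kids] for p, kids in children.items()}
```

Deleting a leaf and then calling `relabel_increasing` mimics deletion followed by relabelling with the increasing bijection, and deleting a leaf that was just inserted returns the original tree.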
Throughout this exposition we will need the following notion from [25]: Definition 2.9 (Tree growth process).A Markov chain, (T n ) n∈N , is a tree growth process if T n takes values in T [n] and ρ (T n+1 , n + 1) = T n for each n ∈ N.
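Equivalently, one could define a tree growth process in terms of a sequence of Markov kernels, $G = (G_n)_{n \in \mathbb{N}}$, where $G_n$ is a Markov kernel from $\mathbb{T}_{[n]}$ to $\mathbb{T}_{[n+1]}$. We will refer to G as a growth rule. The main tree growth process of interest to us is the following:
Definition 2.10 ((α, γ)-growth process). Fix 0 ≤ γ ≤ α ≤ 1. Let $T_1$ be the unique element of $\mathbb{T}_{[1]}$. For any n ∈ N construct $T_{n+1}$ conditional on $T_n = s$, by, for each x ∈ ins(s), setting

$$
w_x = \begin{cases}
1 - \alpha & \text{if } x \text{ is an external edge,} \\
\gamma & \text{if } x \text{ is an internal edge,} \\
(c_x - 1)\alpha - \gamma & \text{if } x \text{ is a branch point with } c_x \text{ children,}
\end{cases}
$$

and then defining the conditional distribution of $T_{n+1}$ by

$$
\mathbb{P}\left(T_{n+1} = \varphi(s, x, n + 1) \mid T_n = s\right) = \frac{w_x}{n - \alpha}.
$$

The sequence $(T_n)_{n \in \mathbb{N}}$ is referred to as the (α, γ)-growth process.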
The (α, γ)-growth process was first introduced in [7], and encompasses several other well-known growth processes. For γ = 1 − α and 1/2 ≤ α < 1 the above growth process is Marchal's stable tree growth [20]. For 0 ≤ γ = α ≤ 1 it is identical to Ford's α-model [8]. And, as mentioned previously, a special case of Ford's α-model is Rémy's uniform binary tree growth process [26], which arises from choosing γ = α = 1/2. One property of the latter growth process is that $T_n$ will be uniformly distributed on $\mathbb{T}^{\mathrm{bin}}_{[n]}$, the space of binary trees with n leaves [4], for each n ∈ N. We will refer to the growth rule governing the (α, γ)-growth process as the (α, γ)-growth rule.

2.5. Down-up chains.
Let us now introduce a general notion of down-up chain for trees that contains not only the Aldous down-up chain [4] but also the down-up chain proposed in [11].The definition of down-up chains given here is in accordance with down-up and up-down chains on more general state spaces studied in detail in [6,15,16,23].
, some growth rule, G, and a transformation function f : with T n (0) = t and with the following transition kernel.For each m ∈ N 0 obtain T n (m + 1) from T n (m) by (i) applying the transformation function, f , to T n (m) and a random variable (ii) deleting the leaf labelled Ĩ from T ′ n (m), and relabel { Ĩ + 1, . . ., n} by { Ĩ, . . ., n − 1} using the increasing bijection, i.e. defining (iii) and inserting leaf n to T ↓ n (m) according to G n−1 to obtain T n (m + 1).We will only work with the above type of down-up chains, but there are examples (see e.g.[11]) of down-up chains where there is no relabelling.In that case, ρ is replaced by ρ in (ii), and (iii) is replaced with a step where the leaf Ĩ is inserted directly into T ↓ n (m).This is particularly useful when the growth rule induces exchangeable leaf labels, which is the case for Rémy's uniform tree growth and Marchal's stable tree growth, but not for other (α, γ)-growth processes.
Let us focus on the role of the transformation map f .In the above we require I to be uniform on [n].As in step (i), say that s ′ , Ĩ := f (s, I).Even though I is uniformly distributed on the leaves of s, it will be dependent on f whether or not Ĩ is uniformly distributed on s ′ .This will allow us to make non-uniform deletions, which will be of vital importance.
In the basic cases, f will simply be the identity function, in which case we will call the chain an (id, G)-down-up chain and then specify G.For our purposes, it will be important to use more complicated function, in which case we will characterize the output of f explicitly in terms of operations on s and the value of I. Furthermore, if G is the (α, γ)-growth rule we will denote the associated chains by (f, (α, γ))down-up chain, and then specify f .We note that the (id, G)-down-up chain without relabelling, where G is the uniform growth rule, corresponds to Aldous' down-up chain [4].To prepare a generalization of down-up chains related to the (α, γ)-growth process we need the following definition: Definition 2.12 (Label swapping).Let A be a finite set.The operation of swapping labels i ∈ A and j ∈ A in s ∈ T A is the map where, if we let v i and v j denote the parents of i and j, respectively, t is obtained from s by replacing e ivi and e jvj with e ivj and e jvi , respectively, in edge(s).
Let us discuss two examples in the binary setting studied in [9] and [11].To this end, let s ∈ T bin [n] , and define a transformation function, f , by setting f (s, i) = (τ (s, i,ĩ) ,ĩ), where a : = min {leaves in 1st subtree on the ancestral line from i in s} , b : = min {leaves in 2nd subtree on the ancestral line from i in s} , and ĩ : = max {i, a, b}.Then the (f, G)-down-up chain where G is the uniform growth rule is exactly the uniform chain on T bin [n] without relabelling described in [9].If instead G is the α-growth rule, then the (f, G)-down-up chain with relabelling is the α-chain of [11].
Theorem 2.13 (Theorem 1 of [11]).Let (T n (m)) m∈N0 denote the uniform chain on T bin [n] and let T n denote the nth step of the uniform growth process.Then the unique invariant distribution of the uniform chain is that of T n , i.e. the uniform distribution on T bin [n] .

A semi-planar growth process
In order to introduce an appropriate down-step in the multifurcating case we start by introducing a refined version of the (α, γ)-growth process. In the following we will, for any c ∈ N, let S [c] denote the space of permutations of [c]. We will require the existence of an element of S ∅ , and we will use the convention that this element is denoted by 0. Additionally, we will use the notation that Recall one of the many ways of defining a planar tree: , where c v is the number of children of v. We call t * = (t, σ * ) a planar tree with tree shape t, and denote by T * A the space of planar trees with leaves labelled by A. Let us introduce some vocabulary for planar trees that will enable us to work efficiently with these objects. Let t * = (t, σ * ) ∈ T * A be given, fix v ∈ bp(t), and let t * v 1 , . . ., t * v cv denote the planar subtrees rooted at v, enumerated in the order of least elements of the associated sets of leaf labels. We say that t * v i is to the right From the above it is also clear what we mean by the terms rightmost, left and leftmost, which are defined in the obvious fashion.
Figure 1. Illustration of a semi-planar tree with the allocation of weights within the branch points. The horizontal orange lines are expansions of the branch points illustrating the c v − 1 insertable parts associated with branch point v. The only non-trivial element of σ is σ v characterized by σ v (1) = 2 and σ v (2) = 1, and the only other semi-planar tree with tree shape t would come from swapping the position of the subtrees labelled by {3} and {6, 8}, such that v would have the associated permutation
Any planar version of the (α, γ)-growth process will have to have a total weight of α − γ in a branch point with two children, and then iteratively adding a weight of α for every subsequent leaf being inserted into that branch point.Hence, in a branch point with c children there are only c − 1 natural weights to use.One way to incorporate this dynamic, is to designate the original two subtrees of the branch point a special role, and only allow insertions to the right of those subtrees.These two subtrees are, by definition of a growth process, easily identifiable as the two subtrees with the lowest leaf labels.If we insist that these two subtrees are located leftmost in the branch point, we can limit our attention to the following state space: Definition 3.2 (Semi-planar Tree).Let A be a finite set and fix t where c v is the number of children of v.We call t = (t, σ) a semi-planar tree, and denote by T A the space of semi-planar trees with leaves labelled by A.
For an illustration of an element of T [8] , see Figure 1.The space of semi-planar trees, T A , arises from T * A as follows: (i) consider only the elements ŝ * ∈ T * A such that ŝ * = (s, σ * ) where for all v ∈ bp(s) and l ∈ {1, 2} we have σ * v (l) ∈ {1, 2}, and (ii) identify For each element t = (t, σ) ∈ T A we will systematically pick a representative , where c v denotes the number of children of v.
It is clear that there is a surjective projection map from the space of semi-planar (or planar) trees to the space of trees, π : T A → T A for any A ⊆ N, defined by We will continue to use all the previous terminology such as edge set, set of branch points, and ancestral path for a semi-planar tree, t, by which we will mean the corresponding versions in π( t).Before introducing a semi-planar version of the (α, γ)-growth process, let us update the notion of inserting a leaf to the semi-planar setting, recalling that ϕ denotes the insertion map of Definition 2.7.
x ∈ ins(s) and l ∈ N 0 .The operation of inserting leaf j to x in location l in s is the map (s, x, l, j) φ → t ∈ T A∪{j} defined as follows.
• If x = e ∈ edge(s), only l = 0 is allowed and t = (ϕ (s, e, j) , σ ϕ ) where is the parent of leaf j in ϕ(s, e, j), and σ ϕ v = σ v for all other v ∈ bp(ϕ(s, e, j)); where and

Just as in the previous section, we will write φ(ŝ, x, j, l) in both of the above cases, and then specify x ∈ edge(s) or x ∈ bp(s) for the leaf insertion into the edge or vertex, respectively. Definition 3.3 simply states that whenever we insert a new leaf into a vertex, we can insert it to the right of the rightmost child or in between any two children of that vertex, with the exception of the two leftmost. We are now ready to define a semi-planar version of the (α, γ)-growth process:

Definition 3.4 (Semi-planar (α, γ)-growth process). Fix 0 ≤ γ ≤ α ≤ 1. Let T 1 be the unique element of T 1 . For any n ∈ N construct T n+1 conditional on T n = t, by setting and then defining the conditional distribution of T n+1 by for each x ∈ ins t and l ∈ [c x − 1] for x ∈ bp( t) and l = 0 for x ∈ edge( t). The sequence T n n∈N is referred to as the semi-planar (α, γ)-growth process.
The weights specified in (12) are such that the projection of the semi-planar (α, γ)-growth process from T [n] onto T [n] exactly yields the (non-planar) (α, γ)-growth process from Definition 2.10.
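The non-planar weights of (1) can be tabulated directly. The following is a minimal sketch, assuming a hypothetical nested-tuple encoding of trees (an integer is a leaf, a tuple is a branch point with its subtrees) and assuming that the edge below every branch point, including the root edge, counts as an internal edge. Under these assumptions the weights of an [n]-tree sum to n − α.

```python
from fractions import Fraction

def insertion_weights(tree, alpha, gamma):
    """List the (alpha, gamma)-insertion weights of all insertable parts.

    `tree` is an int (a leaf) or a tuple of subtrees (a branch point).
    This encoding is an assumption of the sketch, not the paper's notation.
    """
    if isinstance(tree, int):
        return [1 - alpha]                            # leaf (external) edge
    weights = [gamma]                                 # internal edge below this branch point
    weights.append((len(tree) - 1) * alpha - gamma)   # the branch point itself
    for child in tree:
        weights.extend(insertion_weights(child, alpha, gamma))
    return weights

alpha, gamma = Fraction(2, 3), Fraction(1, 4)
tree = ((1, 2), 3, (4, (5, 6)))                       # a hypothetical [6]-tree
total = sum(insertion_weights(tree, alpha, gamma))
print(total == 6 - alpha)
```

Exact rational arithmetic is used so that the identity can be checked without rounding error.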
In the semi-planar growth process, one can think of the additional structure of a branch point as a small comb tree rooted at the right end and with a leaf for every subtree, where naturally the two leftmost leaves connect to the same branch point, and where internal edges between branch points have weight α and the edge to the root has weight α − γ (see Figure 1). In particular, we do not allow any insertions between or to the left of the two leftmost subtrees. If T n = t ∈ T [n] , it thus holds for any v ∈ bp( t) that if i 1 , . . ., i c denotes the least label in each of the subtrees rooted at v, enumerated in the order of least element, then i 1 and i 2 are always located in the two leftmost subtrees.

This is not the first definition of a binary tree growth process within the (α, γ)-chain. In [7], the authors use a colouring scheme of the edges to make the translation between the multifurcating and the binary setting, whereas we work with a semi-planar structure for the multifurcating trees.
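The comb picture above can be sketched as a list of slot weights. This is a minimal illustration, assuming that slot l ∈ [c − 1] is the position immediately to the right of the (l + 1)th child counted from the left; the function name is hypothetical. The slot weights sum to the non-planar branch point weight (c − 1)α − γ.

```python
from fractions import Fraction

def slot_weights(c, alpha, gamma):
    """Weights of the c - 1 insertable slots of a branch point with c children.

    The c - 2 slots between neighbouring children (excluding the two leftmost)
    carry weight alpha; the slot right of the rightmost child carries
    weight alpha - gamma.
    """
    if c < 2:
        raise ValueError("a branch point has at least two children")
    return [alpha] * (c - 2) + [alpha - gamma]

alpha, gamma = Fraction(2, 3), Fraction(1, 4)
print(slot_weights(4, alpha, gamma))
```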
Let us now investigate the left-to-right ordering of the subtrees in a branch point further. For each v ∈ bp( t), let i 1 , . . ., i cv denote the smallest leaf label in each of the c v subtrees rooted at v, enumerated in the order of least element, and let t1 , . . ., tcv denote the corresponding subtrees of t. Then, by definition of semi-planar trees, t1 and t2 are located leftmost in v. The semi-planar (α, γ)-growth process assigns a random location to each of the remaining c v − 2 subtrees within the branch point, by inducing a permutation of [c v − 2], σ v , associated with v. Letting Σ v denote the random permutation induced by the growth process associated with v, and disregarding the two leftmost subtrees, we have that tl is the

We can characterize Σ v in the following way:

Lemma 3.5. Let T n be the nth step of the semi-planar (α, γ)-growth process, and where p cv−2 α,α−γ is the probability function of the random permutation of [c v −2] induced by the random order of the tables in an (α, α−γ)-oCRP with c v −2 tables, described by (8).
Proof.The assertion that (Σ v ) v∈bp t is conditionally independent given π T n = t is a consequence of the independence of the leaf insertions.Now fix v ∈ bp(t), and observe how the subtrees rooted at v are analogous to the tables in an ordered Chinese Restaurant.Firstly, note that there is nothing to prove for c v = 2, so fix c v = 3.By construction the two subtrees with the smallest labels will be located leftmost in v and there will be a single subtree to the right of them.That subtree will have weight α − γ to the right and α to the left.This corresponds exactly to the (α, α − γ)-oCRP with 1 table.For any other c v > 3, let us disregard the two leftmost subtrees, only focus on the remaining c v − 2, and consider how these have been placed by the growth process.Each insertion corresponds to a new customer in the ordered Chinese Restaurant analogy, and each subtree corresponds to a table.Amongst these subtrees, there will be weight α to the left of the leftmost subtree as well as in between any neighbouring pair.Furthermore there will be weight α − γ to the right of the rightmost subtree.This is exactly the same weight specification as in an (α, α − γ)-oCRP with c v − 2 tables, which finishes the proof.
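The weight specification used in the proof can be turned into a small computation. The sketch below reads the table-ordering rule off the proof: when j tables are present, a new table opens to the left of any existing table with weight α each, or to the right of the rightmost table with weight θ = α − γ. Whether this agrees term by term with the formula in (8) is an assumption, since (8) is not reproduced here; the function name is hypothetical and α > 0 is assumed.

```python
from fractions import Fraction
from itertools import permutations

def order_probability(order, alpha, theta):
    """Probability of a left-to-right order of tables 1..m.

    Tables are opened in the order 1, 2, ..., m. `order` lists the table
    numbers from left to right once all m tables are present.
    """
    prob = Fraction(1)
    for j in range(1, len(order)):                 # table j + 1 joins tables 1..j
        present = [t for t in order if t <= j + 1]
        # Rightmost on arrival means the slot of weight theta was chosen.
        weight = theta if present[-1] == j + 1 else alpha
        prob *= weight / (j * alpha + theta)
    return prob

alpha, gamma = Fraction(1, 2), Fraction(1, 4)
theta = alpha - gamma
total = sum(order_probability(p, alpha, theta) for p in permutations((1, 2, 3)))
print(total)
```

Summing over all orders of three tables is a quick sanity check that the rule defines a probability distribution on permutations.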

Lemma 3.5 naturally characterizes a conditional distribution on T [n] . Fix some t ∈ T [n] and, for each v ∈ bp(t), let c v denote the number of children of v. Then

A semi-planar down-up chain
The motivation for introducing the semi-planar version of the (α, γ)-growth model was fundamentally to get an ordering of the subtrees within a multifurcating branch point.This was done to enable a generalization of the α-chain described in the introduction, by generalizing the local search from a designated leaf to a multifurcating setting where there might be more than one subtree to search in the first branch point on the ancestral line.
To avoid any ambiguity, we will update the notions of swapping and deleting leaves, respectively, before defining a semi-planar down-up chain.The sole reason for defining the label swapping map τ on the space of planar trees, T * A , rather than directly on T A , is that certain label swaps will bring us outside of the latter state space.However, the definition can easily be applied to elements of T A by using the identification described just after Definition 3.2.See Figure 2 where b is the smallest leaf label in the second spinal bush on the ancestral line from i. and , let w denote the grandparent of i, and define σ * w analogously to the above, with the slight alteration that l i and l j denotes the ranks of the subtrees containing i and j, respectively, in w, in the order of least elements.
The above definition seems technical, but it simply states that all subtrees in v i and v j keep the positions, with the subtree containing j in v j taking the position of the subtree containing i in v i and vice versa.The label swapping will be used prior to deleting a leaf in the semi-planar down-up chain defined in this section.Swaps like this will occur in the down-up chain, but the subsequent deletion of a leaf will make sure we end up with a semi-planar tree again. where , where c v is the number of children of v in t, and i j is the rank of j amongst the smallest leaf labels of the subtrees rooted at v in t.As in Definition 2.8, we denote by ρ * the map that, after applying ρ * , subsequently relabels A \ {j} by [#A − 1] using the increasing bijection.
Figure 2 (in-figure annotation: "Forget the internal order between the two leftmost subtrees."). The orange "edges" are an illustrative expansion of the branch point v so that the left-to-right ordering of the subtrees of v becomes apparent. Thus, all semi-planar and planar trees depicted above have the same tree shape, but have different permutations associated to v. Label swapping in t (bottom left) occurs by constructing the planar version, t * (top left), and then swapping the labels in the planar version. Swapping labels 1 and 2 yields a planar tree corresponding to a semi-planar tree since the two subtrees with the lowest labels are located leftmost. This is not the case if we swap labels 2 and 3, and thus the bottom right is not a semi-planar tree, i.e. not an element of T [3] .
Of course, (17) is not needed in the case where c v = 2 since the branch point will completely disappear upon the deletion of leaf j, and the corresponding permutation will thus not be included in σ * ρ .The above definition states, that if t * l and t * r are the neighbouring subtrees to the left and right, respectively, of leaf j in t * before deletion, then t * l is the left neighbour of t * r after the deletion.
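The three operations discussed so far (label swapping, leaf deletion with contraction of a binary parent, and relabelling by the increasing bijection) can be sketched on planar trees. This is a minimal illustration, assuming a hypothetical encoding in which a planar tree is an integer (a leaf) or a tuple of subtrees listed from left to right; the choice of the leaf ĩ is taken as an input and is not computed here.

```python
def swap_labels(tree, i, j):
    """Swap the leaf labels i and j; all subtrees keep their positions."""
    if isinstance(tree, int):
        return j if tree == i else i if tree == j else tree
    return tuple(swap_labels(child, i, j) for child in tree)

def delete_leaf(tree, j):
    """Remove leaf j, keep the left-to-right order of the remaining subtrees,
    and contract a branch point that is left with a single child."""
    if isinstance(tree, int):
        return tree
    kids = tuple(delete_leaf(child, j) for child in tree if child != j)
    return kids[0] if len(kids) == 1 else kids

def relabel(tree, j):
    """Apply the increasing bijection after deleting label j."""
    if isinstance(tree, int):
        return tree - 1 if tree > j else tree
    return tuple(relabel(child, j) for child in tree)

def down_step(tree, i, i_tilde):
    """Swap i and i_tilde, delete i_tilde, then relabel."""
    swapped = swap_labels(tree, i, i_tilde)
    return relabel(delete_leaf(swapped, i_tilde), i_tilde)

tree = ((1, 2), 3, (4, 5))        # hypothetical tree; {1, 2} and {3} are leftmost
print(down_step(tree, 3, 4))
```

In the example the intermediate tree ((1, 2), 4, (3, 5)) does not have its two smallest least labels leftmost, but after deleting leaf 4 the result does.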
is defined by setting f ( t, i) = τ * t, i,ĩ ,ĩ , with ĩ = max {i, a, b}, where we by l i denote the rank of i amongst the least leaf labels of the c v subtrees of the parent v of i in t, set a = min leaf labels in the first spinal bush on the ancestral line from leaf i , with the convention that b = 0 if c v = 2 and the ancestral line from leaf i has only one spinal bush.
Firstly, recall that we can apply τ , and hence f , to elements of T [n] (instead of T * [n] ) by using the canonical way of translating a semi-planar tree to a planar tree described just after Definition 3.2.However, it is not clear a priori that carrying out the down-step of the above chain started from a semi-planar tree again yields one such.This is needed for the above to be well-defined as a Markov chain on T [n] .The following result shows that this is indeed the case: Lemma 4.4.Let A be a finite set and fix t ∈ T A .For any i ∈ A it holds that ρ * τ * ( t, i,ĩ),ĩ ∈ T A\{ĩ} , where ĩ is defined as in Definition 4.3.
Proof.We have to check that for every branch point in ρ τ * ( t, i,ĩ),ĩ , the two subtrees with the smallest leaf labels, respectively, are located leftmost.From the definition of the semi-planar (α, γ)-chain it is clear that the only two branch points in t where any changes can take place are the parent v and potentially the grandparent v ′ of leaf i in t, since all other branch points will remain unchanged as i ≤ ĩ and ĩ is the smallest leaf in its subtree.We now split up into cases based on the location of leaf i in v ∈ bp( t * ), where t * is the canonical representative of t in A described just after Definition 3.2.
If i is not one of the two leftmost subtrees in v, there is no option of swapping with the smallest leaf label in either of the two leftmost subtrees, as they are both smaller than i by definition of a semi-planar tree.Now consider the case where i is placed leftmost in v, with v having more than two children.In t * , i will then be placed leftmost or second leftmost in v.In either case ĩ = b will be the smallest label in the third leftmost subtree of v in t * .τ * ( t, i,ĩ) ∈ T * A will not correspond to an element of T A , but consider the ordering of the subtrees of v in this planar tree.i will have retained its rank (1 or 2) amongst the least labels of the subtrees of v, and will now be located in the third leftmost subtree.Upon the deletion of ĩ in τ * ( t, i,ĩ) the two leftmost subtrees will again be the ones containing the two smallest of the least leaf labels of the subtrees of v.
The last case is when i is placed leftmost in v, with v only having two children.Here, b will always be found in v ′ , the parent of v, and no matter which swap occurs, the branch point v will disappear upon deletion of ĩ.Hence we only need to check that the new ordering of subtrees in v ′ is acceptable.If ĩ ∈ {a, i} there is nothing to prove, as this means that the two subtrees with the lowest labels in v ′ will not be affected by the swapping of i and ĩ and subsequent deletion of ĩ.If ĩ = b, this means that b was the smallest leaf label (not in a subtree of v) in v ′ and that a was even smaller.Hence, by definition of the (α, γ)-growth rule, the smallest labels in the two leftmost subtrees of v ′ after swapping and deleting, respectively, will be a and i.
We will approach the proof of Theorem 4.6 via the following technical result. The statement of the lemma as well as the proof technique is an adaptation of Lemma 4 and the proof of Theorem 1 in [11], designed to work in the case where α = γ = 1/2. It is possible to prove that the stationary distribution of a down-up chain with the transition kernel described by Proposition 5.2 is the distribution of the nth step of the (α, γ)-growth rule without any reference to a semi-planar structure, thus making this lemma obsolete, in the case where γ = 1 − α for α ∈ [0, 1]. However, this falls outside the scope of this exposition.
Figure 3. Illustration of three of the characterizations of the event E i,ĩ in the case where i < ĩ and i is located in a multifurcating branch point, from the proof of Lemma 4.5. The horizontal lines in each illustration correspond to the expansion of branch points described just after Definition 3.4. Each row is an insertion scenario, where the left column is an illustration of T ĩ −1 where the insertion location of ĩ is colored green, while the right column is an illustration of T ĩ where the edges colored in red are the ones where insertions of ĩ +1, . . ., n are disallowed.
where a and b are as in Definition 4.3.Then E i,ĩ ⊥ ⊥ T ĩ −1 and, conditional on the event E i,ĩ , ρ * τ * T n , i,ĩ ,ĩ and T n−1 have the same distribution.
Proof.We will prove the first assertion by splitting into two cases.The overall proof technique will be to translate E i,ĩ to restrictions regarding the growth process and to use this characterization to obtain the result.
Case A: i < ĩ.In this case, E i,ĩ is equivalent to one set of the following disjoint events, some of them illustrated in Figure 3, where we will denote the parent branch point of i in T ĩ −1 by v and the number of children of v in T ĩ −1 by c v .
• if c v = 2 then either
  (i) ĩ is inserted into v (as the right neighbour of i), and (ii) ĩ +1, . . ., n are not inserted on the leaf edge i or as the right neighbour of i in v in T ĩ, . . ., T n−1 , respectively, or
  (i) ĩ is inserted on the parent edge of i, e v↓ , and (ii) ĩ +1, . . ., n are not inserted on the leaf edge i, into the branch point v, nor on the edge e v↓ in T ĩ, . . ., T n−1 , respectively;
• if c v > 2 and i is leftmost in v, then (i) ĩ is inserted as the right neighbour of i in v, and (ii) ĩ +1, . . ., n are not inserted on leaf edge i or between i and the subtree containing ĩ in the branch point v in T ĩ, . . ., T n−1 , respectively. This situation is depicted in the top row of Figure 3.
• if c v > 2 and i is not leftmost in v, then (i) ĩ is inserted as the left neighbour of i in v, and (ii) ĩ +1, . . ., n are not inserted on leaf edge i or between i and the subtree containing ĩ in the branch point v in T ĩ, . . ., T n−1 , respectively. This situation is depicted in the middle row of Figure 3.

In addition to the above, there is, regardless of which of the above scenarios we are in, the option to (i) insert ĩ on the leaf edge i in T ĩ −1 , and (ii) not insert ĩ +1, . . ., n on the first two edges nor in the first branch point on the ancestral line from leaf i in T ĩ, . . ., T n−1 , respectively, which is illustrated in the bottom row of Figure 3.
Case B: i = ĩ. By the same reasoning as earlier, E i,ĩ is equivalent to
• not inserting ĩ +1, . . ., n on the first two edges, nor in the parent branch point of i, on the ancestral line from i in T ĩ, . . ., T n−1 , if i = ĩ was inserted into an edge of T ĩ −1 , or
• not inserting ĩ +1, . . ., n on the leaf edge i = ĩ or as the left neighbour of i = ĩ in the first branch point on the ancestral line from i in T ĩ, . . ., T n−1 , respectively, if i = ĩ is inserted into a branch point of T ĩ −1 .

We have now characterized the event E i,ĩ in terms of making or not making insertions, respectively, to specific edges with weights summing to 1 in all cases. This implies that the probability of making or not making these insertions, respectively, is the same in all cases, and resolving the telescoping products, we find

For the second assertion we start by noting that due to the independence proven above. Once again using the characterization of E i,ĩ , we note that we can construct ρ * τ * T n , i,ĩ ,ĩ from T ĩ −1 under the conditional law of E i,ĩ : The leaf ĩ is never attached and so the "forbidden edges", characterized above and coloured red in Figure 3, do not exist. But every leaf that was previously labelled ĩ +1, . . ., n will still be attached according to the (α, γ)-growth rule since the total weight of the forbidden edges is exactly 1 in all cases. This finishes the proof.

Proof. Fix n ∈ N. That the semi-planar (α, γ)-chain on T [n] has a unique stationary distribution is easily seen, since the state space is finite and there is a unique, recurrent communicating class for
• α = 1 and all γ ∈ (0, 1) (only insertions in branch points or internal edges),
• γ = 0 and all α ∈ (0, 1) (only insertions in branch points or leaf edges),
• γ = α and all α ∈ [0, 1] (binary trees),
• α = 1 and γ = 0 (only insertions in branch points),
whilst the Markov chain is irreducible otherwise. Thus we are done if we can show that T n (1) D = T n (0) := T n , which we will do by utilizing Lemma 4.5. Let I be a uniform random variable on [n], independent of everything else, let f denote the transformation function from Definition 4.3 and note that for all t ∈ T n−1 where we have used the independence of I, the definition of f as well as Lemma 4.5. This shows that ρ * f T n , I D = T n−1 , and since T n (1) is constructed from the former using the same growth rule used to construct T n from the latter, this finishes our proof.
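The telescoping products appearing in the proof of Lemma 4.5 can be checked numerically. The sketch below assumes that a tree with m leaves has total insertion weight m − α, so that each of the leaves ĩ + 1, . . ., n avoids a forbidden part of total weight 1 with probability (m − α − 1)/(m − α). Under this assumption the product collapses to (ĩ − 1 − α)/(n − 1 − α). The function name is hypothetical, and α < 1 is assumed to avoid a zero denominator.

```python
from fractions import Fraction

def avoid_probability(i_tilde, n, alpha):
    """Probability that leaves i_tilde + 1, ..., n all avoid a part of weight 1.

    Leaf m + 1 is inserted into a tree with m leaves, whose total insertion
    weight is assumed to be m - alpha.
    """
    prob = Fraction(1)
    for m in range(i_tilde, n):                    # insertion into T_m gives T_{m+1}
        prob *= (m - alpha - 1) / (m - alpha)
    return prob

alpha = Fraction(1, 2)
print(avoid_probability(3, 8, alpha) == (3 - 1 - alpha) / (8 - 1 - alpha))
```

The closed form is stated here only as the value of this product under the stated assumption; it is not claimed to be the display omitted from the proof.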

Projecting to a non-planar down-up chain
In the previous section we constructed a semi-planar down-up chain on T [n] with the distribution of T n as its stationary distribution, where T n denotes the nth step of the semi-planar (α, γ)-growth process of Definition 3.4.Recalling that π : T [n] → T [n] denotes the projection map from the space of semi-planar trees to that of non-planar trees, we have already noted that π( T n ) D = T n , where the latter denotes the nth step of the (α, γ)-growth process of Definition 2.10.Now let us construct a non-planar down-up chain on T [n] from the semi-planar version.
Fix n ∈ N. Let Kn denote the transition kernel of the semi-planar (α, γ)-chain on T [n] and define a π-induced Markov kernel from T for each t ∈ T [n] and t ∈ T [n] , where c v denotes the number of children of a branch point v ∈ bp(t) and σ v is the permutation of [c v − 2] associated with the same branch point in t.The latter expression stems from Lemma 3.5 and ( 15).Let us abuse notation, so that Πn also denotes the induced Markovian matrix Πn := Πn t, t , whilst π := 1 π( t)=t t∈ T [n] ,t∈T denotes the matrix linking a semi-planar tree to its non-planar counterpart.We now give an alternative definition of the (α, γ)-chain on T [n] which we identify with Definition 1.1 in Proposition 5.2: with transition kernel The transition kernel K n is defined so that the diagram in Figure 4 commutes, and a more detailed explanation of the transitions of the (α, γ)-chain, (T n (m)) m∈N0 , is consequently the following: according to Πn (t, •) defined in (19).(ii) Make one transition of the semi-planar (α, γ)-chain started from t, to obtain t′ .(iii) Project t′ to the space of non-planar trees, i.e. set T n (m + 1) = π( t′ ).
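The three steps above (lift, transition, project) amount to a product of three stochastic matrices. The following toy sketch composes such matrices; the state spaces and all numerical entries are hypothetical and chosen only for illustration, with two non-planar states of which the second has two semi-planar lifts.

```python
from fractions import Fraction as F

def compose(*kernels):
    """Multiply stochastic matrices (given as lists of rows) from left to right."""
    result = kernels[0]
    for kernel in kernels[1:]:
        result = [
            [sum(row[k] * kernel[k][j] for k in range(len(kernel)))
             for j in range(len(kernel[0]))]
            for row in result
        ]
    return result

lift = [[1, 0, 0],                          # state 0 has a unique lift
        [0, F(1, 3), F(2, 3)]]              # state 1 lifts to one of two states
semi = [[F(1, 2), F(1, 4), F(1, 4)],        # toy semi-planar transition kernel
        [F(1, 3), F(1, 3), F(1, 3)],
        [0, F(1, 2), F(1, 2)]]
proj = [[1, 0], [0, 1], [0, 1]]             # indicator matrix of the projection

kernel = compose(lift, semi, proj)
print(kernel)
```

Each row of the composed kernel sums to 1, as it must for a Markov kernel on the smaller state space.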
The following result ensures that the two ways of defining the (α, γ)-chain (Definitions 1.1 and 5.1) are compatible.

Proof. Before we consider the down-step itself, note that it does not make any difference if we perform the up-step for the semi-planar (α, γ)-chain and then project to the non-planar trees, or if we project to the non-planar trees immediately after the down-step, and then perform the up-step by using the non-planar (α, γ)-growth process, see the brief discussion following Definition 3.4. This ensures that, in the following, we only have to argue that the down-step outlined in Steps 1 and 2 is the correct characterization. So fix t ∈ T [n] and sample T from Πn (t,

If i is in a binary branch point in t, the same is the case in T . Hence the characterization of Ĩ is trivial, as it is exactly the same here as in Definition 4.3.
If i is in a multifurcating branch point with c v > 2 children in t, we need to split up into cases based on the location of i within the branch point in T .So let v denote the parent branch point of i, let i 1 , . . ., i cv denote the smallest leaf labels in the c v subtrees of v, enumerated in increasing order, and say that i = i j for some j ∈ [c v ].This is all deterministic based on t and i.Furthermore, recall the definition of a and b from Definition 4.3.
If i is placed leftmost in v in T , i.e. if j ∈ [2], then the b of Definition 4.3 is the smallest leaf to the right of the two leftmost subtrees, which will always be the largest of i, a and b, implying that ĩ = max{i, a, b} = b. Hence, conditional on T n (m) = t and I = i, the probability of Ĩ = i j ′ for some 2 < j ′ ≤ c v is the probability that T has the subtree containing i j ′ placed to the immediate right of i in v. Considering how these subtrees were formed by the growth rule in the ordered Chinese Restaurant analogy outlined in Lemma 3.5, this is the probability of having table j ′ − 2, enumerated in the order of least element, placed leftmost in an (α, α − γ)-oCRP with c v − 2 tables.
If i is not placed leftmost in v in T , i.e. if 2 < j ≤ c v , b is the smallest leaf in the tree to the left of i in v. In this case we always find that ĩ = max{i, b} as a by definition is equal to i 1 . Consequently, conditional on T n (m) = t and I = i, the probability of Ĩ = i = i j stems from the probability that T has one of the subtrees containing i 1 , . . ., i j−1 , respectively, placed to the immediate left of i in v. Hence in the ordered Chinese Restaurant analogy, this corresponds to not opening a new table to the immediate left of table j − 2, enumerated in the order of least element, in an (α, α − γ)-oCRP with c v − 2 tables. Similarly the probability of getting Ĩ = i j ′ for any j < j ′ ≤ c v corresponds to having table j ′ − 2 to the immediate left of table j − 2 in the same ordered Chinese Restaurant. This defines the conditional distribution of Ĩ outlined above.
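The probability that a given table ends up leftmost can be computed without enumerating permutations. The sketch below assumes the table-opening rule described in the proof of Lemma 3.5 (weight α for the slot left of each existing table, weight θ = α − γ for the slot right of the rightmost table) and α > 0. Table j is leftmost exactly when it is opened in the leftmost slot and no later table is opened to its left. The function name is hypothetical.

```python
from fractions import Fraction

def leftmost_table_distribution(m, alpha, theta):
    """Return [P(table j is leftmost) for j = 1..m], tables numbered by opening order."""
    dist = []
    for j in range(1, m + 1):
        # Table j must open in the leftmost slot (automatic for j = 1) ...
        prob = Fraction(1) if j == 1 else alpha / ((j - 1) * alpha + theta)
        # ... and every later table k must avoid the leftmost slot.
        for k in range(j + 1, m + 1):
            prob *= 1 - alpha / ((k - 1) * alpha + theta)
        dist.append(prob)
    return dist

alpha, gamma = Fraction(1, 2), Fraction(1, 4)
print(leftmost_table_distribution(3, alpha, alpha - gamma))
```

In the setting of the proof, m = c v − 2 and table j ′ − 2 corresponds to the subtree with least label i j ′ .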
The following result now follows immediately from Theorem 4.6.
Proposition 5.3.Fix n ∈ N and let T n denote the nth step of the (α, γ)-growth process.The (α, γ)-chain on T [n] started from T n is stationary.
Proof. First note that the distribution of T n is pushed forward to the distribution of T n by Πn , where Πn is defined by (19) and T n denotes the nth step of the semi-planar (α, γ)-growth process. By Theorem 4.6 the semi-planar (α, γ)-chain started from T n is stationary, so if T n (m) m∈N0 denotes the semi-planar (α, γ)-chain started from T n , then T n (1) Lastly, by construction of the semi-planar (α, γ)-growth process, we have that π( T n ) D = T n , and so

Proposition 5.3 is a fairly easy result to obtain due to the construction of the (α, γ)-chain on T [n] and the results regarding the semi-planar (α, γ)-chain at our current disposal. However, the link between the two down-up chains is somewhat deeper than what this result alludes to.

, then (T n (m)) m∈N0 is the (α, γ)-chain started from t with stationary distribution being that of T n , the nth step of the (α, γ)-growth process.
Theorem 5.4 is a substantially deeper result than both Propositions 5.2 and 5.3, but we will defer the proof as we have yet to develop the methodology to prove it, and as it is a special case of Proposition 6.7 and Theorem 8.4.

A Markov Chain on decorated trees
So far we have introduced a down-up Markov chain on semi-planar trees, and shown that there is a corresponding Markov chain on the space of (non-planar) trees. We now continue our endeavour to define down-up chains on various spaces of trees that arise as projections of [n]-trees. To make this precise, we introduce the notion of decorated trees, as well as an intermediary known as collapsed trees, which arise from affiliating every element of ins(t), the collection of edges and branch points of the tree t ∈ T [n] , with an integer or a set of integers, respectively.

Definition 6.1 (Decorated Trees). Let A be a finite set, and fix some #A ≤ n ∈ N. Let s ∈ T A be given, and let y = (y x ) x∈ins(s) be a collection of non-negative integers such that ∑ x∈ins(s) y x = n where y x ≥ 1 for every external x ∈ edge(s). We call t • = (s, y) a decorated A-tree of mass n, and we denote the space of such trees by T •n A , and will refer to s as the tree shape of t • .
For any finite sets A ⊆ B with #B =: n we can easily obtain a decorated A-tree of mass n from any t ∈ T B by projection onto the subtree spanned by the leaves with labels in A.
Let us explain this more formally. In the remainder of this exposition we will use the notation, s ⊆ t ∈ T B , to mean that there exists a set A ⊆ B such that ρ(t, B \ A) = s. Consequently, we will insist that vert(s) ⊆ vert(t) in the usual sense, and hence it is meaningful to talk about a branch point of s being on the ancestral line from a leaf in t. Note however that, unless we are in the degenerate situation where A = B or where all leaves labelled by B \ A have a parent branch point of s in t, we will have edge(s) ⊄ edge(t).
So fix t ∈ T B , and define s := ρ (t, B \ A), where we recall that the latter means that we delete all leaves with labels of B not in A from t, see Definition 2.8.Recalling that u ≺ t v denotes that u is the parent of v in t, we now define the decoration function g : B → s ⊆ t by setting in a subtree of v in t which only contains leaves labelled by elements of B \ A, and and v 1 , . . ., v m−1 ∈ bp(t) \ bp(s) such that for all j ∈ [m] it holds that v j−1 ≺ t v j , v 0 ≺ s v m , and that there is a j ∈ [m − 1] such that i is in a subtree of v j in t which only contains leaves labelled by elements of B \ A.
Then define for each x ∈ ins(s), where δ xy is the Kronecker delta, and define y = (y x ) x∈ins(s) .Finally, define t • = (s, y).For each k ≤ n ∈ N, we will denote the map . This construction yields another family of trees as well: Definition 6.2 (Collapsed Trees).Let A be a finite set, and fix some #A ≤ n ∈ N.With g being the decoration function, we denote the map that sends t ∈ T [n] to its associated collapsed A-tree of mass n by π * n A : where s = ρ (t, [n] \ A).We denote the space of such trees by The aim of this section is similar to that of the previous one, where we studied a Markov chain on T [n] by using a kernel to lift an element t ∈ T Paraphrasing, we sampled the order of the subtrees within each branch point of t, then carried out both the down-and up-step, respectively, of the semi-planar (α, γ)-chain on T [n] , and finally projected back to T [n] .
Constructing a Markov chain on decorated trees corresponding to the semi-planar (α, γ)-chain on T [n] , is done by the following procedure.Fix t . Similar to the construction of Πn in ( 19), we will construct the kernel Π•n k , by setting where T n is the nth step of the semi-planar (α, γ)-growth process, and π•n • π denotes the projection from semi-planar n-trees to decorated [k]-trees of mass n, for brevity.Define a transition kernel on T where Kn is the transition kernel for the semi-planar (α, γ)-chain and we have used the same notation as in (20) for the composition of kernels.
k denote the transition kernel defined by (25).The decorated (α, γ)-chain on T The goal of the remainder of this exposition is two-fold.On one hand we wish to show that if we project semi-planar (α, γ)-chains of Definition 4.3 to the space of decorated [k]-trees of mass n, we obtain the decorated (α, γ)-chain of Definition 6.3.On the other hand we wish to describe the transition kernel of the decorated (α, γ)chain without referencing the semi-planar transition kernel.It turns out that the latter is much easier, and so we will take on this task first.
So consider the behaviour of the semi-planar (α, γ)-chain projected to decorated trees, i.e. for k < n consider observing π , and recall that the semi-planar (α, γ)-chain is characterized by composing two kernels in a down-and an up-step.The focus in this section is to specify similar steps on the space of decorated trees, so that doing a down-or up-step, respectively, in the semi-planar (α, γ)-chain and then projecting to a decorated tree, is the same as projecting to a decorated tree and then performing a down-or up-step, respectively.It turns out that the down-step is significantly more complicated than the up-step, and so we start with doing the latter.Definition 6.5 ((α, γ)-growth process for decorated trees and let We immediately note that the above is simply a Pólya urn scheme with initial weights labelled by the elements of ins(s).Specifically, the initial weights will be 1 − α for an external edge, γ for an internal edge, and (c − 1)α − γ for a branch point with c children in s.This ensures that if (T n ) n∈N is the (α, γ)-growth process then for all n ≥ k.Additionally it holds that and let T ↓ n (m) denote the semi-planar chain after the down-step.Then (26) ensures that we end up with the same distribution on the space of decorated [k]-trees of mass n by either (i) projecting T ↓ n (m) to the space of decorated [k]-trees of mass n− 1, and then performing the up-step from the (α, γ)-growth process for decorated trees, or (ii) performing the up-step from the semi-planar (α, γ)-growth process, and then projecting the resulting tree to the space of decorated [k]-trees of mass n.
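The Pólya urn description of the up-step can be sketched as follows. This is a minimal illustration under two assumptions that are not taken from the display in Definition 6.5: each leaf beyond those of s adds one unit of weight to the part it decorates, and the mass of an external edge counts the leaf of s itself (so that y x ≥ 1 as in Definition 6.1). With these conventions the urn weights of a decorated tree of mass n sum to n − α. All names are hypothetical.

```python
from fractions import Fraction

def urn_weight(kind, mass, alpha, gamma, children=None):
    """Current urn weight of one insertable part of the tree shape s."""
    if kind == "external":
        return (1 - alpha) + (mass - 1)     # the leaf of s itself is not extra mass
    if kind == "internal":
        return gamma + mass
    if kind == "branch":
        return (children - 1) * alpha - gamma + mass
    raise ValueError(kind)

def up_step_probabilities(parts, alpha, gamma):
    """Probability that the next leaf adds mass to each part."""
    weights = [urn_weight(kind, mass, alpha, gamma, c) for kind, mass, c in parts]
    total = sum(weights)
    return [w / total for w in weights]

alpha, gamma = Fraction(1, 2), Fraction(1, 4)
# Hypothetical decorated [2]-tree of mass 7: two leaf edges, the root edge,
# and one binary branch point, with masses 3, 1, 2 and 1.
parts = [("external", 3, None), ("external", 1, None),
         ("internal", 2, None), ("branch", 1, 2)]
print(up_step_probabilities(parts, alpha, gamma))
```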
What is more technical is how to construct a down-step on the space of decorated trees that is consistent with the down-step defined in the semi-planar (α, γ)-chain.
For intuition, let us consider a semi-planar (α, γ)-chain, T n (m) m∈N0 , and assume that, before carrying out the down-step, T n (m) = t and π•n . Now observe how there are three separate scenarios for the deletion of a leaf in the down-step:
• We select a leaf with label i > k in t that contributes to exactly one of the decorated masses y x for some x ∈ ins(s), in which case we reduce that mass by one, but the tree shape s and all other decorating masses stay unchanged.
• We initially select a leaf with label i ∈ [k] in t, search for a and b as defined in Definition 4.3, and define ĩ = max{i, a, b}:
  - If ĩ ≤ k we will swap i and ĩ, delete ĩ, and relabel (ĩ +1, . . ., k + 1) by (ĩ, . . ., k) using the increasing bijection. Depending on which y x the leaf labelled by k + 1 contributed to in (s, y), this can result in the tree shape changing. The decorating masses need to change accordingly.
  - If ĩ > k we will retain the same tree shape, irrespective of the location of i in t prior to the swap. Indeed, if i was located in a multifurcating branch point of t, our search rule guarantees that ĩ is found in a subtree of the same branch point. If i was located in a binary branch point of t, ĩ will either be the smallest label in the other subtree of that branch point, or be the smallest leaf in the second spinal bush on the ancestral line from i, but due to the deletion of the binary branch point after the swap, this will not change the tree shape of the projected tree. See Figure 5 for an illustration.
From the above description it is clear that the most complicated scenario of the down-step is when ĩ ≤ k, whereas the other scenarios are easier to deal with. For an example of one of the down-steps, see Figure 5.
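The case distinction above depends only on i, ĩ and k. As a minimal sketch, the hypothetical helper below names the three scenarios; it does not compute ĩ, which requires the search for a and b from Definition 4.3, and it does not update the tree or the masses.

```python
def down_step_scenario(i, i_tilde, k):
    """Classify the down-step as seen from the decorated [k]-tree."""
    if i > k:
        return "one decorating mass decreases, shape unchanged"
    if i_tilde <= k:
        return "swap, delete and relabel within [k + 1]; shape may change"
    return "shape retained"

# Values from the example of Figure 5: n = 8, k = 3, i = 3, i_tilde = 6.
print(down_step_scenario(3, 6, 3))
```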
Figure 5. An illustration of the down-step in the semi-planar (α, γ)-chain on T [8] . Starting from t ∈ T [8] (top left), leaf i = 3 is selected for deletion, whereby a = 1 and b = 6 so that ĩ = max{i, a, b} = 6. Hence the leaves labelled by 3 and 6, respectively, are swapped, and subsequently the leaf labelled by 6 is deleted, and (7, 8) is relabelled by (6, 7), yielding the top right semi-planar tree. In the bottom row the projections of the two trees onto T •8 [3] and T •7 [3] , respectively, are depicted with the decorated masses marked in orange.

In the following definition, we will for fixed n ∈ N and t ∈ T [k] let w x denote the insertion weight associated with x ∈ ins(s) from Definition 2.10. To ease notation, we will for (s, y) ∈ T for all x ∈ ins(s).
from t • in the following way:
1. Select x ∈ ins(s) with probability ỹ x /(n − k), and define s re = ϕ(s, x, k).
2. Define t • re = (s re , y re ), where y re = (y re x ′ ) x ′ ∈ins(s re ) is constructed by setting y re x ′ = y x ′ for all x ′ ∈ ins(s) \ {x} and splitting up y x in the following way:

We will use the above resampling mechanism to give an autonomous description of the transition kernel of the decorated (α, γ)-chain from Definition 6.3. The arguments used in the proof of the following proposition are similar to the ones used to argue the spinal decomposition for the (α, γ)-growth model in [7]. In the following, recall that q α,θ denotes the decrement matrix of the (α, θ)-oCRP defined by (10).
, can be characterized as follows: -Given N = n g > 0, sample Y new from q α,α (n g , •), and replace y x with Y new and y v with y v − Y new .
-If N = 0, perform the down-step of the (α, γ)-chain on if y v = 0 and y e v↓ > 0, define s ′ = ϕ (ρ(s, i), e v↓ , i), and let if y v = y e v↓ = 0, perform the down-step of the (α, γ)-chain on , and resample leaf k in t • ↓ .3. Perform an up-step using the Markov kernel of the decorated (α, γ)-growth process from Definition 6.5.
Proof. Before we consider the down-step itself, note that it does not make any difference if we perform the up-step for the semi-planar (α, γ)-chain and then project to the decorated trees, or if we project to the decorated trees immediately after the down-step, and then perform the up-step by using the decorated (α, γ)-growth process; see the brief discussion following Definition 6.5. This ensures that, in the following, we only have to argue that the down-step outlined in Steps 1 and 2 is the correct characterization. Consider the decorated (α, γ)-chain started at t ) in the down-step of the semi-planar (α, γ)-chain from T , condition on I = i and Ĩ = ĩ, delete ĩ, and relabel ĩ +1, . . ., n by ĩ, . . ., n−1 using the increasing bijection, obtaining after the down-step. Our aim is to prove that Fundamentally, we now split up into cases based on t • and i where we recall the notation B x = g −1 ({x}) for x ∈ ins(s) where g is the decoration function from T to s. Since T is random, the sets (B x ) x∈ins(s) form a random partition of [n].
However, we note that in all cases Step 1 is consistent with how we select leaf I in the semi-planar (α, γ)-chain, since I ∼ Unif([n]) immediately implies for all x ∈ ins(s).
In the following we will let a, b and ĩ = max{i, a, b} be defined as in Definition 4.3, and will let (S ↓ , Y ↓ ) := π•(n−1) [k] T ↓ n be the decorated [k]-tree of mass n − 1 obtained by projecting the semi-planar (α, γ)-chain after the down-step.
Case A1: i ∈ B x for external x ∈ edge(s), y x > 1. Note how in this case i ∈ B x implies that ĩ = max{i, a, b} ∈ B x as well, implying that ĩ > k. Hence the tree shape does not change during the down-step, and (S ↓ , Y ↓ ) will have the property that S ↓ = s, Y ↓ x = y x − 1, and Y ↓ x ′ = y x ′ for all ins(s) ∋ x ′ ≠ x. This corresponds to Step 2.a where x is an external edge.
Case A2: i ∈ B x for internal x ∈ edge(s).Noting that i > k, the definition of a and b implies that ĩ ∈ B x as well.Hence this case is similar to Case A1, and corresponds to Step 2.a where x is an internal edge.
Case A3: i ∈ B x for x ∈ bp(s).This is completely analogous to Case A2, and corresponds to Step 2.a where x is a branch point.
In all of the following cases, v ∈ bp(s) refers to the parent branch point of i in s, c v will refer to the number of children of v in s, and we construct Firstly, we note that the restrictions on i and y x imply that i ≤ k. Now consider the semi-planar structure in the branch point v, and note how y x = 1 and y v > 0 imply that max{i, a, b} = max{i, b}. Hence this case covers the situations where (1) c v > 2 and a leaf label larger than k is found to the left (or right, if i is placed leftmost in v) of i in T , and (2) c v = 2, in which case we are guaranteed to find a label larger than k to the right of i in T . In order to specify the decorated tree after the down-step, we only need to characterize the number of leaves found in the subtree of v containing ĩ in T .
From the decoration, it is clear that y v leaves have been inserted to form subtrees of v, different from the ones already in ŝ.For c v > 2, with i not located leftmost in v, these subtrees will have grown in the following way.Initially there will be weight α to the immediate left of i, and an accumulated weight of (c v −2)α−γ in the rest of the branch point.Thus, the numbers of insertions happening to the immediate left of i versus everywhere else in v, respectively, will follow a Pólya urn scheme with these initial weights.Now, say that n g > 0 leaves have been inserted in the "gap" between i and the left neighbour of i in ŝ.These n g leaves will form subtrees of v, and the location and sizes of these subtrees will be governed by an (α, α)-oCRP.Specifically, b will be located in the first subtree to the left of i in T , so denoting the size of this subtree by Y new , we observe that Y new is distributed as the size of the rightmost table in an (α, α)-oCRP with n g customers.Hence swapping i and ĩ, deleting ĩ, relabelling {ĩ +1, . . ., n} by {ĩ, . . ., n − 1} using the increasing bijection, and projecting to decorated [k]-trees of mass n − 1, implies that we obtain S ↓ = s, The argument above is exactly the same if i is placed leftmost in v in T , as we then search for b to the right of leaf i instead of to the left, but still encounter an (α, α)-oCRP.
If c v = 2, the argument is the same, except that we are guaranteed to find something to the right of i in v in T . Contrary to the above, the location and sizes of the subtrees of v in T with smallest leaf label larger than k are governed by an (α, α − γ)-oCRP, but the remainder of the argument carries over.
Case B2: i ∈ B x for external x ∈ edge(s), y x = 1, y v > 0, ĩ ≤ k. Following the same reasoning as above, this case only contains the event where n g = 0 leaves with labels larger than k have been inserted in the gap between i and the left (or right if i is placed leftmost in v) neighbour of i in v, implying that we have ĩ = max{i, b} ≤ k. By swapping i and ĩ and subsequently deleting ĩ, we end up removing one of the leaves that define the tree shape s. Consider how T was grown from any semi-planar representative ŝ of s using the semi-planar (α, γ)-growth process. The sets (B x ) x∈ins(s) form a random partition of [n] induced by a Pólya urn scheme, and it follows from Proposition 2.3 that the probability of seeing k + 1 ∈ B x is exactly ỹ x /(n − k) for any x ∈ ins(s). In T , swap i and ĩ, delete ĩ and relabel ĩ +1, . . ., n by ĩ, . . ., n − 1 using the increasing bijection, and consider the tree shape and decoration masses of the projected tree (these operations are well-defined as no edges or vertices of s apart from the leaf edge ĩ are deleted). Say that k + 1 ∈ B X0 for a random X 0 ∈ ins(s); the tree shape after projecting to decorated trees will thus be ϕ (ρ(τ (s, i, ĩ), ĩ), X 0 , k), i.e.
(i) if X 0 = v ∈ bp(s) the new tree shape will have k as a child of v, and (ii) if X 0 = e ∈ edge(s), k will split the edge e up into a new branch point and three edges.
Further considerations of the correspondence between urn schemes and inserting the leaves ) , or more precisely into the insertable parts of the projected tree specified above, shows that resampling ĩ exactly corresponds to deleting ĩ, relabelling using the increasing bijection, and subsequently projecting to decorated [k]-trees of mass n − 1.
Case B3: i ∈ B x for external x ∈ edge(s), y x = 1, y v = 0, ĩ > k. This case is where we have a binary branch point, and there is mass on the edge e v↓ in s, but is otherwise similar to Case B1, with the notable difference that the leaf insertions into e v↓ ∈ edge(s) are governed by a (γ, γ)-oCRP. In this case, ĩ = b and b will be the smallest leaf in the second spinal bush on the ancestral line from leaf i in T . Letting N b denote the random number of leaves in this spinal bush, by Proposition 2.5 we have that N b ∼ q γ,γ (y e v↓ , •). Further considerations of the insertions, as in Case B2, show that conditional on N b = n b , the number of leaves in each of the subtrees of the spinal bush will be governed by a Chinese Restaurant Process with parameters (α, α−γ) with n b −1 customers. This is also an immediate consequence of the spinal decomposition of an (α, γ)-tree (see Lemma 11 of [7]).
Case B4: i ∈ B x for external x ∈ edge(s), y x = 1, y v = 0, ĩ ≤ k. This case is where we have a binary branch point, and there is no mass on e v↓ in s, and so ĩ ≤ k and the argument presented in Case B2 can be used verbatim.
With an autonomous description of the transition kernel for the decorated (α, γ)-chain on T •n [k] we turn to the more complicated task of proving that the projection of the semi-planar (α, γ)-chain onto T •n [k] is a Markov chain with the transition kernel characterized in Proposition 6.7.

Sampling semi-planar (α, γ)-trees
Similar to how we characterized Π in (19), we now wish to characterize Π•n k .In order to do this, let us define the concept of internal structures which will allow us to efficiently sample non-planar or semi-planar [n]-trees from a collapsed or decorated tree of mass n.

So fix t ∈ T [n] and let π •n [k] (t) = t • = (s, y) for some s ∈ T [k] , and let g be the decorating function. There is a partition of all edges and vertices of t indexed by the elements of ins(s). Define for each x ∈ ins(s) the set of all vertices of t that are not vertices of s, and from which x is the "first" edge or vertex of s, respectively, that is reached on the ancestral line. More precisely, for each v ∈ bp(s) define and whilst for each e ∈ edge(s) with, say, e = e uv , define and and Ee s,t = e ′ ∈ edge(t) .
We note that V x s,t either is empty, contains only {x} if x ∈ bp(s), or will contain at least one leaf for each x ∈ ins(s), and in the latter case we continue to label these leaves by their corresponding label in t. Additionally, the only situation where V x s,t is empty, but Ex s,t is not, is when x ∈ edge(s) is internal and g −1 ({x}) = ∅, in which case Ex s,t = {x}. Likewise, whenever x ∈ bp(t) and g −1 ({x}) = ∅, V x s,t = {x} and Ex s,t = ∅. The above sets constitute a partition of t, i.e. a partition of both the vertex and edge sets of t, labelled by the elements of s, but it is only almost true that ( V x s,t , Ex s,t ) constitutes a tree for each x ∈ ins(s). To ease stating later results, we amend this by adding some missing parts:

[k] (t) = (s, y). For each x ∈ ins(s), make the following alterations to V x s,t and Ex s,t defined above:
uniquely characterized by u ≺ t u ′ and v ′ ≺ t v.
• If x = v ∈ bp(s) with c children in s then (i) add ∅ and 1, . . ., c to V x s,t , and (ii) add the edges e ∅v , e v1 , . . ., e vc to Ex s,t .
In all three cases, replace each of the leaf labels in the resulting tree with their ranks to obtain V x s,t and E x s,t , and define the internal structure of x in t to be the tree int(x) := (V x s,t , E x s,t ).

Figure 6. An illustration of the internal structures of an element of T [8] .
We note that if x ∈ edge(s) is external then int . Additionally, note how all internal structures above could have been defined equivalently from any semi-planar tree, t, such that π( t) = t, and even turned into semi-planar trees, simply by letting each vertex in the internal structure inherit the permutation from t as all ranks of subtrees are preserved (note how this is well-defined by the addition of the leaves {1, . . ., c} to V v s,t before relabelling).In most cases, it will be completely clear what t and s are in the above setup, but whenever that is not the case, we will write int t k (x) to refer to the internal structure of x ∈ ins(s) in t ∈ T [n] , where s = ρ (t, [n] \ [k]).
Before moving on to characterizing the (random) internal structure of a (random) decorated tree, we will need the following growth procedures, which are only slightly different from the (α, γ)-growth process, and arise from the internal structure of internal edges and branch points, respectively. Definition 7.2 (Internal (α, γ)-growth process). Fix 0 < γ ≤ α ≤ 1. Let T γ 0 be the unique element of T [1] . For any n ∈ N 0 construct T γ n+1 conditional on , by and then defining the conditional distribution of T γ n+1 by for each x ∈ ins(t). The sequence (T γ n ) n∈N0 is referred to as the internal (α, γ)-growth process. As the growth processes defined above are very similar to the (α, γ)-growth process described in Section 3, a wide range of results for the (α, γ)-growth process can be modified to fit the internal growth processes of Definitions 7.2 and 7.3. It should also be clear that we can define a semi-planar version of the above growth procedures, just as we did for the (α, γ)-growth process. For the c-order branch point process, this entails sampling an initial distribution to provide the order of the initial c leaves in the first branch point, as described by (14). We will refer to these processes as the semi-planar internal (α, γ)-growth process and the semi-planar c-order branch point (α, γ)-growth process.
From the spinal decomposition of the (α, γ)-growth process [7], it is clear that there is an equally intimate link between (un)ordered Chinese Restaurant Processes and the internal (α, γ)-process. The only difference is that the ordered Chinese Restaurant involved in the spinal decomposition outlined in Lemma 11 of [7] is a (γ, γ)-oCRP rather than a (γ, 1 − α)-oCRP. This (γ, γ)-oCRP is the same as the one appearing in Proposition 6.7. From similar considerations we get the following characterization of the first split in the c-order branch point (α, γ)-growth process: . In addition, let S n l = S n l,1 , . . ., S n l,M l , enumerated from left to right, denote the subtrees in the l'th gap of that same branch point, let X n l denote the associated ordered partition relabelled using the increasing bijection {c + 1, . . ., c + n} → {1, . . ., n}, and let S n,re l,j denote S n l,j relabelled by {1, . . ., # S n l,j } using the increasing bijection for each j ∈ Further conditioning on # S n l,1 , . . ., # S n l,M l = (n l,1 , . . ., n l,M l ) yields that S n,re l,1 , . . ., S n,re l,M l are independent with S n,re l,j

Proof. All of the above assertions follow from (by now) standard growth process arguments, as one notes that the only possible insertions in these processes are either into the branch point described above, or into a subtree of that branch point with a least label larger than k. In the semi-planar tree, consider the insertion of these leaves into the "gaps" of the branch point between the subtrees with least leaf label smaller than k. There are c − 1 such gaps, of which c − 2 have weight α and the rightmost one has weight α − γ. Hence the numbers of leaves being inserted into the gaps follow a Pólya urn scheme with initial weights w = (α, . . ., α, α − γ) ∈ [0, 1] c−1 , and Proposition 2.3 directly yields that (Y n 1 , . . ., Y n c−1 ) ∼ DM n (w). When conditioning on which gap a new leaf is inserted into, the location within that gap is clearly governed by an oCRP, the distribution of which is detailed by the same argument as in Lemma 3.5. And lastly, when conditioning on which leaves end up in the same subtree of v, each of these subtrees will after relabelling be a semi-planar (α, γ)-tree with the corresponding number of leaves.
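The urn argument for the gap counts can be illustrated numerically. The sketch below assumes that DM n (w) in Proposition 2.3 is the standard Dirichlet-multinomial distribution with parameter vector w, and that every insertion into a gap increases the total weight attached to that gap by one; the function names are hypothetical and serve illustration only.

```python
import random
from math import factorial, prod


def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1), equal to 1 for m = 0."""
    out = 1.0
    for r in range(m):
        out *= x + r
    return out


def polya_urn_counts(weights, n, rng):
    """Insert n leaves one at a time; gap l is picked with probability
    proportional to weights[l] + (number of leaves already in gap l)."""
    counts = [0] * len(weights)
    for _ in range(n):
        current = [w + y for w, y in zip(weights, counts)]
        l = rng.choices(range(len(weights)), weights=current)[0]
        counts[l] += 1
    return counts


def dirichlet_multinomial_pmf(counts, weights):
    """Probability of the count vector under the Dirichlet-multinomial law."""
    n = sum(counts)
    multinomial = factorial(n)
    for y in counts:
        multinomial //= factorial(y)
    numerator = prod(rising(w, y) for w, y in zip(weights, counts))
    return multinomial * numerator / rising(sum(weights), n)


# Example with c = 4 gaps-defining children: w = (alpha, alpha, alpha - gamma).
alpha, gamma = 0.6, 0.3
w = [alpha, alpha, alpha - gamma]
print(polya_urn_counts(w, 10, random.Random(1)))
print(dirichlet_multinomial_pmf([1, 0, 0], w))
```

For a single insertion the probability of the first gap reduces to α divided by the total weight, as expected from the urn description.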
where s re l ′ is s l ′ relabelled by [n l ′ ] using the increasing bijection and are independent, and (N 1 , . . ., N L ) are the numbers of customers sitting at the L tables of an (α, (c − 2)α − γ)-CRP with n customers.
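To make the table sizes (N 1 , . . ., N L ) concrete, the following sketch simulates the standard sequential seating rule of an unordered (α, θ)-Chinese Restaurant Process, to be used here with θ = (c − 2)α − γ. The function name `crp_table_sizes` is hypothetical, and the seating rule used is the usual two-parameter one (join a table of size n j with weight n j − α, open a new table with weight θ + Lα), which is assumed to agree with the convention of the paper.

```python
import random


def crp_table_sizes(n, alpha, theta, rng):
    """Table sizes, in order of appearance, of an (alpha, theta)-CRP
    after n customers have been seated one at a time."""
    if not (0 <= alpha <= 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha <= 1 and theta > -alpha")
    sizes = []
    for m in range(n):  # m customers are already seated
        if m == 0:
            sizes.append(1)  # the first customer opens the first table
            continue
        # Existing table j has weight sizes[j] - alpha;
        # a new table has weight theta + (number of tables) * alpha.
        weights = [s - alpha for s in sizes] + [theta + len(sizes) * alpha]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes


# Example: c = 3 children, so theta = (c - 2) * alpha - gamma.
alpha, gamma, c = 0.6, 0.3, 3
print(crp_table_sizes(12, alpha, (c - 2) * alpha - gamma, random.Random(0)))
```

Note that for c = 2 the parameter θ equals −γ, which is admissible exactly when γ < α, in line with the assumption of Corollary 7.5.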
We will now prove some useful versions of Lemma 4.5 and Theorem 4.6 for the semi-planar (α, γ)-growth process with a special tree as a starting point t0 ∈ T with associated weights w 0 = w 0 x,l x∈ins( t0),0≤l≤cx−1 , where we use the convention that c x = 1 for x ∈ edge( t0 ) and 1 ≤ l ≤ c x − 1 if x ∈ bp( t) with c x children. Crucially, we allow these weights to be different from the weight specification of the growth process. The result that we will now outline has wider implications for label swapping in more general structures that induce partitions and come about from a growth procedure, such as ordered Chinese Restaurant Processes. However, we will not investigate this further in this paper.
In the following we need to be careful about 'where' the modified weights appear after the insertion of a leaf according to the growth rule. Hence for any t ∈ T [n] consider how t′ = φ t, n + 1, x, l is formed by carrying out one step of the semi-planar (α, γ)-growth rule, where x ∈ ins( t) and 0 ≤ l ≤ c x − 1. We will organize the weights of t′ such that for every x ′ ∈ ins( t′ ) and 0 ≤ l ′ ≤ c x ′ − 1. This is illustrated in Figure . The semi-planar (α, γ)-growth process started from t0 is simply the same growth process as in Definition 3.4, started from t0 ∈ T [n0] , i.e. where and we iteratively use the weight scheme defined in (28) to form T n+1 from T n for every n ∈ N. We note that the above specification of the weights is consistent with the original definition of the semi-planar (α, γ)-growth process, if the weights w 0 above are the same as those specified in Definition 3.4. The importance of organizing the weights as in (28) is that we can explicitly handle starting a semi-planar (α, γ)-growth process from a tree with associated weights different from the ones coming out of the semi-planar (α, γ)-growth process. This is the case for the semi-planar internal (α, γ)-growth process as well as the semi-planar c-order branch point (α, γ)-growth process. We have the following result, where we recall that f is the transformation function from Definition 4.3. Then E i,ĩ ⊥ ⊥ T ĩ −1 and, conditional on the event E i,ĩ , ρ * τ * T n , i,ĩ ,ĩ and T n−1 have the same distribution. Furthermore, if

Proof. The proof of Lemma 4.5 can be used almost verbatim here, as we note that the restrictions on i and ĩ ensure that none of the insertable parts of the trees T ĩ −1 , . . ., T n that contribute to the characterization of E i,ĩ have w 0 as associated weights. This is a consequence of the way we have organized the weights in the growth process, combined with the definition of a and b from Definition 4.3.
Note how Proposition 7.6 generalises Lemma 4.5. This generalisation is useful, as we have the following corollaries, which are easily checked to satisfy the assumptions of Proposition 7.6.
Proof. Firstly note that the conditional independence in (ii) follows directly from the independence of the insertions of new leaves in the (α, γ)-growth process. This observation directly implies (i). For (iii), fix x ∈ ins(s), and note that the probability of int(x) being equal to a specific tree is independent of the labels, by virtue of the growth process. Next, (iv) and (v) follow by noting how T n is produced from T k = s, since the latter is a part of what we condition on. For any x ∈ edge(s), note how int T k k (x) will be the unique element of T [1] , where the edge has associated weight γ if x is an internal edge and 1 − α if x is a leaf edge, respectively. Now observe how each int(x) grows according to either the (α, γ)-growth process or the internal (α, γ)-growth process, according to the initial weight of the leaf. For (vi), note how for (ii) Let q be a stationary distribution of (X m ) m∈N0 , and further assume that X 0 ∼ Λ(Y 0 , •) for some Y-valued random variable Y 0 , where Λ(y, •) = q (• | λ = y). If for each y ∈ Y it holds that P (X 1 = x | λ(X 0 ) = y 0 , λ(X 1 ) = y) = Λ (y, {x}) (34) for all x ∈ X with λ(x) = y, define Q = ΛKλ using the same notation as in (20).
In either case (λ(X m )) m∈N0 is a Markov chain with transition kernel, Q, and initial distribution given by λ(X 0 ).
In the above, (i) is referred to as the Kemeny-Snell criterion [18], whilst (ii) is known as the intertwining criterion [27]. The goal for this section is to combine the two criteria to prove that π •n [k] (T n (m)) m∈N0 is a Markov chain with transition kernel described in Proposition 6.7. As illustrated in Figure 7, we seek to do this in two stages, so rather than projecting all the way to decorated trees we will use collapsed trees as an intermediary. In doing this we note that π , where h denotes the natural projection from T * n The goal is thus to show that both the upper and the lower part of the diagram in Figure 7 commute. We will deploy the Kemeny-Snell criterion to show that the lower diagram commutes, whilst the intertwining criterion will be used for the upper part of the diagram. We will start by proving that the projection from collapsed trees to decorated trees satisfies the Kemeny-Snell criterion. Proof. Say that t * 1 = s 1 , B x , for every x ∈ ins(s).
Figure 7. Commutative diagram illustrating the construction of K •n k described in Proposition 6.7 and the proof tactic to show that the projection of the semi-planar (α, γ)-chain is indeed a Markov chain with that same transition kernel.

Now observe that there is a unique bijection τ : [n] → [n] which is increasing on B 1 x for each x ∈ ins (s 1 ) such that B 2 x = τ B 1 x for every x ∈ ins (s), and use this bijection to obtain τ (t) from t ∈ T [n] . Then it holds that 2 ) and that P (T n = t) = P (T n = τ (t)) for each t ∈ T [n] that satisfies π * n [k] (t) = t * 1 . Crucially, the rank of each element of any set B x is preserved under this relabelling. Note how the internal structure, int(x), is completely preserved under τ by Lemma 7.9 (iii), for each x ∈ ins (s), and that the distribution of each int(x) is described by Lemma 7.9(iv)-(vi). This immediately implies that π •n [k] (T n (1)) = π •n [k] (T τ n (1)) (not just in distribution!), where T n (•) and T τ n (•) are both (non-planar) (α, γ)-chains but T n (0) = t and T τ n (0) = τ (t). This proves that the Kemeny-Snell criterion is satisfied.
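The bijection τ used in this proof is determined block by block: within each block it matches elements of equal rank. A minimal sketch, with the hypothetical name `rank_preserving_bijection` and with the two collapsed trees represented only by their label sets (dictionaries from insertable parts to blocks), is:

```python
def rank_preserving_bijection(blocks1, blocks2):
    """Return tau as a dict such that tau maps blocks1[x] onto blocks2[x]
    and is increasing on every block, i.e. ranks within blocks are preserved.

    Assumption: both arguments are dicts keyed by the same insertable parts,
    with blocks of matching sizes that partition the same label set.
    """
    tau = {}
    for x, block1 in blocks1.items():
        block2 = blocks2[x]
        if len(block1) != len(block2):
            raise ValueError(f"block sizes differ for {x!r}")
        # Pair the r-th smallest element of block1 with the r-th smallest
        # element of block2.
        for old, new in zip(sorted(block1), sorted(block2)):
            tau[old] = new
    return tau


# Two partitions of [5] with the same block sizes (same decorated tree).
B1 = {"e": {1, 4}, "v": {2, 3, 5}}
B2 = {"e": {2, 3}, "v": {1, 4, 5}}
print(rank_preserving_bijection(B1, B2))
```

Since the block sizes agree exactly when the two collapsed trees project to the same decorated tree, the map is well-defined in the situation considered in the proof.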
With the bottom diagram of Figure 7 commuting, we only need to prove that the upper part of the same diagram commutes as well. This is a slightly more intricate argument, where we will aim to show that the intertwining criterion is satisfied with (X m ) m∈N0 = T n (m) Essentially, the following proof will follow the structure of Proposition 6.7, meaning that we will split up the proof into cases based on the location and value of I in π * n [k] T n : Proof. Consider the collapsed tree t * 0 = s, (B x ) x∈ins(s) ∈ T * n [k] , the event that the projection of the initial state of the semi-planar (α, γ)-chain onto T * n [k] yields t * 0 , π * n [k] T n (0) = t * 0 , and the internal structure as characterized by the spatial Markov property in Lemma 7.9. In light of this result it is sufficient to show that the internal structure is preserved under the down-step if we further condition on the selected leaf I, the deleted leaf Ĩ, and in some cases additional information about label sets, ensuring that we uniquely characterize the collapsed tree after the down-step. Hence, we (i) sample T from P T n ∈ • π * n [k] T n = t * 0 , (ii) condition on I = i and Ĩ = ĩ (and in some cases additional information), delete ĩ, and relabel ĩ +1, . . ., n by ĩ, . . ., n−1 using the increasing bijection, obtaining T ↓ , and (iii) project to T * (n−1) [k] to obtain T * , and study the internal structures in T ↓ .
Even though the statement of the lemma formally involves the up-step, we will as an intermediary result show that T n (0) = t * 0 , π * (n−1) [k] T ↓ = t * ↓ 1 = P T n−1 = t π * (n−1) [k] where t * ↓ 1 denotes t * 1 with label n removed.By definition of the down-up chain the statement of the lemma easily follows from (38), since the up-step is simply defined by adding {n} to one of the B x 's according to the same probabilities as in the decorated (α, γ)-growth process, Definition 6.5.Now, from the spatial Markov Property (Lemma 7.9) we need mainly to focus on the internal structures of t * 0 in T that are affected by the down-step.
We now split up into cases based on t * 0 , i and ĩ: Case A1: i ∈ B x for external x ∈ edge(s), y x > 1.Note how in this case I ∈ B x implies that Ĩ = max{I, a, b} ∈ B x as well, where a and b are defined as in the semi-planar (α, γ)-chain.This implies that the tree shape does not change during the down-step.Hence conditioning on I = i and Ĩ = ĩ in t * 0 , is exactly the same as conditioning on the corresponding random variables I ′ = rank Bx (i) and Ĩ′ = rank Bx (ĩ) in int(x).Now, recall that int(x) D = T #Bx , and so Lemma 4.5 ensures that, under this conditioning int(x) after the down move is distributed as T #Bx−1 .Since this is the only internal structure affected in the down-step all the other internal structures will still have the same distribution, and they will all still be conditionally independent as in Lemma 7.9.
Case A2: i ∈ B x for internal x ∈ edge(s).This is very similar to Case A1, except that we have I > k in this case.As above, note that conditioning on I = i and Ĩ = ĩ, both being elements of B x , is equivalent to conditioning on the corresponding random variables I ′ = rank Bx (i) and Ĩ′ = rank Bx (ĩ) in int(x).Note that 1 < I ′ ≤ Ĩ′ ≤ #B x , and recall that int(x) D = T γ #Bx .Combining these observations with Corollary 7.7 yields that int(x) after the down-step has the same distribution as T γ #Bx−1 .Case A3: i ∈ B x for x ∈ bp(s).This is the most complicated scenario due to the complexity of the internal structure of a branch point.But similarly to Cases A1 and A2, we do not change the tree shape since I > k ensures that Ĩ ∈ B x as well.Hence we only need to argue that this internal structure has the correct (conditional) distribution after the down step, as in the previous cases.Similar to these cases, conditioning on I = i and Ĩ = ĩ in t * 0 is equivalent to conditioning on the corresponding random variables I ′ = c + rank Bx (i) and Ĩ′ = c + rank Bx (ĩ) in int(x), where c is the number of children of x in s.
From Lemma 7.9(vi) we obtain that int(x) has the same distribution as the #B x 'th step of a c-order branch point (α, γ)-growth process started from

Definition 4.2 (Deletion of a leaf in T * A ). Let A be a finite set and fix t * ∈ T * A . The operation of deleting leaf j ∈ A from t * is the map

Figure 2. Swapping labels in T * [3] . The orange "edges" are an illustrative expansion of the branch point v so that the left-to-right ordering of the subtrees of v becomes apparent. Thus, all semi-planar and planar trees depicted above have the same tree shape, but have different permutations associated to v. Label swapping in t (bottom left) occurs by constructing the planar version, t * (top left), and then swapping the labels in the planar version. Swapping labels 1 and 2 yields a planar tree corresponding to a semi-planar tree since the two subtrees with the lowest labels are located leftmost. This is not the case if we swap labels 2 and 3, and thus the bottom right is not a semi-planar tree, i.e. not an element of T [3] .

Lemma 4.5. Fix n ∈ N and let ( T m ) 1≤m≤n be the first n steps of the semi-planar (α, γ)-growth process. Fix 1 ≤ i ≤ ĩ ≤ n and define

Theorem 4.6. Fix n ∈ N. Let ( T n (m)) m∈N0 denote the semi-planar (α, γ)-chain and let T n denote the nth step of the semi-planar (α, γ)-growth process. Then it holds that T n (m) D − → T n for m → ∞, where D − → denotes convergence in distribution. Furthermore, it holds that the projection of T n to the space of (non-planar) [n]-trees has the same distribution as the nth step of the (α, γ)-growth process.

Figure 4. Commutative diagram to illustrate the construction of the transition kernel for the (α, γ)-chain.

Theorem 5.4. Fix n ∈ N. Let ( T n (m)) m∈N0 be a semi-planar (α, γ)-chain, and let T n (m) := π T n (m) denote the projection of the chain onto T [n] . If T n (0) ∼ Πn (t, •) for some t ∈ T [n] and replace y x with Y new and y v with y v − Y new .
and y e v↓ with y e v↓ − N b .

Lemma 7.4. Fix 0 < γ < α ≤ 1 and c ≥ 2. Let ( T c−bp n ) n∈N0 denote the semi-planar c-order branch point (α, γ)-process. For each n ∈ N and l ∈ [c − 1] let Y n l denote the number of leaves with labels larger than c that have been inserted in the l'th gap (enumerated from left to right) in the branch point closest to the root in T c−bp n

Corollary 7.5. Fix 0 < γ < α ≤ 1 and c ≥ 2. Let (T c−bp n ) n∈N0 denote the (non-planar) c-order branch point (α, γ)-process. If S n 1 , . . ., S n L denotes the subtrees with least label larger than c in the branch point closest to the root of T c−bp n , enumerated in the order of least leaf label, then
for an illustration.Recall that τ denotes the label swapping map on T [n] from Definition 2.12.
A ). Let A be a finite set and fix t * = (t, σ * ) ∈ T * A .The operation of swapping labels is a map t * , i, j τ → (τ (t, i, j), σ * τ ) for any pair i, j ∈ [n] that satisfies that if v ∈ bp(t) is the parent of i, and i 1 , . . ., i cv denotes the smallest leaf label of the subtrees of v in t, enumerated in increasing order, then t , and (ii) replace e uu ′ with e ∅u ′ in Ex s,t , withu ′ ∈ V x s,t uniquely characterised by u ≺ t u ′ .•If x = e uv ∈ edge(s) is internal such that u ≺ s v then (i) add ∅ and 1 to V x s,t, and (ii) if V x s,t = ∅ replace the edge e uv with e ∅1 , and otherwise replace the two edges e uu ′ and e v ′ v with e ∅u ′ and e v ′ 1 in Ex s,t , where u ′ , v ′ ∈ V x