Foundations of the wald space for phylogenetic trees

Evolutionary relationships between species are represented by phylogenetic trees, but these relationships are subject to uncertainty due to the random nature of evolution. A geometry for the space of phylogenetic trees is necessary in order to properly quantify this uncertainty during the statistical analysis of collections of possible evolutionary trees inferred from biological data. Recently, the wald space has been introduced: a length space for trees which is a certain subset of the manifold of symmetric positive definite matrices. In this work, the wald space is introduced formally and its topology and structure are studied in detail. In particular, we show that wald space has the topology of a disjoint union of open cubes, that it is contractible, and, by careful characterisation of cube boundaries, that it is a Whitney stratified space of type (A). Imposing the metric induced by the affine invariant metric on symmetric positive definite matrices, we prove that wald space is a geodesic Riemann stratified space. A new numerical method is proposed and investigated for the construction of geodesics, the computation of Fréchet means and the calculation of curvature in wald space. This work is intended to serve as a mathematical foundation for further geometric and statistical research on this space.

(Jonas Lueg and Stephan F. Huckemann) Felix-Bernstein-Institute for Mathematical Statistics in the Biosciences, Georg-August-Universität, Göttingen, Germany
(Maryam K. Garba and Tom M. W. Nye) School of Mathematics, Statistics and Physics, Newcastle University, UK
E-mail addresses: jonas.lueg@stud.uni-goettingen.de, m.k.garba1@ncl.ac.uk, tom.nye@ncl.ac.uk, huckeman@math.uni-goettingen.de.

1. Introduction

1.1. Background. Over billions of years, evolution has been driven by unobserved random processes. Inferences about evolutionary history, which by necessity are largely based on observations of present-day species, are therefore always subject to some level of uncertainty. Phylogenetic trees are used to represent possible evolutionary histories relating a set of species, or taxa, which form the leaves of each tree. Internal vertices on phylogenetic trees usually represent speciation events, and edge lengths represent the degree of evolutionary divergence over any given edge. Trees are typically inferred from genetic sequence data from extant species, and a variety of well-established statistical methods exist for phylogenetic inference (Felsenstein, 2003). These generally output a sample of trees, that is, a collection of possible evolutionary histories compatible with the data. Moreover, evolutionary relationships can vary stochastically from one gene to another, giving a further source of random variation in samples of trees (Maddison, 1997). It is then natural to pose statistical questions about such samples: for example identifying a sample
mean, identifying principal modes of variation in the sample, or testing differences between samples. This, in turn, calls for the design of suitable metric spaces in which each element is a phylogenetic tree on some fixed set of taxa, and which are ideally both biologically substantive and computationally tractable. The design of these tree spaces is complicated by the continuous and combinatorial nature of phylogenetic trees. Furthermore, a metric space that is also a geodesic space (so that distance corresponds to the length of shortest paths, also called geodesics) is to be preferred, as it significantly facilitates the computation of statistics like the Fréchet mean. The first geodesic space of phylogenetic trees was introduced by Billera et al. (2001) and is called the BHV space, where BHV is an acronym of the authors Billera, Holmes and Vogtmann. For a fixed set of species L = {1, ..., N}, also called taxa or labels, with $3 \le N \in \mathbb{N}$, BHV space is constructed by embedding all phylogenetic trees into a Euclidean space $\mathbb{R}^M$, where $M \in \mathbb{N}$ grows exponentially in N, and then taking the infinitesimally induced intrinsic distance on this embedded subset, giving a metric space. As a result, BHV space features a very rich and computationally tractable geometry, as it is a CAT(0) space, i.e. globally of non-positive curvature, and thus has unique geodesics and Fréchet means. Starting with the development of a polynomial time algorithm for computing geodesics, which overcame the combinatorial difficulties (Owen and Provan (2011)), many algorithms have been derived for computing statistics like sample means (Bačák (2014); Miller et al. (2015)) and variance (Brown and Owen (2020)), confidence regions for the population mean (Willis (2016)) and principal component analysis (Nye (2011, 2014); Nye et al. (2016); Feragen et al. (2013)). The BHV paper has had considerable influence more widely on research in phylogenetics (see Suchard (2005) for example), non-Euclidean statistics (Marron and Alonso (2014)), algebraic geometry (Ardila and Klivans (2006)), probability theory (Evans et al. (2006)) and other areas of mathematics (Baez and Otter (2015)). In addition to the BHV tree space, a variety of alternative tree spaces have been proposed, both for discrete and continuous underlying point sets of trees. For example, in the tropical tree space (Speyer and Sturmfels, 2004; Monod et al., 2022) edge weights are times, not evolutionary divergence, thus allowing for a distance metric between two trees involving tropical algebra.
The geometries of the BHV and tropical tree spaces are unrelated to the methods used to infer phylogenies from sequence data. In contrast, there are substantially different tree spaces that originate from the evolutionary genetic substitution models used by molecular phylogenetic methods for tree inference (see Yang (2006) for details of these). Evolutionary substitution models are essentially Markov processes on a phylogenetic tree with state space Ω. For DNA sequence data, the state space is Ω = {A, C, T, G}. Under an appropriate set of assumptions on the substitution model, each tree determines a probability distribution on the set of possible letter patterns at the labelled vertices L (i.e. a probability mass function $p : \Omega^N \to [0, 1]$, N = |L|), and this can be used to compute the likelihood of any tree. At about the same time that BHV space was introduced, Kim (2000) provided a geometrical interpretation of tree estimation methods, where, given the substitution model, an embedding of phylogenetic trees into an $|\Omega|^N$-dimensional simplex using the likelihoods was discussed informally. The concept was then picked up by Moulton and Steel (2004), who introduced the topological space known as the edge-product space, taking into account not only phylogenetic trees but also forests, and characterising each forest via a vector containing correlations between all pairs of labels in L under the induced distribution p. This representation is then an embedding of all phylogenetic forests into an N(N − 1)/2-dimensional space. Using the same characterisation of phylogenetic trees via distributions on $\Omega^N$ obtained from a fixed substitution model, Garba et al. (2018) considered probabilistic distances to obtain metrics on tree space, but these metrics do not yield length spaces. Therefore, in Garba et al. (2021), the fact that all phylogenetic trees with a fixed fully resolved tree topology form a manifold was used to apply the Fisher information geometry for statistical manifolds on each such piece of the space, to eventually obtain a metric space that is a length space. Additionally, instead of using substitution models with finite state space Ω, Garba et al. (2021) considered a Gaussian model with state space $\Omega = \mathbb{R}$ in order to deal with the problem of computational tractability. The distributions characterising phylogenetic trees are then zero-mean multivariate Gaussians, and sums over $\Omega^N$ for discrete Ω are replaced with integrals over $\mathbb{R}^N$. The characterisation with this Gaussian model, together with the choice of the Fisher information geometry and the extension to phylogenetic forests, ultimately leads to the wald space, which is essentially an embedding of the phylogenetic forests into the space P of real symmetric, strictly positive definite N × N matrices (Garba et al. (2021)). The elements of wald space are called wälder ("Wald" is a German word meaning "forest").
Figure 1. Two trees T_1 and T_2 with positive edge length ℓ ∈ (0, ∞). Letting ℓ → ∞, the intuitive limit element for both trees is the forest F, as species are considered not related if their evolutionary distance approaches infinity. In wald space, the distance between T_1 and T_2 goes to zero accordingly as ℓ → ∞, and their limit is the forest F, which is also contained in wald space. In BHV space, however, their distance goes to infinity as ℓ → ∞ and F is not an element of the space.
The geometry of wald space is fundamentally different from that of BHV space (Garba et al. (2021); Lueg et al. (2021)), as illustrated in Fig. 1, which also underlines the biological plausibility of the wald space. Loosely speaking, wald space can be viewed topologically as being obtained by compactifying the boundaries at the "infinities" of BHV space, which comes at the price of fundamentally changing the geometry, which is no longer locally Euclidean. We avoid, though, the compactification at the "zeroes" of the edge-product space proposed by Moulton and Steel (2004), which suggests itself by mathematical elegance. It is biologically questionable, however, as it would allow different taxa to agree with one another. In Garba et al. (2021), apart from defining the wald space, certain properties of the space were established, such as showing the distance between any two points to be finite, and algorithms for approximating geodesics were proposed. In Lueg et al. (2021), a compact definition of wald space as well as more refined algorithms for approximating geodesics were introduced.

1.2. Contribution of this paper. Previous work on wald space established the space as a length space, and this paper was originally motivated by the aim of proving the existence of a minimising geodesic between every two points, i.e. establishing wald space as a geodesic metric space, since the existence of geodesics is crucial for performing statistical analysis within the space. This aim is achieved in Theorem 4.2.1 below. The proof involves three essential characterisations of the elements of wald space (as graph-theoretic forests; as split systems; and as certain symmetric positive definite matrices). In turn, these enable a rigorous analysis of the topology of wald space, such as Theorem 3.3.5 about its stratified structure, in addition to providing a foundation for further research on this space.
The remainder of the paper is structured as follows. In Section 2 we define the wald space W for a fixed set of labels {1, ..., N} as equivalence classes of partially labelled graph-theoretic forests. The topology on W is obtained by defining a map ψ from W into the set of N × N symmetric positive definite matrices and requiring ψ to be a homeomorphism onto its image. We then provide an equivalent, but more tractable, definition in terms of splits or bipartitions of labels, and an equivalent map φ from split-representations of wälder to symmetric positive definite matrices. In particular, we show that wald space can be identified topologically with a disjoint union of open unit cubes. Each open unit cube is called a grove. In Section 3 we describe the structure or stratification of the wald space by investigating how the groves are glued together along their respective boundaries. This is achieved by first providing in Section 3.1 a detailed characterization of the matrices in the image φ(W) in terms of a set of algebraic constraints on the matrix elements. Using this characterization, for example, we show that wald space is contractible. Then in Section 3.2 we use a partial ordering of forest topologies, first introduced by Moulton and Steel (2004), to establish results about the boundaries of groves and the stratification of wald space. This culminates in Section 3.3, in which we prove that wald space satisfies certain axioms at grove boundaries, collectively known as Whitney condition (A) (Pflaum, 2001), which ensure that tangent spaces behave well as the boundaries of strata are approached. We then go on to consider the induced affine invariant or information geometry on wald space in Section 4.
We show that the topology induced by the metric is the same as the previous topology defined using φ, and hence show that W is a geodesic metric space (i.e. every two points are connected by a minimising geodesic). Finally, in Section 5, we use a new algorithm for computing approximate geodesics to explore the geometry on wald space, specifically computing sectional curvatures within groves and Alexandrov curvatures for fundamental examples. We also investigate the behaviour of the sample Fréchet mean, in particular with reference to the issue of stickiness observed in BHV space (see for example Hotz et al. (2013); Huckemann et al. (2015) for a description). In Section 6 we discuss the contributions of the paper and some of the many open questions and unsolved problems about the geometry of wald space.

1.3. Notation. Throughout the paper we use the following notation and concepts, where points 4-6 below can be found in standard textbooks of differential geometry, e.g. (Lang, 1999, Chapter XII):
(1) $2 \le N \in \mathbb{N}$ is a fixed integer defining the set of labels L = {1, ..., N}.
(2) $\bigsqcup_{i=1}^{n} A_i$ denotes the union if the $A_i$ are pairwise disjoint (i = 1, ..., n).
(3) When we speak of partitions, no empty sets are allowed.
(4) For a set E, its cardinality is denoted by |E|.
(5) S is the Euclidean space of real symmetric N × N matrices.
(6) P is the space of real symmetric and positive definite N × N matrices. It is an open cone in S and carries the topology and smooth manifold structure inherited from S. In particular, every tangent space $T_P\mathcal{P}$ at P ∈ P is isomorphic to S.
(7) We equip P with the affine invariant Riemannian metric, also called information geometry, yielding a Cartan-Hadamard manifold. Its metric tensor is given by $\langle X, Y\rangle_P = \operatorname{trace}(P^{-1} X P^{-1} Y)$ for $X, Y \in S \cong T_P\mathcal{P}$, and the unique geodesic γ with γ(0) = P and γ(1) = Q, for P, Q ∈ P, is $\gamma(t) = \sqrt{P}\,\exp\big(t \log(\sqrt{P}^{-1} Q \sqrt{P}^{-1})\big)\sqrt{P}$, with the usual matrix exponential and logarithm, respectively. Here, $\sqrt{P}$ denotes the unique positive definite root of P.
(8) The Riemannian metric induces a metric on P denoted by $d_{\mathcal{P}}$, and for a rectifiable curve γ : [a, b] → P we let $L_{\mathcal{P}}(\gamma)$ be its length.

As a word of caution, we note that the term topology appears in two contexts: (i) as a system of open sets defining a topological space and (ii) as a branching structure of a graph-theoretic forest. The latter is standard in the phylogenetic literature, despite the potential for confusion.
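The affine invariant geometry in the notation above can be explored numerically. The following is a minimal sketch, assuming NumPy and using the standard closed forms for the inner product, the geodesic and the distance on symmetric positive definite matrices; the function names `ai_inner`, `ai_geodesic` and `ai_distance` are illustrative choices and do not come from the paper.

```python
import numpy as np

def _sym_apply(P, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def _whiten(P, Q):
    """Return sqrt(P)^-1 Q sqrt(P)^-1, symmetrised against round-off."""
    R_inv = _sym_apply(P, lambda w: 1.0 / np.sqrt(w))
    M = R_inv @ Q @ R_inv
    return (M + M.T) / 2.0

def ai_inner(P, X, Y):
    """Affine invariant inner product <X, Y>_P = trace(P^-1 X P^-1 Y)."""
    P_inv = np.linalg.inv(P)
    return float(np.trace(P_inv @ X @ P_inv @ Y))

def ai_geodesic(P, Q, t):
    """Point at time t on the geodesic from P (t = 0) to Q (t = 1)."""
    R = _sym_apply(P, np.sqrt)
    # exp(t * log(M)) equals M**t, computed on the eigenvalues of M.
    return R @ _sym_apply(_whiten(P, Q), lambda w: w ** t) @ R

def ai_distance(P, Q):
    """Geodesic distance: Frobenius norm of log(sqrt(P)^-1 Q sqrt(P)^-1)."""
    w = np.linalg.eigvalsh(_whiten(P, Q))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
midpoint = ai_geodesic(P, Q, 0.5)
```

The midpoint returned by `ai_geodesic(P, Q, 0.5)` should lie at half the distance between P and Q, which gives a quick sanity check on any implementation.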

Definition of Wald Space via Graphs and Splits
2.1. From a Graph Viewpoint. This section recalls definitions and results from Garba et al. (2021) and Lueg et al. (2021).

Moreover,
(1) every phylogenetic equivalence class is called a phylogenetic forest and denoted by F = [V, E, ℓ];
(2) W is the set of all phylogenetic forests;
(3) every topological equivalence class is called a forest topology and denoted by [V, E].

Definition 2.1.3. Let (V, E, ℓ) be a forest. For two leaves u, v ∈ L let E(u, v) be the set of edges in E of the unique path between u and v if u and v are connected; otherwise set E(u, v) = ∅. Further define a mapping of forests via
$$\psi(V, E, \ell) := (\rho_{uv})_{u,v=1}^{N}, \qquad \rho_{uv} := \begin{cases} \exp\Big(-\sum_{e \in E(u,v)} \ell_e\Big) & \text{if } u \text{ and } v \text{ are connected}, \\ 0 & \text{otherwise}. \end{cases} \tag{2.1}$$
By definition, the above matrix is the same for two forests representing the same phylogenetic forest. It is even positive definite and characterizes phylogenetic forests uniquely, as the following theorem shows.

Theorem 2.1.4 (Garba et al. (2021), Theorem 4.1). For every forest (V, E, ℓ), we have ψ(V, E, ℓ) ∈ P, and for any two forests (V, E, ℓ) and (V', E', ℓ') we have ψ(V, E, ℓ) = ψ(V', E', ℓ') if and only if the two forests are phylogenetically equivalent.

In consequence of Theorem 2.1.4, ψ induces a well defined injection from W into P. In slight abuse of notation we denote this mapping also by ψ, that is
$$\psi : \mathcal{W} \to \mathcal{P}, \qquad [V, E, \ell] \mapsto \psi(V, E, \ell). \tag{2.2}$$

Definition 2.1.5. The wald space is the topological space W equipped with the unique topology under which the map ψ : W → P from Equation (2.2) is a homeomorphism onto its image.
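To make the graph-based matrix representation concrete, here is a minimal sketch in Python. It assumes that the entry for a connected pair of leaves is the exponential of minus the path length and that disconnected pairs get entry 0, consistent with the relation ρ_uv = exp(−d_uv) used later in the text; the function name `psi` and the encoding of a forest as a list of `(vertex, vertex, length)` triples are illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict

def psi(n_leaves, edges):
    """Matrix (rho_uv) of a forest.

    edges: list of (vertex_a, vertex_b, length); leaves are the vertices
    1..n_leaves, any other vertex label denotes an interior vertex.
    """
    adjacency = defaultdict(list)
    for a, b, length in edges:
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))

    rho = [[0.0] * n_leaves for _ in range(n_leaves)]
    for u in range(1, n_leaves + 1):
        # Depth-first traversal; paths are unique because the graph is a forest.
        dist = {u: 0.0}
        stack = [u]
        while stack:
            a = stack.pop()
            for b, length in adjacency[a]:
                if b not in dist:
                    dist[b] = dist[a] + length
                    stack.append(b)
        for v in range(1, n_leaves + 1):
            if v in dist:  # unreachable leaves keep the entry 0
                rho[u - 1][v - 1] = math.exp(-dist[v])
    return rho

# Star tree on leaves 1, 2, 3 with interior vertex 5; leaf 4 is isolated.
R = psi(4, [(1, 5, 0.1), (2, 5, 0.2), (3, 5, 0.3)])
```

In this example the entry for leaves 1 and 2 is exp(−0.3), and every off-diagonal entry involving the isolated leaf 4 is 0.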
2.2. From a Split Viewpoint. If (V, E, ℓ) is a representative of a phylogenetic forest F, there is K ∈ N such that the graph-theoretic forest (V, E) decomposes into K disjoint nonempty graph-theoretic trees $(V_\alpha, E_\alpha)$, 1 ≤ α ≤ K. In particular, this decomposition induces a partition $L_1, \dots, L_K$ of the leaf set L with $L_\alpha \subseteq V_\alpha$, 1 ≤ α ≤ K.
Furthermore for 1 ≤ α ≤ K, taking away an edge e ∈ E α decomposes (V α , E α ) into two disjoint graph-theoretic trees that split the leaf set L α into two disjoint subsets A and B.
The representation of phylogenetic trees via splits is more abstract than the representation as graphs, but more tractable. We first introduce the weighted split representation and then show equivalence of the concepts.

Definition 2.2.1. A tuple F = (E, λ) with E ≠ ∅ is a split-based phylogenetic forest if
(i) there is 1 ≤ K ≤ N and a partition $L_1, \dots, L_K$ of the leaf set L;
(ii) every element e ∈ E is of the form e = {A, B}, called a split, where for some 1 ≤ α ≤ K, A, B is a partition of $L_\alpha$; $E_\alpha$ denotes the elements in E that are splits of $L_\alpha$; for notational ease we write interchangeably A|B = B|A = {A, B};
(iii) for every 1 ≤ α ≤ K, all splits in $E_\alpha$ are pairwise compatible, where two splits A|B and C|D of $L_\alpha$ are compatible with one another if one of the sets below is empty: A ∩ C, A ∩ D, B ∩ C, B ∩ D;
(iv) for all distinct u, v ∈ $L_\alpha$, 1 ≤ α ≤ K, there exists a split e = A|B ∈ $E_\alpha$ such that u ∈ A and v ∈ B;
(v) $\lambda := (\lambda_e)_{e \in E} \in (0, 1)^E$.
Moreover, $F_\infty$ with E = ∅ and void array λ is the completely disconnected split-based phylogenetic forest with leaf partition {1}, ..., {N}.
The partition $L_1, \dots, L_K$ is not mentioned explicitly in the definition of a split-based phylogenetic forest F = (E, λ), since it can be derived from E via $\{L_1, \dots, L_{K'}\} := \{A \cup B : A|B \in E\}$, where K' ≤ K, and for all $u \in L \setminus \bigcup_{\alpha=1}^{K'} L_\alpha$, the singleton {u} is added to the collection to obtain $L_1, \dots, L_K$.
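The compatibility condition and the recovery of the leaf partition from a split set can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard notion of compatibility (at least one of the four pairwise intersections is empty); the helper names `split`, `compatible` and `leaf_partition` and the encoding of a split as a pair of frozensets are hypothetical choices for illustration.

```python
def split(a, b):
    """Encode the split A|B as a pair of frozensets."""
    return (frozenset(a), frozenset(b))

def compatible(split1, split2):
    """A|B and C|D are compatible if one of the four intersections
    A & C, A & D, B & C, B & D is empty."""
    return any(not (X & Y) for X in split1 for Y in split2)

def leaf_partition(n_leaves, splits):
    """Recover the leaf partition: the distinct sets A | B over all splits,
    plus a singleton for every leaf not covered by any split."""
    blocks = {A | B for A, B in splits}
    covered = set().union(*blocks) if blocks else set()
    blocks |= {frozenset({u}) for u in range(1, n_leaves + 1) if u not in covered}
    return blocks

s1 = split({1, 2}, {3, 4})
s2 = split({1, 3}, {2, 4})
s3 = split({1}, {2, 3, 4})
```

Here `s1` and `s2` are incompatible (all four intersections are nonempty), while `s1` and `s3` are compatible; with five leaves, `leaf_partition(5, [s1, s3])` yields the blocks {1, 2, 3, 4} and {5}.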
Theorem 2.2.2. There is a one-to-one correspondence between split-based phylogenetic forests F = (E, λ) from Definition 2.2.1 and phylogenetic forests F = [V, E, ℓ] from Definition 2.1.2, with ℓ and λ related by
$$\lambda_e = 1 - \exp(-\ell_e) \tag{2.3}$$
with an arbitrary but fixed representative (V, E, ℓ). Furthermore, there is a one-to-one correspondence between compatible split sets E from Definition 2.2.1 (i)-(iv) and phylogenetic forest topologies [V, E].
Proof. Case I. Suppose K = 1, i.e. F comprises only one tree. We take recourse to (Semple and Steel, 2003, Theorem 3.1.4), who establish a one-to-one correspondence between compatible split sets E from Definition 2.2.1 (i)-(iv) and phylogenetic forest topologies [V, E], in case these are taken from graph-theoretic trees. Indeed, our phylogenetic forest topologies correspond to isomorphic X-trees there (our L is X there, and the labelling map from (Semple and Steel, 2003, Definition 2.1.1) is the identity in our case), and for every representative (V, E) ∈ [V, E] there is a unique compatible split set E from Definition 2.2.1 (i)-(iv) ((iv) is a consequence of L ⊆ V, K = 1 and injectivity of the labelling map). Vice versa, there is a bijection $e \mapsto s_e$ from the edges of the representative to the splits in E: removing the edge e produces two disconnected trees and thus yields a unique split $s = s_e = A|B$ of the leaf set L = A ∪ B. This yields the second assertion, namely a one-to-one correspondence between compatible split sets E from Definition 2.2.1 (i)-(iv) and phylogenetic forest topologies [V, E] in case of underlying graph-theoretic trees. The first assertion follows from the correspondence in (2.3), which thus yields, due to phylogenetic equivalence in Definition 2.1.2 (iii), a one-to-one correspondence between split-based phylogenetic forests F = (E, λ) and phylogenetic forests [V, E, ℓ], in case of underlying graph-theoretic trees.

Case II. Suppose F comprises K > 1 trees. Here, consider two representatives (V, E, ℓ), (V', E', ℓ') ∈ [V, E, ℓ] of a phylogenetic forest. Due to Definition 2.1.2 (i) and (ii), both (V, E, ℓ) and (V', E', ℓ') have the same number of connected components, each of which is a graph-theoretic tree, and the bijection f from Definition 2.1.2 restricts to bijections between the corresponding graph-theoretic trees. For each of these, Case I (K = 1) is applicable, thus yielding the assertion in the general case.
In consequence of Theorem 2.2.2 we introduce the following additional notation.
Definition 2.2.3. From now on, we identify split-based phylogenetic forests F = (E, λ) with phylogenetic forests F = [V, E, ℓ] and say that F is a wald, in plural wälder, so that F ∈ W, and we use the names split and edge interchangeably for the elements of E (as they are "edges" in equivalence classes). In particular, the $\lambda_e$, e ∈ E, from Definition 2.2.1 (v), are called edge weights. Furthermore,
(1) [F] := E also denotes the topology [V, E] of F, and $\mathcal{E}$ denotes the set of all possible topologies;
(2) wälder of the same topology E form a grove $G_E := \{F \in \mathcal{W} : [F] = E\}$;
(3) for any two u, v ∈ L with leaf partition $L_1, \dots, L_K$, define E(u, v) := {A|B : ∃ 1 ≤ α ≤ K and e ∈ E(u, v) that splits $L_\alpha$ into A and B}, which also denotes the set of edges between u and v, and which may be empty;
(4) the edge length based matrix representation ψ from Equation (2.2) translates to the edge weight based matrix representation φ defined by
$$\phi(F) := (\rho_{uv})_{u,v=1}^{N}, \qquad \rho_{uv} := \prod_{e \in E(u,v)} (1 - \lambda_e), \tag{2.4}$$
with the agreement that in case of empty E(u, v),
$$\rho_{uv} := 1 \text{ whenever } u = v \quad\text{and}\quad \rho_{uv} := 0 \text{ whenever } u \in L_\alpha \text{ and } v \in L_\beta,\ \alpha \neq \beta,\ \alpha, \beta \in \{1, \dots, K\}; \tag{2.5}$$
here λ is computed from ℓ as defined in Equation (2.3).
Remark 2.2.4. In light of Definition 2.1.5, the wald space is the topological space W equipped with the unique topology such that the map φ : W → P is a homeomorphism onto its image. Thus, groves can be identified topologically with open unit cubes, and the wald space thus with the disjoint union
$$\mathcal{W} \cong \bigsqcup_{E \in \mathcal{E}} (0, 1)^{E},$$
where we note that |E| runs from 0 (corresponding to $F_\infty$) to 2N − 3 (for fully resolved trees), as is easily seen by induction on N.
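The split-based matrix representation can be sketched as follows. This is a minimal illustration, assuming that the entry for two leaves is the product of (1 − λ_e) over the splits separating them, with entry 1 on the diagonal and 0 for leaves in different components, in line with the agreement (2.5); the function name `phi` and the dictionary encoding of a wald are hypothetical choices.

```python
import numpy as np

def split(a, b):
    """Encode the split A|B as a pair of frozensets."""
    return (frozenset(a), frozenset(b))

def phi(n_leaves, weighted_splits):
    """Matrix (rho_uv) of a wald given as {(A, B): lambda_e}, lambda_e in (0, 1)."""
    rho = np.eye(n_leaves)
    for u in range(1, n_leaves + 1):
        for v in range(u + 1, n_leaves + 1):
            factors = [1.0 - lam
                       for (A, B), lam in weighted_splits.items()
                       if (u in A and v in B) or (u in B and v in A)]
            # Distinct leaves with no separating split lie in different
            # components (condition (iv) of the definition), so rho_uv = 0.
            value = float(np.prod(factors)) if factors else 0.0
            rho[u - 1, v - 1] = rho[v - 1, u - 1] = value
    return rho

# Star tree on leaves 1, 2, 3; leaf 4 forms its own component.
wald = {
    split({1}, {2, 3}): 0.5,
    split({2}, {1, 3}): 0.2,
    split({3}, {1, 2}): 0.1,
}
P = phi(4, wald)
```

For this example the off-diagonal entries are 0.4, 0.45 and 0.72 within the tree and 0 towards leaf 4, and the resulting matrix is positive definite.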

Furthermore, observe that
(1) Equation (2.3) links edge weights and edge lengths in a strictly monotone way, so that the limits $\lambda_e \to 0, 1$ correspond to the limits $\ell_e \to 0, +\infty$, respectively; (2) for any partition A, B of $L_\alpha$ (1 ≤ α ≤ K, as above), we have that where the implication to the right is a consequence of $E_\alpha$ being a tree topology and the reverse implication is a consequence of A and B being a partition of $L_\alpha$.
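The correspondence between edge lengths and edge weights is easy to check numerically. The sketch below assumes the relation λ_e = 1 − exp(−ℓ_e) quoted for Equation (2.3) later in the text; the function names are illustrative.

```python
import math

def length_to_weight(length):
    """Edge length in (0, inf) to edge weight in (0, 1): lambda = 1 - exp(-l)."""
    return -math.expm1(-length)  # numerically stable for small lengths

def weight_to_length(weight):
    """Inverse map: l = -log(1 - lambda)."""
    return -math.log1p(-weight)

round_trip = weight_to_length(length_to_weight(0.7))
```

Both maps are strictly increasing, so short edges correspond to weights near 0 and long edges to weights near 1.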
Indeed, for every connected pair of leaves, there is a split separating this pair; for instance, for all u, v ∈ $L_1$ there is a split e = A|B ∈ E such that u ∈ A and v ∈ B. Removing the edge 1|234 from the subtree comprising the leaf set $L_1$ violates this condition: if there is no split separating 1 and 2, which remain connected, then one vertex is labelled twice, with 1 and 2. Semple and Steel (2003, e.g. Section 3.1) allow such trees; we, however, exclude them.

Figure 2. The topology E as defined in Example 2.2.5, with the corresponding splits annotated on the edges.

Topology and Stratification of Wald Space
3.1. Embedding. Recall from Theorem 2.1.4 that ψ : W → P from Equation (2.1) is injective, and so is the equivalent φ : W → P from Equation (2.4). Its image is characterized by algebraic equalities and inequalities, as shown by the following theorem. Further exploration will yield that the topology of wald space is that of a stratified union of disjoint open unit cubes, each corresponding to a grove from Definition 2.2.3.

Theorem 3.1.1. A matrix $P = (\rho_{uv})_{u,v=1}^{N} \in \mathcal{P}$ is the φ-image of a wald F ∈ W if and only if all of the following conditions are satisfied for arbitrary u, v, s, t ∈ L:
(R1) $\rho_{uu} = 1$,
(R2) two of the following three are equal and smaller than (or equal to) the third: $\rho_{uv}\rho_{st}$, $\rho_{us}\rho_{vt}$, $\rho_{ut}\rho_{vs}$,
(R3) $\rho_{uv} \ge 0$.
Furthermore, the wald F ∈ W is then uniquely determined.
Before proving Theorem 3.1.1, we elaborate on the above algebraic conditions.

Remark 3.1.2.
(1) Condition (R2) above is called the four-point-condition. In its non-strict version, all three products are equal, and this indicates some degeneracy, namely that some internal vertices have degree four or higher. The four-point-condition is equivalent to (e.g. Buneman (1974) or (Semple and Steel, 2003, p. 147))
$$\rho_{uv}\rho_{st} \ge \min\{\rho_{us}\rho_{vt},\ \rho_{ut}\rho_{vs}\} \quad \text{for all } u, v, s, t \in L, \tag{3.1}$$
and implies (e.g. setting s = t in (R2) and exploiting (R1))
(R4) $\rho_{uv} \ge \rho_{us}\rho_{sv}$ for all u, v, s ∈ L.
Notably, (R1) and (R2) imply, in conjunction with P ∈ P, that
(R5) $\rho_{uv} < 1$ for all u ≠ v,
for otherwise, if $\rho_{uv} = 1$ for some u ≠ v, Condition (R4) would imply for any s ∈ L that $\rho_{us} \ge \rho_{uv}\rho_{vs} = \rho_{vs}$ and $\rho_{vs} \ge \rho_{uv}\rho_{us} = \rho_{us}$, so $\rho_{us} = \rho_{vs}$ and hence P would be singular, a contradiction to P ∈ P.
(2) We can write $\rho_{uv} = \exp(-d_{uv})$, where the $d_{uv}$ are the finite or infinite distances between leaves u, v ∈ L, and, with Definition 2.2.3, 5., this translates to $d_{uu} = 0$ and $d_{uv} = \infty$ whenever u and v are in different components. In the literature, $(d_{uv})_{u,v=1}^{N}$ is also called a tree metric (e.g. (Semple and Steel, 2003, Chapter 7)) or distance matrix (e.g. (Felsenstein, 2003, Chapter 11)). Indeed, it conveys a metric on L, as Condition (R4) encodes the triangle inequality (for any u, v, s ∈ L) $d_{uv} \le d_{us} + d_{sv}$.
(3) In particular, the unit N × N matrix $I = (\delta_{uv})_{u,v \in L} \in \mathcal{P}$ is the φ-image of the completely disconnected wald $F_\infty$ ∈ W with topology $E_\infty = \emptyset$, in which each leaf comprises one of the K = N single element trees.
(4) For a given P ∈ P satisfying conditions (R1), (R2) and (R3) there are neighbour joining algorithms in (Semple and Steel, 2003, Section 7.3) determining its split set $E \in \mathcal{E}$.
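The algebraic conditions can be tested directly on a candidate matrix. The sketch below checks (R1) and the four-point-condition (R2) by brute force over all quadruples of leaves; it additionally assumes that (R3) is nonnegativity of the entries, which is an assumption made here for illustration. Positive definiteness is a separate requirement and is not checked. The function name `satisfies_wald_conditions` is hypothetical.

```python
import itertools

def satisfies_wald_conditions(P, tol=1e-9):
    """Check (R1), (R2) and an assumed (R3): rho_uv >= 0, for a square matrix P."""
    n = len(P)
    # (R1): unit diagonal.
    if any(abs(P[u][u] - 1.0) > tol for u in range(n)):
        return False
    # Assumed (R3): nonnegative entries.
    if any(P[u][v] < -tol for u in range(n) for v in range(n)):
        return False
    # (R2): among the three products, the two smallest must coincide.
    for u, v, s, t in itertools.product(range(n), repeat=4):
        products = sorted((P[u][v] * P[s][t],
                           P[u][s] * P[v][t],
                           P[u][t] * P[v][s]))
        if abs(products[0] - products[1]) > tol:
            return False
    return True

# Matrix of a star tree on three leaves (entries are products of 0.5, 0.8, 0.9).
tree_like = [[1.0, 0.4, 0.45],
             [0.4, 1.0, 0.72],
             [0.45, 0.72, 1.0]]
# Violates rho_23 >= rho_21 * rho_13, i.e. the triangle inequality (R4).
not_tree_like = [[1.0, 0.9, 0.9],
                 [0.9, 1.0, 0.1],
                 [0.9, 0.1, 1.0]]
```

The first matrix passes the check, while the second fails because 0.1 is smaller than 0.9 · 0.9. The brute-force loop costs on the order of N^4 operations, which is fine for illustration but not for large N.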
Since φ(W) is defined by algebraic equalities and nonstrict inequalities, we have the following corollary at once.

Corollary 3.1.3. φ(W) ⊆ P is a closed subset of P.
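Example 3.1.4 (W for N = 3). For N = 3, all matrices P = φ(F) with F ∈ W are given by (using Theorem 3.1.1)
$$P = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}, \qquad \rho_{12} \ge \rho_{13}\rho_{23}, \quad \rho_{13} \ge \rho_{12}\rho_{23}, \quad \rho_{23} \ge \rho_{12}\rho_{13},$$
and $0 \le \rho_{12}, \rho_{13}, \rho_{23} < 1$. This set in coordinates $\rho_{12}, \rho_{13}, \rho_{23}$ is depicted in Figure 3, where the two-dimensional surfaces correspond to the non-linear boundaries resulting from the triangle inequalities. Note that the regions where at least one coordinate is one are not included in φ(W), as the corresponding matrix is no longer strictly positive definite.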
Figure 3. The image φ(W) for N = 3 embedded in P, where only the off-diagonal entries on the boundary are depicted. Note that the geometry of the wald space is not Euclidean and thus this depiction may be deceiving (as it is a non-isometric embedding into $\mathbb{R}^3$), e.g. the regions where one coordinate equals 1 are infinitely far away.
Corollary 3.1.5. Conveyed by the homeomorphism φ, W is star shaped as a subset of $\mathbb{R}^{N \times N}$ with respect to $F_\infty$, and hence contractible.
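Proof. For F ∈ W with $P = \phi(F) = (\rho_{uv})_{u,v \in L}$ and x ∈ [0, 1], let $(\rho^{(x)}_{uv})_{u,v \in L} = P^{(x)} = xI + (1 - x)P$, and observe that for all x ∈ [0, 1], $P^{(x)} \in \mathcal{P}$ and $P^{(x)}$ satisfies (R1) and (R3). Moreover, to see that $P^{(x)}$ satisfies Equation (3.1) for all x ∈ (0, 1) and all u, v, s, t ∈ L, assume w.l.o.g. that
$$\rho_{uv}\rho_{st} = \rho_{us}\rho_{vt} \le \rho_{ut}\rho_{vs}. \tag{3.2}$$
If u, v, s, t are pairwise distinct, then all three products scale by $(1 - x)^2$, so that
$$\rho^{(x)}_{uv}\rho^{(x)}_{st} = \rho^{(x)}_{us}\rho^{(x)}_{vt} \le \rho^{(x)}_{ut}\rho^{(x)}_{vs} \tag{3.3}$$
as well. If only one pair is equal, there are two typical cases. If u = v, say, we obtain a different but valid four point condition
$$\rho^{(x)}_{uv}\rho^{(x)}_{st} \ge \rho^{(x)}_{us}\rho^{(x)}_{vt} = \rho^{(x)}_{ut}\rho^{(x)}_{vs},$$
where the inequality is strict in case of $\rho_{st} > 0$ due to $1 - x > (1 - x)^2$. If u = t, say, then we obtain Equation (3.3), where the inequality is strict if $\rho_{vs} > 0$. If exactly two pairs are the same, then, with the above setup, only u = t and v = s is possible, and both Equation (3.2) and Equation (3.3) are strict. In case of three equal indices and one different, or all four the same, Equation (3.3) holds again. Therefore, $P^{(x)}$ satisfies (R2) for all x ∈ [0, 1], and by Theorem 3.1.1 the entire continuous path $x \mapsto P^{(x)}$, x ∈ [0, 1], lies in φ(W).

Showing contractibility of the edge-product space, Moulton and Steel contract to the same forest (cf. (Moulton and Steel, 2004, Proposition 5.1)), employing a different proof, however.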
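As a worked special case, consider N = 3 and the contraction $P^{(x)} = xI + (1 - x)P$ with 0 < x < 1. The short derivation below is a sketch under the assumption that P satisfies the triangle-type inequality (R4); it shows that the same inequality survives along the path, because off-diagonal entries scale by (1 − x) while the diagonal stays equal to 1.

```latex
% Off-diagonal entries along the path: \rho^{(x)}_{uv} = (1 - x)\,\rho_{uv} for u \neq v.
\begin{align*}
\rho^{(x)}_{12} &= (1 - x)\,\rho_{12} \\
  &\ge (1 - x)\,\rho_{13}\rho_{23} && \text{by (R4) for } P \\
  &\ge (1 - x)^2\,\rho_{13}\rho_{23} && \text{since } 1 - x \ge (1 - x)^2 \\
  &= \rho^{(x)}_{13}\,\rho^{(x)}_{23}.
\end{align*}
```

If $\rho_{13}\rho_{23} > 0$, the second inequality is strict for 0 < x < 1, so the inequality becomes strict in the interior of the path.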
Remark 3.1.6. We make the following observations about the proof of Corollary 3.1.5.

(3) All triangle inequalities (R4) involving initially nonzero $\rho_{uv}$ are strict, however, for 0 < x < 1, so that for $\phi^{-1}(P^{(x)})$ none of the leaves has degree 2. For example, starting with the wald consisting of a chain of three vertices with N = 3 (so each vertex is labelled and the middle one is of degree two), it is immediately transformed into a fully resolved tree (and stays one for all x ∈ (0, 1)).
(4) The point $F_\infty$ can be viewed as a vantage point of W, which is then a bounded part of a cone where every is a slice of level a ∈ [0, 1). Then for every F ∈ $B_a$, there is $r_F > 1$ such that ∈ W for all $0 \le x < r_F$ and φ(F) is singular for $x = r_F$. For N = 3, the set $B_a$ for several a ∈ (0, 1] embedded into P is depicted in Fig. 4.

We next consider the restriction of the map φ to each grove $G_E$ explicitly in terms of edge weights.

Definition 3.1.7. With the agreement (2.5) in case of empty E(u, v), we denote the restriction of φ : W → P from Definition 2.2.3 to a grove $G_E$ by
$$\phi_E : (0, 1)^E \to \mathcal{P}, \qquad \lambda \mapsto (\rho_{uv})_{u,v=1}^{N} = \Big(\prod_{e \in E(u,v)} (1 - \lambda_e)\Big)_{u,v=1}^{N};$$
its continuation onto all of $\mathbb{R}^E$ is denoted by $\bar\phi_E : \mathbb{R}^E \to S$, given by the same formula.

Remark 3.1.8. The continuation $\bar\phi_E$ is multivariate real analytic on all of $\mathbb{R}^E$.
The following theorem characterizes each grove.
Proof. For the first assertion consider e = A|B, where A ∪ B = $L_\alpha$ for some 1 ≤ α ≤ K, and where $L_1, \dots, L_K$ is the leaf partition induced by E. Then the matrix entries $d_{uv} := -\log \rho_{uv}$ (u, v ∈ $L_\alpha$) define a metric on $L_\alpha$, as noted in Remark 3.1.2. For such a metric, (Buneman, 1971, Lemma 8) asserts that one can assign a tree $(V_\alpha, E_\alpha, \ell^\alpha)$ where which is uniquely determined by (Buneman, 1971, Theorem 2). Due to our uniqueness results from Theorem 2.2.2 and Theorem 3.1.1, and due to Equation (2.3), $\lambda_e = 1 - \exp(-\ell^\alpha_e)$, and hence, using $\rho_{uv} = \exp(-d_{uv})$, the asserted equation follows at once from Equation (3.6).
For the second assertion, let e ∈ E and suppose that Else, if u, v ∈ L α for some 1 ≤ α ≤ K, then ρ uv > 0 and with the Kronecker delta δ, Thus, for every x ∈ R E , we have x e 1 − λ e =: h uv .
We now view each of the $\ell_e := \frac{x_e}{1 - \lambda_e}$, e ∈ E, as a real valued "length" of e. With a representative (V, E) of E with leaf set partition $L_1, \dots, L_K$, for every e ∈ E there are If the r.h.s. is zero due to Equation (3.8), then $x_e = 0$, yielding that $(d\phi_E)_\lambda$ has full rank, as asserted.
The third assertion follows directly from 1. and 2., i.e. φ E is bijectively smooth onto its image and its differential is injective.
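The full-rank claim for the differential can be illustrated numerically. The sketch below assumes the product form ρ_uv = ∏ (1 − λ_e) over the splits separating u and v, from which the partial derivative with respect to λ_e is −ρ_uv / (1 − λ_e) whenever e separates u and v, and 0 otherwise. The function name `jacobian` and the choice to list only the entries above the diagonal are illustrative.

```python
import numpy as np
from itertools import combinations

def split(a, b):
    """Encode the split A|B as a pair of frozensets."""
    return (frozenset(a), frozenset(b))

def separates(e, u, v):
    A, B = e
    return (u in A and v in B) or (u in B and v in A)

def jacobian(n_leaves, splits, lam):
    """Jacobian of lambda -> (rho_uv)_{u<v}; rows are leaf pairs, columns are splits."""
    pairs = list(combinations(range(1, n_leaves + 1), 2))
    J = np.zeros((len(pairs), len(splits)))
    for i, (u, v) in enumerate(pairs):
        sep = [k for k, e in enumerate(splits) if separates(e, u, v)]
        rho = float(np.prod([1.0 - lam[k] for k in sep])) if sep else 0.0
        for k in sep:
            # d rho_uv / d lambda_k = -rho_uv / (1 - lambda_k)
            J[i, k] = -rho / (1.0 - lam[k])
    return J

# Fully resolved tree on four leaves: four pendant splits and the split 12|34.
splits = [split({1}, {2, 3, 4}), split({2}, {1, 3, 4}),
          split({3}, {1, 2, 4}), split({4}, {1, 2, 3}),
          split({1, 2}, {3, 4})]
J = jacobian(4, splits, [0.1, 0.2, 0.3, 0.4, 0.5])
rank = np.linalg.matrix_rank(J)
```

For this tree the Jacobian has six rows (leaf pairs) and five columns (splits), and its rank equals the number of splits, in agreement with the injectivity of the differential.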
In the following, we are concerned with the continuation $\bar\phi_E(\lambda)$ when $\lambda \in (0, 1)^E$ approaches the boundary. The next result characterises exactly under which conditions $\bar\phi_E(\lambda)$ stays in the image φ(W) of wald space under φ.
The first equivalence follows from the fact that Equation (3.9) is well-defined. We prove the second equivalence. "⇒": Follows from Remark 3.1.2, Condition (R5). "⇐": Analogously to the proof of Theorem 3.1.1, "⇐", we find a phylogenetic forest in the sense of (Semple and Steel, 2003, Chapter 2.8) whose tree metric coincides with the one obtained from the continuation $\bar\phi_E(\lambda^*)$, but there might be multiply labelled vertices. However, this is impossible due to $\rho^*_{uv} < 1$ for any u ≠ v, which is equivalent to a distance greater than zero between u and v. Therefore, there exists a phylogenetic forest F ∈ W with $\phi(F) = \bar\phi_E(\lambda^*)$, and thus by Theorem 3.1.1, $\bar\phi_E(\lambda^*) \in \mathcal{P}$.
The previous result immediately shows which matrices in P form the boundary of a grove.
Corollary 3.1.11. Let E be a wald topology. Then the boundary of the grove $G_E$ in W is given by

The following result gives a first glimpse of how different groves are connected through the convergence of wälder.
Proof. For the first assertion, noting that there are only finitely many wald topologies, there needs to exist a subsequence $F_{n_k}$ of $F_n$ with $E_{n_k} = E$ for some topology E for all k ∈ N, and thus, since For 1., by Bolzano-Weierstraß, there needs to exist a cluster point For 2., for any cluster point $\lambda^* \in [0, 1]^E$, from the continuity of the continuation $\bar\phi_E$, $\bar\phi_E(\lambda^*)$ is a cluster point of $(\phi(F_n))_{n \in \mathbb{N}}$ and by , and due to $\bar\phi_E(\lambda^*) = \phi(F) \in \mathcal{P}$, the assertion follows.
The following example shows that when $F_n \to F$, the sequence $\lambda^{(n)}$ can have distinct cluster points.
if both of the sets above are nonempty; otherwise, we say that the restriction does not exist. In case of existence, we also say that e| L is a valid split.
The following definition is from Moulton and Steel (2004) and translated into the language of wälder and their topologies.
respectively, we say that E' ≤ E if all of the following three properties hold: Refinement: with the partitions $L_1, \dots, L_K$ and $L'_1, \dots, L'_{K'}$ of L induced by E and E', respectively, for every 1 where the r.h.s. is the set of splits E restricted to $L'_{\alpha'}$; Cut: Further, we say The restriction condition above corresponds to the definition of a tree displaying another tree in Moulton and Steel (2004). From (Moulton and Steel, 2004, Lemma 3.1), it follows at once that the relation ≤ as defined in Definition 3.2.2 is a partial ordering.
In contrast, E 2 ≰ E: although the refinement and restriction properties are satisfied, the cut property is not, since there is no edge A|B = e ∈ E with {2, 5} ⊆ A and {3, 4} ⊆ B.
(1) For each edge e′ ∈ E′ α′ , 1 ≤ α′ ≤ K′ , denote the set of all corresponding splits in E by R e′ := {e ∈ E : e| L′ α′ = e′ }.
(2) Furthermore, denote the set of all disappearing splits in E with (3) Denote the set of all cut splits with Example 3.2.5.
(ix) R dis , R cut in conjunction with the R e′ over all e′ ∈ E′ give a pairwise disjoint union of E, where R dis and R cut might be empty.
Proof. Since Assertion (ii), which, among others, implies Assertion (v), follows from Assertion (xi), we proceed in the following logical order.
(i): K′ = K implies w.l.o.g. L′ α = L α for all α = 1, . . ., K and therefore e| L′ α = e| L α = e are valid splits for all e ∈ E α for all α = 1, . . ., K, so E′ α ⊆ E α as well as R e′ = {e′ } for all e′ ∈ E′ .
(iii):
(iv): By the restriction property of E′ ≤ E, each e′ ∈ E′ α′ is the restriction of some e ∈ E α , thus e ∈ R e′ ≠ ∅. Assume that there exist e 1 , e 2 ∈ R e′ with e 1 ≠ e 2 . If L′ α′ = L α was true, then e 1 = e 1 | L′ α′ = e 2 | L′ α′ = e 2 , a contradiction.
(vi): Assume the contrary: let A|B = e ∈ R e′ ∩ R e″ , where e′ ∈ L′ α′ ⊂ L α and e″ ∈ L′ α″ ⊂ L α . If α′ = α″ , then e′ = e| L′ α′ = e″ , a contradiction to e′ ≠ e″ , so α′ ≠ α″ . Since e is in both R e′ and R e″ , both restrictions to L′ α′ and L′ α″ exist and therefore Consequently, e′ = (A∩L′ α′ )|(B∩L′ α′ ) and e″ = (A for some e = A|B ∈ R e′ , and thus u ∈ A, v ∈ B, or vice versa, i.e. e ∈ E(u, v). Since the choice e ∈ R e′ was arbitrary, R e′ ⊆ E(u, v). If e ∈ R e′ ∩ E(u, v), u, v ∈ L′ α′ , then e′ = e| L′ α′ and e′ ∈ E′ (u, v) due to Equation (2.8).
(ix): By definition of R dis and R cut , they are disjoint and furthermore have empty intersection with each R e′ , e′ ∈ E′ ; the latter are pairwise disjoint due to (vi).
(x): By definition, R cut ∩ E(u, v) = ∅ for all u, v ∈ L′ α′ (else R cut would contain valid splits). Then (ix) in conjunction with (viii) yields the assertion.
(xi): Without loss of generality, let K = 1 < K′ and suppose that α′ = 1, α″ = 2. In the first step, note that it suffices to find a split e = A|B that separates L′ 1 from L′ α for all 2 ≤ α ≤ K′ , for then, w.l.o.g.
so that none of the e| L′ 1 , . . ., e| L′ K′ is a valid split and in consequence e ∈ R cut as desired.
In the second step we show the existence of such an e. To this end, it suffices to establish the following claim for all 3 ≤ J ≤ K′ , invoke induction and separately show the assertion for K′ = 2.
Claim: If there exists a split f = C|D separating L′ 1 from all of L′ 2 , . . ., L′ J−1 , i.e. w.l.o.g. L′ 1 ⊆ C, L′ 2 , . . ., L′ J−1 ⊂ D, that has the property C ∩ L′ J ≠ ∅ ≠ D ∩ L′ J , then for all compatible splits e = A|B separating L′ 1 from L′ J , where w.l.o.g. L′ 1 ⊆ A, L′ J ⊆ B, we have that e separates L′ 1 from all of L′ 2 , . . ., L′ J , i.e. equivalently
Indeed, if K′ = 2 and e = A|B separates L′ 1 from L′ 2 then, w.l.o.g., A = L′ 1 and B = L′ 2 .
In the third step we show the claim. To this end let K′ ≥ 3, 3 ≤ J ≤ K′ , f = C|D as in the claim's hypothesis and suppose that e = A|B is an arbitrary compatible split with
By compatibility of splits we have thus
(ii): We show equivalently K = K′ ⇔ R cut = ∅. "⇒": If K = K′ , then by (i) w.l.o.g. L′ α = L α and in particular e| L′ α = e| L α = e are valid splits for all e ∈ E α , α = 1, . . ., K, so that R cut = ∅. "⇐" follows at once from (xi).
(xiii): Suppose that F′ is a wald with leaf partition L′ 1 , . . ., L′ K′ and |E′ | < 2N − 3. In case of K′ = 1 there is a vertex of degree k ≥ 4, i.e. there is a partition A 1 , . . ., A k of L = L′ 1 with splits and all other splits in E′ are of form where A i is a suitable subset of A i . Then one verifies at once that the new split e := A 1 ∪ A 2 |L \ (A 1 ∪ A 2 ) is compatible with all splits in E′ so that E := E′ ∪ {e} is a wald topology with the desired properties |E| = |E′ | + 1 and E′ < E. For the latter note that R e′ = {e′ } for all e′ ∈ E′ , R cut = ∅ and R dis = {e}.
In case of K′ ≥ 2 introduce the new split f := L′ 1 |L′ 2 and for every e′ 1 = A|B ∈ E′ 1 let e(e′ 1 ) := A|B ∪ L′ 2 , so that e(e′ 1 )| L′ 1 = e′ 1 . Similarly, for every e′ 2 = C|D ∈ E′ 2 let e(e′ 2 ) := C|D ∪ L′ 1 , so that e(e′ 2 )| L′ 2 = e′ 2 . Setting one verifies that all splits in E are pairwise compatible. Hence E is a wald topology with |E| = |E′ | + 1 and E′ < E. Indeed, for the latter note that R e′ = {e(e′ )} for all
In the following theorem, we characterize the boundaries of groves via the partial ordering on wald topologies.
Theorem 3.2.7. For wald topologies E and E′ , the following three statements are equivalent (with ∂G E as in Equation (3.9)): (i) E′ < E,
By injectivity of φ, it suffices to show ( * ): = (ρ′ uv ) N u,v=1 := φ(F′ ). First, observe by Agreement (2.5) that for all u ∈ L, ρ * uu = 1 = ρ′ uu . Next, again from Agreement (2.5), for all u, v ∈ L with u ≠ v that are not connected in F′ , say u ∈ L′ α 1 , v ∈ L′ α 2 for some α 1 , α 2 ∈ {1, . . ., K′ }, we have ρ′ uv = 0. If u and v are also not connected in E, then ρ * uv = 0 = ρ′ uv . Assume now that u and v are connected in E. Then, by Lemma 3.2.6, (xi), there exists an edge A|B = e ∈ R cut with u ∈ A and v ∈ B, and due to λ * e = 1 by construction, ρ * uv = 0 = ρ′ uv . Finally, for all u, v ∈ L that are connected in F′ , we have, due to construction and Lemma 3.2.6, (x), Thus, we have shown φ(F′ ) = φE (λ * ). As F′ = (E′ , λ′ ) was arbitrary, we have shown G E′ ⊂ ∂G E where equality cannot be due to In the following, we will construct As Claim II implies F • = F′ and E • = E′ , in conjunction with Claim I we then obtain the assertion E′ < E.
In order to see Claim I, let φE (λ * ) = (ρ * uv ) N u,v=1 .Denote the connectivity classes of L, where u, v ∈ L are connected if and only if ρ * uv > 0, by By Lemma 3.2.6 (vii), each E • α • comprises compatible splits only so that E • satisfies the restriction property from Definition 3.2.2.
Verifying the cut property, suppose there exist 1 , say, then v ∈ B and hence e ∈ E(u, v) due to Equation (2.8) and hence ρ * uv = 0, due to λ * e = 1, a contradiction to Equation (3.12).Thus the cut property holds.
Having verified all of the properties from Definition 3.2.2, we have shown E • ≤ E, so we can use the notation introduced in Definition 3.2.4, and Lemma 3.2.6 is applicable for E • ≤ E. Since λ * is on the boundary, there must be some e ∈ E with either λ * e = 1 > λ e > 0, or all λ * e < 1 and there is some λ * e = 0 < λ e . In the first case, e ∈ R cut ; in the second case, e ∈ R dis , so that in both cases E • ≠ E by Lemma 3.2.6, (v), yielding E • < E, which was Claim I.
In order to see Claim II we define suitable edge weights λ Indeed, λ We now show the final part of Claim II, namely that φ(F . By Agreement (2.5), for all u ∈ L we have ρ * uu = 1 = ρ • uu and by definition of the connectivity classes For the first set we have But as e ∈ R dis this split does not exist in E • which, taking into account Equation (3.11), is only possible for λ * e = 0.In consequence, we have (the first and the last equality are the definitions, respectively, the second uses that R dis ∩ E(u, v) and R e • , e • ∈ E • (u, v) partition E(u, v) and the third uses for the first factor (3.14) and (3.13) for the second factor) From the above theorem and its proof, we collect at once the following key relationships.
Note that for the boundaries where at least one λ coordinate is one, infinitely many coordinates give the same wald. This is also illustrated in Figure 9 (right panel), where several arrows point to the coordinates on curves that correspond to the same wald. This means that a two-dimensional boundary of the cube collapses into a one-dimensional grove.
If at least two coordinates of λ * are equal to 1, then the corresponding phylogenetic forest will be the forest consisting of three isolated vertices, and in this case, four points as well as the three segments where two coordinates are 1 and one is strictly between zero and one on the boundary of the cube collapse to only one point in W, marked red in Figure 9 (right panel).
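The collapse of the "boundary at one" can be checked numerically. The following sketch assumes the product form ρ uv = ∏ (1 − λ e ) over the edges e separating u and v, with ρ uu = 1, for the fully resolved tree on N = 3 leaves, whose three splits are the pendant edges; the function name `phi_three_leaves` is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def phi_three_leaves(lam: Sequence[float]) -> List[List[float]]:
    """Matrix (rho_uv) for the tree on 3 leaves with pendant edge weights lam.

    Assumption: the path between leaves u and v consists of the two pendant
    edges of u and v, so rho_uv = (1 - lam[u]) * (1 - lam[v]).
    """
    rho = [[1.0 if u == v else 0.0 for v in range(3)] for u in range(3)]
    for u, v in combinations(range(3), 2):
        rho[u][v] = rho[v][u] = (1.0 - lam[u]) * (1.0 - lam[v])
    return rho


def same_wald(a: Sequence[float], b: Sequence[float]) -> bool:
    """Two coordinate vectors give the same wald iff their matrices agree."""
    pa, pb = phi_three_leaves(a), phi_three_leaves(b)
    return all(math.isclose(pa[u][v], pb[u][v], abs_tol=1e-12)
               for u in range(3) for v in range(3))


# One coordinate equal to 1: only the product (1 - lam_2)(1 - lam_3) matters,
# so a whole curve of boundary points is mapped to a single wald.
print(same_wald((1.0, 0.2, 0.5), (1.0, 0.6, 0.0)))  # both products equal 0.4
# Two coordinates equal to 1: every such point gives the identity matrix,
# i.e. the forest of three isolated vertices.
print(phi_three_leaves((1.0, 1.0, 0.3)))
```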
Corollary 3.2.10.Let F, F ∈ W with topologies E, E , respectively, and let With the same argument as in the proof of Theorem 3.1.12,there exists at least one subsequence (λ by definition of ∂G E from Equation (3.9)), then by Theorem 3.2.7 it follows that E < E, so in general E ≤ E.

3.3. Whitney Stratification of Wald Space.
Recall from Section 1.3 the differentiable manifold of strictly positive definite matrices P, and that the tangent space T P P at P ∈ P is isomorphic to the vector space of symmetric matrices S. In order to study convergence of linear subspaces of S, we recall the Grassmannian manifold of k-dimensional linear subspaces in R m , 0 ≤ k ≤ m, see e.g. (Lee, 2018, Chapter 7). Here S(m, k) = {V ∈ R m×k : rank(V ) = k} is the Stiefel manifold of maximal rank (m × k)-matrices, equipped with the smooth manifold structure inherited from its embedding in the Euclidean space R m×k . Since col(V ) = col(V G) for every G ∈ S(k, k) and V ∈ S(m, k), the space

As every orbit {V
) and since for every V ∈ S(m, k) its isotropy group {G ∈ S(k, k) : V G = V } contains the unit matrix only, the quotient carries a canonical smooth manifold structure.
Remark 3.3.2. 1) Note that none of the cluster points of G n or G n /‖G n ‖ can be singular, hence they are all in S(k, k).
2) There may be, however, a sequence
Nevertheless we have the following relationship.
Lemma 3.3.3. Let V n ∈ S(m, k) and assume that the two limits below exist. Then Then the assertion follows, once we show v ⊥ W with W = lim n→∞ V n . By hypothesis, for every ε > 0 there are N ∈ N and Let us first assume that there is a subsequence where If there is no such subsequence, w.l.o.g. we may assume ‖G n ‖ ≤ 1 for all n ≥ N . Again, G n has a cluster point R and thus |v T W R| ≤ ε, which implies, as above, v T W R = 0. Since R ∈ S(k, k) by Remark 3.3.2, we have v T W = 0 as asserted.
In the following, recall the definition of a Whitney stratified space of type (A) and (B), respectively, taken from the wording of Huckemann and Eltzner (2020, Section 10.6).
Definition 3.3.4. A stratified space S of dimension m embedded in a Euclidean space (possibly of higher dimension M ≥ m) is a direct sum
A stratified space S is Whitney stratified of type (A),
(A) if for a sequence q 1 , q 2 , . . . ∈ S j that converges to some point p ∈ S i , such that the sequence of tangent spaces T qn S j converges in the Grassmannian G(M, d j ) to a d j -dimensional linear space T as n → ∞, then T p S i ⊆ T , where all the linear spaces are seen as subspaces of R M .
Moreover, a stratified space S is a Whitney stratified space of type (B),
(B) if for sequences p 1 , p 2 , . . . ∈ S i and q 1 , q 2 , . . . ∈ S j which converge to the same point p ∈ S i such that the sequence of secant lines c n between p n and q n converges to a line c as n → ∞ (in the Grassmannian G(M, 1)), and such that the sequence of tangent planes T qn S j converges to a d j -dimensional plane T as n → ∞ (in the Grassmannian G(M, d j )), then c ⊂ T .
Theorem 3.3.5.Wald space with the smooth structure on every grove G E conveyed by φ E from (3.4), is a Whitney stratified space of type (A).
Proof. First, we show that W is a stratified space. In conjunction with Remark 2.2.4, the manifolds S i of dimension d i = i are the unions over disjoint groves of W of equal dimension i = 0, . . ., 2N − 3 = m, counting the number of edges, each diffeomorphic to an i-dimensional open unit cube, If S i ∩ S̄ j ≠ ∅ for some 0 ≤ i ≠ j ≤ m then there are wald topologies E, E with In particular, then i < j. Further, if E with i = |E | is any other wald topology, induction on Lemma 3.2.6 (xiii) shows that it can be extended to a wald topology E with j = |E| such that E < E and hence G E ⊂ G E by Theorem 3.2.7. Thus, we have shown that S i ⊂ S̄ j , as required.
In order to show Whitney condition (A), it suffices to assume i ≠ j. Let F 1 , F 2 , . . . ∈ S j be a sequence of wälder that converges to some wald F′ = (E′ , λ′ ) ∈ S i , so i < j. Since S j is a disjoint union of finitely many groves, w.l.o.g. we may assume that With the analytic continuation φE of φ E , see Remark 3.1.8, a cluster point ), see Theorem 3.1.12, and the unit standard basis ∂/∂λ e , e ∈ E of G E ∼ = (0, 1) E we have thus and, due to Lemma 3.3.3, Since likewise To see this, it suffices to show that for each e′ ∈ E′ , there exist a constant c > 0 and an edge e ∈ E such that In the following we show (3.16).

Recalling for
1 − λ e from Definition 3.1.7,obtain their derivatives Recall from Corollary 3.2.8 the two relationships between F and φE (λ * ): Consequently, for any e ∈ E α there exists e ∈ R e with λ * e = 0. Now, let u, v ∈ L be arbitrary and for every e ∈ E , we consider e as above.
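The derivatives used in this argument can be illustrated with a short sketch. It assumes the product form ρ uv = ∏ (1 − λ e ) over the edges e ∈ E(u, v) separating u and v, so that the partial derivative with respect to λ e is minus the product over the remaining separating edges if e ∈ E(u, v), and zero otherwise. The function names and the string labels for edges are hypothetical.

```python
from typing import Dict, Sequence


def rho_uv(lam: Dict[str, float], path_edges: Sequence[str]) -> float:
    """Product of (1 - lam_e) over the edges separating u and v."""
    prod = 1.0
    for e in path_edges:
        prod *= 1.0 - lam[e]
    return prod


def d_rho_uv(lam: Dict[str, float], path_edges: Sequence[str], e: str) -> float:
    """Partial derivative of rho_uv with respect to lam_e (assumed formula)."""
    if e not in path_edges:
        return 0.0  # edges that do not separate u and v do not contribute
    prod = -1.0
    for f in path_edges:
        if f != e:
            prod *= 1.0 - lam[f]
    return prod


# Tree on 3 leaves: leaves 1 and 2 are separated by their two pendant edges.
lam = {"1|23": 0.2, "2|13": 0.5, "3|12": 0.7}
edges_12 = ["1|23", "2|13"]
print(d_rho_uv(lam, edges_12, "1|23"))  # -(1 - 0.5) = -0.5
print(d_rho_uv(lam, edges_12, "3|12"))  # 0.0

# Finite-difference check of the closed form.
h = 1e-6
bumped = dict(lam, **{"1|23": lam["1|23"] + h})
print((rho_uv(bumped, edges_12) - rho_uv(lam, edges_12)) / h)
```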

4. Information Geometry for Wald Space
In Garba et al. (2021) we equipped the space of phylogenetic forests with a metric induced from the Fisher-information Riemannian metric g on P (see Section 1.3), where the latter induces the metric d P on P. In this section we show, first, that this induced metric is compatible with the stratification structure of W, and second, that it turns W into a geodesic Riemann stratified space.
4.1. Induced Intrinsic Metric. In Garba et al. (2021) we introduced a metric on W induced from the geodesic distance metric d P of P introduced in Section 1.3. Recalling also the definition of path length L P from Section 1.3, for two wälder
This metric defines the induced intrinsic metric topology on W. In general, this topology may be finer than the one conveyed by making an embedding a homeomorphism, as the following example shows; this is not the case, however, for wald space.
Example 4.1.1.Consider an infinite union of half open intervals in R 2 connected vertically on the right.In the trace topology where the canonical embedding ι : M → R 2 is a homeomorphism, the sequence q n = (0, 1/n) converges to q = (0, 0).For the induced intrinsic metric with the Euclidean length L R 2 , we have, however, d W (q n , q) ≥ 2 for all n ∈ N.
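The gap between the two topologies in this example can be made explicit with a small computation. The sketch below models the set as horizontal unit segments at heights 1/n and 0, joined by a vertical segment at x = 1; this concrete model and the function names are assumptions made for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def euclidean_distance(p: Point, q: Point) -> float:
    """Distance in the ambient plane, which governs the trace topology."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def comb_intrinsic_distance(p: Point, q: Point) -> float:
    """Length of the shortest path that stays inside the comb.

    Points on the same horizontal segment are joined directly; otherwise the
    path runs right to x = 1, along the vertical segment, and back left.
    """
    (x1, y1), (x2, y2) = p, q
    if y1 == y2:
        return abs(x1 - x2)
    return (1.0 - x1) + abs(y1 - y2) + (1.0 - x2)


q = (0.0, 0.0)
for n in (1, 10, 100, 1000):
    q_n = (0.0, 1.0 / n)
    print(n, euclidean_distance(q_n, q), comb_intrinsic_distance(q_n, q))
```

The Euclidean distances 1/n tend to zero, while the intrinsic distances 2 + 1/n stay above 2.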
Theorem 4.1.2.The topology of W obtained from making φ a homeomorphism agrees with the topology induced from the induced intrinsic metric d W .In particular d W turns W into a metric space.
Proof.By definition we have that d W ≥ d P , which implies that sequences that converge with respect to d W also converge with respect to d P .
For the converse, assume that W ∋ F n → F ∈ W w.r.t. d P , as n → ∞. Since there are only finitely many groves in W, it suffices to show that d W (F n , F ) → 0 for F n ∈ G E and F ∈ G E with a common grove G E . Hence, we assume that φ−1 is a path in W connecting γ(0) = F with γ(1) = F n . For k ∈ N and j = 1, . . ., k we note that also uniformly in n ∈ N, due to Remark 3.1.8. In consequence, in conjunction with Section 1.3, with a constant C > 0 independent of n. Letting n → ∞ thus yields the assertion.
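The inequality d W ≥ d P used at the start of the proof can be observed in the simplest case N = 2, where a wald corresponds to a matrix with unit diagonal and off-diagonal entry ρ ∈ [0, 1). The sketch below is an illustration under stated assumptions: the affine invariant norm is taken as ‖X‖ P ² = tr((P⁻¹X)²) without any further scaling factor, and the function names are hypothetical. Since the eigenvalues of the matrix are 1 ± ρ, both quantities have elementary expressions.

```python
import math


def d_P_to_identity(rho: float) -> float:
    """Ambient distance from the identity to [[1, rho], [rho, 1]].

    Assumed scaling: d_P(I, P) is the Frobenius norm of log(P); the
    eigenvalues of P are 1 + rho and 1 - rho.
    """
    return math.sqrt(math.log(1.0 + rho) ** 2 + math.log(1.0 - rho) ** 2)


def speed(rho: float) -> float:
    """Norm of the velocity of the curve rho -> [[1, rho], [rho, 1]]."""
    return math.sqrt(1.0 / (1.0 + rho) ** 2 + 1.0 / (1.0 - rho) ** 2)


def d_W_to_identity(rho: float, steps: int = 10000) -> float:
    """Length of that curve from 0 to rho (midpoint rule).

    For N = 2 the image of wald space is this single curve, so its arc
    length is the induced intrinsic distance to the disconnected forest.
    """
    h = rho / steps
    return h * sum(speed((i + 0.5) * h) for i in range(steps))


for rho in (0.1, 0.5, 0.9):
    print(rho, d_P_to_identity(rho), d_W_to_identity(rho))
```

In each case the intrinsic length is at least the ambient distance; the comparison does not depend on the assumed scaling of the metric.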
4.2. Geodesic Space and Riemann Stratification. Having established the equivalence between the stratification topology and that of the Fisher information metric, we no longer distinguish between them.
Theorem 4.2.1.The wald space equipped with the information geometry is a geodesic metric space, i.e. every two points in (W, d W ) are connected by a minimising geodesic.
Proof. By (Lang, 1999, p. 325), (P, g) is geodesically complete as a Riemannian manifold and thus, by the Hopf-Rinow Theorem for Riemannian manifolds (see, among others, (Lang, 1999, p. 224)), it follows that (P, d P ) is complete and locally compact. By Corollary 3.1.3, φ(W) is a closed subset of the complete and locally compact metric space P, so (φ(W), d P ) is itself complete and locally compact, and so is (W, d P ). By (Garba et al., 2021, Theorem 5.1), any two wälder are connected by a continuous path of finite length in (W, d P ), which is complete, and thus applying (Hu and Kirk, 1978, Corollary on p. 123) yields that (W, d W ) is complete. Applying the Hopf-Rinow Theorem for metric spaces (Bridson and Haefliger, 1999, p. 35) to (W, d W ), the assertion holds.
Following Huckemann and Eltzner (2020, Section 10.6), we extend the notion of a Whitney stratified space in Definition 3.3.4 to the notion of a Riemann stratified space.
Definition 4.2.2. A Riemann stratified space is a Whitney stratified space S of type (A) such that each stratum S i is a d i -dimensional Riemannian manifold with Riemannian metric g i , and such that the following holds: whenever a sequence q 1 , q 2 , . . . ∈ S j converges to a point p ∈ S i (where we assume again that the sequence of tangent planes T qn S j converges to some d j -dimensional plane T as n → ∞), the Riemannian metric g j qn converges to some two-form g * p : T ⊗ T → R with g i p ≡ g * p | TpSi⊗TpSi .
Theorem 4.2.3. The wald space W equipped with the information geometry is a Riemann stratified space.
Proof.As we impose the Riemannian metric g from P onto all of φ(W) ⊂ P, the assertion follows immediately.

5. Numerical Exploration of Wald Space
In this section we propose a new algorithm to approximate geodesics between two fully resolved trees F 1 and F 2 , which is a mixture of the successive projection algorithm and the extrinsic path straightening algorithm from Lueg et al. (2021). This algorithm allows us to explore curvature and the so-called stickiness of Fréchet means.
5.1. Approximating Geodesics in Wald Space. From the ambient geometry of P, recalling the notation from Section 1.3, we employ the globally defined Riemannian exponential Exp and logarithm Log at P ∈ P, with Q ∈ P, X ∈ T P P, as well as points on the unique (if P ≠ Q) geodesic γ P,Q in P comprising P and Q.
Return: The current discrete path Γ, which is a discrete approximation of the geodesic between F and F′ with 2^I (n 0 − 1) + 1 points.
While Theorem 4.2.1 guarantees the existence of a shortest path between any F, F ∈ W, it may not be unique, and it is not certain whether the path found by the algorithm is near a shortest path or represents just a local approximation.
To better assess the quality of the approximation Γ = (F 0 , . . ., F n ) found by the algorithm, Rumpf and Wirth (2015) propose considering its energy, yielding a means of comparison for discrete paths with equal number of points.
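The role of the energy as a quality measure can be illustrated on a toy example. The sketch below assumes the normalisation E(Γ) = n · Σ d(F i−1 , F i )² for a discrete path with n segments, which is one common convention and not necessarily the exact formula used here; the function names are hypothetical. By the Cauchy-Schwarz inequality, this energy is bounded below by the squared length, with equality exactly when all segments have equal length.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def path_length(path: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Sum of the distances between consecutive points of a discrete path."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def path_energy(path: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Assumed discrete energy: n times the sum of squared segment lengths."""
    n = len(path) - 1
    return n * sum(dist(a, b) ** 2 for a, b in zip(path, path[1:]))


def line_dist(a: float, b: float) -> float:
    """Toy distance on the real line, standing in for the wald space metric."""
    return abs(a - b)


even = [0.0, 1.0, 2.0]    # equally spaced points
uneven = [0.0, 0.5, 2.0]  # same end points and length, unequal spacing
print(path_length(even, line_dist), path_energy(even, line_dist))      # 2.0 4.0
print(path_length(uneven, line_dist), path_energy(uneven, line_dist))  # 2.0 5.0
```

Two discrete paths with the same number of points can therefore be compared by their energies even when their lengths coincide.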
This path is depicted in Figure 11, as well as the BHV space geodesic (which is a straight line with respect to the ℓ-parametrization from Definition 2.1.2), first in the coordinates λ ∈ (0, 1) 3 and second embedded into P viewed as R 3 , cf. Figure 3.
In contrast to the BHV geometry, the shortest path in the wald space geometry sojourns on the two-dimensional boundary, where the coordinate λ 1 is zero for some time.The end points λ (1) , λ (2) , are trees that show a high level of disagreement over the location of taxon 1, but a similar divergence between taxon 2 and taxon 3. The section of the approximate geodesic with λ 1 = 0 represents trees on which the overall divergence between taxon 1 and the other two taxa is reduced.In this way, the conflicting information in the end points is resolved by reducing the divergence (and hence increasing the correlation) between taxon 1 and the other two taxa, in comparison to the BHV geodesic which has λ 1 > 0 along its length.
5.2. Exploring Curvature of Wald Space. Since curvature computations involving higher order tensors are heavy on indices, we keep notation as simple as possible in the following by indexing splits in E by h, i, j, k, m, s, t ∈ E.
The concepts of transformation of metric tensors, Christoffel symbols and curvature employed in the following can be found in any standard textbook on differential geometry, e.g. Lang (1999); Lee (2018).
Recall that the Riemannian structure of wald space is inherited on each grove G E ∼ = (0, 1) E from the information geometric Riemann structure of P pulled back from φ E : (0, 1) E → P. In consequence, the Riemannian metric tensor g of G E , evaluated at λ ∈ (0, 1) E , is given by the Riemannian metric tensor g P λ at φ E (λ) = P , where base vectors transform under the derivative of φ E : As usual (g ij ) i,j∈E denotes the matrix of g in standard coordinates and (g^{ij} ) i,j∈E its inverse. This yields the Christoffel symbols for i, j, m ∈ E, which give the representation of the curvature tensor Introducing the notation (P = φ E (λ)) and performing a longer calculation in coordinates i, j ∈ E, gives Evaluating the sectional curvature tensor at a pair of tangent vectors x, y ∈ T λ G E ∼ = R E at λ gives the sectional curvature K(x, y) at λ of the local two-dimensional subspace spanned by geodesics with initial directions generated by linear combinations of x and y. Abbreviating |x|
Figure 11. The wald space geodesic (red) between fully resolved phylogenetic forests F 1 , F 2 ∈ W (N = 3) sojourns on the boundary (brown). The image of the BHV space geodesic (blue) remains in the grove as discussed in Example 5.1.2. In λ-representation (left) and embedded in P viewed as R 3 (right, cf. Figure 3).
Example 5.2.1. Again revisiting W from Example 3.2.9 with N = 3, we first consider wälder in the unique top-dimensional grove G E ∼ = (0, 1) 3 , and then on its boundary.
(1) We compute minimum and maximum sectional curvatures at the wälder F with λ = (a, a, a), for a ∈ (0, 1), as displayed in Figure 12. Traversing along 0 < a < 1 we find both positive and negative sectional curvatures and their extremes escape to positive and negative infinity as the vantage point, the isolated forest F ∞ , is approached, where all dimensions collapse.
(2) In order to assess Alexandrov curvature, which measures "fatness/slimness" of geodesic triangles (hence it does not require a Riemannian structure; a geodesic space suffices, see Sturm (2003)), we compute several geodesic triangles and their respective angle sums within W. The corners of the triangles are wälder F 1 , F 2 , F 3 ∈ W, where {i, j, k} = {1, 2, 3}, E i := {j|k} and λ j|k ∈ (0, 1]. Figure 13 depicts the geodesic triangles (left panel) non-isometrically embedded in R 3 representing the off-diagonals in P, as well as their respective angle sums (right panel). When the two connected leaves approach one another (λ e ≈ 0) triangles become infinitely thin, but near F ∞ (λ e ≈ 1) the triangles become Euclidean.
Conjecture 5.2.2. This example hints towards a general situation:
(i) Wald space groves feature positive and negative sectional curvatures alike, both of which become unbounded when approaching the vantage point F ∞ .
(ii) When approaching the infinitely far away boundary of P from within W, some Alexandrov curvatures tend to negative infinity.
Recently, it has been discovered by Hotz and Huckemann (2015); Eltzner and Huckemann (2019) that positive curvatures may increase asymptotic fluctuation by orders of magnitude, and by Hotz et al. (2013); Huckemann et al. (2015) that infinite negative Alexandrov curvature may completely cancel asymptotic fluctuation, putting a dead end to this approach of non-Euclidean statistics. In particular, this can be the case for BHV spaces, cf. Barden et al. (2013, 2018); Barden and Le (2018).
Example 5.3.1 (Stickiness in wald space). Consider two samples F 1 , F 2 , F 3 ∈ W and F′ 1 , F 2 , F 3 ∈ W with N = 4, depicted in Figure 14, where F 1 and F′ 1 only differ by weights of their interior edges. By symmetry, their Fréchet means are of form F having equal but unknown pendant edge weights 0 < λ pen < 1 and unknown interior edge weights 0 ≤ λ int < 1, as in Figure 14. It turns out that the Fréchet means of both samples agree in BHV with λ int = 0, i.e. the empirical mean sticks to the lower dimensional star tree stratum (featuring only pendant edges).
In contrast, the two empirical Fréchet functions in wald space have different minimizers, and, in particular, the minimizer for F does not stick to the star stratum but has λ int > 0. Figure 15 illustrates the values of F and F for different values of the parameters λ pen , λ int of F near the respective minima.
Remark 5.3.2. This preliminary research indicates that effects of stickiness, which are still expected where "too many" lower dimensional strata hit higher dimensional strata, are less severe in wald space than in BHV space, thus making wald space more attractive for asymptotic statistics based on Fréchet means.
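The phenomenon of stickiness itself can be reproduced on a standard toy stratified space, the 3-spider (three half-lines glued at a common centre). This is not wald space or BHV space; the sketch, its grid search and all names are assumptions chosen for illustration. On the spider, the Fréchet mean remains at the centre under small perturbations of the sample and only moves onto a leg once one sample point outweighs the other two.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (leg index, distance from the centre)


def spider_distance(p: Point, q: Point) -> float:
    """Geodesic distance on the spider: paths between legs pass the centre."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    return abs(r_p - r_q) if leg_p == leg_q else r_p + r_q


def frechet_function(x: Point, sample: List[Point]) -> float:
    """Mean of squared distances from x to the sample points."""
    return sum(spider_distance(x, s) ** 2 for s in sample) / len(sample)


def frechet_mean_grid(sample: List[Point], legs: int = 3,
                      r_max: float = 5.0, steps: int = 5000) -> Point:
    """Approximate minimiser of the Fréchet function by a grid search."""
    candidates = [(leg, i * r_max / steps)
                  for leg in range(legs) for i in range(steps + 1)]
    return min(candidates, key=lambda x: frechet_function(x, sample))


symmetric = [(0, 1.0), (1, 1.0), (2, 1.0)]
perturbed = [(0, 1.2), (1, 1.0), (2, 1.0)]  # small change: the mean stays put
dominant = [(0, 3.0), (1, 1.0), (2, 1.0)]   # large change: the mean moves
for sample in (symmetric, perturbed, dominant):
    print(frechet_mean_grid(sample))
```

For the first two samples the minimiser is at distance 0 from the centre; for the third it lies on leg 0 at a distance of about 1/3.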

6. Discussion
In previous work (Garba et al., 2021), the wald space was introduced as a space for statistical analysis of phylogenetic trees, based on assumptions with a stronger biological motivation than existing spaces. In that work, the focus was primarily on geometry, whereas here we have provided a rigorous characterization of the topology of wald space. Specifically, wald space W is a disjoint union of open cubes with the Euclidean topology, and as topological subspaces we have with the BHV space BHV N −1 from Billera et al. (2001) and the edge-product space E N from Moulton and Steel (2004). We have shown that this topology is the same as that induced by the information metric d W defined in Garba et al. (2021). Furthermore, we have shown W is contractible, and so does not contain holes or handles of any kind. Examples suggest that W is a truncated cone in some sense (see Figure 4), but its precise formulation remains an open problem. As established in Theorem 3.3.5, boundaries between strata in wald space satisfy Whitney condition (A); whether Whitney condition (B) holds is an open problem, although we expect it to hold on the boundaries of any grove (0, 1) E ∼ = G E corresponding to the limit as one or more coordinates λ e → 0 (i.e. the boundaries between strata in BHV N −1 ). Our key geometrical result is that with the metric d W , wald space is a geodesic metric space, Theorem 4.2.1. The existence of geodesics greatly enhances the potential of wald space as a home for statistical analysis.
The approximate geodesics computed via the algorithm in Definition 5.1.1 provide insight into the geometry and a source of conjectures. For example, unlike geodesics in BHV tree space, it appears that geodesics in wald space can run for a proportion of their length along grove boundaries, even when the end points are within the interior of the same grove (see Example 5.1.2). If wald space is uniquely geodesic (so that there is a unique geodesic between any given pair of points), its potential as a home for statistical analysis would be improved further. However, the presence of positive and negative sectional curvatures for different pairs of tangent vectors at the same point, and an apparent lack of global bounds on these, suggests that geodesics may be non-unique, or at least makes proving uniqueness more challenging. Finally, Example 5.3.1, which involves approximate calculation of Fréchet means, suggests that wald space is less 'sticky' than BHV tree space and hence more attractive for studying asymptotic statistics.
A variety of open problems remain, and we make the following conjectures.
(1) All points on any geodesic between two trees are also trees.
(2) Geodesics between trees in the same grove do not leave the closure of that grove.
(3) The disconnected forest F ∞ is repulsive, in the sense that the only geodesics passing through the disconnected forest have an end point there.
Other open problems include the following, all mentioned elsewhere in the paper.
(4) Is wald space a truncated topological cone?
(5) Does Whitney condition (B) hold at grove boundaries?
(6) Most importantly for statistical applications, is wald space uniquely geodesic or can examples of exact non-unique geodesics be constructed? What is then the structure of cut loci?

Figure 4. Depicting the off-diagonal matrix entries of φ(W) embedded in P for N = 3 (orange boundary) in a 3-dimensional coordinate system (cf. Example 3.1.4) and the 2-dimensional images φ(B a ) (purple) of the slices B a for a = 0.2, 0.87, 0.997 (from left to right).

Figure 5.A sequence of wälder (left and middle) converging (right) but having different λ cluster points as detailed in Example 3.1.13.

3.2. At Grove's End. In light of Theorem 3.1.12, we investigate how two wald topologies E = [F ] and E′ = [F′ ] are related to each other.
Definition 3.2.1. Let F ∈ W be a wald with topology E = [F ]. For an edge e = A|B ∈ E, we define the edge restricted to some subset L′ ⊆ L by e|

Figure 6.Wald topologies E, E 1 and E 2 from Example 3.2.3,where E 1 < E but E 2 ≤ E. Definition 3.2.4.Let E, E be wald topologies with E ≤ E.(1) For each edge e ∈ E α , 1 ≤ α ≤ K , denote the set of all corresponding splits in E by R e := e ∈ E : e| L α = e .
in conjunction with the R e′ over all e′ ∈ E′ (u, v) give a pairwise disjoint union of E(u, v), where R dis ∩ E(u, v) might be empty.
(xi) For any L′ α′ , L′ α″ ⊂ L α with α′ ≠ α″ , there exists a split A|B = e ∈ E with L′ α′ ⊆ A, L′ α″ ⊆ B and e ∈ R cut .
Let F, F′ ∈ W with ρ = φ(F ), ρ′ = φ(F′ ) and topologies E and E′ , respectively, with label partitions L 1 , . . ., L K and L′ 1 , . . ., L′ K′ , respectively. Then the following hold.
(xii) If for all u, v ∈ L: ρ uv = 0 =⇒ ρ′ uv = 0, then L′ 1 , . . ., L′ K′ is a refinement of L 1 , . . ., L K .
Finally, we have the general result
(xiii) For every wald topology E′ with |E′ | < 2N − 3 there is a wald topology E with |E| = |E′ | + 1 and E′ < E.
Due to E′ ≤ E, by the cut property there exists C|D = ẽ ∈ E α separating L′ α′ and L′ α″ , i.e. L′ α′ ⊆ C and L′ α″ ⊆ D. But then ẽ, e ∈ E α cannot be compatible, a contradiction.
(vii): Let L′ α′ ⊂ L α and e′ , e″ ∈ E| L′ α′ such that e′ = e| L′ α′ and e″ = e • | L′ α′ with e = A|B ∈ E and e • = A • |B • ∈ E. Then e, e • ∈ E α for otherwise their restrictions to L′ α′ would not be valid splits. Since e and e • are compatible, w.l.o.g. A ∩ A • = ∅.
Figure 9. Depicting the grove G E ∼ = (0, 1) 3 of a fully resolved tree with N = 3 leaves, and its boundary ∂G E , as discussed in Example 3.2.9. Left: G E and its two-dimensional "boundary at zero" (coordinate axes are excluded). Right: the "boundary at one" comprising the one-dimensional component (points on the same blue curves represent a single wald) and the zero-dimensional component (points on the red spider represent F ∞ ).
and λ * e = 0 as well as for each e ∈ E , (3.19)

.
Thus for c in (3.16) any positive constant can be chosen.
(2) Case e = A|B ∈ E(u, v). W.l.o.g. assume that u ∈ A and v ∈ B. Then there are two subcases: (a) e ∉ E (u, v). On the one hand, as above this implies ∂φ E /∂λ e (λ ) uv = 0, on the other hand, as e 16) as it does not depend on u and v and is non-zero by Equation (3.19). Having thus shown (3.16), as detailed above we have established (3.15), thus verifying Whitney condition (A).

Figure 10.Depicting the distance d of an arbitrary wald, to the disconnected forest F ∞ as a function of λ 1 (left) and between any two other wälder in W, as a function of (λ 1 , λ 2 ) (right) as detailed in Example 4.2.4.

Figure 12.Minimum and maximum sectional curvatures along 0 < a < 1 of wald space (N = 3) at F ∈ W with λ = (a, a, a), as described in Example 5.2.1.

Figure 13. Displaying sums of angles in degrees (right) of geodesic triangles spanned by three wälder for N = 3 with one disconnected leaf and edge weight 0 < λ e < 1 between the other two leaves as discussed in Example 5.2.1. Embedding W in P viewed (non-isometrically) as R 3 , the geodesic triangles are visualized on the left, where the origin corresponds to λ e = 1.

Figure 14.Two samples of wälder: F 1 , F 2 , F 3 and F 1 , F 2 , F 3 where F 1 and F 1 only differ by weights of their interior edges.By symmetry, F is a candidate for each Fréchet mean, see Example 5.3.1.