The Erlang Weighted Tree, A New Branching Process

In this paper, we study a new discrete tree and the resulting branching process, which we call the \textbf{E}rlang \textbf{W}eighted \textbf{T}ree (\textbf{EWT}). The EWT appears as the local weak limit of a random graph model proposed in~\cite{La2015}. In contrast to the local weak limits of well-known random graph models, the EWT has an interdependent structure. In particular, its vertices encode a multi-type branching process with uncountably many types. We derive the main properties of the EWT, such as its probability of extinction and growth rate. We show that the probability of extinction is the smallest fixed point of an operator. We then take a point process perspective and analyze the growth rate operator. We derive the Krein--Rutman eigenvalue $\beta_0$ and the corresponding eigenfunctions of the growth operator, and show that the probability of extinction equals one if and only if $\beta_0 \leq 1$.


Introduction
This chapter deals with a random tree object called the Erlang Weighted Tree (EWT). The construction of the EWT begins with a "backbone tree", which has more edges than the EWT; some of these edges are then pruned to obtain the EWT.
Let $\mathbb{N}_f$ denote the set of labels of the vertices of an infinite tree. Each $i \in \mathbb{N}_f$ is associated with three types of random variables: 1) $n_i$, the potential number of descendants of the vertex $i$; 2) $v_i$, the value associated with the vertex $i$; and 3) $\{\zeta_{(i,j)}\}_{j=1}^{n_i}$, the costs of the potential edges $\{i, (i,j)\}$ for $j \in \{1, 2, \ldots, n_i\}$. The probability distribution of $n_ø$ is given by $P \in \mathcal{P}(\mathbb{N})$, which is assumed to have a finite mean. The probability distribution of $n_i$ for a non-root vertex $i$ is given by the shifted distribution $\hat{P} \in \mathcal{P}(\mathbb{Z}_+)$, i.e., $\hat{P}(k) = P(k+1)$ for all $k \geq 0$. Conditioned on $n_i$, $v_i$ is distributed as $\mathrm{Erlang}(\cdot\,; n_i+1, \lambda)$ for a positive and fixed real value $\lambda$. Conditioned on $n_i$ and $v_i$, $\{\zeta_{(i,j)}\}_{j=1}^{n_i}$ are $n_i$ independent random variables, each uniformly distributed over the interval $[0, v_i]$. When $n_i = 0$, there are no potential edges emanating from vertex $i$. The backbone tree is the connected component of ø with the potential edges as its edge set. The edges of the backbone tree are pruned to obtain the EWT. Define a rooted tree $T_\bullet = (V, E, ø, w_v, w_e)$, rooted at ø, by preserving the edge between the vertices $i$ and $(i,j)$ if and only if $\zeta_{(i,j)} < v_{(i,j)}$. The mark functions record the types and the costs: each surviving edge $\{i, (i,j)\}$ carries the mark $\zeta_{(i,j)}$, and each non-root vertex $i$ carries the mark $(n_i + 1, v_i)$ (see Remark 2). The random rooted tree $T_\bullet$ is called an Erlang Weighted Tree with potential degree distribution $P$. Henceforth, we call $P$ the potential degree distribution. Let $[T_\bullet]$ denote the equivalence class of $T_\bullet$ up to isomorphisms (over vertex relabelings that preserve the root). Denote by $\mathrm{Er}(P, \lambda)$ the probability distribution of $[T_\bullet]$ in $G_*$, the set of rooted marked graphs up to isomorphisms. For a formal definition of $G_*$ and related background material, see Section 2.1.
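Several of the quantities derived below (degree distribution, extinction, growth) can be sanity-checked by simulation. The following is a minimal sketch of the construction just described, with $\lambda = 1$ by default; the samplers `sample_P` and `sample_P_hat` (for $P$ and the shifted distribution $\hat P$) are supplied by the user, the recursion is truncated at a finite depth, and all function names are illustrative rather than part of the formal model.

```python
import math
import random

def sample_erlang(k, lam=1.0):
    # Erlang(k, lam) is the sum of k independent Exp(lam) random variables.
    return sum(-math.log(random.random()) / lam for _ in range(k))

def sample_ewt(sample_P, sample_P_hat, max_depth, lam=1.0,
               n_i=None, v_i=None, is_root=True):
    """Sample the first max_depth generations of the EWT component of one vertex.

    sample_P     : sampler for the root's potential degree, n ~ P
    sample_P_hat : sampler for non-root potential degrees, P_hat(k) = P(k+1)
    Returns ((n_i, v_i), children), where children is the list of kept subtrees.
    """
    if n_i is None:                               # sample this vertex's own type
        n_i = sample_P() if is_root else sample_P_hat()
        v_i = sample_erlang(n_i + 1, lam)
    children = []
    if max_depth > 0:
        for _ in range(n_i):
            zeta = random.uniform(0.0, v_i)       # cost of the potential edge
            n_c = sample_P_hat()                  # potential degree of the child
            v_c = sample_erlang(n_c + 1, lam)     # value of the child
            if zeta < v_c:                        # pruning rule: keep the edge iff zeta < v_child
                children.append(sample_ewt(sample_P, sample_P_hat, max_depth - 1,
                                           lam, n_i=n_c, v_i=v_c, is_root=False))
    return (n_i, v_i), children

# Illustrative choice of P (uniform on {1,...,5}, so that P_hat(k) = P(k+1) is uniform on {0,...,4}):
# tree = sample_ewt(lambda: random.randint(1, 5), lambda: random.randint(0, 4), max_depth=3)
```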

Remark 1.
The parameter λ in the definition of Er(P, λ) appears only as a scaling factor. This value is usually set to 1, and for ease of notation, Er(P) is used instead of Er(P, 1).

Remark 2.
Throughout the chapter, a non-root vertex $i$ with the mark $(n_i + 1, v_i)$ will be referred to as a vertex of type $(n_i, v_i)$.
We will show that the EWT appears as the local weak limit of a random graph model introduced by La and Kabkab in [15]. The graph construction starts with a complete graph $K_n = ([n], E_n)$, a sequence of positive integers $d_n = (d_1(n), d_2(n), \ldots, d_n(n))$, and a random cost function $C_n$ that independently assigns non-negative real values to the edges of $K_n$. The value of $d_i(n)$ indicates the number of neighbors that vertex $i$ wants to connect to. The value assigned to each edge by $C_n$ is an independent exponentially distributed random variable with parameter $1/n$ that represents the cost of the edge. Each vertex $i$ then selects its $d_i(n)$ lowest-cost incident edges and declares them to be its preferred edges. The random graph $G_n = ([n], E'_n)$ is constructed by keeping only those edges of $E_n$ that are preferred by both end vertices. This model is closely related to the $k$-th nearest neighbor graphs studied by Cooper and Frieze in [6], in which a connection survives as long as one endpoint is interested in it. The bilateral agreement required in the above random graph model makes the analysis much more challenging.

Main Results
In this work, we derive the following main properties of the EWT.
(ii) Let $\mathrm{Bi}(\cdot\,; n, p)$ denote the binomial distribution with parameters $n \in \mathbb{N}$ and $p \in [0, 1]$. The degree distribution of the root is a mixture of such binomials, given in Section 4. Note that this is the asymptotic degree distribution of the random graph family in [15]. Unlike the canonical branching processes, the degree distribution of a vertex at depth $l \geq 0$ depends on $l$.
(iii) The probability of extinction is characterized by a function $q(\cdot) \in C^1(\mathbb{R}_+; [0, 1])$, the smallest fixed point (pointwise smaller than all other fixed points) of an operator $T : L(\mathbb{R}_+; [0, 1]) \to C^1(\mathbb{R}_+; [0, 1])$ defined in Section 4. This fixed point is also the pointwise limit of $T^l(0)(\cdot)$ as $l$ goes to infinity, where $0(\cdot)$ is the zero function. If the probability of extinction equals 1, then the function $q(x) \equiv 1$ for all $x \geq 0$ is the unique fixed point of $T$ (up to sets of measure 0). If the probability of extinction is smaller than 1, then the operator $T$ has exactly two fixed points: $q(\cdot)$ and $1(\cdot)$, where $1(\cdot)$ is the all-ones function.
(v) Let the assumption of part (iv) hold and let $\beta_0 > 1$. Then there is a random variable $W$ such that $Z_l/\beta_0^l$ converges to $W$ almost surely and in $L^2$. Moreover, $Z_l \sim \beta_0^l W$, i.e., $\beta_0$ is the growth rate of $Z_l$, and the proportion of the various types converges to a non-random limit.
(vi) Let the assumption of part (iv) hold. If $\beta_0 > 1$, then the probability of extinction is less than 1; otherwise it equals 1. Moreover, if $\beta_0 > 1$, then the number of vertices at generation $n$ goes to either 0 or $\infty$ as $n \to \infty$.
The organization of the rest of the chapter is as follows. In Section 2, we provide the necessary background: a short background on random graphs and local weak convergence, a short note on the point process perspective of a branching process, and a short description of spectral theory for compact self-adjoint bounded linear operators. In Section 3, we describe the finite graph model and discuss its local weak convergence to the EWT. In Section 4, we establish the unimodularity of the EWT, and then derive the degree distribution of the root vertex, the expected number of vertices at generation $l$, and the probability of extinction; finally, we take the point process perspective and derive the growth rate of the branching process and the phase transition. In Section 5, we present some numerical illustrations of our results. Some proofs are deferred to the Appendix for ease of presentation.

Background Material
In this section, we present the necessary background for the rest of the chapter. The essential background on "random graphs and local weak convergence" is mostly based on the lecture notes by Bordenave [5] and the work of Aldous and Lyons [2]. The background on the "point process perspective of a branching process" is based on Chapter 3 of Harris's book [12]. We use this background in Section 4.5, which proves the most significant result of our work. The background on the "spectral theorem for compact self-adjoint bounded linear operators" is based on a classical textbook in functional analysis by Lax [16] and the work of Toland [18]. The related topics from this subject are used in Section 4.5; however, we will rederive the main theorems presented in this section using a probabilistic approach.

Random Graphs and Local Weak Convergence
We start with some graph terminology used in the chapter. Let $G = (V, E)$ denote an undirected graph, where $V$ is the set of vertices (finite or countably infinite) and $E$ is the set of edges. A rooted graph $G_\bullet = (V, E, ø)$ is a graph with a distinguished vertex $ø \in V$. Vertices $v_1, v_2 \in V$ are said to be neighbors if $\{v_1, v_2\} \in E$. The degree of a vertex $v \in V$, denoted by $d_v$, is the number of its neighbors. A graph $G$ is said to be locally finite if the degree of each vertex is finite. A path $p$ of length $n-1$ is an ordered sequence of vertices $(v_1, v_2, \ldots, v_n)$ where $\{v_i, v_{i+1}\} \in E$ for all $i < n$. A graph $G$ is said to be connected if there is a path between every pair of vertices.
Two graphs $G = (V, E)$ and $G' = (V', E')$ are said to be isomorphic if there is a bijection $\sigma$ from $V$ to $V'$ such that $\{v_1, v_2\} \in E$ if and only if $\sigma(\{v_1, v_2\}) := \{\sigma(v_1), \sigma(v_2)\} \in E'$. The function $\sigma$ is called an isomorphism from $G$ to $G'$. A rooted isomorphism between two rooted graphs is an isomorphism that maps the root vertices to each other.
A network $N = (V, E, w_v, w_e)$ is a graph $(V, E)$ with mark functions $w_v : V \to \Omega_1$ and $w_e : E \to \Omega_2$, where $\Omega_1$ and $\Omega_2$ are the mark spaces. A rooted network is a network with a distinguished vertex as the root vertex. In this chapter, the mark spaces are assumed to be $\Omega_1 = \mathbb{N} \times \mathbb{R}_+$ and $\Omega_2 = \mathbb{R}_+$, which are complete separable metric spaces equipped with the metrics
$$d_{\Omega_1}((m, x), (n, y)) = \sqrt{(m-n)^2 + (x-y)^2} \quad \forall m, n \in \mathbb{N},\ \forall x, y \in \mathbb{R}_+, \qquad d_{\Omega_2}(x, y) = |x - y| \quad \forall x, y \in \mathbb{R}_+.$$
Two networks $N$ and $N'$ are said to be isomorphic if there is a bijection from $V$ to $V'$ that preserves the edges as well as the marks. A rooted isomorphism between two rooted networks $N_\bullet$ and $N'_\bullet$ is an isomorphism that maps the root of one network to the root of the other. For a rooted network $N_\bullet = (V, E, ø, w_v, w_e)$, $[N_\bullet]$ denotes the class of rooted networks that are isomorphic to $N_\bullet$. Let $G_*(\Omega_1, \Omega_2)$ denote the set of all isomorphism classes $[N_\bullet]$, where $N_\bullet$ ranges over all connected locally finite rooted networks with mark spaces $\Omega_1$ and $\Omega_2$. For notational simplicity, we write $G_*$ instead of $G_*(\Omega_1, \Omega_2)$.
There is a natural way to define a metric on $G_*$. Consider a connected rooted network $N_\bullet = (V, E, ø, w_v, w_e)$ and the corresponding rooted graph $G_\bullet = (V, E, ø)$. The depth of a vertex $v \in V$ is defined to be the infimum of the lengths of the paths from $v$ to the root vertex. Let $(G_\bullet)_t = (V_t, E_t, ø)$ denote the subgraph of $G_\bullet$ where $V_t$ is the set of vertices in $V$ at depth less than or equal to $t$ from ø, and $E_t$ is the set of edges in $E$ between the vertices in $V_t$. For any $[N_\bullet], [N'_\bullet] \in G_*$, a natural distance is
$$d_{G_*}([N_\bullet], [N'_\bullet]) = \frac{1}{1+T},$$
where $T = \sup\{t \geq 0 :$ there exists a rooted isomorphism $\sigma$ from $(G_\bullet)_t$ to $(G'_\bullet)_t$ such that $\forall v \in V_t, \forall e \in E_t$, $d_{\Omega_1}(w_v(v), w'_v(\sigma(v))) < t^{-1}$ and $d_{\Omega_2}(w_e(e), w'_e(\sigma(e))) < t^{-1}\}$.
The space G * equipped with d G * is a complete separable metric (Polish) space [5]. Define P(G * ) as the set of all probability measures on G * and endow this space with the topology of weak convergence. Since G * is a Polish space, the space P(G * ) is a Polish space as well [5] with the Lévy-Prokhorov metric.
The members of $G_*$ are unlabeled connected locally finite rooted networks; however, there is a way to generalize the framework to unrooted, not necessarily connected, finite networks. Consider a finite network $N = (V, E, w_v, w_e)$. For every vertex $v \in V$, define $N(v)$ to be the connected component of the vertex $v$ in the network $N$. Let $N_\bullet(v)$ denote the rooted version of $N(v)$, rooted at $v$, and define $\delta_{[N_\bullet(v)]} \in \mathcal{P}(G_*)$ to be the Dirac measure that assigns 1 to $[N_\bullet(v)]$ and 0 to any other member of $G_*$. Define $U(N) \in \mathcal{P}(G_*)$ as
$$U(N) := \frac{1}{|V|} \sum_{v \in V} \delta_{[N_\bullet(v)]}.$$
The probability measure $U(N)$ is the law of $[N_\bullet(ø)]$, where $ø \in V$ is picked uniformly at random. This probability measure captures the local structure of $N$ as viewed from a randomly chosen vertex. The notion of local weak convergence studies the weak limit of $\{U(N_n)\}_{n \geq 0}$ for a sequence of finite networks $\{N_n\}_{n \geq 0}$.
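As an illustration of $U(N)$ for a finite (unmarked) graph, the following sketch tabulates the empirical distribution of rooted depth-$t$ neighborhoods; the function `canon` stands in for a rooted-isomorphism canonical form and is an assumption of this sketch, not part of the formal development (a crude stand-in is shown in the usage comment).

```python
from collections import Counter, deque

def ball(adj, root, t):
    """Graph distances from root, restricted to the ball of radius t (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def empirical_neighborhood_law(adj, t, canon):
    """U(N) restricted to depth-t neighborhoods: each vertex is taken as the root
    with probability 1/|V| and a canonical key of its rooted t-ball is recorded."""
    counts = Counter(canon(adj, v, ball(adj, v, t)) for v in adj)
    n = len(adj)
    return {key: c / n for key, c in counts.items()}

# Crude stand-in for `canon`: the sorted multiset of distances within the t-ball
# (a genuine canonical form would have to respect rooted isomorphism and marks).
# law = empirical_neighborhood_law({0: [1], 1: [0, 2], 2: [1]}, t=1,
#                                  canon=lambda adj, v, d: tuple(sorted(d.values())))
```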

Definition 1. (Local Weak Limit)
A sequence of finite networks $\{N_n\}_{n \geq 1}$ has a local weak limit $\rho \in \mathcal{P}(G_*)$ if $U(N_n) \xrightarrow{w} \rho$.

A necessary condition for a probability measure $\rho \in \mathcal{P}(G_*)$ to be a local weak limit is unimodularity [2], which is defined next. Let $G_{**}(\Omega_1, \Omega_2)$, or more simply $G_{**}$, denote the set of isomorphism classes of connected locally finite networks with an ordered pair of distinct vertices. Let $N_{\bullet\bullet}(ø, v)$ denote a network in $G_{**}$. Equip $G_{**}$ with the natural metric $d_{G_{**}}$, defined in the same way as $d_{G_*}$.
The function $f$ in the definition of unimodularity ranges over all Borel functions from $G_{**}$ to $\mathbb{R}_+$; however, it is sufficient to consider Borel functions $f : G_{**} \to \mathbb{R}_+$ that assign a non-zero value to a doubly rooted network only if the roots are adjacent. This property is known as involution invariance [2].

It is easy to show that local weak limits are unimodular. Whether the class of unimodular measures coincides with the class of local weak limits is still an open problem.

Point Process Perspective of a Branching Process
Let $\Omega = \mathbb{Z}_+ \times \mathbb{R}_+$ denote the type space. A point distribution $\omega = ((m_1, x_1), a_1; (m_2, x_2), a_2; \ldots; (m_k, x_k), a_k)$ on the type space $\Omega$ is a finite set of vertices consisting of $a_j$ vertices of type $(m_j, x_j)$, for $k \in \mathbb{Z}_+ \setminus \{0\}$; the case $k = 0$ corresponds to the null point distribution. Let $P_\Omega$ denote the set of all point distributions. A point distribution $\omega \in P_\Omega$ defines a natural set function $\omega(\cdot)$ over all subsets of $\Omega$,
$$\omega(A) = \sum_{j : (m_j, x_j) \in A} a_j, \qquad A \subseteq \Omega.$$
It is easy to see that there is a one-to-one correspondence between point distributions and set functions satisfying the following conditions: (a) for any $A \subseteq \Omega$, $\omega(A)$ is a non-negative integer; (b) if $A_1 \supseteq A_2 \supseteq \cdots$ are subsets of $\Omega$ and $\bigcap_j A_j = \emptyset$, then $\omega(A_j) = 0$ for all sufficiently large $j$.
Abusing notation, we write $\omega(\cdot)$ for the set function generated by the point distribution $\omega \in P_\Omega$. We now define a $\sigma$-algebra on $P_\Omega$. A rational interval is a subset of $\Omega$ with elements of the form $(m, x)$ such that $q_1 \leq m < q'_1$ and $q_2 \leq x < q'_2$, where $q_1$ and $q'_1$ are non-negative integers, $q_2$ and $q'_2$ are non-negative rational numbers, and $q'_j$ is allowed to be $\infty$. A basic set is a finite union of rational intervals or the empty set. Given a collection of basic sets $A_1, A_2, \ldots, A_k$ and a set of non-negative integers $r_1, r_2, \ldots, r_k$, a cylinder set in $P_\Omega$ is defined as
$$\{\omega \in P_\Omega : \omega(A_1) = r_1, \ldots, \omega(A_k) = r_k\}.$$
Let $\mathcal{A}$ denote the $\sigma$-algebra generated by the cylinder sets. The following theorem defines a probability measure on $(P_\Omega, \mathcal{A})$ using a set of probability distributions defined over basic sets. The proof is based on the Kolmogorov extension theorem [12].
Theorem 2.2. Let functions p(A 1 , A 2 , · · · , A k ; r 1 , r 2 , · · · , r k ) be given, defined for any collection of basic sets and non-negative integers, satisfying the following.
For a point distribution $\omega = ((m_1, x_1), a_1; (m_2, x_2), a_2; \ldots; (m_k, x_k), a_k) \in P_\Omega$ and a function $h : \Omega \to \mathbb{R}$, the random integral $\int h\, d\omega$ is defined as $\sum_{j=1}^k a_j\, h(m_j, x_j)$. The term "random" refers to the randomness of $\omega$. Given a probability distribution $P$ on $(P_\Omega, \mathcal{A})$, the Moment Generating Functional (MGF) of $P$ is defined as a functional $\Phi(s)$ of non-negative functions $s : \Omega \to \mathbb{R}_+$. Conversely, given some conditions on a functional $\Phi$ defined over non-negative functions $s : \Omega \to \mathbb{R}_+$, there exists a unique probability measure $P$ on $(P_\Omega, \mathcal{A})$ with MGF $\Phi$ [12]. This correspondence implies the following theorem. Theorem 2.3. Let $\Phi_1, \Phi_2, \ldots, \Phi_k$ be MGFs on $(P_\Omega, \mathcal{A})$. Then the functional $\Phi(s) = \Phi_1(s)\Phi_2(s)\cdots\Phi_k(s)$ defines an MGF on $(P_\Omega, \mathcal{A})$. Now, we revisit the EWT from the point process perspective. For any collection of basic sets $\{A_1, A_2, \ldots, A_k\}$ and non-negative integers $\{r_1, r_2, \ldots, r_k\}$, define $p_{(m,x)}(A_1, A_2, \ldots, A_k; r_1, r_2, \ldots, r_k)$ to be the probability that a vertex of type $(m, x)$ has $r_j$ children with type in $A_j$. Then the functions $p_{(m,x)}$ determine a unique probability measure $P^{(1)}_{(m,x)}$ on $(P_\Omega, \mathcal{A})$ (Theorem 2.2). The probability measure $P^{(1)}_{(m,x)}$ determines, in turn, an MGF $\Phi^{(1)}_{(m,x)}$. Note that $p_{(m,x)}$, for any fixed set of arguments $A_i$ and $r_i$, is a Borel-measurable function of $(m, x) \in \Omega$, where $\Omega$ is equipped with the same metric as $\Omega_1$. Using Theorem 2.3, for any point distribution $\omega = ((m_1, x_1), a_1; \ldots; (m_k, x_k), a_k)$, the product $\prod_{j=1}^k \big(\Phi^{(1)}_{(m_j, x_j)}\big)^{a_j}$ is an MGF and induces a probability measure $P^{(1)}_\omega$ on $(P_\Omega, \mathcal{A})$. The probability measure $P^{(1)}_\omega$ is the transition probability function of a generalized Markov chain defined by the branching process, where $Z_l$ is the point distribution of vertices at depth $l$ (abusing notation). As in regular Markov chains, the $(m+n)$-step transition probability function satisfies the Chapman--Kolmogorov recurrence
$$P^{(m+n)}_\omega(\cdot) = \int_{P_\Omega} P^{(n)}_{\omega'}(\cdot)\, P^{(m)}_\omega(d\omega').$$
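To make the notation concrete, a point distribution can be stored as a list of (type, multiplicity) pairs; the random integral is then just a weighted sum. This is only an illustrative data representation, not part of the formal construction.

```python
# A point distribution omega on Omega = Z_+ x R_+ : a_j vertices of type (m_j, x_j).
omega = [((2, 1.7), 3), ((0, 0.4), 1)]

def omega_of(omega, A):
    """The set function omega(A): number of vertices whose type lies in A,
    where A is given as a predicate on types."""
    return sum(a for (typ, a) in omega if A(typ))

def random_integral(omega, h):
    """The random integral  int h d(omega) = sum_j a_j * h(m_j, x_j)."""
    return sum(a * h(m, x) for ((m, x), a) in omega)

# omega_of(omega, lambda t: t[0] >= 1)      -> 3    (vertices with m >= 1)
# random_integral(omega, lambda m, x: 1.0)  -> 4.0  (total number of vertices)
```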

Spectral Theorem for Compact Self-adjoint Bounded Linear Operators
A linear space $\mathcal{X}$ equipped with a norm $|\cdot|_\mathcal{X}$ is called a normed linear space. A complete normed linear space is called a Banach space. Every Banach space is a metric space. A metric space $(\mathcal{X}, d)$ is called separable if it has a countable dense subset, i.e., a set $\{x_1, x_2, x_3, \ldots\}$ with the property that for every $x \in \mathcal{X}$ and every $\epsilon > 0$ there exists $x_n$ such that $d(x_n, x) < \epsilon$. A linear space equipped with an inner product is called an inner-product space. We say $S = \{e_\alpha\}_{\alpha \in I}$ is an orthonormal basis of an inner-product space $\mathcal{X}$ if for all $x \in \mathcal{X}$ we have $x = \sum_{\alpha \in I} \langle x, e_\alpha\rangle e_\alpha$, and $\langle e_\alpha, e_\beta\rangle = 1$ if $\alpha = \beta$ and 0 otherwise. A Banach space with a norm induced by an inner product is called a Hilbert space. It is easy to prove that a Hilbert space is separable if and only if it has a countable orthonormal basis. Let $\mathcal{X}$ and $\mathcal{U}$ be normed linear spaces over $\mathbb{C}$ with norms $|\cdot|_\mathcal{X}$ and $|\cdot|_\mathcal{U}$, respectively. A map $M : \mathcal{X} \to \mathcal{U}$ is called a bounded linear map if it is linear and there exists $b > 0$ such that $|Mx|_\mathcal{U} \leq b|x|_\mathcal{X}$ for all $x \in \mathcal{X}$. Let $L(\mathcal{X}, \mathcal{U})$ denote the set of all bounded linear maps from $\mathcal{X}$ to $\mathcal{U}$ and equip this space with the natural norm $|M|_L = \sup_{x \in \mathcal{X}, |x|_\mathcal{X} = 1} |Mx|_\mathcal{U}$. Then $(L(\mathcal{X}, \mathcal{U}), |\cdot|_L)$ is a normed linear space. It is easy to check that if $\mathcal{U}$ is a Banach space then $L(\mathcal{X}, \mathcal{U})$ is also a Banach space.
Consider $L(\mathcal{X}, \mathcal{X})$ together with its natural binary map, i.e., if $N, M \in L(\mathcal{X}, \mathcal{X})$ then $N \cdot M(x) := N(M(x))$ for all $x \in \mathcal{X}$. This forms an algebra over $\mathbb{C}$, which is called a normed algebra. A complete normed algebra is called a Banach algebra. An operator $M$ in a Banach algebra is called invertible if there exists $N \in L(\mathcal{X}, \mathcal{X})$ such that $N \cdot M = M \cdot N = I$, where $I \in L(\mathcal{X}, \mathcal{X})$ is the identity map.
Let $L(\mathcal{X}, \mathcal{X})$ be a Banach algebra over $\mathbb{C}$ and let $M \in L(\mathcal{X}, \mathcal{X})$. The resolvent set of $M$ is $\rho(M) = \{\lambda \in \mathbb{C} : \lambda I - M \text{ is invertible}\}$; the spectrum is $\sigma(M) = \mathbb{C} \setminus \rho(M)$, and the point spectrum $\sigma_p(M)$ is the set of eigenvalues of $M$. The following theorem is the Riesz--Schauder Theorem, which is a spectral theorem for compact operators.
Theorem 2.5. Let $\mathcal{X}$ be a Banach space and let $M \in L(\mathcal{X}, \mathcal{X})$ be a compact operator. Then the spectrum of $M$ satisfies the following: (i) 0 is in the spectrum of $M$ unless the dimension of $\mathcal{X}$ is finite.
(ii) All non-zero elements of σ(M ) are in σ p (M ).
(iii) If λ is a non-zero eigenvalue of M , then λ has finite multiplicity, i.e., the dimension of the null space of λI − M is finite.
Let $H$ denote a Hilbert space and let $A \in L(H, H)$. The adjoint of $A$, written $A^*$, is defined by $\langle x, A^* y\rangle_H := \langle Ax, y\rangle_H$ for all $x, y \in H$. If $A^* = A$, or equivalently $\langle Ax, y\rangle = \langle x, Ay\rangle$ for all $x, y \in H$, we say $A$ is symmetric or self-adjoint. The spectral theorem for compact symmetric operators on a Hilbert space $H$ is given as follows.
Theorem 2.6. Let H be a Hilbert space and let A ∈ L(H, H) be a compact symmetric operator. Then the spectrum of A satisfies the following properties.
(i) The spectrum of A is a subset of R.
(ii) If $\lambda, \lambda' \in \sigma_p(A)$ and $\lambda \neq \lambda'$, then the null space of $\lambda I - A$ is orthogonal to the null space of $\lambda' I - A$.
(iii) The supremum $\sup_{\|x\|=1} |\langle Ax, x\rangle|$ is attained at some unit vector $x_0$, and moreover, $x_0$ is an eigenvector of $A$, i.e., $Ax_0 = \lambda x_0$ for some $\lambda \in \mathbb{R}$. The corresponding eigenvalue $\lambda$ is the largest eigenvalue of $A$ in magnitude.
(iv) (Hilbert-Schmidt) There exists an orthonormal basis of H consisting of the eigenvectors of A.
Let $H$ be a Hilbert space. A cone $K \subset H$ is a closed convex subset of $H$ such that $\lambda K \subset K$ for all $\lambda \geq 0$ and $K \cap (-K) = \{0\}$, where $-K$ denotes $(-1)K$. A closed subset $S$ of $H$ is said to be invariant under $A \in L(H, H)$ if $AS \subset S$. The following theorem by Toland [18] is a version of the Krein--Rutman Theorem [14] for compact self-adjoint operators.
(i) X (A) > 0 is the largest eigenvalue of A in magnitude and X (A) has an eigenvector in K.
(ii) X (A) > 0 is a simple eigenvalue of A.

Finite Graph Model
Let $K_n = ([n], E_n)$ denote the complete graph, i.e., $E_n = \{\{i, j\} : i, j \in [n], i \neq j\}$. Consider a probability mass function $P(\cdot)$ defined over $\mathbb{N}$. Let $d_n = (d_1(n), d_2(n), \ldots, d_n(n)) \in \mathbb{N}^n$ denote the sequence of potential degrees such that $d_i(n) \leq n - 2$ and, as $n \to \infty$, its empirical distribution converges to $P(\cdot)$. Often, we simply write $d_i$ for $d_i(n)$. Let $C_n : E_n \to \mathbb{R}_+$ denote a random function that assigns iid random variables distributed as $\exp(1/n)$ to the edges of $K_n$. The value of an edge corresponds to the cost of the edge. For each vertex $i$, let $T_i$ and $P_i$ denote the threshold and the set of potential neighbors of the vertex $i$ as in (3): $T_i$ is the $(d_i+1)$-st smallest cost among the edges incident to $i$, and $P_i = \{j \neq i : C_n(\{i, j\}) < T_i\}$. Vertices of the graph have the following self-optimizing behavior: they are willing to form an edge only if the cost of the edge is less than each of their thresholds in (3). Call the resulting random graph $G_n = ([n], E'_n)$ with $E'_n = \{\{i, j\} \in E_n : i \in P_j \text{ and } j \in P_i\}$.
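A minimal sketch of the finite model, under the reading of the threshold rule given above (each vertex prefers its $d_i$ cheapest incident edges, and an edge survives only under bilateral agreement); the function name and interface are illustrative.

```python
import random
from itertools import combinations

def sample_bilateral_graph(n, d, seed=None):
    """Sample G_n: complete graph on [n] with iid Exp(1/n) edge costs (mean n);
    vertex i prefers its d[i] cheapest incident edges; an edge is kept iff both
    endpoints prefer it."""
    rng = random.Random(seed)
    cost = {frozenset(e): rng.expovariate(1.0 / n) for e in combinations(range(n), 2)}
    preferred = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cost[frozenset((i, j))])
        preferred.append(set(others[:d[i]]))          # the d[i] cheapest neighbours of i
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if j in preferred[i] and i in preferred[j]]
    return edges

# Example: edges = sample_bilateral_graph(50, [3] * 50, seed=0)
```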
The bilateral agreement required for establishing an edge causes an interdependent structure; more precisely, the inclusion of an edge in $E'_n$ depends on the preferences of both endpoints, which are in turn dictated by the values of all the incident edges. This makes the analysis of the finite graph intricate; however, it is possible to study the model using the framework of local weak convergence. Consider the random network $N_n = ([n], E'_n, W_{v,n}, W_{e,n})$, where the mark functions are defined as follows: the edge marks record the costs, $W_{e,n}(\{i, j\}) = C_n(\{i, j\})$ for all $\{i, j\} \in E'_n$.
Let $\mathcal{N}(n, d_n)$ denote the law of the random network $N_n$. Define the random probability measure $U(N_n)$ over $G_*$ as
$$U(N_n) := \frac{1}{n} \sum_{i \in [n]} \delta_{[N_{n,\bullet}(i)]},$$
where $N_n \sim \mathcal{N}(n, d_n)$ and $N_{n,\bullet}(i)$ is the connected component of $i$ in $N_n$, rooted at $i$. Taking expectation with respect to the randomness of the network, for every event $A \subseteq G_*$ we have $E U(N_n)(A) = P([N_{n,\bullet}(I)] \in A)$, where $I$ is a random vertex chosen uniformly from $[n]$.
The primary motivation of our work is the claim that the sequence of random networks $N_n$ converges locally weakly to the EWT, i.e., $E U(N_n) \xrightarrow{w} \mathrm{Er}(P)$. As suggested by Aldous and Steele in [3], the first step in establishing local weak convergence is to guess the object that the finite graph model converges to. We now provide an argument justifying the EWT as this guess.

Aldous [1] proved that the complete graph $K_n$ with iid edge weights distributed as $\exp(1/n)$ is locally tree-like and converges to the Poisson Weighted Infinite Tree (PWIT). The idea is to modify the structure of the PWIT to capture the behavior of the finite graph model while preserving the unimodularity of the asymptotic object. In our graph family, the root vertex ø is potentially connected to $n_ø$ other vertices; hence, the $(n_ø+1)$-st edge weight in the PWIT is taken as the threshold of the root vertex ø. On the other hand, any non-root vertex with label $i$ needs to know the edge weight of its $n_i$-th descendant to decide whether to connect to its "parent" or not. Hence, the edge weight of the $n_i$-th descendant in the PWIT is taken to be its real-valued threshold mark if $i$ belongs to the connected component of ø. Moreover, a pruning process is added to account for the fact that the survival of an edge depends on the marks at both ends. Finally, the labels of the descendants of each vertex are permuted to remove the ordering. This is an essential step to make the object unimodular.

However, there are quite a few technical issues to resolve to make this intuition rigorous. For example, there is interdependence beyond just pairs. The fact that this interdependence can be ignored, as was done in the intuitive reasoning that led to the pruned PWIT, needs a rigorous proof. It is also worth mentioning that the mark spaces of the PWIT and the EWT are different, so the local weak convergence viewpoint is not the same for these two objects.

Theorem 3.1 asserts that if the empirical distribution $P_{d_n}$ of the potential degrees converges weakly to $P(\cdot)$, then $E U(N_n) \xrightarrow{w} \mathrm{Er}(P)$. The proof proceeds along the following steps.
2. The random network $N_n$ has an interdependent structure; however, as $n$ grows, the dependence weakens. The second step is to exploit this weak dependence and to prove that, as $n$ goes to infinity, the connected component of the vertex $r$ becomes locally tree-like.
3. As the dependence weakens, the local structure of $[N_{n,\bullet}(r)]$ gets close to the local structure of a rooted tree generated by $\mathrm{Er}(P)$. The third step is to prove that, for every finite rooted network $T_\bullet \in G_*$ with depth $t$, the measure assigned to $A_{T_\bullet}$ by $E U(N_n)$ converges to the measure assigned to $A_{T_\bullet}$ by $\mathrm{Er}(P)$.
4. Finally, since G * is a Polish space, the Portmanteau Theorem is applied to show the desired convergence.
The formal proof of the theorem is given in Appendix A.
Properties of the Erlang Weighted Tree

Unimodularity of EWT
Theorem 3.1 implies that $\mathrm{Er}(P, \lambda)$ is unimodular; however, the unimodularity of $\mathrm{Er}(P, \lambda)$ can also be proved directly. The proof provides more insight into the structure of the EWT.
Theorem 4.1. If $P \in \mathcal{P}(\mathbb{N})$ has positive finite mean and $\lambda \in (0, \infty)$, then $\mathrm{Er}(P, \lambda)$ is a unimodular measure in $\mathcal{P}(G_*)$.
Proof. Using the involution invariance property, we need to prove, for all Borel measurable non-negative functions $f$ on $G_{**}$, the equality (4), where the expectation is with respect to $\mathrm{Er}(P, \lambda)$. Let us expand the left-hand side of equation (4) by conditioning on the potential degree of the root vertex and using linearity of the expectation; the last equality in that expansion is based on the symmetric and conditionally independent structure of $\{\zeta_j\}_{j=1}^{n_ø}$ and $\{(n_j, v_j)\}_{j=1}^{n_ø}$ conditioned on $n_ø$. We then expand the term $E(f(G, ø, 1)\,\mathbb{1}_{1 \sim ø} \mid n_ø = m)$ by realizing the values of $v_ø$, $\zeta_1$, $n_1$, and $v_1$, where the last equality is obtained by changing the order of integration and replacing $\hat{P}(k-1)$ by $P(k)$. Putting it all together, we obtain (5). Similarly, the right-hand side of (4) can be written as (6). To complete the proof, the following observation is crucial. Let $(G, ø)$ be a realization of $\mathrm{Er}(P, \lambda)$; conditioned on $n_ø = m$, $v_ø = x$, $\zeta_1 = y$, $n_1 = k-1$, and $v_1 = z$ such that $\min(x, z) > y$, the structure and distribution of the doubly rooted graph $(G, ø, 1)$ is the same as the structure and distribution of the doubly rooted graph $(G, 1, ø)$ conditioned on $n_ø = k$, $v_ø = z$, $\zeta_1 = y$, $n_1 = m-1$, and $v_1 = x$. This symmetry property is evident from Fig. 1. Based on the above discussion, (5) and (6) are equal. This completes the proof.

Degree Distribution
Next, we characterize the degree distribution of the EWT. The conditional degree distribution of a vertex given its type, and the degree distribution of the root vertex, are given as follows.
Consequently, the degree distribution of the root vertex and its mean are given as follows. Proof. The proof is presented in Appendix B.
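Since the closed form is not reproduced here, the root degree distribution can at least be estimated by Monte Carlo directly from the construction (reusing `sample_erlang` from the sketch in the introduction); the sampler names are illustrative.

```python
from collections import Counter

def sample_root_degree(sample_P, sample_P_hat, lam=1.0):
    """Degree of the root: the number of potential children j with zeta_j < v_j."""
    n0 = sample_P()
    v0 = sample_erlang(n0 + 1, lam)
    deg = 0
    for _ in range(n0):
        zeta = random.uniform(0.0, v0)        # cost of the potential edge to child j
        n_c = sample_P_hat()                  # child's potential degree, ~ P_hat
        v_c = sample_erlang(n_c + 1, lam)     # child's value
        deg += (zeta < v_c)
    return deg

# Empirical root degree distribution for an illustrative P on {1,...,5}:
# hist = Counter(sample_root_degree(lambda: random.randint(1, 5),
#                                   lambda: random.randint(0, 4))
#                for _ in range(100_000))
```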
It is easy to derive the closed form of the degree distribution of the root vertex. However, the degree distribution of a vertex at depth $l$ is rather complex. To see why, let us focus on the vertices of the first generation, i.e., the neighbors of the root vertex ø. For a unimodular measure $\rho$ with support on rooted trees, equality (7) holds; it is obtained by using a particular function $f_k$ in the definition of unimodularity, and it is easy to check that $f_k$ is a Borel function from $G_{**}$ to $\mathbb{R}$. Let $D_1$ and $D_ø$ denote the degree of a vertex at the first generation and the degree of the root vertex, respectively. Simplifying equation (7), we see that if $D_1$ and $D_ø$ were independent, then $D_1$ would have the size-biased distribution corresponding to $D_ø$. This is the case for the unimodular Galton--Watson tree [5]. However, $D_1$ and $D_ø$ are not independent in our setting. Another interesting observation is that the degree distributions of different generations are not the same, since the distribution of $(n_i, v_i)$ depends on the depth of the vertex $i$.
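For comparison, in the unimodular Galton--Watson case the computation of this type reduces to the familiar size-biased identity; the display below is that standard identity (a mass-transport computation), stated only under the independence assumption that fails for the EWT, and is not necessarily the exact form of (7) in the original text:
\[
\mathbb{E}\Big[\textstyle\sum_{v \sim ø} \mathbb{1}_{\{d_v = k\}}\Big] \;=\; k\,\mathbb{P}(D_ø = k),
\qquad\text{so if } D_1 \perp D_ø:\quad
\mathbb{P}(D_1 = k) \;=\; \frac{k\,\mathbb{P}(D_ø = k)}{\mathbb{E}[D_ø]} .
\]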

Probability of Extinction
The next natural quantity to study is the probability that the component containing the root is finite, i.e., the probability of extinction. This is an important quantity associated with the EWT and should be related to the size of the giant component in the finite graph model, as in the case of the unimodular Galton--Watson tree. Let us start with the definition of the probability of extinction.

Definition 3. Let $Z_l$ denote the number of vertices at depth $l$. The probability of extinction is defined as
$$P(\{\text{extinction}\}) := P\Big(\bigcup_{l \geq 1} \{Z_l = 0\}\Big).$$
Observe that the event $\{Z_j = 0\}$ is a subset of the event $\{Z_i = 0\}$ for every $j < i$; hence, the continuity of probability measures implies that $P(\{\text{extinction}\}) = \lim_{l \to \infty} P(Z_l = 0)$. Using this, we can characterize the probability of extinction.
The operator $T$ is defined with the convention $0^0 = 0$. The probability of extinction is expressed in terms of the function $q(\cdot)$, the smallest fixed point of the operator $T$: for any other fixed point of $T$, say $f(\cdot) \in C^1(\mathbb{R}_+; [0,1])$, we have $f(x) \geq q(x)$ for all $x \in \mathbb{R}_+$. Equivalently, the function $q(\cdot)$ is the pointwise limit of $T^l(0)(\cdot)$ as $l$ goes to infinity, where $0(\cdot)$ is the null function, i.e., $0(x) \equiv 0$ for all $x$.

Sketch of the proof. The main idea is to find the probability of the event $\{Z_l = 0\}$ and then let $l$ go to infinity. This can be done through the following steps.
1. Observe that conditioned on the type of the root vertex to be (m, x), there are m potential branches and the probability that the depth of each branch is less than or equal to l − 1 depends only on the value of x.
2. Starting from the first generation, all the vertices have the same behavior, i.e., for any non-root vertex $i$, the distribution of $n_i$ is given by $\hat{P}$. Hence, it is possible to write the probability that the depth of a branch is less than or equal to $l - 1$ via a recursion.
3. Taking the limit and using monotonicity, the result follows.
Proof. We now fill in the details. The theorem claims that the range of $T$ is $C^1(\mathbb{R}_+; [0,1])$ and that there exists a fixed point $q(\cdot)$ with $q(x) \leq f(x)$ for every other fixed point $f(\cdot)$ of $T$, i.e., $q$ is the smallest fixed point of the operator $T$. The theorem also claims that the probability of extinction can be expressed in terms of $q$. Let us start with the following important properties of the operator $T$. (iii) For every pair of functions $f \leq g$ in $L(\mathbb{R}_+; [0,1])$, $T(f) \leq T(g)$. (iv) The function $T^l(0)$ converges pointwise to some function $q(\cdot) \in C^1(\mathbb{R}_+; [0,1])$ as $l$ goes to infinity, which is the smallest fixed point of the operator $T$.
Proof of Lemma 4.4. The proof is algebraic and does not use the connection between the operator T (·) and the probability of extinction.
(i) As the first step, we show that the range of $T$ is contained in $C^1(\mathbb{R}_+; [0,1])$ and that $T(f)$ is non-decreasing. Non-negativity is immediate; for the other side of the inequality, note that $f(x) \leq 1$ for all $x \in \mathbb{R}_+$; hence $T(f)(x) \leq 1$, with equality if and only if $f(x) = 1$ for almost every $x \in \mathbb{R}_+$. To see that $T(f)(\cdot)$ is non-decreasing, we show that it has a continuous non-negative derivative. Let $x > 0$; the derivative exists and is continuous for all $x > 0$. Taking the limit as $x \downarrow 0$, the limit of the derivative from the right is finite and non-negative. Hence, $T(f) \in C^1(\mathbb{R}_+; [0,1])$ is non-decreasing, which completes the proof of (i).
(ii) It is easy to see that $1(\cdot)$ is the largest fixed point of $T$. Moreover, for any other fixed point of $T$, say $f(\cdot) \in C^1(\mathbb{R}_+; [0,1])$, by (10) the function $T(f)(\cdot)$ is strictly less than 1; hence, $f(x) < 1$ for all $x \in \mathbb{R}_+$. Using the proof of part (i), it is easy to see that the derivative of $T(f)$ is strictly positive; hence, the fixed point $f(\cdot)$ is strictly increasing.
(iii) The proof is straightforward.
Let $f_l(x) = T^l(0)(x)$. Since, for every fixed value of $x$, the sequence $\{f_l(x)\}_{l=0}^\infty$ is strictly increasing, it converges. Define $q(x) = \lim_{l \to \infty} f_l(x)$ for all $x \in \mathbb{R}_+$. We then have $q = T(q)$, where the second equality in the corresponding computation follows from the monotone convergence theorem, which allows interchanging the order of the summation, the integration, and the limit.
To show that $q(\cdot)$ is the smallest fixed point of $T$, consider any other fixed point of $T$, say $\tilde{q} = T(\tilde{q})$.
The inequality $T^l(0)(x) \leq \tilde{q}(x)$ holds for all values of $l \in \mathbb{N}$ and $x \in \mathbb{R}_+$; hence, passing to the limit as $l \to \infty$, we get $q(x) \leq \tilde{q}(x)$.
We now return to the proof of the main theorem. As mentioned above, the main idea is to characterize the probability of the event $\{Z_l = 0\}$ for $l > 0$. Define $Z_{l,i}$ to be the number of vertices at depth $l$ in the $i$-th subtree connected to the root.
Conditioned on the type of the root, the $i$-th branch contributes no vertex at depth $l$ if either (i) the $i$-th edge does not form, or (ii) the $i$-th edge forms but there are no vertices at its $l$-th level, i.e., $Z_{l,i} = 0$. Recall that for $i \in [n_ø]$, $\zeta_i$ is the cost of the potential edge $\{ø, i\}$. Conditioning on the type of the vertex 1, the probability distribution of $Z_{l,1}$ for $l > 1$ is exactly the same as the probability distribution of $Z_{l-1}$ conditioned on the corresponding type of the root vertex. A crucial observation is that $P(\{Z_l = 0\} \mid n_ø = m, v_ø = x)$ depends on $m$ only through the exponent.
Define the function $f_l(\cdot)$ by removing the $m$-th power, so that $P(\{Z_l = 0\} \mid n_ø = m, v_ø = x) = f_l(x)^m$; the function $f_l(\cdot)$ does not depend on the value of $m$. Using equation (12) and the definition of the function $f_l(\cdot)$, for every $l > 0$ we have $f_l = T(f_{l-1})$, where $f_0(\cdot)$ should be taken to be $0(\cdot)$ for consistency with (11) at $l = 1$. Lemma 4.4 implies that $f_l(\cdot) = T^l(0)(\cdot)$ converges pointwise to $q(\cdot)$, the smallest fixed point of $T$. Hence, $P(\{Z_l = 0\} \mid n_ø = m, v_ø = x) \to q(x)^m$. Taking expectation with respect to $n_ø$ and $v_ø$ and using the monotone convergence theorem, we obtain the claimed expression for the probability of extinction. The above theorem suggests that for all $f(\cdot) \in L(\mathbb{R}_+; [0,1])$, the function $T^l(f)(\cdot)$ converges pointwise to a fixed point of $T$ as $l$ goes to infinity; however, it is not a priori clear how many fixed points the operator $T$ has and, if there is more than one, to which of them $T^l(f)(\cdot)$ converges. An immediate corollary is the following. A sufficient condition for $P(\{\text{extinction}\}) < 1$ is the existence of a test function $f(\cdot) \in L(\mathbb{R}_+; [0,1])$ such that $T(f)(x) \leq f(x)$ for all $x \in \mathbb{R}_+$ and $f(\cdot) \neq 1(\cdot)$. Choosing such a test function with a small enough parameter $\epsilon > 0$, we get the following corollary.
The condition given in the corollary ensures that the chosen test function satisfies $T(f) \leq f$. The assumption of the corollary is not tight, i.e., there are examples where $P(\{\text{extinction}\}) < 1$ but the assumption of the corollary fails. Two natural follow-up questions are: 1) Is there a general test function $f(\cdot)$ such that $P(\{\text{extinction}\}) < 1$ if and only if $f \geq T(f)$? 2) If the answer is yes, what is the closed form of $f$?
The idea of using test functions, as simple as it seems, combined with the point process perspective turns out to be a powerful tool for analyzing the branching process. We revisit this idea in Section 4.8.
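Because the kernel of $T$ is not written out here, a direct Monte Carlo estimate of $P(Z_l = 0)$ (which increases to the probability of extinction as $l$ grows) offers a simple numerical check. The sketch below reuses `sample_erlang` and the sampling rules from the construction; all names are illustrative.

```python
def branch_reaches_depth(depth, v_parent, sample_P_hat, lam=1.0):
    """True iff one potential branch of a vertex with value v_parent
    contains at least one vertex `depth` levels below that vertex."""
    zeta = random.uniform(0.0, v_parent)
    n_c = sample_P_hat()
    v_c = sample_erlang(n_c + 1, lam)
    if zeta >= v_c:                 # edge pruned: the branch is empty
        return False
    if depth == 1:                  # the child itself sits at the required depth
        return True
    return any(branch_reaches_depth(depth - 1, v_c, sample_P_hat, lam)
               for _ in range(n_c))

def estimate_p_no_vertices_at_depth(l, sample_P, sample_P_hat, trials=10_000, lam=1.0):
    """Monte Carlo estimate of P(Z_l = 0)."""
    hits = 0
    for _ in range(trials):
        n0 = sample_P()
        v0 = sample_erlang(n0 + 1, lam)
        if not any(branch_reaches_depth(l, v0, sample_P_hat, lam) for _ in range(n0)):
            hits += 1
    return hits / trials
```

Plotting the estimate against $l$ and watching it level off gives a rough numerical value of the extinction probability.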

Expected Number of Vertices at Depth l
Let $Z_l$ and $W_l$ denote the number of vertices and the number of potential vertices, respectively, at depth $l$. The expected values of $Z_l$ and $W_l$ are related to the growth rate of the EWT. They are also closely related to the probability of extinction via relation (13). The proof of (13) is based on the classical property of branching processes that $Z_n$ goes to either 0 or $\infty$. We will revisit this property later. For now, we state the following.

Theorem 4.7 gives expressions for $E[Z_l]$ and $E[W_l]$, where, as before, $\bar{F}_k(\cdot)$ is the complementary cdf of the $\mathrm{Erlang}(k)$ distribution.
Proof. The proof is presented in Appendix C.
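For reference, with unit rate the complementary cdf $\bar F_k$ of $\mathrm{Erlang}(k)$ has the elementary closed form $\sum_{j=0}^{k-1} e^{-x} x^j / j!$, which is all that is needed to evaluate such expressions numerically; a small helper (illustrative):

```python
import math

def erlang_sf(k, x):
    """P(Erlang(k, 1) > x) = sum_{j=0}^{k-1} exp(-x) x^j / j!   (k >= 1, x >= 0)."""
    return sum(math.exp(-x) * x**j / math.factorial(j) for j in range(k))

# e.g. erlang_sf(1, x) == exp(-x);  erlang_sf(3, 0.0) == 1.0
```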
A necessary but not sufficient condition for the probability of extinction to be less than 1 follows from the next corollary.
Corollary 4.8. If the expected number of potential neighbors of the root vertex, i.e., $E[n_ø]$, is smaller than 2, then the population eventually goes extinct. See Section 4.8.
Theorem 4.7 does not provide an easy way to check whether $E[Z_l]$ goes to zero or not, nor does it yield a recursive representation for the quantities it provides; however, using the point process perspective leads to a full characterization of the growth rate and provides a necessary and sufficient condition for the probability of extinction to be less than 1.

Krein-Rutman Eigenvalue and the Corresponding Eigenfunctions
To obtain the growth rate of the EWT, more work needs to be done. We follow the discussion of Chapter 3 of Harris [12], which analyzes general branching processes from a point process perspective. Although we use the same idea, our assumptions are different; the results from Harris's book [12] do not apply directly to our setting and require a generalization.
Abusing notation, let $Z_l(k-1, A)$ denote the number of vertices at depth $l$ of type $(k-1, z)$ with $z \in A$, and let $M_l(m, x; k-1, A)$ denote its expectation conditioned on the root having type $(m, x)$, with density $m_l(m, x; k-1, z)$. We show that $\beta^{-l} M_l(m, x; \mathbb{Z}_+, \mathbb{R}_+)$ converges to a fixed function independent of $l$, for a suitable $\beta$. Moreover, we show that $\beta^{-l} m_l(m, x; k-1, z)$ converges to $\mu(m, x)\,\nu(k-1, z)$. The quantity $\beta$ is the largest eigenvalue of $M_1$, and the functions $\mu(\cdot, \cdot)$ and $\nu(\cdot, \cdot)$ are the unique right and left eigenfunctions corresponding to the eigenvalue $\beta$, respectively.
Definition 4. Let $m_1$ denote the density of $M_1$. If there exist a non-zero function $\mu(\cdot, \cdot)$ and a $\beta \in \mathbb{R}$ satisfying the eigenvalue relation (14), then $\mu(\cdot, \cdot)$ is called the right eigenfunction of $M_1$ corresponding to the eigenvalue $\beta$; the left eigenfunction corresponding to $\beta$ is defined analogously. The main goal of this section is to prove a generalization of the Perron--Frobenius theorem. We show that a version of the Krein--Rutman Theorem by Toland [18] applies to our setting; however, it does not provide an easy way to find the spectral radius. The specific structure of the EWT makes it possible to directly prove the convergence of $\beta^{-l} m_l(m, x; k-1, z)$ to $\mu(m, x)\,\nu(k-1, z)$ and to show that $\beta^{-l} M_l(m, x; \mathbb{Z}_+, \mathbb{R}_+)$ converges to a function that depends only on $m$ and $x$. Before presenting the main theorems and their proofs, let us simplify the operator of interest, writing its kernel in terms of $f_k(\cdot)$, the probability density function of $\mathrm{Erlang}(k)$. Let $\beta$ be an arbitrary eigenvalue of $M_1$. By (14), the right eigenfunction corresponding to $\beta$ satisfies an equation whose right-hand side, after dividing both sides by $m$, is independent of $m$ (note that $\mu(0, x) = 0$); hence, $\mu(m, x)$ is linear in $m$ and we can write $\mu(m, x) := m\,\mu(x)/x$, where the one-variable function $\mu(\cdot)$ solves equation (17). Similarly, for the left eigenfunction, the dependence of $\nu(k-1, z)$ on $k$ is only through the term $P(k)\, e^{-z} z^{k-1}/(k-1)!$; hence, we can write $\nu(k-1, z)$ in terms of a suitable one-variable function $\nu(\cdot)$ that solves equation (19). Observe that $\nu(\cdot)$ satisfies the same equation as $\mu(\cdot)$. To study this equation, we define a new operator $H_1$ and rely on the background material of Section 2.3; we will show that $H_1$ is a compact self-adjoint operator, so the classical results from operator theory apply. Let $H = L^2(\mathbb{R}_+, \upsilon)$ denote the set of real-valued square-integrable functions with respect to the measure $\upsilon$, where $\upsilon(\cdot)$ is a finite measure whose Radon--Nikodym derivative with respect to Lebesgue measure is $g^2(\cdot)$. It is easy to prove that $L^2(\mathbb{R}_+, \upsilon)$, together with the inner product $\langle f, g\rangle$, is a Hilbert space. The integral operator $H_1$ is self-adjoint since its kernel is symmetric. Moreover, $H_1$ is compact, and $H$ is separable (the proof follows from the fact that $H$ has a countable orthonormal basis). Putting these together, we see that $H_1$ is a compact self-adjoint operator.
Let $K$ denote the set of all non-negative functions in $H$. The set $K$ is closed and convex. Moreover, $\lambda K \subset K$ for all $\lambda \geq 0$ and $K \cap (-K) = \{0\}$; hence, $K$ is a cone. In fact, it is a total cone, i.e., $H = K - K$. The following theorem is a direct implication of Theorems 2.5--2.7.
Theorem 4.9. The largest eigenvalue of $H_1$ in magnitude, $X(H_1) > 0$, is a simple eigenvalue and corresponds to a non-negative eigenfunction $f(\cdot)$. Moreover, all the eigenvalues of $H_1$ are real, and if $\zeta(\cdot)$ is an eigenfunction of $H_1$ with some eigenvalue $\mu \neq X(H_1)$, then $\int_0^\infty f(y)\zeta(y)\, d\upsilon(y) = 0$.
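Once a concrete weight $g^2(\cdot)$ (the Radon--Nikodym derivative of $\upsilon$, whose explicit form is not reproduced here) is plugged in, the largest eigenvalue of the integral operator with symmetric kernel $\min(x, y)$ can be approximated by discretization. The sketch below is a generic numerical recipe under that assumption, not a statement about the exact operator of the text.

```python
import numpy as np

def top_eigenpair(g_sq, x_max=30.0, n=2000):
    """Approximate the top eigenpair of (Af)(x) = \int_0^{x_max} min(x, y) f(y) g_sq(y) dy
    via a midpoint-rule discretization, symmetrized with the weights w_j = g_sq(x_j) * h."""
    h = x_max / n
    x = (np.arange(n) + 0.5) * h
    w = g_sq(x) * h
    K = np.minimum.outer(x, x) * np.sqrt(np.outer(w, w))   # symmetric matrix similar to the discretized A
    vals, vecs = np.linalg.eigh(K)
    return vals[-1], vecs[:, -1] / np.sqrt(w)              # top eigenvalue, eigenfunction values on the grid

# Purely illustrative weight: lam_max, f = top_eigenpair(lambda t: np.exp(-t))
```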
The following simplification will help in finding the Perron-Frobenius eigenvalue of H 1 and the corresponding eigenfunction.
Changing the order of integration, the operator $H_1$ can be rewritten in the form (20). Define the auxiliary operator $\tilde{H}_1$ accordingly; using (20), $H_1$ can be expressed in terms of $\tilde{H}_1$ and the identity function $I(\cdot)$, i.e., $I(x) = x$ for all $x \in \mathbb{R}_+$. The Perron--Frobenius eigenvalue of $H_1$ and the corresponding eigenfunction are related to the operator $\tilde{H}_1$.
Theorem 4.10. Consider the function $L(\beta, x)$, for $x \in \mathbb{R}_+$ and $\beta \in \mathbb{C}$, defined as follows. Assuming the moment generating function of $n_ø$ exists for some $\theta > 0$, the function $L(\beta, x)$ satisfies the following properties. (i) The function $L(\beta, x)$ is well defined for all $\beta \in \mathbb{C}$ and $x \in \mathbb{R}_+$, i.e., the series converges absolutely.
(ii) The second partial derivative of $L(\beta, x)$ with respect to $x$ satisfies the following equality.
Proof. In the course of the proof, it will become apparent that L(β, x) and the Bessel function of the first kind of zeroth order share similar properties.
(i) We use the Chernoff bound. Let $C = E[e^{\theta n_ø}]/e^{2\theta}$ and $\Upsilon = 1 - e^{-\theta}$. It is then easy to prove, by induction, an upper bound on $G_i(x)$. Putting it all together, the series is dominated by a convergent series involving $J_0(\cdot)$, the Bessel function of the first kind of order 0, and $I_0(\cdot)$, the modified Bessel function of the first kind of order 0. This establishes that the series converges absolutely.
(ii) This follows from the definition of $L(\beta, x)$ and part (i). (iii) Fix some $x \in \mathbb{R}_+$. Consider the function $H_1(\beta, x)$ defined in terms of $L(\beta, x)$ and $L(\bar{\beta}, x)$, where $\bar{\beta}$ is the complex conjugate of $\beta$. The partial derivative of $H_1(\beta, x)$ with respect to $x$ is computed using part (ii), where the last equality uses the fact that $\overline{L(\beta, x)} = L(\bar{\beta}, x)$. From this we conclude that $\beta = \bar{\beta}$, i.e., $\beta \in \mathbb{R}$.
(iv) Pick any real value $\beta \geq E[n_ø] - 1$. For each $i \in \mathbb{N}$, the function $G_i(x)$ is decreasing; hence, the function $1 - \beta^{-1} G_1(x)$ is increasing and achieves its minimum at $x = 0$, and the analogous bound holds by induction. Hence, for every real value $\beta \geq E[n_ø] - 1$, rewriting $L(\beta, x)$ shows that it is non-negative. Moreover, if for a fixed real value $\beta > 0$ the function $L(\beta, x)$ is non-negative for all $x \in \mathbb{R}_+$, then $L(\beta, x)$ is strictly increasing in $x$. Next, we prove that for some $\beta \in \mathbb{R}_+$ and $x \in \mathbb{R}_+$, the function $L(\beta, x)$ is negative. We rewrite $L(\beta, x)$ using the recursive relation between $G_i(x)$ and $G_{i-1}(x)$ and then change the order of integration.
Suppose that for all $\beta \in \mathbb{R}_+$ and all $x \in \mathbb{R}_+$ the function $L(\beta, x)$ is non-negative. Then, for any fixed $\beta \in \mathbb{R}_+$, the function $L(\beta, x)$ is strictly increasing in $x$; however, the left-hand side of the resulting identity is negative for all $\beta \in \mathbb{R}_+$, while the right-hand side, for small enough $\beta$, is positive, which is a contradiction. The above argument shows that if there exists some $\bar{x} > 0$ such that $\beta \leq \bar{x}\,\upsilon([\bar{x}, \infty))$, then the function $L(\beta, \cdot)$ takes negative values. Moreover, for every $\beta \geq E[n_ø] - 1$ the function $L(\beta, x)$ is strictly positive. Combining these facts with the continuity of $L(\beta, x)$ in $x \in \mathbb{R}_+$ and $\beta \in \mathbb{R}_+$, we conclude that there exists a largest $\beta_0 > 0$ such that the function $L(\beta_0, x)$ is non-negative and $L(\beta_0, x_0) = 0$ for some $x_0 \in \mathbb{R}_+$. The already established strict monotonicity of $L(\beta_0, x)$ implies that $x_0 = 0$, and the proof is complete.
(v) Note that $L(\beta_0, x) > 0$ for all $x > 0$. Moreover, using L'Hôpital's rule, the relevant ratio is well defined at $x = 0$ since $\partial_x L(\beta_0, 0)$ is strictly positive. Next, we take the derivative of the expression (24). Note that $L(\beta_0, 0) = 0$ and $L(\beta_0, x)$ is strictly concave by parts (ii) and (iv); hence, the expression (24) is strictly positive for every $x > 0$, and we have established that the function $x \mapsto x/f_0(x)$ is strictly increasing. The following immediate corollary guarantees the existence of an eigenfunction $f_0(\cdot)$ and an eigenvalue $\beta_0$ of the operator $H_1$.
Using Corollary 4.11 and equations (17) and (19), a left and a right eigenfunction of $M_1$ for the eigenvalue $\beta_0$ are obtained.
Observe that, from (16), $m_l(m, x; k-1, z)$ satisfies a recursive equation in which the terms involving $k$ and $m$ can be factored out. To avoid dividing by zero, we consider the function $h_l(\cdot, \cdot)$ defined recursively in (26). The function $m_l$ is related to the function $h_l$ via equation (27); indeed, the relation holds between $m_1$ and $h_1$, which is just (16), and for general $l$ the proof follows by induction. Recall that the kernel of the operator $H_1$ is symmetric; hence, any right eigenfunction is also a left eigenfunction. Moreover, Corollary 4.11 implies that $f_0(\cdot)$ is an eigenfunction of $H_1$ with eigenvalue $\beta_0$, i.e.,
$$\beta_0 f_0(x) = \int_0^\infty \min(x, y)\, f_0(y)\, d\upsilon(y).$$

Hence, the question of whether $\beta_0$ is the Krein--Rutman eigenvalue of $M_1$ with right eigenfunction $\mu(\cdot, \cdot)$ and left eigenfunction $\nu(\cdot, \cdot)$ boils down to the same question for $H_1$ with right and left eigenfunction $f_0$.
To show that $\beta_0$ is the Perron--Frobenius eigenvalue of $H_1$, we define a continuous-state Markov chain and prove uniform geometric ergodicity for this chain. Consider a continuous-state Markov chain with the transition probability kernel $p(x, y)$ given in (29), where the transition probability at $x = 0$ is defined by taking the limit of $p(x, \cdot)$ as $x$ goes to 0. By Theorem 4.10 part (iv), the term $f_0'(0)$ is strictly positive; hence, the function $p(\cdot, \cdot)$ is well defined. Moreover, the function $p(\cdot, \cdot)$ is indeed a valid transition probability kernel, where $(*)$ follows from (22). By induction, it is easy to observe from (26) that the $l$-step transition probability kernel $p^{(l)}(x, y)$ is related to the function $h_l(\cdot, \cdot)$. The stationary density of the Markov chain can now be verified to be $\pi(y) = C_N\, g^2(y)\,(f_0(y))^2$, where $C_N$ is the normalization factor; this follows from (28) and (29). Observe that the stationary distribution equals the product of the left and the right eigenfunctions of $H_1$, up to a normalization factor; note that $g^2(\cdot)$ is the Radon--Nikodym derivative of $\upsilon$. Moreover, the Markov chain is reversible with respect to the stationary distribution $\pi(\cdot)$, i.e., $\pi(x)p(x, y) = \pi(y)p(y, x)$. It is natural to expect that $p^{(l)}(x, y)$ converges pointwise to $\pi(y)$ as $l$ goes to infinity. To prove this, we invoke the following result by Baxendale [4]. Theorem 4.12. Let $\{X_n : n \geq 0\}$ be a Markov chain on a state space $(S, \mathcal{B})$; let $P(x, A)$, $x \in S$, $A \in \mathcal{B}$, denote its transition probability and, abusing notation, let $P$ denote the corresponding operator on measurable functions $S \to \mathbb{R}$. Assume that the following assumptions hold. (A1) Minorization condition: There exist $C \in \mathcal{B}$, $\beta > 0$, and a probability measure $\nu$ on $(S, \mathcal{B})$ such that $P(x, A) \geq \beta\, \nu(A)$ for all $x \in C$ and $A \in \mathcal{B}$.
Then $\{X_n : n > 0\}$ has a unique stationary probability measure $\pi$, say, and $\int V\, d\pi < \infty$. Moreover, there exists $\rho < 1$, depending only (and explicitly) on $\beta$, $\tilde{\beta}$, $\lambda$, and $K$, such that whenever $\rho < \gamma < 1$ there exists $M < \infty$, depending only (and explicitly) on $\gamma$, $\beta$, $\tilde{\beta}$, $\lambda$, and $K$, such that for all $x \in S$ and $n \geq 0$,
$$\sup_{|g| \leq V} \Big| P^n g(x) - \int g\, d\pi \Big| \leq M\, V(x)\, \gamma^n,$$
where the supremum is taken over all measurable functions $g : S \to \mathbb{R}$ satisfying $|g(x)| \leq V(x)$ for all $x \in S$. In particular, $P^n g(x)$ and $\int g\, d\pi$ are both well defined whenever $|g| \leq V$. Baxendale [4] provides explicit values for $\rho$ and $M$ and improves the constants if the corresponding Markov chain is reversible, which holds in our case. In the following lemma, we prove that the Markov chain with transition probability $p(x, y)$ from (29) satisfies assumptions (A1)--(A3).
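To make the drift condition (A2) concrete, here is a toy check on a chain that is not the chain $p(x, y)$ of the text: for $X' = aX + \mathrm{Exp}(1)$ on $\mathbb{R}_+$ with $V(x) = e^{\theta x}$ ($\theta < 1$), one has $PV(x) = e^{\theta a x}/(1-\theta)$, so $PV(x)/V(x) = e^{-\theta(1-a)x}/(1-\theta)$ falls below any $\lambda < 1$ outside a bounded set $C$, and $K$ can be taken as the supremum of $PV$ over $C$. All names and constants are illustrative.

```python
import math
import random

def empirical_PV(x, a=0.5, theta=0.1, trials=20_000):
    """Monte Carlo estimate of P V(x) = E[ V(a*x + E) ], E ~ Exp(1), V(z) = exp(theta*z).
    Toy illustration of the drift condition (A2); not the chain of the text."""
    V = lambda z: math.exp(theta * z)
    return sum(V(a * x + random.expovariate(1.0)) for _ in range(trials)) / trials

# empirical_PV(20.0) / math.exp(0.1 * 20.0)  ~  exp(-0.1*0.5*20)/0.9  ~ 0.41 < 1/2,
# so (A2) holds with lambda = 1/2, C = [0, c] for a suitable c, and K = sup_{x <= c} P V(x).
```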
Lemma 4.13. Assume the moment generating function of $n_ø$ exists for some $\theta > 0$ and is finite. Then the Markov chain defined by the transition probability kernel $p(x, y)$ on the state space $(\mathbb{R}_+, \mathcal{B})$ satisfies assumptions (A1)--(A3) of Theorem 4.12, where the set $C$, the constants $\beta$, $\tilde{\beta}$, $\lambda$, $K$, the function $V : \mathbb{R}_+ \to [1, \infty)$, and the probability measure $\nu$ are given as follows: $\lambda := 1/2$, and the constants $\eta$ and $c$ and the function $W(y)$ are defined in (31) and (32). Proof. First, we prove that assumption (A2) holds and derive the constants $c$, $\lambda$, and $K$ as well as the function $V(\cdot)$. Next, we show that assumption (A1) holds and derive the probability measure $\nu$ and the constant $\beta > 0$. Finally, we show that assumption (A3) holds and derive the constant $\tilde{\beta}$.

Assumption (A2): Define the operator $P$ by its action on a measurable function, as in (33). Assuming the moment generating function of $n_ø$ exists for some $\theta > 0$ and using inequality (21), we obtain a bound in which the constant $\eta > 0$ is small enough that $1 - e^{-\theta} - \eta > 0$. Part (v) of Theorem 4.10 states that the function $x/f_0(x)$ is strictly increasing. Hence, $V(\cdot)$ is strictly increasing with range contained in $[1, \infty)$. Substituting the function $V(\cdot)$ into (33), we get (34). Consider the constants $c$, $K$, and $\lambda$ as in the statement of the lemma. For every $x \leq c$, the right-hand side of equation (34) is bounded by $K$; moreover, for every $x > c$, it is bounded by $\lambda V(x)$. Hence, assumption (A2) is satisfied.

Assumption (A1): Recall the definition of $P(x, A)$ and consider the function $W(y) = \min_{x \in [0, c]} p(x, y)$. Using the fact that $x/f_0(x)$ and $f_0(x)$ are increasing functions, the function $W(\cdot)$ is given as in (32). Note that $W(\cdot)$ is integrable since it is upper bounded by the integrable function $\beta_0\, f_0'(0)^{-1}\, g^2(y) f_0(y)$. Define the probability measure $\nu$ proportional to $W$, where $\tilde{\beta}$ is the normalization factor. The inequality $P(x, A) \geq \beta\, \nu(A)$ then holds for all $x \in [0, c]$, from which assumption (A1) immediately follows.

Assumption (A3): This follows directly from the definition of the probability measure $\nu$. Remark 3. The function $V(\cdot)$ in (31) provides additional freedom: it is possible to choose a function $g(\cdot)$ that goes to infinity.
Lemma 4.13 implies that Theorem 4.12 applies to the continuous-state Markov chain with transition probability $p(x, y)$. The first implication is that the stationary distribution $\pi(x) = C_N\, g^2(x) f_0(x)^2$ is unique. Moreover, there exist $M < \infty$ and $0 < \gamma < 1$ such that all measurable functions $g : \mathbb{R}_+ \to \mathbb{R}$ with $|g(x)| \leq V(x)$ for all $x \in \mathbb{R}_+$ satisfy the bound of Theorem 4.12. Since $V(0) = 1$ and $V(x)$ is increasing, as can be gleaned from (31) and Theorem 4.10 part (v), geometric ergodicity follows by restricting the function $g(\cdot)$ to satisfy $|g(x)| \leq 1$ for all $x \geq 0$. However, it is possible to prove uniform ergodicity by an appropriate choice of the function $V(\cdot)$. Lemma 4.14. Let $V(x) = 1 + a \times \mathbb{1}_{\{x > x_0\}}$, where the constant $a$ is defined below and the constant $x_0$ is large enough that $f_0(x_0) \geq 0.5$ and a further inequality is satisfied for all $x > x_0$. Then, for a suitable $M > 0$ and $\gamma < 1$, uniform ergodicity holds. Proof. Again, we apply Baxendale's Theorem (Theorem 4.12), but this time $V(\cdot)$ is bounded. The only assumption affected by the choice of the function $V(\cdot)$ is assumption (A2). Recall the transition probability $p(x, y)$ from (29), and recall that the function $f_0(x)$ is increasing, $f_0(0) = 0$, and $\lim_{x \to \infty} f_0(x) = 1$; see, e.g., (23). Consider the function $V(x) = 1 + a \times \mathbb{1}_{\{x > x_0\}}$, where the constants $a$ and $x_0$ are chosen to be large enough. Plugging in the function $V(\cdot)$ and assuming $x > x_0$, inequality (21) yields the bound (35); the last inequality follows from evaluating the integrals and removing the negative terms. Let $x_0$ be large enough that $f_0(x_0) \geq 0.5$. The constants $a$ and $x_0$ are chosen so that a set of inequalities is satisfied; the second of these upper-bounds the last two terms in (35), and we get $PV(x) \leq \lambda V(x)$ for $x > x_0$. Given the above choice of the constants $a$ and $x_0$, taking $\lambda = 3/4$, $K = a + 1$, and $C = \{x : x \leq x_0\}$, assumption (A2) is satisfied. An application of Baxendale's Theorem then completes the proof.
An immediate consequence of uniform ergodicity is the following: for any $x, y \in \mathbb{R}_+$ and $l > 1$, the $l$-step kernel is exponentially close to $\pi$. Proof. The idea is the same as in Doob [7, pp. 216--217]. Note that $\pi(\cdot)$ is the unique stationary distribution. To get rid of the constant factor $C_N$, from now on we assume the function $f_0$ is normalized by a constant factor such that $\int_0^\infty g^2(y) f_0(y)^2\, dy = 1$.
Inequality (36) implies a pointwise bound for every $x \in \mathbb{R}_+$ and $y > 0$. Harris [10] assumes that the density of $M_1$ is uniformly positive and bounded, and deduces that the corresponding eigenfunction is uniformly positive as well. However, in our setting $f_0(0) = 0$ and $g(y) \to 0$ as $y \to \infty$. As a result, the error term for $h_l(x, y)/\beta_0^l$ explodes as $y$ goes to 0 or $\infty$. On the other hand, induction implies $h_l(x, 0) = h_l(0, y) = 0$. Hence, we should still expect a uniform bound. The idea is to use the function $V(\cdot)$ in (31) and apply (26).
Proof. Fix $z \in \mathbb{R}_+$ and define the function $g(\cdot)$ as follows. The function $g(\cdot)$ is well defined and continuous by Theorem 4.10 part (v). Moreover, $|g(x)| \leq V(x)$ for all $x \in \mathbb{R}_+$, where $V(\cdot)$ is given by (31). Now, using Lemma 4.13 and Theorem 4.12 (Baxendale's Theorem) together with (26) and (28), we obtain the desired bound.

Now using (26) again, we have
Applying inequality (21), the result follows from the facts that $\min(x, z) \leq x$ and that $\eta < 1 - e^{-\theta}$. Combining (27) and (38), we get a similar bound for $m_l(m, x; k-1, z)$ for every $x \in \mathbb{R}_+$ and $z > 0$. Note that the error term is uniformly bounded for all $x, z \in \mathbb{R}_+$ and $k \in \mathbb{N}$ (naturally, it is not uniform in $m$). Next, we prove that $\beta_0$ is the Krein--Rutman eigenvalue of $H_1$ with eigenfunction $f_0(x)$. Proof. Assume there exist a real-valued function $\zeta(\cdot)$ and $\beta'$ such that $\zeta$ is an eigenfunction of $H_1$ with eigenvalue $\beta'$. Clearly, $\zeta(x)$ satisfies a corresponding growth inequality. Moreover, $\zeta(0) = 0$ since $h_1(0, z) = 0$; hence the function $g(x) = \zeta(x)/f_0(x)$, $x \in \mathbb{R}_+$, is well defined. Letting $V(x) = f_0'(0)\, e^{\eta x}\, \frac{x}{f_0(x)} \times \max\big(\tfrac{\mathrm{Const}}{f_0'(0)}, 1\big)$ in Lemma 4.13, we have $|g(x)| \leq V(x)$ for all $x \in \mathbb{R}_+$. Using Baxendale's Theorem (Theorem 4.12), the relevant quantity is bounded by a term whose right-hand side goes to zero as $l$ goes to infinity. If $|\beta'| > \beta_0$, then the left-hand side explodes. If $|\beta'| = \beta_0$, then the left-hand side does not go to zero for all $x$. Hence, $|\beta'| < \beta_0$, and $\zeta(\cdot)$ and $f_0(\cdot)$ are orthogonal to each other, i.e., $\int_0^\infty \zeta(y) f_0(y)\, d\upsilon(y) = 0$. This also proves that $f_0(\cdot)$ is the only non-negative eigenfunction.

We now summarize the key conclusions in the following theorem. The eigenvalue $\beta_0$ of $M_1$ has right eigenfunction $\mu(\cdot, \cdot)$ and left eigenfunction $\nu(\cdot, \cdot)$, and these eigenfunctions are the unique non-negative right and left eigenfunctions, respectively. Moreover, there exist $0 < \gamma < 1$ and a constant $M > 0$, independent of $x$, $m$, $z$, and $k$, such that for all $x \in \mathbb{R}_+$, $y > 0$, $k \geq 1$, and $m \geq 0$, the error of the approximation of $\beta_0^{-l} h_l(x, y)$ is bounded by $M\gamma^l$ times an explicit factor. Finally, $m_l(m, x; k-1, z)$ is related to the function $h_l(x, y)$ via equation (27), and the ergodic bound holds for all functions $g$ with $|g| \leq V$, where $V(x) = f_0'(0)\, \exp(\eta x)\, \frac{x}{f_0(x)}$ and $\eta = (1 - e^{-\theta})/2$. The constants $M$ and $0 < \gamma < 1$ are independent of $x$ and $l$.
Using the above theorem, we get similar bounds for M l .
Corollary 4.19. The growth rate of $M_l(m, x; \mathbb{Z}_+, \mathbb{R}_+)$ equals $\beta_0$, which is given by Theorem 4.18; the constant $0 < \gamma < 1$ in the corresponding error bound is independent of $x$, $m$, and $l$.
Proof. This follows from Theorem 4.18. Recall that $Z_l$ denotes the number of vertices at generation $l$. As an immediate corollary, the growth/extinction rate of $E[Z_l]$ is $\beta_0$ as well.
Corollary 4.20. If $\beta_0 > 1$, the expected number of vertices at generation $l$ explodes as $l$ goes to infinity. If $\beta_0 = 1$, the expected number of vertices at generation $l$ stays bounded. If $\beta_0 < 1$, the expected number of vertices at generation $l$ goes to zero.
A follow-up question concerns the limit of the random variable $Z_l/\beta_0^l$. 1) If $\beta_0 < 1$, it is clear that $Z_l \to 0$ almost surely as $l \to \infty$, since the population becomes extinct; however, conditioned on $Z_l > 0$, the distribution of the total number of vertices might be of interest. We leave this problem for future work. 2) If $\beta_0 > 1$, one way to study the limit is to analyze the second moment. This methodology was introduced by Harris in [9] and was generalized to finite-type branching processes in [10]. In [11], Harris pointed out that a similar generalization is possible for general branching processes and discussed it further in [12, Chapter 3]. We follow his argument closely in Section 4.6.
3) The case $\beta_0 = 1$ is tricky and is discussed in Section 4.7. We will prove that $Z_l \to 0$ almost surely as $l \to \infty$; however, a question similar to that in the first case is left for future work.

Analysis of the Second Moments and Asymptotic Results for β 0 > 1
Let $Z_l(A)$ denote the number of vertices at depth $l$ of type $(k-1, \zeta) \in A$, $A \subseteq \Omega$. For $A_1, A_2 \subseteq \Omega$, define the second-moment quantity accordingly. The conditionally independent structure of the EWT then yields a product formula; to see this, note the expansion over the point distribution $\omega = ((m_1, x_1), a_1; (m_2, x_2), a_2; \cdots; (m_k, x_k), a_k)$, in which the first term corresponds to distinct descendants of the root.

For any fixed $(m, x)$, we can interpret $M_l(m, x; A_1; A_2)$ as the measure of the "rectangle" $A_1 \times A_2$, i.e., the measure of pairs of points $(k_1 - 1, \zeta_1; k_2 - 1, \zeta_2)$ such that $(k_1 - 1, \zeta_1) \in A_1$, $(k_2 - 1, \zeta_2) \in A_2$, and both $(k_i - 1, \zeta_i)$ belong to $Z_l$, where $Z_l$ (abusing notation) is the point distribution of vertices in generation $l$. To make matters rigorous, we need to define bivariate measures and random double integrals.

Definition 5. A function $F(A, B)$, where $A$ and $B$ are subsets of $\Omega$, is called a bivariate measure if it satisfies the following conditions: (a) it is finite and non-negative. $F$ is called a signed bivariate measure if $F = F_1 - F_2$, where $F_1$ and $F_2$ are bivariate measures.

By definition, $M_l(m, x; A_1; A_2)$ and $M_l(m, x; A_1)\,M_l(m, x; A_2)$ are bivariate measures, and $v(m, x; A_1; A_2)$ is a signed bivariate measure. Define a map $T$ from the set of signed bivariate measures to itself, in order to derive a recurrence relation between the second moments. Conditioned on $Z_l = \omega \in P_\Omega$, the expected value of $Z_{l+1}(A_1) Z_{l+1}(A_2)$ is given by random integrals, where $Z'_1$ is an iid copy of the point distribution $Z_1$ and $E_{m_i, x_i}$ is the expectation conditioned on the type of the root being $(m_i, x_i)$. Now, taking the expectation of these random integrals with respect to the point distribution $Z_l$, we derive the recurrence relation (42). Repeated use of (42), followed by an application of (41), yields a relation in which $T^0$ is the identity map. Finally, we can use the analysis of the previous section to approximate the second moments, where the constants $M > 0$ and $0 < \gamma < 1$ are independent of $x$, $l$, $A_1$, and $A_2$, and the function $U(m, x)$ is defined in (46). Proof. As we pointed out in the proof of Corollary 4.19, using Theorem 4.18, for any $A \in \Sigma$ we have the corresponding first-moment approximation. The result follows by plugging this equality into (44). The constant $C_F$ depends on the choice of the function $F$. It is easy to check that for $v(m, x; A_1, A_2)$ we can replace $C_F M$ with $m^2 M'$ for some $M' > 0$ independent of $x$, $l$, $A_1$, and $A_2$ (note that $\min(x, z)/x \leq 1$).

Remark 5. Fix the value of l > 0 and consider
Using the same argument as above, the conditional expectation converges to the same value as in (45), where the function U(m, x) is given by (46). Furthermore, if A and B are subsets of Ω such that
An immediate corollary of the above theorem and Corollary 4.20 is the following, which connects the growth rate and the probability of extinction.
Corollary 4.23. If β 0 > 1 then the probability of extinction is less than 1. If β 0 < 1 then the probability of extinction equals 1.
Proof. By Theorem 4.22, if β_0 > 1, then W(A) is positive with non-zero probability. Hence, the probability of extinction is less than 1. The second part follows by Markov's inequality and Corollary 4.20.
However, to analyze the case β_0 = 1 and to show that Z_l ∼ β_0^l W, we need to show transience of Z_l, i.e., Z_l either goes to zero or to infinity.

Transience of Z l
Consider the generalized Markov chain introduced in Section 2.2. The following lemma establishes the transience of Z_l. Recall that Z_l(A) is the number of vertices of type (k − 1, ζ) ∈ A, and Z_l(Ω) is the total number of vertices at generation l. We follow the notation introduced in Section 2.2. Proof. Define P_{Ω_0} to be the set of non-null point distributions with at most k vertices. Let P_{Ω_0,m} be the set of point distributions ω = ((m_1, x_1), a_1; (m_2, x_2), a_2; . . . ; (m_k, x_k), a_k) ∈ P_{Ω_0} such that m_i ≤ m for all i. Recall that ∅ denotes the null point distribution.
Step 1: Using the same argument as in [12, Theorem 11.2, page 69], we show that P(Z_l ∈ P_{Ω_0,m} infinitely often) = 0. Define R_m(ω) for ω ∈ P_Ω as follows, For P ⊂ P_{Ω_0,m}, let Q_{m,2}(ω, P) be the conditional probability that, conditioned on Z_0 = ω, at least one of the point distributions Z_2, Z_3, · · · lies in P_{Ω_0,m} and, if Z_l is the first such one, then Z_l ∈ P. Then In the proof of Theorem 4.3, we show that, if Z_0 = (m_i, x_i), the probability of extinction after 2 generations is given by an expression that is a decreasing and strictly positive function. Hence, where ω = ((x_1, m_1), a_1; (x_2, m_2), a_2; · · · ; (x_k, m_k), a_k) and ε > 0 is a constant which depends only on m and k. A contradiction follows by taking the supremum of both sides of equation (47).
Step 2: For the sake of notational simplicity, we prove the result for k = 1 and then discuss the general case. Fix the value of m. Note that, by the first step, the probability of hitting P_{Ω_0,m} infinitely often is zero. Hence, to complete the proof, we need to show that the probability of hitting P̄_{Ω_0,m} := P_{Ω_0} \ P_{Ω_0,m} infinitely often is zero.
Assume k = 1 and let κ ∈ N. Define Q̄_{m,κ}(ω, P) and R̄_m(ω) similarly to Q_{m,2}(ω, P) and R_m(ω) by considering the set P̄_{Ω_0,m} instead of P_{Ω_0,m}. Assume Z_0 = ω = (m_1, x_1), where m_1 ≥ m. Note that the first time Z_l ∈ P̄_{Ω_0,m} for some l > 0, m_1 − 1 out of the m_1 branches of Z_0 go extinct. Hence, where q(·) is the smallest fixed point of the operator T defined in Theorem 4.3, and ω′ = (1, x_1) is a point distribution with only one point of type (1, x_1). Next, using the same argument as in Step 1, we have Now, if we take κ to infinity, we have Combining (48) and (49), and taking the supremum with respect to ω, the result follows by assuming m is large enough. Now consider the case k = 2 and pick ω = ((m_1, x_1), a_1; (m_2, x_2), a_2), where a_1 + a_2 ≤ 2. Assume R̄_m(ω) > 0. The point distribution ω has m_1 a_1 + m_2 a_2 branches. If two of these branches survive, i.e., do not go extinct at all, then for some i we have P({Z_l(Ω) = 1 infinitely often} | Z_0 = (1, x_i)) > 0, which is a contradiction. Hence, only one of these branches can survive. Following similar logic as before, we have where ω_1 = (1, x_1) and ω_2 = (1, x_2). The result follows by the same argument using (49). For general k, using strong induction, we obtain a relation similar to (50).
The above lemma, together with Corollary 4.23 and Corollary 4.20, has an important implication which completes the connection between the probability of extinction and the growth rate.
Corollary 4.25. If β 0 > 1 then the probability of extinction is less than 1. If β 0 ≤ 1 then the probability of extinction equals 1.
To show that the growth rate of Z_n is β_0 when β_0 > 1, i.e., Z_n ∼ β_0^n W, we need to show that P(W = 0 | Z_n → ∞) = 0. As Harris points out in [12, Remark 1, page 28], if there is a positive probability that Z_n → ∞ at a rate less than β_0, then P(W = 0 | Z_n → ∞) > 0. To rule out such a scenario, we need to show that P(W = 0 | Z_0 = (1, x)) equals q(x), which is given by Theorem 4.3. In fact, it is easy to see that P(W = 0 | Z_0 = (1, x)) is a fixed point of the operator T. However, to complete the argument we need to show that T(·) does not have any fixed point other than q(·) and 1(·).

Probability of Extinction Revisited
Using the point process perspective, we can rewrite the operator T as follows: where ω_0 = (1, x) is the type of the root vertex and P^(1)_{ω_0} is the one-step transition probability defined in Section 2.2. Inductively, we have The above equality, combined with an appropriate test function, becomes a powerful tool to study properties of the operator T and the branching process in general. Recall that q(·) is the smallest fixed point of the operator T (Theorem 4.3).
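The characterization of q(·) as the smallest fixed point of T suggests the standard numerical recipe from branching-process theory: iterate the operator starting from the zero function, and the iterates increase to q. The sketch below illustrates this on a scalar Galton-Watson generating function; it is only a toy stand-in, not the operator T of the text.

```python
import math

def smallest_fixed_point(T, n_iter=10000, tol=1e-12):
    """Monotone iteration from 0; for an increasing map T with T(0) >= 0,
    the iterates increase to the smallest fixed point."""
    q = 0.0
    for _ in range(n_iter):
        q_new = T(q)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

# Toy example: Poisson(2) offspring, pgf(s) = exp(2 (s - 1)); the extinction
# probability is the smallest root of s = pgf(s).
q = smallest_fixed_point(lambda s: math.exp(2.0 * (s - 1.0)))
print(q)   # approximately 0.2032
```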
Lemma 4.26. If β_0 > 1, then the operator T has exactly two fixed points: q(·) and 1(·). Moreover, for any function f ∈ L(R_+; [0, 1]) such that the Lebesgue measure of the set {x ∈ R_+ : f(x) < 1} is positive, Proof. Consider the function f_{x_0,ε}(·) defined as follows The goal is to show that for every large enough x_0, there is an ε > 0 such that for all x ∈ R_+, where q(·) is the smallest fixed point of the operator T. One important implication of this inequality is: Note that for x > 0, By choosing x_0 to be large enough, we can make the second term in the parentheses arbitrarily small. Fixing x_0, we can then choose ε > 0 small enough such that q(x) − T(f_{x_0,ε})(x) > 0 for all x ∈ R_+. Note that q(0) > 0 and q(x) is a strictly increasing function. Now that we have proved (52), we use the alternative representation of T^l(f_{x_0,ε}) as in (51) to prove the lemma. Define the set P_M as However, by (52), the left-hand side of the above inequality goes to 0 as l goes to infinity. Hence, For the sake of contradiction, assume that T has another fixed point q̃(·). By Lemma 4.4, we already know that q(x) < q̃(x) < 1 for all x ∈ R_+. Note that, As l goes to infinity, the first term converges to q(x). Using the analysis of f_{x_0,ε}, the second term goes to 0 since Finally, we can bound the third term as follows, is non-decreasing (Lemma 4.4). Combining these inequalities, we have, The result follows by letting M go to infinity.
Finally, if f ∈ L(R_+; [0, 1]) is such that the Lebesgue measure of the set {x ∈ R_+ : f(x) < 1} is positive, then by the same analysis and the fact that T(f)(x) < 1 for all x ∈ R_+, we have As we pointed out in Section 4.6, one implication of the above lemma is Z_n ∼ β_0^n W.
Proof. Let f(x) = P(W = 0 | Z_0 = (1, x)). Note that, and Hence, f(x) is a fixed point of the operator T. On the other hand, by Theorem 4.22, P(W = 0 | Z_0 = (1, x)) < 1 for all x ∈ R_+. Hence, by Lemma 4.26, f(·) = q(·). Now, the result follows by the law of total probability. The second part is just a corollary of the first part and Theorem 4.22.

Numerical Simulation
In this section, we present some numerical results for the case when P, the distribution of n_ø, is the geometric distribution. Specifically, we provide a closed form for the Krein-Rutman eigenvalue and the corresponding eigenfunction given by Theorem 4.10, in terms of the zeroth-order Bessel function of the first kind, J_0. We then numerically compare the structural properties of the EWT with unimodular Galton-Watson Trees [5]. Assume P is the geometric distribution with parameter p, i.e., P(k) = (1 − p)^{k−1} p for all k ∈ N. Recall the definition of g_2 and G_i as in Theorem 4.10. We have Using the above equality together with a simple induction, we get Plugging the above equality into the definition of L(β, x), we have Let r_0 ≈ 2.4048 denote the smallest zero of J_0. Recall that β_0 is the smallest root of L(·, 0), and the eigenfunction f_0 is given by L(β_0, ·). Then, by simple algebra The simple form of the geometric distribution makes it easier to study the associated Erlang Weighted Tree. Next, we numerically compare the degree distribution of the EWT with unimodular Galton-Watson Trees (GWT*). A GWT* with degree distribution Q ∈ P(N) is a rooted tree, rooted at ø, such that the number of descendants of the root is distributed as Q, and for all the other vertices, the offspring distribution is given by the size-biased distribution Q*: In Figure 2, we compare the degree distributions of the zeroth and the first generation of the EWT with GWT*. We consider a GWT* that has a Poisson degree distribution with parameter λ′, and a GWT* that has a geometric degree distribution with parameter p′. Both p′ and λ′ are chosen so that the expected degree of the root vertex is the same as in the EWT. We also consider the size-biased distribution of the root vertex of the EWT, using (53) and Theorem 4.2.
[Figure 2: The degree distribution of the root vertex (zeroth generation) and the first generation of the Erlang Weighted Tree (with potential degree distribution geo(p)), unimodular Galton-Watson Trees (with degree distributions Poiss(λ′) and geo(p′)), and the size-biased degree distribution of the root of the EWT. Here p = 0.08, and the parameters p′ and λ′ are chosen so that the expected degree of the root vertex is the same as in the EWT.]
In this figure,
the potential degree distribution of the EWT is the geometric distribution with parameter 0.08. The degree distribution of the EWT behaves differently from that of GWT*. Most notably, the degree distribution of the first generation is not the size-biased distribution of the root vertex, as we also mentioned in Section 4.2.
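For completeness, the constant r_0 used in the closed form above can be retrieved numerically; a minimal sketch, assuming scipy is available:

```python
from scipy.special import jn_zeros

r0 = jn_zeros(0, 1)[0]   # smallest positive zero of the Bessel function J_0
print(r0)                # approximately 2.404826
```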
Next, we compare the degree distributions of different generations of the EWT. In Figure 3, we illustrate the degree distributions of the root, the first generation, and the second generation of the EWT with potential degree distribution geo(0.08). Given the interdependent structure of the EWT, the degree distributions of different generations are not the same. Note that the size-biased distribution of the root node is close to the degree distribution of the second generation. Intuitively speaking, this means that the dependency between the degree distribution of generation l and the root node fades away as l → ∞. We conjecture that the degree distribution of the l-th generation converges to the size-biased degree distribution of the root vertex. This also suggests that the growth/extinction rate of the EWT should be close to the growth/extinction rate of the GWT* whose degree distribution is the degree distribution of the root vertex in the EWT.
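The degree distributions of the first few generations can be estimated by direct Monte Carlo simulation of the tree construction (geometric potential degrees, Erlang values, uniform costs, and the pruning rule that an edge to a child survives when its cost is below the child's value). The sketch below is illustrative only; the sample size and p = 0.08 match the figures, while everything else (function names, seed) is our own choice.

```python
import numpy as np

rng = np.random.default_rng(0)
p, lam, trials = 0.08, 1.0, 20000

def potential_degree(root=False):
    # root: P(k) = (1 - p)^(k - 1) p on {1, 2, ...}; non-root: shifted by one
    k = rng.geometric(p)
    return k if root else k - 1

root_deg, first_gen_deg = [], []
for _ in range(trials):
    n0 = potential_degree(root=True)
    v0 = rng.gamma(shape=n0 + 1, scale=1.0 / lam)   # value of the root: Erlang(n0 + 1, lam)
    costs = rng.uniform(0.0, v0, size=n0)           # costs of the root's potential edges
    deg0 = 0
    for c in costs:
        n1 = potential_degree()
        v1 = rng.gamma(shape=n1 + 1, scale=1.0 / lam)
        if c < v1:                                  # edge to this child survives
            deg0 += 1
            kept = 0                                # surviving edges of the child
            for cc in rng.uniform(0.0, v1, size=n1):
                n2 = potential_degree()
                v2 = rng.gamma(shape=n2 + 1, scale=1.0 / lam)
                kept += cc < v2
            first_gen_deg.append(1 + kept)          # parent edge plus surviving edges
    root_deg.append(deg0)

print("mean degree of the root:", np.mean(root_deg))
print("mean degree at generation 1:", np.mean(first_gen_deg))
```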
In Figure 4, we compare the growth/extinction rate of the EWT with that of GWT*. We consider a GWT* that has a Poisson degree distribution with parameter λ′, a GWT* that has a geometric degree distribution with parameter p′, and a GWT* whose degree distribution is given by the degree distribution of the root vertex of the EWT. As we mentioned, the growth/extinction rate of the EWT is close to the growth/extinction rate of GWT*; however, they are not the same.
Finally, in Figure 5 we compare the probability of extinction of the EWT with that of GWT*. We consider the same set of unimodular Galton-Watson Trees as before. We also compare, in Figure 6, the ratio of vertices in the giant component of the finite graph model (with potential degree distribution geo(p)) with the random graphs generated by the configuration model (using the same degree distribution as in the associated GWT*) and the Erdős-Rényi random graph (with parameter λ′/n, where n is the number of vertices).
[Figure 4: The growth rate of the Erlang Weighted Tree (with potential degree distribution geo(p)) and unimodular Galton-Watson Trees (with degree distributions Poiss(λ′), geo(p′), and the degree distribution of the root vertex in the EWT). Here p = 0.08, and the parameters p′ and λ′ are chosen so that the expected degree of the root vertex is the same as in the EWT.]
The configuration model generates a random graph by uniformly pairing the half-edges assigned to the vertices of the graph, where the number of half-edges assigned to a vertex
is given by a fixed degree distribution. The Erdős-Rényi random graph with parameter λ′/n is obtained by connecting each pair of nodes independently with probability λ′/n. For the configuration model and the Erdős-Rényi random graph model, this ratio equals 1 − P({extinction}), where P({extinction}) is the probability of extinction of the associated GWT* [5]. Figures 5 and 6 suggest that this is also true for the EWT.
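This relation between the giant-component fraction and 1 − P({extinction}) is easy to check numerically in the Erdős-Rényi case; the sketch below assumes networkx and uses an arbitrary illustrative value of λ′ (here 1.5), so it is not tied to the parameters of the figures.

```python
import math
import networkx as nx

n, lam = 5000, 1.5                             # illustrative size and mean degree
G = nx.gnp_random_graph(n, lam / n, seed=0)    # Erdos-Renyi graph with parameter lam / n
giant = max(nx.connected_components(G), key=len)

# Extinction probability of the associated Poisson(lam) Galton-Watson tree:
# the smallest root of q = exp(lam (q - 1)), obtained by monotone iteration from 0.
q = 0.0
for _ in range(10000):
    q = math.exp(lam * (q - 1.0))

print(len(giant) / n, 1.0 - q)                 # the two fractions should be close
```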

Open Problems
We close our paper with some open problems: 1. Conditioned on Z_l > 0, for β_0 ≤ 1, what is the asymptotic distribution of Z_l as l grows without bound? This problem has been studied for general multi-type branching processes, e.g., [8, 13, 17].
[Figure 5: The probability of extinction of the Erlang Weighted Tree (with potential degree distribution geo(p)) and unimodular Galton-Watson Trees (with degree distributions Poiss(λ′), geo(p′), and the degree distribution of the root vertex in the EWT). The parameters p′ and λ′ are chosen so that the expected degree of the root vertex is the same as in the EWT.]
2. What is the connection between the reversibility of the continuous state Markov process and the unimodularity of the branching process? Exploring this connection can provide a general framework to study an important class of branching processes.
3. What is the connection between the probability of extinction and the ratio of the giant component in the finite graph model? For other random graph models (e.g., the configuration model, the Erdős-Rényi random graph, etc.), the ratio of the giant component converges to 1 − P[{Extinction}], where P[{Extinction}] is the probability that the associated branching process eventually goes extinct. We have observed the same relation between the finite graph model and the EWT via numerical simulation in Figures 5 and 6.
4. What is the local weak limit if vertices in the finite graph model iterate to use all their budget, given by their potential degree? Naturally, one can imagine a scenario in which, after the realization of G_n, all vertices i with degree less than d_i(n) have a second chance to find more neighbors by announcing an updated set of potential neighbors. Of particular interest is the case when vertices can iterate as many times as possible until they achieve d_i(n) or have checked all other n − 1 nodes.
5. How general is the methodology we developed in this thesis for finding the Krein-Rutman eigenvalue and the corresponding eigenfunction?
6. Finally, what is the connectivity threshold of the random graph model when the potential degrees of all the vertices are the same and equal to k(n) = c · log(n)? In [?, Conjecture 1], it is conjectured that the phase transition happens at c = 1, i.e., the probability of the event {the graph is connected} goes to one if c > 1 and goes to zero if c < 1.
[Figure 6: The ratio of the giant component of the finite graph model (with potential degree distribution geo(p)), random graphs generated by the configuration model (with degree distributions geo(p′) and the degree distribution of the root vertex in the EWT), and the Erdős-Rényi random graph (with parameter λ′/n). The parameters p′ and λ′ are chosen so that the degree distributions are the same.]
, pick a permutation σ ∈ S_{l−1} uniformly at random and set Y_i = X_{(σ(i))} for all 1 ≤ i < l. Then we have, are identically distributed and, conditioned on X_{(l)}, they are independent. The same holds for to be independent exponentially distributed random variables with parameter 1/n. Consider the random variables {Y_i}_{i<l} and {Z_i}_{i>l} as defined in Corollary A.2. Then, the conditional distributions of these random variables are given as follows, Most notably, the conditional distribution of Y_i for i < l, conditioned on X_{(l)} = x_l, converges to the uniform distribution over [0, x_l] as n goes to infinity. Moreover, the distribution of X_{(i)} converges to Erlang(i).
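The two limiting statements can be checked by a quick Monte Carlo: with n i.i.d. exponential costs of parameter 1/n, the i-th smallest cost is approximately Erlang(i) for large n. The sketch below assumes numpy; the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, i = 1000, 5000, 3

# n i.i.d. exponential costs with parameter 1/n (mean n), as in the finite graph model.
costs = rng.exponential(scale=n, size=(trials, n))
x_i = np.sort(costs, axis=1)[:, i - 1]     # i-th smallest cost in each trial

# Erlang(i) with unit rate has mean i and variance i.
print("empirical mean, variance:", x_i.mean(), x_i.var())
print("Erlang(%d) mean, variance: %d, %d" % (i, i, i))
```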
As we mentioned, EU(N_n) is the law of [N_{n,•}(r)] for a uniformly chosen r ∈ [n]. The idea is to first define an exploration process over K_n that realizes the connected component of the vertex r in N_n. Then, we show that the connected component is locally tree-like and that the distribution of the connected component up to any finite time step of the exploration process converges to Er(P). Finally, using the Portmanteau Theorem, we prove that EU(N_n) converges weakly to Er(P). Step 1: Exploration Process The first step is to define a process that explores K_n and realizes the connected component of a randomly selected vertex r ∈ [n] in N_n. Let E_n = {{i, j} : i ≠ j ∈ [n]} denote the set of all edges in K_n. In order to track the process, we also construct a map φ from S ⊂ N_f to the connected component of r. The exploration is on E_n and the costs of edges in E_n; at each step of the exploration process, E_n is partitioned into five sets, defined as follows. During the proof, we may abuse notation by writing {i, j} ∈ A_t without including the cost, when the distinction is clear from the context. We say a vertex i ∈ [n] belongs to A_t, i.e., i ∈ A_t, if there is a vertex j ∈ [n] such that {i, j} ∈ A_t. Finally, we say a vertex v ∈ [n] has been explored by time step t if both the threshold of v, i.e., T_v, and the set of potential neighbors of v, i.e., P_v, have been realized. ii) If only P_v (P_z) has been realized, then z ∉ P_v (v ∉ P_z). If P_v and P_z have been realized, then either z ∉ P_v or v ∉ P_z (or both).
4. R t : The set of realized edges, R t , consists of all the edges {v, z} such that: i) The cost of {v, z} has been realized. ii) Neither v nor z belongs to the connected component at time t. iii) If P v has been realized, then z ∈ P v . If P z has been realized, then v ∈ P z .
5. U t : The set of unrealized edges, U t , consists of all the edges {v, z} such that the cost of {v, z} has not been realized.
Remark 9. At each step of the exploration process, we may add at most one vertex to the connected component of r. Moreover, if the vertex v is added to the connected component at time t + 1, i.e., v ∈ C t+1 , then v is active at time t, i.e., v ∈ A t and the exploration process explores an edge {j, v} such that j ∈ C t .
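One concrete way to organize this bookkeeping is to store the five sets explicitly and move edges between them as their status changes, checking after every update that they still partition E_n. The class below is only illustrative scaffolding (names and methods are ours); the actual update rules are the equations (58)-(61) described in the remainder of this section.

```python
from itertools import combinations

class ExplorationState:
    """Illustrative bookkeeping for the partition of the edge set of K_n into
    the five sets A_t, C_t, D_t, R_t, U_t used by the exploration process."""

    def __init__(self, n):
        self.n = n
        self.A, self.C, self.D, self.R = set(), set(), set(), set()
        # initially, no edge cost has been realized
        self.U = {frozenset(e) for e in combinations(range(n), 2)}

    def move(self, edge, src, dst):
        """Move an edge between two of the five sets, e.g. move(e, 'U', 'A')."""
        e = frozenset(edge)
        getattr(self, src).remove(e)
        getattr(self, dst).add(e)

    def check_partition(self):
        total = sum(len(s) for s in (self.A, self.C, self.D, self.R, self.U))
        assert total == self.n * (self.n - 1) // 2, "the five sets must partition E_n"

state = ExplorationState(n=6)
state.move((0, 3), "U", "A")   # e.g. the cost of {0, 3} is realized and the edge becomes active
state.check_partition()
```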
The exploration process starts by realizing the sets for t = 0. Set φ(r) = ø and define v 0 := r and k := d r (n). Let T 0 and P 0 denote the threshold and the set of potential neighbors of v 0 , respectively. By definition, T 0 and P 0 are given by, However, we present an alternative way to realize T 0 and P 0 without realizing the cost of {v 0 , j} for all j ∈ [n] \ {v 0 }. This alternative construction of the finite graph is an essential part of the proof of the weak convergence, which is used at all time steps t ≥ 0 as well. Pick a vertex z 0 ∈ [n] \ {v 0 } uniformly at random and assume the threshold of the vertex v 0 is equal to the cost of the edge {v 0 , z 0 }, i.e., T 0 = C n ({v 0 , z 0 }). Realize the value of C n ({v 0 , z 0 }); according to Lemma A.1, the density function of C n ({v 0 , z 0 }) is given by, Next, pick I 0 = {z 1 , z 2 , . . . , z k }, a subset of size k, from [n] \ ({z 0 } ∪ {v 0 }) uniformly at random and assume I 0 is the set of potential neighbors of v 0 , i.e., P 0 = I 0 . Without loss of generality, assume z 1 < z 2 < · · · < z k and define φ( ; by Corollary A.2, the conditional joint density function of these random variables is given by, Start the exploration process with, The description of the above equations is as follows,

Equation 58
: All the edges {v_0, j} such that C_n({v_0, j}) has been realized are removed from E_n to construct U_0. Figure 7 depicts the preparation step for the exploration process. Define T̄_0 to be equal to T_0. These two values might be different for t > 0. The definition and the role of T̄_t will become clear later on. Before proceeding with the exploration process, we need to define an order on N_f: for two sequences i = (i_1, i_2, . . . , i_l) and j = (j_1, j_2, . . . , j_{l′}), we say i ≺ j if l < l′, or if l = l′ and there exists some g ∈ Z_+ such that (i_1, i_2, . . . , i_{g−1}) = (j_1, j_2, . . . , j_{g−1}) and i_g < j_g. Remark 10. For the sake of notational simplicity, we denote the set of potential neighbors and the threshold of the vertex v_t by P_t and T_t instead of P_{v_t} and T_{v_t}. We may also use P_j as the set of potential neighbors of the vertex j. The distinction is clear from the context.
The exploration process for t ≥ 0 is as follows; , b} ∈ D 2 and {d, b} ∈ C 5 . Note that the labeling is based on being a "potential neighbor" rather than being an actual neighbor.
Remark 12. A vertex v ≠ r belongs to the connected component of r by time step t if and only if v ∈ C_t. A vertex v ∈ [n] \ {r} has been explored by time step t if and only if v belongs to the connected component, or there is a vertex v′ ∈ C_t such that {v′, v} ∈ D_t ∪ C_t and v belongs to the set of potential neighbors of v′, i.e., v ∈ P_{v′}. Note that in the last case, the vertex v′ may not be the vertex par(v); as an example, in Fig. 8 the vertex g is explored by time step t = 6, but {par(g), g} = {d, g} ∈ A_6. Remark 13. An important observation is that for every {v, z} ∈ A_t, exactly one of v or z (but not both) belongs to the connected component of the vertex r at time t. Moreover, at least one of the vertices v or z has been explored; hence, at each time step we may explore at most one vertex.
[Figure 8: Costs of the edges, thresholds of the vertices, potential degrees of the vertices, and the sets A_t, U_t, C_t, and D_t are not shown. Solid green edges belong to C_t, dashed red edges belong to D_t, dashed green edges belong to A_t, and dashed blue edges belong to R_t. Note that par(b) is defined to be r although b is connected to the root via d at time t = 5. Moreover, par(g) is d since the vertex d is the first vertex in the connected component such that g ∈ P_d; although g ∈ P_b, the vertex b joined the connected component after the vertex d. Based on the exploration process, {b, g} ∈ A_5 and e_6 = {b, g}.]
Based on the exploration strategy, the vertex φ^{-1}(i) has been explored, but it may not belong to the connected component. More explicitly, par(φ^{-1}(i)) belongs to the connected component (Remark 11) and par(φ^{-1}(i)) < φ^{-1}(j); hence, the edge {par(φ^{-1}(i)), φ^{-1}(i)} ∈ C_t ∪ D_t or, equivalently, φ^{-1}(i) has been explored by time t (Remark 12). However, there are two different possibilities for the vertex φ^{-1}(j): • φ^{-1}(j) has not been explored: in this case, the vertex φ^{-1}(i) belongs to the connected component. Let v_{t+1} = φ^{-1}(j). Let m ≤ t + 1 denote the number of explored vertices by time step t. Note that at time t = 0, the root vertex has already been explored, and for each t > 0, we may explore at most one vertex at each time step (Remark 13). Define k′ := min(n − m − 2, d_{v_{t+1}}(n)). If n − m − 2 < 0, which may happen if the graph is fully connected and the process is reaching its end, then let k′ = 0. In order to explore v_{t+1}, the first step is to choose B_{t+1} = {z_1, z_2, . . . , z_{k′}}, a subset of size k′, uniformly at random from the set of unexplored vertices (there are n − m − 1 unexplored vertices other than v_{t+1}). Next, pick a vertex z_0 out of the remaining unexplored vertices uniformly at random (there are n − m − 1 − k′ options for z_0). Assume that the costs of the edges {v_{t+1}, z_i}_{i=1}^{k′} are the k′ smallest values in {C_n({v_{t+1}, z}) : z is not explored} and the cost of {v_{t+1}, z_0} is exactly the (k′ + 1)-th smallest one. As at t = 0, we do not realize the cost of {v_{t+1}, z} for all unexplored vertices z ∈ [n]. Using Lemma A.1 and Corollary A.3, the joint density function of {C_n({v_{t+1}, z_i})}_{i=0}^{k′} is given by, Notice that for every vertex v ∉ B_{t+1} ∪ {z_0} such that v has not been explored and the cost of {v_{t+1}, v} has not been realized, the value of C_n({v_{t+1}, v}) is greater than c_n({v_{t+1}, z_0}). Define T̄_{t+1} to be c_n({v_{t+1}, z_0}). If the set of unexplored vertices v such that {v_{t+1}, v} has not been realized is non-empty, then d_{v_{t+1}}(n) < n − m − 2 and T_{t+1} ≤ T̄_{t+1}.
The second step to explore v_{t+1} is to realize the cost of all the edges between v_{t+1} and the explored vertices; by Corollary A.3, for every explored vertex v such that {v_{t+1}, v} ∈ U_t, the density of C_n({v_{t+1}, v}) conditioned on T_v = w_0 is given by Remark 15. Assume the vertex v has been explored but the value of C_n({v_{t+1}, v}) has not been realized. Since v has been explored, we already know that v_{t+1} ∉ P_v and C_n({v_{t+1}, v}) > T_v. However, the first step of the exploration process for the vertex v states C_n({v_{t+1}, v}) > T̄_v. Moreover, Remark 14 suggests T̄_v ≥ T_v.
Note that the potential neighbors of v_{t+1} are either explored or belong to B_{t+1} ∪ {z_0}. Define k := d_{v_{t+1}}(n) and set the threshold and the set of potential neighbors of v_{t+1}, The value of k′ is less than or equal to k. As the process reaches its end, or if d_{v_{t+1}}(n) > n − m − 2, we have k′ < k; hence, it is possible to have z_0 ∈ P_{t+1}. If c_n(e_{t+1}) ≥ T_{t+1}, then the connection e_{t+1} does not survive; however, all the potential neighbors of v_{t+1} have been realized and the vertex v_{t+1} has been explored. In this case, update the sets as follows, A_{t+1} = A_t \ {({v_{t+1}, j}, c_n({v_{t+1}, j})) : j ∉ P_{t+1} and {v_{t+1}, j} ∈ A_t} (59a) D_{t+1} = D_t ∪ {({v_{t+1}, j}, c_n({v_{t+1}, j})) : j ∉ P_{t+1} and C_n({v_{t+1}, j}) is realized} ∪ {({v_{t+1}, j}, c_n({v_{t+1}, j})) : j has been explored and v_{t+1} ∉ P_j} (59c) R_{t+1} = (R_t ∪ {({v_{t+1}, j}, c_n({v_{t+1}, j})) : j ∈ P_{t+1} and j has not been explored}) \ {({v_{t+1}, j}, c_n({v_{t+1}, j})) : j ∉ P_{t+1} and {v_{t+1}, j} ∈ R_t} (59d) The description of the above equations is as follows, 1. Equation 59a: All the active edges {v_{t+1}, j} in A_t such that j ∉ P_{t+1} are removed, including e_{t+1}. Note that if {v_{t+1}, j} ∈ A_t, then v_{t+1} ∈ P_j (Remark 13); however, after exploring the vertex v_{t+1}, it is clear whether j is a potential neighbor of v_{t+1} or not. If j ∉ P_{t+1}, then the edge {v_{t+1}, j} is moved to D_{t+1}. On the other hand, if j ∈ P_{t+1}, then {v_{t+1}, j} survives; however, this edge needs to be revisited at a later time in order to add new members to the set of active edges.

Equation 59b
: The vertex v_{t+1} is not connected to the connected component through the edge e_{t+1}. Note that there might be some other vertex j such that {v_{t+1}, j} ∈ A_t and j ∈ P_{t+1}, i.e., {v_{t+1}, j} survives (Remark 13); however, the exploration of the edge {v_{t+1}, j} is postponed to some later time t′ > t.

Equation 59c
: All the edges {v_{t+1}, j} such that C_n({v_{t+1}, j}) has been realized and j ∉ P_{t+1} do not survive. Moreover, for every explored vertex j such that v_{t+1} ∉ P_j, the edge {v_{t+1}, j} does not survive either.

Equation 59d
: For all j ∈ P_{t+1} such that the vertex j has not been explored, {v_{t+1}, j} is added to R_{t+1}. Note that the cost of {v_{t+1}, j} has been realized and neither v_{t+1} nor j belongs to the connected component. Moreover, for each explored vertex j, if {v_{t+1}, j} ∉ R_t, then either v_{t+1} ∉ P_j or j belongs to the connected component; hence, {v_{t+1}, j} need not be included in R_{t+1}. Finally, for all edges {v_{t+1}, j} ∈ R_t, the vertex v_{t+1} is a potential neighbor of the vertex j; however, if j ∉ P_{t+1}, then {v_{t+1}, j} does not survive.

Equation 59e
: All the edges {v_{t+1}, j} such that C_n({v_{t+1}, j}) has been realized are removed from U_t to obtain U_{t+1}.

Remark 17.
Consider an edge e = {v_{t+1}, j} such that the cost of e has been realized. If the vertex j ∉ P_{t+1}, then the edge e does not survive and it belongs to D_{t+1}. Assume j ∈ P_{t+1}. If the vertex j has not been explored, then e belongs to R_{t+1}. If the vertex j has been explored and v_{t+1} ∉ P_j, then the edge e does not survive and it belongs to D_{t+1}. Assume j has been explored and v_{t+1} ∈ P_j. If j belongs to the connected component, then e ∈ A_t. If j does not belong to the connected component, then e ∈ R_t. In either case, e needs no update, and it is included in the corresponding set at time step t + 1.

Equation 60a
: All the edges {v_{t+1}, j} such that j ∈ P_{t+1} and j has not been explored are added to A_{t+1}. Moreover, all the edges {v_{t+1}, j} such that j has been explored, j does not belong to the connected component, j ∈ P_{t+1}, and v_{t+1} ∈ P_j are also included in A_{t+1}.

Equation 60b
: The vertex v_{t+1} is connected to the connected component through the edge e_{t+1}; moreover, all the edges {v_{t+1}, j} ∈ A_t such that j ∈ P_{t+1} are also included in C_{t+1}, since for each edge {v_{t+1}, j} ∈ A_t the vertex j belongs to the connected component and v_{t+1} ∈ P_j.

Equation 60c
: All the edges {v_{t+1}, j} such that C_n({v_{t+1}, j}) has been realized and j ∉ P_{t+1} do not survive. Moreover, for every explored vertex j such that v_{t+1} ∉ P_j, the edge {v_{t+1}, j} does not survive either.

Equation 60d:
Since v_{t+1} is connected to the connected component, no edge needs to be added to R_t; however, all the edges {v_{t+1}, j} ∈ R_t are removed from R_t, since one end of each such edge now belongs to the connected component.

Equation 60e
: All the edges {v_{t+1}, j} such that C_n({v_{t+1}, j}) has been realized are removed from U_t to obtain U_{t+1}.
Remark 18. Consider an edge e = {v_{t+1}, j} such that the cost of e has been realized. If the vertex j ∉ P_{t+1}, then the edge e does not survive and it belongs to D_{t+1}. Assume j ∈ P_{t+1}. If the vertex j has not been explored, then e belongs to A_{t+1}. If the vertex j has been explored and v_{t+1} ∉ P_j, then the edge e does not survive, and it belongs to D_{t+1}. Assume j has been explored and v_{t+1} ∈ P_j. If j belongs to the connected component, then e ∈ A_t and e is moved to C_{t+1}. If j does not belong to the connected component, then e ∈ R_t and e is moved to A_{t+1}. Figure 9 illustrates the update process for the case where only φ^{-1}(i) has been explored.
• φ^{-1}(j) has been explored: Let v_{t+1} denote the one, amongst φ^{-1}(j) and φ^{-1}(i), which is not connected to the connected component. Since v_{t+1} has already been explored, all the potential neighbors of the vertex v_{t+1} have been realized. Remark 19. Since the vertex v_{t+1} has been explored and it does not belong to the connected component by time t, there is a vertex v ∈ [n] that belongs to the connected component of r by time t such that v_{t+1} ∈ P_v and {v, v_{t+1}} ∈ D_t. Note that v may or may not be par(v_{t+1}). To clarify the reason, consider the following cases: 1. Consider the case where φ^{-1}(j) belongs to the connected component. As is mentioned in Remark 11, the vertex par(φ^{-1}(i)) has been explored; hence, {par(φ^{-1}(i)), φ^{-1}(i)} ∈ D_t. In Figure 8, at t = 4, we have i = (2) and φ^{-1}(2) = b, and j = (3, 1) and φ^{-1}(j) = d; however, d belongs to the connected component and b does not, and the edge {par(b), b} = {r, b} ∈ D_4.
Without loss of generality, assume φ^{-1}(i) belongs to the connected component; hence, v_{t+1} = φ^{-1}(j). Define k := d_{v_{t+1}}(n) and set the threshold and the set of potential neighbors of v_{t+1}, Remark 20. Given that both φ^{-1}(i) and φ^{-1}(j) have been explored and one of them does not belong to the connected component, the survival of {φ^{-1}(i), φ^{-1}(j)} has already been determined, i.e., it survives. The edge {φ^{-1}(i), φ^{-1}(j)} has been added to the set of active edges in order to revisit the vertex v_{t+1} and add new potential edges to A_t.
As is mentioned in Remark 20, the connection e t+1 survives and v t+1 belongs to the connected component. Define I t+1 = {z ∈ P t+1 : φ(z) is not defined}. Assume I t+1 = {z 1 , z 2 , . . . , z |I t+1 | } such that z i < z j for all i < j and set φ(z l ) = (j, l) for all l ∈ [|I t+1 |], where j = φ(v t+1 ). Update the sets as follows, The description of the above equations is as follows,

Equation 61a
: All the edges {v_{t+1}, j} ∈ R_t are added to A_{t+1}, since for every {v_{t+1}, j} ∈ R_t the vertex j is a potential neighbor of v_{t+1}, and if j has been explored then v_{t+1} ∈ P_j as well. In addition, all the edges {v_{t+1}, j} ∈ A_t are removed from A_t, since j belongs to the connected component at time t (Remark 13), the edge {v_{t+1}, j} survives (Remark 20), and we do not need to revisit the vertex v_{t+1} at a later time.

Equation 61b
: All the edges {v_{t+1}, j} ∈ A_t are moved to C_{t+1}, since if {v_{t+1}, j} ∈ A_t, then j ∈ P_{t+1}, v_{t+1} ∈ P_j, and the vertex j belongs to the connected component (Remark 13 and Remark 20).

Equation 61c
: Note that both φ^{-1}(i) and φ^{-1}(j) have been explored; hence, the cost of none of the edges in U_t is realized, and the set D_t needs no update.

Equation 61d
: All the edges {v_{t+1}, j} ∈ R_t are removed from R_t, since one end of each such edge now belongs to the connected component. All of these edges are moved to A_{t+1}.

Equation 61e
: The cost of none of the edges in U t is realized; hence, U t needs no update.
Remark 21. Consider an edge e = {v_{t+1}, j} with realized cost. If e ∈ A_t, then j belongs to the connected component, v_{t+1} ∈ P_j (Remark 13), and j ∈ P_{t+1} (the vertex v_{t+1} has been explored); hence, e is moved to C_{t+1}. If the edge e ∈ D_t, then e needs no update. If the edge e ∈ R_t, then e is moved to A_{t+1} since v_{t+1} belongs to the connected component. Finally, e belongs to neither U_t nor C_t.
Remark 22. Recall that for any {v, z} ∈ R t , if v has been explored then z ∈ P v . Moreover, neither z nor v belongs to the connected component of r by time t. Figure 10 illustrates the updating process for the case where both φ −1 (i) and φ −1 (j) have been explored.
The exploration terminates when A t = ∅. Consider the following filtration, Let τ denote the time that the algorithm terminates. Indeed, τ is a stopping time of the filtration.
Step 2: Locally tree-like In the second step, the goal is to show that the rooted graph induced by C_{t∧τ}, for any fixed t, becomes a tree as the number of vertices, n, goes to infinity. This implies that the graph G_n, induced by the network N_n after removing the marks, is asymptotically locally tree-like. In fact, a stronger property holds: for every fixed t > 0, the probability that some vertex v_l, l ∈ [t ∧ τ], has been touched twice during the exploration process prior to time step l goes to zero as n → ∞. The term "touching" is defined as follows, Definition 7. A vertex v is said to be touched at time t′ ≤ τ if the cost of {v_{t′}, v} is realized at time t′, i.e., {v_{t′}, v} ∈ U_{t′−1} \ U_{t′}. The vertex v_{t′} is chosen according to the exploration process. Note that the vertex v may or may not have been explored.
If, for every l ∈ [t ∧ τ], the vertex v_l has been touched only once before time step l, then e_l = {par(v_l), v_l}; moreover, for every l′ < l, the vertex v_l is not a potential neighbor of the vertex v_{l′}. This implies that the rooted graph induced by C_{t∧τ} is a tree. A stronger condition is proved in the following lemma: with high probability, for all l ∈ [t ∧ τ], the potential neighbors of the vertex v_l are touched for the first time, except possibly par(v_l).
[Figure 10: The exploration process at time step t, when both the vertices φ^{-1}(j) and φ^{-1}(i) have been explored. (c) Update of the sets for time step t + 1: red dashed lines belong to D_{t+1}, green dashed lines belong to A_{t+1}, and solid green lines belong to R_{t+1}.]
Lemma A.4 (Locally tree-like). For t′ > 0, let J_{t′} denote the set of vertices j such that C_n({v_{t′}, j}) ≤ T_{t′} and j has been touched at least twice during the exploration process up to time t′, once at time step t′ and at least once at some time step t̃ < t′, i.e., J_{t′} = {j ∈ [n] : C_n({v_{t′}, j}) ≤ T_{t′}, {v_{t′}, j} ∈ U_{t′−1} \ U_{t′}, and ∃ ṽ ≠ v_{t′} such that {ṽ, j} ∉ U_{t′−1}}. Consider a fixed value t > 0; then we have, lim_{n→∞} P(∃ l ∈ [t ∧ τ] such that |J_l| ≠ 0) = 0 (62) Remark 23. Consider the event J_l = ∅ for all l ∈ [t ∧ τ]. This implies that for every vertex j such that C_n({v_l, j}) ≤ T_l, either j is touched for the first time at time step l or the value of C_n({v_l, j}) has been realized by time step l − 1. However, if j = par(v_l), then the second case cannot happen; otherwise, the vertex v_l would have been touched at least twice during the exploration process up to time l − 1.
Remark 24. Even if the rooted graph induced by C t∧τ is a tree, it does not mean that the exploration process satisfies the property which is mentioned in Lemma A.4. In Fig. 8, the vertex b has been touched twice during the exploration process up to time step t = 1, at time steps t = 0 (by the vertex a) and t = 1; however, C 1 is a tree.
Proof. Observe that J_0 = ∅. Fix t > 0. An obvious upper bound for the left-hand side of equation (62) is given by applying the union bound: We provide an upper bound for each term on the right-hand side. If the vertex v_l has been explored by time step l − 1, then we do not need to touch any vertex at time l and J_l = ∅. In Figure 8, the vertex b has already been explored at time step t = 2 and J_5 = ∅. Hence, we only need to consider the sample paths in which v_l has not been explored.
P(|J_l| = 0 | F_{l−1}) = 1{v_l has been explored} + 1{v_l has not been explored} P(|J_l| = 0 | F_{l−1}) Consider the sets δ_l, ε_l, l ∈ F_{l−1} defined as follows, Observe that | l | ≥ 1 since v_l ∈ A_l. Moreover, at each time step we may explore at most one vertex (there might be cases in which we revisit an explored vertex); hence, |ε_l| ≤ l. Furthermore, for all sample paths in F_{l−1} in which v_l has not been explored, l ⊆ ε_l, since if {v_l, j} has been realized and v_l has not been explored, then j has been explored. Finally, at each time step l, we may touch at most d_{v_l}(n) + 1 new vertices; hence, |δ_l| ≤ 1 + Σ_{i=0}^{l−1} (d_{v_i}(n) + 1). Let k := d_{v_l}(n) denote the potential degree of the vertex v_l. Let k′ := min(k, n − | l | − 2), where n − | l | − 1 equals the number of vertices j such that {v_l, j} ∈ U_{l−1}. Note that n − | l | − 1 > 0 if v_l has not been explored and n > l. Define T̄_l and P̄_l to be modified versions of T_l and P_l, i.e., In the definition of T_l, all possible vertices are considered; however, the definition of T̄_l skips all the vertices j such that {v_l, j} has been realized prior to time step l. Hence, if k′ = k, then T_l ≤ T̄_l. Moreover, for every vertex j ∈ P_l such that the cost of {v_l, j} is realized at time l, i.e., {v_l, j} ∈ U_{l−1}, we have j ∈ P̄_l. To see this, consider the two cases: 1) If k′ = k, then j ∈ P_l implies C_n({v_l, j}) < T_l ≤ T̄_l. 2) If k′ < k, then P̄_l contains all the vertices j such that {v_l, j} ∈ U_{l−1}.
To realize T̄_l and P̄_l, we need to pick the k′ + 1 closest vertices to v_l, based on the cost of the connection. For an unexplored vertex j, the cost of {v_l, j} is an exponentially distributed random variable with parameter 1/n. For an explored vertex j such that {v_l, j} ∈ U_{l−1}, the cost of {v_l, j} conditioned on T_j is a shifted exponentially distributed random variable with parameter The result follows from the fact that the summation in (63) has only t summands, each of which converges to zero as n goes to ∞.
Step 3: Convergence of the Exploration In the third step, the local structure of the rooted graph induced by C_{t∧τ} for any fixed t is studied. The goal is to analyze the joint distribution of the sequence (X
Conditioned on n_i = m and v_i = x, the probability of the event ζ_{(i,j)} < v_{(i,j)} is given as follows, The symmetric and conditionally independent structure of the EWT implies that the random variable D_i, conditioned on n_i = m and v_i = x, has a binomial distribution. Hence, The degree distribution of the root follows immediately by integrating/summing over all possible values of v_ø and n_ø. The mean of D_ø is obtained as follows:
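The binomial structure just described is easy to verify by simulation: condition on the type (m, x) of a vertex, estimate the survival probability of a single potential edge, and compare the empirical conditional degree distribution with the corresponding binomial law. The sketch below uses the geometric potential degree distribution of the numerical section as an example; all names and parameter values are our own choices.

```python
import numpy as np
from math import comb
from collections import Counter

rng = np.random.default_rng(2)
p_geo, lam = 0.08, 1.0
m, x = 4, 3.0                      # condition on the type (m, x) of the vertex
trials = 20000

def edge_survives(x):
    """One potential edge: cost ~ Unif[0, x]; the child's value is Erlang(n + 1, lam)
    with n drawn from the shifted geometric distribution."""
    zeta = rng.uniform(0.0, x)
    n_child = rng.geometric(p_geo) - 1
    v_child = rng.gamma(shape=n_child + 1, scale=1.0 / lam)
    return zeta < v_child

# Empirical conditional degree distribution versus the matching binomial law.
degrees = Counter(sum(edge_survives(x) for _ in range(m)) for _ in range(trials))
p_hat = np.mean([edge_survives(x) for _ in range(trials)])
for d in range(m + 1):
    print(d, degrees[d] / trials, comb(m, d) * p_hat ** d * (1 - p_hat) ** (m - d))
```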

C Proof of Theorem 4.7
Let W_{l,i} denote the number of potential vertices at depth l on the backbone tree, all of whose paths to the root vertex pass through the potential vertex i ∈ N_f. In the following, we write i = (i_1, i_2, · · · , i_k), where k ≥ 0. We have where f_l(·) is the probability density function of Erlang(l) and 1_j ∈ N_f is the all-ones sequence of length j. Using the equality f_k(x) × (k − 1)/x = f_{k−1}(x), interchanging the order of integration in pairs, e.g., z_l and y_{l−1}, and using the complementary cdfs to simplify the integrals involving the z's, we have, · · · F^{k_1−1}(max(y_1, y_2)) . . . F^{k_{l−1}−1}(max(y_{l−1}, y_l)) F^{k_l}(y_l) dy_1 dy_2 . . . dy_l