Wireless random-access networks with bipartite interference graphs

We consider random-access networks where nodes represent servers with a queue and can be either active or inactive. A node deactivates at unit rate, while it activates at a rate that depends on its queue length, provided none of its neighbors is active. We consider arbitrary bipartite graphs in the limit as the initial queue lengths become large and identify the transition time between the two states where one half of the network is active and the other half is inactive. The transition path is decomposed into a succession of transitions on complete bipartite subgraphs. We formulate a randomized greedy algorithm that takes the graph as input and gives as output the set of transition paths the network is most likely to follow. Along each path we determine the mean transition time and its law on the scale of its mean. Depending on the activation rates, we identify three regimes of behavior.


Introduction
The present paper is a continuation of [1]. In Section 1.1 we give our motivation, which is a summary of the more extensive motivation provided in [1, Section 1.1], where relevant references to the literature are also included. In Section 1.2 we formulate the random-access model whose performance we analyze in detail. In Section 1.3 we introduce the interference graph and recall a key theorem from [1] for the total transition time on complete bipartite graphs. In Section 1.4 we hint at the key idea behind our analysis, which involves transitions along a sequence of complete bipartite subgraphs selected via a randomized greedy algorithm, and give an outline of the remainder of the paper.

Motivation and background
We are interested in transition time asymptotics of queue-based random-access protocols in wireless networks. Specifically, we consider a stylised stochastic model for a wireless network, represented in terms of an undirected graph G = (S, E), referred to as the interference graph. The set of nodes S labels the servers and the set of edges E indicates which pairs of servers interfere and are therefore prevented from simultaneous activity (see Fig. 1). We denote by X(t) = (X_w(t))_{w∈S} the joint activity state at time t, which is an element of the state space

X = {x ∈ {0,1}^{|S|} : x_w x_{w'} = 0 ∀ (w, w') ∈ E},   (1.1)

where x_w = 0 means that node w is inactive and x_w = 1 means that node w is active. We assume that packets arrive at the nodes as independent Poisson processes and have independent exponentially distributed sizes. When a packet arrives at a node, it joins the queue at that node and the queue length undergoes an instantaneous jump equal to the size of the arriving packet. The queue length decreases at a constant speed c (as long as it is positive) when the node is active. We denote by Q(t) = (Q_w(t))_{w∈S} the joint queue length state at time t. When node w is inactive at time t, it activates at a rate that is an increasing function of Q_w(t), provided none of its neighbors is active. When a node is active at time t, it deactivates at rate 1. The joint process

(X(t), Q(t))_{t≥0}   (1.2)

evolves as a time-homogeneous Markov process with state space X × R_{≥0}^{|S|}, since the transition rates depend on time only via the current state of the vector.
The Markov process in (1.2) may be viewed as a hard-core interaction model with state-dependent activation rates. Its present state depends not only on the history of the packet arrivals and their service times (which cause upward jumps in the queue lengths), but also on the history of the activity process (through the gradual reduction in queue lengths during activity periods). The state-dependent nature of the activation rates raises interesting and challenging issues from a methodological perspective. We are particularly interested in what happens when the initial queue lengths in (1.3) become large. In this limit the network exhibits metastable behavior: before becoming active, an inactive node must wait until all the nodes it interferes with have become inactive simultaneously, which takes a long time when the queues at these nodes are long and the activation rates grow without bound as a function of the queue length.
In [1] we focused on the simple case of a complete bipartite interference graph: the node set can be partitioned into two nonempty sets U and V such that two nodes interfere if and only if one belongs to U and the other belongs to V. In the present paper we turn our attention to general bipartite interference graphs, for which not necessarily all nodes in U interfere with all nodes in V. This case will turn out to be considerably more challenging. We will be interested in starting from the state where all the nodes in U are active and all the nodes in V are inactive, and examining the transition time to the state where all the nodes in U are inactive and all the nodes in V are active. We refer to this transition as a metastable crossover. It will turn out that, in order to achieve the full transition, the network goes through a succession of subtransitions, in which a succession of complete bipartite subgraphs achieve a metastable crossover and, in doing so, effectively remove themselves from the network. This succession depends in a delicate manner on the full structure of the bipartite interference graph, which we capture with the help of a randomized greedy algorithm that identifies which subtransition occurs first, which second, etc., and with what probability. By combining the results in [1] with a detailed analysis of the algorithm, we are able to determine the distribution of the full metastable crossover time to leading order as the initial queue lengths become large.

Mathematical model
We consider the bipartite graph G = ((U, V), E), where U ∪ V is the set of nodes and E is the set of edges that connect a node in U to a node in V, and vice versa (edges are undirected). We set N = |V|. We recall some definitions and basic facts from [1].

Definition 1.1. [Key notions 1]
(1) State of a node. A node in the network can be either active or inactive. The state of node w at time t is described by a Bernoulli random variable X_w(t) ∈ {0, 1}, defined as

X_w(t) = 0 if w is inactive at time t, and X_w(t) = 1 if w is active at time t.   (1.4)

The configuration at time t is denoted by

X(t) = {X_w(t)}_{w∈U∪V}.   (1.5)

We denote by 1_U (1_V) the configuration where all nodes in U are active (inactive) and all nodes in V are inactive (active).
(2) Transition time. Our main object of interest is the transition time to 1_V starting from 1_U, i.e.,

τ_{1_V} = min{t ≥ 0 : X(t) = 1_V} given X(0) = 1_U.   (1.6)

(3) Activation and deactivation of a node. An active node w turns inactive according to a deactivation Poisson clock: when the clock ticks the node switches itself off. Conversely, an inactive node w attempts to become active according to an activation Poisson clock, but the attempt is successful only when no neighbors of w are active. We are interested in what are called internal models, where the activation rate at node w at time t depends on the queue length at node w at time t. The deactivation rate is 1 and does not depend on the queue length.
(4) Queue length at a node. Let t → Q^+_w(t) be the input process describing packets arriving at node w according to a Poisson process t → N_w(t) = Poisson(λt) and requiring i.i.d. exponential service times Y_{wn}, n ∈ N, with rate μ_U for w ∈ U and μ_V for w ∈ V. This is a compound Poisson process with mean ρ_U = λ/μ_U for w ∈ U and ρ_V = λ/μ_V for w ∈ V. Let t → Q^-_w(t) be the output process representing the cumulative amount of work that is processed by the server at node w in the time interval [0, t] at rate c, which equals cT_w(t) = c ∫_0^t X_w(s) ds. In order to ensure that the queue length tends to decrease when a node is active, we assume that ρ_U < c and ρ_V < c. Define the net input process

Δ_w(t) = Q^+_w(t) − Q^-_w(t),   (1.7)

and let s* = s*(t) be the value where sup_{s∈[0,t]} [Δ_w(t) − Δ_w(s)] is reached, i.e., equals [Δ_w(t) − Δ_w(s*−)]. Let Q_w(t) ∈ R_{≥0} denote the queue length at node w at time t. Then

Q_w(t) = max{ Q_w(0) + Δ_w(t), sup_{s∈[0,t]} [Δ_w(t) − Δ_w(s)] },   (1.8)

where Q_w(0) is the initial queue length. The maximum is achieved by the first term when Q_w(0) ≥ −Δ_w(s*−) (the queue length never sojourns at 0), and by the second term when Q_w(0) < −Δ_w(s*−) (the queue length sojourns at 0 at time s*−).
(5) Initial queue length. The initial queue length is assumed to be given by

Q_w(0) = γ_U r for w ∈ U and Q_w(0) = γ_V r for w ∈ V,   (1.9)

where γ_U, γ_V > 0, and r is a parameter that tends to infinity. In order to ensure that the queue lengths at nodes in V always remain of order r, we assume condition (1.10). We will often write Q_U(0) and Q_V(0) to indicate the initial queue lengths at nodes in U and V, respectively.

(6) Dependence of activation rate on queue length. Let g_U, g_V ∈ G with

G = { g : R_{≥0} → R_{≥0} : g non-decreasing and continuous, g(0) = 0, lim_{x→∞} g(x) = ∞ }.   (1.11)

The deactivation clocks tick at rate 1, while the activation clocks tick at the queue-dependent rates in (1.12). We focus on the particular choice in (1.13). We assume that nodes in V are much more aggressive than nodes in U, in the sense of (1.14). As we will see later, this ensures that the transition path from 1_U to 1_V can be decomposed into a succession of transitions on complete bipartite subgraphs.
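The queue-length construction in Definition 1.1(4), with reflection at zero, can be sketched numerically. The following is an illustrative sketch (not from the paper), assuming a discretized net-input path Δ_w sampled on a time grid with Δ_w(0) = 0:

```python
def queue_length(q0, delta):
    """Reflected queue length at the last grid point.

    `q0` is the initial queue length Q_w(0); `delta` is a list of
    discretized values of the net input Δ_w(s) = Q^+_w(s) − Q^-_w(s)
    on a time grid, with delta[0] = 0.  The queue length equals
    max(Q_w(0) + Δ_w(t), sup_s [Δ_w(t) − Δ_w(s)]).
    """
    t_val = delta[-1]
    # sup over s of [Δ_w(t) − Δ_w(s)] is attained at the minimum of Δ_w.
    sup_term = t_val - min(delta)
    return max(q0 + t_val, sup_term)

# Long initial queue: the first term wins, the queue never sojourns at 0.
print(queue_length(5.0, [0.0, -2.0, -1.0]))   # 4.0

# Short initial queue: the second term wins, the queue hit 0 along the way.
print(queue_length(1.0, [0.0, -3.0, -2.0]))   # 1.0
```

The two example calls mirror the two cases discussed above: the first term of the maximum wins when the initial queue is large enough that the queue never hits zero, the second term wins otherwise.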

Interference graph
Write P_{1_U} and E_{1_U} to denote probability and expectation on path space given that the initial configuration is 1_U and the initial queue lengths are as in (1.9). We say that an event occurs with high probability if its P_{1_U}-probability tends to 1 as r → ∞.
In [1, Theorem 1.7], building on results from [2], we analyzed the mean transition time for the special case where the interference graph is a complete bipartite graph. The results are strongly related to the initial queue lengths Q_U(0) at the nodes in U.

[Fig. 2: The curve in the center is convex when C ∈ (0, 1/2) and concave when C ∈ (1/2, 1). The curve on the right is the limit of the curve in the center as C ↑ 1.]
Theorem 1.2. [Transition time for complete bipartite graph [1, Theorem 1.7]] Let G be a complete bipartite graph. Suppose that (1.13)-(1.14) hold. Suppose that the initial queue lengths at the nodes in U equal Q_U(0) = γ_U r.
Theorem 1.2 shows that there is a trichotomy (see Fig. 2): depending on the value of β, the transition exhibits a subcritical regime, a critical regime and a supercritical regime. A heuristic for the critical value is the following: the fraction of joint inactivity time of the nodes in U is of order (1/r^β)^{|U|} = r^{−β|U|}; since the time it takes to leave the joint inactivity state is of order r^{−β}, all nodes in U become simultaneously inactive for the first time after a period of order r^{−β}/r^{−β|U|} = r^{β(|U|−1)}. Our goal is to extend Theorem 1.2 to arbitrary bipartite graphs (see Fig. 3 for examples). Note how the mean transition time depends on the actual value of the initial queue lengths at nodes in U: for complete bipartite graphs, the initial queue lengths are fixed to be γ_U r; for arbitrary bipartite graphs, we will see how the mean transition time depends on the way the queue lengths change while activating nodes in V.

Definition 1.3. [Key notions 2]
(1) Neighbors of a node. For a node v ∈ V, we define the set of neighbors of v as N(v) = {u ∈ U : (u, v) ∈ E}, and the degree of v as

d(v) = |N(v)|.   (1.25)

(2) Updated queue lengths. Let Q_U = {Q_u}_{u∈U} be the sequence of queues associated with the nodes in U, and Q_V = {Q_v}_{v∈V} the sequence of queues associated with the nodes in V. Let Q^k = (Q_U^k, Q_V^k) be the pair of sequences representing the updated queue lengths after k nodes in V have been activated (see Definition 2.13 later for more details).
(3) Transition time and forks. We denote by T_G^Q the transition time of the graph G, i.e., (1.6), conditional on the initial queue lengths Q = (Q_U, Q_V). It represents the time τ_{1_V} it takes to hit configuration 1_V starting from configuration 1_U. Given a node v ∈ V, we refer to the fork of v as the complete bipartite subgraph of G containing only node v, its neighbors N(v) ⊆ U and the edges between them. We talk about a d-fork when d(v) = d with d ∈ N.

(4) Nucleation times. We denote by T_v^Q the nucleation time of the fork of v conditional on the state of the queues Q. It represents the time it takes for the fork of v to deactivate N(v) and activate v, and it can be seen as the transition time of the complete bipartite subgraph of G represented by the fork of v. Note that, for v, w ∈ V, T_v^Q and T_w^Q are dependent random variables when the forks of v and w overlap, i.e., when N(v) ∩ N(w) ≠ ∅. The distinction in wording between transition and nucleation is chosen in order to distinguish between the full transition of G and the successive nucleations of the forks (subgraphs of G) of the activating nodes in V.

Key idea and outline
The key idea behind the present paper is to define a randomized greedy algorithm that allows us to identify the set of paths A the network is most likely to follow while deactivating the nodes in U and activating the nodes in V. We label the nodes in V based on their first activation and we denote by a* the path that the network follows, i.e., a* = {v*_1, ..., v*_N} with v*_1 the first node that activates and v*_N the last. Let E(a*) denote the event that one of the paths in A occurs. We will prove that

lim_{r→∞} P_{1_U}(E(a*)) = 1.   (1.26)

In particular, we will show that if we condition on the event that the network follows a given admissible path a ∈ A, then we are able to identify how the mean transition time depends on the sequence of nucleation times of the forks of the nodes in V, ordered as in the path a (Theorem 3.2 below). We derive the asymptotics of the mean transition time as r → ∞ (Theorem 3.3 below) and identify the law of the transition time on the scale of its mean (Theorem 3.5 below). To do so, we determine how the queue lengths change along the given path (Theorem 4.7 below). Similarly as for the complete bipartite graph in Theorem 1.2, we distinguish between three regimes for the value of β (subcritical, critical and supercritical), in which the queues behave differently and, consequently, so does the transition time.
Outline. The remainder of the paper is organized as follows. In Section 2 we introduce the algorithm, show that it has two important properties, greediness and consistency, and give an example of how it works. In Section 3 we state our main theorems. In particular, we show how both the mean transition time and the law of the transition time on the scale of its mean can be determined according to the path that the algorithm chooses. In Section 4 we show how the nucleation times depend on the graph structure and we analyze how the queue lengths at the nodes change along each path that the algorithm chooses. In Section 5 we provide the proof of the two algorithm properties mentioned above. In Section 6 we prove our main theorems. In Appendix A we show some technical computations for the mean nucleation time in the special setting of independent forks competing for activation.

Algorithm
In this section we introduce a randomized algorithm that describes, step by step, how the network behaves while deactivating the nodes in U and activating the nodes in V. The presentation is organized into a series of definitions and lemmas. In Section 2.1 we define how the algorithm works iteratively. In Section 2.2 we show that the algorithm is greedy and consistent (Propositions 2.10-2.11 below). In Section 2.3 we explain how the algorithm is used to capture the nucleation of the forks. An example of a bipartite graph and how the algorithm acts on it are given in Section 2.4.

Definition of the algorithm
The algorithm takes as input the bipartite graph G = ((U, V), E) and gives as output a sequence of triples that is needed to characterise the transition time, namely,

(Y_k, d̂_k, n_k), k = 1, ..., N,

where Y_k is a random variable with values in {1, ..., N} describing the index of the node in V selected at step k, d̂_k ∈ N is the degree of the selected node in the graph remaining after k − 1 steps, and n_k ∈ N is a parameter that counts how many possibilities there are at step k to choose the next node in V (uniformly at random) from the remaining nodes with least degree. Sometimes we will write v*_k instead of v_{Y_k} to emphasise that the network is following a specific order while activating the nodes. The integer N = |V| represents the number of iterations of the algorithm.
Definition 2.1. [Iteration of the algorithm] Set G_1 = ((U_1, V_1), E_1) = G. For k = 1, ..., N, construct G_{k+1} = ((U_{k+1}, V_{k+1}), E_{k+1}) by iterating the following procedure until V_{k+1} is empty:
• Look at the nodes in V_k and at their minimum degree d̂_k in G_k.
• Pick a node in V_k uniformly at random from the ones with minimum degree d̂_k.
• Denote the chosen node by v*_k and the number of choices by n_k.
• Eliminate the node v*_k, all its neighbors in U_k, together with all their edges. Denote the resulting bipartite graph by G_{k+1}.
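The elimination procedure above can be sketched as follows. This is an illustrative implementation (not from the paper), in which the graph is represented as a dictionary mapping each node in V to its set of neighbors in U:

```python
import random

def greedy_run(neighbors):
    """One run of the randomized greedy algorithm.

    `neighbors` maps each node in V to its set of neighbors in U.
    Returns the list of triples (v_k, d_k, n_k): the chosen node,
    its degree in the remaining graph, and the number of least-degree
    candidates at that step.
    """
    remaining = {v: set(nbrs) for v, nbrs in neighbors.items()}
    output = []
    while remaining:
        # Minimum degree among the remaining nodes in V.
        d = min(len(nbrs) for nbrs in remaining.values())
        candidates = [v for v, nbrs in remaining.items() if len(nbrs) == d]
        chosen = random.choice(candidates)   # uniform among least-degree nodes
        blocked = remaining.pop(chosen)      # eliminate the node and its U-neighbors
        for v in remaining:
            remaining[v] -= blocked          # drop edges to the eliminated U-nodes
        output.append((chosen, d, len(candidates)))
    return output
```

When ties occur (n_k > 1), different runs may return different orderings, each corresponding to one admissible path; the choice at each step is uniform among the n_k least-degree candidates.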
We next introduce the notion of admissible paths, which will be relevant to computing the transition time of the graph with the help of the above algorithm.

Definition 2.2. [Paths and admissible paths] Define a path a = (v_1, ..., v_N) as a sequence of activating nodes in V, where each node is present exactly once. The set of paths, denoted by Ω, is the set of every possible ordering of the nodes in V. Let A be the set of admissible paths, defined as the set of all the paths generated by the algorithm with positive probability. If a* is the path followed by the network, we define A_a to be the event that the network follows the admissible path a ∈ A, i.e.,

A_a = {a* = a}.   (2.2)

We are now ready to define the transition time along an admissible path. The idea of eliminating step by step the nodes in U that deactivate comes from the fact that when a node in V activates, it "blocks" all its neighbors in U, which with high probability will remain inactive for the rest of the time. This is due to the aggressiveness of the nodes in V compared to the nodes in U (recall (1.13)-(1.14)).
Lemma 2.4. [Activation sticks] Consider a node u ∈ U and let N(u) ⊆ V be the set of neighbors of u. Denote by t_u the first time a node v ∈ N(u) activates. Then, with high probability, u remains inactive after t_u for the duration of the transition, i.e., X_u(t) = 0 for all t_u ≤ t ≤ T_G^Q.
The above lemma will be proved in Section 6.2 and ensures that the transition along an admissible path can be decomposed into a succession of nucleations associated with the nodes in the path.
Remark 2.5. [Good behavior] Recall from [1] that the queue lengths at nodes in U all have a good behavior, in the sense that the queue length at u ∈ U stays close to its mean until one of the neighbors in V activates or until the time, of order r, that it takes for the queue lengths at nodes in U to hit zero. More precisely, for any u ∈ U, denote by T_u^{gb} the minimum of these two times. Then, for δ > 0 small enough, with high probability the queue length at u stays within δr of its mean for all t ∈ [0, T_u^{gb}]. Since the queue lengths at nodes in U always remain of order r while subcritical or critical nodes are activated, it follows that the queue length is always close to its mean for all times smaller than T_u^{gb}.
Any statement involving the transition time or the nucleation times holds for typical values of the queue lengths, i.e., for values compatible with good behavior. With the above remark in mind, in accordance with Theorem 1.2, we now define the mean nucleation times associated with the nodes of an admissible path.

Definition 2.6. [Nucleation times associated with an admissible path] Suppose that the algorithm generates the admissible path (v*_1, ..., v*_N). Associated with each step k of the algorithm is the nucleation time in (2.5). Here F_k is a pre-factor that depends on the degree d̂_k, which plays the role of |U| in Theorem 1.2, and on its relation with β. The term Q_u^{k−1} represents the updated queue length at node u ∈ U_k in the subgraph G_k and plays the role of the initial queue lengths in Theorem 1.2.
Note that (2.5) is an asymptotic statement about sequences of random variables (see the following remark). The notation o(1) = o_{P_{1_U}}(1) refers to a random variable determined by the law P_{1_U} that goes to 0 in distribution as r → ∞. Similarly, for α > 0, the notation o(r^α) = o_{P_{1_U}}(r^α) refers to a random variable determined by the law P_{1_U} that goes to 0 in distribution when divided by r^α as r → ∞. Throughout the paper we write o(1) and o(r^α) to simplify the notation.
Remark 2.7. [Conditioning] Every time we consider the transition time or the nucleation times we are conditioning on the state of the queues, hence all the expectations should be interpreted as conditional expectations. While the initial queue lengths are fixed via (1.9), the updated queue lengths are random. For v ∈ V, in expressions of the form T_v^{Q^{k−1}} with k = 1, ..., N, the dependence on the random updated queue lengths is indicated by the superscript Q^{k−1} (note that Q^0 actually represents the initial queue lengths). More precisely, when we consider step k of the algorithm, or the induced subgraph G_k, for k = 1, ..., N, we are conditioning on Q^{k−1} and on the first k − 1 activating nodes. Throughout the paper, for compactness, we omit the specific notation for conditional random variables and conditional expectations, but the reader is encouraged to keep it in mind for the statements that follow.
Intuitively, the sum of the mean nucleation times associated with an admissible path gives the mean transition time along that path. We will see in Section 4.2 that the pre-factors F_k actually need to be adjusted by certain weights that depend on the graph structure.

Properties of the algorithm
Definition 2.8. [Maximum least degree] Given the sequence (d̂_k)_{k=1}^N generated by the algorithm, let d* = max_{1≤k≤N} d̂_k be the maximum least degree of the path associated with (d̂_k)_{k=1}^N.
The notions of minimum degree d̂_k at step k and maximum least degree d* can be extended also to non-admissible paths. For a general path, the degree d̂_k at step k is the minimum degree of the remaining nodes in V in the induced subgraph of G obtained by removing the nodes activated in the first k − 1 steps and their neighbors. We will show that the set of admissible paths A is the set of the most likely paths the network follows. The following lemma and two propositions will be proved in Section 5.2. The lemma states that, given any path b, its maximum least degree cannot be smaller than the maximum least degree of an admissible path a.

We will see how the maximum least degree d* determines the order of the mean transition time. Depending on how β is related to d*, we distinguish three different regimes:

subcritical: β ∈ (0, 1/(d*−1)), critical: β = 1/(d*−1), supercritical: β ∈ (1/(d*−1), ∞).   (2.6)

Note that d* = 0 means that there are no edges in the graph, while d* = 1 means that each node in V has at most one neighbor in U at the moment of its activation, which implies that the transition occurs in time O(1). The most interesting scenarios to investigate occur then when d* ≥ 2, which will be an implicit assumption throughout the paper.
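The trichotomy can be checked mechanically: a pair (β, d*) is subcritical, critical or supercritical according to whether β(d* − 1) is below, equal to, or above 1 (consistent with the supercritical condition β(d̂_k − 1) > 1 used later for individual forks). A minimal illustrative sketch, assuming d* ≥ 2:

```python
def regime(beta, d_star):
    """Classify beta against the maximum least degree d* (assumes d* >= 2)."""
    x = beta * (d_star - 1)
    if x < 1:
        return "subcritical"
    elif x == 1:
        return "critical"
    else:
        return "supercritical"
```

For instance, with d* = 3 the critical value of β is 1/(d* − 1) = 1/2.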
The algorithm is greedy, in the sense that it always chooses a node that adds the least to the order of the total transition time along the path, simply because this node is likely to be the first to activate.

Proposition 2.10. [Greediness] The order of the mean transition time along any admissible path is the smallest possible.
The algorithm is consistent, in the sense that d* is unique. Different admissible paths lead to the same order of the mean transition time.
Proposition 2.11. [Consistency] All the admissible paths lead to the same order of the mean transition time.

Structure of the algorithm
It is intuitive that a node in V activates because it is the one whose complete bipartite fork has the fastest nucleation. Note that this depends on the activation and deactivation Poisson clocks, and on the queue length processes. We will see in Theorem 3.2 below that, with high probability, the network follows an admissible path. Hence a node in V activates because it is the one whose complete bipartite fork has the fastest nucleation among the nodes with minimum degree.

Definition 2.12. [Next nucleation time] On the event that the network follows one of the admissible paths, given that k − 1 nodes in V have already been activated, define the time the network subsequently takes to activate the k-th node in V as the minimum in (2.7) of the fork nucleation times over v ∈ V_k, where V_k is the set of inactive nodes in V produced by the algorithm after k − 1 iterations.
Note that if we condition on the event A_a that the network follows the admissible path a = (v_1, ..., v_N) ∈ A, then the k-th activating node a_k is the node that realizes the minimum in (2.7), and its nucleation time is given in (2.8). By keeping track of which nodes have been picked, in Section 4.3 we will compute the updated queue lengths after each activation in V, of which we are now able to give a precise definition (recall Definition 1.3(2)).

Definition 2.13. [Updated queue lengths] On the event that the network follows one of the admissible paths, for k = 1, ..., N, define the updated queue lengths Q^{k−1} at step k, after k − 1 nodes have been activated, by Q^{k−1} = (Q_U^{k−1}, Q_V^{k−1}), where Q_U^{k−1} and Q_V^{k−1} are vectors that represent the updated queue lengths at nodes in U and V, respectively.
When a node in V activates, its fork can be of three different types, depending on how its degree is related to β.

Definition 2.14. [Subcritical, critical and supercritical nodes] Given that k − 1 nodes in V have already been activated, consider the k-th activating node and its fork of degree d̂_k. The node is called subcritical when β(d̂_k − 1) < 1, critical when β(d̂_k − 1) = 1, and supercritical when β(d̂_k − 1) > 1.
In the subcritical and the critical regime, the nucleation time of a node from Definition 2.12 is with high probability a minimum over the nodes with least degree in V_k. Indeed, nodes with least degree activate first with high probability. The following lemma will be proved in Section 6.2.

Lemma 2.15. [Activation selects low degree] In the subcritical and critical regimes, with high probability the next activating node is one of the nodes with least degree, as expressed in (2.9).

In the supercritical regime the situation is more delicate. If at step k the least-degree fork has degree d̂_k such that β(d̂_k − 1) > 1, then the mean nucleation time of the next activating fork is the same for all the remaining forks in the graph. The network does not distinguish between the nodes according to their degree anymore, since all possibilities contribute equally to the total mean transition time. Indeed, we know from Theorem 1.2 that the mean nucleation time is given by the expected time it takes for the queues in U to hit zero. Hence, after the nucleation of the first supercritical fork, all the queues in U are of order o(r) and the transition occurs very fast (see Section 4.3 for more details).
In Section 3 we will see how the transition time can be determined given the set of admissible paths. Moreover, we will identify the mean transition time along each path and its law on the scale of its mean. Given a path, we know in which order the nodes activate. In Section 6 we will see how we can identify the nucleation time of a node given in Definition 2.12 with the nucleation time of the complete bipartite fork of the activating node, as written in (2.5). The sum of all the nucleation times gives us the transition time of the graph. Not all the terms in the sum contribute significantly in the limit as r → ∞. We will need to identify which are the leading order terms. The answer depends on the sequence of degrees (d̂_k)_{k=1}^N generated by the algorithm and on how the queue lengths change along the path.

Example
Consider the bipartite graph G = ((U, V), E) with |U| = 6 and |V| = 4 in Fig. 4. This graph serves as a simple example of how the algorithm works.
k = 1. There are two nodes v_2, v_4 with minimum degree d̂_1 = 2, so n_1 = 2. Pick uniformly at random one of them (with probability 1/n_1 = 1/2), say Y_1 = 2. Eliminate node v_2, all its neighbors u_2, u_3, and all their edges. Denote the new bipartite graph by G_2 = ((U_2, V_2), E_2). The nucleation time associated with this node satisfies, for any u ∈ U_1, (2.10).

k = 2. Node v_1 has the minimum degree d̂_2 = 1, so Y_2 = 1. Eliminate node v_1, all its neighbors, and all their edges. Denote the new bipartite graph by G_3 = ((U_3, V_3), E_3). The nucleation time associated with this node satisfies, for any u ∈ U_2, (2.11).

k = 3. Node v_4 is picked next, so Y_3 = 4. Eliminate node v_4, all its neighbors, and all their edges. Denote the new bipartite graph by G_4 = ((U_4, V_4), E_4). The nucleation time associated with this node satisfies, for any u ∈ U_3, (2.12).

k = 4. Node v_3 is the only node left, with degree d̂_4 = 1, so Y_4 = 3. Eliminate node v_3, all its neighbors, and all their edges, after which the empty graph is left. The nucleation time associated with this node satisfies, for any u ∈ U_4, (2.13).

The above scenario forms a path that is described by nodes in V activating in the order v_2, v_1, v_4, v_3 (see Fig. 5).
Note that the algorithm may pick node v_4 at the first step by setting Y_1 = 4, since the choice of the node with minimum degree is made uniformly at random. If so, then the algorithm follows a different path, in which the nodes in V activate in the order v_4, v_2, v_1, v_3.
Each possible scenario is identified with a path in the algorithm (admissible path), described by the nodes in V according to the order of their first activation. The total mean transition time along a path can be thought of as a sum of the mean nucleation times associated with each activating node in the path (see Theorem 3.2). We will prove in Section 5.2 that all the admissible paths lead to the same order of the mean transition time.
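The set A of admissible paths from Definition 2.2 can be enumerated by branching over every least-degree choice instead of picking one at random. The following is an illustrative sketch (the small graph in the usage example is hypothetical, not the graph of Fig. 4):

```python
def admissible_paths(neighbors):
    """All orderings of V that the algorithm generates with positive probability.

    `neighbors` maps each node in V to its set of neighbors in U.
    """
    if not neighbors:
        return [[]]
    d = min(len(nbrs) for nbrs in neighbors.values())
    paths = []
    for v in (w for w, nbrs in neighbors.items() if len(nbrs) == d):
        blocked = neighbors[v]
        # Eliminate v and its neighbors in U, then recurse on the reduced graph.
        reduced = {w: nbrs - blocked for w, nbrs in neighbors.items() if w != v}
        for tail in admissible_paths(reduced):
            paths.append([v] + tail)
    return paths

# Two nodes tie for least degree at the first step, so |A| = 2.
G = {'v1': {'u1', 'u2'}, 'v2': {'u3', 'u4'}}
print(sorted(admissible_paths(G)))   # [['v1', 'v2'], ['v2', 'v1']]
```

Branching at every tie makes the recursion explore exactly the choices the randomized algorithm could make, so the returned list is the set A for the given graph.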

Transition time: main theorems
In this section we present our main theorems regarding the transition time. In Section 3.1 we show that E, the event that the network follows the algorithm, has high probability (Theorem 3.2(i) below). We analyze the contributions along a given admissible path, noting that not all the nucleation times are significant for the total mean transition time (Theorem 3.2(ii) below). In Section 3.2 we compute the asymptotics of the mean transition time, including the pre-factor, focusing on the significant terms only (Theorem 3.3 below). In Section 3.3 we identify the law of the transition time divided by its mean, which turns out to be a convolution of the laws found for the complete bipartite graph in Theorem 1.2 (Theorem 3.5 below). There is again a trichotomy, depending on the value of β. Proofs will be given in Section 6.

Most likely paths
Recall that Ω is the set of all possible paths and A ⊆ Ω is the set of admissible paths. Denote by A^sc the subset of admissible paths truncated at the first supercritical node (if there is any). Recall that, according to Definition 2.14, a supercritical node is a node that is activated through a supercritical fork. If a = (v_1, ..., v_N) is an element of A, then a^sc = (v_1, ..., v_sc) is an element of A^sc, where v_sc denotes the last node of each truncated ordering. We allow this node to be any of the remaining supercritical nodes.

Definition 3.1. [The network follows the algorithm] Denote by a* = (v*_1, ..., v*_N) the path followed by the network and define E(a*) as the event that the network follows any of the admissible paths up to the first supercritical node (if there is any).
Our first main theorem consists of three statements and shows how the algorithm helps us find the mean transition time of the network. The first statement holds for all three regimes. The second and third statements hold for the subcritical and the critical regime only (for which the network follows the algorithm until the last activating node). The idea is that the mean transition time of the network can be seen as a weighted sum of the mean nucleation times associated with each activation and of negligible terms representing the time it takes after each activation to bring the network back to the configuration with all the nodes in U active. For the second statement, we denote by A_k the set of (partial) admissible paths until step k. For the third statement, recall that A_a is the event that the network follows the admissible path a ∈ A, as defined in (2.2). Note that in both statements we take the expectation E_Q on the right-hand side: this averages over the random values Q^1, ..., Q^{N−1} of the updated queue lengths on which we condition via the nucleation times appearing in each term of the sums. For the supercritical regime we do not need any statement, because the mean transition time is known from [1] to be the expected time it takes for the queue lengths to hit zero.
Theorem 3.2. Suppose that the initial queue lengths are as in (1.9).

(i) With high probability the network follows the algorithm, i.e., lim_{r→∞} P_{1_U}(E(a*)) = 1.

Consider β ∈ (0, 1/(d*−1)]: subcritical or critical regime.

(ii) With high probability the transition time of G given the initial queue lengths Q^0 satisfies (3.3), where n_k ∈ N is the number of possible nodes that the algorithm can pick at step k, while the factor f_k ∈ (0, 1) (to be identified in Theorem 3.3) comes from the fact that the node activating at step k is the one that activates first among the n_k nodes with the same least degree. Both n_k and f_k depend on the sequence of nodes that have been activated before step k.
(iii) With high probability the transition time of G along an admissible path a ∈ A given the initial queue lengths Q^0 satisfies the corresponding asymptotics.

Theorem 3.2 will be proved in Section 6.3. Note that the mean transition time of the graph G given the initial queue lengths Q^0 can be split as in (3.5). The second term on the right-hand side represents the mean transition time when the network does not follow the algorithm, and equals P_{1_U}(E(a*)^C) times the conditional expectation of the transition time on the event E(a*)^C. Even though we know from Theorem 3.2(i) that P_{1_U}(E(a*)^C) tends to zero as r → ∞, a priori this term may still affect the total mean transition time, since the conditional expectation may be substantial. In what follows we focus on the first term on the right-hand side of (3.5), since this captures the typical behavior of the network. All the results in the present paper are conditional on the high-probability event E(a*) that the network follows the algorithm. We will omit the conditioning from the notation to ease the reading; the reader should keep this in mind while going through the statements and the proofs.
We will see in Theorem 3.3 below that, in the supercritical regime, the mean transition time is the expected time it takes for the queues in U to hit zero, independently of which path the network took before activating the first supercritical node. Theorem 3.2(ii)-(iii) give us a way, in the subcritical and the critical regime, to split the total mean transition time into a sum of mean nucleation times of successive forks, by taking into account all the admissible paths, each with its own probability; see (3.7). The latter expression allows us to compute the mean transition time along a single admissible path. Note that the probability that the network follows a path a ∈ A, conditional on the event that it follows the algorithm, is given by the probability that the algorithm generates path a. Recall that, by Proposition 2.11, the order of the mean transition time does not depend on which path is followed.
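To illustrate the decomposition, here is a minimal Python sketch of the weighted sum over admissible paths. The path data are made up for illustration (node names, the step counts n_k, and the per-path means are hypothetical, not taken from the paper); the only structural assumption is that, conditional on following the algorithm, a path is generated with probability equal to the product over steps of 1/n_k (uniform tie-breaking among the n_k least-degree nodes).

```python
from fractions import Fraction

# Hypothetical output of the algorithm: for each admissible path a, the
# multiplicities n_k at every step and the mean transition time E[T | a].
paths = {
    ("v1", "v2", "v3"): {"n": [2, 1, 1], "mean_T": 5.0},
    ("v2", "v1", "v3"): {"n": [2, 1, 1], "mean_T": 5.0},
}

def path_probability(n_choices):
    """P(a | algorithm) = product over steps of 1/n_k (uniform tie-breaking)."""
    p = Fraction(1)
    for n_k in n_choices:
        p /= n_k
    return p

# Weighted sum in the spirit of (3.7): E[T] = sum_a P(a) * E[T | a].
mean_T = sum(float(path_probability(d["n"])) * d["mean_T"] for d in paths.values())
```

With the two symmetric paths above, each has probability 1/2 and the weighted mean is simply 5.0.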

Mean of the transition time
Consider an admissible path a ∈ A and the event A a that the network follows this path.
Recall that d* = max_{1≤k≤N} d̄_k is the maximum degree among the sequence of minimum degrees (d̄_k)_{k=1}^N. Let v*_k be the k-th activating node in path a. According to Definition 2.6, for any u ∈ U_k the mean nucleation time involves constants F^k_sub, F^k_cr, F^k_sup depending on d̄_k, B, c, ρ_U. Note that F^k_sub really depends on k, while F^k_cr is the same for every critical node, and F^k_sup = F_sup is independent of k. Moreover, note that the first mean nucleation time depends on the initial queue lengths Q^0_U at the nodes in U, but in general the mean nucleation time associated with a fork depends on the queue lengths at the nodes in U at the moment the fork starts the nucleation.
Our second main theorem identifies the mean transition time along a given path.
Theorem 3.3. Consider activation rates as in (1.9). The transition time of the graph G given the initial queue lengths Q^0 satisfies the following asymptotics as r → ∞, where for a critical node v_i the coefficient f′_i is defined in a recursive way as in (3.17).
Theorem 3.3 will be proved in Section 6.4. Both in the subcritical and the supercritical regime, Theorem 3.3 provides explicit formulas for the mean transition time in terms of the parameters c, γ_U, ρ_U and B, β in our model (recall Section 1.2) and the sequence of numbers (d̄_k, n_k)_{k=1}^N that are produced by the algorithm (recall (2.1)), with d* = max_{1≤k≤N} d̄_k. In the critical regime, however, the formula is more delicate, since the pre-factor depends on how long the critical nucleations take. Indeed, γ_U in (3.15) represents the mean updated queue length at nodes in U at step k (see Section 4.3 for more details). Note that the mean transition time in the subcritical and the critical regime depends on the path, while in the supercritical regime it does not.

Law of the transition time
Our third main theorem identifies the law of the transition time on the scale of its mean. Recall the laws P_sub, P_cr, P_sup introduced in (1.2). Write ⊛ to denote convolution. In the statement, δ_1 denotes the Dirac mass at 1. Theorem 3.5 will be proved in Section 6.5. There we will also see why there is no statement for the critical regime (II).
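To give a flavor of the convolution structure in the subcritical regime, here is an illustrative Monte Carlo sketch (our own, with made-up per-step means): along a path the transition time is asymptotically a sum of independent exponential nucleation times, so its law on the scale of its mean is a convolution of exponential laws with total mean 1.

```python
import random

rng = random.Random(3)
step_means = [4.0, 2.0, 1.0]        # hypothetical mean nucleation times per step
trials = 100_000
total_mean = sum(step_means)

# Sample T / E[T] as a normalized sum of independent exponentials.
samples = [sum(rng.expovariate(1 / m) for m in step_means) / total_mean
           for _ in range(trials)]
est_mean = sum(samples) / trials    # should be close to 1
```

The normalized sum has mean 1 by construction; its law is the convolution of the three rescaled exponential laws, which is the shape of statement (I) of the theorem.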

Discussion
Analyzing the transition time for arbitrary bipartite graphs is much harder than for complete bipartite graphs. The key idea is to view the transition time as a sum of subsequent nucleation times for complete bipartite subgraphs. The order in which nodes activate in V is random, because it depends on the fluctuations of the activation rates via the queue lengths. However, with high probability the nodes with the least number of active neighbors in U activate first. After each activation, the underlying bipartite graph changes according to which node is activated and which nodes are deactivated. Hence the subsequent activations in V depend on how this graph changes, as well as on the evolution of the network, since the queue lengths (and hence the activation rates) change with time as well.
To keep track of this evolution, we defined a randomized greedy algorithm in Section 2. If we run the algorithm once, then it generates a specific path of activating nodes in V. This is enough to determine the leading order of the transition time as r → ∞, since this order only depends on the maximum least degree d*, which is the same for all the admissible paths. Moreover, given d*, we can immediately determine whether we are in the subcritical, the critical or the supercritical regime. If we are interested in the pre-factor of the mean transition time and in its law, then we need to generate all the admissible paths. Theorem 3.2 shows that we can split the mean transition time into a weighted sum, over all the admissible paths, of the mean nucleation times associated with each activation in the paths. Theorem 3.3 gives the mean transition time conditional on the path and shows that the outcome is non-trivial both in the subcritical and the critical regime. Theorem 3.5 gives the law conditional on the path, but fails to capture the critical regime. The reason is that there are intricate dependencies between the subsequent nucleation times along the path.
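A single run of the greedy procedure can be sketched as follows (a minimal Python illustration of the description above, not the paper's pseudocode; the function name and data layout are our own, and the regime test compares β(d*−1) with 1 as in Section 1.3):

```python
import random

def greedy_run(U, V, edges, beta, seed=None):
    """One run of the randomized greedy algorithm (sketch).

    edges: set of (u, v) pairs with u in U, v in V.  Returns the generated
    path, the maximum least degree d*, and the regime it implies.
    """
    rng = random.Random(seed)
    U_left, V_left = set(U), set(V)
    path, d_star = [], 0
    while V_left:
        # degree of v = number of still-blocking neighbors in U_left
        deg = {v: sum(1 for u in U_left if (u, v) in edges) for v in V_left}
        d_min = min(deg.values())
        d_star = max(d_star, d_min)
        # uniform tie-breaking among the least-degree nodes
        v = rng.choice(sorted(v for v in V_left if deg[v] == d_min))
        path.append(v)
        # activating v deactivates its neighbors in U for good
        U_left -= {u for u in U_left if (u, v) in edges}
        V_left.remove(v)
    x = beta * (d_star - 1)          # float comparison; illustrative only
    regime = "subcritical" if x < 1 else ("critical" if x == 1 else "supercritical")
    return path, d_star, regime
```

For the complete bipartite graph K_{2,2} with β = 1 this yields d* = 2 and the critical regime, in line with the complete-bipartite results recalled from [1].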

Nucleation times and queue lengths
In Section 4.1 we introduce the concept of asymptotic independence of forks and show that in the subcritical and critical regime competing forks can be treated as if they were independent, in the limit as r → ∞ (Proposition 4.1). In Section 4.2 we study the mean and the law of the next nucleation time by using concepts from metastability and results from Section 4.1 (Propositions 4.3 and 4.6 below). In Section 4.3 we show how the queue lengths change depending on which node activates in V (Theorem 4.7 below). Throughout the section, recall the notation discussed in Remark 2.7.

Asymptotic independence of forks
In this section we introduce the concept of asymptotic independence of forks, which allows us to treat overlapping forks as if they were independent in the limit as r → ∞. We show that the nucleation time of a fork is asymptotically not influenced by the behavior of other forks sharing nodes with it.
In [1] it is shown that, as soon as all the nodes in U of a complete bipartite graph are simultaneously inactive, the first node in V (and subsequently all the other nodes) activates in a very short time interval, negligible compared to the time it takes to deactivate all the nodes in U. Hence the time it takes for the nodes in U to be all simultaneously inactive is the same as the time it takes to activate the first node in V, up to an error term that is negligible as r → ∞. In our setting, to study the nucleation times of forks it is enough to study the time it takes to deactivate all their respective nodes in U, without considering the set V.

Proposition 4.1. [Asymptotic independence] In the subcritical or critical regime, consider the graph G_k, the updated queue lengths Q_{k−1} and the d̄_k-fork W, where d̄_k is the minimum degree of the nodes in V_k. Let {u_1, …, u_{d̄_k}} denote the subset of nodes in U_k belonging to fork W. For α < d̄_k, consider the first time t = t(α) ∈ [Σ_{j=1}^{k−1} τ̄_j, Σ_{j=1}^{k} τ̄_j) when α nodes {s_1, …, s_α} ⊂ {u_1, …, u_{d̄_k}} belonging to fork W are simultaneously inactive. Recall that T^{Q_{k−1}}_W is the time it takes W to nucleate, conditional on the updated queue lengths Q_{k−1}, and denote by T^{Q(t)}_{W_α} the time it takes W to nucleate starting from time t with α nodes inactive, conditional on the updated queue lengths Q(t). The following two statements hold.
(i) Starting at time t from a state with α nodes inactive, the mean of T^{Q(t)}_{W_α} is asymptotically that of T^{Q_{k−1}}_W.
(ii) Starting at time t from a state with α nodes inactive, the law of T^{Q(t)}_{W_α} is asymptotically that of T^{Q_{k−1}}_W.

Proof. We prove the two statements separately.
(i) We denote by S the event that, after time t, all the nodes of W that are still active become simultaneously inactive before any of the inactive nodes in {s_1, …, s_α} activates again. We know from Theorem 1.2 that the time T_S it takes for d̄_k − α nodes to become simultaneously inactive behaves as an exponential random variable with mean of order r^{β(d̄_k−α−1)} as r → ∞, while the time it takes for one of the α inactive nodes to activate is an exponential random variable with mean of order 1/r^β. Hence the probability of S is of order r^{−β(d̄_k−α)} = o(1). If S occurs, then clearly T^{Q(t)}_{W_α} = T_S. On the other hand, if the complementary event S^C occurs, then the expected time t′ for the network to reach the configuration with all the nodes u_1, …, u_{d̄_k} active is o(1), and from there it takes time T^{Q(t+t′)}_W for W to nucleate; in the subcritical or critical regime, the queue lengths at time t + t′ are of the same order as the queue lengths at time Σ_{j=1}^{k−1} τ̄_j. Putting the two complementary events together, we obtain the claim as r → ∞.
(ii) Conditioning on the complementary events S and S^C, we can write the law of T^{Q(t)}_{W_α} for all x ≥ 0; the claim follows since lim_{r→∞} P_{1_U}(S) = 0 and, conditional on S^C, with high probability the network reaches the initial configuration in a negligible time t′ after time t. Hence it behaves as if at time t all nodes in U were active.
The above proposition shows that, in the limit as r → ∞, the mean nucleation time of a fork W and its law are not influenced by the fact that some of its nodes are simultaneously inactive at some time. The intuition is that, as r → ∞, the nucleation of a fork is so hard to achieve and takes so long that sharing some nodes with other forks does not help to make the nucleation happen appreciably faster. The network tends to quickly reach the metastable initial configuration with all the nodes in U active, and hence the nucleation time of W can be seen as the time it takes to deactivate all the nodes in U starting from all of them being active. In particular, in the case of overlapping forks, the nucleation time of W is not influenced by the behavior of other forks sharing nodes with W.
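The metastable picture behind a single fork can be checked with a small Gillespie simulation (our own illustrative sketch, with stand-in rates, not the paper's model parameters): d blocking nodes each deactivate at rate 1 and re-activate at a large rate playing the role of g_U(Q_u) ~ r^β, and we measure the time until all d are simultaneously inactive. As in Theorem 1.2, this time grows steeply with d (of order r^{β(d−1)}).

```python
import random

def nucleation_time(d, rate_act, rng):
    """Time until all d blocking nodes are simultaneously inactive.

    Each node deactivates at rate 1 and re-activates at rate rate_act;
    Gillespie simulation of the continuous-time Markov chain.
    """
    active = [True] * d
    t = 0.0
    while any(active):
        rates = [1.0 if a else rate_act for a in active]
        total = sum(rates)
        t += rng.expovariate(total)          # time to the next event
        x, acc = rng.random() * total, 0.0   # pick the flipping node
        for i, rt in enumerate(rates):
            acc += rt
            if x <= acc:
                active[i] = not active[i]
                break
    return t

rng = random.Random(0)
# With activation rate ~ r^beta, the mean is of order r^{beta(d-1)};
# here we only check that it grows sharply in d.
m2 = sum(nucleation_time(2, 20.0, rng) for _ in range(200)) / 200
m3 = sum(nucleation_time(3, 20.0, rng) for _ in range(200)) / 200
```

With re-activation rate 20, the d = 3 fork takes an order of magnitude longer to nucleate than the d = 2 fork, reflecting the extra factor r^β in the mean.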

Next nucleation time
Given the graph G_k, consider the next nucleation time from Definition 2.12. When the network activates a node, it activates the node that completes the fastest nucleation among the n_k nodes with least degree. We want to find an expression for the mean next nucleation time E_{1_U}[τ̄_k]. In Appendix A we show the computations for the mean next nucleation time in the case when the competing forks are independent of each other. Recall that in the subcritical regime we are considering a minimum of nucleation times that are exponential random variables, while in the critical regime we are considering a minimum of nucleation times that follow a truncated polynomial law (see Theorem 1.2). By using Proposition 4.1, we are also able to give explicit asymptotics for the mean next nucleation time without assuming that the forks are independent.
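In the subcritical case the minimum structure is easy to illustrate: for n_k asymptotically independent exponential nucleation times with the same mean, the next nucleation time is their minimum, whose mean is the single-fork mean divided by n_k. A quick Monte Carlo check (illustrative values, our own sketch):

```python
import random

rng = random.Random(1)
single_mean, n_k, trials = 10.0, 4, 200_000

# Minimum of n_k i.i.d. exponentials with mean single_mean.
est = sum(min(rng.expovariate(1 / single_mean) for _ in range(n_k))
          for _ in range(trials)) / trials
# est should be close to single_mean / n_k = 2.5
```

This is the exponential-race mechanism behind the multiplicity n_k; the finer pre-factor f_k of Proposition 4.3 additionally accounts for the regime-dependent law of each competing nucleation.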
Each nucleation of a fork can be seen as a successful escape from a metastable state, which is represented by the initial configuration where the nodes in U_k in the fork are active and the node in V_k in the fork is inactive. When considering multiple forks, we can view the network as an irreducible Markov process on a state space Ω. The first nucleation can be described by a regenerative process in which the Markov process leaves a metastable state x_0 (with all the nodes in U_k active) and reaches a set S, which represents the set of states where at least one of the forks of minimum degree has all its nodes in U simultaneously inactive. The set S is rare for the Markov process, in the sense that the fraction of time spent in S is small. Indeed, once the process reaches the set S, with high probability the first nucleation happens in time o(1), after one of the nodes of minimum degree in V_k activates. Denote by T^k_{x_0→S} the time it takes to go from x_0 to S.

Lemma 4.2. [Mean return time to metastable state] For k = 1, …, N, suppose that k − 1 nodes in V have already been activated. Then, with high probability, the time R^x_{U_k} it takes for the network G_k to reach the configuration with all the nodes in U_k active (the metastable state x_0), starting from any other configuration x, is negligible. In particular, let R^{k−1}_{U_k} be the time it takes for the network G_k to reach the configuration with all the nodes in U_k active, starting from the moment the (k−1)-th node in V activated. Then, with high probability, this time is negligible as well.

Proof. Recall that, at any time t, the activation and deactivation of each node u ∈ U are described by random variables with rates g_U(Q_u(t)) and 1, respectively. Hence each node u ∈ U_k takes on average one unit of time to deactivate and 1/g_U(Q_u(t)) to activate. Since in the subcritical and critical regime the queue lengths at any node at any moment are of order r (see Section 4.3 for more details), we have 1/g_U(Q_u(t)) = o(1). Suppose that, at some time t, node u ∈ U_k is active and node u′ ∈ U_k is inactive, i.e., X_u(t) = 1 and X_{u′}(t) = 0. Since lim_{r→∞} 1/g_U(Q_{u′}(t)) = 0 and there is a finite number of nodes in U_k, with high probability, starting from any configuration x, all the nodes in U_k will be active on average in time o(1).

We are now ready to state a result for the mean next nucleation time in the subcritical and the critical regime.

Proposition 4.3. [Mean next nucleation time] The asymptotics of E_{1_U}[τ̄_k] in the subcritical and the critical regime are given by (4.13) and (4.15), respectively.
Proof. By Proposition 4.1, in the limit as r → ∞ we may treat arbitrarily overlapping forks as if they were independent of each other. Therefore the computations for the mean next nucleation time carried out in Appendix A for the case of independent forks can be used for the case of overlapping forks as well. For completeness, in the subcritical regime (I) we offer a proof that uses a different argument, one which cannot be used in the critical regime (II) because the queues change on scale r over time.
Consider the stationary distribution π of the Markov process mentioned above, and recall that x_0 represents the metastable state with all the nodes in U_k active. For any u ∈ U_k the probability of the set S can be expressed in terms of the states S_j, where S_j is the state in which the j-th fork has all its nodes simultaneously inactive. The terms representing multiple forks with all their nodes simultaneously inactive contribute in a negligible way to π(S). Moreover, for j = 1, …, n_k, since the stationary distribution π(S_j) can be interpreted as the long-run proportion of time spent in state S_j, for any u ∈ U_k we can write (4.17). This proves (4.16).
Using the same type of argument, we can compute the analogous quantity; by inverting it, we get the desired asymptotics. The proof is completed by using (4.8).
Corollary 4.4. [Pre-factor adjustment] Given the graph G_k, conditional on the next activating node having degree d̄_k, the mean next nucleation time carries the pre-factor f_k, where f_k is as in (4.13) or (4.15) when a subcritical node or a critical node activates, respectively.
Proof. The claim follows from Proposition 4.3.
In the subcritical regime (I), the queue lengths do not change on scale r, and therefore the renewal theory developed in [3] applies, which is tailored to exponential behavior in metastable regimes. In the critical regime (II), however, the queue lengths do change on scale r and [3] does not apply. For details, see Section 4.3.

Proposition 4.6. [Law of the next nucleation time] In the subcritical regime, the law of the next nucleation time divided by its mean converges to an exponential law with unit rate as r → ∞ (see (4.23)).
Proof. We choose H to be a constant, and without loss of generality set H = 1. We claim that the pair (x_0, S) satisfies the property Rec(H, h) with h sufficiently small. Indeed, starting from any configuration x ∈ Ω, the network reaches the set {x_0, S} in a small time which is o(1).
If the starting configuration x is one of the configurations S_j, j = 1, …, n_k, corresponding to the set S, then we are done. Otherwise, by Lemma 4.2, the metastable state x_0 attracts in time o(1) every configuration x for which some forks have some nodes in U inactive. It is therefore immediate that T_{x→{x_0,S}} is smaller than H with high probability, which is what we need in order to claim that (4.22) holds when h is sufficiently small. Note that we can let h ↓ 0 as r → ∞.
We recover from Proposition 4.3 that the ratio between H and the mean next nucleation time is sufficiently small: indeed, ε = H/E_{1_U}[τ̄_k] ↓ 0 as r → ∞. Hence a straightforward application of [3, Theorem 2.3] allows us to conclude that the law of the next nucleation time divided by its mean converges to an exponential law with unit rate as r → ∞.
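The attraction to x_0 used above (Lemma 4.2) rests on a simple fact: when every inactive node re-activates at a large rate g ~ r^β, the return time to the all-active state is roughly the maximum of a few exponentials with tiny mean, hence o(1). A minimal numerical check (our own sketch, ignoring the rate-1 deactivations, which are negligible on this time scale):

```python
import random

rng = random.Random(2)
g = 1000.0                 # stand-in for g_U(Q_u), large since Q_u ~ r
trials, inactive = 100_000, 5

# Time to re-activate all inactive nodes = max of `inactive` Exp(g) variables.
est = sum(max(rng.expovariate(g) for _ in range(inactive))
          for _ in range(trials)) / trials
# E[max of 5 Exp(g)] = (1 + 1/2 + 1/3 + 1/4 + 1/5) / g
```

The estimate matches the harmonic-sum formula H_5/g ≈ 0.00228, confirming that the return time vanishes as g → ∞.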

Updated queue lengths
In this section we analyze in more detail how the mean queue lengths change over time and how they affect the mean nucleation times associated with each step of the algorithm. Since the queue lengths are well behaved (see Remark 2.5), we will often approximate them by their mean, or vice versa, at the cost of an error term that is negligible as r → ∞. Note that, in the case of activation of a supercritical node, the queue lengths become of order less than r, but at that point we no longer need any control on their behavior, since we know how the transition occurs.
We start with the initial queue lengths Q^0. We are interested in studying how the queue lengths change along a fixed path, depending on which types of forks we encounter at each activation. Fix an admissible path and consider the sequence of nodes activating in V.
Similarly to (3.9), the next nucleation time τ̄_k is a minimum over the competing least-degree forks, where f′_k depends on f_k, on the constants F^k_sub, F^k_cr, F^k_sup (for the three regimes, respectively), and on the updated queue lengths. The following theorem shows how the queue lengths change according to which type of node activates in V.
Recall the notation o(r^α) = o_{P_{1_U}}(r^α) for α ≥ 0, which refers to a random variable determined by the law P_{1_U} that goes to 0 in distribution when divided by r^α as r → ∞.
Theorem 4.7. [Updated queue lengths]
(I) β ∈ (0, 1/(d*−1)): subcritical regime. After step k, the mean queue length at any node u ∈ U is unchanged up to an error term o(1).
(II) β = 1/(d*−1): critical regime. After step k, the mean queue length at any node u ∈ U remains of order r, and its drop is determined, for each critical node v_i activated so far, by a coefficient f′_i defined in a recursive way.
(III) β ∈ (1/(d*−1), ∞): supercritical regime. After step k, the mean queue length at any node u ∈ U, if any supercritical node in V has activated, is o(r).

Proof. We treat the three regimes separately.
(I) β ∈ (0, 1/(d*−1)). All the nodes in V are subcritical, in particular the first node v_1 ∈ V. Then E_{1_U}[τ̄_1] = o(r) as r → ∞. The mean queue length at any node u ∈ U after node v_1 activates is (recall Section 1.2) the same as before, up to an error term o(1). Iterating this reasoning, we conclude that the mean queue lengths remain approximately the same as long as we activate subcritical nodes in V.

(II) β = 1/(d*−1). If v_1 is subcritical, then the time it takes to nucleate its fork does not influence the mean queue lengths by much, as seen in (I). Without loss of generality, we may therefore assume that v_1 is critical. Then E_{1_U}[τ̄_1] is of order r, and the mean queue length at any node u ∈ U drops accordingly after node v_1 activates. If v_2 is subcritical, then again the time it takes to nucleate its fork does not influence the mean queue lengths by much. Assume therefore that v_2 is critical. Then its fork requires a nucleation time of order r, and the mean queue length at any node u ∈ U drops again after node v_2 ∈ V activates. More generally, for any node u ∈ U, the accumulated drop up to step k is a sum over all the critical nodes activated up to step k, each of which contributes with a positive coefficient f′_i given by a recursive relation. Note that the coefficients f′_k introduced in (4.24) are defined for every k = 1, …, N, but in the above computations we are only interested in the ones associated with the critical nodes.

(III) β ∈ (1/(d*−1), ∞). If the first node v_1 ∈ V is subcritical, then its nucleation time does not influence the mean queue lengths by much, as seen in (I). If v_1 is critical, then the mean queue lengths decrease but remain of order r, as seen in (II). We therefore assume that v_1 is supercritical.
as r → ∞. Indeed, from Theorem 1.2 we know that the mean nucleation time of a supercritical fork is given by the expected time it takes for the queue length to hit zero. This holds for every supercritical node in V and is therefore true also for E_{1_U}[τ̄_1]. Hence the mean queue length at any node u ∈ U after node v_1 ∈ V activates is o(r). More generally, the mean queue lengths become o(r) as soon as the first supercritical node is activated, independently of what was activated before. Thus, after any step k, the mean queue length at any node u ∈ U, if any supercritical node has activated, is o(r).

In summary, we have shown that if we activate a subcritical node, then we do not change the mean queue lengths at nodes in U by much: they only decrease by a term o(1). On the other hand, if we activate a critical node, then the mean queue lengths drop significantly, but still remain of order r. Finally, if we activate a supercritical node, then the mean queue lengths become o(r), and remain so during all the successive nucleations. Recall that by Remark 2.5 we can approximate the queue lengths by their mean; hence, with the help of (4.24), we know how to relate the mean next nucleation times of the forks to the updated queue lengths after each activation. Consequently, once we activate a node that contributes order r to the total mean transition time, we can ignore the contribution of all the previous and all the subsequent subcritical nodes. Once we activate a supercritical node, we can ignore the contribution of all the subsequent nodes, since their queue lengths are o(r).
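The three-regime bookkeeping above can be sketched numerically. This is an illustrative toy, not the paper's recursion (4.24): we assume a net drain rate (c − ρ_U) while a node in U is active (our reading of Section 1.2), and the chosen durations τ stand in for the nucleation times of the three regimes.

```python
r = 1_000_000.0                 # scale parameter (illustrative)
c, rho_U, gamma_U = 2.0, 1.0, 1.0
Q = gamma_U * r                 # initial mean queue length at a node in U

def after_step(Q, kind, tau):
    """Mean queue length after a nucleation of duration tau (toy model)."""
    if kind in ("subcritical", "critical"):
        return Q - (c - rho_U) * tau   # linear drain while serving
    return 0.0                          # supercritical: queues hit zero, o(r)

Q1 = after_step(Q, "subcritical", tau=r ** 0.5)   # tau = o(r): negligible change
Q2 = after_step(Q1, "critical", tau=0.3 * r)      # tau ~ r: drop by a fraction of r
```

Run on these values, Q1 stays within 0.1% of γ_U r (subcritical step), while Q2 drops by a fixed fraction of r but remains of order r (critical step), matching the summary above.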

Analysis of the algorithm
In Section 5.1 we describe how the algorithm acts on an arbitrary bipartite graph. (In Section 2.4 we already illustrated this via an example.) In Section 5.2 we prove the greediness and the consistency of the algorithm. In Section 5.3 we discuss the algorithm complexity.

Recursion
Consider the graph G = G_1 = ((U_1, V_1), E_1). The first node activating in V_1 is the one with the least degree, since this requires the least number of nodes in U_1 to become simultaneously inactive. Since the expected time until m nodes in U_1 are simultaneously inactive is of order r^{1∧β(m−1)}, the first node to activate in V_1 is with high probability a node v_{Y_1} such that d(v_{Y_1}) = d̄_1 = min_{v∈V_1} d(v), where d(v) denotes the degree of node v in the graph G_1. We make the algorithm pick as first node a node v_{Y_1} with least degree in V_1. If there are multiple nodes with the same least degree, then the algorithm chooses one of them uniformly at random. If the least degree d̄_1 is such that β(d̄_1 − 1) > 1, then the algorithm chooses a node uniformly at random among all the nodes in the set displayed in (5.1). Reasoning as above, we see that the algorithm picks as second node a node v_{Y_2} with the least number of active neighbors left in G. Consider the bipartite graph G_2 = ((U_2, V_2), E_2). If there are multiple nodes with the same least degree, then the algorithm again chooses one uniformly at random. If the least degree d̄_2 is such that β(d̄_2 − 1) > 1, then we choose a node uniformly at random among all the nodes in the corresponding set. Iterating this procedure until all the nodes in V_1 are active, we find an admissible path. Note that, depending on the choice the algorithm makes at each step, there may be multiple admissible paths.

Greediness and consistency
We first prove Lemma 2.9. After that we prove Propositions 2.10 and 2.11.

Proof of Lemma 2.9. Suppose that node w_1 is activated at step k^a_1 in path a; then it must be that d_{k^a_1,a}(w_2) ≥ d*_a, otherwise the algorithm would choose node w_2 before node w_1. Say that node w_2 will be activated at step k^a_2 > k^a_1 in path a. Then d(w_2) ≥ d*_a in G. As before, this implies that some of its edges have already been processed with previous forks in path b. Again, at least one of these forks must have nucleated before the fork of w_2, in path b but not in path a, say the fork of w_3. Hence there exists such a node w_3 ∈ V. This node has not yet been activated at step k^a_2 in path a, nor at step k^a_1, so d_{k^a_1,a}(w_3) ≥ d_{k^a_1,a}(w_1) ≥ d*_a, otherwise the algorithm would choose node w_3 before node w_1. Hence d(w_3) ≥ d*_a in G. We can iterate this argument. Since there are only N nodes in V, we get a contradiction after we have considered all the nodes.
We are now able to prove the greediness and the consistency of the algorithm.
Proof of Proposition 2.10. By Lemma 2.9, we know that the maximum least degree of an admissible path is the smallest possible. We also know that the order of the mean transition time along a path is related to d* and depends on the value of β. Hence Lemma 2.9 implies that the mean transition time along an admissible path is the shortest possible, in the sense that it has the smallest possible order in r.
Proof of Proposition 2.11. Lemma 2.9 gives equality of the maximum least degree for any two admissible paths. This leads to the same order of the mean transition time.
Even though d* does not depend on which admissible path the algorithm generates, its multiplicity does. Fig. 6 shows a graph on which the algorithm can generate two different paths with the same maximum least degree but with different multiplicities.

Algorithm complexity
The randomized greedy algorithm we constructed can be implemented in different ways according to what we want to compute.
• In order to know the leading order of the mean transition time as r → ∞, it is enough to recover the maximum least degree d* from the graph. By Proposition 2.11 we know that d* is the same for all the admissible paths. Hence it is enough to run the algorithm once and, by comparing the value of d* with the value of β, we can determine whether we are in the subcritical, the critical or the supercritical regime.
In this case the computational complexity of the algorithm is polynomial in the number of nodes in V, so the leading order of the mean transition time is quickly determined. More precisely, the algorithm has complexity O(|U||V|^2).
• If we are interested in the precise asymptotics of the mean transition time and in its law as r → ∞, then we need to compute the pre-factor of the leading-order term. To do so, we need to run the algorithm multiple times, until all the admissible paths are generated, in order to recover all the possible sequences (d̄_k)_{k=1}^N and (n_k)_{k=1}^N. A proper approach is to let a (deterministic) depth-first search algorithm run through all the admissible paths and enumerate them. Theorem 3.2 shows that if we know the total mean transition time along each path, then we can recover the mean transition time of the graph.
In this case the computational complexity of the algorithm is factorial in the number of nodes in V, since it depends in a delicate manner on the architecture of the graph. More precisely, the algorithm has complexity O(|U||V|^2 |V|!).
See [4] for a deeper analysis of the algorithm complexity.
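The depth-first enumeration mentioned above can be sketched as follows (our own minimal Python illustration; the function name and the tuple output format are ours): at each step every node of V with the current least degree may be the next to activate, so we branch over all of them and record the sequences (d̄_k) and (n_k) along each path.

```python
def admissible_paths(U, V, edges):
    """Enumerate all admissible paths by depth-first search (sketch).

    Returns a list of (path, least_degrees, multiplicities) triples.
    """
    out = []

    def rec(U_left, V_left, path, ds, ns):
        if not V_left:
            out.append((tuple(path), tuple(ds), tuple(ns)))
            return
        deg = {v: sum(1 for u in U_left if (u, v) in edges) for v in V_left}
        d_min = min(deg.values())
        choices = [v for v in V_left if deg[v] == d_min]
        for v in choices:   # branch over every least-degree node
            rec(U_left - {u for u in U_left if (u, v) in edges},
                V_left - {v}, path + [v], ds + [d_min], ns + [len(choices)])

    rec(frozenset(U), frozenset(V), [], [], [])
    return out
```

For K_{2,2} this yields the two admissible paths, each with least-degree sequence (2, 0) and multiplicities (2, 1), so d* = 2 on every path, consistent with Proposition 2.11.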

Proofs of the main theorems
The aim of this section is to prove the theorems in Section 3. In Section 6.1 we introduce some further definitions. In Section 6.2 we prove Lemmas 2.4 and 2.15. Recall that Section 4.2 prepared for the proof of the main theorems (Propositions 4.3 and 4.6). In Sections 6.3-6.5 we prove Theorems 3.2, 3.3 and 3.5, respectively. Throughout the section, recall the notation discussed in Remark 2.7.

Preparatory results
Consider an arbitrary bipartite graph G = ((U, V), E) with |V| = N and let v_1, …, v_N be the nodes in V. The activation path that the network follows is denoted by v*_1, …, v*_N, while the indices of the nodes that the algorithm picks are denoted by Y_1, …, Y_N (as in Definition 2.1). We want to study the transition time when the network follows an admissible path. When conditioning the network on a specific activation order, we can write {Y_k = i} = {v*_k = v_i}, in the sense that saying that the k-th index Y_k chosen by the algorithm equals i is equivalent to saying that the k-th node v*_k activating in the network equals v_i.
Definition 6.1. [Iteration graph] For k = 1, …, N, suppose that k − 1 nodes in V have already been activated. Denote by
• V_k ⊆ V the set of nodes in V that have not been activated yet, i.e., V_k = V \ {v_{Y_i}}_{0<i<k};
• U_k ⊆ U the set of nodes in U that are not neighbors of any of the nodes in V that have already been activated.
Let d̄_k be the minimum degree of the nodes in V_k and n_k the number of least-degree forks in G_k.
where we use that β′ > β + 1, and K, K′ are positive constants. After the first competition, the winner of subsequent competitions at times t > t_v is determined by the minimum of two random variables (inactivity periods) with rates g_U(Q_u(t)) and g_V(Q_v(t)). Note that the queue lengths in U are always of order r, except when we are in the supercritical regime.
In this regime we are not interested in the competition between u and v anymore, since we know how long the transition takes. The queue lengths in V start out being of order r, increase while u is active and decrease while v is active, but always remain of order r. Indeed, for the queue lengths in V to become o(r), at least a time of order r would be needed, which by (1.10) is not available. Hence, in every subsequent competition v activates first with high probability, since the probability of u winning is always o(1/r). In the worst-case scenario, nodes u and v compete with each other for the duration of the transition, hence, by (6.4), order r times. The probability of u winning at least one competition is r · o(1/r) = o(1). Hence, with high probability, node u will never win any competition against node v and will remain blocked for the duration of the transition.
(I) β(d̄_k − 1) < 1. Recall from Theorem 1.2 that the expected time it takes for node v to nucleate, which is asymptotically equal to the time it takes for its d̄_k neighbors in U to become simultaneously inactive, is of order r^{β(d̄_k−1)} and satisfies (6.6). Similarly, the expected time it takes for node w to nucleate is asymptotically equal to the time it takes for its d_k(w) neighbors in U to become simultaneously inactive: it is of order r^{β(d_k(w)−1)} ≻ r^{β(d̄_k−1)} if β(d_k(w) − 1) < 1, and of order r ≻ r^{β(d̄_k−1)} if β(d_k(w) − 1) ≥ 1. It also satisfies (6.7), with P(x) ↑ 1 when x ↓ 0 and P(x) ↓ 0 when x ↑ ∞. Note that, in the limit as r → ∞, the forks of v and w can be treated as independent of each other. Indeed, in Section 4.1 we proved that the nucleation time of a fork is asymptotically not influenced by the behavior of other forks sharing nodes with it.
For any given value of t we obtain (6.8). By (6.6)-(6.7), choosing t appropriately and taking the limit r → ∞, the right-hand side tends to 0.

(II) β(d̄_k − 1) = 1. As before, we know the law of the nucleation time for the fork of v (critical) and of w (supercritical). As shown in [1], the corresponding probability tends to 1. Moreover, with high probability any nucleation time of a complete bipartite graph in the critical regime (including the fork of v) is smaller than the transition time of the same graph in the supercritical regime.

Proof: most likely paths
Proof of Theorem 3.2. We prove the three statements separately.
(i) Assuming that the network does not follow the greedy algorithm is equivalent to assuming that at some step k with β(d̄_k − 1) ≤ 1 a node w that does not have minimum degree is chosen instead of a node v with degree d̄_k. The probability of a group of d > d̄_k nodes being simultaneously inactive before a group of d̄_k nodes equals the probability of activating w before v, which tends to zero as r → ∞ by Lemma 2.15. Hence, with high probability the network activates nodes in V in a greedy way, as described by the algorithm. By Lemma 2.4, we also know that the nodes in U that have deactivated remain inactive for the duration of the transition process. Consequently, they do not influence any future activation attempt of the nodes in V, whose activation therefore follows the algorithm. In the supercritical regime, we are only interested in the order of activation of the nodes until the first supercritical node, for which the above reasoning still holds.
(ii) Note that the queue lengths $Q_k$ depend on the sequence of indices $(Y_1, \dots, Y_{k-1})$ describing the order of the activating nodes in $V$. Indeed, we have seen in Section 4.3 that the queue lengths change according to which nodes have already been activated. Moreover, for $k > 1$, the probabilities $\frac{1}{n_k}$ also depend on the sequence $(Y_1, \dots, Y_{k-1})$. The reader should keep this in mind while going through the proof. The proof proceeds in three steps.

1. Denote the graph $G_1 = G$ and write the mean transition time as the sum (6.10) over the possible first activating nodes. By Lemma 6.3, when $\beta(\bar d_1 - 1) \leq 1$, not all the terms in this sum have positive probability: only the ones corresponding to forks of minimum degree $\bar d_1$ do, and they all have the same probability. Recall that this probability is $\frac{1}{n_1}$. Also recall that $E_Q$ averages over the random values $Q_1, \dots, Q_{N-1}$ of the updated queue lengths. We can write the random variable $T^{Q_0}_{G_1}$ as the sum of three random variables. The first variable represents the time the network takes to switch the first node on, the second variable represents the time the network takes (after activating the first node) to reach the configuration with all the nodes in $U_2$ active (see Lemma 4.2), while the third variable represents the transition time of the remaining graph when we take the first activating node out. Note that, by Corollary 4.4, if we condition the network to follow an admissible path with a specific first activating node, then we get (6.12), where $f_1$ is the factor that arises from considering the minimum of random variables. The variable $T^{Q_1}_{G_2}$ also changes accordingly, but with a slight abuse of notation we write it in the same way. The resulting identity holds with high probability due to Lemma 4.2, and we analyze it recursively. The $k$-th iteration gives (6.14).

2.
We can again write the random variable as the sum of three random variables. By Corollary 4.4, we again obtain the analogue of (6.12), and the variable $T^{Q_k}_{G_{k+1}}$ changes accordingly when it is conditioned (again, with a slight abuse of notation we write it in the same way). The inner conditional expectation in (6.14) can be written as in (6.17), which holds with high probability due to Lemma 4.2. At each iteration the conditional expectation reduces to a sum of three terms: the first term represents the expected time it takes to switch the next node on (adjusted by a factor that keeps track of the fact that this node activates before the other nodes), the second term represents the expected time the network takes (after activating the previous node) to reach the configuration with all the nodes remaining in $U$ active, while the third term represents the mean transition time of the remaining network when we take the next activating node out.
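Schematically, the step-$k$ decomposition described above can be summarized as follows (a sketch in our own notation, with $\tau_k$ the next nucleation time and $\sigma_k$ the return time; the precise conditioning pre-factors are those of (6.12) and (6.17)):

```latex
% Step-k decomposition of the transition time (notation ours):
%   tau_k   = time until the next node in V_k nucleates,
%   sigma_k = time to return to the state with all remaining U-nodes active.
\[
  T^{Q_{k-1}}_{G_k} \;=\; \tau_k \;+\; \sigma_k \;+\; T^{Q_k}_{G_{k+1}},
  \qquad k = 1, \dots, N-1,
\]
% so that, iterating and taking expectations, schematically
\[
  \mathbb{E}\big[T^{Q_0}_{G_1}\big]
  \;=\; \sum_{k} \Big( \mathbb{E}[\tau_k] + \mathbb{E}[\sigma_k] \Big),
\]
% with each \mathbb{E}[\sigma_k] = o(1) by Lemma 4.2.
```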
(I) $\beta \in (0, \frac{1}{d^*-1})$: subcritical regime. Every term in the sum is at most of order $r^{\beta(d^*-1)} = o(r)$, which means that the significant terms are only the ones with $\bar d_k = d^*$. The pre-factors of these terms are given by subcritical forks, and so, for any $u \in U$, the stated expression follows, where the last equality is obtained by using (2.4) and (4.26).
(III) $\beta \in (\frac{1}{d^*-1}, \infty)$: supercritical regime. Denote by $v_{\mathrm{sc}}$ the first supercritical node. We know from (4.30) that, after $v_{\mathrm{sc}}$ is activated, the queue lengths become negligible (of order $o(r)$), and the mean transition time is given by the expected time it takes for them to hit zero, i.e., (6.23) as $r \to \infty$.
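The order of this final drain can be made explicit, at least heuristically: recall from Section 1.2 that an active node works off its queue at constant speed $c$. A sketch in our own notation, under the assumption (from (4.30)) that the residual queue lengths are $o(r)$:

```latex
% Hitting time of zero for a residual queue (heuristic sketch, notation ours):
% an active node w drains its queue at speed c, so
\[
  \mathbb{E}\big[\,\text{time for } Q_w \text{ to hit } 0\,\big]
  \;\asymp\; \frac{Q_w}{c} \;=\; o(r),
  \qquad r \to \infty,
\]
% since Q_w = o(r) after the first supercritical node v_sc activates.
```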

Proof: law of the transition time
Proof of Theorem 3.5. We again distinguish between the three regimes.
(I) $\beta \in (0, \frac{1}{d^*-1})$: subcritical regime. Recall that the significant terms in the sum for the mean transition time are those coming from nodes with degree $\bar d_k = d^*$, with $d^* < \frac{1}{\beta} + 1$. There are $m^a_{\mathrm{sub}}$ such terms, where $m^a_{\mathrm{sub}}$ depends on the path $a \in A$, and each term comes with a multiplicative factor $f_k$. We can write the transition time along path $a$ divided by its mean as in (6.24). We know that the law of a sum of independent random variables has a density given by the convolution of their densities. Here the nucleation times and the return times can be considered independent, since they depend only on the queue lengths, which remain close to their initial values in the subcritical regime.
There are three types of sums in the numerator of the last line of (6.24). The first type has terms of the form $\tau_{k'}$ with $k'$ such that $\bar d_{k'} = d^*$. As $r \to \infty$, these are the significant terms, since they are of the same order as the mean transition time. For each of them, i.e., for each $k'$, the stated convergence holds, where in the second step we use Proposition 4.6. We write the density as in (6.26), for $x \in [0, \infty)$, with
$$S^a_{\mathrm{sub}} = \sum_{i \colon \bar d_i = d^*} f_i. \qquad (6.27)$$
The second type has terms of the form $\tau_{k''}$ with $k''$ such that $\bar d_{k''} < d^*$. As $r \to \infty$, these are negligible, since they are of smaller order than the mean transition time. For each of them, i.e., for each $k''$, (6.28) holds for $x \in [0, \infty)$, and the density is $\delta_0$, the Dirac delta at 0. The third type consists of the return times, with $k = 2, \dots, N$. As $r \to \infty$, these are also negligible, since they are $o(1)$ by Lemma 4.2, and hence their density is also $\delta_0$.
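The mechanism behind the factors $f_i$ can be checked on a toy computation: in the subcritical regime the competing nucleation times are asymptotically exponential, and the minimum of $n$ i.i.d. exponentials with mean $m$ is again exponential, with mean $m/n$. A minimal Monte Carlo sanity check of this fact (the function name and parameters are illustrative, not from the paper):

```python
import random

def mean_min_of_exponentials(n, mean, samples=200_000, seed=0):
    """Estimate E[min(X_1,...,X_n)] for i.i.d. exponential X_i with the given mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += min(rng.expovariate(1.0 / mean) for _ in range(n))
    return total / samples

# The minimum of n i.i.d. Exp(1/m) variables is Exp(n/m), so its mean is m/n;
# this is the mechanism behind the 1/n_k factor in the next-nucleation time.
estimate = mean_min_of_exponentials(n=4, mean=1.0)
```

With $n = 4$ and unit mean, the estimate concentrates around $1/4$, matching the exponential law with the rates summed.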
If we consider $X_1, \dots, X_{n_k}$ to be the nucleation times of independent forks of degree $\bar d_k$, and $Z$ to be the next nucleation time at step $k$, then we get the stated law.

A.2 Critical regime: polynomial random variables

Let $X_1, \dots, X_n$ be i.i.d. polynomial random variables and let $Z = \min\{X_1, \dots, X_n\}$. Then, for $t = x\, E^1_U[X_i]$, (A.7) holds for $x \in [0, \infty)$. The density function of $Z$ is given in (A.10). If we consider $X_1, \dots, X_{n_k}$ to be the nucleation times of independent forks of degree $\bar d_k$, and $Z$ to be the next nucleation time at step $k$, then we get the corresponding law.

Figure 1: A random-access network. Each node represents a server with a queue. Packets arrive that require a random service time.

Definition 2.3. [Transition time along an admissible path] Consider an admissible path $a = (v_1, \dots, v_N) \in A$. The transition time along path $a$ is denoted by $T^Q_G \mid A_a$ and is defined as the transition time of $G$ conditional on the order of the activating nodes being as in $a$ and on the initial queue lengths $Q$.

Lemma 2.9. [Comparing maximum least degrees of different paths] Consider two different paths $a, b$ such that $a \in A$ is admissible. For $k = 1, \dots, N$, denote by $\bar d_{k,a}$ and $\bar d_{k,b}$ the minimum degrees at step $k$ in paths $a$ and $b$. Let $d^*_a = \max_{1 \leq k \leq N} \bar d_{k,a}$ and $d^*_b = \max_{1 \leq k \leq N} \bar d_{k,b}$. Then $d^*_a \leq d^*_b$.

Theorem 3.2 shows how the mean transition time along an admissible path is a sum of terms related to the successive mean nucleation times of complete bipartite subgraphs of $G$. Theorem 3.3 tells us that, depending on the value of $\beta$, this sum reduces to a smaller sum of only a few significant terms. It also tells us how to compute the pre-factors of these terms.

Definition 3.4. [Multiplicity of $d^*$] In the subcritical regime, consider an admissible path $a \in A$ and its associated degree sequence $(\bar d_k)_{k=1}^N$. Write $m^a_{\mathrm{sub}}$ to denote the multiplicity of $d^*$ in the path $a$, i.e., $m^a_{\mathrm{sub}} = |\{k : \bar d_k = d^*\}|$.

Theorem 3.5. [Law of the transition time] Consider the bipartite graph $G = ((U, V), E)$ with initial queue lengths $Q_0 = (Q_0^U, Q_0^V)$ as in (1.9). The transition time of the graph $G$ given the initial queue lengths $Q_0$ satisfies the following. (I) $\beta \in (0, \frac{1}{d^*-1})$: subcritical regime. With $f_k$ as in (3.12) and $m^a_{\mathrm{sub}}$ as in (3.19), the stated convergence holds in the limit $r \to \infty$.

Proposition 4.3. [Mean next nucleation time] Consider the graph $G_k$. Recall that $\bar d_k$ is the minimum degree of a node in $V_k$ and $n_k$ is the number of forks of degree $\bar d_k$ in $G_k$. (I) $\beta \in (0, \frac{1}{\bar d_k - 1})$: subcritical regime. The mean next nucleation time satisfies, for any $u \in U_k$, (4.13). (II) $\beta = \frac{1}{\bar d_k - 1}$: critical regime. The mean next nucleation time satisfies, for any $u \in U_k$, the analogous identity.

Theorem 4.7. [Mean updated queue length] Let $(\bar d_k)_{k=1}^N$ be the sequence of degrees in a fixed admissible path and $d^* = \max_{1 \leq k \leq N} \bar d_k$. (I) $\beta \in (0, \frac{1}{d^*-1})$: subcritical regime. After step $k$, the mean queue length at any node $u$ satisfies the stated asymptotics.

Proof of Lemma 2.9. The proof is by contradiction. Suppose that $d^*_a > d^*_b$. Denote by $d_{k,a}(v)$ and $d_{k,b}(v)$ the degrees of node $v \in V_k$ at step $k = 1, \dots, N$ in paths $a$ and $b$, respectively. Consider the node $w_1 \in V$ such that, at some step $k^a_1$ in path $a$, $d_{k^a_1,a}(w_1) = \bar d_{k^a_1,a} = d^*_a$. Then $d(w_1) \geq d^*_a$ in $G$. On the other hand, in path $b$, when $w_1$ is activated at some step $k^b_1$, it has degree $d_{k^b_1,b}(w_1) \leq d^*_b$. This implies that some of the edges of $w_1$ (at least $d^*_a - d^*_b$ edges) have already been processed via previous forks in path $b$. At least one of these forks must have nucleated before the fork of $w_1$ in path $b$ but not in path $a$, say the fork of $w_2$. Hence there exists a node $w_2 \in V$ such that, at some step $k^b_2 < k^b_1$ in path $b$, $d_{k^b_2,b}(w_2) \leq d^*_b$. This node has not yet been activated at step $k^a$

Figure 6: The algorithm may generate the path $v_1, v_2, v_3$ or the path $v_3, v_1, v_2$, with a different multiplicity of $d^*$.