Tight bounds for popping algorithms

We sharpen run‐time analysis for algorithms under the partial rejection sampling framework. Our method yields improved bounds for: the cluster‐popping algorithm for approximating all‐terminal network reliability; the cycle‐popping algorithm for sampling rooted spanning trees; and the sink‐popping algorithm for sampling sink‐free orientations. In all three applications, our bounds are not only tight in order, but also optimal in30 constants.


Introduction
The counting complexity class #P was introduced by Valiant [Val79] to capture the apparent intractability of the permanent. Although exactly evaluating #P-complete problems is a task even harder than solving NP-complete problems [Tod91], efficient approximation algorithms may still exist. This possibility was first exhibited through the relationship between approximate counting and sampling [JVV86], and in particular, using the Markov chain Monte Carlo (MCMC) method [Jer03]. Early successes along this line include efficient algorithms to approximate the volume of a convex body [DFK91], to approximate the partition function of ferromagnetic Ising models [JS93], and to approximate the permanent of a non-negative matrix [JSV04]. More recently, a number of exciting new approaches were developed, utilising a variety of tools such as correlation decay [Wei06,BG08], zeros of graph polynomials [Bar16,PR17], and Lovász local lemma [Moi17].
Partial rejection sampling [GJL17] is yet another alternative approach to approximate counting and exact sampling. It takes an algorithmic Lovász local lemma [MT10] perspective for Wilson's cycle-popping algorithm [PW98] to sample rooted spanning trees. The algorithm is extremely simple. In order to sample from a product distribution conditional on avoiding a number of undesirable "bad" events, we randomly initialise all variables, and re-sample a subset of variables selected by a certain subroutine, until no bad event is present. In the so-called extremal instances, the resampled variables are just those involved in occurring bad events. Despite its simplicity, it can be applied to a number of situations that are seemingly unrelated to the local lemma. Unlike Markov chains, partial rejection sampling yields exact samplers. The most notable application is the first polynomial-time approximation algorithm for all-terminal network reliability [GJ18c].
In this paper we sharpen the run-time analysis for a number of algorithms under the partial rejection sampling framework for extremal instances. We apply our method to analyse cluster-, cycle-, and sink-popping algorithms. Denote by n the number of vertices in a graph, and m the number of edges (or arcs) in a undirected (or directed) graph. We summarise some background and our results below.
• Cluster-popping is first proposed by Gorodezky and Pak [GP14] to approximate a network reliability measure, called reachability, in directed graphs. They conjecture that the algorithm runs in polynomial-time in expectation on bi-directed graphs, which is confirmed by Guo and Jerrum [GJ18c]. It has also been shown in [GJ18c] that a polynomialtime approximation algorithm for all-terminal network reliability can be obtained using cluster-popping. We show a pmax 1−pmax mn upper bound for the expected number of resampled variables on bi-directed graphs, where p max is the maximum failure probability on edges. This improves the previous pmax 1−pmax m 2 n bound [GJ18c]. We also provide an efficient implementation based on Tarjan's algorithm [Tar72] to obtain a faster algorithm to sample spanning connected subgraphs in O(mn) time (assuming that p max is a constant). Furthermore, we obtain a faster approximation algorithm for the all-terminal network reliability, which takes O(mn 2 log n) time, improving upon O(m 2 n 3 ) from [GJ18c]. For precise statements, see Theorem 6 and Theorem 13.
• Cycle-popping is introduced by Propp and Wilson [Wil96,PW98] to sample uniformly rooted spanning trees, a problem with a long line of history. We obtain a 2mn upper bound for the expected number of resampled variables, improving the constant from the previous O(mn) upper bound [PW98]. See Theorem 15. • Sampling sink-free orientations is introduced by Bubley and Dyer [BD97a] to show that the number of solutions to a special class of CNF formulas is #P-hard to count exactly, but can be counted in polynomial-time approximately. Bubley and Dyer showed that the natural Markov chain takes O(m 3 log 1/ε) time to generate a distribution ε-close (in total variation distance) to the uniform distribution over all sink-free orientations. Using the coupling from the past technique, Huber [Hub98] obtained a O(m 4 ) exact sampler. Cohn, Pemantle, and Propp [CPP02] introduced an alternative exact sampler, called sink-popping. They show that sink-popping resamples O(mn) random variables in expectation. We improve the expected number of resampled variables for sink-popping to at most n(n − 1). See Theorem 19.
In all three applications, we also construct examples to show that none of the constants (if explicitly stated) can be further improved. Our results yield best known running time for all problems mentioned above except sampling uniform spanning trees, for which the current best algorithm by Schild [Sch18] is in almost-linear time in m. We refer interested readers to the references therein for the vast literature on this problem. One should note that the corresponding counting problem for spanning trees is tractable via Kirchhoff's matrix-tree theorem, whereas for the other two exact counting is #P-hard [Jer81,PB83,BD97a]. It implies that their solution spaces are less structured, and renders sampling much more difficult than for spanning trees.
The starting point of our method is an exact formula [GJL17, Theorem 13] for the expected number of events resampled, for partial rejection sampling algorithm on extremal instances. Informally, it states that the expected number of resampled events equals to the ratio between the probability (under the product distribution) that exactly one bad event happens, and the probability that none of the bad events happens. This characterisation has played an important role in the confirmation of the conjecture by Gorodezky and Pak [GP14], which leads to the aforementioned all-terminal network reliability algorithm [GJ18c].
When bad events involve only constant number of variables, bounding this ratio is sufficient to obtain tight run-time bound. However, in all three popping algorithms mentioned above, bad events can involve as many variables as m or n, and simply applying a worst case bound (such as m or n) yields loose run-time upper bound. Our improvement comes from a refined expression of the number of variables, rather than events, resampled in expectation. (See (2) and Theorem 3.) We then apply a combinatorial encoding idea, and design injective mappings to bound the refined expression. Similar mappings have been obtained before in [GJL17,GJ18c], but our new mappings are more complicated and carefully designed in order to achieve tight bounds. We note that our analysis is completely different and significantly simpler than the original analysis for cycle-popping [PW98] and sink-popping [CPP02].
Since our bounds are tight, one has to go beyond partial rejection sampling to further accelerate cluster-and sink-popping. It remains an interesting open question whether O(mn) is a barrier to uniformly sample spanning connected subgraphs.

Partial rejection sampling
Let {X 1 , . . . , X n } be a set of mutually independent random variables. Each X i can have its own distribution and range. Let {A 1 , . . . , A m } be a set of "bad" events that depend on X i 's. For example, for a constraint satisfaction problem (CSP) with variables X i (i ∈ [n]) and constraints C j (j ∈ [m]), each A j is the set of unsatisfying assignments of C j for j ∈ [m]. Let var(A j ) be the set of variables that A j depends on.
Our goal is to sample from the product distribution of (X i ) i∈ [n] , conditional on none of the bad events (A j ) j∈ [m] occurring. Denote by µ(·) the product distribution and by π(·) the conditional one (which is our target distribution).
A breakthrough result in algorithmic Lovász local lemma is the Moser and Tardos algorithm [MT10], which simply iteratively eliminates occurring bad events. The form we will use is described in Algorithm 1.
Algorithm 1: Partial rejection sampling for extremal instances Draw independent samples of all variables X 1 , . . . , X n from their respective distributions; while at least one bad event occurs do Find the set I of all occurring A i ; Independently resample all variables in i∈I var(A i ); end return the final assignment The resampling table is a very useful concept of analysing Algorithm 1, introduced by [MT10], and similar ideas were also used in the analysis of cycle-popping [PW98] and sink-popping [CPP02]. Instead of drawing random samples, we associate an infinite stack to each random variable. Construct an infinite table such that each row represents random variables, and each entry is a sample of the random variable. The execution of Algorithm 1 can be thought of (but not really implemented this way) as first drawing the whole resampling table, and whenever we need to sample a variable, we simply use the next value of the table. Once the resampling table is fixed, Algorithm 1 becomes completely deterministic.
In general Algorithm 1 does not necessarily produce the desired distribution π(·). However, it turns out that it does for extremal instances (in the sense of Shearer [She85]).

Condition 1. We say a collection of bad events
In other words, if the collection is extremal, then any two bad events are either disjoint or depend on distinct sets of variables (and therefore independent). It was shown [GJL17, Theorem 8] that if Condition 1 holds, then Algorithm 1 indeed draws from π(·) (conditioned on halting). In particular, Condition 1 guarantees that the set of occurring bad events have disjoint sets of variables.
Condition 1 may seem rather restricted, but it turns out that many natural problems have an extremal formulation. Examples include the cycle-popping algorithm for uniform spanning trees [Wil96,PW98], the sink-popping algorithm for sink-free orientations [CPP02], and the clusterpopping algorithm for root-connected subgraphs [GP14,GJ18c]. In particular, the clusterpopping algorithm yields the first polynomial-time approximation algorithm for the all-terminal network reliability problem. More recently, another application is found to approximately count the number of bases in a bicircular matroid [GJ18a], which is known to be #P-hard to count exactly [GN06]. We refer interested readers to [GJL17,GJ18b] for further applications of the partial rejection sampling framework beyond extremal instances.
A particularly nice feature of partial rejection sampling algorithms for extremal instances is that we have an exact formula [GJL17, Theorem 13] for the expected number of events resampled. This formula makes the analysis of these algorithms much more tractable. However, it counts the number of resampled events. In the most interesting applications, bad events typically involve more than constantly many variables. Using the worst case bound of the number of variables involved in bad events yields loose bounds.
We give a sharper formula for the expected number of variables resampled next. Let T i be the number of resamplings of event A i . Let q i be the probability such that exactly A i occurs, and q ∅ be the probability such that none of (A i ) i∈[m] occurs. Suppose q ∅ > 0 as otherwise the support of π(·) is empty. For extremal instances, [GJL17, Lemma 12] and the first part of the proof of [GJL17, Theorem 13] yield The proof of (1) is via manipulating the moment-generating function of T i . Let T be the number of resampled variables. 1 By linearity of expectation and (1), We note that an upper bound similar to the right hand side of (2) was first shown by Kolipaka and Szegedy [KS11], in a much more general setting but counting the number of resampled events. We will use (2) to derive both upper and lower bounds.
In order to upper bound the ratio in (2), we will use a combinatorial encoding idea, namely to design an injective mapping. For an assignment σ, let wt(σ) be its weight so that wt(σ) ∝ Pr µ (σ). Let Ω A i be the set of assignments so that exactly A i occurs, and Ω 1 := m i=1 Ω A i . Note that (Ω A i ) i∈m are mutually exclusive and are in fact a partition of Ω 1 . Also, let Moreover, let Ω 0 be the set of "perfect" assignments such that none of (A i ) i∈[m] occurs.
Definition 2. For a constant r > 0 and an auxiliary set Aux, a mapping τ : Ω var 1 → Ω 0 × Aux is r-preserving if for any i, any σ ∈ Ω A i and X ∈ var(A i ), where τ ′ is the restriction of τ on the first coordinate.
A straightforward consequence of (2) is the following theorem, which will be our main technical tool.
Theorem 3. If there exists a r-preserving injective mapping τ : Ω var 1 → Ω 0 × Aux for some auxiliary set Aux, then the expected number of resampled variables of Algorithm 1 is at most r |Aux|.

Cluster-popping
Let G = (V, A) be a directed graph with root r. The graph G is called root-connected if there is a directed path in G from every non-root vertex to r. Let 0 < p a < 1 be the failure probability for arc a, and define the weight of a subgraph S ⊆ A to be wt(S) := a∈S (1 − p a ) a ∈S p a . Thus the target distribution π(·) is over all root-connected subgraphs, and π(S) ∝ wt(S).
The extremal formulation by Gorodezky and Pak [GP14] is the following. Each arc e is associated with a (distinct) random variable which is present with probability 1 − p a . A cluster is a subset of vertices not containing r so that no arc is going out. We want all vertices to be able to reach r. Thus clusters are undesirable. However, Condition 1 is not satisfied if we simply let the bad events be clusters. Instead, we choose minimal clusters to be bad events.
More precisely, for each C ⊆ V , we associate it with a bad event B C to indicate that C is a cluster and no proper subset C ′ C is one. Each B C depends on arcs going out of C for being a cluster, as well as arcs inside C for being minimal. In other words, var( There are clearly exponentially many bad events. A description of the algorithm is given in Algorithm 2. Algorithm 2: Cluster-popping Let S be a subset of arcs by choosing each arc a with probability 1 − p a independently; while there is a cluster in (V, S) do Let C 1 , . . . , C k be all minimal clusters in (V, S); Re-randomize all arcs in k i=1 var(B C i ) to get a new S; end return S Condition 1 is met, due to the following observation [GJ18c, Claim 3].
Claim 4. Any minimal cluster is strongly connected.
Since Condition 1 holds, Algorithm 2 draws from the desired distribution over root-connected subgraphs in a directed graph. It was further shown in [GJ18c] that the cluster-popping algorithm can be used to sample spanning connected subgraphs in an undirected graph, and to approximate all-terminal network reliability in expected polynomial time. We will first give an improved running time bound for the basic cluster-popping algorithm.
has an anti-parallel twin in A as well. For an arc a = u → v, let a := v → u be its reversal. Let n = |V | and m = |A|. It was shown that cluster-popping can take exponential time in general [GP14] while in bi-directed graphs, the expected number of resampled variables is at most pmax 1−pmax m 2 n [GJ18c], where p max = max a∈A p a . We will now design a pmax 1−pmax -preserving injective mapping from Ω var 1 to Ω 0 × V × A for a connected and bi-directed graph G, where Ω 0 is the set of all rootconnected subgraphs, and Ω 1 is the set of subgraphs with a unique minimal cluster. Applying Theorem 3 with Aux = V × A, the expected number of resampled variables is pmax 1−pmax mn. Thus a factor m is shaved.
Our pmax 1−pmax -preserving injective mapping ϕ : Ω var 1 → Ω 0 × V × A is modified from the one in [GJ18c]. The main difference is that the domain is Ω var 1 instead of Ω 1 , and thus we need to be able to recover the random variable in the encoding. The encoding is also more efficient. For example, we only record u instead of u → u ′ in [GJ18c] as explained below.
For completeness and clarity we include full details. We assume that the bi-directed graph G is connected so that Ω 0 = ∅, and the root r is arbitrarily chosen. (Apparently in a bi-directed graph, weak connectivity is equivalent to strong connectivity.) For each subgraph S ∈ Ω 1 , the rough idea is to "repair" S so that no minimal cluster is present. We fix in advance an arbitrary ordering of vertices and arcs. Let C be the unique minimal cluster in S and v → v ′ be an arc so that v ∈ C, namely v → v ′ ∈ var(B C ). Let R denote the set of all vertices which can reach the root r in the subgraph S. Since S ∈ Ω 1 , where S fix ∈ Ω 0 is defined in the same way as in [GJ18c] and will be presented shortly. In [GJ18c], the mapping is from S to (S fix , v, u → u ′ ).
We consider the directed acyclic graph (  We verify that S fix ∈ Ω 0 . For any x ∈ R, x can still reach r in (V, S fix ) since the path from Proof. It is easy to verify pmax 1−pmax -preservingness, since flipping arcs leaves its weight unchanged. The only move changing the weight is to add the arc u → u ′ in S fix , which results in at most pmax 1−pmax change.
Next we verify that ϕ is injective. To do so, we show that we can recover S and v → v ′ given S fix , u, and v → v ′ . Clearly, it suffices to recover just S.
First observe that in S fix , u ′ is the first vertex on any path from u to r. Thus we can recover . Namely we can recover U and R. As a consequence, we can recover all arcs in S that are incident with R, as these arcs are unchanged.
What is left to do is to recover arcs in S[U ]. To do so, we need to find out which arcs have been flipped. We claim that H fix is acyclic. Suppose there is a cycle in H fix . Since H is acyclic, the cycle must involve flipped arcs and thus vertices in W . Let has not been flipped and is present in H. However, this contradicts the fact that W is a cluster in H.
To summarize, given S fix , u, and v → v ′ , we may uniquely recover S and thus v → v ′ . Hence the mapping ϕ is injective.
Combining Lemma 5 and Theorem 3 (with Aux = V × A) implies an upper bound of the number of random variables drawn in expectation for Algorithm 2.
Theorem 6. To sample edge-weighted root-connected subgraphs, the expected number of random variables drawn in Algorithm 2 on a connected bi-directed graph G = (V, A) is at most m + pmaxmn 1−pmax , where n = |V | and m = |A|.
Another consequence of Lemma 5 is that we can bound the expected number of resamplings of each individual variable. Denote by T a the number of resamplings of a. Then, by (1), where q C is the probability that under the product distribution, there is a unique minimal cluster C, and q ∅ is the probability that under the product distribution, there is no cluster. Through the pmax 1−pmax -preserving injective mapping ϕ, namely (3), if we fix a, it is easy to see that As the above holds for any a, we have that the expected number of resampling of any variable is at most pmaxn 1−pmax . This is an upper bound for the expected depth of the resampling table.
3.2. An efficient implementation. Theorem 6 bounds the number of random variables drawn. However, regarding the running time of Algorithm 2, in addition to drawing random variables, a naive implementation of Algorithm 2 may need to find clusters in every iteration of the while loop, which may take as much as O(m) time by, for example, Tarjan's algorithm [Tar72]. In Algorithm 3 we give an efficient implementation, which can be viewed as a dynamic version of Tarjan's algorithm. Algorithm 3 is sequential, and its correctness relies on the fact that the order of resampling events for extremal instances does not affect the final output. See [GJ18a, Section 4] for a similar sequential (but efficient) implementation of "bicycle-popping".
Two key modifications in Algorithm 3 comparing to Tarjan's algorithm are: (1) in the DFS, once the root r is reached, all vertices along the path are "set" and will not be resampled any more; (2) the first output of Tarjan's algorithm is always a strongly connected component with no outgoing arcs. Such a component (if it does not contain the root r) is a minimal cluster and will immediately be resampled in Algorithm 3.
We first observe that in Algorithm 3, index − 1 is always the number of variables that have already been indexed. Furthermore, inside the function "Dynamic-DFS" of Algorithm 3, for any v ∈ V , if v.root is defined, then v can reach the vertex indexed by v.root. This can be shown by a straightforward induction on the recursion depth, and observing that resampling in any Dynamic-DFS(v) will not affect arcs whose heads are indexed before v.
Lemma 7. At the beginning of each iteration of the while loop in Algorithm 3, all vertices whose indices are defined can reach the root r, and they will not be resampled any more.
Proof. We do an induction on the number of while loops executed. The base case is trivial as only r is indexed before the first iteration. Suppose we are to execute Dynamic-DFS(v) for some v. By the induction hypothesis, for any w such that w.index < v.index, w can reach the root r. As u.root is non-increasing for any u ∈ V , the only possibility to exit any (recursive) call of Dynamic-DFS(u) is that u.root < u.index. Suppose after finishing Dynamic-DFS(v), u is the lowest vertex that is newly indexed but cannot reach r, and u.root = w.index for some w. Then u can reach w and w.index < u.index. However, the latter implies that w can reach r by the choice of u, which is a contradiction.
In particular, Lemma 7 implies that once the algorithm halts, all vertices can reach the root r, and the output is a root-connected subgraph.
Lemma 8. Inside the function "Dynamic-DFS(v)" of Algorithm 3, if v.root = v.index happens, then a minimal cluster is resampled.
Algorithm 3: Cluster-popping with Tarjan's algorithm Input : A directed graph G = (V, A) with a special root vertex r; Output: R ⊆ A drawn according to π(·); Let R ⊆ A be obtained by drawing each arc a ∈ A with probability 1 − p a independently; r.index ← 1, r.root ← 1, index ← 2; Let S be a stack consisting of only r; , draw each arc a ∈ arc(w) with probability 1 − p a independently, and add them to R; // Resampling step until w = v; Dynamic-DFS(v); end end Proof. Suppose v.index = v.root. We want to show that all vertices indexed after v form a minimal cluster.
Clearly v can reach all vertices in U . For any u ∈ U , if u.root < v.index, then v.root ≤ u.root < v.index, contradicting to our assumption. Thus u.root ≥ v.index for all u ∈ U . Let u 0 ∈ U be the lowest indexed vertex that cannot reach v. Since the recursive call of u 0 has exited, v.index ≤ u 0 .root < u 0 .index. Thus u 0 .root is a vertex in U and can reach v due to the choice of u 0 . It implies that u 0 can reach u 0 .root and thus v, a contradiction. Hence, all of U can reach v, and thus U is strongly connected.
If there is any arc in the current R going out from U , say u → w for some u ∈ U , then it must be that w.index < v.index. However, this implies that u.root ≤ w.index < v.index, contradicting to u.root ≥ v.index for all u ∈ U . This implies that there is no arc going out from U . Thus, U is a cluster and is strongly connected, and it must be a minimal cluster. Now we are ready to show the correctness and the efficiency of Algorithm 3.
Theorem 9. The output distribution of Algorithm 3 is the same as that of Algorithm 2, and the expected running time is O m + pmaxmn 1−pmax .
Proof. By Lemma 7 and Lemma 8, in Algorithm 3, only minimal clusters are resampled and the halting rule is when no minimal cluster is present. In other words, the resampling rules are the same for Algorithm 3 and Algorithm 2, except the difference in orderings. Given a resampling table, our claim is that the resampled variables are exactly the same for Algorithm 3 and Algorithm 2, which leads to an identical final state. The claim can be verified straightforwardly, or we can use a result of Eriksson [Eri96]. If any two minimal clusters are present, then they must be disjoint (Condition 1), and thus different orders of resampling them lead to the same state for a fixed resampling table. This is the polygon property in [Eri96], which by [Eri96, Theorem 2.1] implies the strong convergence property. The latter is exactly our claim. Regarding the running time of Algorithm 3, we observe that its running time is linear in the final output and the number of resampled variables. The expected number of resampled variables of Algorithm 3 is the same as that of Algorithm 2 due to the claim above. Thus, Theorem 6 implies the claimed run-time bound.
We can further combine Algorithm 3 with the coupling procedure [GJ18c, Section 5] to yield a sampler for edge-weighted spanning connected subgraphs in an undirected graph, which is key to the FPRAS of all-terminal network reliability. The coupling performs one scan over all edges. Thus we have the following corollary of Theorem 9.

A tight example.
In this section, we present an example that the expected number of random variables drawn in Algorithm 2 (and thus Algorithm 3) is 2m + (1 − o(1)) pmn 1−p , where p a = p for all a ∈ A. Thus, the bound in Theorem 6 is tight.
Our example is the bi-directed version of the "lollipop" graph, where a simple path P of length n 1 is attached to a clique K of size n 2 . A picture is drawn in Figure 1. The main tool is still the formula (2). We have constructed an injective mapping ϕ : Ω var 1 → Ω 0 × V × A. Thus, to derive a lower bound, we just need to lower bound the weighted ratio between ϕ (Ω var 1 ) and Ω 0 × V × A. The main observation is that for most S ′ ∈ Ω 0 , the tuple (S ′ , u, v → v ′ ) ∈ ϕ (Ω var 1 ) as long as u ∈ P is not the right endpoint of P , and v → v ′ is an arc in K. We may choose n 1 and n 2 so that |P | = n 1 = (1 − o(1))n and the number of arcs in K is n 2 (n 2 − 1) = (1 − o(1))m. The bound in Algorithm 2 is tight with this choice.
For concreteness, let n 1 = ⌈n 1+ε 2 ⌉ for some 0 < ε < 1 and consider n 2 → ∞. Then the number of vertices is n = n 1 + n 2 = (1 + n −ε 2 )n 1 and the number of arcs is m = 2n 1 + n 2 (n 2 − 1) = (1 + o(1))n 2 2 . Let the root r be the leftmost vertex of P , and r c be the vertex where P and K intersect. A subgraph S ′ ∈ Ω 0 must contain a directed path from r c to r, as well as a root-connected subgraph in K with root r c . For any constant p, since K is a clique, with high probability, a random subgraph (by drawing each edge independently with probability 1 − p) in K is strongly connected. 2 Let Ω ′ 0 be the set of subgraphs which contain a directed path from r to r c and strongly connected inside K. Clearly the total weight of Ω ′ 0 is 1 − o(1) of that of Ω 0 .
2 This fact is easy to prove. Recall that the analogue connectivity threshold for Erdős-Rényi random graph is p = log n n .
For each S ′ ∈ Ω ′ 0 , u ∈ P \ {r c }, and v → v ′ an arc in K, it is straightforward to verify that the "repairing" procedure in Lemma 5 goes through. We first remove the arc u ′ → u, where u ′ is the next vertex on P , and call the vertex set that cannot reach the root U . Since S ′ ∈ Ω ′ 0 , if we contract strongly connected components in S ′ [U ], it must be a directed path. Flip all arcs in S ′ [U ] to get S. The subgraph S[U ] must have the same collection of strongly connected components as S ′ [U ], and the contracted graph is a directed path in the reverse direction. Clearly there is a unique sink, namely a minimal cluster in S[U ], which is the clique K with possibly a few vertices along the path P . Hence v → v ′ is in the unique minimal cluster of S.
Remark. If we set ε = 0, then the running time becomes Ω(n 3 ). However, the optimal constant (measured in n 3 ) is not clear.
An interesting observation is that the running time of Algorithm 1 depends on the choice of r in the example above, although in the reliability approximation algorithm, r can be chosen arbitrarily. (See [GJ18c, Section 5].) However, choosing the best r does not help reducing the order of the running time. In a "barbell" graph, where two cliques are joined by a path, no matter where we choose r, there is a rooted induced subgraph of the same structure as the example above, leading to the same Ω(n 3 ) running time when ε = 0.

3.4.
Faster reliability approximation. The main application of Algorithm 2 is to approximate the network reliability of a undirected graph G = (V, E), which is the probability that, assuming each edge e fails with probability p e independently, the remaining graph is still connected. Let p = (p e ) e∈E be the vector of the failure probabilities. Then the reliability is the following quantity: The approximate counting algorithm in [GP14,GJ18c] takes O(n 2 /ε 2 ) samples of spanning connected subgraphs to produce a 1 ± ε approximation of Z rel . However, we can rewrite (4) as a partition function of the Gibbs distribution. Thus we can take advantage of faster approximation algorithms, such as the one by Kolmogorov [Kol18].
Let Ω be a finite set, and the Gibbs distribution π(·) over Ω is one taking the following form: where β is the temperature, H(X) ≥ 0 is the Hamiltonian, and Z(β) = X∈Ω exp(−βH(X)) is the normalising factor (namely the partition function) of the Gibbs distribution. We would like to turn the sampling algorithm into an approximation algorithm to Z(β). Typically, this involves calling the sampling oracle in a range of temperatures, which we denote [β min , β max ].
Proposition 11. Suppose we have a sampling oracle from the distribution π β for any β ∈ [β min , β max ]. There is an algorithm to approximate Q within 1 ± ε multiplicative errors using O(q log N/ε 2 ) oracle calls in average. Moreover, the sampling oracle µ β can be approximate as long as µ β − µ β T V = O(1/(q log N )) where · T V is the variation distance.
A straightforward application of Proposition 11 to our problem requires O(m log n) samples. This is because annealing will be done on all edges. Instead, we will choose a spanning tree, and perform annealing only on its edges, whose cardinality is n − 1. This approach uses only O(n log n) samples, but it requires the following slight generalisation of Proposition 11.
Lemma 12. Suppose we have a sampling oracle from the distribution ρ β defined in (5) for any β ∈ [β min , β max ]. There is an algorithm to approximate Q within 1 ± ε multiplicative errors using O(q log N/ε −2 ) oracle calls.
Proof. We claim that we can straightforwardly apply the algorithm in Proposition 11 to get an approximation to Q for ρ β .
To see this, let ℓ ≥ 0 be an integer. Let Ω ′ be the multi-set of Ω where each X is duplicated ⌊ℓF (X)⌋ times. To avoid multi-set, we can simply give each duplicated X an index to make Ω ′ an ordinary set. Consider the following Gibbs distribution over Ω ′ : where Z ′ (β) = X∈Ω ′ exp(−βH(X)). Let We have that Clearly, as ℓ → ∞, δ ℓ → 0. We choose ℓ large enough so that δ ℓ = O(1/q log N ) is within the threshold in Proposition 11 for approximate sampling oracle. Thus, the output of directly applying the algorithm in Proposition 11 with sampling oracle ρ β also yields an ε-approximation to Q ′ := Z ′ (β min ) Z ′ (βmax) . (Note that we do not really run the algorithm on Ω ′ .) Since Q ′ Q → 1 as ℓ → ∞, the output of directly applying Proposition 11 is in fact an ε-approximation to Q.
Suppose we want to evaluate Z rel (p) for a connected undirected graph G = (V, E). Since G is connected, m ≥ n − 1, where m = |E| and n = |V |. Fix an arbitrary spanning tree T of G in advance. Let Let Ω 0 be the set of all spanning connected subgraphs of G, and ρ β (S) be the following distribution over Ω 0 : where β ≥ 0 is the temperature, H(S) := |T \ S| = n − 1 − |T ∩ S| is the Hamiltonian, and the normalising factor Z(β) = S∈Ω 0 exp(−βH(S)) · F (S). For any β ≥ 0, let 0 < p ′ e ≤ p e be the probability such that pe exp(−β) We have that for any S ∈ Ω 0 , To draw a sample from ρ β , we use Corollary 10, for a vector p such that every e ∈ T fails with probability p ′ e , and every other e ∈ T fails with probability p e . To see that this recovers the distribution ρ β , notice that the weight wt(S) assigned to each subgraph S ∈ Ω 0 is Let β min = 0 and β max = ∞. Indeed, β = ∞ corresponds to p ′ e = 0. Hence, ρ ∞ (S) = 0 if and only if T ⊆ S. This condition implies that S ∈ Ω 0 , and Thus, Z(∞) = 1.
On the other hand, and q = log Q ≤ (n − 1) log 1 1−pmax . Clearly, multiplicatively approximating Q is the same as approximating Z rel (p). Moreover, max S∈Ω 0 H(S) ≤ n − 1. Thus, applying Lemma 12, we have an approximation algorithm for Z(β) with O(q log N/ε 2 ) = O(n log n log 1 1−pmax /ε 2 ) oracle calls in expectation. Combining with Corollary 10, we have the following theorem.

Cycle-popping
Cycle-popping [Wil96, PW98] is a very simple algorithm to sample spanning trees in a connected undirected graph G = (V, E). This algorithm actually generates rooted trees, so we will pick a special root vertex r. Of course, there is no real difference between rooted and unrooted spanning trees as the root can be chosen arbitrarily in any given spanning tree.
We consider a slightly more general setting, by giving each edge e a weight w e > 0. For each vertex v ∈ V other than r, we associate a random arc with v, which points to a neighbour of v so that u is chosen with probability proportional to w (u,v) . We denote such an assignment by σ, and σ(v) is the neighbour of v that is pointed at. Any such σ induces a directed graph with n − 1 arcs, where n = |V |. The weight of σ is wt(σ) := v∈V, v =r w (v,σ(v)) , and the target distribution π(·) is π(σ) ∝ wt(σ) with support on the set of σ that induces a spanning tree.
Since we want trees in the end, cycles will be our bad events. For each cycle C, we associate with it a bad event B C which indicates the presence of C. The set var(B C ) consists of random arcs associated with vertices along C. It is clear that if there is no cycle, the graph must be a spanning tree. The number of bad events can be exponentially large, since there can be exponentially many cycles in G. A description is given in Algorithm 4.

Algorithm 4: Cycle-popping
For each vertex v = r, let σ(v) be a neighbour of v with probability proportional to w e independently; while there is at least one cycle under σ do Find all cycles C 1 , . . . , C k under σ; Re-randomize k i=1 var(B C i ); end return the subgraph induced by the final σ Condition 1 is easy to verify. Two cycles are dependent if they share at least one vertex. Suppose a cycle C is present, and C ′ = C is another cycle that shares at least one vertex with C. If C ′ is also present, then we may start from any vertex v ∈ C ∩ C ′ , and then follow the arrows v → v ′ . Since both C and C ′ are present, it must be that v ′ ∈ C ∩ C ′ as well. Continuing this argument we will go back to v eventually. Thus C = C ′ . Contradiction! 4.1. Expected running time. Next we turn to the running time of the cycle-popping algorithm, and define our injective mapping ϕ : Ω var We fix in advance an arbitrary ordering of vertices and edges. Let σ ∈ Ω C ⊆ Ω 1 be an assignment of random arcs so that there is a unique cycle C. Let u be a vertex on the cycle C and suppose σ(u) = u ′ . It is easy to see that there are two components in the subgraph induced by σ: a directed tree with root r, and the directed cycle with a number of directed subtrees rooted on the cycle. Since G is connected, there must be an edge joining the two components. Let this edge be (v 0 , v 1 ), where v 0 is in the tree component and v 1 is in the unicyclic component. Starting from v 1 and following arcs under σ, we will eventually reach the cycle and arrive at u. Let vertices along this path be v 2 , v 3 , . . . , v ℓ = u. (It is possible that ℓ = 1 or u ′ is along the path.) To "fix" σ, we reassign the arrow out of v i from v i+1 to v i−1 for all 1 ≤ i ≤ ℓ (in case of v ℓ = u, it is rerouted from u ′ to v ℓ−1 ), and call the resulting assignment σ fix . Namely, σ fix (v i ) = v i−1 for all i ∈ [ℓ], and σ fix (v) = σ(v) otherwise. It is easy to verify that σ fix ∈ Ω 0 . Now we are ready to define the injective mapping ϕ. For σ ∈ Ω C ⊆ Ω 1 and u ∈ C, let where σ fix and v 0 are defined above, and u ′ = σ(u).
Let r max := max e,e ′ ∈E we w e ′ .
To verify that ϕ is r max -preserving, just notice that directions do not affect weights, and then the only difference between σ and σ fix is the removal of (u, u ′ ) and the inclusion of (v 1 , v 0 ). Thus the change in weights is at most r max .
Combining Lemma 14 and Theorem 3 (with Aux = V × A) implies a run-time upper bound of the cycle-popping algorithm.
Theorem 15. To sample edge-weighted spanning trees, the expected number of random variables drawn in Algorithm 4 on a connected undirected graph G = (V, E) is at most (n − 1) + 2r max mn, where n = |V |, m = |E|, and r max := max e,e ′ ∈E we w e ′ . Remark. It is very tempting to use Aux = V × E instead of V × A in the proof above, which would yield an improvement by a factor 2. Unfortunately that choice does not work. The reason is that, if we use an unordered pair (u, u ′ ), then given σ fix and v 0 , it is not always possible to distinguish u and u ′ . To see this, consider an assignment σ ∈ Ω C , and another assignment σ ′ ∈ Ω C which is the same as σ except that the orientation on C is reversed. It is easy to check that the "fixed" versions of (σ, u) and (σ ′ , u ′ ) are the same if we do not record the direction of (u, u ′ ). We will see next that the factor 2 in fact is unavoidable.

4.2.
A tight example. We also give a matching lower bound to complement Theorem 15. Recall Section 3.3. The general strategy is to construct an example where ϕ(Ω var 1 ) constitutes most of Ω 0 × V × A, and then invoke (2).
We use the undirected version of the same "lollipop" graph as in Section 3.3. (Recall Figure 1.) Namely, a clique K of size n 2 joined with a path P of length n 1 , where n 1 = ⌈n 1+ε 2 ⌉ for some 0 < ε < 1. Consider n 2 → ∞. The number of vertices is n = n 1 + n 2 = (1 + n −ε 2 )n 1 and the number of edges is m = n 1 + n 2 (n 2 −1) 2 = (1/2 + o(1))n 2 2 . Let the root r be the leftmost vertex of P , and r c be the vertex where P and K intersect. Moreover, we put weight w c on all edges in K, and w p ≤ w c on all edges in P .
For a tuple (σ ′ , v 0 , u → u ′ ) to belong to ϕ(Ω var 1 ), the main constraint is that v 0 should be an ancestor of u in the spanning tree induced by σ ′ . In this example, any vertex v ∈ P is an ancestor of any other vertex u ∈ K in an arbitrary spanning tree. Thus, for any σ ′ ∈ Ω 0 , v 0 ∈ P \ {r c }, and u → u ′ where u, u ′ ∈ K, we can apply the "repairing" procedure as given in Lemma 14 to get σ ∈ Ω 1 so that ϕ(σ, u) = (σ ′ , v 0 , u → u ′ ). This is easy to verify, by finding the unique path between v 0 and u, and then reassign σ along the path. Since we remove one edge on the path and include one edge in the clique, wt(σ) = wc wp · wt(σ ′ ) = r max wt(σ ′ ). Thus, by (2), the expected number of random variables drawn of Algorithm 4 is at least n − 1 + r max · |P \ {r c }| · n 2 (n 2 − 1) = n − 1 + (2 − o(1))r max mn, where the term n − 1 accounts for the initialisation.
Similarly to Section 3.3, the choice of r affects the running time of Algorithm 4. Indeed, in Wilson's algorithm [Wil96,PW98], the root is chosen randomly according to the stationary distribution of the random walk, and with that choice the running time is O(n 2 ) in a lollipop graph (with ε = 0, namely n 1 ≍ n 2 ). However, also similarly to Section 3.3, the running time in a "barbell" graph is still Ω(n 3 ). Once again, the optimal constant measured in n 3 is still not clear.

Sink-popping
In this section, we describe and analyse the sink-popping algorithm by Cohn, Pemantle, and Propp [CPP02]. The goal is to sample a sink-free orientation in an undirected graph. This problem was introduced by Bubley and Dyer [BD97a] as an early showcase of the power of path coupling for Markov chains [BD97b]. This problem was also reintroduced more recently to show lower bounds for distributed Lovász local lemma algorithms [BFH + 16], where the goal is to find, instead of to sample, a sink-free orientation.
The formulation is as follows. In an undirected graph G = (V, E), we associate a random variable a e to indicate the orientation for each edge e ∈ E, and associate a bad event B v for each v ∈ V to indicate that v is a sink. Then var(B v ) = {a e | v is an endpoint of e}, and |var(B v )| = d v where d v is the degree of v. Condition 1 is easy to verify -if a vertex v is a sink, then none of its neighbours can be a sink. A description of the algorithm is given in Algorithm 5.
Algorithm 5: Sink-popping Orient each edge independently and uniformly at random; while there is at least one sink do Re-orient all edges that are adjacent to sinks uniformly at random; end return the final orientation As usual, let Ω v be the set of orientations where v ∈ V is the unique sink, Ω 1 = v∈V Ω v , and Ω 0 be the set of all sink-free orientations. The set Ω var 1 is also defined as usual. The general strategy is once again to "repair" orientations in Ω v . The first step is to associate a path to each v ′ ∈ V such that it can be flipped without creating new sinks, and v ′ is guaranteed not a sink. For each (v, v ′ ) ∈ E, we then flip v ← v ′ , and flip the path if v ′ is a sink now. However, there are a few cases where we cannot recover the original orientations if we simply record v ′ and the other endpoint of the path. For example, if v is along the path, then v ← v ′ is flipped twice, and there is no hope to find out v. There are ways to fix these "special" cases, and it is relatively straightforward to design the mapping if we are happy to hardcode each special case. However, to achieve a tight bound, we will do a more complicated mapping so that the special cases can be detected given the image.
A simple observation is that a graph has a sink-free orientation if and only if no connected component of it is a tree. Moreover, the expected running time of Algorithm 5 on a graph with more than one component is simply the sum of the expected running time of each component. We will assume that G is connected and not a tree, and denote n = |V | and m = |E|. Since G is not a tree, there must be a cycle C in G. Contract the cycle C, and pick an arbitrary spanning tree in the resulting graph. Denote by R ⊆ E this spanning tree combined with the cycle C. Thus |R| = |V | = n and (V, R) is unicyclic, namely (V, R) is composed of C attached with a number of trees. Define depth(v) by the distance from v to C in (V, R), and depth(v) = 0 if v ∈ C.
Fix an arbitrary ordering of all vertices and edges of G.
We say a vertex is good (under an orientation σ) if it has at least two outgoing arcs in (V, R).
Lemma 16. Let σ ∈ Ω v be an orientation with the unique sink v. Restricted to the unicyclic subgraph (V, R), there must be a good vertex u such that depth(u) < max{1, depth(v)}. In particular, if depth(u) = depth(v) = 0, u can be chosen so that it has two outgoing arcs in C.
Proof. First we prune all vertices below v and other trees not containing v attached to C. Call this remaining graph H v , which is still unicyclic and has as many edges as its vertices.
The total out-degree is same as the number of vertices in H v . Thus, there must be a vertex u with out-degree at least 2. By construction u satisfies the other requirements as well.
• If (v, v ′ ) ∈ R, then let u be the good vertex in Lemma 16 that is closest to v ′ .
• If (v, v ′ ) ∈ R, then consider whether v ′ is a sink in the pruned subgraph H v ′ of (V, R) defined as above.
-If v ′ is a sink in H v ′ , then we apply Lemma 16 to H v ′ and sink v ′ . Choose the closest good vertex to v ′ . Note that in this case, if u is on C, then it must have at least one outgoing arc in C and not along the path between u and v ′ . -If v ′ is not, then we choose u = v ′ . All ties are broken according to the ordering chosen a priori. Observe that if u = v ′ , then depth(u) < max{1, depth(v ′ )}.
We "repair" σ by flipping a path between u and v ′ in (V, R). If u and v ′ are both on C, then we choose the one that does not contain v. (If neither contains v, then we pick the shortest one. Further ties are broken according to the ordering chosen a priori.) Otherwise we simply choose the shortest path between u and v ′ in (V, R). (Again, ties are broken according to the ordering chosen a priori.) Denote this path by v ℓ = u, v ℓ−1 , . . . , v 1 , v 0 = v ′ where ℓ ≥ 0. After flipping the path, we further flip the edge (v, v ′ ) and denote the resulting orientation by σ fix .
We claim that σ fix ∈ Ω 0 . If ℓ = 0, then v ′ has at least two outgoing arcs and only v ← v ′ is flipped. The claim holds. Otherwise ℓ ≥ 1 and none of u = v ℓ , . . . , v 2 can be a sink under σ fix . For u, this is because it is good under σ, and only one of its adjacent edges is flipped. For v ℓ−1 , . . . , v 2 , they cannot have two outgoing arcs under σ since u is the closest good vertex to v ′ . Thus after flipping, at least one of their adjacent edges along the path is still outgoing. For v 1 , v 0 and v, there are two cases: (1) if v 1 = v, then v 1 cannot be the sink either due to the same reasoning above. Moreover, Hence v ′ and v are not sink either; (2) otherwise v 1 = v. In this case v ← v ′ is flipped twice. It must be that ℓ ≥ 2, and thus v is not a sink as v 2 ← v = v 1 under σ fix . No orientation of an adjacent edge of v ′ is changed, so it is still not a sink. All other vertices are unchanged between σ and σ fix . Thus, σ fix ∈ Ω 0 .
We make a simple observation first.
Lemma 17. Let v, u, σ fix be given as above. If depth(v) = depth(u) = 0, then u must have at least one outgoing arc in C under σ fix .
Proof. By our choice of u, it has either two outgoing arcs in C under σ (when u is chosen directly using Lemma 16), or one outgoing arc in C and one pointing towards v ′ (when u is chosen by applying Lemma 16 to H v ′ ). After repairing, σ fix only flips one of the adjacent edge of u, leaving the other outgoing arc unchanged. In particular, in the latter case, the one flipped is not in C.
The main technical lemma is the following.
Proof. Since there is no weight involved, ϕ is 1-preserving. To verify the injectivity, we just need to recover (σ, v ′ ) from (σ fix , v 1 , v 2 ). We need to figure out whether (σ, v ′ ) is special first. There are a few cases: • First we check if v 1 is a sink under σ fix in (V, R). This happens if and only if (v, v ′ ) ∈ R, v 1 = v, and (σ, v ′ ) is not special. • We may now assume (v, v ′ ) ∈ R and v 1 is not a sink. If v 1 = v 2 , then it must be a special case. Otherwise no matter whether depth(v ′ ) = depth(v) ± 1 or depth(v ′ ) = depth(v) = 0, depth(u) ≤ depth(v) and depth(u) ≤ depth(v ′ ). Thus, if depth(v 1 ) < depth(v 2 ) or depth(v 1 ) > depth(v 2 ), the one with smaller depth is u, and we can tell whether it is special or not. The remaining case is that depth(v 1 ) = depth(v 2 ) = 0. This case must be special, since if not, v 1 = v and depth(v) = 0. Since (v, v ′ ) ∈ R, either depth(v ′ ) = 1 or depth(v ′ ) = 0. Both cases are special. Contradiction. In summary, we can distinguish the special case and the other one.
If (σ, v ′ ) is not special, then v has a unique outgoing arc under σ fix , which is v → v ′ . We recover v ′ and thus the path between u and v ′ , and it is easy to figure out the rest.
If (σ, v ′ ) is special. We handle the two special cases differently: (1) if depth(v ′ ) = depth(u) = 0, then the path between v ′ and u must goes from v ′ to u under σ fix along C. Moreover, v must be the unique vertex going toward v ′ under σ fix , since we choose the path between v ′ and u not passing v.
(2) if depth(v ′ ) = depth(v) + 1, then the path between v ′ and u is the shortest path, and v must be the ancestor of v ′ in R (namely the first vertex on the path from v ′ to u). In both cases, we can recover the path between v ′ and u as well as the original sink v. Given these information, recovering σ is straightforward.
Lemma 18 directly gives an n 2 upper bound for |Ω var 1 | |Ω 0 | . We can improve it to n(n − 1). For any fixed σ fix , we want to bound the number of pairs of vertices that can be the possible output of ϕ along with σ fix . If v is fixed in the non-special case of (7), then u = v. In the special case, if u is fixed, then in the first special case, v ′ cannot be the vertex u points to along C under σ fix (recall Lemma 17); whereas in the second special case, v ′ cannot be u. Thus Lemma 18 implies that |Ω var 1 | |Ω 0 | ≤ n(n − 1). Then (2) yields the following theorem.
Theorem 19. Let G = (V, E) be a connected graph that is not a tree. The expected number of random variables drawn in Algorithm 5 on G to sample a sink-free orientation is at most m + n(n − 1), where n = |V | and m = |E|.
It is easy to see that Theorem 19 is tight, since on a cycle of length n the upper bound is achieved.

Concluding remarks
Perhaps the most interesting open problem is whether there are faster algorithms to sample root-connected subgraphs or sink-free orientations. For spanning trees, Kelner and Mądry [KM09] has shown that the random walk based approach can be accelerated to O(m √ n), which has kindled a sequence of improvements [MST15, DKP + 17, DPPR17]. This line of research culminates in the almost-linear time algorithm by Schild [Sch18]. It would be interesting to see whether these (such as graph sparsification and fast linear system solver) or other ideas can be used to accelerate cluster-popping and sink-popping as well.