Mixing of the Glauber dynamics for the ferromagnetic Potts model

We present several results on the mixing time of the Glauber dynamics for sampling from the Gibbs distribution in the ferromagnetic Potts model. At a fixed temperature and interaction strength, we study the interplay between the maximum degree ($\Delta$) of the underlying graph and the number of colours or spins ($q$) in determining whether the dynamics mixes rapidly or not. We find a lower bound $L$ on the number of colours such that Glauber dynamics is rapidly mixing if at least $L$ colours are used. We give a closely-matching upper bound $U$ on the number of colours such that with probability that tends to 1, the Glauber dynamics mixes slowly on random $\Delta$-regular graphs when at most $U$ colours are used. We show that our bounds can be improved if we restrict attention to certain types of graphs of maximum degree $\Delta$, e.g. toroidal grids for $\Delta = 4$.


Introduction
The Potts model was introduced in 1952 [23] as a generalisation of the Ising model of magnetism. The Potts model has been extensively studied not only in statistical physics, but also in computer science, mathematics and further afield. In physics the main interest is in studying phase transitions and modelling the evolution of non-equilibrium particle systems; see [30] for a survey. In computer science, the Potts model is a test-bed for approximation algorithms and techniques. It has also been heavily studied in the areas of discrete mathematics and graph theory, through an equivalence to the Tutte polynomial of a graph [29], and thereby links to the chromatic polynomial and many other graph invariants. The Potts model and its extensions have also appeared many times in the social sciences, for example in modelling financial markets [28] and voter interaction in social networks [5], and in biology [13].
In graph-theoretic language, the Potts model assigns a weight to each possible colouring of a graph (not necessarily proper), and we are interested in sampling from the distribution induced by the weights. The main obstacle to sampling is that the appropriate normalisation factor, the sum of the weights of all colourings, is hard to compute. To be precise: for a graph $G = (V, E)$, a configuration $\sigma$ is a function which assigns to each vertex one of $q$ possible colours (also called states or spins). The probability of finding the system in a given configuration $\sigma$ is given by the Gibbs distribution

$$\pi(\sigma) = \frac{1}{Z}\exp\Big(\beta J \sum_{(i,j)\in E} \delta(\sigma_i, \sigma_j)\Big),$$

where $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta. It takes value 1 if the colours $\sigma_i$ and $\sigma_j$, assigned to vertices $i$ and $j$ respectively, are the same, and value 0 otherwise. Here $\beta = (kT)^{-1} > 0$ is the inverse temperature ($k$ is Boltzmann's constant and $T$ is temperature), and $Z = Z(G, \beta, J)$, the partition function, is the normalisation factor that makes this a probability distribution. The strength of the interaction between neighbouring vertices is given by the coupling constant $J$. If $J > 0$ then the bias is towards having many edges with like colours at the endpoints; this is the ferromagnetic region. Conversely, if $J < 0$ then the bias is towards few edges with like colours at the endpoints: the anti-ferromagnetic region. In our treatment we will regard $e^{\beta J}$ as a single parameter $\lambda$, which we call the fugacity, since this combined parameter plays the same role as the fugacity in lattice gas models. Let $\mu(\sigma)$ be the number of monochromatic edges in a colouring $\sigma$, that is, $\mu(\sigma) = \sum_{(i,j)\in E} \delta(\sigma_i, \sigma_j)$. We then obtain the equivalent formula $Z(G, \lambda) = \sum_{\sigma} \lambda^{\mu(\sigma)}$.
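To make the definition concrete, here is a minimal Python sketch that evaluates $Z(G, \lambda) = \sum_{\sigma} \lambda^{\mu(\sigma)}$ by brute force over all $q^n$ colourings. The function name and the edge-list representation are our own illustrative choices, and colours are numbered $0, \dots, q-1$. The running time is exponential in $n$, so this is only usable on tiny graphs.

```python
from itertools import product

def partition_function(vertices, edges, q, lam):
    """Brute-force Z(G, lam): sum of lam**mu(sigma) over all q**n colourings.

    Colours are 0, ..., q-1; mu(sigma) counts monochromatic edges.
    Exponential in n, so only suitable for tiny graphs.
    """
    index = {v: i for i, v in enumerate(vertices)}
    total = 0.0
    for sigma in product(range(q), repeat=len(vertices)):
        # mu(sigma): number of edges whose endpoints receive the same colour
        mu = sum(1 for (i, j) in edges if sigma[index[i]] == sigma[index[j]])
        total += lam ** mu
    return total


# A single edge: q colourings are monochromatic (weight lam), q(q-1) are not.
print(partition_function([0, 1], [(0, 1)], q=3, lam=2.0))  # 12.0
```

Setting `lam=1.0` recovers the count $q^n$ of all colourings, which is a quick sanity check.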
When $q = 1$ the evaluation of the partition function is trivial. In almost all other cases it is #P-hard to compute the partition function exactly. Thus there can be no efficient algorithm (running in time polynomial in the size of the underlying graph) unless P = NP [16]. Therefore attention has been focused on approximation algorithms. The specific question is: for what classes of graphs and what ranges of $q$ and $\lambda$ is there a fully polynomial randomised approximation scheme (FPRAS) for the partition function? In the anti-ferromagnetic case, $\lambda < 1$, there can be no FPRAS for the partition function unless NP = RP [12], except when $q = 1$ and at a few special points. For the ferromagnetic region, $\lambda > 1$, an FPRAS for general graphs at any temperature is known only when $q = 2$ (the Ising model) [19]. There is also an FPRAS for the entire ferromagnetic region (with no restriction on $q$) if we restrict the underlying graphs to the class of dense graphs. These are graphs having minimum degree $\Omega(n)$ [1], or having edge connectivity at least $\Omega(\log n)$ [21]. In terms of approximation complexity, approximating the partition function of the ferromagnetic Potts model is equivalent to #BIS, the problem of approximating the number of independent sets in a bipartite graph [11]. This puts it in an interesting class of approximation problems that are not known to be hard, but for none of which an FPRAS is known [6].
A standard approach to approximating the partition function is to simulate Glauber dynamics. In Glauber dynamics the following process is iterated, starting from any given configuration. A random vertex updates its colour by selecting a colour according to the local Gibbs distribution induced by the current colours of its neighbours. (This will be formalised in the next subsection.) As $t$ goes to infinity, the distribution on configurations obtained after $t$ steps of Glauber dynamics converges to an equilibrium given by the global Gibbs distribution on the whole graph. The approximation is achieved by simulating the Glauber dynamics for long enough to generate a sample whose distribution is very close to the equilibrium distribution. This approach is known as Markov chain Monte Carlo (MCMC) sampling [20]. Because sampling and approximate counting are closely linked, an FPRAS for the partition function exists if Glauber dynamics gets sufficiently close to equilibrium in time polynomial in the size of the graph. In this case the dynamics is said to mix rapidly. Physicists' understanding of phase transitions indicates the following, all other things being equal. At sufficiently high temperature Glauber dynamics will mix rapidly, whereas at sufficiently low temperature it will mix slowly [22]. However, determining the exact range of temperatures in which Glauber dynamics mixes rapidly is, in general, an open problem.
In the anti-ferromagnetic case, it is known that there can be no FPRAS in general. Even so, the MCMC technique has yielded many results on approximating the partition function for restricted classes of graphs, notably bounded-degree graphs. In the zero-temperature limit of the anti-ferromagnetic Potts model only proper vertex colourings have non-zero weight. Approximating the partition function is then equivalent to approximately counting proper $q$-colourings of the underlying graph. Jerrum [17] first showed that the Glauber dynamics mixes rapidly provided the number of colours is more than twice the maximum degree of the graph. This was proved independently in the physics community by Salas and Sokal [24]. It has been followed by numerous refinements gradually reducing the ratio of colours to degree required for rapid mixing: see [10] for a recent survey. In this paper we investigate how the maximum degree $\Delta$ of the graph $G$ and the number of colours $q$ together determine whether Glauber dynamics for the ferromagnetic Potts model converges fast (rapid mixing) or slowly.
Acknowledgements: We are grateful to Ostap Hryniv and Gregory Markowsky for leading us to the generalised Hölder's inequality (and to [9]) for Lemma 3.5. We are also grateful to Mario Ullrich for his helpful comments.

Definitions
Throughout we shall be concerned with discrete-time, reversible, ergodic Markov chains with finite state space $\Omega$. Let $\mathcal{M}$ be such a Markov chain with transition matrix $P$ and (unique) stationary distribution $\pi$. For $\varepsilon > 0$ and $x \in \Omega$, we define

$$\tau_x(\mathcal{M}, \varepsilon) = \min\{t > 0 : \|P^t(x, \cdot) - \pi\|_{TV} \le \varepsilon\},$$

where $\|\cdot\|_{TV}$ denotes the total variation distance between two distributions. That is, for any two probability distributions $\phi, \phi'$ on $\Omega$,

$$\|\phi - \phi'\|_{TV} = \frac{1}{2}\sum_{x \in \Omega} |\phi(x) - \phi'(x)|.$$

We define $\tau(\mathcal{M}, \varepsilon) = \max_x \tau_x(\mathcal{M}, \varepsilon)$. Let $G = (V, E)$ be a graph with $n := |V|$, and let $[q] = \{1, \dots, q\}$ be a set of colours. We write $\Omega = [q]^V$ for the set of (not necessarily proper) $q$-colourings of $G$. Fix a constant $\lambda > 1$, which is called the fugacity. The Gibbs distribution $\pi = \pi(G, \lambda, q)$ on $\Omega$ is given by $\pi(\sigma) \propto \lambda^{\mu(\sigma)}$ for all $\sigma \in \Omega$, where $\mu(\sigma)$ denotes the number of monochromatic edges of $G$ in the colouring $\sigma$. More precisely, $\pi(\sigma) = \lambda^{\mu(\sigma)}/Z$, where $Z$ is the partition function

$$Z = Z(G, \lambda, q) = \sum_{\sigma \in \Omega} \lambda^{\mu(\sigma)}.$$

The Glauber dynamics is a very simple Markov chain on $\Omega$ whose stationary distribution is the Gibbs distribution. Given a colouring $X \in \Omega$, a vertex $v \in V$ and a colour $c \in [q]$, let $n(X, v, c)$ denote the number of neighbours of $v$ with colour $c$ in $X$.
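The transition procedure of the Glauber dynamics from current state $X_t \in \Omega$ is as follows:
• choose a vertex $v$ of $G$ uniformly at random;
• choose a colour $c \in [q]$ according to the distribution $\phi = \phi^v_{X_t}$ given by
$$\phi^v_{X_t}(c) = \frac{\lambda^{n(X_t, v, c)}}{\sum_{c' \in [q]} \lambda^{n(X_t, v, c')}};$$
• set $X_{t+1}(v) = c$ and $X_{t+1}(w) = X_t(w)$ for all $w \ne v$.
Then $X_{t+1}$ is the new state. We write $\mathcal{M}_{GD} = \mathcal{M}_{GD}(G, \lambda, q)$ for the Glauber dynamics as described above. We say that $\mathcal{M}_{GD}$ mixes rapidly if $\tau(\mathcal{M}_{GD}, \varepsilon)$ is polynomial in $\log|\Omega|$, that is, polynomial in $n$. If $\tau(\mathcal{M}_{GD}, \varepsilon)$ is exponential in $n$, then we say that $\mathcal{M}_{GD}$ mixes slowly.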
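The transition procedure translates directly into code. The following Python sketch computes the local distribution $\phi^v_X$ and performs Glauber steps. It assumes vertices $0, \dots, n-1$ given by adjacency lists and colours $0, \dots, q-1$ (rather than $1, \dots, q$). All function names are hypothetical.

```python
import random

def local_distribution(X, adj, v, q, lam):
    """phi^v_X: probability of each colour c for v, proportional to lam**n(X, v, c)."""
    counts = [0] * q
    for w in adj[v]:
        counts[X[w]] += 1              # counts[c] = n(X, v, c)
    weights = [lam ** counts[c] for c in range(q)]
    total = sum(weights)
    return [wt / total for wt in weights]

def glauber_step(X, adj, q, lam, rng=random):
    """One transition of M_GD: pick v uniformly, resample its colour from phi^v_X."""
    Y = list(X)                        # every vertex other than v keeps its colour
    v = rng.randrange(len(Y))
    probs = local_distribution(Y, adj, v, q, lam)
    Y[v] = rng.choices(range(q), weights=probs)[0]
    return Y

def run_glauber(X0, adj, q, lam, steps, seed=0):
    """Run the chain for a fixed number of steps from X0 with a seeded generator."""
    rng = random.Random(seed)
    X = list(X0)
    for _ in range(steps):
        X = glauber_step(X, adj, q, lam, rng)
    return X


# 4-cycle 0-1-2-3-0, three colours, fugacity 2.
cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(run_glauber([0, 0, 0, 0], cycle4, q=3, lam=2.0, steps=1000))
```

Only the colours of the neighbours of $v$ enter the update. This locality is what the coupling arguments below exploit.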

Results
Our main results are stated below. In order to keep the presentation simple at this stage, we sometimes postpone giving the explicit relationships amongst constants and mixing times until later, but in each case, we direct the reader to where a more detailed statement can be found.
In Theorem 1.1 we present our first, and simplest, bound on the number of colours, as a function of $\lambda$ and $\Delta$, that guarantees rapid mixing of Glauber dynamics. Although Theorem 1.1 follows from a standard coupling argument, we prove it here for completeness, as we will need this result later to establish our improved bounds. Theorem 1.1. Let $\Delta, q \ge 2$ be integers and take $\lambda > 1$ such that $q \ge \Delta\lambda^{\Delta} + 1$. Then the Glauber dynamics of the $q$-state Potts model at fugacity $\lambda$ mixes rapidly for the class of graphs of maximum degree $\Delta$. Theorem 1.1 will be proved in Section 2.2; see Proposition 2.2 for a more detailed statement.
There is some overlap between Theorem 1.1 and a result of Hayes [15, Proposition 14] for $q = 2$, which was generalised to arbitrary $q$ by Ullrich [27, Corollary 2.14]. Ullrich showed that the Glauber dynamics is rapidly mixing when the inverse temperature $\beta$ satisfies $\beta \le 2c/\Delta$ for some $0 < c < 1$. Its mixing time is then bounded above by $(1-c)^{-1} n \log(n\varepsilon^{-1})$. Since $\lambda = e^{\beta}$, the lower bound on $q$ in the statement of Theorem 1.1 can be expressed as
$$\beta \le \frac{1}{\Delta}\log\left(\frac{q-1}{\Delta}\right).$$
Hence our result holds for a wider range of $\beta$ when $q > e^2\Delta + 1$. If $q \le \Delta + 1$ then Theorem 1.1 does not apply but [27, Corollary 2.14] is valid. For intermediate $q$, both results apply. In Theorem 1.2 we improve the exponent of $\lambda$ in the bound, but at the expense of a larger constant. We also show that the exponent achieved is close to the best possible, by proving a corresponding slow-mixing bound for almost all regular graphs of degree $\Delta$.
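The comparison with [27, Corollary 2.14] can be checked numerically. The sketch below computes the largest $\beta$ allowed by Theorem 1.1 with $\lambda = e^{\beta}$, namely $\frac{1}{\Delta}\log\frac{q-1}{\Delta}$, and the supremum $2/\Delta$ of Ullrich's range $\beta \le 2c/\Delta$, $0 < c < 1$. It then finds the smallest $q$ for which Theorem 1.1 covers more temperatures. The function names are our own.

```python
import math

def beta_max_theorem_1_1(q, Delta):
    """Largest beta with q >= Delta * e**(beta * Delta) + 1, or None if no beta > 0 works."""
    if q <= Delta + 1:
        return None
    return math.log((q - 1) / Delta) / Delta

def beta_sup_ullrich(Delta):
    """Supremum of the range beta <= 2c / Delta over 0 < c < 1."""
    return 2 / Delta

def crossover_q(Delta):
    """Smallest q for which Theorem 1.1 allows some beta beyond Ullrich's range."""
    q = Delta + 2
    while beta_max_theorem_1_1(q, Delta) <= beta_sup_ullrich(Delta):
        q += 1
    return q


print(crossover_q(4))                       # 31
print(math.floor(math.e ** 2 * 4 + 1) + 1)  # 31, the first integer above e^2 * Delta + 1
```

The two printed values agree, matching the threshold $q > e^2\Delta + 1$ for $\Delta = 4$.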
Theorem 1.2. Fix an integer $\Delta \ge 2$. For any $\eta \in (0, 1)$ there are constants $c_1$ and $c_2$ (depending on $\eta$ and $\Delta$) such that for any integer $q \ge 2$ and any $\lambda > 1$: (i) if $q > c_1\lambda^{\Delta-1+\eta}$ then the Glauber dynamics of the $q$-state Potts model at fugacity $\lambda$ mixes rapidly for the class of connected graphs of maximum degree $\Delta$; (ii) if $q < c_2\lambda^{\Delta-1-\frac{1}{\Delta-1}-\eta}$ then the Glauber dynamics of the $q$-state Potts model at fugacity $\lambda$ mixes slowly for almost all regular graphs of degree $\Delta \ge 3$. Theorem 1.2 is proved at the end of the paper. A more detailed statement of Theorem 1.2(i) can be found in Theorem 2.14, and of Theorem 1.2(ii) in Theorem 4.4. Theorem 1.2(ii) is proved using a conductance argument. It turns out that conductance for the Glauber dynamics is related to the expansion properties of the underlying graph, and so we prove that almost all $\Delta$-regular graphs have the relevant property. This argument alone gives a worse bound than that in Theorem 1.2(ii). We obtain the required improvement by combining it with the solution of an interesting extremal problem (proved in Section 3), which we believe may be of independent interest. Theorem 1.2(i) is proved in two steps. First, a coupling argument gives a rapid-mixing result for block dynamics (a more general form of dynamics than Glauber dynamics). Then a Markov chain comparison argument gives rapid mixing for Glauber dynamics. In proving Theorem 1.2(i), we derive a general combinatorial condition on graphs that guarantees rapid mixing of Glauber dynamics (Theorem 2.3 combined with Corollary 2.13). This condition can be used to improve the bounds of Theorem 1.2(i) for graph classes of maximum degree $\Delta$ with "low expansion". We illustrate this in Theorem 1.3 below with the example of the toroidal grid.
(ii) if $q > c_4\lambda^{2+\eta}$ then the Glauber dynamics of the $q$-state Potts model at fugacity $\lambda$ mixes rapidly for the toroidal grid; (iii) if $q < c_5\lambda^{8/3-\eta}$ then the Glauber dynamics of the $q$-state Potts model at fugacity $\lambda$ mixes slowly for almost all regular graphs of degree 4.
In particular, for sufficiently large λ there is a positive integer q such that the Glauber dynamics of the q-state Potts model at fugacity λ mixes rapidly for the toroidal grid, but slowly for almost all regular graphs of degree 4.
The purpose of Theorem 1.3 is illustrative, and it is proved at the end of the paper. We note that much sharper bounds for the grid are known than those presented here. For the infinite 2-dimensional grid, the phase transition is known to occur at $q = (\lambda-1)^2$ [30]. Rapid mixing is known to occur for finite grids when $q > (\lambda-1)^2 = O(\lambda^2)$ (see [22] and [27, Theorem 2.10]). Corresponding slow-mixing results for the grid also hold; see [3] and [27, Remark 2.11].
Section 2 contains our results on rapid mixing of Glauber dynamics. Section 3 is devoted to an extremal problem whose solution allows us to obtain improved bounds for our slow-mixing results in Section 4.

Mixing time upper bounds
Our goal in this section is to give good lower bounds on the number of colours needed for the Glauber dynamics to mix rapidly. We begin by describing the notions of coupling and path coupling, which are very useful tools for proving upper bounds on the mixing times of Markov chains. In Section 2.2, we apply path coupling directly to the Glauber dynamics of bounded-degree graphs to obtain our first lower bound on the number of colours needed for rapid mixing. In Section 2.3, we consider block dynamics, a more general type of dynamics that can be used to sample from the Gibbs distribution. We give a general lower bound on the number of colours needed for rapid mixing of block dynamics (Theorem 2.3). We illustrate how to apply Theorem 2.3 to bounded-degree graphs in Section 2.4. In Section 2.5, we relate the mixing times of Glauber dynamics to those of the block dynamics and show how this gives various improvements to the bounds obtained in Section 2.2. This enables us, in Theorems 2.14 and 2.15, to prove what is needed for Theorem 1.2 part (i), and Theorem 1.3 parts (i) and (ii). Note that the final proofs of Theorems 1.2 and 1.3 are left until we have all the pieces, at the end of Section 4.

Coupling
The notion of coupling (more specifically path coupling [4]) lies at the heart of our proofs of upper bounds for mixing times. We give the basic setup in this section.
Let $\mathcal{M} = (X_t)$ be a Markov chain with transition matrix $P$. A coupling for $\mathcal{M}$ is a stochastic process $(A_t, B_t)$ on $\Omega \times \Omega$ such that each of $(A_t)$ and $(B_t)$, considered independently, is a faithful copy of $(X_t)$. Since all our processes are time-homogeneous, a coupling is determined by its transition matrix $\hat{P}$. Given elements $(a, b)$ and $(a', b')$ of $\Omega \times \Omega$, it must satisfy $\sum_{b'} \hat{P}((a, b), (a', b')) = P(a, a')$ and $\sum_{a'} \hat{P}((a, b), (a', b')) = P(b, b')$.
Under path coupling, the coupling is only defined on a subset $\Lambda$ of $\Omega \times \Omega$. This restricted coupling is then extended to a coupling on the whole of $\Omega \times \Omega$ along paths in the state space $\Omega$. In our setting, we have $\Omega = [q]^V$, where $V$ is the vertex set of some fixed graph. For $\sigma, \sigma' \in \Omega$, we write $d(\sigma, \sigma')$ for the number of vertices on which $\sigma$ and $\sigma'$ differ in colour (that is, the Hamming distance). Define $\Lambda \subseteq \Omega \times \Omega$ by
$$\Lambda = \{(\sigma, \sigma') \in \Omega \times \Omega : d(\sigma, \sigma') = 1\}.$$
The path coupling method relies on the following key property of $\Lambda$. For any $\sigma, \sigma' \in \Omega$, recolour the $d(\sigma, \sigma')$ vertices on which they disagree one by one, in an arbitrary order. This gives a path of length $d(\sigma, \sigma')$ from $\sigma$ to $\sigma'$ in which each pair of consecutive elements forms an element of $\Lambda$.
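The path used by path coupling is easy to write down explicitly. In the sketch below, colourings are Python lists indexed by vertex and the helper names are hypothetical. The disagreeing vertices are recoloured in increasing order, although any order would do.

```python
def hamming(sigma, tau):
    """d(sigma, tau): number of vertices on which the two colourings differ."""
    return sum(1 for a, b in zip(sigma, tau) if a != b)

def coupling_path(sigma, tau):
    """Path sigma = x_0, ..., x_d = tau in which consecutive colourings differ at one vertex."""
    current = list(sigma)
    path = [list(current)]
    for v, (a, b) in enumerate(zip(sigma, tau)):
        if a != b:
            current[v] = b                 # recolour one disagreeing vertex
            path.append(list(current))
    return path


sigma, tau = [0, 1, 2, 0], [1, 1, 0, 2]
path = coupling_path(sigma, tau)
print(len(path) - 1, hamming(sigma, tau))  # 3 3
```

Every consecutive pair on `path` lies in $\Lambda$. A contraction bound proved only for pairs in $\Lambda$ therefore extends, edge by edge, to arbitrary pairs.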

Lemma 2.1 (See [8] for example).
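Let $\Omega = [q]^V$ and $\Lambda$ be as above, with $n := |V|$, and let $\mathcal{M}$ be some Markov chain on $\Omega$. Suppose that we can define a coupling $(A, B) \mapsto (A', B')$ for $\mathcal{M}$ on $\Lambda$ such that for some constant $\beta < 1$ and all $(A, B) \in \Lambda$ we have
$$\mathbb{E}[d(A', B')] \le \beta.$$
Then by path coupling we may conclude that
$$\tau(\mathcal{M}, \varepsilon) \le \frac{\log(n\varepsilon^{-1})}{1 - \beta}.$$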

Glauber dynamics
Our goal in this subsection is to prove Theorem 1.1. In the subsections that follow, we shall see how we can improve Proposition 2.2 in some special cases, but in Section 4, we shall see that the bound given below is close to best possible, at least in terms of the exponent of λ.
We actually prove the following proposition, which immediately implies Theorem 1.1 but also provides a bound on the mixing time. The proof is a standard coupling calculation.
Proof. Fix $(A, B) \in \Lambda$ and let $u$ be the (unique) vertex which is coloured differently by $A$ and $B$. We define a coupling $(A, B) \mapsto (A', B')$ as follows: choose a vertex $v$ of $G$ uniformly at random, and obtain $A'$ (respectively, $B'$) by updating the colour of the vertex $v$ in $A$ (respectively, $B$) according to the distributions is not a neighbour of $u$ (because in both cases, $A$ and $B$ assign the same colours to the neighbours of $v$ and so $\phi_A$ and $\phi_B$ are the same distribution). Now assume that $v$ is a neighbour of $u$, so that $\phi_A$ and $\phi_B$ are different distributions. Without loss of generality, we may assume that $A(u) = 1$ and $B(u) = 2$. Let $a_i := n(A, v, i)$, that is, $a_i$ is the number of neighbours of $v$ coloured $i$ by $A$. Similarly, let $b_i := n(B, v, i)$. Note that $b_1 = a_1 - 1$, $b_2 = a_2 + 1$ and $b_i = a_i$ for $i = 3, \dots, q$. Define and assume without loss of generality that Observe that We give an easy upper bound for $g(\lambda, q)$ as follows. First, for all $a \in [\Delta]^q$ we have The right-hand side of the above is increasing in all directions of the form $e_1 - e_i$, where $e_1, \dots, e_q$ is the standard basis for $\mathbb{R}^q$. Therefore the right-hand side is maximised when using the lower bound on $q$ to obtain the final inequality. Therefore.
Applying Lemma 2.1 completes the proof.
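The coupling in this proof can also be evaluated numerically for small parameters. When the chosen vertex $v$ is a neighbour of $u$, a new disagreement arises with probability equal to $\|\phi_A - \phi_B\|_{TV}$. When $v = u$ the two copies are updated identically. Hence $\mathbb{E}[d(A', B')] \le 1 - (1 - \Delta g)/n$, where $g$ is the largest such distance. The Python sketch below computes $g$ by brute force over the neighbour colour counts $a$, with $a_1 \ge 1$ since $u$ is a neighbour of $v$ coloured 1 in $A$. It then applies Lemma 2.1 in the standard path-coupling form $\tau \le \log(n\varepsilon^{-1})/(1-\beta)$. This is a numerical check of one instance, not the analytic bound used in the proof. The enumeration is only feasible for small $\Delta$ and $q$, and all names are hypothetical.

```python
from itertools import product
import math

def tv_distance(p, r):
    return 0.5 * sum(abs(x - y) for x, y in zip(p, r))

def local_from_counts(counts, lam):
    """Local Gibbs distribution with weights lam**counts[c]."""
    weights = [lam ** a for a in counts]
    total = sum(weights)
    return [w / total for w in weights]

def max_neighbour_tv(Delta, q, lam):
    """g = max d_TV(phi_A, phi_B) over count vectors a with a[0] >= 1 and sum(a) <= Delta.

    Colour 0 plays the role of A(u) and colour 1 that of B(u), so b = a - e_0 + e_1.
    """
    g = 0.0
    for a in product(range(Delta + 1), repeat=q):
        if a[0] < 1 or sum(a) > Delta:
            continue
        b = list(a)
        b[0] -= 1
        b[1] += 1
        g = max(g, tv_distance(local_from_counts(a, lam), local_from_counts(b, lam)))
    return g

def glauber_mixing_bound(n, Delta, q, lam, eps):
    """Path-coupling bound n * log(n / eps) / (1 - Delta * g), or None if Delta * g >= 1."""
    gap = 1 - Delta * max_neighbour_tv(Delta, q, lam)
    if gap <= 0:
        return None
    return n * math.log(n / eps) / gap


# Delta = 2, lam = 1.5: Theorem 1.1 needs q >= 2 * 1.5**2 + 1 = 5.5, so take q = 6.
print(max_neighbour_tv(2, 6, 1.5))
print(glauber_mixing_bound(100, 2, 6, 1.5, 0.01))
```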

Block dynamics
In this section we begin the analysis of block dynamics, in which, at each step, the colours of several vertices (a block of vertices) are updated. We first present the framework and show general results on block dynamics. In the next subsection we discuss suitable choices of blocks and, in Theorem 2.7, show rapid mixing of block dynamics for certain block systems. As before, let $G = (V, E)$ be a graph, fix $\lambda > 1$ and let Each element of $\mathcal{S}$ is called a block, and we call $\mathcal{S}$ a block system for $G$. Fix a probability distribution $\psi$ on $\mathcal{S}$. We define a Markov chain $\mathcal{M}_{BD} = \mathcal{M}^{\mathcal{S},\psi}_{BD}(G, \lambda, q)$ with state space $\Omega$, which we call the $(\mathcal{S}, \psi)$-block dynamics. We ensure that the new chain also has the Gibbs distribution as its stationary distribution. First we need some more notation.
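Given $S \in \mathcal{S}$, for $c \in [q]^S$ and $X \in \Omega$ we let $X^{(S,c)} \in \Omega$ be the colouring defined by
$$X^{(S,c)}(w) = \begin{cases} c(w) & \text{if } w \in S,\\ X(w) & \text{otherwise.}\end{cases}$$
Let $\mu_{X,S}(c)$ denote the number of monochromatic edges in $X^{(S,c)}$ which are incident with at least one vertex of $S$. Finally, define the distribution $\phi_{X,S}$ on $[q]^S$ by
$$\phi_{X,S}(c) = \frac{\lambda^{\mu_{X,S}(c)}}{Z_{X,S}}, \qquad \text{where } Z_{X,S} = \sum_{c' \in [q]^S} \lambda^{\mu_{X,S}(c')}.$$
The transition procedure of the $(\mathcal{S}, \psi)$-block dynamics can now be described. From current state $X_t \in \Omega$, obtain the new state $X_{t+1} \in \Omega$ as follows:
• choose $S \in \mathcal{S}$ according to the distribution $\psi$;
• choose a colouring $c \in [q]^S$ for $S$ from the distribution $\phi_{X_t,S}$;
• set $X_{t+1} = X_t^{(S,c)}$.
The stationary distribution of this chain is the Gibbs distribution on $\Omega$. Theorem 2.3 below gives a sufficient condition on the number of colours for the $(\mathcal{S}, \psi)$-block dynamics to be rapidly mixing. The result is stated in terms of three parameters which we now define.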
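A single block update can be carried out by enumerating all $q^{|S|}$ colourings of the block, which is feasible only for small blocks. The following Python sketch computes $\mu_{X,S}(c)$, the distribution $\phi_{X,S}$ and one step of the $(\mathcal{S}, \psi)$-block dynamics. It assumes vertices $0, \dots, n-1$ with adjacency lists, colours $0, \dots, q-1$ and blocks given as lists of vertices. All names are hypothetical.

```python
from itertools import product
import random

def mu_block(X, adj, S, c):
    """mu_{X,S}(c): monochromatic edges of X^{(S,c)} with at least one endpoint in S."""
    Y = list(X)
    for w, col in zip(S, c):
        Y[w] = col                                  # Y = X^{(S,c)}
    edges = set()
    for w in S:
        for x in adj[w]:
            if Y[w] == Y[x]:
                edges.add((min(w, x), max(w, x)))   # count each edge once
    return len(edges)

def block_distribution(X, adj, S, q, lam):
    """All colourings c of S together with their probabilities phi_{X,S}(c)."""
    colourings = list(product(range(q), repeat=len(S)))
    weights = [lam ** mu_block(X, adj, S, c) for c in colourings]
    Z = sum(weights)                                # Z_{X,S}
    return colourings, [w / Z for w in weights]

def block_step(X, adj, blocks, psi, q, lam, rng=random):
    """One transition: choose S ~ psi, then recolour S according to phi_{X,S}."""
    S = rng.choices(blocks, weights=psi)[0]
    colourings, probs = block_distribution(X, adj, S, q, lam)
    c = rng.choices(colourings, weights=probs)[0]
    Y = list(X)
    for w, col in zip(S, c):
        Y[w] = col
    return Y
```

For a one-vertex block $S = \{v\}$, `block_distribution` returns exactly the Glauber local distribution $\phi^v_X$. This is one way to see that block dynamics generalises Glauber dynamics.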
For $S \subseteq V$, write $\partial S$ for the set of vertices in $V \setminus S$ that have a neighbour in $S$. Write $s := \max_{S \in \mathcal{S}} |S|$ for the size of the largest block in $\mathcal{S}$. Let $S \in \mathcal{S}$ be a random block chosen according to the distribution $\psi$. Given $v \in V$, define $\psi(v) := \Pr(v \in S)$ and $\psi_\partial(v) := \Pr(v \in \partial S)$. Our first parameter $\partial^+$ is Let $\psi_{\min} := \min_{v \in V} \psi(v)$ and define our second parameter $\Psi$ by Given $A \subseteq V$ and $X \in \Omega$, write $X|_A$ for the colouring $X$ restricted to $A$. Consider a colouring $c \in [q]^S$. A colour used by $c$ is called free with respect to $X, S$ if it does not appear in $X|_{\partial S}$. Write $f(X, S, c)$ for the number of free colours in $c$ with respect to $X, S$. Our third parameter is defined by The parameter $\mu^+$ is a scaled maximum value of $\mu_{X,S}(c)$. It rules out colourings in which every element of $S$ receives a distinct free colour; such colourings will be treated separately in the proof below. The definition of $\mu^+$ depends on $q$. However, in our applications on bounded-degree graphs we can bound $\mu^+$ independently of $q$ (see Proposition 2.4). Hence we suppress this dependence in our notation.
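Both $\partial S$ and the number of free colours $f(X, S, c)$ are straightforward to compute. The following minimal sketch interprets a free colour as a colour used by $c$ that does not appear on $X|_{\partial S}$. Graphs are given as adjacency lists on vertices $0, \dots, n-1$, and the names are hypothetical.

```python
def boundary(adj, S):
    """dS: vertices outside S that have at least one neighbour in S."""
    S_set = set(S)
    return {x for w in S for x in adj[w] if x not in S_set}

def free_colour_count(X, adj, S, c):
    """f(X, S, c): colours used by c (a colouring of S) that do not appear on X restricted to dS."""
    boundary_colours = {X[x] for x in boundary(adj, S)}
    return len(set(c) - boundary_colours)


path5 = [[1], [0, 2], [1, 3], [2, 4], [3]]   # path 0-1-2-3-4
X = [0, 5, 5, 1, 2]                          # dS = {0, 3} carries colours {0, 1}
print(boundary(path5, [1, 2]))               # {0, 3}
print(free_colour_count(X, path5, [1, 2], (3, 4)))  # 2
```

The current colours of $S$ itself (here 5 and 5) play no role, because the block is about to be recoloured by $c$.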
The following theorem gives a sufficient condition on the number of colours for the $(\mathcal{S}, \psi)$-block dynamics to be rapidly mixing. We restrict our attention to connected graphs and, to avoid trivialities, we do not allow the vertex set $V$ to be a block.
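Theorem 2.3. Let $G = (V, E)$ be a connected graph and let $\mathcal{S}$ be a block system for $G$ such that $V \notin \mathcal{S}$. Let $\psi$ be a distribution on $\mathcal{S}$ and fix $\lambda > 1$. Let the parameters $s$, $\partial^+$, $\Psi$ and $\mu^+$ be as defined above. If
$$q \ge (2s)^{s+1}\,\partial^+\,\Psi\,\lambda^{\mu^+}$$
then the $(\mathcal{S}, \psi)$-block dynamics is rapidly mixing.
We remark that in the bound $q \ge (2s)^{s+1}\partial^+\Psi\lambda^{\mu^+}$ in Theorem 2.3, we expect that the constant multiplicative factor $(2s)^{s+1}\partial^+\Psi$ can be improved. However, we have not attempted to do this, in order to keep our treatment simple.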
We choose $S \in \mathcal{S}$ at random using the distribution $\psi$. We obtain $A'$ (respectively, $B'$) by updating the colouring of $S$ in $A$ (respectively, $B$) according to the distribution $\phi_A = \phi_{A,S}$ (respectively, $\phi_B = \phi_{B,S}$). This gives a coupling since $A$ and $B$ are updated using the transition procedure of $\mathcal{M}_{BD}$. We choose the joint distribution of $(\phi_A, \phi_B)$ so as to maximise the probability that $A'|_S = B'|_S$. Call this maximised probability $p(S, A, B)$. Observe that $p(S, A, B) = 1$ if $u \notin \partial S$, because $A$ and $B$ then assign the same colours to $\partial S$, so $\phi_A$ and $\phi_B$ are the same distribution. For the case that $u \in \partial S$, we uniformly bound $p(S, A, B)$ by setting We claim that for all $X \in \Omega$ and $S \in \mathcal{S}$. If (5) holds then substituting into (4) gives The theorem follows from this, by Lemma 2.1. So it remains to establish (5). Fix $X \in \Omega$ and $S \in \mathcal{S}$. For any colouring $c$, write $Q(c)$ for the set of colours used by $c$. Given a colouring $c \in [q]^S$, the colour classes of $c$ define a partition $P$ of $S$ into (unordered) nonempty parts. (Here, we think of a partition $P$ of $S$ as a set of nonempty parts $\{P_1, \dots, P_t\}$ where the $P_i \subseteq S$ are disjoint and $\bigcup_{A \in P} A = S$.) Let $F \subseteq P$ be the set of colour classes corresponding to colours which are free with respect to $X, S$ (in the given colouring $c$).
Conversely, we can start from a partition $P$ of $S$ and a subset $F$ of $P$. Given a set of $|P|$ colours, we can form a colouring of $S$ by assigning a distinct colour to each part of $P$. The colour assigned to $A \in P$ must belong to $[q] \setminus Q(X|_{\partial S})$ if and only if $A \in F$. Any colouring which can be formed in this way is called a $(P, F)$-colouring of $S$. (Such a colouring is uniquely determined by $(P, F)$ and the map $P \to [q]$ which performs the assignment of colours.) Let $n(S, P, F)$ be the number of $(P, F)$-colourings of $S$. By definition of $\mu^+$ we have The first term corresponds to $P = F$ with $|P| = |S|$, arising from a colouring $c \in [q]^S$ in which every vertex in $S$ receives a distinct free colour. We use $q^{|S|}$ as an upper bound for the number of such colourings. For all other values of $(S, P, F)$ we have the following crude bound: where $q_1 = |Q(X|_{\partial S})|$ and we recall that all parts must be coloured differently. Substituting gives Now applying the bound on $q$ from the theorem statement gives $Z_{X,S}/q^{|S|} \le 1 +$ The number of terms in the above sum is at most $(2|S|)^{|S|}$. Indeed, there are at most $|S|^{|S|}$ choices of the partition $P$ and at most $2^{|P|} \le 2^{|S|}$ choices of $F$. Next, note that for any probability distribution $\rho$ on $V$. In particular, we can take With this choice of $\rho$, we obtain the bound since $\partial S$ is nonempty for all $S \in \mathcal{S}$, as $G$ is connected and $V \notin \mathcal{S}$. It follows that $(2s)^{s+1}\Psi > 1$, and combining this with (6) gives
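The counting of $(P, F)$-colourings can be checked by brute force on a small block. Parts in $F$ must receive distinct colours from $[q] \setminus Q(X|_{\partial S})$ and the remaining parts distinct colours from $Q(X|_{\partial S})$. One therefore expects $n(S, P, F) = (q - q_1)^{\underline{|F|}}\, q_1^{\underline{|P| - |F|}}$, a product of falling factorials. The sketch below groups all colourings of a block of size $s$ by their $(P, F)$ pair and confirms this count. Only the set of boundary colours matters here, and all names are hypothetical. The function returns the number of distinct pairs, which the proof bounds by $(2s)^s$.

```python
from collections import Counter
from itertools import product

def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def pf_pair(c, boundary_colours):
    """(P, F) for a colouring c of positions 0..s-1: colour classes, and the free ones."""
    classes = {}
    for pos, col in enumerate(c):
        classes.setdefault(col, []).append(pos)
    P = frozenset(frozenset(part) for part in classes.values())
    F = frozenset(frozenset(part) for col, part in classes.items()
                  if col not in boundary_colours)
    return P, F

def check_pf_counts(s, q, boundary_colours):
    """Check n(S, P, F) against the falling-factorial formula; return the number of pairs."""
    q1 = len(boundary_colours)
    counts = Counter(pf_pair(c, boundary_colours) for c in product(range(q), repeat=s))
    for (P, F), n_pf in counts.items():
        assert n_pf == falling(q - q1, len(F)) * falling(q1, len(P) - len(F))
    return len(counts)


pairs = check_pf_counts(3, 5, {0, 1})
print(pairs, (2 * 3) ** 3)   # 21 216
```

The gap between 21 and 216 illustrates how crude the $(2|S|)^{|S|}$ bound is. The proof only needs it to be independent of $q$.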

Block dynamics for specific examples
In this subsection we illustrate how one can use Theorem 2.3 to obtain rapid mixing results for block dynamics on graphs of bounded degree. In the next subsection, we shall see how these results for block dynamics can be translated into rapid mixing results for Glauber dynamics. In order to build some intuition, we begin by investigating the range of possible values of the parameter $\mu^+$. We will need the following notation: given $T \subseteq T' \subseteq V$, we write $\mathrm{vol}(T, T')$ for the set of edges of $G$ that are contained in $T'$ and have at least one endvertex in $T$.
Proposition 2.4. Let $G = (V, E)$ be a graph of maximum degree $\Delta$ and let $\mathcal{S}$ be any block system for $G$. Then If in addition $G$ is regular then
Proof. First fix $X \in \Omega$ and $S \in \mathcal{S}$. Given a colouring $c \in [q]^S$, let $P$ be the partition of $S$ defined by the nonempty colour classes of $c$. Define $F \subseteq P$ to be the set of colour classes of $c$ which correspond to a colour which does not appear on $X|_{\partial S}$. Let $A$.
Since $G$ has maximum degree $\Delta$, a trivial upper bound on $\mu_{X,S}(c)$ is $\Delta|S|$. But note that if a monochromatic edge $e$ is incident to a vertex in $A_F$, then $e$ must have both endpoints in the same part $A$ of $F$. Thus edges incident to vertices in $A_F \setminus A'_F$ do not contribute to $\mu_{X,S}(c)$, and monochromatic edges incident to vertices in $A'_F$ are double counted in the trivial bound. Hence Hence the upper bound holds, by definition of $\mu^+$.
Next, suppose that $G$ is $\Delta$-regular, and let $X \in \Omega$ and $S \in \mathcal{S}$. Consider any colouring $c \in [q]^S$ which assigns a single colour to all of $S$, where this is the only colour used in $X|_{\partial S}$. Then where the last inequality follows because $G$ is regular of degree $\Delta$.
Next we show how to improve the upper bound on µ + given in Proposition 2.4 by choosing our block system more carefully.
Let $k \ge 2$ be an integer and let $G = (V, E)$ be a graph with $n$ vertices and maximum degree $\Delta$. Let Then $\mathcal{S}$ is called a $k$-block system for $G$. Let $\psi$ be the uniform distribution over $\mathcal{S}$. To apply Theorem 2.3 to the $(\mathcal{S}, \psi)$-block dynamics we will calculate upper bounds on the parameters $\partial^+$, $\Psi$ and $\mu^+$. Clearly $|\partial S| \le \Delta k$ and $\min\{k, |\partial S|\} \le k$ for all $S \in \mathcal{S}$. Hence To compute $\Psi$, observe first that $\psi(v) \ge 1/n$ for all $v \in V$, as there are $n$ blocks and each vertex belongs to at least one block. Next, observe that $\psi_\partial(v) \le \Delta^k/n$. Indeed, if $v \in \partial S_u$ for some $u \in V$ then $u$ is at distance at most $k$ from $v$. There are at most $\Delta^k$ vertices (excluding $v$) at distance at most $k$ from $v$ in $G$. Hence at most $\Delta^k$ of the $n$ blocks contain $v$ in their boundary. Therefore In order to calculate an upper bound on $\mu^+$ we first prove a preliminary result. For Then $\mathrm{vol}(C_i, V)$ has at least $|C_i|$ edges and is disjoint from $\mathrm{vol}(C_j, V)$ for all $j \ne i$. Thus Next we give an upper bound on the parameter $\mu^+$ for $k$-block systems. For $k \ge 2$ this bound is a slight improvement on the upper bound given in Proposition 2.4. Lemma 2.6. Let $G = (V, E)$ be a connected graph with $n$ vertices and maximum degree $\Delta$. Fix an integer $k \in \{2, \dots, n-1\}$ and let $\mathcal{S}$ be any $k$-block system for $G$. Then Given a colouring $c \in [q]^{S_v}$, let $P$ be the partition of $S_v$ defined by the nonempty colour classes of $c$. Define $F \subseteq P$ to be the set of colour classes of $c$ which correspond to a colour which does not appear on $X|_{\partial S_v}$. Let and define $a_F = |A_F|$ and $a_{\overline{F}} = |A_{\overline{F}}|$. Writing $\mu_{X,v} = \mu_{X,S_v}$ for ease of notation, we have Observe that where the last inequality follows by Proposition 2.5 and noting that $\delta_{A_{\overline{F}}, S_v} = \delta_{F,\emptyset}$. Next we claim that for $A \in P$ we have
$$|\mathrm{vol}(A, A)| \le (|A| - 1)(\Delta - 1). \qquad (11)$$
To ease notation, write $a = |A|$. If $a = 1, 2$ then (11) clearly holds (noting that $\Delta \ge 2$ since $G$ is connected). Next, (11) holds for $\Delta = 2$ since we have $|\mathrm{vol}(A, A)| \le a - 1$. The "$-1$" appears because there is at least one edge leaving $A$, since $G$ is connected.
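If $a = 3$ and $\Delta \ge 3$ then $|\mathrm{vol}(A, A)| \le 3$ and $(a-1)(\Delta-1) \ge 4$, so (11) holds. For $a \ge 4$ and $\Delta \ge 3$, we note that $|\mathrm{vol}(A, A)| \le \Delta a/2$. Moreover $\Delta a/2 \le (a-1)(\Delta-1)$ in this case, since
$$(a-1)(\Delta-1) - \frac{\Delta a}{2} = \Big(\frac{a}{2} - 1\Big)(\Delta - 2) - 1 \ge 1 \cdot 1 - 1 = 0.$$
This proves the claim, establishing (11). Therefore Combining (9), (10) and (12), we have Assuming that $|F| \ne k$, dividing by $k - |F|$ gives the ratio $\Delta - 1$ if $F \ne \emptyset$ and $\Delta - 1 + k^{-1}$ if $F = \emptyset$. This completes the proof.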
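The estimates $\psi(v) \ge 1/n$ and $\psi_\partial(v) \le \Delta^k/n$ used for $k$-block systems can be checked numerically for one natural family of blocks. In the following Python sketch we assume, purely for illustration, that $S_v$ consists of the first $k$ vertices reached by breadth-first search from $v$. This is a connected set of $k$ vertices containing $v$ when $G$ is connected. The construction and all names are hypothetical. The sketch computes $\psi(v) = \Pr(v \in S)$ and $\psi_\partial(v) = \Pr(v \in \partial S)$ for the uniform distribution on the $n$ blocks.

```python
from collections import deque

def bfs_block(adj, v, k):
    """Hypothetical S_v: the first k vertices reached by breadth-first search from v."""
    block, seen, queue = [v], {v}, deque([v])
    while queue and len(block) < k:
        w = queue.popleft()
        for x in adj[w]:
            if x not in seen and len(block) < k:
                seen.add(x)
                block.append(x)
                queue.append(x)
    return block

def boundary(adj, S):
    S_set = set(S)
    return {x for w in S for x in adj[w] if x not in S_set}

def psi_values(adj, k):
    """psi(v) and psi_boundary(v) when S is uniform over the n blocks S_v."""
    n = len(adj)
    blocks = [bfs_block(adj, v, k) for v in range(n)]
    bounds = [boundary(adj, S) for S in blocks]
    psi = [sum(v in S for S in blocks) / n for v in range(n)]
    psi_boundary = [sum(v in B for B in bounds) / n for v in range(n)]
    return blocks, psi, psi_boundary


n, k, Delta = 10, 3, 2
cycle = [[(v + 1) % n, (v - 1) % n] for v in range(n)]
blocks, psi, psi_b = psi_values(cycle, k)
print(min(psi) >= 1 / n, max(psi_b) <= Delta ** k / n)   # True True
```

On the 10-cycle every block is a path of three vertices with two boundary vertices. Thus $\psi(v) = 3/10$ and $\psi_\partial(v) = 2/10$, comfortably inside both estimates.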
Substituting (7), (8) and the result of Lemma 2.6 into Theorem 2.3 gives the following, noting that $\psi_{\min} \ge 1/n$. Theorem 2.7. Let $G = (V, E)$ be a connected graph with $n$ vertices and maximum degree $\Delta$. Fix an integer $k \in \{2, \dots, n-1\}$ and let $\mathcal{S}$ be a $k$-block system for $G$. Let $\psi$ be the uniform distribution on $\mathcal{S}$. Fix $\lambda > 1$. If
To further illustrate the use of Theorem 2.3 we apply it to the grid. Our results are not as sharp as those discussed in [27]. Still, using the structure of the grid we can prove an upper bound on $\mu^+$ which is close to the lower bound given in Proposition 2.4. Note that the toroidal $n$-grid has $n^2$ vertices. The arguments below can be adapted to higher dimensions and to graphs with different grid topologies, provided that the graph is locally a grid.
Let $\mathcal{S}$ be the set of all $r \times r$ subgrids of $G$, where $r \le n - 2$. Then $\mathcal{S}$ is an $r^2$-block system: given $S \in \mathcal{S}$, write $S = S_v$, where $v$ is the vertex at the "top left" corner of $S$. Let $\psi$ be the uniform distribution on $\mathcal{S}$. To apply Theorem 2.3 we must calculate upper bounds on the parameters.
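For the toroidal grid the block statistics can be computed exactly by symmetry. With $r \le n - 2$, every $r \times r$ subgrid has a boundary of $4r$ vertices. Under the uniform distribution on the $n^2$ subgrids, each vertex lies in the chosen block with probability $r^2/n^2$ and in its boundary with probability $4r/n^2$. The Python sketch below confirms these counts on a small torus. Vertex $(i, j)$ is numbered $in + j$, and the names are hypothetical.

```python
def torus_adj(n):
    """Adjacency lists of the toroidal n x n grid (n >= 3); vertex (i, j) is i * n + j."""
    return [[((i + di) % n) * n + (j + dj) % n
             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            for i in range(n) for j in range(n)]

def subgrid_blocks(n, r):
    """All r x r subgrids S_v, one for each top-left corner v = (i, j)."""
    return [[((i + a) % n) * n + (j + b) % n for a in range(r) for b in range(r)]
            for i in range(n) for j in range(n)]

def boundary(adj, S):
    S_set = set(S)
    return {x for w in S for x in adj[w] if x not in S_set}


n, r = 6, 3
adj = torus_adj(n)
blocks = subgrid_blocks(n, r)
print({len(boundary(adj, S)) for S in blocks})          # {12}, i.e. 4r
print(sum(0 in S for S in blocks), sum(0 in boundary(adj, S) for S in blocks))  # 9 12
```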
In order to obtain a tighter bound on µ + we need more information about expansion properties of the grid.
To prove the claim, let us choose $T$ such that $|T|$ is maximised subject to $|E(T, \overline{T})| \le 4t$. We may assume that $G[T]$ is connected, or else we can translate components to connect $G[T]$ without increasing $|E(T, \overline{T})|$. Furthermore, we may assume that $T$ is convex, that is, a rectangular subgrid. A "missing corner" of $T$ is a vertex outside $T$ with at least two neighbours in $T$. If $T$ has one, we can add it without increasing $|E(T, \overline{T})|$. It is also easy to verify that amongst the rectangles with $|E(T, \overline{T})| = 4t$, the square (with $t^2$ vertices) has the largest area. This completes the proof of the claim. Now suppose that $|T| = t'$. Using the contrapositive of (15), we have
Lemma 2.9. Let $G$ be the toroidal grid with $n^2$ vertices, and let $\mathcal{S}$ be the $r^2$-block system consisting of all $r \times r$ subgrids of $G$. Then
Proof. Suppose that $X \in \Omega$ and $v \in V$. For a given $c \in [q]^{S_v}$, let $P$ be the corresponding partition of $S_v$ given by the colour classes of $c$. As usual, let $F \subseteq P$ be the set of colour classes corresponding to colours which do not appear on $X|_{\partial S_v}$. Recall the notation $A_F$, $A_{\overline{F}}$, $a_F$ and $a_{\overline{F}}$ introduced in Lemma 2.6. As in (9) we write $\mu_{X,v}$ for $\mu_{X,S_v}$, and find that Combining the three inequalities above, we have For all $F$ with $|F| \ne r^2$, dividing by $r^2 - |F|$ gives the value $2 + \frac{2}{r}$, completing the proof.
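The last step of the claim states that among rectangles with edge boundary $4t$, the $t \times t$ square has the largest area. This can be checked directly. An $a \times b$ rectangle that does not wrap around the torus has $|E(T, \overline{T})| = 2a + 2b$, so the condition forces $a + b = 2t$ and hence $ab \le t^2$. The sketch below computes the edge boundary on the torus itself rather than assuming the formula, and the names are hypothetical.

```python
def torus_adj(n):
    """Adjacency lists of the toroidal n x n grid (n >= 3); vertex (i, j) is i * n + j."""
    return [[((i + di) % n) * n + (j + dj) % n
             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            for i in range(n) for j in range(n)]

def edge_boundary(adj, T):
    """|E(T, complement of T)|: edges with exactly one endpoint in T."""
    T_set = set(T)
    return sum(1 for w in T for x in adj[w] if x not in T_set)

def largest_rectangle_area(n, t):
    """Largest area of an a x b rectangle (a, b <= n - 2) with edge boundary exactly 4t."""
    adj = torus_adj(n)
    best = 0
    for a in range(1, n - 1):
        for b in range(1, n - 1):
            T = [i * n + j for i in range(a) for j in range(b)]
            if edge_boundary(adj, T) == 4 * t:
                best = max(best, a * b)
    return best


print(largest_rectangle_area(8, 3))   # 9, attained by the 3 x 3 square
```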

Glauber dynamics via Markov chain comparison
We now describe the machinery needed to compare the mixing times of the Glauber dynamics and the block dynamics considered in the previous two subsections. We closely follow [7]. Suppose that M is a reversible, ergodic Markov chain on state space Ω with transition matrix P and stationary distribution π. Let M ′ be another reversible, ergodic Markov chain on Ω with transition matrix P ′ and the same stationary distribution.
We say a transition $(x, y)$ of $\mathcal{M}$ (respectively, $\mathcal{M}'$) is positive if $P(x, y) > 0$ (respectively, $P'(x, y) > 0$); here we allow the possibility that $x = y$. For every positive transition $(x, y)$ of $\mathcal{M}'$, let $\mathcal{P}_{x,y}$ be the set of paths $\gamma = (x = x_0, \dots, x_k = y)$ such that all the $x_i$ are distinct and each $(x_i, x_{i+1})$ is a positive transition of $\mathcal{M}$. Let $\mathcal{P} = \bigcup \mathcal{P}_{x,y}$, where the union is taken over all positive transitions $(x, y)$ of $\mathcal{M}'$ with $x \ne y$.
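An (M, M′)-flow is a function f from $\mathcal{P}$ to the interval [0, 1] such that for every positive transition (x, y) of M′ with x ≠ y, we have
$$\sum_{\gamma \in \mathcal{P}_{x,y}} f(\gamma) = \pi(x)\,P'(x,y).$$
For a positive transition (z, w) of M, the congestion of (z, w) is defined to be
$$A_{z,w}(f) = \frac{1}{\pi(z)P(z,w)} \sum_{\gamma \in \mathcal{P}\,:\,(z,w) \in \gamma} |\gamma|\, f(\gamma),$$
where |γ| is the number of transitions in γ. The congestion of the flow is defined to be $A(f) = \max A_{z,w}(f)$, where the maximum is taken over all positive transitions (z, w) of M with z ≠ w. We now state a result that relates the mixing times of M and M′ in terms of A(f).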
We will use the following theorem, which is slightly less general than [7, Theorem 10].
Theorem 2.11. [7, Theorem 10] Suppose that M is a reversible ergodic Markov chain with transition matrix P and stationary distribution π and that M′ is another reversible ergodic Markov chain with the same stationary distribution. Suppose that f is an (M, M′)-flow. If M has no negative eigenvalues then for any 0 < δ < 1/2, we have

Now we apply the above theorem to compare the mixing time of the Glauber dynamics and the block dynamics. Write $\tau(M') = \tau(M', \frac{1}{2e})$.

Lemma 2.12. Let G = (V, E) be an n-vertex graph of maximum degree ∆. Given λ > 1, a positive integer q, a block system $\mathcal{S}$ for G with maximum block size s, and a probability distribution ψ on $\mathcal{S}$, write $M = M_{GD}(G, \lambda, q)$ and $M' = M^{\mathcal{S},\psi}_{BD}(G, \lambda, q)$. Then for all ε > 0 we have
$$\tau(M, \varepsilon) \le 2s\, q^{s+1} \lambda^{\Delta(s+1)}\, \tau(M')\, n \left( n \log\left(q\lambda^{\Delta/2}\right) + \log(\varepsilon^{-1}) \right).$$
Proof. As before, let P and P ′ be the transition matrices of M and M ′ respectively. We note at the outset that both M and M ′ have the Gibbs distribution π as their stationary distribution. It is proved in [14, Section 1.1] that the Glauber dynamics M has no negative eigenvalues, so we may apply Theorem 2.11.
We construct an (M, M′)-flow and analyse its congestion. Recall that a transition in M′ is obtained by starting at some $X \in \Omega = [q]^V$, selecting $S \in \mathcal{S}$ at random using the distribution ψ and then updating the colouring of S to some colouring $c \in [q]^S$ chosen randomly using the distribution $\phi = \phi_{X,S}$. The resulting colouring is denoted by $X^{(S,c)}$. Let $h(X, S, c) := \psi(S)\,\phi_{X,S}(c)$ be the probability that this pair (S, c) is chosen.
Fix an ordering of the vertices of G. For each X ∈ Ω, $S \in \mathcal{S}$, and colouring $c \in [q]^S$ of S, we define the path γ(X, S, c) from X to $X^{(S,c)}$ as follows: starting from X, consider each vertex v ∈ {u ∈ S : X(u) ≠ c(u)}, one at a time and in increasing vertex order, and change the colour of v from X(v) to c(v). Thus γ(X, S, c) is a path in Ω from X to $X^{(S,c)}$ using positive transitions of M.
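To make the construction concrete, here is a minimal Python sketch of the path γ(X, S, c), assuming colourings are stored as dictionaries from vertices to colours. The function name `canonical_path` and this representation are our own illustrative choices, not the authors'.

```python
def canonical_path(X, S, c, order):
    """Return the path gamma(X, S, c) as a list of colourings (dicts).

    X     -- current colouring, dict vertex -> colour
    S     -- the block, an iterable of vertices
    c     -- new colouring of the block, dict vertex -> colour
    order -- the fixed ordering of the vertices of G
    """
    rank = {v: i for i, v in enumerate(order)}
    # vertices of S whose colour actually changes, in increasing vertex order
    to_change = sorted((u for u in S if X[u] != c[u]), key=lambda u: rank[u])
    current = dict(X)
    path = [dict(current)]
    for v in to_change:
        current[v] = c[v]            # one single-vertex (Glauber) move
        path.append(dict(current))
    return path


# Example: vertices 0..3, block S = {1, 2, 3}; vertex 2 already has its target colour.
X = {0: 1, 1: 1, 2: 2, 3: 3}
c = {1: 2, 2: 2, 3: 1}
for colouring in canonical_path(X, [1, 2, 3], c, order=[0, 1, 2, 3]):
    print(colouring)
```

Each vertex of S is recoloured at most once, so the path has at most s transitions, its colourings are distinct, and consecutive colourings differ at exactly one vertex.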
Next we bound the congestion of this flow. Let (Z, W) be a positive transition of M with Z ≠ W. Then the colourings Z and W differ on only one vertex, say v. The path γ(X, S, c) uses the transition (Z, W) only if v ∈ S and the colourings X and Z differ on a subset of S. Thus we have

If X and Z differ on at most s vertices, then only the at most ∆s edges incident with these vertices can change between monochromatic and non-monochromatic, so $\pi(X)/\pi(Z) \le \lambda^{\Delta s}$.
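Also, for any positive transition (Z, W) of M we have

Substituting these upper bounds gives

Now apply Theorem 2.11 with δ = 1/(2e). For all Z ∈ Ω, we have the crude bound
$$\pi(Z) \ge \frac{1}{q^n \lambda^{\Delta n/2}},$$
because $\lambda^{\mu(Z)} \ge 1$, while each of the $q^n$ terms of the partition function is at most $\lambda^{|E|} \le \lambda^{\Delta n/2}$. This leads to

as claimed.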
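The crude bound can be sanity-checked by brute force on a small graph. The sketch below uses exact rational arithmetic from `fractions.Fraction`; the helper names `gibbs_distribution` and `crude_bound` are hypothetical. It computes the Gibbs distribution of a 4-cycle and compares its smallest probability with $q^{-n}\lambda^{-\Delta n/2}$.

```python
from fractions import Fraction
from itertools import product

def gibbs_distribution(n, edges, q, lam):
    """Exact Gibbs distribution pi(sigma) = lam**mu(sigma) / Z on [q]^n."""
    lam = Fraction(lam)
    weights = {}
    for sigma in product(range(q), repeat=n):
        mu = sum(sigma[i] == sigma[j] for i, j in edges)   # monochromatic edges
        weights[sigma] = lam ** mu
    Z = sum(weights.values())
    return {sigma: w / Z for sigma, w in weights.items()}

def crude_bound(n, q, lam, max_degree):
    """q^(-n) * lam^(-max_degree * n / 2); assumes max_degree * n is even."""
    assert (max_degree * n) % 2 == 0
    return 1 / (Fraction(q) ** n * Fraction(lam) ** (max_degree * n // 2))

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]          # 4-cycle: n = 4, Delta = 2
pi = gibbs_distribution(4, cycle, q=3, lam=2)
print(min(pi.values()), crude_bound(4, 3, 2, 2))  # 1/258 1/1296
```

The gap between 1/258 and 1/1296 shows how loose the bound is; it only enters the mixing-time estimate through $\log(1/\pi(Z)) \le n\log(q\lambda^{\Delta/2})$.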
We would expect that the mixing time for Glauber dynamics should decrease as q increases, but the bound given in Lemma 2.12 becomes worse for larger values of q. However, by combining Lemma 2.12 with Proposition 2.2, we can avoid this problem.
Corollary 2.13. Let G = (V, E) be an n-vertex graph of maximum degree ∆. Given λ > 1, a positive integer q, a block system $\mathcal{S}$ for G with maximum block size s, and a probability distribution ψ on $\mathcal{S}$, write $M = M_{GD}(G, \lambda, q)$ and $M' = M^{\mathcal{S},\psi}_{BD}(G, \lambda, q)$. Then for ε > 0 we have

if $q \ge \Delta\lambda^{\Delta} + 1$.
We complete this section by applying the previous corollary to the block dynamics results obtained in the previous subsection to obtain rapid mixing results for Glauber dynamics.
Proof. Apply Corollary 2.13 to the mixing time of the block dynamics in Theorem 2.7.
Proof. We apply Corollary 2.13 to the mixing time of the block dynamics in Theorem 2.10.

An extremal problem
In this section, we investigate how large the partition function of a bounded-degree graph can be. We require this result in the next section, where we give bounds on the number of colours below which Glauber dynamics mixes slowly, although the result may be of independent interest. In this section, we allow graphs to have multiple edges, but not loops. Given the number of vertices n, the number of edges m, the maximum degree ∆, the fugacity λ ≥ 1 and the number of colours q, we define
$$Z((n, m, \Delta), \lambda, q) = \max_G Z(G, \lambda, q),$$
where the maximum is over all graphs G with n vertices, m edges, and maximum degree ∆.
We now describe the class of graphs that will turn out to be extremal for the above parameter. Fix positive integers n, m, and ∆ such that ∆ divides m and m ≤ ∆n/2. Let H(n, m, ∆) = (V, E), where V is a set of n vertices and E is obtained by taking any set of m/∆ independent edges on V and replacing each edge with ∆ multi-edges. Thus H(n, m, ∆) has m edges and maximum degree ∆.
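For small parameters the partition function of H(n, m, ∆) can be computed by brute force. With k = m/∆, each bundle of ∆ parallel edges is monochromatic in q ways (weight $\lambda^\Delta$) and not in q(q − 1) ways, independently of the other bundles, and each of the n − 2k unmatched vertices contributes a factor q. This gives $Z(H(n,m,\Delta),\lambda,q) = q^{\,n-k}(q-1+\lambda^\Delta)^k$. This derivation is ours, and the check below is only an illustration. The helper names are hypothetical. The matching used is (0, 1), (2, 3) and so on, which the definition allows since any set of independent edges will do.

```python
from itertools import product

def partition_function(n, edges, q, lam):
    """Z(G, lam, q) by brute force; `edges` may repeat a pair (multi-edges), no loops."""
    return sum(lam ** sum(s[u] == s[v] for u, v in edges)
               for s in product(range(q), repeat=n))

def H(n, m, delta):
    """Edge list of H(n, m, delta): m/delta disjoint pairs, each with delta parallel edges."""
    assert m % delta == 0 and 2 * (m // delta) <= n
    k = m // delta
    return [(2 * i, 2 * i + 1) for i in range(k) for _ in range(delta)]

def H_closed_form(n, m, delta, q, lam):
    k = m // delta
    return q ** (n - k) * (q - 1 + lam ** delta) ** k

n, m, delta, q, lam = 5, 4, 2, 3, 2
print(partition_function(n, H(n, m, delta), q, lam))   # 972
print(H_closed_form(n, m, delta, q, lam))              # 972
# A path on 5 vertices also has 4 edges and maximum degree 2, but a smaller Z:
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(partition_function(n, path, q, lam))             # 768
```

That the path gives 768 ≤ 972 is consistent with H(n, m, ∆) attaining the bound of Theorem 3.1 when ∆ divides m.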
The main result of this section is the following.
Theorem 3.1. If G is an n-vertex graph with m edges and maximum degree ∆, and q ∈ N and λ ≥ 1 are given, then In particular, if ∆ divides m, we have equality above for G = H(n, m, ∆).
This will immediately give us the following corollary.
We begin by giving a brief outline of the proof. Given an n-vertex multigraph G = (V, E) and a uniformly random (not necessarily proper) q-colouring σ of V, let X be the number of monochromatic edges of G in σ. Observe that $Z(G, \lambda, q) = q^n\,\mathbb{E}(\lambda^X)$. We proceed by decomposing the edges of G into ∆ forests with ⌈m/∆⌉ or ⌊m/∆⌋ edges each. Then we establish that the number of monochromatic edges in a forest with m′ edges has distribution $\mathrm{Bin}(m', q^{-1})$. This allows us to obtain a bound on $\mathbb{E}(\lambda^X)$ and hence prove Theorem 3.1.

Lemma 3.3. Let G = (V, E) be a multigraph with n vertices, m edges, and maximum degree ∆. We can find ∆ forests $F_1, \dots, F_\Delta$ on the vertex set V such that each $F_i$ has ⌈m/∆⌉ or ⌊m/∆⌋ edges and the edges of $F_1, \dots, F_\Delta$ form a partition of E.
Proof. We begin by disregarding the condition that the forests should have almost equal size. In this case, we can decompose E into ∆ forests by iteratively removing a maximal acyclic set of edges $F_i$ from $E_i = E \setminus (F_1 \cup \cdots \cup F_{i-1})$. Since G has no loops, maximality means that every non-isolated vertex of $G_i = (V, E_i)$ is incident with at least one edge of $F_i$. Hence removing $F_i$ from $G_i$ reduces the degree of every non-isolated vertex by at least one, and in particular reduces the maximum degree of $G_i$ by at least one. Thus, after at most ∆ steps there are no edges remaining in the graph, giving a decomposition of E into (the edge sets of) at most ∆ forests; adding empty forests if necessary, we obtain ∆ forests $F_1, \dots, F_\Delta$.
We denote by $|F_i|$ the size of the forest $F_i$, that is, the number of edges in $F_i$. In order to make these forests of almost equal size, observe that if $|F_i| > |F_j| + 1$, then $F_i$ has fewer components than $F_j$ (a forest on n vertices with k edges has n − k components), so $F_i$ has at least one edge that connects two components of $F_j$. Removing this edge from $F_i$ and adding it to $F_j$ keeps both $F_i$ and $F_j$ acyclic, but reduces the imbalance in their sizes. Iteratively applying this operation to any pair of forests whose sizes differ by more than one eventually results in all forests having size ⌈m/∆⌉ or ⌊m/∆⌋ (each move decreases $\sum_i |F_i|^2$ by at least 2, so the process terminates).

Lemma 3.4. Let F be a forest on vertex set V with m edges, and let σ be a uniformly random q-colouring of V. Then the number X of monochromatic edges of F in σ satisfies $X \sim \mathrm{Bin}(m, q^{-1})$.

Proof. It is sufficient to consider the case when F is a tree. Otherwise, we can consider the components of F independently, and use the fact that the sum of t independent binomial random variables $\mathrm{Bin}(m_j, p)$, j = 1, …, t, is a binomial random variable $\mathrm{Bin}(m_1 + \cdots + m_t, p)$. Now assume that F is a tree, and root F at a vertex $v_0$. Let $v_0, \dots, v_{n-1}$ be any ordering of the vertices in V such that for every i ≥ 1, the parent of $v_i$ is a member of $\{v_0, \dots, v_{i-1}\}$. We generate a uniformly random q-colouring of V by colouring each vertex with a uniformly random colour from [q], independently, in the specified order. Each vertex other than $v_0$ has probability 1/q of being given the same colour as its parent, independently of all previous choices, and hence each edge has probability 1/q of being monochromatic, independently of all other edges. Therefore the total number of monochromatic edges satisfies $X \sim \mathrm{Bin}(m, q^{-1})$.
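The two phases of the proof of Lemma 3.3 translate directly into an algorithm. The Python sketch below is our own illustration, and all names in it are hypothetical. It peels off maximal acyclic edge sets with a union-find structure and pads with empty forests up to ∆. It then repeatedly moves an edge from a largest forest to a smallest one, choosing an edge that joins two components of the smaller forest. Edges are referred to by index, so parallel edges stay distinct.

```python
class DisjointSets:
    """Union-find over vertices 0..n-1."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def forest_decomposition(n, edges, delta):
    """Split the edge indices of a loopless multigraph of maximum degree delta
    into delta forests whose sizes differ by at most one."""
    # Phase 1: repeatedly remove a maximal acyclic edge set (Kruskal-style scan).
    remaining, forests = list(range(len(edges))), []
    while remaining:
        dsu, forest, rest = DisjointSets(n), [], []
        for e in remaining:
            (forest if dsu.union(*edges[e]) else rest).append(e)
        forests.append(forest)
        remaining = rest
    assert len(forests) <= delta          # each round lowers the maximum degree
    forests += [[] for _ in range(delta - len(forests))]

    # Phase 2: balance the sizes by moving edges between forests.
    while True:
        big = max(range(delta), key=lambda i: len(forests[i]))
        small = min(range(delta), key=lambda i: len(forests[i]))
        if len(forests[big]) <= len(forests[small]) + 1:
            return forests
        dsu = DisjointSets(n)
        for e in forests[small]:
            dsu.union(*edges[e])
        # forests[big] has fewer components, so one of its edges joins two
        # components of forests[small]; moving it keeps both acyclic.
        e = next(e for e in forests[big]
                 if dsu.find(edges[e][0]) != dsu.find(edges[e][1]))
        forests[big].remove(e)
        forests[small].append(e)


edges = [(0, 1), (0, 1), (0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (2, 4)]
forests = forest_decomposition(5, edges, delta=4)
print([[edges[e] for e in f] for f in forests])
```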
We will also need the following result, which follows from a generalisation of Hölder's inequality.
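Lemma 3.5. Let $(X_1, \dots, X_d)$ be a random, $\mathbb{R}^d$-valued vector, and suppose there exists a random variable X such that $X_i \sim X$ for all i = 1, …, d. Then for all λ > 0 we have
$$\mathbb{E}\left(\lambda^{X_1 + \cdots + X_d}\right) \le \mathbb{E}\left(\lambda^{dX}\right).$$

Proof. Let $Z_j = \lambda^{X_j}$ and $p_j = d$ for j = 1, …, d. Then the result follows from the generalised Hölder inequality, which states that
$$\mathbb{E}\,|Z_1 \cdots Z_d| \le \prod_{j=1}^{d} \left(\mathbb{E}\,|Z_j|^{p_j}\right)^{1/p_j}$$
for any random variables $Z_1, \dots, Z_d$ and any $p_j \ge 1$ such that $\sum_{j=1}^{d} 1/p_j = 1$ (see for example [9]); since $X_j \sim X$, each factor on the right equals $\left(\mathbb{E}\,\lambda^{dX}\right)^{1/d}$.

We are now ready to prove Theorem 3.1.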
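Lemma 3.5 needs only identical marginals, not independence, which is exactly the situation of the ∆ forests. A small exact check follows, using our own construction rather than anything from the paper. Take X uniform on {0, 1, 2} and $X_i = (X + k_i) \bmod 3$. Then every $X_i$ has the same law as X, while the $X_i$ are strongly dependent.

```python
from fractions import Fraction
from itertools import product

def lhs(shifts, lam, k=3):
    """E[lam^(X_1+...+X_d)], X uniform on {0,...,k-1}, X_i = (X + shifts[i]) mod k."""
    return sum(Fraction(1, k) * lam ** sum((x + s) % k for s in shifts)
               for x in range(k))

def rhs(d, lam, k=3):
    """E[lam^(d X)] for X uniform on {0,...,k-1}."""
    return sum(Fraction(1, k) * lam ** (d * x) for x in range(k))

lam = Fraction(2)
for shifts in [(0, 0, 0), (0, 1, 2), (0, 1, 1)]:
    print(shifts, lhs(shifts, lam), rhs(3, lam))
# (0, 0, 0) 73/3 73/3
# (0, 1, 2) 8 73/3
# (0, 1, 1) 40/3 73/3
```

Equality holds when all shifts coincide ($X_1 = \cdots = X_d$), which is the case in which Hölder's inequality is tight.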
Proof of Theorem 3.1. By Lemma 3.3, we can decompose the edges of G into ∆ forests $F_1, \dots, F_\Delta$ such that $m_i$, the number of edges in $F_i$, is either ⌈m/∆⌉ or ⌊m/∆⌋.

Let σ be a uniformly random q-colouring of V, and let $X_i$ be the number of monochromatic edges of $F_i$ in the colouring σ. We know by Lemma 3.4 that $X_i \sim \mathrm{Bin}(m_i, q^{-1})$. Then µ(σ), the number of monochromatic edges of G in σ, is given by $\mu(\sigma) = X_1 + \cdots + X_\Delta$, and
$$Z(G, \lambda, q) = q^n\,\mathbb{E}\left(\lambda^{\mu(\sigma)}\right) = q^n\,\mathbb{E}\left(\lambda^{X_1 + \cdots + X_\Delta}\right).$$
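The two facts used so far in this proof can be verified exactly on a toy multigraph by enumerating all $q^n$ colourings. The first is that each $X_i$ is binomial (Lemma 3.4). The second is that $Z(G,\lambda,q) = q^n\,\mathbb{E}(\lambda^{X_1+\cdots+X_\Delta})$. The graph and its forest decomposition below are hypothetical examples chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

n, q, lam = 4, 3, Fraction(2)
# Three forests whose union is a multigraph with m = 5 edges and Delta = 3;
# their sizes 2, 2, 1 are ceil(5/3) or floor(5/3), as in Lemma 3.3.
forests = [[(0, 1), (1, 2)], [(0, 1), (2, 3)], [(0, 2)]]
edges = [e for f in forests for e in f]
colourings = list(product(range(q), repeat=n))      # uniform sample space [q]^V

def mono(sigma, edge_list):
    """Number of monochromatic edges of edge_list under sigma."""
    return sum(sigma[u] == sigma[v] for u, v in edge_list)

def law(edge_list):
    """Exact distribution of the number of monochromatic edges."""
    counts = Counter(mono(s, edge_list) for s in colourings)
    return {k: Fraction(c, len(colourings)) for k, c in counts.items()}

def binomial(m, p):
    return {k: comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(m + 1)}

Z = sum(lam ** mono(s, edges) for s in colourings)
E_lam_mu = sum(lam ** sum(mono(s, f) for f in forests)
               for s in colourings) / len(colourings)
for f in forests:
    print(f, law(f) == binomial(len(f), Fraction(1, q)))
print(Z == q ** n * E_lam_mu)
```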

Slow mixing
We have seen in Section 2.2 that for general graphs with maximum degree ∆, the Glauber dynamics mixes rapidly if $q \ge \Delta\lambda^{\Delta} + 1$. Some improvements on this were given in Section 2.5. In this section, we shall see that these general bounds cannot be improved by much (in terms of the exponent of λ). We give a bound on the number of colours below which, with probability tending to 1, Glauber dynamics mixes slowly on a uniformly random ∆-regular graph.
The technical tool used for most slow-mixing proofs is conductance. We now introduce the necessary definitions, following [7]. Again, M is a Markov chain with state space Ω, transition matrix P and stationary distribution π. For A, B ⊆ Ω, define

Suppose now that G = (V, E) is an n-vertex graph, λ ≥ 1 is given, and q is a number of colours. By Theorem 4.1, in order to show that $M = M_{GD}(G, \lambda, q)$ mixes slowly, it is sufficient to show that its conductance $\Phi_M$ is exponentially small in n.
We will need some more definitions. For i ∈ [q] and σ ∈ Ω, define

Next, define the r-shell and r-ball around a colour i as follows:

We see that $B_r(i)$ is the set of colourings at distance at most r from the all-i colouring, and $S_r(i)$ is the set of colourings at distance exactly r from the all-i colouring. To simplify notation, we write $B_r = B_r(1)$ and $S_r = S_r(1)$ for the r-ball and r-shell around colour 1.
For an n-vertex graph G = (V, E) and a positive integer r ≤ n/2, we define
$$\alpha_r(G) = \max_{S \subseteq V,\ |S| = r} \frac{e_G(S)}{r},$$
where $e_G(S)$ is the number of edges of G inside S. This quantity is low when the edge expansion of r-vertex subgraphs of G is high. We now establish a uniform bound on the conductance of $M_{GD}(G, \lambda, q)$ which holds when $\alpha_r(G)$ and q are sufficiently small.

Lemma 4.2. Let λ ≥ 1 and let ∆ ≥ 2 be an integer. Fix κ between 1 and ∆/2, and let β ∈ (0, 1). Suppose that $n \ge \beta^{-1}(2 + \Delta\log_2\lambda)$ is an integer and let r = ⌊βn⌋. Let G be a ∆-regular, n-vertex graph such that $\alpha_r(G) \le \kappa$. Finally, suppose that q ≥ 2 is an integer which satisfies

Then the conductance of the Markov chain $M = M_{GD}(G, \lambda, q)$ is bounded by
$$\Phi_M \le 2\sqrt{2\pi r}\;2^{-r}.$$

Proof. We bound $\Phi_M$ by estimating $\Phi_M(B_r)$. Let P be the transition matrix for M and let π be the stationary distribution of M (that is, the Gibbs distribution). We have

where the last inequality follows because $\pi(B_r) \le \frac{1}{2}$ (assuming that q ≥ 2). Let Z = Z(G, λ, q) be the partition function and write m = ∆n/2 for the number of edges in G. Now $\pi(B_r) \ge Z^{-1}\lambda^m$, since the all-1 colouring belongs to $B_r$. Next we obtain an upper bound on $\pi(S_r)$.
Suppose that A ⊆ V with |A| = r. Writing E(A) for the set of edges of G inside A, we know that $|E(A)| \le \alpha_r(G)\,r \le \kappa r$, and hence

The final inequality uses the fact that when λ ≥ 1, the partition function is nondecreasing under the addition of edges. Combining these bounds shows that
$$\Phi_M \le 2\binom{n}{r}\lambda^{-(\Delta-\kappa)r}\, Z\big((r, \lceil\kappa r\rceil, \Delta), \lambda, q-1\big).$$
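The step behind "and hence" can be filled in as follows. This is our own reconstruction from the surrounding definitions, reading the distance from the all-1 colouring as the number of vertices not coloured 1. Under that reading, each σ ∈ S_r corresponds to a set A = {v : σ(v) ≠ 1} of size r together with a colouring of A from the q − 1 colours other than 1. Since G is ∆-regular,

$$
\begin{aligned}
\Delta r &= 2|E(A)| + |E(A, V\setminus A)|,\\
\mu(\sigma) &= \big(m - |E(A)| - |E(A, V\setminus A)|\big) + \mu(\sigma|_A) = m - \Delta r + |E(A)| + \mu(\sigma|_A),\\
\sum_{\sigma \in S_r} \lambda^{\mu(\sigma)} &= \sum_{A \subseteq V,\ |A| = r} \lambda^{m - \Delta r + |E(A)|}\, Z(G[A], \lambda, q-1)
\le \binom{n}{r}\,\lambda^{m - (\Delta-\kappa) r}\, Z\big((r, \lceil\kappa r\rceil, \Delta), \lambda, q-1\big),
\end{aligned}
$$

where $\mu(\sigma|_A)$ is the number of monochromatic edges of G[A] under σ. The last step uses $|E(A)| \le \kappa r$, λ ≥ 1 and the monotonicity of the partition function under adding edges. Dividing by Z and using $\pi(B_r) \ge Z^{-1}\lambda^m$ gives $\pi(S_r)/\pi(B_r) \le \binom{n}{r}\lambda^{-(\Delta-\kappa)r} Z((r, \lceil\kappa r\rceil, \Delta), \lambda, q-1)$, which is the final bound on $\Phi_M$ in the paragraph above without its factor 2.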
Using Corollary 3.2, we have

Here the second inequality uses the fact that $q - 1 \le \lambda^{\Delta}$ (which follows from (16)), and the final inequality follows since κ/∆ ≤ 1/2 as well as the fact that $2^r \ge 2\lambda^{\Delta}$ (by our choice of sufficiently large n, since $r = \lfloor\beta n\rfloor \ge \beta n - 1 \ge 1 + \Delta\log_2\lambda$). Substituting this into (18) and applying the well-known inequality

Now raising both sides of (16) to the power (∆ − κ)/∆ and rearranging shows that

Therefore $\Phi_M \le 2\sqrt{2\pi r}\;2^{-r}$, as claimed.

Let $\mathcal{G}_{n,\Delta}$ denote the uniform probability space of all ∆-regular graphs on the vertex set [n] = {1, 2, …, n}, restricting to even n if ∆ is odd. That is, "$G \in \mathcal{G}_{n,\Delta}$" means that G is a uniformly chosen ∆-regular graph on the vertex set [n]. In a sequence of probability spaces indexed by n, an event holds asymptotically almost surely (a.a.s.) if the probability that the event holds tends to 1 as n → ∞.
Next, given κ we show how to choose r in order to ensure that with high probability, a random ∆-regular graph G satisfies α r (G) ≤ κ.
Proof. We use the configuration model of Bollobás [2] to construct random regular graphs. In this model, to construct a random ∆-regular graph on n vertices, we take n sets (called buckets), each containing ∆ labelled objects called points. Then we take a random partition P of the ∆n points into ∆n/2 pairs, where each pair is a set of two distinct points. We call P a pairing. By replacing each bucket by a vertex and replacing each pair by an edge between the two corresponding vertices, we obtain a multigraph G(P), which may have loops and multiple edges. If G(P) is simple then it is ∆-regular. It has been shown [2] that a random pairing is simple with probability tending to $\exp\left(-\frac{\Delta^2 - 1}{4}\right)$ as n → ∞.

Let m(2a) denote the number of pairings of 2a points. It is well known that
$$m(2a) = \frac{(2a)!}{a!\,2^a}.$$
Write $[x]_a = x(x-1)\cdots(x-a+1)$ to denote the falling factorial. Now let $\mathcal{P}_{n,\Delta}$ denote the uniform probability space on the set of pairings with n buckets, each containing ∆ points. Let B be a fixed set of r buckets. Given a positive integer s, let $m_B(r, s)$ be the number of pairings in $\mathcal{P}_{n,\Delta}$ in which at least s pairs are contained in B. We can obtain an overcount of $m_B(r, s)$ in the following way. We first select s pairs within B, in $[\Delta r]_{2s}/(s!\,2^s)$ ways. Then we pair up the remaining ∆n − 2s points in m(∆n − 2s) ways. Hence
$$m_B(r, s) \le \frac{[\Delta r]_{2s}}{s!\,2^s}\cdot\frac{(\Delta n - 2s)!}{(\Delta n/2 - s)!\,2^{\Delta n/2 - s}} = \frac{(\Delta r)!\,(\Delta n - 2s)!}{2^{\Delta n/2}\,s!\,(\Delta r - 2s)!\,(\Delta n/2 - s)!}.$$
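A minimal Python sketch of the configuration model follows, assuming each point is represented only by the bucket it belongs to. Shuffling the ∆n points uniformly and pairing consecutive entries yields a uniformly random pairing, since every pairing arises from the same number, $(\Delta n/2)!\,2^{\Delta n/2}$, of orderings. The Monte Carlo estimate of the probability that G(P) is simple is only a rough illustration of the limit $\exp(-(\Delta^2-1)/4)$. The function names and parameters are arbitrary choices of ours.

```python
import math
import random
from collections import Counter

def random_pairing(n, delta, rng):
    """Uniform random pairing of n buckets of delta points each.

    Each point is represented by its bucket; shuffling the delta*n points and
    pairing consecutive entries gives every pairing with equal probability.
    Returns the edge list of the multigraph G(P)."""
    assert (n * delta) % 2 == 0
    points = [v for v in range(n) for _ in range(delta)]
    rng.shuffle(points)
    return [(points[i], points[i + 1]) for i in range(0, len(points), 2)]

def is_simple(edges):
    """True if G(P) has no loops and no multiple edges."""
    if any(u == v for u, v in edges):
        return False
    multiplicity = Counter(frozenset(e) for e in edges)
    return all(c == 1 for c in multiplicity.values())

def estimate_simple_probability(n, delta, trials, seed=0):
    rng = random.Random(seed)
    hits = sum(is_simple(random_pairing(n, delta, rng)) for _ in range(trials))
    return hits / trials

est = estimate_simple_probability(300, 3, 2000, seed=1)
print(est, math.exp(-(3 ** 2 - 1) / 4))   # estimate vs. exp(-2)
```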
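Therefore the probability p(r, s) that a random pairing in $\mathcal{P}_{n,\Delta}$ has at least s pairs within B satisfies
$$p(r, s) = \frac{m_B(r, s)}{m(\Delta n)} \le \frac{(\Delta r)!\,(\Delta n - 2s)!\,(\Delta n/2)!}{s!\,(\Delta r - 2s)!\,(\Delta n/2 - s)!\,(\Delta n)!}.$$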
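For very small parameters one can enumerate every pairing and compare the exact value of p(r, s) with the bound just obtained. The sketch below does this with exact rational arithmetic; the helper names are hypothetical. B is taken to be the first r buckets, which loses no generality by symmetry.

```python
from fractions import Fraction
from math import factorial

def num_pairings(two_a):
    """m(2a) = (2a)! / (a! 2^a)."""
    a = two_a // 2
    return factorial(two_a) // (factorial(a) * 2 ** a)

def p_upper(n, delta, r, s):
    """(Δr)!(Δn-2s)!(Δn/2)! / (s!(Δr-2s)!(Δn/2-s)!(Δn)!), the bound on p(r, s)."""
    N = delta * n
    return Fraction(factorial(delta * r) * factorial(N - 2 * s) * factorial(N // 2),
                    factorial(s) * factorial(delta * r - 2 * s)
                    * factorial(N // 2 - s) * factorial(N))

def all_pairings(points):
    """Yield every perfect matching of the list `points` as a list of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        for tail in all_pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def p_exact(n, delta, r, s):
    """Exact p(r, s); point x lies in bucket x // delta and B = buckets 0..r-1."""
    def in_B(x):
        return x < delta * r
    total = hits = 0
    for pairing in all_pairings(list(range(delta * n))):
        total += 1
        hits += sum(in_B(a) and in_B(b) for a, b in pairing) >= s
    assert total == num_pairings(delta * n)
    return Fraction(hits, total)

for params in [(4, 2, 2, 1), (4, 3, 2, 2), (6, 2, 3, 2)]:
    print(params, p_exact(*params), p_upper(*params))
```

For s = 1 the bound equals the expected number of pairs inside B, which is why it is close to, but above, the exact probability.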
This shows that when (19) holds, a.a.s. $G \in \mathcal{G}_{n,\Delta}$ has the property that all subsets of vertices of size r have fewer than κr edges. Now we can easily show that when q is sufficiently small and n is sufficiently large, the mixing time of the Glauber dynamics is slow for almost all ∆-regular graphs.
We conclude this section by proving Theorem 1.2 and Theorem 1.3.