Cycles in Mallows random permutations

We study cycle counts in permutations of 1, …, n drawn at random according to the Mallows distribution. Under this distribution, each permutation π ∈ S_n is selected with probability proportional to q^inv(π), where q > 0 is a parameter and inv(π) denotes the number of inversions of π. For ℓ fixed, we study the vector (C_1(Π_n), …, C_ℓ(Π_n)), where C_i(π) denotes the number of cycles of length i in π and Π_n is sampled according to the Mallows distribution. When q = 1 the Mallows distribution simply samples a permutation of 1, …, n uniformly at random. A classical result going back to Kolchin and Goncharoff states that in this case the vector of cycle counts tends in distribution to a vector of independent Poisson random variables, with means 1, 1/2, 1/3, …, 1/ℓ. Here we show that if 0 < q < 1 the cycle counts again have linear means and, suitably rescaled, tend jointly to a multivariate Gaussian distribution, whereas if q > 1 there is a striking difference between the behavior of the even and the odd cycles. The even cycle counts still have linear means, and when properly rescaled tend to a multivariate Gaussian distribution. For the odd cycle counts, on the other hand, the limiting behavior depends on the parity of n when q > 1. Both (C_1(Π_{2n}), C_3(Π_{2n}), …) and (C_1(Π_{2n+1}), C_3(Π_{2n+1}), …) have discrete limiting distributions (they do not need to be renormalized), but the two limiting distributions are distinct for all q > 1. We describe these limiting distributions in terms of Gnedin and Olshanski's bi-infinite extension of the Mallows model.
We investigate these limiting distributions further, and study the behavior of the constants involved in the Gaussian limit laws. For example, we show that as q ↓ 1 the expected number of 1-cycles tends to 1/2, which, curiously, differs from the value corresponding to q = 1. In addition, we exhibit an interesting "oscillating" behavior in the limiting probability measures for q > 1 and n odd versus n even.


Introduction and statement of main results
Let S_n denote the set of permutations of [n] := {1, …, n}. For a permutation π ∈ S_n the ordered pair (i, j) ∈ [n]² is an inversion of π if i < j and π(i) > π(j). We denote the number of inversions of a permutation π by inv(π). For n ∈ N and q > 0, the Mallows distribution Mallows(n, q) samples a random element Π_n of S_n in such a way that each π ∈ S_n has probability proportional to q^inv(π). That is,

$$\mathbb{P}(\Pi_n = \pi) = \frac{q^{\mathrm{inv}(\pi)}}{Z(n,q)}, \qquad \text{where } Z(n,q) := \sum_{\sigma \in S_n} q^{\mathrm{inv}(\sigma)}. \qquad (1)$$

This distribution on S_n was introduced in the late fifties by C.L. Mallows [27] in the context of statistical ranking models. It has since been studied in connection with a diverse range of topics, including mixing times of Markov chains [4,11], finitely dependent colorings of the integers [22], stable matchings [2], random binary search trees [1], learning theory [8,34], q-analogs of exchangeability [16,17], determinantal point processes [7], statistical physics [32,33] and genomics [12].
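For small n the normalizing constant can be computed by brute force, which makes the definition easy to sanity-check. The sketch below (with hypothetical helper names `inv` and `mallows_pmf`) enumerates S_n and weights each permutation by q^inv(π):

```python
from itertools import permutations

def inv(pi):
    """Number of inversions: pairs i < j with pi[i] > pi[j] (0-indexed positions)."""
    return sum(1 for i in range(len(pi))
                 for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def mallows_pmf(n, q):
    """Exact Mallows(n, q) probabilities by brute-force enumeration of S_n."""
    perms = list(permutations(range(1, n + 1)))
    weights = [q ** inv(p) for p in perms]
    Z = sum(weights)
    return {p: w / Z for p, w in zip(perms, weights)}

pmf = mallows_pmf(3, 0.5)
# For q < 1 the identity permutation (0 inversions) is the most likely one.
```

For 0 < q < 1 the measure concentrates near the identity, and for q > 1 near the reversal, which is the source of the q ↔ 1/q symmetry used throughout the paper.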
In the special case when q = 1 the Mallows distribution coincides with the uniform distribution on S_n. A classical result going back to Gontcharoff [18] and Kolchin [25] states that in this case, for every fixed ℓ,

$$(C_1(\Pi_n), \dots, C_\ell(\Pi_n)) \xrightarrow{d} (X_1, \dots, X_\ell),$$

where C_i(π) denotes the number of cycles of length i in the permutation π, and X_1, …, X_ℓ are independent and X_i is Poisson distributed with mean 1/i for each i = 1, …, ℓ. In spite of the long history and considerable attention received by the Mallows distribution, until very recently the problem of determining analogues of this result for the Mallows(n, q) distribution with q ≠ 1 seems to have escaped attention. In a recent paper, Gladkich and Peled [15] studied the cycle structure of the Mallows distribution when q = q(n) depends on n and approaches one as n → ∞. Here we will focus instead on the limiting distribution of the cycle counts when q ≠ 1 is fixed and n tends to infinity.
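The Poisson limit at q = 1 is easy to observe numerically. The following simulation (ours, not from the paper) samples uniform permutations and averages the cycle counts C_1, C_2, C_3, which should come out close to 1, 1/2, 1/3:

```python
import random

def cycle_counts(pi, max_len):
    """Count cycles of each length 1..max_len in pi, given as a tuple of 1-indexed images."""
    n = len(pi)
    seen = [False] * (n + 1)
    counts = [0] * (max_len + 1)
    for start in range(1, n + 1):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:          # walk the cycle containing `start`
            seen[j] = True
            j = pi[j - 1]
            length += 1
        if length <= max_len:
            counts[length] += 1
    return counts[1:]

random.seed(1)
n, trials = 200, 2000
totals = [0, 0, 0]
for _ in range(trials):
    pi = list(range(1, n + 1))
    random.shuffle(pi)              # uniform = Mallows(n, 1)
    for i, c in enumerate(cycle_counts(tuple(pi), 3)):
        totals[i] += c
means = [t / trials for t in totals]
# means should be close to 1, 1/2, 1/3 respectively
```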
Our first result shows that for 0 < q < 1 each C_i(Π_n) has a mean that is linear in n, and that for every fixed ℓ, the vector (C_1(Π_n), …, C_ℓ(Π_n)) can be suitably rescaled so that it tends to a jointly normal limiting distribution.
Theorem 1.1. Fix 0 < q < 1 and let Π_n ∼ Mallows(n, q). There exist positive constants m_1, m_2, … and an infinite matrix P ∈ R^{N×N} such that for all ℓ ≥ 1 we have

$$\left( \frac{C_1(\Pi_n) - m_1 n}{\sqrt{n}}, \dots, \frac{C_\ell(\Pi_n) - m_\ell n}{\sqrt{n}} \right) \xrightarrow{d} N(0, P_\ell),$$

where N(•, •) denotes the ℓ-dimensional multivariate normal distribution and P_ℓ is the submatrix of P on the indices [ℓ] × [ℓ].

As it happens, for q > 1, there is a major difference between the behaviour of even cycles and odd cycles. For even cycle counts we have a result analogous to the previous theorem.

Theorem 1.2. Fix q > 1 and let Π_n ∼ Mallows(n, q). There exist constants µ_2, µ_4, … and an infinite matrix Q ∈ R^{N×N} such that for all ℓ ≥ 1 we have

$$\left( \frac{C_2(\Pi_n) - \mu_2 n}{\sqrt{n}}, \dots, \frac{C_{2\ell}(\Pi_n) - \mu_{2\ell} n}{\sqrt{n}} \right) \xrightarrow{d} N(0, Q_\ell),$$

where N(•, •) denotes the ℓ-dimensional multivariate normal distribution and Q_ℓ is the submatrix of Q on the indices [ℓ] × [ℓ].
We will describe the limiting distributions for odd cycles in the case when q > 1 in terms of the bi-infinite analogue of the Mallows distribution that was introduced by Gnedin and Olshanski [17].This is a random bijection Σ : Z → Z, whose distribution we'll denote by Mallows(Z, q).See Section 2 for more discussion and relevant facts.
Throughout the paper r, ρ denote the bijections of Z defined by r(i) := −i and ρ(i) := 1 − i.
The permutations r • Σ and ρ • Σ almost surely have only finitely many odd cycles, as we will see in more detail later on.
Next, we study the properties of the constants m_1, m_2, … occurring in Theorem 1.1. The first part of the next result gives an interpretation of these constants in terms of the Mallows(Z, q) distribution.
The next result provides similar results for the constants appearing in Theorem 1.2.
Again, combining Part (i) with Theorem 5.1 in [17] gives an expression for µ_2 as an explicit function of q. Figure 2 provides a plot of µ_2 versus q together with the results of computer simulations. We mention that Pitman and Tang ([30], Proposition 3.3) give a result for so-called regenerative random permutations that is closely related to parts (i) and (ii) of Theorems 1.4 and 1.5.
Next we provide some results on the asymptotic expected number of 1-cycles when q > 1. For notational convenience let us write

$$c_o := \mathbb{E}\, C_1(r \circ \Sigma), \qquad c_e := \mathbb{E}\, C_1(\rho \circ \Sigma), \qquad (2)$$

where again Σ ∼ Mallows(Z, 1/q) and r, ρ are given by r(i) := −i and ρ(i) := 1 − i.

Theorem 1.6. Let q > 1, let c_e, c_o be as given by (2) and let Σ ∼ Mallows(Z, 1/q). Moreover, as q → ∞ we have

We note that Part (i) of the above theorem gives that in particular c_e + c_o = 1 for all q > 1. Again, Theorem 5.1 in [17] allows us to convert the probabilities given in Part (i) of the above theorem into explicit functions of q. Plots of c_e and c_o as a function of q, together with the results of computer simulations, are shown in Figure 3.
As mentioned previously, when q = 1 we retrieve the uniform distribution on S_n, for which the expected number of 1-cycles equals one. So the fact that the limits of c_e, c_o as q ↓ 1 equal 1/2 is rather curious. Of course there is no contradiction, since our results apply to the situation where q > 1 is fixed and we send n to infinity. Our results do however suggest that something interesting must be going on in the "phase change" when q = q(n) is a function of n that approaches one from above as n → ∞.
Our final (main) result highlights an interesting "oscillating" behaviour in the probability measures corresponding to the limit of C_1(Π_{2n}), respectively C_1(Π_{2n+1}), when q > 1. The probability that Π_n has at least m 1-cycles is much larger when the parities of m and n agree than when they do not (for m large but fixed and n → ∞).
Theorem 1.7. For 0 < q < 1 we have, as k → ∞,

(The notation g(k) ≪ f(k) means that the ratio g(k)/f(k) tends to zero as k → ∞.)

Sketches of some ideas used in the proofs.
The proofs of Theorems 1.1 and 1.2 are adaptations of a proof technique developed by Basu and Bhatnagar [3] to prove a Gaussian limit law for the length of the longest monotone subsequence of a Mallows permutation, and in fact Theorem 1.1 closely follows the original proof. The intuition behind it is that if Π_n ∼ Mallows(n, q) with 0 < q < 1, then given that Π_n[{1, …, j}] = {1, …, j}, the remainder of the permutation behaves like a Mallows random permutation of length n − j. As it turns out, there will typically be linearly many such j.
A very rough sketch of the argument giving Theorem 1.1 is as follows. If T_0 := 0 < T_1 < T_2 < ⋯ denote the successive times at which Π_n maps {1, …, T_i} onto itself, then each cycle must lie completely in {T_{i−1} + 1, …, T_i} for some i. This allows us to show that the cycle counts behave approximately like a stopped two-dimensional random walk. This refers to the situation where (X_1, Y_1), (X_2, Y_2), … are i.i.d. and we are interested in ∑_{i=1}^{τ} Y_i, where τ := inf{k : X_1 + ⋯ + X_k > n}. Here the X_i correspond to T_i − T_{i−1} and Y_i counts the number of cycles contained in the interval {T_{i−1} + 1, …, T_i}. A convenient result of Gut and Janson [19] allows us to derive that the mentioned sum is approximately Gaussian after suitable rescaling. The same argument applies to arbitrary linear combinations of the cycle counts, so that we can employ the Cramér-Wold device to deduce that the (suitably rescaled) vector of cycle counts is multivariate Gaussian.
The proof of Theorem 1.2 goes along the same lines. Now it turns out that there are linearly many indices T_i for which Π_n maps {1, …, T_i} ∪ {n + 1 − T_i, …, n} onto itself. (Almost) every even cycle must then be contained in some set {T_{i−1} + 1, …, T_i} ∪ {n + 1 − T_i, …, n − T_{i−1}}, and we can adapt the proof strategy that gave Theorem 1.1 to work here as well. We mention that Theorem 1.2 can also be proved by starting from the center of the permutation, rather than the sides, see [21].
In the proof of Theorem 1.3, we rely on results of Gnedin and Olshanski [17] that show that "locally", for 0 < q < 1, the finite Mallows permutation resembles the bi-infinite Mallows permutation Σ defined and analyzed in [17]. An elementary, but crucial, observation is that Π_n ∼ Mallows(n, q) if and only if r_n ∘ Π_n ∼ Mallows(n, 1/q), where r_n(i) := n + 1 − i. (See the next section for the explanation.) Note that if n is odd then r_n leaves (n + 1)/2 invariant, but when n is even no element of {1, …, n} is invariant (n/2 and n/2 + 1 are flipped). For q > 1, the relation with the Mallows(n, 1/q) distribution translates to Π_n ∼ Mallows(n, q) being "approximated" by r ∘ Σ with r(i) := −i when n is odd, and ρ ∘ Σ with ρ(i) := 1 − i when n is even. In particular, the number of 1-cycles (fixed points) of Π_n approximately behaves like the number of i ∈ Z such that Σ(i) = −i when n is odd, and the number of i ∈ Z such that Σ(i) = 1 − i when n is even.
For the proof of Theorem 1.4 we again use that the Mallows(n, q) distribution locally looks like the bi-infinite Mallows model. The main intuition is the elementary observation that the number of i-cycles equals 1/i times the number of points that are in i-cycles. That ∑_i i·m_i = 1 is then more or less immediate from the observation that, almost surely, all cycles of Σ have finite length. The statements about the limits as q ↓ 0 and q ↑ 1 can be derived by using explicit expressions for the expected number of 1-cycles that follow by combining our work with results of Gnedin and Olshanski [17] and Gladkich and Peled [15].
The idea behind the proof of Theorem 1.5 is very similar, but more technical. When q > 1, the Mallows(n, q) model is approximated well locally by the composition of two independent bi-infinite Mallows models.
The first part of Theorem 1.6 will follow from the aforementioned fact that, when q > 1, the expected number of 1-cycles is well-approximated by the expected number of i ∈ Z such that Σ(i) = −i when n is odd, and the expected number of i ∈ Z such that Σ(i) = 1 − i when n is even. The limit of 1/2 for the expected number of 1-cycles when q ↓ 1 will follow from the relatively elementary observation that 1/q < P(Σ(0) = j + 1)/P(Σ(0) = j) < q for all j ∈ Z. For the other limits we again analyze the various explicit expressions in q.
The proof of Theorem 1.7 is technically involved, but the intuition behind it is relatively easy to explain. When k is large, the "most likely" way in which r ∘ Σ will have at least 2k + 1 fixed points is if Σ(−k) = k, …, Σ(k) = −k, or some minor perturbation of this configuration. For ρ ∘ Σ the "most likely" way to have 2k + 1 fixed points is something like Σ(−k) = 1 + k, Σ(−k + 1) = k, …, Σ(k) = 1 − k, or some minor perturbation of that situation. However, as shown by Gnedin and Olshanski [17], Σ is almost surely balanced: the number of i < 0 with Σ(i) ≥ 0 is finite and equals the number of i ≥ 0 with Σ(i) < 0. This forces the existence of one more i ∈ Z with |Σ(i) − i| = Ω(k), which makes the probability exponentially smaller. The intuition for 2k fixed points is similar.
Remark 1.8. Let us note that Theorems 1.1, 1.2, and 1.3 all hold for a more general class of permutation statistics.
Let w ∈ S_n, and suppose that w = w_1 w_2, where w_1 sends [1, i] to itself and w_2 sends [i + 1, n] to itself. Say that a function f is additive if f(w) = f(w_1) + f(w_2) for all w = w_1 w_2 decomposing in this way, where f(w) for w a permutation on an interval [i, j] is defined by shifting the permutation down to [1, j − i + 1]. Then Theorem 1.1 holds for any (non-trivial) additive function satisfying |f(w)| ≤ Cn^k for w ∈ S_n, where C and k are constants. The proof is exactly the same, with the key being that f(w) decomposes into a sum of independent pieces in the same way as the number of cycles. The other assumptions are there to ensure that the moments are finite and that the variance is non-zero.
Let w ∈ S_n, and suppose that w = w_1 w_2, where w_1 sends [i, n − i] to itself and w_2 sends [1, i] to [n − i + 1, n] and vice versa. Say that f is anti-additive if f(w) = f(w_1) + f(w_2) for all w = w_1 w_2 decomposing in this way. Then Theorems 1.2 and 1.3 hold for any (non-trivial) anti-additive function satisfying |f(w)| ≤ Cn^k for w ∈ S_n, where C and k are constants. Again, the proof is the same, writing f(w) as a sum of independent pieces.
Here, non-trivial means that f, when restricted to permutations of size n for which no non-trivial decomposition of the form w = w_1 w_2 exists, is non-constant. This assumption is needed to ensure that the relevant asymptotic variances are non-zero. Almost any reasonable function satisfies this, but note in particular that f(w) = n, the size of the permutation, does not.
Note that if f(w) is either additive or anti-additive, then so is f(w^{−1}). Thus, the theorems also apply to joint statistics of a Mallows permutation and its inverse, giving another proof of Theorem 1.2 of [20].

Notation and preliminaries
Here we collect some notation, definitions and results from the literature that we will use in our proofs. Throughout the paper we use [n] := {1, …, n} to denote the set consisting of the first n natural numbers, and [a, b] := {a, …, b} for a < b. If f(n), g(n) are two functions depending on the parameter n, we will use f(n) = o(g(n)) to denote that f(n)/g(n) → 0, and f(n) = O(g(n)) to denote that f(n)/g(n) remains bounded. We will use Bi(n, p) to denote the binomial distribution with parameters n and p, and we use Geo(p) to denote the geometric distribution with parameter p. So X ∼ Geo(p) means that

$$\mathbb{P}(X = k) = p(1-p)^{k-1} \quad \text{for all } k \in \mathbb{N}.$$

We use TruncGeo(n, p) to denote the truncated geometric distribution, truncated at n. That is, if Y ∼ TruncGeo(n, p) and X ∼ Geo(p) then

$$\mathbb{P}(Y = k) = \mathbb{P}(X = k \mid X \leq n) \quad \text{for } k \in [n].$$

As is usual in the literature on the Mallows distribution, we denote by Z(n, q) := ∑_{σ∈S_n} q^inv(σ) the denominator in (1). By a standard result in enumerative combinatorics (see Corollary 1.3.13 in [31]) we have

$$Z(n, q) = \prod_{i=1}^{n} \left( 1 + q + \dots + q^{i-1} \right).$$

An elementary observation is that the indices i, j ∈ [n] form an inversion for π ∈ S_n if and only if π(i), π(j) form an inversion for π^{−1}. In particular inv(π^{−1}) = inv(π).
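The product formula for Z(n, q) (the q-factorial) can be verified against the brute-force sum over S_n; the helper names below are ours:

```python
from itertools import permutations

def inv(pi):
    """Number of inversions of a sequence."""
    return sum(1 for i in range(len(pi))
                 for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def Z_bruteforce(n, q):
    """Z(n, q) as the literal sum of q^inv(sigma) over S_n."""
    return sum(q ** inv(p) for p in permutations(range(n)))

def Z_product(n, q):
    """The q-factorial: prod_{i=1}^n (1 + q + ... + q^{i-1})."""
    total = 1.0
    for i in range(1, n + 1):
        total *= sum(q ** k for k in range(i))
    return total

for n in range(1, 6):
    assert abs(Z_bruteforce(n, 0.7) - Z_product(n, 0.7)) < 1e-9
```

At q = 1 each factor equals i, recovering Z(n, 1) = n!, the uniform case.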
Similarly, letting r_n ∈ S_n denote the "reversal map" given by r_n(i) := n + 1 − i, we have that i, j ∈ [n] are an inversion in π if and only if they are not an inversion in r_n ∘ π. The same holds true for π ∘ r_n. In other words,

$$\mathrm{inv}(r_n \circ \pi) = \binom{n}{2} - \mathrm{inv}(\pi), \qquad \text{and hence also} \qquad \mathrm{inv}(\pi \circ r_n) = \binom{n}{2} - \mathrm{inv}(\pi).$$

As a direct consequence of these observations and the definition of the Mallows probability measure, we have:

Corollary 2.1. Let q > 1 and Π_n ∼ Mallows(n, q), and let r_n be given by r_n(i) = n + 1 − i. The following hold.
(i) Π_n^{−1} =_d Π_n; (ii) r_n ∘ Π_n ∘ r_n =_d Π_n; (iii) r_n ∘ Π_n ∼ Mallows(n, 1/q); and (iv) Π_n ∘ r_n ∼ Mallows(n, 1/q).

(To see the third and fourth parts, note that P(r_n ∘ Π_n = π) is proportional to (1/q)^inv(π), and similarly for Π_n ∘ r_n.) The last two parts of the corollary provide a way to express the Mallows distribution with q > 1 in terms of the Mallows distribution with 0 < q < 1. We will rely on this repeatedly in our proofs of the results for q > 1.
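The inversion identities underlying Corollary 2.1 are finite statements and can be checked exhaustively for small n; the following sketch (our helper names) does so for n = 5:

```python
from itertools import permutations

def inv(pi):  # pi is a tuple of the images of 1..n
    return sum(1 for i in range(len(pi))
                 for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def inverse(pi):
    """The inverse permutation."""
    out = [0] * len(pi)
    for i, v in enumerate(pi):
        out[v - 1] = i + 1
    return tuple(out)

def reverse_compose(pi):
    """r_n o pi, i.e. the map i -> n + 1 - pi(i)."""
    n = len(pi)
    return tuple(n + 1 - v for v in pi)

n = 5
for pi in permutations(range(1, n + 1)):
    assert inv(inverse(pi)) == inv(pi)                       # inv(pi^{-1}) = inv(pi)
    assert inv(reverse_compose(pi)) == n * (n - 1) // 2 - inv(pi)
```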
For 0 < q < 1, there is an iterative procedure for generating Π_n ∼ Mallows(n, q), going back to the work of Mallows [27]. We let Z_1, …, Z_n be independent with Z_i ∼ TruncGeo(n + 1 − i, q). Having determined Π_n(1), …, Π_n(i − 1), we determine Π_n(i) by writing [n] \ {Π_n(1), …, Π_n(i − 1)} in increasing order as {j_1, j_2, …, j_{n−i+1}}, and setting Π_n(i) := j_{Z_i}. To see that this procedure indeed generates a random element of S_n chosen according to the Mallows(n, q) distribution, we can argue as follows. We first note that for each π ∈ S_n there is exactly one choice of (k_1, …, k_n) with P(Z_1 = k_1, …, Z_n = k_n) > 0 that results in the procedure producing π. In particular

$$\mathbb{P}(\Pi_n = \pi) = \prod_{i=1}^{n} \mathbb{P}(Z_i = k_i) \propto q^{k_1 + \dots + k_n - n},$$

where the symbol ∝ denotes "proportional to", and hides a multiplicative term not depending on k_1, …, k_n. We now note that for each i we must have k_i − 1 = |{j > i : π(j) < π(i)}|. In other words, inv(π) = k_1 + ⋯ + k_n − n, which shows that we've indeed sampled according to the Mallows(n, q) distribution.
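The argument above can be checked exactly: the probability that the insertion procedure outputs π is a product of truncated-geometric probabilities, and it should coincide with q^inv(π)/Z(n, q). A brute-force verification for n = 4 (hypothetical helper names):

```python
from itertools import permutations

def inv(pi):
    return sum(1 for a in range(len(pi))
                 for b in range(a + 1, len(pi)) if pi[a] > pi[b])

def truncgeo_pmf(m, q):
    """P(Z = k) proportional to q^(k-1) for k = 1..m."""
    w = [q ** (k - 1) for k in range(1, m + 1)]
    s = sum(w)
    return [x / s for x in w]

def sequential_prob(pi, q):
    """Probability that the iterative insertion procedure outputs pi."""
    n = len(pi)
    remaining = list(range(1, n + 1))
    prob = 1.0
    for i in range(n):
        k = remaining.index(pi[i]) + 1           # rank of pi(i) among remaining values
        prob *= truncgeo_pmf(n - i, q)[k - 1]    # P(Z_{i+1} = k)
        remaining.remove(pi[i])
    return prob

n, q = 4, 0.6
Z = sum(q ** inv(p) for p in permutations(range(1, n + 1)))
for p in permutations(range(1, n + 1)):
    assert abs(sequential_prob(p, q) - q ** inv(p) / Z) < 1e-12
```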
There is a natural extension of the Mallows model to random functions Π : N → N, called the Mallows process by some authors. Similarly to the iterative procedure for generating Π_n ∼ Mallows(n, q) described above, we let Z_1, Z_2, … be an infinite sequence of i.i.d. Geo(1 − q) random variables and we iteratively construct Π by writing N \ {Π(1), …, Π(i − 1)} in increasing order as {j_1, j_2, …} and setting Π(i) := j_{Z_i}. We denote the probability distribution of Π generated in this manner by Mallows(N, q) (see also Figure 4).

[Figure 4: a realization of the Mallows(N, q) process. The red squares indicate the first time that an interval is sent to itself; the lengths and contents of the red squares are independent and identically distributed.]

For a non-empty, finite A ⊆ R and a bijection π : A → A we define inv(π) in exactly the same way as for bijections from [n] to itself. The distribution of the random bijection Π_A : A → A satisfying P(Π_A = π) ∝ q^inv(π) will be denoted by Mallows(A, q). For σ : B → B a bijection with B ⊆ R and A ⊆ B finite (but B may be infinite) we denote by σ_A the bijection of A obtained by setting σ_A(a^(i)) := a^(j) whenever σ(a^(i)) is the j-th smallest element of σ(A), where a^(i) is the i-th smallest element of A.
As shown by Basu and Bhatnagar ([3], Lemma 2.1) and independently Crane and DeSalvo ([9], Lemma 5.2), if Π ∼ Mallows(N, q) with 0 < q < 1 and I = {a, …, a + n} ⊆ N is a finite "interval" of consecutive integers, then Π_I ∼ Mallows(I, q). Moreover, as is easily seen from the definitions, if Π ∼ Mallows(A, q) then s^(k) ∘ Π ∘ s^(−k) ∼ Mallows(A + k, q). Here and in the rest of the paper s denotes the shift map given by i ↦ i + 1, f^(n) denotes the n-fold composition of the function f with itself, and f^(−n) the n-fold composition of its inverse. We next discuss Gnedin and Olshanski's bi-infinite Mallows model. For every 0 < q < 1, this is a random bijection of Z, whose distribution we will denote by Mallows(Z, q). The work of Gnedin and Olshanski provides several definitions, but all of them are rather involved. So we refer the reader to the original paper [17] for the precise definition and mention only the properties and relevant facts we will be using in what follows.
We will need the following notion of convergence. For a sequence σ_1, σ_2, … of bijections of Z and a bijection σ of Z, we say that σ_n → σ if for every i ∈ Z we have σ_n(i) = σ(i) for all sufficiently large n. As in [17], we call a permutation π of Z balanced if the number of i < 1 with π(i) ≥ 1 is finite and equals the number of i ≥ 1 with π(i) < 1. As noted in [17], we may replace the 1 above by any fixed number in Z and obtain an equivalent definition of balanced permutations. The random permutation Σ ∼ Mallows(Z, q) is almost surely balanced. The q-Pochhammer symbols (a; q)_n and (a; q)_∞ are defined as

$$(a; q)_n := \prod_{k=0}^{n-1} \left(1 - a q^k\right), \qquad (a; q)_\infty := \prod_{k=0}^{\infty} \left(1 - a q^k\right).$$

Recall that, throughout the paper, we use s, r, ρ to denote the maps given by s(i) := i + 1, r(i) := −i and ρ(i) := 1 − i. The following lemma lists the facts on the Mallows(Z, q) distribution we will be relying on in our proofs below.
We remark that it follows from Part (iv) that also

for all k ∈ Z. It now also follows that

Finally, let us also remark that the Mallows(Z, q) process can be thought of as a stationary version of the Mallows(N, q) process, as described in [30]. In particular, we will use the following result on renewal processes, applied to the Mallows process ([13], Equation 5.72).
Proposition 2.3. Let X_1, X_2, … be a sequence of independent and identically distributed random variables taking values in N, with EX_i = µ > 0, and let τ(n) = inf{t : X_1 + ⋯ + X_t > n}. Then X_{τ(n)} converges to a limiting distribution, where the limit X* has the size-bias distribution of the X_i, which means

$$\mathbb{P}(X^* = k) = \frac{k\, \mathbb{P}(X_1 = k)}{\mu} \quad \text{for all } k \in \mathbb{N}.$$

We will also use some of the tools developed in [15]. In particular we will use the arc chain {κ_t}_{t=0}^{n} corresponding to Π_n ∼ Mallows(n, q) with 0 < q < 1, defined by

$$\kappa_t := |\{i \in [t] : \Pi_n(i) > t\}|.$$

We speak of the (n, q) arc chain. We have the following.
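Proposition 2.3's size-bias phenomenon is easy to see in simulation: the interarrival block straddling a far-away time n is not distributed like X_1 but like its size-biased version. A sketch with an arbitrary toy interarrival law (all names and the chosen pmf are ours):

```python
import random

def size_bias(pmf):
    """Size-bias a pmf on positive integers: P*(k) = k P(k) / mean."""
    mu = sum(k * p for k, p in pmf.items())
    return {k: k * p / mu for k, p in pmf.items()}

random.seed(0)
pmf = {1: 0.5, 2: 0.3, 5: 0.2}        # toy interarrival law, mean = 2.1
target = size_bias(pmf)

def sample_X():
    u, acc = random.random(), 0.0
    for k, p in pmf.items():
        acc += p
        if u < acc:
            return k
    return 5

n, trials = 500, 3000
hits = {k: 0 for k in pmf}
for _ in range(trials):
    t = 0
    while t <= n:                      # run the renewal process past time n
        x = sample_X()
        t += x
    hits[x] += 1                       # x is X_{tau(n)}, the block straddling n
freq = {k: c / trials for k, c in hits.items()}
# freq should be close to target (long blocks are more likely to straddle n)
```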
Lemma 2.4 ([15], Proposition 3.3). Let Π_n ∼ Mallows(n, q) with 0 < q < 1. The arc chain (κ_t)_{t=0,…,n} of Π_n is a time-inhomogeneous Markov chain with transition probabilities

Lemma 2.5 ([15], Lemma 3.4). Let Π_n ∼ Mallows(n, q) and let κ be its arc chain. Then

(We mention that we have slightly adapted the statements from [15] in the above two lemmas.) Analogously to the arc chain κ_t for Π_n ∼ Mallows(n, q), we can define the arc chain for Π ∼ Mallows(N, q) with 0 < q < 1 by setting κ_t := |{i ∈ [t] : Π(i) > t}|. We speak of the (∞, q) arc chain. It is straightforward to verify that (κ_t)_{t≥0} forms a Markov chain with the following transition probabilities.
Alternatively, this can be seen by combining Lemma 2.4 with Lemma 3.2 below. As a side remark, we mention that Gladkich and Peled [15] defined the (∞, q) arc chain directly via the transition probabilities given here, without explicitly mentioning the connection to the definition we give here. As explained in [15] (Section 3.2), the Markov chain κ_t has a unique stationary distribution ν, given by (3). Here, as usual, an empty product equals 1.
The following result provides a useful link between the (n, q) and the (∞, q) arc chains.
Proposition 2.6 ([15], Proposition 3.8). Set t = t(n). If both t → ∞ and n − t → ∞, then the law of κ_t converges to the stationary distribution ν as n tends to infinity with q fixed.
Given two discrete probability distributions µ_1 and µ_2 on a countable set Ω, their total variation distance is defined as

$$d_{TV}(\mu_1, \mu_2) := \sup_{A \subseteq \Omega} |\mu_1(A) - \mu_2(A)|.$$

A useful alternative expression is

$$d_{TV}(\mu_1, \mu_2) = \frac{1}{2} \sum_{\omega \in \Omega} |\mu_1(\omega) - \mu_2(\omega)|.$$

(For a proof, see for instance Proposition 4.2 in [26].) As is common, we will interchangeably write d_TV(X, Y) for d_TV(µ_1, µ_2) when X ∼ µ_1 and Y ∼ µ_2. A coupling of two probability measures µ, ν is a joint probability measure for a pair of random variables (X, Y) satisfying X ∼ µ, Y ∼ ν. We will also speak of a coupling of X, Y as being a probability space for (X′, Y′) with X′ =_d X, Y′ =_d Y. Another useful characterization of the total variation distance is as follows.
Lemma 2.7. Let µ and ν be two probability distributions on the same countable set Ω. Then

$$d_{TV}(\mu, \nu) = \inf \{ \mathbb{P}(X \neq Y) : (X, Y) \text{ a coupling of } \mu \text{ and } \nu \}.$$

There is a coupling that attains this infimum.
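Both characterizations of the total variation distance (half the L1 distance and the sup over events) can be checked directly on small examples; Lemma 2.7 then says an optimal coupling disagrees with exactly this probability. Helper names below are ours:

```python
def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between two pmfs (dicts)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(w, 0.0) - nu.get(w, 0.0)) for w in support)

def tv_sup(mu, nu):
    """Equivalent sup-over-events form: the sup is attained at A = {w : mu(w) > nu(w)}."""
    support = set(mu) | set(nu)
    A = [w for w in support if mu.get(w, 0.0) > nu.get(w, 0.0)]
    return sum(mu.get(w, 0.0) for w in A) - sum(nu.get(w, 0.0) for w in A)

mu = {1: 0.5, 2: 0.5}
nu = {1: 0.25, 2: 0.25, 3: 0.5}
assert abs(tv_distance(mu, nu) - tv_sup(mu, nu)) < 1e-12
# An optimal coupling puts mass min(mu(w), nu(w)) on the diagonal, so
# P(X != Y) = tv_distance(mu, nu) = 0.5 here.
```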
(For a proof, see for instance [26], Proposition 4.7 and Remark 4.8.) For the proofs of the normal limiting laws in Theorem 1.1 and Theorem 1.2 we will make use of a result on stopped two-dimensional random walks by Gut and Janson [19] that seems tailor-made for our purposes. Here we consider an i.i.d. sequence (X_1, Y_1), (X_2, Y_2), … and for t > 0 we define τ(t) as the first k such that

$$X_1 + \dots + X_k > t. \qquad (4)$$

The result of Gut and Janson we'll use states the following.

Theorem 2.8 ([19], Theorem 3). Let (X_1, Y_1), (X_2, Y_2), … be an i.i.d. sequence and let τ(n) be as given by (4). Suppose that

For the proofs of the normal limiting laws in Theorem 1.1 and Theorem 1.2 it will also be convenient to use the Cramér-Wold device. A proof can for instance be found in [6] (Theorem 29.4).

Theorem 2.9 (Cramér-Wold device). For random vectors X_n and X in R^d, we have X_n → X in distribution if and only if a·X_n → a·X in distribution for every fixed a ∈ R^d.

Several times, we are going to rely on the following result of Basu and Bhatnagar [3].
Lemma 2.10 ([3], Lemma 5.5). Let W_1, W_2, … be an i.i.d. sequence of random variables with

We will also make use of the following fact. Even though it seems fairly standard, we have not been able to find a convenient reference. We therefore provide a short proof.

Lemma 2.11. Suppose that (W_t)_{t≥0} is a Markov chain with state space {0} ∪ N, started in state W_0 = 0, whose transition probabilities satisfy p_{i,j} > 0 if and only if |i − j| = 1, and lim inf_{i→∞} p_{i,i−1} > 1/2. Then

Proof. Let T_i denote the number of steps to reach i − 1 in the chain starting from W_0 = i. Let i_0 ∈ N and p > 1/2 be such that p_{i,i−1} > p for all i ≥ i_0. For each i ≥ i_0 we have, using the Chernoff inequality (see for instance [23], Corollary 2.3),

This implies

for all i ≥ i_0 and k ∈ N.
Starting from W_0 = 0, we of course move to state 1 with probability one in the first step, giving

for all k ∈ N. In particular, it suffices to show E T_1^k < ∞ for all k ∈ N. Similarly, by considering the first step of the chain we see, for each i ≥ 1 and k ∈ N:

where the two copies in the first line are taken independent. (To see the first inequality, note that in the first step we move to i − 1 with probability p_{i,i−1}. If, on the other hand, we move to i + 1 in the first step, then we first have to wait until we reach state i again, and then we have to wait until we reach i − 1 from i.) Rewriting (5), we obtain

We can thus apply induction on k to show that E T_{i_0−1}^k < ∞ for all k ∈ N. Repeating the argument, we also have E T_i^k < ∞ for i = i_0 − 2, i_0 − 3, … and so on until i = 1.
3 The proof of Theorem 1.1

We define a sequence of regeneration times T_0 < T_1 < T_2 < … as follows: T_0 := 0 and T_i := min{t > T_{i−1} : Π([t]) = [t]} for i ≥ 1. In Section 4 of [3], Basu and Bhatnagar show that T_1 has finite second moment.
(We combine Lemmas 4.1 and 4.5 of [3].) We also define the interarrival times X_i := T_i − T_{i−1}. Looking at the description of Π in Section 2, it is not difficult to see that, conditional on the event T_1 = t, the bijection i ↦ Π(i + t) − t is distributed like Π. It follows that the interarrival times X_1, X_2, … are i.i.d. Moreover, writing 𝒳_i := {T_{i−1} + 1, …, T_i}, we see that Π maps 𝒳_i bijectively onto 𝒳_i, and in fact the permutations Σ_i := s^(−T_{i−1}) ∘ Π_{𝒳_i} ∘ s^(T_{i−1}) are i.i.d. With this regenerative structure, the following lemma follows.
Lemma 3.2. For 0 < q < 1 and n ∈ N, let Π_n ∼ Mallows(n, q) and Π ∼ Mallows(N, q). There exists a coupling of Π_n and Π satisfying

$$\mathbb{P}\big(\Pi_n(i) = \Pi(i) \text{ for all } i \leq n - \log^2 n\big) = 1 - o(1),$$

and in fact log² n can be replaced with any function going to ∞ with n.
Proof. Let Π ∼ Mallows(N, q), and recall that Π_{[n]} ∼ Mallows(n, q). We claim that this coupling works. Indeed, Π and Π_{[n]} agree on [T_{τ(n)−1}], and n − T_{τ(n)−1} is at most X_{τ(n)}, which converges to a limiting distribution by Proposition 2.3. In particular, the probability that it is larger than log² n (or any function going to ∞) goes to 0.
By this last lemma, with probability 1 − o(1), the number of i-cycles in Π_n differs by at most 2 log² n from the number of i-cycles of Π that are completely contained in [n] (for each i = 1, …, ℓ).
Fix ℓ ∈ N, and let a_1, …, a_ℓ be a sequence of real numbers, not all zero. For π a permutation, we define ϕ(π) := ∑_{j=1}^{ℓ} a_j C_j(π), and we let Y_i := ϕ(Σ_i).
Proof. We first note that, for each i ∈ N, there is a positive probability that Σ_1 consists of a single i-cycle. Aiming for a contradiction, assume that Y_1 EX_1 − X_1 EY_1 is almost surely constant. Whenever Σ_1 consists of a single cycle of length greater than ℓ, we have Y_1 = 0. In particular, Y_1 EX_1 − X_1 EY_1 can equal both −(ℓ + 1)EY_1 and −(ℓ + 2)EY_1 with positive probability. The quantity Y_1 EX_1 − X_1 EY_1 being an almost sure constant now implies EY_1 = 0.
Let 1 ≤ i ≤ ℓ be such that a_i ≠ 0. There is a positive probability that Y_1 EX_1 = 0 and a positive probability that Y_1 EX_1 = a_i EX_1. But that implies a_i = 0, contradicting the choice of i.
Having established Lemmas 3.3 and 3.4, we can apply Theorem 2.8 to conclude that

By Lemma 3.2 and the definition of τ(n), we have, with probability 1 − o(1) under the coupling provided by Lemma 3.2,

Moreover, applying Lemmas 2.10 and 3.1, we have that, with probability 1 − o(1), the RHS of (6) is o(√n). We can conclude:

Recalling that Y_1 = ∑_{i=1}^{ℓ} a_i C_i(Σ_1) and setting m_i := E C_i(Σ_1)/E X_1, we can write

Setting

we see that

Therefore, if we set

This shows that if (N_1, …, N_ℓ) ∼ N(0, P_ℓ) then

We have thus shown that

for all a_1, …, a_ℓ. An application of Theorem 2.9 now allows us to conclude

completing the proof of Theorem 1.1.
4 The proof of Theorem 1.2

The proof is very similar to the proof of Theorem 1.1. We first introduce a two-sided sampling procedure, in the case 0 < q < 1, for a Mallows(n, q) distributed permutation Π_n, taking ⌈n/2⌉ iterations. During iteration i ≥ 1 we determine the images of i and n − i + 1. Again we take Z_1, …, Z_n independent with Z_i ∼ TruncGeo(n + 1 − i, q). In the first iteration we determine the images of 1 and n using Z_1 and Z_2, and in the i-th iteration we determine the images of i and n + 1 − i using Z_{2i−1} and Z_{2i}. (If n is odd then, formally speaking, the image of ⌈n/2⌉ has not yet been determined after the final iteration, but of course there is only one possible element of [n] left.) That this adapted procedure indeed produces a random permutation sampled according to the Mallows(n, q) measure follows analogously to the corresponding argument for the original sampling procedure: for every π ∈ S_n there is a choice of (k_1, …, k_n) occurring with positive probability that produces π. We also again have inv(π) = k_1 + ⋯ + k_n − n, because when we are determining Π_n(i) with i ≤ n/2, the number of i < j < n + 1 − i such that i, j form an inversion is precisely Z_{2i−1} − 1, and similarly for Π_n(n + 1 − i).
Recall that we use r_n to denote the map i ↦ n + 1 − i. Analogously to Lemma 3.2, we have:

Lemma 4.1. Let 0 < q < 1 and Π_n ∼ Mallows(n, q), and let Π, Π′ ∼ Mallows(N, q) be independent. There is a coupling of Π_n, Π, Π′ such that

Moreover, the log² n can be replaced with any function going to ∞ with n.
Proof. The proof is similar to the proof of Lemma 3.2. By Lemma 3.2 applied to Π and Π′, we can couple Π, Π′ with independent Mallows(n/2, q) permutations Π_{n/2}, Π′_{n/2} (for simplicity, we write n/2 even if n is odd, where it should be rounded either up or down as needed), such that the agreement of Lemma 3.2 holds. We now claim that we can couple Π_n with Π_{n/2}, Π′_{n/2} such that the corresponding agreement holds, and this would immediately finish the proof, since any coupling with these bivariate marginals would satisfy the lemma.
To see the claim, note that (Π_n)_{[n/2]} and (Π_n)_{[n/2+1,n]} are independent Mallows(n/2, q) permutations (see e.g. Lemma 2.3 of [20]), and so can be coupled to perfectly agree with Π_{n/2} and r_n ∘ Π′_{n/2} ∘ r_n. However, the relevant random variable is stochastically dominated by X_{τ(n/2)}, since if we couple Π_n with a Mallows(N, q) process, then the length of the interval is b − a = X_{τ(n/2)} unless X_{τ(n/2)} > n, in which case X_{τ(n/2)} is strictly larger. But now we are done, since X_{τ(n/2)} converges to a limiting distribution by Proposition 2.3, and so the probability that Π_n disagrees with either Π_{n/2} or r_n ∘ Π′_{n/2} ∘ r_n in a growing number of locations goes to 0.
We also define X_i := T_i − T_{i−1} and τ(t) := inf{j : T_j > t} for all t > 0. Again it can easily be seen from the iterative procedure generating Π and Π′ that X_1, X_2, … are i.i.d. Moreover, we define the maps Σ_i and Σ′_i analogously to before, and we write 𝒳_i := {T_{i−1} + 1, …, T_i}. Observe that, with probability 1 − o(1), for each i such that T_i < n/2 − log² n we have, by Corollary 4.2,

In other words, writing Y_i := 𝒳_i ∪ {n + 1 − T_i, …, n − T_{i−1}}, we have that Π_n maps Y_i onto itself for each i such that T_i < n/2 − log² n. In particular, every cycle of Π_n is either completely contained in one of Y_1, …, Y_{τ(n/2)−1} or it contains some number between min(n/2 − log² n, T_{τ(n/2)−1}) and max(n/2 + log² n, n + 1 − T_{τ(n/2)}). We observe that the number of cycles of Π_n of length 2i contained in Y_i equals the number of cycles of Π_n ∘ Π_n of length i contained in 𝒳_i. Now note that on this event, the number of cycles of Π_n ∘ Π_n of length i contained in 𝒳_i equals the number of cycles of Σ_i ∘ Σ′_i of length i.
We fix a_1, …, a_ℓ ∈ R, not all zero, and set

By the previous discussion, we can conclude from Theorem 2.8 that

Defining µ_{2i} analogously to the m_i and using (7), this gives

where (N_1, …, N_ℓ) =_d N(0, Q_ℓ) with

Again the result follows by an application of the Cramér-Wold device.
5 The proof of Theorem 1.3

Part (ii) of Lemma 2.2 says that, almost surely, Σ_{I_n} → Σ for I_n := {−n, ..., n}. It leaves open, however, how fast this convergence is. The following lemma shows that in fact, with high probability, the values of Σ and Σ_{I_n} agree for the vast majority of elements of I_n. This will be very helpful for us.
Proof. We define the event B_n by B_n := {|Σ(i) − i| < 5 log_{1/q} n for all i ∈ I_n, and |Σ(j) − j| ≤ 5 log_{1/q} n + |j| − n for all j ∉ I_n}.
By Part (vi) of Lemma 2.2, the probabilities P(B_n^c) are summable in n. We will show that B_n implies the conclusion of the lemma. Let a ≥ 0 and consider the intervals I_{n+a}. Let i be such that |i| < n − 10 log_{1/q} n. If B_n holds, then for all j < −n we have Σ(j) ≤ 5 log_{1/q} n − n < i, and the sequence Σ_{I_{n+a}}(i) is constant in a ≥ 0. By Part (ii) of Lemma 2.2, with probability one there is some n_0 such that for all m ≥ n_0 we have Σ_{I_m}(i) = Σ(i). There is some a ≥ 0 such that n_0 ≤ n + a, so that in particular we must have Σ_{I_n}(i) = Σ(i). Similarly, Σ_{J_n}(i) = Σ(i).
By the Borel–Cantelli lemma, B_n holds for all but finitely many n.
We also require the following Markov chain representation for the times T_i. Let Π ∼ Mallows(N, q), and consider the associated process (M_n) on N. This is a positive recurrent Markov process; see [3]. The process can be described in terms of the geometric random variables defining the Mallows process. Specifically, the walk moves from M_n to M_{n+1} = max(M_n, Z_n) − 1, where the Z_n are independent geometric random variables. Let R_i denote the hitting time of i and let R_i^+ denote the return time to i. Then, if the chain is started from 0, R_0^+ is distributed as the size of an excursion in the Mallows process. This Markov chain was introduced in [3] to study the moments of the T_i. We are now ready for the proof of Theorem 1.3.
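The chain just described is straightforward to simulate. The following Python sketch is our own illustration (the function names are not from [3], and we assume the normalization P(Z = k) = (1 − q)q^{k−1} on {1, 2, ...} for the geometric variables); it samples the return time R_0^+, which by the discussion above has the law of an excursion size in the Mallows process.

```python
import random

def geometric(q, rng):
    """Geometric on {1, 2, ...} with P(Z = k) = (1 - q) * q**(k - 1)."""
    k = 1
    while rng.random() < q:
        k += 1
    return k

def return_time(q, rng):
    """First return time R_0^+ to 0 of the chain M_{n+1} = max(M_n, Z_n) - 1,
    started from M_0 = 0.  For 0 < q < 1 the chain is positive recurrent,
    so the loop terminates almost surely."""
    m, t = 0, 0
    while True:
        m = max(m, geometric(q, rng)) - 1
        t += 1
        if m == 0:
            return t

rng = random.Random(1)
samples = [return_time(0.5, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # empirical mean excursion size
```

For q closer to 1 the excursions become longer, so more samples are needed for stable estimates.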
We start by considering Π_{2n+1}. We let Σ ∼ Mallows(Z, 1/q), let S_0 be the smallest integer (we will show that it exists) such that Σ([−S_0, S_0]) = [−S_0, S_0], and then define S_i inductively as the smallest number larger than S_{i−1} such that Σ preserves [−S_i, S_i].
To see that all these values are finite almost surely, we note that the times T_i, with the convention that T_0 contains 0, form a stationary renewal process, with the T_i for i ≠ 0 distributed as in the Mallows(N, 1/q) process, and T_0 having its size-biased distribution (see Theorem 3.2 of [30]). Then T_0 is finite almost surely, and given T_0, the two sides are independent and behave like Mallows(N, 1/q) processes. The S_i then correspond to simultaneous renewals on both sides of this process, with S_0 the first time this occurs; these are the return times in a product of two independent copies of the positive recurrent Markov chain (M_i) defined above, which is thus also positive recurrent. Thus all S_i are finite almost surely. By definition, Σ • r preserves [−S_0, S_0] and exchanges the two sides of each subsequent block. We then immediately see that there are no infinite cycles in Σ • r, and the odd cycles must be contained in the interval [−S_0, S_0]. The proof of the result for Π_{2n} follows in exactly the same manner, except centered at 1/2 rather than 0, and using ρ instead of r.
We also note the following consequence which will be useful later.

Proof. Any cycle must be contained in an interval of the above form, all of which are finite almost surely.
It follows that for all j with |j| < n − i · log² n the corresponding agreement holds. The number of elements of the relevant set can then be bounded, where we use Part (iv) of Lemma 2.2 (together with the remarks following the lemma) for the last line. Dividing the LHS and RHS of (9) by 2n + 1, sending n → ∞, and recalling (8) proves the result.

The proof of Part (ii) of Theorem 1.4
By Lemma 5.2, Σ ∼ Mallows(Z, q) for 0 < q < 1 almost surely has no infinite cycles.By Part (i) of Theorem 1.4 we have

The proof of Part (iii) of Theorem 1.4
We start by giving an alternative expression for m_1, employing the tools developed by Gladkich and Peled [15].

Lemma 6.1.
Proof. Let Π_{2n+1} ∼ Mallows(2n + 1, q). By Lemma 5.1 we have the corresponding identity, where Σ ∼ Mallows(Z, q). Now, by Lemma 2.5 and Proposition 2.6 we have for all s ≥ 0 that the summands on the right-hand side of (11) converge pointwise to ν_s q^{2s}(1 − q) as n → ∞, and they are uniformly bounded by 1 for all n. By the bounded convergence theorem we thus conclude the first claim. Next, we will show that m_1 = 1 − 2q + O(q^2) as q ↓ 0 by analyzing (10). Let K_q denote the denominator in the expression for ν given in (3). Define t_s := Σ_{i=1}^{s} q^{2i−1}/(1 − q^i)^2. The t_s satisfy for s ≥ 1 a recursion relation, so that by m_1 ≥ ν_0(1 − q) we obtain a lower bound. We also have a simple matching upper bound. The function (1 − q)^3/(1 − q + q^2) is infinitely differentiable at q = 0, where its first derivative equals −2.
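To make the last step explicit, the q ↓ 0 expansion can be verified directly (this is our own check, using only the function stated above):

```latex
\frac{(1-q)^3}{1-q+q^2}
  = \left(1 - 3q + 3q^2 - q^3\right)\left(1 + q + O(q^3)\right)
  = 1 - 2q + O(q^2),
```

since 1/(1 − q + q²) = 1 + (q − q²) + (q − q²)² + O(q³) = 1 + q + O(q³). In particular the first derivative at q = 0 equals −2, consistent with m_1 = 1 − 2q + O(q²).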
7 The proof of Theorem 1.5

7.1 The proof of Part (i) of Theorem 1.5

We start by proving the existence of a coupling in the same spirit as the couplings used in previous proofs.

Lemma 7.1. Let 0 < q < 1 and let Π ∼ Mallows(N, q) and Σ ∼ Mallows(Z, q). There exists a coupling between Π and Σ such that the stated property holds; moreover, the log² n appearing there can be replaced with any function going to ∞ with n.
Proof. We can run both Π and Σ until both processes regenerate simultaneously, in the sense that Π(i) ≥ k for all i ≥ k and Σ(i) ≥ k for all i ≥ k. After this time, both processes have the same distribution, and so can be coupled to be equal. The first time that both processes regenerate is equal in distribution to the hitting time of (0, 0) for the Markov chain given by two independent copies of M_i, where we start by first running both chains until Σ regenerates (so the Markov chain starts from a random state (0, X), where X is the state of the second copy stopped at the first time that Σ regenerates). Since the product chain is still positive recurrent, this hitting time is finite almost surely, and the result follows.
We also have P(F) = 1 − o(1), by an application of Part (vii) of Lemma 2.2 and the union bound. Hence, if A_j denotes the event that j lies in an i-cycle of Π • Π′ that is completely contained in [n/2], and B_j denotes the event that j lies in an i-cycle of Σ • Σ′, then the two probabilities agree up to o(1), using Part (iv) of Lemma 2.2 and the remarks following that lemma for the last identity (applied to both Σ and Σ′). Dividing LHS and RHS by n and sending n → ∞ (and recalling (14)) gives the claimed limit. Finally, we briefly clarify how the expression µ_2 = Σ_{i∈Z} P(Σ(0) = i)^2 is obtained. This follows using that Σ, Σ′ are i.i.d. and Part (iv) of Lemma 2.2 (and the remarks following that lemma) for the second identity in the corresponding computation, and again the remarks following Lemma 2.2 for the last identity.

The proof of Part (ii) of Theorem 1.5
Let ℓ ≥ 1 and define Y_1 accordingly. The function Y_1 counts the number of elements in [X_1] ∪ {n − X_1 + 1, ..., n} that are contained in even cycles of length at most 2ℓ. If X_1 ≤ ℓ then we have equality. We have EX_1 = a_ℓ + b_ℓ. We also have the bounds 2a_ℓ ≤ EY_1 ≤ 2a_ℓ + 2b_ℓ. As EX_1 < ∞, for any ε > 0 we can choose ℓ_0 = ℓ_0(ε) large enough so that for every ℓ > ℓ_0 we have b_ℓ < ε. As 1 ≤ EX_1, having chosen ℓ_0 sufficiently large, we can also ensure a_ℓ ≥ 1 − ε for all ℓ > ℓ_0. In this case

Let us define

A_j := {j is the largest element of a 2r-cycle of Π_n}.
(To clarify what is meant by j being the largest element: we simply mean j = max(i_1, ..., i_{2r}) if i_1, ..., i_{2r} is a 2r-cycle as above.) We consider the iterative procedure for generating Π*_n. When we generate the image Π*_n(j), having already determined Π*_n(1), ..., Π*_n(j − 1), it may no longer be possible for j to be the largest element of some 2r-cycle. If it is still possible, then Π*_n(j) must be one specific value among the still-available ones (to be precise, (r_n • Π*_n)^{−2r−1}(j)). Since we sample according to a truncated geometric distribution, the probability is thus at most 2(1 − 1/q), where the last inequality holds for all j ≤ n − log² n. It follows that dividing by n, sending n → ∞ and recalling (15) shows µ_{2r} ≤ 2(1 − 1/q). In particular lim_{q↓1} µ_{2r} = 0.
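For concreteness, the iterative (truncated geometric) procedure for sampling a Mallows permutation referred to above can be sketched as follows. This is a standard construction and the helper names are ours: at step j the image is taken to be the i-th smallest still-available value with probability proportional to q^{i−1}.

```python
import random

def sample_mallows(n, q, rng):
    """Sample Mallows(n, q): assign pi(j), for j = 1..n in order, to the
    i-th smallest unused value with probability proportional to q**(i-1),
    i.e. a geometric distribution truncated to the remaining values."""
    available = list(range(1, n + 1))
    pi = []
    for _ in range(n):
        m = len(available)
        weights = [q ** i for i in range(m)]  # unnormalized truncated geometric
        i = rng.choices(range(m), weights=weights)[0]
        pi.append(available.pop(i))
    return pi

def cycle_counts(pi, max_len):
    """Number of cycles of each length 1..max_len in a permutation pi
    given in one-line notation on {1, ..., n}."""
    seen = [False] * len(pi)
    counts = [0] * (max_len + 1)
    for s in range(len(pi)):
        if seen[s]:
            continue
        length, j = 0, s
        while not seen[j]:
            seen[j] = True
            j = pi[j] - 1
            length += 1
        if length <= max_len:
            counts[length] += 1
    return counts[1:]

rng = random.Random(0)
pi = sample_mallows(200, 0.9, rng)
print(cycle_counts(pi, 4))  # counts of 1-, 2-, 3-, 4-cycles
```

The same sampler works for q > 1 (the weights simply grow), which is the regime relevant to this section.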

8.2 The proof of Part (ii) of Theorem 1.6

We will make use of the following relatively elementary observation.
The last lemma allows us to give a short proof of the following explicit bounds on c_e, c_o, which will immediately imply that the q ↓ 1 limits equal 1/2.

Proof of Proposition 9.1, Parts (i) and (iii). The proofs of (i) and (iii) are very similar; we start with (i). Consider Π_n ∼ Mallows(I_n, q) with I_n := {−n, ..., n}. By Part (ii) of Lemma 2.2, it suffices to bound lim inf_{n→∞} of the relevant probability for ρ • Π_n from below by a term of order (1 − q)^k. For J ⊆ Z let S_J denote the set of all permutations of J. Consider the set of all permutations π ∈ S_{I_n} constructed as follows: we pick arbitrary permutations σ ∈ S_{{−n,...,−k}}, σ′ ∈ S_{{k+1,...,n}} and combine them as indicated. Notice such a permutation satisfies the required property. We have

Now we recall that
(This last inequality can easily be seen using the Taylor expansion of log(1 − x).) Part (i) follows.
The proof of (iii) is essentially the same as that of (i). Now we put the elements of {−k, ..., k} in reverse order, put arbitrary permutations on {−n, ..., −k − 1} and {k + 1, ..., n}, and the proof carries through with only minor adaptations in notation.
We next turn our attention to Part (ii) of Proposition 9.1. This proof is a bit more involved, and we break it down into several steps. The first step is the following observation.

Lemma 9.2. For 0 < q < 1 we have the following.

Proof. This follows immediately from the bound on displacements, P(|Σ(i) − i| > m) = Θ(q^m), as in Part (vi) of Lemma 2.2, together with the union bound.
We point out that, as Σ^{−1} and Σ follow the same distribution, we have the corresponding identity.

Lemma 9.3. There is a constant c such that the stated bound holds.

Proof. We are again going to consider Π_n ∼ Mallows(I_n, q) with I_n = {−n, ..., n} and n large.
It follows that the corresponding identity holds; in other words, the probability can be rewritten as indicated. Note that for A_n to hold, when choosing the image of j_1 we must "skip" at least the first r − 1 available numbers, when choosing the image of j_2 we must skip at least r − 2 available numbers, and so on.
We let c be a large constant, to be specified later on. Suppose that, for some c ≤ a ≤ r − c, there exists a j with j_a < j < j_{a+1}. If both A_n and Π_n(j) > −j_a are to hold, then when determining Π_n(j) we must skip over the (still available) numbers −j_r, ..., −j_{a+1}; that is, we skip over at least the first r − a of the available numbers. If, on the other hand, A_n and Π_n(j) < −j_a are to hold, then each of j_1, ..., j_a must have skipped an additional available number when its image was determined.
It follows that the displayed bound holds, where in the last line we assume without loss of generality that c has been chosen large enough that q^{c/2} < 1/2 (which implies that q^{r−a} + q^a ≤ 2q^{min(a,r−a)} ≤ q^{(1/2) min(a,r−a)} for all c ≤ a ≤ r − c). Combining the bounds on P(A_n | B_n) and P(B_n), the inequality in the lemma follows.
We make an additional definition. We observe that, analogously to (17), we have p^{y→x}_{j,i} = p^{x→y}_{i,j}. Since r • Σ • r (the map i ↦ −Σ(−i)) also has the same distribution as Σ, we have in addition the corresponding symmetry.

Lemma 9.4. We have:
(i) If x < 0 and y > 0 are such that |x|, y ∈ {i_1, ..., i_ℓ}, then p^{x→y}_{i,j} ≤ (1 − q)^{ℓ+r} q^{Ψ(i,j)+|x|+y}.
Proof. We start with the proof of (i), which is nearly identical to the proof of the previous lemma; we only mention the changes that need to be made. Now, we define the analogous event, and let m ≥ 0 be such that i_a > x for all a > m.
We have the analogous estimates, from which we can conclude as before. For the proof of (ii), we again proceed similarly. This time we define the analogous event, and let m ≥ 0.8r be such that j_1, ..., j_m < x < j_{m+1} (where the upper bound is void if m = r). We can compute P(D_n | B_n) in a manner analogous to the way we determined P(A_n | B_n) in the proof of Lemma 9.3. When computing P(A_n | B_n), in the term for the m-th gap the choice of Π_n(x) contributed a factor q^{(1/2) min(m,r−m)}/(1 − q^{n−x}) in case m ≤ r − c, and contributed a factor one otherwise. If we know that Π_n(x) ≤ j_{r/2}, then we can replace this by q^{r/2}, as all of j_1, ..., j_{r/2} must have skipped over the additional available number y. This gives the stated bound; hence also the symmetric bound, and the result follows.
For notational convenience we introduce notation for the gaps between consecutive entries of i and j. This gives the following alternative expression for the upper bound on p_{i,j}. Next we establish that the probability that C_1(r • Σ) ≥ 2k and Σ(0) = 0 is small compared to the target expression (1 − q)^{2k} · q^{\binom{2k}{2}}.
We proceed by showing that the probability that (a) C_1(r • Σ) ≥ 2k, (b) Σ(0) = 0, and (c) the number of fixed points of r • Σ below zero differs by more than one from the number of fixed points of r • Σ above zero, is small compared to the target expression (1 − q)^{2k} · q^{\binom{2k}{2}}.
Proof. Arguing as in the previous lemma, the sought sum is at most q^{ℓ+1−r}.
Proof. If i_a ≠ a for some 1 ≤ a ≤ 0.9k, then there is also a 1 ≤ a ≤ 0.9k for which g_a ≥ 1. For this a we have 2(k + 1 − a) · g_a ≥ 0.2k. Also note the corresponding identity. Arguing as in previous lemmas, it follows that the sought sum is at most the stated bound. (The polynomial term k^{O(1)} also absorbs the k ways of choosing an index a for which g_a ≥ 1 and the k^3 ways of choosing a value for g_a.)

Lemma 9.8. We have

Σ_{0<i_1<···<i_k<k^3, 0<j_1<···<j_{k−1}<k^3, j_a=j_{a−1}+1 for some 0.1k ≤ a ≤ 0.9k} p_{i,j} = o((1 − q)^{2k} · q^{\binom{2k}{2}}), as k → ∞.
(The polynomial term also absorbs the 0.9k ways of choosing a and the k^3 ways of choosing h_a.)

Lemma 9.9. We have the following as k → ∞.
In case b-1) the same bound on p^{x→y}_{o,i,j} applies as in case a), and via similar computations we obtain that the corresponding contribution to the sum is at most k^{O(1)} (1 − q)^{2k−1} q^{\binom{2k}{2}+r/4} = o((1 − q)^{2k} q^{\binom{2k}{2}}).
To deal with case b-3) we note that in this case

Figure 2: A plot of µ_2 versus q. The simulations were done by sampling a Mallows(1000, q) distribution 10000 times and taking the average number of 2-cycles.

Figure 3: The graphs of c_o and c_e for q > 1. Simulations were done for n = 1000 and n = 1001, each sampled 10000 times.

Figure 4: A sample of Mallows(N, q). The red squares indicate the first times that an interval is sent to itself; the lengths and contents of the red squares are independent and identically distributed.
and similarly for r_n • Π′_{n/2} • r_n and intervals of the form [k, n]. Thus, the number of i with disagreements is bounded by the length of the smallest interval [a, b] with a ≤ n/2 ≤ b such that Π_n([a]) = [a] and Π_n([b, n]) = [b, n].

Σ_i P[0 in an i-cycle of Σ] = P[0 lies in a finite cycle of Σ] = 1, the last equality due to Lemma 5.2.