Random walks on the circle and Diophantine approximation

Abstract. Random walks on the circle group R/Z whose elementary steps are lattice variables with span α ∉ Q or p/q ∈ Q taken mod Z exhibit delicate behavior. In the rational case, we have a random walk on the finite cyclic subgroup Z_q, and the central limit theorem and the law of the iterated logarithm follow from classical results on finite state space Markov chains. In this paper, we extend these results to random walks with irrational span α, and explicitly describe the transition of these Markov chains from finite to general state space as p/q → α along the sequence of best rational approximations. We also consider the rate of weak convergence to the stationary distribution in the Kolmogorov metric, and in the rational case observe a phase transition from polynomial to exponential decay after ≈ q² steps. This seems to be a new phenomenon in the theory of random walks on compact groups. In contrast, the rate of weak convergence to the stationary distribution in the total variation metric is purely exponential.


Introduction
Convergence of random walks on compact groups to the uniform distribution (Haar measure) is a classical topic in probability theory. Such a convergence takes place under very general assumptions, but its finer properties are rather intricate. A classical example is the cutoff phenomenon of Aldous and Diaconis [2], [3], [16, Chapter 4] for random walks on the finite symmetric group corresponding to card shuffling, when the distance from uniformity in the total variation metric remains close to 1 for a long time, then drops almost immediately near 0. In this paper we consider random walks on the circle group R/Z, which also exhibit surprisingly delicate phenomena.
Let X_1, X_2, … be independent, identically distributed (i.i.d.) nondegenerate integer-valued random variables, and put S_k = ∑_{j=1}^{k} X_j. Given an irrational α, the sequence S_kα (mod Z) is a random walk on R/Z whose asymptotic behavior depends sensitively on the Diophantine approximation properties of α. In the terminology of probability theory, S_kα (mod Z) is a discrete time Markov chain on a general state space, with the uniform distribution on R/Z as its stationary distribution. Note that the chain is irreversible, and, as we will see, its terms are polynomially weakly dependent. We can also consider a discrete analogue by choosing a reduced fraction p/q instead of an irrational α. In this case S_k p/q (mod Z) is a random walk (in particular, a discrete time Markov chain) on the finite cyclic group Z_q := {0, 1/q, …, (q−1)/q}. The main purpose of this paper is to study the asymptotic behavior of these random walks, and to explicitly describe the transition from finite to general state space as p/q → α along the sequence of best rational approximations to a given irrational α.
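As a quick illustration (not taken from the paper), the chain S_kα (mod Z) is easy to simulate; in this sketch the step distribution X_1 uniform on {1, 2} and the specific choice of α are hypothetical:

```python
import random

random.seed(0)
alpha = (5 ** 0.5 - 1) / 2          # an irrational rotation number

def walk(num_steps):
    """Simulate S_k * alpha mod 1 with i.i.d. steps X_j uniform on {1, 2}."""
    s, path = 0, []
    for _ in range(num_steps):
        s += random.choice([1, 2])  # X_j
        path.append((s * alpha) % 1.0)
    return path

points = walk(10_000)
# The empirical distribution is close to uniform: each third of the circle
# receives roughly one third of the points.
counts = [sum(1 for x in points if i / 3 <= x < (i + 1) / 3) for i in range(3)]
```

The same loop with a reduced fraction p/q in place of α produces the finite state space chain on Z_q.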
In the rational case it is natural to define the distance from uniformity in the "discrete Kolmogorov metric" as ψ_disc(k) := max_{0≤a<q} |Pr({S_k p/q} ≤ a/q) − (a+1)/q|, where the maximum is over integer values of a. Note that the discrete Kolmogorov metric metrizes weak convergence on Z_q, and so ψ_disc(k) → 0 if and only if the maximal span of X_1 (the largest integer D such that D | (X_1 − X_2) a.s.) is relatively prime to q. Our first result establishes an unexpected behavior in the discrete setting: a transition from polynomial to exponential decay. This behavior has not yet been described for a random walk on any compact group in any probability metric.
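Since the chain lives on the finite set Z_q, ψ_disc(k) can be computed exactly by convolving step distributions; a minimal sketch, with the hypothetical choice X_1 uniform on {1, 2}:

```python
def psi_disc(p, q, k):
    """Exact psi_disc(k) for the walk S_k * p/q mod 1, X_1 uniform on {1, 2}."""
    step = [0.0] * q
    for n in (1, 2):                          # support of X_1
        step[(n * p) % q] += 0.5
    dist = [0.0] * q
    dist[0] = 1.0
    for _ in range(k):                        # k-fold convolution on Z_q
        new = [0.0] * q
        for a, pa in enumerate(dist):
            if pa:
                for b, pb in enumerate(step):
                    if pb:
                        new[(a + b) % q] += pa * pb
        dist = new
    acc, worst = 0.0, 0.0
    for a in range(q):                        # max CDF deviation from uniform
        acc += dist[a]
        worst = max(worst, abs(acc - (a + 1) / q))
    return worst

# Here gcd(D, q) = gcd(1, 5) = 1, so the chain converges; after many steps
# the distance from uniformity is essentially zero.
val = psi_disc(3, 5, 200)
```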
Theorem 1. Assume that EX_1² < ∞, and let ϕ(x) = Ee^{ixX_1} denote the characteristic function of X_1. There exists a constant q_0 depending only on the distribution of X_1 with the following property. If p/q is a reduced fraction such that q ≥ q_0, the maximal span D of X_1 and q are relatively prime, and min_{0<|h|≤q/2} |h|·‖hp/q‖ ≥ A > 0 with some constant A > 0, then ψ_disc(k) ≍ k^{−1/2} for 1 ≤ k ≤ q², and ψ_disc(k) ≍ |ϕ(2π/(Dq))|^k for k ≥ q². The implied constants in the lower bounds depend only on the distribution of X_1; the implied constants in the upper bounds depend, in addition, on A.
Note that |ϕ(2π/(Dq))|^k is roughly exp(−ck/q²) with c = 2π²(Var X_1)/D². The value of A > 0 depends only on the maximal partial quotient in the continued fraction expansion of p/q, but not on its length. In particular, if p/q is a best rational approximation to a given badly approximable irrational α, this condition is satisfied with an A > 0 depending only on α. In this case, for the first few steps the Markov chains S_kα (mod Z) and S_k p/q (mod Z) are practically indistinguishable. Theorem 1 describes with striking precision that it takes constant times q² steps for S_k p/q (mod Z) to start to behave like a Markov chain on a finite state space; exponentially fast convergence to the stationary distribution is a hallmark property of finite state space Markov chains. We have an explicitly described transition from rational to irrational behavior: as p/q → α, the time of transition, constant times q², goes to infinity; in the limiting case of S_kα (mod Z) we end up with the polynomial decay k^{−1/2} for all k ≥ 1. Obvious modifications of the proof of Theorem 1 yield similar results under more general Diophantine conditions on p/q, and for heavy-tailed X_1.
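The relation |ϕ(2π/(Dq))|^k ≈ exp(−ck/q²) with c = 2π²(Var X_1)/D² is easy to check numerically; a sketch assuming X_1 uniform on {1, 2} (so D = 1 and Var X_1 = 1/4):

```python
import cmath, math

support, probs = [1, 2], [0.5, 0.5]         # X_1 uniform on {1, 2}

def phi(t):
    """Characteristic function E exp(i t X_1)."""
    return sum(p * cmath.exp(1j * t * n) for n, p in zip(support, probs))

D, var = 1, 0.25
c = 2 * math.pi ** 2 * var / D ** 2         # c = 2 pi^2 Var(X_1) / D^2

q = 1000
k = q * q                                   # on the order of q^2 steps
exact = abs(phi(2 * math.pi / (D * q))) ** k
approx = math.exp(-c * k / q ** 2)
```

For k of order q² the two quantities agree to within a percent.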
In contrast, for any q ≥ q_0 the distance ψ_TV(k) of the distribution of {S_k p/q} from uniformity in the total variation metric satisfies the two-sided exponential estimate (2), where q_0 and the implied constants depend only on the distribution of X_1, provided that EX_1² < ∞ and that D and q are relatively prime. The rate does not depend on the Diophantine properties of the fraction p/q; indeed, multiplication by an integer p relatively prime to q is a bijection of Z_q, and thus does not change the distance in total variation. In particular, it takes constant times q² steps to get close to uniformity in the total variation metric as well; however, there is no transition from polynomial to exponential decay. Note that in the irrational setting S_kα (mod Z) does not converge to the uniform distribution on R/Z in total variation. Special cases of (2) were first proved by Chung, Diaconis and Graham [15]; see also [3] and [16, Section 3.C].
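The invariance of the total variation distance under the choice of p can be confirmed directly, since multiplication by p merely permutes Z_q; a sketch with the same hypothetical step distribution X_1 uniform on {1, 2}:

```python
def dist_after_k(p, q, k):
    """Exact distribution of S_k * p/q mod 1 on Z_q, X_1 uniform on {1, 2}."""
    step = [0.0] * q
    for n in (1, 2):
        step[(n * p) % q] += 0.5
    dist = [0.0] * q
    dist[0] = 1.0
    for _ in range(k):
        new = [0.0] * q
        for a, pa in enumerate(dist):
            if pa:
                for b, pb in enumerate(step):
                    if pb:
                        new[(a + b) % q] += pa * pb
        dist = new
    return dist

def tv_from_uniform(dist):
    q = len(dist)
    return 0.5 * sum(abs(x - 1 / q) for x in dist)

q, k = 7, 30
# The same k-step distance for three different multipliers p coprime to q.
tvs = [tv_from_uniform(dist_after_k(p, q, k)) for p in (1, 2, 3)]
```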
In addition to weak convergence, we also study the empirical distribution by considering additive functionals, that is, sums of the form ∑_{k=1}^{N} f̄(S_kα), where f̄ = f − E(f), and U is a random variable uniformly distributed on R/Z, independent of X_1, X_2, … . Our second result is the central limit theorem (CLT) and the law of the iterated logarithm (LIL) for the sum ∑_{k=1}^{N} f̄(S_kα).
Note that (4) expresses convergence in distribution to the mean zero normal distribution with variance σ² (interpreted as the constant 0 if σ = 0). In addition to badly approximable irrationals, the Diophantine condition is satisfied by all algebraic irrationals and also by a.e. real number in the sense of the Lebesgue measure. We also prove an almost sure approximation of the same sum by a Wiener process.
Theorem 3. Assume that the conditions of Theorem 2 hold, and let f ∈ F. After a suitable extension of the probability space, there exists a stochastic process ζ(t) in the Skorokhod space D[0, ∞) with the same distribution as ∑_{1≤k≤t} f̄(S_kα), such that ζ(t) = σW(t) + O(t^{1/2−ε}) a.s. with σ = C(α, f), a standard Wiener process W(t), and some constant ε > 0 depending only on γ.
The almost sure approximation in Theorem 3 immediately implies the CLT and the LIL in Theorem 2, as well as the almost sure asymptotics and the limit distribution of more general functionals of the process ∑_{1≤k≤t} f(S_kα). Using the piecewise linear functions ∑_{1≤k≤⌊t⌋} f(S_kα) + (t − ⌊t⌋)f(S_{⌊t⌋+1}α) instead, Theorem 3 holds in the space of continuous functions C[0, ∞).
In a previous paper [6] we observed a transition in the behavior of the chain from weak to strong dependence as γ passes the critical value 2; the sequence S_kα (mod Z) behaves, from the point of view of discrepancy, like independent random variables if 1 ≤ γ < 2, but not when γ > 2; see Section 2. For this reason we conjecture that Theorems 2 and 3 in fact hold for all 1 ≤ γ < 2.
Starting the chain from its stationary distribution corresponds to U + S_kα (mod Z), where U is as in (3). This stationary sequence exhibits the same transition from weak to strong dependence: the sum ∑_{k=1}^{N} f(U + S_kα), f ∈ F, satisfies the CLT and the LIL if 1 ≤ γ < 2, but not when γ ≥ 2. The same holds in the quenched setting; that is, the sum ∑_{k=1}^{N} f(x + S_kα), f ∈ F, satisfies the CLT and the LIL for a.e. x ∈ R if 1 ≤ γ < 2, but not when γ ≥ 2. A detailed proof of these results will be given in an upcoming paper. The novelty of Theorems 2 and 3 lies in the fact that the chain is started from a nonstationary distribution; in fact, from the specific point x = 0 instead of a typical x ∈ R. Using x = 0 as starting point makes the problem considerably harder, and requires a blend of arithmetic and Fourier analytic arguments to complement classical methods of probability theory.
The rational case is of course much simpler. Indeed, if the maximal span of X_1 is relatively prime to q, then S_k p/q (mod Z) is an irreducible Markov chain on Z_q, and from classical theorems for finite state space Markov chains [14, Chapter 16] it follows that for any f ∈ F, the CLT (6) and the LIL (7) hold with σ = C(p/q, f); the only difference is that E(f) = q^{−1} ∑_{a=0}^{q−1} f(a/q) is the average of f on Z_q, and that in the definition of C(p/q, f) the variable U is uniformly distributed on Z_q. In this case C(p/q, f) ≥ 0 with equality if and only if f̄ = 0 on Z_q.
Informally, Theorem 2 states that the Markov chain S_k p/q (mod Z), whose state space is finite but of increasing size as q → ∞, begins to behave more and more like the Markov chain S_kα (mod Z) with an irrational α. What we can formally say is that key parameters of these Markov chains, such as the expected value and the variance, converge as p/q → α along the sequence of best rational approximations to a given irrational α; see Proposition 4. Once again, we have an explicitly described transition from rational to irrational behavior, with Theorem 2 corresponding to the limiting case. The rest of the paper is organized as follows. Several previous results related to Theorems 2 and 3 are listed in Section 2. We prove Theorems 2 and 3 in Section 3, and the remarks made about the rational case at the end of Section 3.3. The proofs of Theorem 1 and (2) are given in Section 4.

Related results
Given a sequence of i.i.d. R/Z-valued random variables ζ_1, ζ_2, …, the qualitative behavior of the random walk Z_k = ∑_{j=1}^{k} ζ_j (mod Z) is fairly straightforward. A classical result of Lévy [19] states that Z_k converges in distribution to the uniform distribution on R/Z if and only if the distribution of ζ_1 is not supported on a translate of a finite cyclic subgroup. The corresponding ergodic theorem is due to Robbins [24]: we have N^{−1} ∑_{k=1}^{N} f(Z_k) → ∫_0^1 f(x) dx a.s. (8) for all continuous functions f : R/Z → R if and only if the distribution of ζ_1 is not supported on a finite cyclic subgroup. Note that (8) expresses weak convergence of the empirical distribution N^{−1} ∑_{k=1}^{N} δ_{Z_k}, with δ denoting the Dirac measure; in other words, the equidistribution of the random sequence Z_k. Both facts generalize to compact groups.
Quantitative forms of the ergodic theorem (8) are more involved. In a recent paper [8] we proved that the corresponding law of the iterated logarithm holds with some constant 0 < σ < ∞ if and only if ζ_1 is nondegenerate. A remarkable perturbation method first used by Schatte [26], [27] allows more general test functions. Improving results of Schatte, in [7] we showed that the sum ∑_{k=1}^{N} f(Z_k) satisfies the CLT and the LIL for any function f ∈ F, provided that the rate of weak convergence of Z_k to uniformity in the Kolmogorov metric satisfies ψ(k) ≪ k^{−(1+ε)} with some ε > 0; we conjecture that the assumption on the rate is optimal. Similar results are known for p-Hölder functions (resp. bounded Borel measurable functions) with the Kolmogorov metric replaced by the p-Wasserstein metric (resp. total variation metric); in fact, these hold on any compact group [12], [13].
The assumption ψ(k) ≪ k^{−(1+ε)} holds for a wide class of distributions (e.g. under Cramér's condition), but not when ζ_1 is a real-valued lattice variable with finite expectation taken mod Z, as in Theorems 2 and 3. Indeed, in the latter case by the Markov inequality and the pigeonhole principle the distribution of Z_k has an atom of weight ≫ k^{−1}, and thus ψ(k) ≫ k^{−1}; under a finite variance condition we even have ψ(k) ≫ k^{−1/2}. The main goal of this paper is to establish quantitative ergodic theorems for such random walks. Instead of fast enough convergence in the Kolmogorov metric, the crucial assumption is E|X_1| < ∞ and EX_1 = 0; for technical reasons in Theorems 2 and 3 we assume the slightly stronger condition X_1 > 0 a.s. and EX_1 < ∞. Proving the CLT (4) and the LIL (5) solely under a condition on the expected value presents considerable arithmetic and analytic difficulties; the proof of Theorems 2 and 3 is much more technical than under ψ(k) ≪ k^{−(1+ε)} in [7]. We do not know whether these methods generalize to other compact groups.
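The atom bound under finite variance can be made concrete: if X_1 is uniform on {1, 2} (an illustrative choice), then S_k = k + Bin(k, 1/2), and the largest atom of S_k is a central binomial probability of order k^{−1/2}:

```python
import math

k = 400
# Largest atom of S_k = k + Bin(k, 1/2): the central binomial probability,
# which by Stirling's formula is asymptotic to sqrt(2 / (pi k)).
max_atom = math.comb(k, k // 2) / 2 ** k
stirling = (2 / (math.pi * k)) ** 0.5
```

The atom of weight ≫ k^{−1/2} forces ψ(k) ≫ k^{−1/2}, far above the k^{−(1+ε)} threshold.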
From a broader perspective, our results fit into the subject of subsequences {n_kα} of the classical {nα} sequence. For given n_k and α, quantitative equidistribution results are notoriously difficult to prove, and are known only in very special cases. Considerable effort has been made to understand the case of a randomly chosen α. R. Baker [4] showed that for any strictly increasing sequence of positive integers n_k, the mod 1 discrepancy satisfies D_N(n_kα) ≪ N^{1/2}(log N)^{3/2+ε} for a.e. α; this is known to be sharp up to factors of log N. For a fixed f ∈ F with ∫_0^1 f(x) dx = 0, Lewko and Radziwiłł [20] have recently improved this to (9): ∑_{k=1}^{N} f(n_kα) ≪ N^{1/2}(log log N)^{3/2+ε} for a.e. α, which is known to be sharp up to factors of log log N; see also [1] and [9]. For lacunary sequences n_k both D_N(n_kα) and ∑_{k=1}^{N} f(n_kα) with a fixed f ∈ F satisfy a sharp LIL for a.e. α; see Philipp [23].
In contrast, our results concern the case when n_k is random and α is deterministic; note that under X_1 > 0 a.s., {S_kα} is a random subsequence of {nα}. In a recent paper [6] we found the discrepancy D_N(S_kα) up to logarithmic factors for a large class of distributions; an analogue of Baker's theorem. Theorems 2 and 3 in the present paper represent an improvement for a fixed f ∈ F similar to (9). In particular, for certain special distributions with E|X_1| < ∞, EX_1 = 0 and an irrational α satisfying 0 < liminf_{h→∞} h^γ‖hα‖ < ∞, in [6] we showed that D_N(S_kα) is, up to logarithmic factors, N^{max{1/2, 1−1/γ}} a.s. We thus have a transition from weak dependence to strong dependence at γ = 2. We also mention that in [7] we actually proved that D_N(Z_k) satisfies the LIL and found the nondegenerate limit distribution of N^{−1/2}D_N(Z_k) under the assumption ψ(k) ≪ k^{−(1+ε)}; the assumptions of Theorem 2, however, seem not to be strong enough to find the sharp asymptotics of the discrepancy.
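The condition 0 < liminf_{h→∞} h^γ‖hα‖ < ∞ with γ = 1 holds exactly for the badly approximable numbers; for the golden ratio, whose partial quotients are all 1, the quantity h‖hα‖ stays above 1/3, which a quick numerical probe confirms (illustration only):

```python
alpha = (1 + 5 ** 0.5) / 2                  # golden ratio, badly approximable

def dist_to_nearest_int(x):
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

# h * ||h alpha|| stays bounded away from 0; when all partial quotients are 1
# it never drops below 1/3.
m = min(h * dist_to_nearest_int(h * alpha) for h in range(1, 100_000))
```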
The growth rate of the integer sequence S_k plays an important role in our setup as well. Indeed, under the condition X_1 > 0 a.s. and EX_1 < ∞ of Theorems 2 and 3, S_k is a linearly increasing sequence of integers, and we have the CLT (4) and the LIL (5) with a fixed f ∈ F. If X_1 has heavy tails Pr(|X_1| ≥ x) ∼ cx^{−β} with some 0 < β < 1 and c > 0 instead, and inf_{h∈Z\{0}} |h|^γ‖hα‖ > 0 with some γ < 1/β, then by the results in [5] we have ψ(k) ≪ k^{−1/(βγ)} ≪ k^{−(1+ε)}, and consequently the precise asymptotics of D_N(S_kα) is also known. Note that in this heavy-tailed case S_k grows, in a stochastic sense, roughly at the rate k^{1/β}. In particular, for a random version of Philipp's LIL for the discrepancy, polynomial growth suffices instead of lacunarity.

Proof of Theorems 2 and 3
Throughout this section X_1, X_2, … is a sequence of i.i.d. nondegenerate integer-valued random variables with characteristic function ϕ(x) = Ee^{ixX_1}, and S_k = ∑_{j=1}^{k} X_j. Further, e(x) = e^{2πix}, and f̂(h) = ∫_0^1 f(x)e(−hx) dx, h ∈ Z, are the Fourier coefficients of f. Finally, V(f) denotes the total variation of f on [0, 1].

A lemma on characteristic functions
In this section we prove a technical lemma on the characteristic function ϕ. Let supp X_1 = {n ∈ Z : Pr(X_1 = n) > 0} denote the support of (the distribution of) X_1. Further, let gcd(A) denote the positive greatest common divisor of a set A ⊆ Z. The implied constants in (i) and (ii) below depend only on the distribution of X_1.
Proof. Throughout this proof, constants and implied constants depend only on the distribution of X_1. Replacing X_1, X_2, … by X_1/d, X_2/d, …, we may assume that d = 1. In particular, the smallest period of ϕ(2πx) is 1. Note that X_1 − X_2 has characteristic function |ϕ|², and supp(X_1 − X_2) = supp X_1 − supp X_1. It is thus easy to deduce that |ϕ(2πx)| = 1 if and only if x = n/D with some integer n. In fact, we have the estimate (10). Here E(X_1 − X_2)² > 0 (possibly infinite) as X_1 is nondegenerate, therefore (10) holds in an open neighborhood of 0; by periodicity and continuity (10) holds for all x ∈ R.
Furthermore, there exists an integer a such that X_1 ≡ a (mod D) a.s., and the assumption d = 1 ensures that a and D are relatively prime. Hence for any integer n we have e(nX_1/D) = ω^n a.s. with the primitive Dth root of unity ω = e(a/D). This means that within a period x ∈ [0, 1) the curve ϕ(2πx) touches the unit circle at each Dth root of unity exactly once; moreover, the derivative at these points is nonzero. We now prove (i). The derivative ϕ′ is uniformly continuous, which in turn ensures that the convergence is uniform as |x − y| → 0. Thus it is not difficult to see that there exists a constant δ > 0 such that (12) holds for any x, y. Since |ϕ(2πx)| ≠ 1 outside J, using the compactness of [0, 1]\J and the periodicity of ϕ we also have |ϕ(2πx)| ≤ 1 − r for every x ∈ R\J with some constant r > 0. Finally, suppose that, say, x ∈ U and y ∉ U; if y ∈ J, then (12) and (13) apply. This finishes the proof of the first claim in (i). The second claim in (i) follows e.g. from setting y = 0 and letting N → ∞ in the first claim (although it would not be difficult to give a direct proof).
To prove (ii), first note that B and B′ play almost symmetric roles; thus we may assume that B ≤ B′, and so B ≤ √(B + B′). From (11) we deduce the asymptotics (14). Let, say, K = 3π|EX_1|, and recall (10). There exists a constant δ′ > 0 such that every term in (14) has absolute value less than 1/2, and so (15) holds with some constant L > 0, where n = n(x) is the integer for which |x − n/D| < δ′. We may assume Kδ′ < 1. As in the proof of (i), for any x ∈ R\J′ we have |ϕ(2πx)| ≤ 1 − r′ with some constant r′ > 0. To proceed, we will also need the simple estimate (16), which holds with implied constant e − 2 (this can be seen e.g. by expanding the exponential function). From (15) we similarly obtain the required upper and lower bounds; here ω^{−nB} = 1 because B was assumed to be divisible by D, and we are done.
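The fact underlying the proof, that |ϕ(2πx)| = 1 exactly when Dx ∈ Z, can be checked on a toy example; here X_1 is uniform on {0, 3}, so D = gcd(supp X_1 − supp X_1) = 3 (an illustrative choice):

```python
import cmath, math

def phi(t):
    """Characteristic function of X_1 uniform on {0, 3}."""
    return 0.5 * (1 + cmath.exp(3j * t))

# |phi(2 pi x)| = 1 at x = n/D with D = 3 ...
on_circle = [abs(phi(2 * math.pi * n / 3)) for n in range(1, 6)]
# ... and |phi(2 pi x)| < 1 away from these points.
off_circle = abs(phi(2 * math.pi * 0.2))
```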

An exponential sum
In this section we approximate a general sum involving f by an exponential sum.
where the supremum and the infimum are over all intervals J ⊆ R of length λ(J) = |y|, and I*_J(x) = ∑_{n∈Z} I_J(x + n) denotes the indicator of J extended with period 1. Proof. For any f ∈ F and any random variables X and Y, we have (17), where {·} denotes fractional part. This fact is usually stated when the distribution of X is finitely supported with equal weights, and Y is uniformly distributed on [0, 1]; see Koksma's inequality [18, p. 143]. The general case formally follows from integration by parts; for a detailed proof see [7, Lemma 1].
Let us apply (17) to the random variables X and Y with distribution N^{−1} ∑_{k=1}^{N} δ_{x_k} and N^{−1} ∑_{k=1}^{N} δ_{x_k−y}, respectively. If 0 ≤ y ≤ 1/2, then the required estimate holds for any x ∈ [0, 1]. A similar argument shows that the same holds if −1/2 ≤ y ≤ 0, and the claim follows.
Lemma 3. Let f ∈ F, and let x_1, x_2, …, x_N be points with ‖x_k − x_ℓ‖ ≥ r for all k ≠ ℓ; then the stated approximation by an exponential sum holds with a universal implied constant.
Proof. Let F_H(x) = ∑_{|h|<H} (1 − |h|/H) e(hx) denote the Fejér kernel, and recall the convolution identity. Applying this with x = x_1, x_2, …, x_N and using the fact that the total integral of F_H on [−1/2, 1/2] is 1, the error term in the claim can be written in an explicit form. The assumption ‖x_k − x_ℓ‖ ≥ r, k ≠ ℓ, and the pigeonhole principle imply that the periodic extension of any interval of length |y| contains at most |y|/r + 1 of the points x_1, x_2, …, x_N. According to Lemma 2 we thus have the desired bound. The claim then follows from the estimate ∫_{−1/2}^{1/2} |y|F_H(y) dy ≪ log H/H, which can be seen directly from the definition of F_H.
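The two properties of the Fejér kernel used above, nonnegativity and total integral 1, both follow from its closed form F_H(x) = (1/H)(sin(πHx)/sin(πx))², a standard identity that a quick numerical check confirms (illustration, not specific to the paper):

```python
import cmath, math

def fejer(H, x):
    """Fejer kernel F_H(x) = sum_{|h|<H} (1 - |h|/H) e(hx), e(x) = e^{2 pi i x}."""
    return sum((1 - abs(h) / H) * cmath.exp(2j * math.pi * h * x)
               for h in range(-H + 1, H)).real

def fejer_closed(H, x):
    """Closed form (1/H) * (sin(pi H x) / sin(pi x))^2, valid for x not in Z."""
    return (math.sin(math.pi * H * x) / math.sin(math.pi * x)) ** 2 / H

samples = [0.013, 0.2, 0.41, 0.77]
values = [fejer(8, x) for x in samples]
```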

The variance
In this section we prove two lemmas closely related to the variance of ∑_{k=1}^{N} f(S_kα). In particular, we find the variance of the corresponding stationary process ∑_{k=1}^{N} f(U + S_kα), and prove the properties of the constant C(α, f). At the end of the section we prove the remarks made in the Introduction about the rational case.
We will need the fact that the distance from the nearest integer function ‖·‖ is symmetric and subadditive; that is, ‖−x‖ = ‖x‖ and ‖x + y‖ ≤ ‖x‖ + ‖y‖ for any x, y ∈ R. Further, we will need the classical Diophantine estimate (18), which applies to any irrational α such that inf_{h∈Z\{0}} |h|^γ‖hα‖ > 0 and any integer H > 1, with implied constants depending only on α and γ; a similar estimate holds for related Diophantine sums. For a detailed proof of (18) we refer to [6, Corollary 4.3]; for related results on Diophantine sums see [18, Chapter 2]. Lemma 4. Assume that E|X_1| < ∞ and EX_1 = 0, and let α be irrational such that inf_{h∈Z\{0}} |h|^γ‖hα‖ > 0 with some constant γ ≥ 1. Further, let H > 1 be an integer, and let c_h ∈ C, 0 < |h| < H, be a finite sequence such that |c_h| ≤ 1/|h| for every h. Then for any integers M ≥ 0 and N ≥ 1, the stated estimate holds with implied constants depending only on the distribution of X_1, α and γ.
Proof. Let E_H denote the error term in the claim; here E_H = H^{2γ−2} if γ > 1, with a logarithmic term when γ = 1. The classical estimate (18) shows that the contribution of the terms with ‖jα‖ ≤ ‖hα‖/2 is O(E_H); a similar claim holds if ‖hα‖ ≤ ‖jα‖/2. Thus it is enough to consider the terms for which ‖jα‖ and ‖hα‖ are equal up to a factor of 2. In particular, we need to estimate the sum (21) over 0 < |j|, |h| < H, j ≠ h. This is symmetric in j, h, thus it is enough to consider the terms for which, say, |j| ≤ |h|. For any such term |h| ≥ |j − h|/2. Hence summing over j and i = j − h ≠ 0 instead of j and h, we get from (18) that (21) is O(E_H). The total contribution of all off-diagonal terms j ≠ h in (19) is thus O(E_H). Next, we estimate the diagonal terms j = h in (19). Using (20) and the fact that ϕ(−2πhα) is the complex conjugate of ϕ(2πhα), after some simplification we arrive at the main term. Finally, applying Lemma 1 (i) and the classical estimate (18) we get that the total contribution of the error term in (19) also satisfies the bound O(E_H). Lemma 5. Assume that E|X_1| < ∞ and EX_1 = 0, and let U be uniformly distributed on R/Z, independent of X_1, X_2, … . Further, let α be irrational such that inf_{h∈Z\{0}} |h|^γ‖hα‖ > 0 with some constant 1 ≤ γ < 2.
(i) For any f ∈ F the infinite series in (3) is convergent, and C(α, f ) ≥ 0 with equality if and only if f = 0 a.e.
(ii) For any f ∈ F and any integer N ≥ 1, the stated asymptotics holds with implied constants depending only on the distribution of X_1, α and γ.
Proof. Let d = gcd(supp X_1) and D = gcd(supp X_1 − supp X_1). Replacing X_1, X_2, … by X_1/d, X_2/d, … and α by dα, we may assume that d = 1. We may also assume that E(f) = 0. Since the variable U is independent of X_1, X_2, …, we have (22) for any integer h, and f̂(0) = 0. Further, integration by parts shows that |f̂(h)| ≤ V(f)/(2π|h|) for any integer h ≠ 0. In particular, the Fourier series of g is absolutely convergent. Clearly g is continuous, hence the Fourier series of g converges uniformly to g, and we can expand accordingly. First, we prove (i). Recall from the proof of Lemma 1 that ϕ(2πx) = 1 if and only if x ∈ Z; also, |ϕ(2πx)| = 1 if and only if Dx ∈ Z. In particular, |ϕ(2πhα)| < 1 for any h ≠ 0. For any positive integer K we thus have (23). Using Lemma 1 (i), the classical estimate (18) and a dyadic decomposition, in the case 1 < γ < 2 we obtain a uniform bound; a similar estimate holds if γ = 1. We can thus take the limit as K → ∞ in (23) to obtain (24), and the convergence of the series in (3) follows. Combining the h and −h terms in (24) and using Ef(U) = 0, we arrive at the claimed formula. Further, C(α, f) = 0 if and only if f̂(h) = 0 for all integers h ≠ 0; the latter condition is equivalent to f = 0 a.e.
Next, we prove (ii). We may assume E(f) = 0. Expanding the square and using (22), we evaluate the inner sum explicitly. By Lemma 1 (i) the second term on the right hand side of the previous line has a controlled absolute value: from |f̂(h)| ≤ 1/|h|, the classical estimate (18) and a dyadic decomposition, in the case 1 < γ < 2 we obtain the stated error term; in the case γ = 1 the same holds with error term O(log N). Combining the h and −h terms, the claim follows.
Also, the zeroes of the function above are at points x such that Dx ∈ Z but x ∉ Z. In the special case D = 1 we thus also have (1 − |ϕ(2πhα)|²)/|1 − ϕ(2πhα)|² ≫ 1, and so C(α, f) ≫ ‖f̄‖₂². We now prove the remarks made in the Introduction about the rational case. Let p/q be a reduced fraction, and assume that D = gcd(supp X_1 − supp X_1) is relatively prime to q. Let f̂_q(h) = q^{−1} ∑_{a=0}^{q−1} f(a/q)e(−ha/q), h = 0, 1, …, q − 1, be the Fourier coefficients of f on Z_q = {0, 1/q, …, (q−1)/q}, and let E(f) = f̂_q(0) and f̄ = f − E(f). Finally, let U be a random variable uniformly distributed on Z_q, independent of X_1, X_2, …, and define C(p/q, f) accordingly. Following the proof of Lemma 5 (i) using Fourier analysis on Z_q instead of R/Z, we get the analogous formula (27). In particular, C(p/q, f) ≥ 0 with equality if and only if f̄ = 0 on Z_q. For a far reaching generalization of these Fourier analytic expressions for C(α, f) and C(p/q, f) to compact groups we refer to [12]. We also note that the condition gcd(D, q) = 1 is equivalent to the distribution of X_1 p/q (mod Z) not being supported on a translate of a proper subgroup of Z_q; consequently, the CLT (6) and the LIL (7) are special cases of the quantitative ergodic theorems for random walks on compact groups in the same paper.
Proposition 4. Assume that E|X_1| < ∞ and EX_1 = 0, and let α be irrational such that inf_{h∈Z\{0}} |h|^γ‖hα‖ > 0 with some constant 1 ≤ γ < 2. For any f ∈ F, we have C(p/q, f) → C(α, f) as p/q → α along the sequence of best rational approximations, provided that D = gcd(supp X_1 − supp X_1) is relatively prime to all but finitely many q in this sequence.
Proof. By (25) and (27), we need to prove the convergence (28). Since f is Riemann integrable and ϕ is continuous, we have term by term convergence for any fixed h ≠ 0. Note that f is of bounded variation also on Z_q. From summation by parts we thus get that the corresponding bound remains true for the Z_q-Fourier transform for all 0 < |h| ≤ q/2. On the other hand, by Lemma 1 (i), the analogous estimate holds for all 0 < |h| ≤ q/2; the second inequality follows from the best rational approximation property ‖hα‖ ≥ ‖qα‖. Therefore the terms of (28) are dominated uniformly in q, and, as we have seen before, ∑_{h≠0} 1/(h²‖hα‖) < ∞ is ensured by the assumption γ < 2. The convergence of the series in (28) thus follows e.g. from the dominated convergence theorem.
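The sequence of best rational approximations p/q → α in Proposition 4 can be generated by the standard continued fraction recursion p_n = a_np_{n−1} + p_{n−2}, q_n = a_nq_{n−1} + q_{n−2}; a sketch for the golden ratio, whose partial quotients a_n are all 1:

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents p_n/q_n of the continued fraction [a_0; a_1, a_2, ...]."""
    p_prev, p_curr = 1, partial_quotients[0]
    q_prev, q_curr = 0, 1
    out = [Fraction(p_curr, q_curr)]
    for a in partial_quotients[1:]:
        p_prev, p_curr = p_curr, a * p_curr + p_prev
        q_prev, q_curr = q_curr, a * q_curr + q_prev
        out.append(Fraction(p_curr, q_curr))
    return out

alpha = (1 + 5 ** 0.5) / 2
convs = convergents([1] * 20)       # ratios of consecutive Fibonacci numbers
```

Each convergent is a best rational approximation, so |α − p/q| < 1/q² along this sequence.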

Approximation by independent variables
The main idea of the proof of Theorems 2 and 3 is that applying suitable small perturbations to the terms of ∑_{k=1}^{N} f(S_kα) introduces independence; the CLT and the LIL then follow from classical results of probability theory. This method goes back to Schatte [26], [27]; we have recently improved [7], and generalized his approach to compact groups [12], [13]. In our setup the source of independence is Lemma 6. We approximate the error of the perturbations in Lemma 7, and then finish the proof of Theorems 2 and 3 at the end of the section.
As before, let d = gcd(supp X_1) and D = gcd(supp X_1 − supp X_1). Let c > 0 be a small constant, to be chosen, and let B_n = (D/d)²⌈2^{(1/2−c)n}⌉ and B′_n = (D/d)⌈2^{cn}⌉. We choose the sizes of the blocks so that |H_{n,i}| = B_n and |J_{n,i}| = B′_n for all 1 ≤ i < r_n, and consider the corresponding block sums; let T*_{n,i} and D*_{n,i} denote the corresponding perturbed block sums. By Lemma 6 and Lemma 5, the variance of the perturbed block sums satisfies the stated two-sided estimate with implied constants depending only on the distribution of X_1, α and γ; the same holds for ED*²_{n,i} with |H_{n,i}| replaced by |J_{n,i}|.
Proof. We only give a proof for ∑_{i=2}^{R}(T_{n,i} − T*_{n,i}), as the proof for ∑_{i=2}^{R}(D_{n,i} − D*_{n,i}) is analogous. Replacing X_1, X_2, … by X_1/d, X_2/d, … and α by dα, we may assume that d = 1. We may also assume that V(f) = 1. Since n ≥ 0 is fixed, for the sake of simplicity we write B = B_n, B′ = B′_n, r = r_n, ξ_i = ξ_{n,i} and δ_i = δ_{n,i}. Let c > 0 be a small enough constant to be chosen, and note that min{B, B′} = B′ ≤ √(B + B′). Further, let H > 1 be an integer, to be chosen. We start by approximating T_i and T*_i by two exponential sums A_i and A*_i, defined for every 2 ≤ i < r as follows.

Now fix 2 ≤ R < r, and let us apply Lemma 3 to the points S_kα (mod Z), k ∈ H_i, 2 ≤ i ≤ R.
Since X_1 attains positive integers only, these points satisfy the required separation for any k ≠ ℓ. The same holds for the maximum over 2 ≤ R < r; thus by EX_1 < ∞ and the Markov inequality we obtain (30). The last term in the exponent satisfies c(2 − γ)/(8γ) ≤ c.
Using the fact that the vectors (S_kα − δ_i (mod Z) : k ∈ H_i), 2 ≤ i < r, are independent and their coordinates are uniformly distributed on R/Z, we get that E*_i, 2 ≤ i < r, are independent and EE*_i = 0. Defining g accordingly, we obtain the corresponding identity. Using the nonnegativity of the Fejér kernel it is also not difficult to see that the Cesàro mean in the previous line has total variation ≤ V(f), and hence V(g) ≤ 2V(f) = 2.
From the results seen in the proof of Lemma 5 it thus follows that (31) holds, and hence, by the Kolmogorov inequality and after simplifying the exponent, the corresponding bound holds provided that c > 0 is small enough. Combining (30) and (31) we can estimate the error of replacing T_i by A_i and T*_i by A*_i, and we obtain (32). Next, fix 1 ≤ R < S < r, and consider the sum in which the factor (e(hY_iα) − e(h(Y_iα − δ_i))) is independent of the sum over k ∈ H_i. Let F_i denote the σ-algebra generated by X_k, k ∈ J_{i−1}, and ξ_i, and note that Y_i and δ_i are F_i-measurable. We can apply Lemma 4 to the i.i.d. sequence obtained from X_1, X_2, … by deleting the terms X_k, k ∈ J_{i−1}, to estimate the conditional expectation with respect to F_i as in (34). The main term of (34) is then identified by a direct computation. Taking the (total) expectation of (34) and summing over R + 1 ≤ i ≤ S, we obtain (35). Now fix the indices R + 1 ≤ i < j ≤ S, and let us estimate the off-diagonal term. Note that (33) still holds. We can thus derive a factorization of A_j − A*_j similar to (33), from which we altogether get (36). We now take the expected value of (36). Note that the factor in terms of Y_i, δ_i, Y_j, δ_j is independent of the sum over k, ℓ. Observe also that the expression in (37) depends only on h_1, h_2 but not on i, j. Indeed, since we chose the sizes of the blocks J_{i−1}, J_{j−1} defining Y_i, δ_i, Y_j, δ_j to be the same for all i, j (within the interval [2^n, 2^{n+1})), the random variable in (37) has the same distribution for all i < j. Let a = ϕ(2π(h_1 − h_2)α) and b = ϕ(−2πh_2α). Summing (38) over k ∈ H_i and ℓ ∈ H_j we get the corresponding estimate. From (36), (37) and the estimates on the coefficients, by fixing R + 1 ≤ i < S and summing over j we obtain a bound for fixed i. Applying the classical Diophantine estimate (18), and then summing over R + 1 ≤ i < S, we can thus estimate the contribution of the off-diagonal terms i < j. The terms j < i can be estimated similarly. Combining (35) and (39) we get a bound valid for any 1 ≤ R < S < r, and thus by the Rademacher–Menshov inequality [22,
Theorem F], we obtain a maximal inequality. Note that r ≪ 2^{(1/2+c)n}. Applying the Chebyshev inequality, from (32) we finally deduce the stated bound. Choosing H = ⌈2^{(3γ−1)n/(4γ²−4γ+2)}⌉, the first two error terms have roughly the same order of magnitude. The estimate then simplifies to ≪ 2^{(−τ+3c)n} n t^{−1/γ} with some τ depending only on γ; moreover, we have τ > 0 whenever 1 < γ < (3 + √5)/4. Therefore if c > 0 is small enough depending only on γ, the estimate is ≪ t^{−1/γ}, as claimed.
Proof of Theorem 2. The properties of C(α, f) were proved in Lemma 5 (i); it remains to prove the CLT (4) and the LIL (5). We may assume that E(f) = 0 and V(f) = 1, and that 1 < γ < (3 + √5)/4. Applying Lemma 7 with t = m^{−1}2^{εm}, where, say, ε = c(2 − γ)/16, we get a summable estimate, and thus the Borel–Cantelli lemma applies. Summing over m = 0, 1, …, n − 1, we see that replacing T_{m,i} by T*_{m,i} and D_{m,i} by D*_{m,i}, the double sum on the right hand side of (40) changes by a negligible amount a.s. The same holds if we replace T_{n,i} by T*_{n,i} and D_{n,i} by D*_{n,i} in the second sum on the right hand side of (40), and so we get (41). Recall that the variables D*_{m,i}, 2 ≤ i < r_m, m = 0, 1, …, viewed as a single sequence, are independent, mean zero random variables with variance ED*²_{m,i} ≪ |J_{m,i}| ≪ 2^{cm}; see (29). By the strong law of large numbers, the contribution of D*_{m,i} is negligible. Here T*_{m,i}, 2 ≤ i < r_m, m = 0, 1, …, viewed as a single sequence, are also independent, mean zero random variables with variance given by (29); the Lindeberg condition [21, p. 292], as well as Kolmogorov's condition for the LIL [21, p. 272], are satisfied, and consequently the CLT and the LIL hold with σ = C(α, f). By (41), the CLT (4) and the LIL (5) follow.
Proof of Theorem 3. We may assume that E(f) = 0 and V(f) = 1, and that 1 < γ < (3 + √5)/4. In the proof of Theorem 2 we wrote Σ_{1≤k≤t} f(S_k α) in the form Σ_{1≤k≤t} f(S_k α) = X(t) + Y(t) with stochastic processes X(t) and Y(t), where Y(t) = O(t^{1/2−ε}) a.s., ε > 0 is a constant depending only on γ, and n(t) ≥ 0 and 2 ≤ R(t) < r_{n(t)} are integers such that |t − max J_{n(t),R(t)}| ≪ t^{1/2−ε}. Note that X(t) and Y(t) are measurable functions of the variables X_k and the auxiliary variables ξ_{n,i}. Applying a theorem of Strassen on the almost sure approximation of sums of independent variables by a Wiener process [28, Theorem 4.4], we get that after a suitable extension of the probability space there exists a stochastic process X̃(t) in the Skorokhod space D[0, ∞) with the same distribution as X(t), such that X̃(t) = σW(t) + O(t^{1/2−ε′}) a.s. with σ = C(α, f), a standard Wiener process W(t) and some constant ε′ > 0 depending only on γ. After another extension of the probability space we can introduce independent variables X̃_k and ξ̃_{n,i}, distributed as X_k and ξ_{n,i} respectively, such that X̃(t) is the same measurable function of the X̃_k and ξ̃_{n,i} as X(t) is of the X_k and ξ_{n,i}. Let Ỹ(t), distributed as Y(t), be the same measurable function of the X̃_k and the ξ̃_{n,i} as Y(t) is of the X_k and the ξ_{n,i}. Then ζ(t) := X̃(t) + Ỹ(t) has the same distribution as X(t) + Y(t), that is, as Σ_{1≤k≤t} f(S_k α), and ζ(t) = X̃(t) + O(t^{1/2−ε}) = σW(t) + O(t^{1/2−ε″}) a.s. with ε″ = min{ε, ε′} > 0, as claimed.

Proof of Theorem 1
As before, X_1, X_2, ... is a sequence of i.i.d. nondegenerate integer-valued random variables with maximal span D = gcd(supp X_1 − supp X_1) and characteristic function ϕ, and S_k = Σ_{j=1}^{k} X_j. Further, p/q is a reduced fraction, and Z_q = {0, 1/q, ..., (q − 1)/q} is the cyclic group of order q. Let ψ_disc be as in (1).
First of all, note that gcd(D, q) = 1 is equivalent to the distribution of X_1 p/q (mod Z) not being supported on a translate of a proper subgroup of Z_q. In particular, S_k p/q (mod Z) converges in distribution to the uniform distribution on Z_q if and only if gcd(D, q) = 1; this is further equivalent to ψ_disc(k) → 0 as k → ∞.
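This equivalence can be checked by brute force. The following sketch is illustrative only; the step law X_1 uniform on {0, 2} (so D = 2) is our own choice. It computes the exact law of S_k p/q (mod Z) by cyclic convolution: with q = 9 (so gcd(D, q) = 1) the walk equidistributes, while with q = 8 it stays on the subgroup of even residues.

```python
from math import gcd

def step_law(support, probs, p, q):
    """Law of X_1 * p/q (mod Z) as a probability vector indexed by Z_q."""
    step = [0.0] * q
    for x, px in zip(support, probs):
        step[(x * p) % q] += px
    return step

def walk_distribution(step, q, k):
    """Exact law of S_k * p/q (mod Z) via k cyclic convolutions."""
    dist = [1.0] + [0.0] * (q - 1)          # S_0 = 0
    for _ in range(k):
        new = [0.0] * q
        for i, di in enumerate(dist):
            if di:
                for j, sj in enumerate(step):
                    new[(i + j) % q] += di * sj
        dist = new
    return dist

def tv_from_uniform(dist, q):
    """Total variation distance from the uniform distribution on Z_q."""
    return 0.5 * sum(abs(d - 1.0 / q) for d in dist)

# D = 2: for q = 9 the walk converges to uniform, for q = 8 it does not
tv9 = tv_from_uniform(walk_distribution(step_law([0, 2], [0.5, 0.5], 1, 9), 9, 400), 9)
tv8 = tv_from_uniform(walk_distribution(step_law([0, 2], [0.5, 0.5], 1, 8), 8, 400), 8)
```

With gcd(2, 8) = 2 the limiting law is uniform on {0, 2/8, 4/8, 6/8} only, so the total variation distance from uniform on Z_8 stabilizes at 1/2 instead of vanishing.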
Consider now ψ*_disc(k), defined as ψ_disc(k) in (1) but with the maximum taken over all cyclic intervals J ⊆ Z_q, and note that ψ_disc(k) ≤ ψ*_disc(k) ≤ 2ψ_disc(k). The rates of convergence in ψ_disc and ψ*_disc are thus the same; however, using all cyclic intervals is in a sense more natural. Indeed, since the family of all cyclic intervals is translation invariant, adding an arbitrary integer to X_1 does not change the value of ψ*_disc(k); it is also not difficult to see that ψ*_disc(k) is nonincreasing in k.
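The sandwich ψ_disc(k) ≤ ψ*_disc(k) ≤ 2ψ_disc(k) is easy to verify on examples. In the sketch below the parameters are our own toy choices, and ψ_disc is read as the maximal deviation over the initial blocks {0, 1/q, ..., a/q}, which is our interpretation of definition (1); both quantities are computed from the exact law of S_k p/q (mod Z).

```python
from itertools import product

def walk_dist_exact(support, probs, p, q, k):
    """Exact law of S_k * p/q (mod Z) on Z_q by enumerating all k-step paths."""
    dist = [0.0] * q
    for path in product(range(len(support)), repeat=k):
        pr, s = 1.0, 0
        for idx in path:
            pr *= probs[idx]
            s += support[idx]
        dist[(s * p) % q] += pr
    return dist

def psi_pair(dist, q):
    """psi: max deviation over initial blocks; psi*: over all cyclic intervals."""
    psi = max(abs(sum(dist[:a + 1]) - (a + 1) / q) for a in range(q))
    psi_star = 0.0
    for start in range(q):
        s = 0.0
        for length in range(1, q + 1):
            s += dist[(start + length - 1) % q]
            psi_star = max(psi_star, abs(s - length / q))
    return psi, psi_star

# toy example: steps uniform on {1, 2}, p/q = 3/11, k = 5
dist = walk_dist_exact([1, 2], [0.5, 0.5], 3, 11, 5)
psi, psi_star = psi_pair(dist, 11)
```

The upper half of the sandwich reflects the fact that every cyclic interval is a difference (or the complement of a difference) of two initial blocks.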
Most importantly, to prove Theorem 1 we may assume that D = 1. Indeed, we can first translate X_1 so that Pr(X_1 = 0) > 0; this changes neither |ϕ| nor the order of magnitude of ψ_disc(k). But then D | X_1 a.s., and so we can replace X_1, X_2, ... by X_1/D, X_2/D, ..., and p/q by Dp/q. Note that the characteristic function of X_1/D is ϕ(x/D), and that min_{0<|h|≤q/2} |h| · ‖hDp/q‖ ≥ A/D > 0 follows from the assumption gcd(D, q) = 1. This reduces the general case of Theorem 1 to the special case D = 1.

Upper bounds
The second upper bound in Theorem 1 will follow from the following general estimate.
Before we give the proof, let us make a few observations about the possible choices of the function g; one could call it a submultiplicative upper envelope of |ϕ(2πx)|. Note first that, since we necessarily have g(0) = 1, log-concavity on some interval [0, r] implies submultiplicativity on the same interval.
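For a concrete instance of this observation, take the hypothetical envelope g(x) = e^{−cx²}: then log g is concave and g(0) = 1, so g(x + y) ≤ g(x)g(y) whenever x, y ≥ 0. A quick numerical check on a grid:

```python
import math
import itertools

# hypothetical envelope (our own example): g(x) = exp(-c x^2) is log-concave
# with g(0) = 1, hence submultiplicative: g(x + y) <= g(x) * g(y) for x, y >= 0
c, r = 3.0, 0.5
g = lambda x: math.exp(-c * x * x)

grid = [i * r / 50 for i in range(51)]
ok = all(g(x + y) <= g(x) * g(y) + 1e-12
         for x, y in itertools.product(grid, grid) if x + y <= r)
```

The inequality here is exact, since (x + y)² ≥ x² + y² for nonnegative x, y; the 1e-12 slack only guards against floating-point rounding.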
Proof of Lemma 8. Let 0 ≤ a < q be an integer, and let w : Z_q → {0, 1} be the indicator function of the set {0, 1/q, ..., a/q}. Its Z_q-Fourier coefficients satisfy, for any integer 0 < |h| ≤ q/2. Using the Fourier series expansion of w we thus get, and consequently we have the Berry–Esseen type inequality. For the rest of the proof, constants and implied constants will depend only on the distribution of X_1, c, r, g and A. By the assumption D = 1, the function |ϕ(2πx)| is even and has smallest period 1; in addition, |ϕ(2πx)| = 1 if and only if x ∈ Z. Therefore |ϕ(2πx)| ≤ e^{−τ} whenever ‖x‖ ≥ r/2, with some constant τ > 0, and hence. Consider now the continued fraction representation p/q = [a_0; a_1, ..., a_M], and let p_m/q_m = [a_0; a_1, ..., a_m] denote the convergents. Since p/q is reduced, we have p_M = p and q_M = q. Let us now estimate Σ_{‖hp/q‖<r/2} g(‖hp/q‖)^k.
By the assumptions on g, consecutive terms in the previous sum satisfy g((ℓ + 1)‖q_m p/q‖)^k / g(ℓ‖q_m p/q‖)^k ≤ g(‖q_m p/q‖)^k ≤ e^{−ck‖q_m p/q‖²} ≤ e^{−ck/q²} ≤ e^{−c}.
The terms thus decay exponentially fast, hence 2q_m Σ_{1≤ℓ<r/(2‖q_m p/q‖)} g(ℓ‖q_m p/q‖)^k ≪ g(‖q_m p/q‖)^k q_m, and by summing over m we get. Here the last term dominates. Indeed, by the assumptions on g. Using ‖q_m p/q‖ ≥ 1/(q_{m+1} + q_m) it is not difficult to see that the terms of the last sum decay exponentially fast as m decreases, showing that the sum is ≪ 1. In the main factor we have p/q = p_M/q_M. Recalling the identity p_m q_{m−1} − p_{m−1} q_m = (−1)^{m−1} from the theory of continued fractions, we get that ‖q_{M−1} p_M/q_M‖ = 1/q_M, hence we altogether deduce. The last relation together with (43) shows Σ_{h=1}^{q−1} |ϕ(2πhp/q)|^k / h ≪ (g(1/q)^k/q) I_{1/q≤r} + e^{−τk} log q, and the claim follows from the Berry–Esseen type inequality (42).
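The continued fraction facts used above are elementary to verify. The sketch below (the Fibonacci example 89/144 is our own illustrative choice) computes the convergents p_m/q_m via the Euclidean algorithm and checks the identity p_m q_{m−1} − p_{m−1} q_m = (−1)^{m−1}, together with its consequence ‖q_{M−1} p/q‖ = 1/q.

```python
from fractions import Fraction

def continued_fraction(p, q):
    """Partial quotients [a_0; a_1, ..., a_M] of p/q via the Euclidean algorithm."""
    a = []
    while q:
        a.append(p // q)
        p, q = q, p % q
    return a

def convergents(a):
    """Numerators p_m and denominators q_m of the convergents of [a_0; a_1, ...]."""
    ps, qs = [a[0], a[1] * a[0] + 1], [1, a[1]]
    for am in a[2:]:
        ps.append(am * ps[-1] + ps[-2])
        qs.append(am * qs[-1] + qs[-2])
    return ps, qs

def dist_to_Z(x):
    """||x||: distance from the rational x to the nearest integer."""
    return abs(x - round(x))

p, q = 89, 144                              # consecutive Fibonacci numbers
ps, qs = convergents(continued_fraction(p, q))
# p_m q_{m-1} - p_{m-1} q_m = (-1)^{m-1}
identity_holds = all(ps[m] * qs[m - 1] - ps[m - 1] * qs[m] == (-1) ** (m - 1)
                     for m in range(1, len(ps)))
# hence ||q_{M-1} p/q|| = 1/q
tail = dist_to_Z(Fraction(qs[-2] * p, q))
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity in the ‖·‖ computation.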
In the case r = 1/2 (when g is submultiplicative on the whole interval [0, 1/2]) we can repeat the same proof without separating the terms ‖hp/q‖ ≥ r/2 and ‖hp/q‖ < r/2, and obtain ψ_disc(k) ≪ g(1/q)^k/q.

Lower bounds
The lower bounds in Theorem 1 will follow from the following lemma.
Proof of Theorem 1. As observed at the beginning of Section 4, we may assume that D = 1. First, we prove the upper bounds, starting with the case k ≤ q². In [5] we showed that if α is a badly approximable irrational and E(X_1²) < ∞, then sup_{0≤x≤1} |Pr({S_k α} ≤ x) − x| ≪ k^{−1/2} with implied constant depending only on the maximal partial quotient in the continued fraction of α. In fact, the same proof works for a rational p/q in place of α, provided that k ≤ q²; in particular, sup_{0≤x≤1} |Pr({S_k p/q} ≤ x) − x| ≪ k^{−1/2}. Since the uniform distribution on Z_q also has distance ≪ q^{−1} ≪ k^{−1/2} from the Lebesgue measure in the (continuous) Kolmogorov metric, it follows that ψ_disc(k) ≪ k^{−1/2}. Next, let k > q². As observed, we can apply Lemma 8 with g(x) = |ϕ(2πx)| and suitable constants c > 0 and 0 < r ≤ 1/2 to obtain ψ_disc(k) ≪ (|ϕ(2π/q)|^k/q) I_{1/q≤r} + e^{−τk} log q.
It is easy to see that there exists a constant q_0 depending only on the distribution of X_1 such that for all q ≥ q_0 and all k > q², the second term is negligible compared to the first one; in particular, ψ_disc(k) ≪ |ϕ(2π/q)|^k/q, as claimed. Finally, we prove the lower bounds. Lemma 9 immediately shows that the claim holds for any k ≤ Cq² with some constant C > 0, and also for all k > q². To see the claim on the remaining interval Cq² ≤ k ≤ q², simply note that the asymptotics |ϕ(2πx)|² = 1 − 4π²(Var X_1)x²(1 + o(1)) as x → 0 shows that in this case ψ_disc(k) ≥ |ϕ(2π/q)|^k / (2(q − 1)) ≥ (1 − 8π²(Var X_1)/q²)^{k/2} / (2(q − 1)) ≫ k^{−1/2}, provided that q is large enough.
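The phase transition at k ≈ q² can also be seen numerically. The sketch below uses illustrative choices of our own: a fair-coin step X_1 uniform on {0, 1} and the badly approximable fraction p/q = 55/89; ψ_disc is read as the maximal deviation over initial blocks, our interpretation of (1). The law of S_k p/q (mod Z) is recovered exactly from its Fourier coefficients ϕ(2πhp/q)^k, and the decay of ψ_disc(k) switches from polynomial to exponential after roughly q² steps.

```python
import cmath
import math

def phi(t):
    """Characteristic function of X_1 uniform on {0, 1} (fair-coin step)."""
    return 0.5 * (1 + cmath.exp(1j * t))

def psi_disc(k, p, q):
    """Kolmogorov-type distance of S_k * p/q (mod Z) from uniform on Z_q.
    The law of S_k * p/q is recovered from its Fourier coefficients
    phi(2*pi*h*p/q)^k by inverse DFT; the maximum is over the initial
    blocks {0, 1/q, ..., a/q} (our reading of definition (1))."""
    coeffs = [phi(2 * math.pi * h * p / q) ** k for h in range(q)]
    dist = [sum(coeffs[h] * cmath.exp(-2j * math.pi * h * x / q)
                for h in range(q)).real / q for x in range(q)]
    cum, best = 0.0, 0.0
    for a in range(q):
        cum += dist[a]
        best = max(best, abs(cum - (a + 1) / q))
    return best

p, q = 55, 89                                        # q^2 = 7921
# k <= q^2: polynomial (roughly k^{-1/2}) decay
slow = psi_disc(600, p, q) / psi_disc(150, p, q)
# k >> q^2: exponential decay at rate |phi(2*pi/q)| per step
fast = psi_disc(30000, p, q) / psi_disc(20000, p, q)
```

In the polynomial regime the ratio stays of order (150/600)^{1/2} = 1/2, while in the exponential regime it collapses to roughly |ϕ(2π/q)|^{10000}, several orders of magnitude smaller.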
In the second step we used the fact that multiplication by p is a bijection of the nonzero remainders modulo q. As observed in the paragraph after Lemma 8, the function |ϕ(2πx)| is submultiplicative on [0, r], and |ϕ(2πx)| ≤ e^{−τ} whenever ‖x‖ ≥ r, with suitable constants 0 < r ≤ 1/2 and τ > 0. Following the methods in the proof of Lemma 8, we get. It is now easy to see that for all large enough q and k ≥ 1 we have ψ_TV(k) ≪ |ϕ(2π/q)|^k, as claimed. Indeed, if, say, |ϕ(2π/q)|^{2k} ≥ 1/2, then the claim follows from the trivial estimate ψ_TV(k) ≤ 1. If |ϕ(2π/q)|^{2k} < 1/2, then we necessarily have k ≫ q², and consequently e^{−2τk} q ≪ |ϕ(2π/q)|^{2k} for all large enough q; the claim then follows from (45). This finishes the proof of the upper bound in (2).
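The purely exponential decay of ψ_TV can be probed the same way. The sketch below (again with the illustrative fair-coin step and p/q = 55/89, our own choices) computes the total variation distance exactly from the Fourier coefficients and checks that it is controlled by |ϕ(2π/q)|^k, with successive ratios tracking powers of |ϕ(2π/q)| around and beyond k ≈ q².

```python
import cmath
import math

def tv_to_uniform(k, p, q):
    """Total variation distance of S_k * p/q (mod Z) from uniform on Z_q, for a
    fair-coin step X_1 uniform on {0, 1}; the law of S_k * p/q is recovered
    from its Fourier coefficients phi(2*pi*h*p/q)^k by inverse DFT."""
    phi = lambda t: 0.5 * (1 + cmath.exp(1j * t))
    coeffs = [phi(2 * math.pi * h * p / q) ** k for h in range(q)]
    dist = [sum(coeffs[h] * cmath.exp(-2j * math.pi * h * x / q)
                for h in range(q)).real / q for x in range(q)]
    return 0.5 * sum(abs(d - 1.0 / q) for d in dist)

p, q = 55, 89
rho = abs(0.5 * (1 + cmath.exp(2j * math.pi / q)))   # |phi(2*pi/q)|
tv12 = tv_to_uniform(12000, p, q)
# successive ratios track powers of rho, with no polynomial regime
r1 = tv12 / tv_to_uniform(8000, p, q)
```

No phase transition appears here: unlike ψ_disc, the total variation distance decays at the geometric rate |ϕ(2π/q)| throughout, in line with the statement of the theorem.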
