Equidistribution of high traces of random matrices over finite fields and cancellation in character sums of high conductor

Let $g$ be a random matrix distributed according to the uniform probability measure on the finite general linear group $\mathrm{GL}_n(\mathbb{F}_q)$. We show that $\mathrm{Tr}(g^k)$ equidistributes on $\mathbb{F}_q$ as $n \rightarrow \infty$ as long as $\log k=o(n^2)$ and that this range is sharp. We also show that nontrivial linear combinations of $\mathrm{Tr}(g^1),\ldots, \mathrm{Tr}(g^k)$ equidistribute as long as $\log k =o(n)$ and this range is sharp as well. Previously equidistribution of either a single trace or a linear combination of traces was only known for $k \leqslant c_q n$, where $c_q$ depends on $q$, due to work of the first author and Rodgers. We reduce the problem to exhibiting cancellation in certain short character sums in function fields. For the equidistribution of $\mathrm{Tr}(g^k)$, we end up showing that certain explicit character sums modulo $T^{k+1}$ exhibit cancellation when averaged over monic polynomials of degree $n$ in $\mathbb{F}_q[T]$ as long as $\log k = o(n^2)$. This goes far beyond the classical range $\log k =o(n)$ due to Montgomery and Vaughan. To study these sums, we build on the argument of Montgomery and Vaughan but exploit additional symmetry present in the considered sums.


Introduction
Fix $\mathbb{F}_q$, the finite field of $q$ elements. We denote its characteristic by $\mathrm{char}(\mathbb{F}_q)$. Let $g \in \mathrm{GL}_n(\mathbb{F}_q)$ be an invertible $n \times n$ matrix over $\mathbb{F}_q$ chosen according to the uniform probability measure. The first author and Rodgers [12, Thm. 1.1] showed that $\mathrm{Tr}(g^k)$ equidistributes in $\mathbb{F}_q$ as $n \to \infty$, uniformly for $k \le c_q n$ for some sufficiently small $c_q$ depending on $q$, and the rate of convergence is superexponential. However, nothing beyond $k = O(n)$ was known. We describe our three main results.
Theorem 1.1. Let $g \in \mathrm{GL}_n(\mathbb{F}_q)$ be chosen uniformly at random. Let $k = k(n)$ be a positive integer such that $\log k = o(n^2)$. The distribution of $\mathrm{Tr}(g^k)$ tends to the uniform distribution on $\mathbb{F}_q$ as $n$ tends to $\infty$.
The range $\log k = o(n^2)$ in Theorem 1.1 is optimal in the sense that we cannot replace it with $\log k = O(n^2)$. Indeed, if we take $k = |\mathrm{GL}_n(\mathbb{F}_q)| = \prod_{i=0}^{n-1}(q^n - q^i)$ then $\log k \asymp_q n^2$ and, by Lagrange's theorem, $g^k = I_n$ for every $g \in \mathrm{GL}_n(\mathbb{F}_q)$, so $\mathrm{Tr}(g^k) \equiv n$ does not equidistribute. (This shows $\mathrm{Tr}(g^k)$ is periodic in $k$.) Theorem 1.1 leaves open the question, not explored in this paper, whether the range can be extended to $\log k \le c_q n^2$ for some constant $c_q > 0$ depending on $q$; we believe the answer is negative. We also prove a theorem for combinations of traces.
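The Lagrange obstruction can be checked by brute force in a tiny case. The following is a minimal sketch, assuming $q = p = 3$ prime and $n = 2$; the helper names (`mat_mul`, `mat_pow`, `gl2`, `group_order`) are hypothetical and not part of the paper.

```python
from itertools import product

def mat_mul(a, b, p):
    """Multiply two 2x2 matrices (tuples of rows) modulo the prime p."""
    return tuple(
        tuple(sum(a[i][t] * b[t][j] for t in range(2)) % p for j in range(2))
        for i in range(2)
    )

def mat_pow(g, k, p):
    """Compute g**k modulo p by repeated squaring."""
    result = ((1, 0), (0, 1))
    while k:
        if k & 1:
            result = mat_mul(result, g, p)
        g = mat_mul(g, g, p)
        k >>= 1
    return result

def gl2(p):
    """List all invertible 2x2 matrices over F_p (p prime)."""
    return [
        ((a, b), (c, d))
        for a, b, c, d in product(range(p), repeat=4)
        if (a * d - b * c) % p != 0
    ]

def group_order(n, q):
    """|GL_n(F_q)| = prod_{i=0}^{n-1} (q^n - q^i)."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order

p = 3
group = gl2(p)
k = group_order(2, p)  # the exponent k = |GL_2(F_3)|

# Collect every value taken by Tr(g^k); Lagrange's theorem forces g^k = I.
traces = set()
for g in group:
    h = mat_pow(g, k, p)
    traces.add((h[0][0] + h[1][1]) % p)
```

For this choice of $k$ the set `traces` contains the single value $n \bmod p$, so the trace is constant rather than equidistributed.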
Theorem 1.2. Let $g \in \mathrm{GL}_n(\mathbb{F}_q)$ be chosen uniformly at random. Let $k = k(n)$ be a positive integer such that $\mathrm{char}(\mathbb{F}_q) \nmid k$ and $\log k = o(n)$. Let $a_j \in \mathbb{F}_q$ for $j = 1, \ldots, k$ be arbitrary constants with $a_k \neq 0$. Then the distribution of $\sum_{1 \le i \le k} a_i \mathrm{Tr}(g^i)$ tends to the uniform distribution on $\mathbb{F}_q$ as $n$ tends to $\infty$.
Over $\mathbb{F}_q$ we have $\mathrm{Tr}(g^{\mathrm{char}(\mathbb{F}_q)}) = \mathrm{Tr}(g)^{\mathrm{char}(\mathbb{F}_q)}$, and the condition $\mathrm{char}(\mathbb{F}_q) \nmid k$ is necessary to avoid trivial linear combinations. The range $\log k = o(n)$ in Theorem 1.2 is optimal in the sense that it cannot be replaced by $\log k = O(n)$, as we now demonstrate by giving an example where $\sum_{1 \le i \le k} a_i \mathrm{Tr}(g^i)$ (with $\mathrm{char}(\mathbb{F}_q) \nmid k$, $a_k \neq 0$) is not equidistributed and $\log k \asymp_q n$. Let $\overline{\mathbb{F}_q}$ denote an algebraic closure of $\mathbb{F}_q$, and let $f \in \mathbb{F}_q[T]$ be a monic polynomial. Given its factorization $f(T) = \prod_{j=1}^{\deg f}(T - \lambda_j)$ over $\overline{\mathbb{F}_q}$, we define its $i$th ($i \ge 0$) power sum symmetric polynomial as
$$p_i(f) := \sum_{j=1}^{\deg f} \lambda_j^i. \tag{1.1}$$
2020 Mathematics subject classification: 11L40 (Primary), 60B20, 15B52, 05E05
By Newton's identities $p_i(f)$ is an integral multivariate polynomial in the coefficients of $f$, hence $p_i(f)$ is in fact in $\mathbb{F}_q$. Moreover, we have $p_i(fg) = p_i(f) + p_i(g)$ for any monic $f, g \in \mathbb{F}_q[T]$. If $f(T) = \det(I_n T - g)$ is the characteristic polynomial of a matrix $g$, then $p_i(f) = \mathrm{Tr}(g^i)$.
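The relation between power sums, Newton's identities and traces of powers can be verified numerically. The sketch below assumes $q = p$ prime and $2 \times 2$ matrices, for which $e_1$ is the trace and $e_2$ the determinant; `power_sums` and `trace_of_power` are hypothetical helper names introduced for illustration.

```python
from itertools import product

def power_sums(e, k, p):
    """Return [p_1, ..., p_k] mod p from e = [e_1, ..., e_n].

    Uses Newton's identities
        p_i = sum_{j=1}^{i-1} (-1)^(j-1) e_j p_{i-j} + (-1)^(i-1) i e_i,
    with e_j = 0 for j > n. No division occurs, so any characteristic works.
    """
    n = len(e)
    ps = []
    for i in range(1, k + 1):
        total = 0
        for j in range(1, i):
            if j <= n:
                total += (-1) ** (j - 1) * e[j - 1] * ps[i - j - 1]
        if i <= n:
            total += (-1) ** (i - 1) * i * e[i - 1]
        ps.append(total % p)
    return ps

def trace_of_power(g, i, p):
    """Tr(g^i) mod p for a 2x2 matrix g given as ((a, b), (c, d))."""
    a, b, c, d = 1, 0, 0, 1  # start from the identity matrix
    for _ in range(i):
        a, b, c, d = (
            (a * g[0][0] + b * g[1][0]) % p,
            (a * g[0][1] + b * g[1][1]) % p,
            (c * g[0][0] + d * g[1][0]) % p,
            (c * g[0][1] + d * g[1][1]) % p,
        )
    return (a + d) % p

p, k = 5, 8
mismatches = 0
for a, b, c, d in product(range(p), repeat=4):
    g = ((a, b), (c, d))
    # det(T*I - g) = T^2 - e_1 T + e_2 with e_1 = Tr(g), e_2 = det(g)
    e = [(a + d) % p, (a * d - b * c) % p]
    if power_sums(e, k, p) != [trace_of_power(g, i, p) for i in range(1, k + 1)]:
        mismatches += 1
```

The loop runs over all $2 \times 2$ matrices, invertible or not, since the identity $p_i(f) = \mathrm{Tr}(g^i)$ does not require invertibility for $i \ge 1$.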
Let $F(T) := \sum_{i=0}^{k} a_i T^i$ be the product of all monic irreducible polynomials in $\mathbb{F}_q[T]$ of degree at most $n$ (so $a_k = 1$, $a_0 = 0$). Then, for any $\lambda \in \cup_{j=1}^{n} \mathbb{F}_{q^j}$,
$$\sum_{i=1}^{k} a_i \lambda^i = F(\lambda) = 0.$$
Hence $\sum_{i=1}^{k} a_i p_i(f) = 0$ for any $f$ with $\deg f \le n$, and in particular $\sum_{i=1}^{k} a_i \mathrm{Tr}(g^i) = 0$ for all $g \in \mathrm{GL}_n(\mathbb{F}_q)$. Finally, $k = \deg F \asymp q^n$ by the Prime Polynomial Theorem (see Lemma 3.1). If $k$ happens to be divisible by $\mathrm{char}(\mathbb{F}_q)$, we can replace $F$ by $TF$. The question whether the range of Theorem 1.2 can be extended to $\log k \le c_q n$ for some constant $c_q > 0$, for which we believe the answer is negative, remains open and is not explored here.
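The smallest instance of this counterexample can be worked out explicitly. The sketch below assumes $q = 2$ and $n = 2$, where the monic irreducibles of degree at most 2 are $T$, $T+1$ and $T^2+T+1$; their product has even degree, so the code passes to $TF$ as described above. The helper names are hypothetical.

```python
from itertools import product

def poly_mul(f, g, p):
    """Multiply coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def trace_of_power(g, i, p):
    """Tr(g^i) mod p for a 2x2 matrix g given as ((a, b), (c, d))."""
    a, b, c, d = 1, 0, 0, 1  # start from the identity matrix
    for _ in range(i):
        a, b, c, d = (
            (a * g[0][0] + b * g[1][0]) % p,
            (a * g[0][1] + b * g[1][1]) % p,
            (c * g[0][0] + d * g[1][0]) % p,
            (c * g[0][1] + d * g[1][1]) % p,
        )
    return (a + d) % p

# Monic irreducibles of degree <= 2 over F_2: T, T + 1, T^2 + T + 1.
irreducibles = [[0, 1], [1, 1], [1, 1, 1]]
F = [1]
for P in irreducibles:
    F = poly_mul(F, P, 2)

# deg F = 4 is divisible by char(F_2) = 2, so use T*F instead.
TF = [0] + F

# Evaluate sum_i a_i Tr(g^i) over all of GL_2(F_2), with a_i the coefficients of T*F.
values = set()
for a, b, c, d in product(range(2), repeat=4):
    if (a * d - b * c) % 2 == 0:
        continue
    g = ((a, b), (c, d))
    combo = sum(coef * trace_of_power(g, i, 2) for i, coef in enumerate(TF) if i >= 1)
    values.add(combo % 2)
```

Here $F = T^4 + T$ and $TF = T^5 + T^2$, so the linear combination is $\mathrm{Tr}(g^5) + \mathrm{Tr}(g^2)$, which vanishes on every element of $\mathrm{GL}_2(\mathbb{F}_2)$.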
For certain values of $k$, for example, primes, we can go beyond the range $\log k = o(n^2)$ of Theorem 1.1 using the following arithmetic criterion.
Theorem 1.3. Let $g \in \mathrm{GL}_n(\mathbb{F}_q)$ be chosen uniformly at random. Let $k = k(n)$ be a positive integer. Suppose that the sum in (1.2) diverges as $n \to \infty$. Then, the distribution of $\mathrm{Tr}(g^k)$ tends to the uniform distribution on $\mathbb{F}_q$ as $n$ tends to $\infty$.
As we shall see later, Theorem 1.1 is a consequence of Theorem 1.3.

Comparison with random matrix theory
Our investigation was motivated by results in random matrix theory, although we do not use any techniques from this area. Let $U_n(\mathbb{C})$ be the group of $n \times n$ unitary matrices over the complex numbers, endowed with Haar measure of total mass 1. A classical result of Diaconis and Shahshahani [9] states that for any $k \ge 1$, the vector $X_k = (\mathrm{Tr}(U), \mathrm{Tr}(U^2)/\sqrt{2}, \ldots, \mathrm{Tr}(U^k)/\sqrt{k})$ converges in distribution, as $n \to \infty$, to $Y_k = (Z_1, \ldots, Z_k)$, where $(Z_j)_{j=1}^{k}$ are independent standard complex Gaussians. Johansson [18, Thm. 2.6] showed that the rate of convergence of a linear combination of $\mathrm{Tr}(U^i)$ to a suitable Gaussian is superexponential in total variation distance. In [19], Johansson and Lambert extended [18] to the total variation distance of $X_k$ from $Y_k$ uniformly for $k \ll n^{2/3-\varepsilon}$, and in [6], Courteaut and Johansson established similar results for orthogonal and symplectic groups. In a recent work, Courteaut, Johansson, and Lambert [7] studied the convergence of $\mathrm{Tr}(U^k)/\sqrt{k}$ to $Z_k$ as $k$ varies, obtaining, among other results, that the distance goes to 0 for any $k$ in the range $1 \le k < n$. As for the complementary range, Rains [24] proved that there is a stabilizing phenomenon once $k \ge n$: the eigenvalues of $U^k$ become distributed as $n$ independent uniform random variables on the unit circle, and in particular $\mathrm{Tr}(U^k)/\sqrt{n}$ tends in distribution to the standard complex Gaussian.

Symmetric function perspective
Let us also formulate a natural problem in symmetric functions that will turn out to share strong similarities with the equidistribution of $\mathrm{Tr}(g^k)$. Let $e_k(t_1, \ldots, t_n)$ be the $k$th elementary symmetric polynomial and $p_k(t_1, \ldots, t_n)$ be the $k$th power sum symmetric polynomial:
$$e_k(t_1, \ldots, t_n) = \sum_{1 \le i_1 < \cdots < i_k \le n} t_{i_1} \cdots t_{i_k}, \qquad p_k(t_1, \ldots, t_n) = \sum_{i=1}^{n} t_i^k.$$
Let $X = (X_i)_{i=1}^{n}$ be $n$ $\overline{\mathbb{F}_q}$-valued random variables such that $e_1(X), \ldots, e_n(X)$ are independent and uniformly distributed on $\mathbb{F}_q$. By Newton's identities, $p_k(X)$ must also be $\mathbb{F}_q$-valued. We ask, is $p_k(X)$ close to uniform?
Here is one way to construct such a sequence $X$. Taking $a_1, \ldots, a_n$ to be independent uniform random variables on $\mathbb{F}_q$, the polynomial $f(T) = T^n + \sum_{j=1}^{n} (-1)^j a_j T^{n-j}$ is uniformly distributed on $\mathcal{M}_{n,q} \subseteq \mathbb{F}_q[T]$, the subset of monic polynomials of degree $n$. Setting $(X_i)_{i=1}^{n}$ to be its $n$ roots in some order, we have that $e_j(X) = a_j$ satisfy the conditions above, and $p_i(f)$ defined in (1.1) coincides with $p_i(X)$. We prove the following.
Theorem 1.4. Suppose $X = (X_i)_{i=1}^{n}$ are $n$ random variables such that $(e_1(X), \ldots, e_n(X))$ has the uniform distribution on $(\mathbb{F}_q)^n$. Then:
1. If $\log k = o(n^2)$ then the distribution of $p_k(X)$ tends to the uniform distribution on $\mathbb{F}_q$.
2. Let $a_1, \ldots, a_k \in \mathbb{F}_q$. If $\log k = o(n)$, $a_k \neq 0$ and $\mathrm{char}(\mathbb{F}_q) \nmid k$ then the distribution of $\sum_{i=1}^{k} a_i p_i(X)$ tends to the uniform distribution on $\mathbb{F}_q$.
3. If the sum in (1.2) diverges then the distribution of $p_k(X)$ tends to the uniform distribution on $\mathbb{F}_q$.
Theorem 1.4 is about $p_k(f)$ when we choose $f$ uniformly at random from $\mathcal{M}_{n,q}$, while Theorems 1.1-1.3 are about $p_k(f)$ for a polynomial $f$ drawn from the space of possible characteristic polynomials of a matrix from $\mathrm{GL}_n(\mathbb{F}_q)$ endowed with the uniform measure. As proved in [12, Thm. 1.4], the total variation distance of the law of the first $k$ next-to-leading coefficients of $\det(I_n T - g)$ from the uniform distribution on $\mathbb{F}_q^k$ tends to 0 as $n \to \infty$ for $k$ as large as $n - o(\log n)$, so the setup of Theorem 1.4 is not that different from the setup of Theorems 1.1-1.3.
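The setting of Theorem 1.4 can be explored exhaustively for very small parameters. The following sketch assumes $q = p$ prime and enumerates all $(e_1, \ldots, e_n) \in \mathbb{F}_p^n$, computing $p_k$ through Newton's identities and reporting the total variation distance from uniform; `power_sum` and `tv_from_uniform` are hypothetical names, and the tiny cases say nothing about the asymptotic regime of the theorem.

```python
from collections import Counter
from itertools import product

def power_sum(e, k, p):
    """p_k mod p from e = (e_1, ..., e_n) via Newton's identities."""
    n = len(e)
    ps = []
    for i in range(1, k + 1):
        total = sum(
            (-1) ** (j - 1) * e[j - 1] * ps[i - j - 1]
            for j in range(1, min(i - 1, n) + 1)
        )
        if i <= n:
            total += (-1) ** (i - 1) * i * e[i - 1]
        ps.append(total % p)
    return ps[-1]

def tv_from_uniform(n, k, p):
    """Total variation distance between the law of p_k(X) and uniform on F_p,
    where (e_1(X), ..., e_n(X)) is uniform on F_p^n."""
    counts = Counter(power_sum(e, k, p) for e in product(range(p), repeat=n))
    total = p ** n
    return 0.5 * sum(abs(counts[x] / total - 1 / p) for x in range(p))

# k <= n with p not dividing k: p_k = (function of e_1..e_{k-1}) +- k*e_k, exactly uniform.
exact_case = tv_from_uniform(3, 2, 3)
# n = 1, k = 2: p_2 = e_1^2 only takes the squares in F_3, far from uniform.
degenerate_case = tv_from_uniform(1, 2, 3)
```

The first call illustrates why small $k$ is easy, while the second shows that uniformity can fail when $k$ is large relative to $n$ in a way that interacts with the multiplicative structure of the field.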

Cancellation in character sums of high conductor
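Given $n \ge 0$ we denote by $\mathcal{M}_{n,q} \subseteq \mathbb{F}_q[T]$ the subset of monic polynomials of degree $n$. We denote by $\mathcal{M}_q = \cup_{n \ge 0} \mathcal{M}_{n,q}$ the set of monic polynomials in $\mathbb{F}_q[T]$. We denote by $\mathcal{P}_{n,q} \subseteq \mathcal{M}_{n,q}$ the set of irreducible polynomials of degree $n$ and let $\mathcal{P}_q = \cup_{n \ge 0} \mathcal{P}_{n,q}$. Throughout, the letter $P$ is reserved for elements of $\mathcal{P}_q$.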
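To make this notation concrete, the sets of monic and monic irreducible polynomials can be enumerated for small degrees. The sketch below assumes $q = p$ prime and sieves out products of two monic polynomials of positive degree; it then checks the classical counting identity $\sum_{d \mid n} d\,|\mathcal{P}_{d,q}| = q^n$. The function names are hypothetical.

```python
from itertools import product

def poly_mul(f, g, p):
    """Multiply coefficient tuples (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)

def monic(d, p):
    """All monic polynomials of degree d over F_p, as coefficient tuples."""
    return [tail + (1,) for tail in product(range(p), repeat=d)]

def irreducible_counts(p, N):
    """Return {d: number of monic irreducibles of degree d} for 1 <= d <= N."""
    reducible = set()
    for d1 in range(1, N):
        for d2 in range(d1, N - d1 + 1):
            for f in monic(d1, p):
                for g in monic(d2, p):
                    reducible.add(poly_mul(f, g, p))
    return {
        d: sum(1 for f in monic(d, p) if f not in reducible)
        for d in range(1, N + 1)
    }

counts = irreducible_counts(2, 6)

# Check sum_{d | n} d * |P_{d,2}| = 2^n for n = 1, ..., 6.
identity_holds = all(
    sum(d * counts[d] for d in range(1, n + 1) if n % d == 0) == 2 ** n
    for n in range(1, 7)
)
```

The sieve is only practical for tiny $p$ and $N$, which is enough for illustrating the definitions.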
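Given $k \ge 1$ and a nontrivial additive character $\psi : \mathbb{F}_q \to \mathbb{C}^\times$, we define a function $\chi_{k,\psi} : \mathcal{M}_q \to \mathbb{C}$ by
$$\chi_{k,\psi}(f) := \begin{cases} \psi(p_{-k}(f)) & \text{if } T \nmid f, \\ 0 & \text{otherwise.} \end{cases}$$
Remark 1.1. If $i$ is a negative integer, then $p_i(f)$, as defined in (1.1), is well defined as long as $T \nmid f$. Moreover, the usual properties are preserved, for instance $p_i(fg) = p_i(f) + p_i(g)$.
We show in Lemma 2.1 that the function $\chi_{k,\psi}$ is a primitive Dirichlet character modulo $T^{k+1}$. The following theorem is the main component behind the proof of Theorem 1.1.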
Theorem 1.5. We have, uniformly for $n \ge 1$ and $k \ge 1$, In particular, we have cancellation when $\log k = o(n^2)$ as $n \to \infty$.
Note that the range $\log k = o(n^2)$ is optimal because for $k = \prod_{i=1}^{n}(q^i - 1)$ we have $\gamma^{-k} = 1$ for every $\gamma \in \cup_{i=1}^{n} \mathbb{F}_{q^i}^{\times}$ and thus $p_{-k}(f) \equiv n$ on $\{f \in \mathcal{M}_{n,q} : T \nmid f\}$. Hence, there is no cancellation for such $k$. We do not know if cancellation for $\log k = o(n^2)$ persists if we restrict to $\mathcal{P}_{n,q}$ instead of $\mathcal{M}_{n,q}$; the proof of Theorem 1.5 relies on summing over all degree-$n$ polynomials.
It is instructive to compare Theorem 1.5 to results about character sums for integers and polynomials. We switch temporarily to the integer setting. Let $\chi$ be a nonprincipal Dirichlet character modulo $m$ and consider the sum $S_{x,\chi} := \sum_{n \le x} \chi(n)$ as $x \to \infty$. Montgomery and Vaughan [22, Lem. 2] proved under the generalized Riemann hypothesis for $L(s, \chi)$ that if $m \ge x$ and $y \in [(\log m)^4, x]$ is a parameter, then the estimate (1.3) holds. Taking $y = (\log m)^9$, we see that $S_{x,\chi}$ exhibits cancellation if $\log \log m = o(\log x)$. Granville and Soundararajan improved the error term in (1.3) and also showed that this range is optimal [13, Cor. A], in the sense that for any given $A > 0$ and for any prime $m$, there exists a nonprincipal character $\chi \bmod m$ with $|S_{x,\chi}| \gg_A x$, where $x = \log^A m$. Much less is known unconditionally. Burgess proved $S_{x,\chi}$ exhibits cancellation when $m \le x^{3-\varepsilon}$, and this can be extended to $m \le x^{4-\varepsilon}$ if $m$ is cubefree [4, 5]. When the conductor is smooth, better results exist [17, Ch. 12]. In particular, if $m = p^r$, then Banks and Shparlinski [1] showed one has cancellation in the range $\log m = o((\log x)^{3/2})$ (if $r \ge C$ and $p \le x^c$ for some absolute $C > 0$ and $c > 0$); this improved earlier work of Postnikov [17, Thm. 12.16]. It is worthwhile to recall that $\sum_{n \le x} n^{it}$ exhibits cancellation when $\log(|t| + 2) = o((\log x)^{3/2})$, a result due to Vinogradov [17, Cor. 8.26]. The bound we prove for the sum of
Now let us return to the polynomial setting. The generalized Riemann hypothesis in $\mathbb{F}_q[T]$ is a seminal theorem due to A. Weil [28]. Let $\chi$ be a nonprincipal Dirichlet character modulo a polynomial $Q \in \mathbb{F}_q[T]$. In [2, Thm. 3], Bhowmick and Lê adapted (1.3) to the polynomial setting, proving unconditionally the estimate (1.4). (For a self-contained bound on the sum in the right-hand side of (1.4), see Lemma 3.5.)
The range $\log k = o(n^2)$ in Theorem 1.5 is far beyond the ranges that the generalized Riemann hypothesis implies, and heavily exploits a special symmetry satisfied by $\chi_{k,\psi}$, which is shown in Lemma 3.8. We are not aware of any other explicit family of characters, in either integers or polynomials, where the generalized Riemann hypothesis implies cancellation when the conductor exceeds the Montgomery-Vaughan range. In §A we prove a new bound on general character sums in function fields which is not used in the paper.

Structure of the paper
In Section 2, we discuss the connection between the distribution of traces and character sums in more detail, and bound the total variation distance between our respective distributions and the uniform distribution by corresponding character sums. By doing so, we reduce Theorem 1.1 and the first part of Theorem 1.4 to Theorem 1.5, and Theorem 1.2 and the second part of Theorem 1.4 to obtaining bounds for the character sum in (1.4). The latter is quite straightforward as shown in Lemma 2.2, while the former requires more careful treatment. We prove Theorem 1.5 in Section 3; we consider this to be the technical part of the paper. We start with observing that there is underlying symmetry in sums of $\chi_{k,\psi}$ against primes. To see this, in Lemma 3.8, we prove the identity (1.5), in which $\Lambda(f)$ is the function field von Mangoldt function and $k' = \gcd(k, q^n - 1)$. This identity allows us to improve the Weil bound as shown in Corollary 3.9. This is helpful when $\gcd(k, q^n - 1)$ is small. Motivated by Montgomery and Vaughan [22, Lem. 2], given $S \subseteq \{1, \ldots, n\}$, we write the decomposition (1.6). In our case, a suitable choice of $S$ allows us to bound the second sum on the right-hand side of (1.6) using (1.5) (see Lemma 3.4), while a sieve bound (Lemma 3.5) bounds the first sum on the right-hand side of (1.6). This strategy ends up proving the following criterion.
Proposition 1.6. Let $n \ge 1$ and $k \ge 1$. Let $\psi : \mathbb{F}_q \to \mathbb{C}^\times$ be a nontrivial additive character. Then the bound (1.7) holds. In particular, a sufficient criterion for cancellation is that the sum in (1.2) diverges as $n \to \infty$.
In Section 2, we reduce Theorem 1.3 and the third part of Theorem 1.4 to Proposition 1.6. To deduce Theorem 1.5 from Proposition 1.6, we prove a sharp upper bound on $\sum_{L < d \le 2L} \log \gcd(k, q^d - 1)$ in Lemma 3.6, which may be of independent interest.
The criterion above is particularly interesting because it allows us to exhibit cancellation on the left-hand side of (1.7) for arbitrarily large $k$. For example, whenever $k$ is a prime, or a product of a bounded number of primes, the sum in (1.2) diverges simply because the following shorter sum does:
Remark 1.2. Proposition 1.6, and hence Theorem 1.5, apply as is to $\sum_{f \in \mathcal{M}_{n,q}} \mu(f) \chi_{k,\psi}(f)$ where $\mu$ is the Möbius function, see Remark 3.3. Recall $\mu$ is multiplicative with $\mu(P) = -1$ and $\mu(P^e) = 0$ for $e \ge 2$. However, we do not know that $\log k = o(n^2)$ is optimal in this case.
Remark 1.3. For every $\varepsilon > 0$, there are examples where $\log k \sim \varepsilon n^2$ for which Proposition 1.6 does not yield cancellation, for example, $k = \prod_{i=1}^{\lfloor \varepsilon' n \rfloor}(q^i - 1)$, where taking logarithms shows that $\varepsilon' > 0$ is defined via $\varepsilon = \varepsilon'^2 (\log q)/2$.
Remark 1.4. It is natural to ask whether analogues of Theorems 1.1 and 1.2 hold for other groups. Consider for instance $\mathrm{U}(n, q)$, the unitary group over $\mathbb{F}_{q^2}$. Then, as in Section 2, we are led to consider the sums $\sum_{f \in \mathcal{M}_{n,q^2}} P_U(f) \chi(f)$ where $P_U(f) := \mathbb{P}_{g \in \mathrm{U}(n,q)}(\det(I_n T - g) = f)$ is a function studied in [12, §5] and $\chi$ is a Dirichlet character depending on $k$ (if we are in the situation of Theorem 1.1) or on $a_1, \ldots, a_k$ (if we are in the situation of Theorem 1.2). The support of $P_U$ is restricted since the characteristic polynomial of any matrix from $\mathrm{U}(n, q)$ is unitary self-reciprocal, that is, it belongs to the set $\mathcal{M}^{\mathrm{usr}}_{n,q^2}$ of such polynomials. Using the strategy in Section 2 together with [12, Thm. 5.10], the problem ultimately reduces to bounding the character sums $\sum_{f \in \mathcal{M}^{\mathrm{usr}}_{n,q^2}} \chi(f)$ over $\mathcal{M}^{\mathrm{usr}}_{n,q^2}$. We expect that it is possible to adapt our proofs to bound such sums with additional work, although various complications arise, and this problem deserves further study. For a review of character sums over such sets, we refer the reader to [12, §5].

An involution
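Given $f \in \mathcal{M}_q$ we use the notation $p_i(f)$ introduced in (1.1). Given $a_1, \ldots, a_k \in \mathbb{F}_q$ and an additive character $\psi : \mathbb{F}_q \to \mathbb{C}^\times$, we define $\xi_{a,\psi} : \mathcal{M}_q \to \mathbb{C}$ by $\xi_{a,\psi}(f) := \psi\big(\sum_{i=1}^{k} a_i p_i(f)\big)$. Let us explain this idea. We let $\mathcal{M}^{\mathrm{gl}}_q := \{f \in \mathcal{M}_q : (f, T) = 1\}$ (this notation, borrowed from [12, §3.1], is motivated by the fact that the characteristic polynomial of any matrix from $\cup_{n \ge 0} \mathrm{GL}_n(\mathbb{F}_q)$ lies in $\mathcal{M}^{\mathrm{gl}}_q$). We define an involution $\iota$ on $\mathcal{M}^{\mathrm{gl}}_q$ by $\iota(f)(T) := T^{\deg f} f(1/T)/f(0)$, so that $p_k(\iota(f)) = p_{-k}(f)$ (see Remark 1.1 for the definition of $p_{-k}$). For any function $\alpha : \mathcal{M}_q \to \mathbb{C}$, we define $\iota_\alpha(f) := \alpha(\iota(f))$ if $f \in \mathcal{M}^{\mathrm{gl}}_q$ and $\iota_\alpha(f) := 0$ otherwise. We say that a function $\alpha : \mathcal{M}_q \to \mathbb{C}$ is completely multiplicative if $\alpha(fg) = \alpha(f)\alpha(g)$ holds for all $f, g \in \mathcal{M}_q$ and $\alpha(1) = 1$. If $\alpha$ is completely multiplicative, then so is $\iota_\alpha$ because $\iota$ is (when extended to $\mathcal{M}_q$ via $\iota(T) = 0$). Given an arithmetic function $\alpha : \mathcal{M}_q \to \mathbb{C}$, we use the notation $S(n, \alpha) := \sum_{f \in \mathcal{M}_{n,q}} \alpha(f)$; we have and, since $S(n, \alpha$
A variant of the following lemma was established in [12, Lem. 2.2].
Lemma 2.1. Let $a_1, \ldots, a_k \in \mathbb{F}_q$ with $a_k \neq 0$ and $\mathrm{char}(\mathbb{F}_q) \nmid k$. Let $\psi : \mathbb{F}_q \to \mathbb{C}^\times$ be a nontrivial additive character, and let $\alpha = \xi_{a,\psi}$. Then, $\iota_\alpha$ is a primitive Dirichlet character modulo $T^{k+1}$.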
Proof. We know $\iota_\alpha$ is completely multiplicative, vanishes at multiples of $T$ and $\iota_\alpha(1) = 1$. It remains to show that it only depends on the residue of the input modulo $T^{k+1}$ and that it is primitive. Newton's identities yield that $p_i(f)$ is a function of the $i$ first next-to-leading coefficients of $f$, where the $j$th ($j \ge 1$) next-to-leading coefficient of $T^n + \sum_{i=0}^{n-1} a_i T^i$ is defined to be $a_{n-j}$ if $j \le n$ and to be 0 otherwise. Hence, $\alpha(f) = \xi_{a,\psi}(f)$ only depends on the $k$ first next-to-leading coefficients of $f$. Since by definition $\iota(f)$ reverses the coefficients of $f$ and normalizes by $f(0)$, it follows that $\iota_\alpha(f)$ depends only on the last $k + 1$ coefficients of $f$, that is, on $f \bmod T^{k+1}$. Finally, $\iota_\alpha(T^{k+1} - c) = 1$ for every $c \in \mathbb{F}_q^\times$ while a short computation using Newton's identities shows $\iota_\alpha(T^k - c) = \psi(k a_k / c)$ is not equal to 1 for suitable $c$.

Reduction of Theorem 1.4 to character sum estimates
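Recall that $X = (X_i)_{i=1}^{n} \in \overline{\mathbb{F}_q}^{\,n}$ are $n$ random variables such that $e_1(X), \ldots, e_n(X)$ are independent and uniformly distributed on $\mathbb{F}_q$, and that one can take $X_i$ to be the zeros of a polynomial $f$ chosen uniformly at random from $\mathcal{M}_{n,q}$. Let $a_1, \ldots, a_k \in \mathbb{F}_q$ and denote by $\widehat{\mathbb{F}_q}$ the group of $q$ additive characters from $\mathbb{F}_q$ to $\mathbb{C}^\times$, with $\psi_0$ denoting the trivial character. By Fourier analysis on $\mathbb{F}_q$, given $x \in \mathbb{F}_q$,
$$\mathbb{P}\Big(\sum_{i=1}^{k} a_i p_i(X) = x\Big) = \frac{1}{q} \sum_{\psi \in \widehat{\mathbb{F}_q}} \overline{\psi}(x) \frac{S(n, \xi_{a,\psi})}{q^n}.$$
Since $S(n, \xi_{a,\psi_0}) = q^n$, the triangle inequality implies
$$\Big|\mathbb{P}\Big(\sum_{i=1}^{k} a_i p_i(X) = x\Big) - \frac{1}{q}\Big| \le \frac{1}{q} \sum_{\psi \neq \psi_0} \frac{|S(n, \xi_{a,\psi})|}{q^n}.$$
The first part of Theorem 1.4 is immediate from (2.5) and Theorem 1.5. The third part follows from (2.5) and Proposition 1.6. For the second part of Theorem 1.4, we use (2.4) and observe $\iota_{\xi_{a,\psi}}$ is a nonprincipal Dirichlet character modulo $T^{k+1}$ by Lemma 2.1, and the result follows from applying the following lemma.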
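The Fourier-analytic step used here is the standard inversion formula on a finite abelian group. The sketch below illustrates it under the assumption $q = p$ prime, where the additive characters are $\psi_a(x) = e^{2\pi i a x / p}$; the probability mass function `pmf` is an arbitrary made-up example, and the function names are hypothetical.

```python
import cmath

def char_means(pmf):
    """E[psi_a(Y)] for the additive characters psi_a(x) = exp(2*pi*i*a*x/p) of F_p."""
    p = len(pmf)
    return [
        sum(pmf[x] * cmath.exp(2j * cmath.pi * a * x / p) for x in range(p))
        for a in range(p)
    ]

def invert(means):
    """Recover P(Y = x) = (1/p) * sum_a conj(psi_a(x)) * E[psi_a(Y)]."""
    p = len(means)
    return [
        (sum(cmath.exp(-2j * cmath.pi * a * x / p) * means[a] for a in range(p)) / p).real
        for x in range(p)
    ]

# An arbitrary (made-up) distribution on F_5, used only to exercise the formulas.
pmf = [0.5, 0.2, 0.1, 0.1, 0.1]
means = char_means(pmf)
recovered = invert(means)

# Total variation distance from uniform, and the bound coming from
# the nontrivial characters via the triangle inequality.
tv = 0.5 * sum(abs(w - 1 / len(pmf)) for w in pmf)
bound = 0.5 * sum(abs(m) for m in means[1:])
```

The trivial character always has mean 1, which is the source of the main term $1/q$; equidistribution follows once every nontrivial character mean is small.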

Reductions of Theorems 1.1-1.3 to character sum estimates
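We define an arithmetic function $P_{\mathrm{GL}} : \mathcal{M}_q \to \mathbb{C}$ as follows. If $f \in \mathcal{M}_q$ and $n = \deg f$, then $P_{\mathrm{GL}}(f) := \mathbb{P}_{g \in \mathrm{GL}_n(\mathbb{F}_q)}(\det(I_n T - g) = f)$, where $\mathrm{GL}_n(\mathbb{F}_q)$ is endowed with the uniform measure. It follows from the work of Reiner [25] and Gerstenhaber [11] that $P_{\mathrm{GL}}$ is multiplicative (in the sense that $P_{\mathrm{GL}}(fg) = P_{\mathrm{GL}}(f) P_{\mathrm{GL}}(g)$ when $\gcd(f, g) = 1$) and supported on $\mathcal{M}^{\mathrm{gl}}_q$. Their works also show that on prime powers it is given by $P_{\mathrm{GL}}(T^e) = 0$ and $P_{\mathrm{GL}}(P^e) = q^{-e \deg P} \prod_{i=1}^{e} (1 - q^{-i \deg P})^{-1}$, where $P \in \mathcal{P}_q \setminus \{T\}$ and $e \ge 1$ (cf. [12, Thm. 3.3]). The following identity is a quick consequence of (2.6) (it can also be derived directly from [12, Thm. 3.4]). Recall $|f| = q^{\deg f}$.
Lemma 2.3. We have
Let $a_1, \ldots, a_k \in \mathbb{F}_q$. By Fourier analysis on $\mathbb{F}_q$, given $x \in \mathbb{F}_q$, we have
$$\mathbb{P}_{g \in \mathrm{GL}_n(\mathbb{F}_q)}\Big(\sum_{i=1}^{k} a_i \mathrm{Tr}(g^i) = x\Big) = \frac{1}{q} \sum_{\psi \in \widehat{\mathbb{F}_q}} \overline{\psi}(x)\, S(n, P_{\mathrm{GL}} \cdot \xi_{a,\psi}).$$
The trivial character $\psi_0$ contributes $1/q$ to the right-hand side. Recall Summing over $x$ and using the triangle inequality gives In [12, Thm. 3.6], the first author and Rodgers showed that For fixed $k$ this is essentially optimal up to constants, because $|S(n, P_{\mathrm{GL}} \cdot \xi_{a,\psi})|$ cannot decay faster than exponentially in $n^2$ due to $|\mathrm{GL}_n(\mathbb{F}_q)| = q^{\Theta(n^2)}$. In this paper, we focus on the range of cancellation rather than the rate of cancellation, and in this aspect, we can do better. Observe that from Lemma 2.3, $P_{\mathrm{GL}}(g) \xi_{a,\psi}(g)$.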
Since $P_{\mathrm{GL}}$ is a probability measure on $\mathcal{M}_{j,q}$, the triangle inequality yields (2.9). Similarly to the symmetric function case, we see that Theorem 1.2 follows from (2.9) and Lemma 2.2.
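In the special case $a_1 = \cdots = a_{k-1} = 0$ and $a_k = 1$, we have

Proof of Theorem 1.5
The von Mangoldt function $\Lambda : \mathcal{M}_q \to \mathbb{C}$ is defined as
$$\Lambda(f) = \begin{cases} \deg P & \text{if } f = P^e \text{ for some } P \in \mathcal{P}_q \text{ and } e \ge 1, \\ 0 & \text{otherwise.} \end{cases}$$
Gauss' identity [27, Thm. 2.2] states that
$$\sum_{f \in \mathcal{M}_{n,q}} \Lambda(f) = q^n$$
and is known to imply the following.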
Given a Dirichlet character $\chi$ its $L$-function is defined as
$$L(u, \chi) := \sum_{f \in \mathcal{M}_q} \chi(f) u^{\deg f} = \prod_{P \in \mathcal{P}_q} \big(1 - \chi(P) u^{\deg P}\big)^{-1},$$
which converges absolutely for $|u| < 1/q$. If $\chi$ is a nonprincipal character modulo $Q$, then $L(u, \chi)$ is a polynomial of degree at most $\deg Q - 1$ as follows from the orthogonality relations for $\chi$. We set
Theorem 3.2 (Weil's RH [28]). Let $\chi$ be a nonprincipal Dirichlet character. Factoring $L(u, \chi)$ as in (3.3), we have
Lemma 3.3. Let $\chi$ be a nonprincipal Dirichlet character, then
Proof. We take the logarithmic derivative of the Euler product (3.2) and of (3.3) and compare coefficients to obtain $\gamma_i^n$ for all $n \ge 1$. Then, Theorem 3.2 implies the lemma via the triangle inequality.
Let $\alpha : \mathcal{M}_q \to \mathbb{C}$ be a completely multiplicative function. Given any subset $\mathcal{T}$ of the positive integers $\mathbb{N}$, we define $F_{\mathcal{T}}(u, \alpha) :=$ Let us write $[u^m]F$ for the $m$th coefficient of a power series $F$. Our starting point is the identity (3.5), where $\mathcal{T}^c = \mathbb{N} \setminus \mathcal{T}$ is the complement of $\mathcal{T}$.

Montgomery and Vaughan's estimate
To bound the second term in (3.5), we prove the following lemma generalizing an estimate of Montgomery and Vaughan [22, Lem. 2] in the polynomial setting.
Lemma 3.4. Let $\alpha : \mathcal{M}_q \to \mathbb{C}$ be a 1-bounded completely multiplicative function, meaning $|\alpha(f)| \le 1$ for all $f \in \mathcal{M}_q$. Let $n \ge 1$ and $S \subseteq \{1, \ldots, n\}$ be a set of positive integers, and $S^c = \{1, \ldots, n\} \setminus S$ be its complement. Let $s_0 = \min_{s \in S} s$. Then, where
Proof. Using identity (3.5) and noticing that we can restrict the set of degrees to $\{1, \ldots, n\}$, we have Because $\alpha$ is 1-bounded, the coefficients in the series of $F_{S^c}(u, \alpha)$ are bounded in absolute value by the respective coefficients of Further, we may write $F_S(u, \alpha)$ as It follows that $[u^j](F_S(u, \alpha) - 1)$ for $0 \le j \le n$ are bounded in absolute value by the coefficients of $u^j$ in Putting this together, we have We make a general observation. If the coefficients of a power series Applying this observation with $R = 1/q$, we get $|X| \le q^n Z_{S^c}(1/q)(Z_{S,\alpha}(1/q) - 1)$. We estimate $Z_{S^c}(1/q)$ using Lemma 3.1 as Similarly, where $s_0 = \min_{s \in S} s$. The required estimate then follows from (3.7), (3.8), (3.6), and (3.9).
We now move on to estimating the first term in (3.5).

Sieve estimate
Lemma 3.5 (Sieve bound). Let $S \subseteq \{1, \ldots, n\}$ be a set of positive integers, and $S^c = \{1, \ldots, n\} \setminus S$ be its complement. Then
Remark 3.2. In the integer setting, the estimate $\sum_{n \le x:\, (n,k)=1} 1 \ll x \prod_{p \mid k} (1 - 1/p)$ for any positive integer $k$ whose prime factors do not exceed $x$ is a classical consequence of Selberg's sieve (see [15] for a discussion of this and an alternative proof). A permutation analogue of Lemma 3.5 was established (in greater generality) by Ford [10, Thm. 1.5]. The proof we give is self-contained and is in the spirit of [15, 14, 10].
and define The coefficients satisfy A i = A i for 0 ≤ i ≤ n.Differentiating (3.10) formally, we see that where Comparing coefficients in (3.11), we obtain and so The trivial bound A n−i ≤ q n−i implies we find that A n ≤ n −1 q n F (1/q) + 5 .
and recalling $\log n = \sum_{d=1}^{n} d^{-1} + O(1)$, we conclude that as needed.

Bound on a gcd sum
We trivially have $B_{L,k} \ll L \min\{\log k, L \log q\}$, and this bound is optimal when $\log_q k \gg L^2$. For example, take $k = \prod_{1 \le i < 2L} (q^i - 1) = q^{\Theta(L^2)}$, then $q^d - 1 \mid k$ for all $L \le d < 2L$ so that $B_{L,k} \asymp L^2 \log q$. For smaller $k$, however, this bound is too generous.
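The trivial bound and the extremal example can be checked numerically. The sketch below assumes, as the surrounding discussion suggests, that $B_{L,k} = \sum_{L \le d < 2L} \log \gcd(k, q^d - 1)$; this definition is an assumption made for illustration, and `gcd_sum` is a hypothetical name.

```python
from math import gcd, log

def gcd_sum(L, k, q):
    """Assumed form of B_{L,k}: sum over L <= d < 2L of log gcd(k, q^d - 1)."""
    return sum(log(gcd(k, q**d - 1)) for d in range(L, 2 * L))

q, L = 2, 4

# Extremal choice from the text: k = prod_{1 <= i < 2L} (q^i - 1),
# so that q^d - 1 divides k for every L <= d < 2L.
k_extreme = 1
for i in range(1, 2 * L):
    k_extreme *= q**i - 1

extreme_value = gcd_sum(L, k_extreme, q)

# A prime k coprime to every q^d - 1 in the window contributes nothing.
prime_value = gcd_sum(L, 101, q)

# Explicit form of the trivial bound: each term is at most
# min(log k, log(q^d - 1)) <= min(log k, 2L log q).
trivial_bound = L * min(log(k_extreme), 2 * L * log(q))
```

For the extremal $k$ every gcd equals $q^d - 1$, so the sum has size of order $L^2 \log q$, whereas for the prime $k = 101$ the sum vanishes.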
Lemma 3.6. Suppose $L \ge \log_q k$. Then $B_{L,k} \ll L \sqrt{\log k \log q}$.
To prove this result we will need a couple of facts concerning cyclotomic polynomials. Recall that the cyclotomic polynomials $(\phi_n(x))_{n \ge 1}$ are defined recursively by $\prod_{d \mid n} \phi_d(x) = x^n - 1$. They lie in $\mathbb{Z}[x]$ and can be written as $\phi_n(x) = \prod_{j \in (\mathbb{Z}/n\mathbb{Z})^\times} (x - e^{2\pi i j/n})$. This last relation shows that if $m$ and $p$ are coprime where The following lemma is classical but we could not find it explicitly stated in the literature and so we give a proof based on an argument implicit in Roitman [26].
where we used that by definition of $\mathrm{ord}_p(a)$ $a^{n/p'} = (a^{\mathrm{ord}_p(a)})$ Hence, $p'$ must be $p$, and $n$ is indeed $\mathrm{ord}_p(a)$ times a power of $p$. This shows $A_p \subseteq \{p^i\, \mathrm{ord}_p(a) : i \ge 0\}$. Finally, the relation $\phi_{p^i m}(a) = \phi_m(a^{p^i})/\phi_m(a^{p^{i-1}})$ for $\gcd(m, p) = 1$ and $i \ge 1$ implies $p^i\, \mathrm{ord}_p(a) \in A_p$ for every $i \ge 1$ since $\phi_m(a^{p^i})/\phi_m(a^{p^{i-1}}) \equiv \phi_m(a)^{p^i - p^{i-1}} \bmod p$.
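The description of $A_p$ in Lemma 3.7 can be tested numerically for small parameters. The sketch below assumes an integer $a \ge 2$ (so that $a^n - 1 \neq 0$) and computes the values $\phi_n(a)$ from the recursion $\prod_{d \mid n} \phi_d(a) = a^n - 1$ by exact integer division; all function names are hypothetical.

```python
def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def cyclotomic_values(a, N):
    """Return {n: phi_n(a)} for 1 <= n <= N using prod_{d | n} phi_d(a) = a^n - 1.

    Assumes a >= 2 so that no value vanishes and the division is exact.
    """
    values = {}
    for n in range(1, N + 1):
        known = 1
        for d in divisors(n):
            if d < n:
                known *= values[d]
        values[n] = (a**n - 1) // known
    return values

def multiplicative_order(a, p):
    """Smallest n >= 1 with a^n = 1 mod p; assumes p is prime and does not divide a."""
    n, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        n += 1
    return n

def check(a, p, N):
    """Compare {n <= N : p | phi_n(a)} with {p^i * ord_p(a)} intersected with [1, N]."""
    vals = cyclotomic_values(a, N)
    observed = {n for n in range(1, N + 1) if vals[n] % p == 0}
    expected = set()
    if a % p != 0:
        n = multiplicative_order(a, p)
        while n <= N:
            expected.add(n)
            n *= p
    return observed, expected

observed, expected = check(2, 3, 40)
```

With $a = 2$ and $p = 3$ the order is 2, so the lemma predicts that 3 divides $\phi_n(2)$ exactly for $n \in \{2, 6, 18, \ldots\}$.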
We introduce a parameter $T \ge 1$. We shall show (3.13) and then take $T = L/\log_q k$. We consider separately the contribution of $e > d/T$ and $e \le d/T$, obtaining where
$$B_{L,k,1} = \sum_{L \le d < 2L} \sum_{\substack{p \mid k,\, \phi_e(q) \\ e \mid d \\ e > d/T}} \log \gcd\big(p^{\nu_p(k)}, \phi_e(q)\big), \qquad B_{L,k,2} = \sum_{L \le d < 2L} \sum_{\substack{p \mid k,\, \phi_e(q) \\ e \mid d \\ e \le d/T}} \log \gcd\big(p^{\nu_p(k)}, \phi_e(q)\big).$$
Let $A_p := \{n \ge 1 : p \text{ divides } \phi_n(q)\}$. By Lemma 3.7, $A_p$ is either empty or is a geometric progression with step size $p$, so that $\sum_{e:\, p \mid \phi_e(q),\ L/T < e < 2L}$ and it follows that Next, omitting the restriction $p \mid k$ and using $\phi_e(q) \mid q^e - 1 \le q^e$, we have This implies (3.13), and thus the statement of the lemma.

Conclusion of proof of Theorem 1.5
Due to the trivial estimate $|S(n, \chi)| \le q^n$, we may assume $n \ge 12(1 + \log_q k)$. In view of Proposition 1.6 it suffices to show $-\sum_{d \in S} d^{-1} \le \log(1 + \log_q k + \log_q n) - \log n + O(1)$, where $S$ is as in (3.14). To show this, first observe that $-\sum_{d \in S} d^{-1} = \sum_{d \in S^c} d^{-1} - \log n + O(1)$. Let $m' = \max\{\lceil 12 \log_q n \rceil, \log_q k\}$. Since $1_{A \ge B} \le \log A / \log B$, $\log(q^{d/3})$.
By Lemma 3.6, the last $d$-sum is $O(\log_q k / m')$, finishing the proof.

Lemma 3.7. Fix an integer $a$. For a prime $p$ define $A_p := \{n \ge 1 : p \mid \phi_n(a)\}$. If $p \mid a$, then $A_p = \emptyset$. Otherwise, $A_p = \{p^i\, \mathrm{ord}_p(a) : i \ge 0\}$ where $\mathrm{ord}_p(a)$ is the multiplicative order of $a$ modulo $p$.
Proof. If $p \mid \phi_n(a)$, then $p \mid a^n - 1$ since $\phi_n(a) \mid a^n - 1$. If $p \mid a$, then $a^n - 1 \equiv -1 \bmod p$, which is a contradiction. Thus, $A_p = \emptyset$ in this case. Now suppose $p \nmid a$. By definition, we have $p \mid a^{\mathrm{ord}_p(a)} - 1$. Suppose $p \mid \phi_n(a) \mid a^n - 1$ for some positive integer $n$, then $p \mid \gcd(a^n - 1, a^{\mathrm{ord}_p(a)} - 1) = a^{\gcd(n, \mathrm{ord}_p(a))} - 1$. Since $\mathrm{ord}_p(a) = \min\{n \ge 1 : p \mid a^n - 1\}$, we have $\gcd(n, \mathrm{ord}_p(a)) \ge \mathrm{ord}_p(a)$ and thus $\mathrm{ord}_p(a) \mid n$. This shows
$$A_p \subseteq \{m \cdot \mathrm{ord}_p(a) : m \ge 1\}. \tag{3.12}$$
Next we show $\mathrm{ord}_p(a) \in A_p$. Since $p \mid a^{\mathrm{ord}_p(a)} - 1 = \prod_{e \mid \mathrm{ord}_p(a)} \phi_e(a)$, it follows that $A_p$ contains some divisor of $\mathrm{ord}_p(a)$, which by (3.12) then has to be $\mathrm{ord}_p(a)$ itself. Next, we want to show that for $n \in A_p$, $n/\mathrm{ord}_p(a)$ is a power of $p$. If $p'$ is a prime dividing $n/\mathrm{ord}_p(a)$, then $\mathrm{ord}_p(a) \mid n/p'$ and so