A decoupling interpretation of an old argument for Vinogradov's Mean Value Theorem

We interpret into decoupling language a refinement of a 1973 argument due to Karatsuba on Vinogradov's mean value theorem. The main goal of our argument is to answer precisely what solution counting in older partial progress on Vinogradov's mean value theorem corresponds to in Fourier decoupling theory.

1. Introduction

1.1. Motivation. Let $s \geq 1$ and $k \geq 2$ be integers. For $X \geq 1$, let $J_{s,k}(X)$ be the number of solutions to the degree $k$ Vinogradov system in $2s$ variables:
\[ x_1^j + \cdots + x_s^j = y_1^j + \cdots + y_s^j, \qquad 1 \leq j \leq k, \tag{1.1} \]
where all variables $x_1, \ldots, x_s, y_1, \ldots, y_s \in [1, X] \cap \mathbb{N}$. Nontrivial upper bounds for $J_{s,k}(X)$ were first studied by Vinogradov in 1935 [32] and such results are collectively referred to as Vinogradov's Mean Value Theorem (VMVT) in the literature. The main conjecture in VMVT, now a theorem as of 2015, was that for every $\varepsilon > 0$ and $s, k \in \mathbb{N}$, one has
\[ J_{s,k}(X) \lesssim_{s,k,\varepsilon} X^{\varepsilon}\bigl(X^{s} + X^{2s - \frac{k(k+1)}{2}}\bigr) \tag{1.2} \]
for all $X \geq 1$. It is not hard to see that $J_{s,k}(X) \gtrsim_{s,k} X^{s} + X^{2s - k(k+1)/2}$, and by applying Hölder's inequality, we may deduce (1.2) for all $s \in \mathbb{N}$ from the $s = k(k+1)/2$ case. VMVT plays an important role in understanding Waring's problem and the Riemann zeta function, see for example [11,12,19,34]. When $k = 2$, the main conjecture in VMVT is classical. In 2014, Wooley [35] proved the $k = 3$ case of VMVT using the method of efficient congruencing (see also [20] for a shorter proof due to Heath-Brown). In 2015, the $k \geq 2$ case was proven by Bourgain, Demeter, and Guth in [3] using Fourier decoupling for the degree $k$ moment curve, from which VMVT followed as a corollary. Finally, in 2017, Wooley [36] gave an alternative proof of (1.2) for all $k \geq 2$ using nested efficient congruencing.
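As a small numerical sanity check on the definition of $J_{s,k}(X)$, the following Python sketch counts solutions of the Vinogradov system by brute force. The function name `vinogradov_count` is our own illustrative choice and not notation from the literature. It groups $s$-tuples by their vector of power sums and adds up the squares of the group sizes, so it is only feasible for very small $s$ and $X$.

```python
from collections import Counter
from itertools import product


def vinogradov_count(s, k, X):
    """Brute-force J_{s,k}(X): number of (x, y) in ([1,X] ∩ N)^{2s} with
    x_1^j + ... + x_s^j = y_1^j + ... + y_s^j for all 1 <= j <= k."""
    # Group s-tuples by their power-sum vector (sum x_i^j)_{j=1..k}.
    power_sums = Counter(
        tuple(sum(x ** j for x in xs) for j in range(1, k + 1))
        for xs in product(range(1, X + 1), repeat=s)
    )
    # Each pair of s-tuples with the same power-sum vector is one solution.
    return sum(c * c for c in power_sums.values())


if __name__ == "__main__":
    for X in range(1, 6):
        print(X, vinogradov_count(2, 2, X), 2 * X * X - X)
```

For $s = k = 2$ the two power sums determine the multiset $\{x_1, x_2\}$, so the count is exactly $2X^2 - X$, consistent with the subcritical bound $J_{k,k}(X) \lesssim_k X^k$. The diagonal solutions $x = y$ always give the trivial lower bound $J_{s,k}(X) \geq X^s$.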
After the proofs of VMVT using the Fourier method of decoupling [3] and the number theoretic method of efficient congruencing [36], it has been an interesting question to determine how these two methods are related and whether a "dictionary" between the two methods could be obtained. The study of this dictionary has led to new proofs of Fourier decoupling for the parabola [23], the cubic moment curve [15], and the degree $k$ moment curve [16]; these were inspired by the efficient congruencing arguments in [26, Section 4], [20], and [36], respectively. Additionally, a decoupling interpretation of the study of VMVT over ellipsephic sets [1] led to a proof of Fourier decoupling for fractal sets on the parabola [5].
In this article, we revisit a particular classical VMVT which states that
\[ J_{s,k}(X) \lesssim_{s,k} X^{2s - \frac{k(k+1)}{2} + \frac{1}{2}k^2\left(1 - \frac{1}{k}\right)^{s/k}} \tag{1.3} \]
for all $X \geq 1$ and $s = kl$ with $l \in \mathbb{N}$. This result should be compared to the supercritical $s \geq k(k+1)/2$ case in (1.2). For $s$ very large compared to $k$, we have an extra term $\frac{1}{2}k^2(1 - \frac{1}{k})^{s/k}$ in the exponent, which decays exponentially in $s$ for every fixed value of $k$, instead of an $\varepsilon$. The estimate (1.3) appears (for example) in Vaughan's book [31, Chapter 5] and is a refinement of an argument of Karatsuba [22] from 1973 (see also Stechkin [27] from 1975). The loss of $X^{\frac{1}{2}k^2(1 - \frac{1}{k})^{s/k}}$ comes from combining the subcritical estimate $J_{k,k}(X) \lesssim_k X^k$, which follows from the Newton-Girard identities, with an iterative argument to derive estimates for $J_{s,k}(X)$ when $s$ is supercritical.
The main purpose of this paper is to illustrate how this refined argument of Karatsuba can be adapted to give a proof of a non-sharp Fourier decoupling inequality for the degree $k$ moment curve in the supercritical regime. The key difficulty that prevents the direct use of ideas from [15,16,23] is the heavy reliance on solution counting in (1.3). One of the main points of this article is to clarify the role of such solution counting arguments in the study of Fourier decoupling. The mechanism driving the solution counting arguments will allow us to prove the key Lemma 4.4 below, which concerns the geometry of Fourier supports of the functions appearing in our main Theorem 1.1.
Since our goal is to clarify the role of solution counting in Fourier decoupling, and Bourgain, Demeter, and Guth have already given the sharpest possible moment curve decoupling theorem in [3], we will work over $\mathbb{Q}_q$ rather than over $\mathbb{R}$. This will allow us to present the argument in the cleanest possible manner, free of technical difficulties arising from the inconvenience of the uncertainty principle in $\mathbb{R}^k$. See also [14] for another decoupling paper that works over $\mathbb{Q}_q$ rather than $\mathbb{R}$; there, however, the authors use the observation that decoupling over $\mathbb{Q}_q$ is quantitatively more efficient than decoupling over $\mathbb{R}$ in terms of exponential sum estimates.
Notation. As $k$ will be fixed, we will allow all constants to depend on $k$. Given two positive expressions $X$ and $Y$, we write $X \lesssim Y$ if $X \leq CY$ for some constant $C$ that is allowed to depend on $k$. If $C$ depends on some additional parameter $A$, then we write $X \lesssim_A Y$. We write $X \sim Y$ if $X \lesssim Y$ and $Y \lesssim X$. By writing $f(x) = O(g(x))$, we mean $|f(x)| \lesssim g(x)$. We say that $f$ has Fourier support in a set $\Omega$ if its Fourier transform $\widehat{f}$ is supported in $\Omega$. To prepare the reader for the many intervals that will occur later in Sections 4 and 5, there will be three types of interval lengths: intervals named with a "K" will be associated to the smallest scale $\delta$, intervals named with a "J" will be associated to the intermediate scale $\nu \approx \delta^{1/k}$, and intervals named with an "I" will be associated to the largest scale $\kappa \approx \delta^{\varepsilon}$ (though on a first reading, it might be easier to set $\kappa = 1/q$). Finally, in the context of the decoupling constant $D_p(\delta)$, defined in (1.5) below, we call $p$ subcritical if $p < k(k+1)$ and supercritical if $p \geq k(k+1)$ (rather than the more accurate but slightly more clumsy "not subcritical").

1.2. Analysis over $\mathbb{Q}_q$ and decoupling. Fix a degree $k \geq 2$ and a prime number $q$ with $q > k$. We reserve the letter $p$ for the Lebesgue exponent in the main Theorem 1.1. We very briefly review the harmonic analysis over $\mathbb{Q}_q$ needed to set up the statement of decoupling. See also Section 2 and [14, Section 2] for further discussion surrounding the harmonic analysis and basic geometric facts over $\mathbb{Q}_q$ that are useful in decoupling. Additionally, see Chapters 1 and 2 of [28] and Chapter 1 (in particular Sections 1 and 4) of [33] for a more complete discussion of analysis on $\mathbb{Q}_q$.
The field $\mathbb{Q}_q$ is the completion of $\mathbb{Q}$ under the $q$-adic norm, defined by $|0| = 0$ and $|q^a b/c| = q^{-a}$ if $a \in \mathbb{Z}$, $b, c \in \mathbb{Z} \setminus \{0\}$ and $q$ is relatively prime to both $b$ and $c$. Then $\mathbb{Q}_q$ can be identified (bijectively) with the set of all formal series
\[ \mathbb{Q}_q = \Bigl\{ \sum_{j=k}^{\infty} a_j q^j : k \in \mathbb{Z},\ a_j \in \{0, 1, \ldots, q-1\} \text{ for every } j \geq k \Bigr\}, \]
and the $q$-adic norm on $\mathbb{Q}_q$ satisfies $|\sum_{j=k}^{\infty} a_j q^j| = q^{-k}$ if $a_k \neq 0$. Strictly speaking we should be writing $|\cdot|_q$ instead of $|\cdot|$, but we omit this dependence as $q$ is fixed. The $q$-adic norm on $\mathbb{Q}_q$ induces a norm on $\mathbb{Q}_q^k$, which by abuse of notation we also denote by $|\cdot|$, via $|(\xi_1, \ldots, \xi_k)| := \max_{1 \leq i \leq k} |\xi_i|$. Of particular importance is the ultrametric inequality: $|\xi + \eta| \leq \max\{|\xi|, |\eta|\}$, with equality if $|\xi| \neq |\eta|$. An interval in $\mathbb{Q}_q$ is then a set of the form $\{\xi \in \mathbb{Q}_q : |\xi - a| \leq r\}$, where $a \in \mathbb{Q}_q$ and $r \geq 0$; $r$ will then be called the length of the interval. We will also use $|I|$ to denote the length of an interval $I$. The ring of integers $\mathbb{Z}_q$ coincides with the unit interval $\{\xi \in \mathbb{Q}_q : |\xi| \leq 1\}$. A cube in $\mathbb{Q}_q^k$ of side length $r$ is then a product of $k$ intervals in $\mathbb{Q}_q$, each of length $r$.

We will work with Schwartz functions defined on $\mathbb{Q}_q^k$ (i.e. finite linear combinations of characteristic functions of cubes in $\mathbb{Q}_q^k$). The Fourier transform of such a function $f$ will be given by
\[ \widehat{f}(\xi) := \int_{\mathbb{Q}_q^k} f(x)\, \chi(-x \cdot \xi)\, dx, \]
where $\chi$ is a fixed element in the Pontryagin dual $\widehat{\mathbb{Q}_q}$ of $\mathbb{Q}_q$ that restricts to the principal character on the additive subgroup $\mathbb{Z}_q$ and restricts to a non-principal character on the additive subgroup $q^{-1}\mathbb{Z}_q$, $x \cdot \xi = \sum_{i=1}^{k} x_i \xi_i$ if $x = (x_1, \ldots, x_k)$ and $\xi = (\xi_1, \ldots, \xi_k)$, and $dx$ is the Haar measure on the additive group $\mathbb{Q}_q^k$ normalized so that
\[ \int_{\mathbb{Z}_q^k} dx = 1. \]
One key property of the Fourier transform that we will use is that $\widehat{1_{\mathbb{Z}_q}} = 1_{\mathbb{Z}_q}$, that is, the Fourier transform of (the indicator function of) the unit ball is (the indicator function of) the unit ball; see [33, p. 42] for a proof.
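To make the $q$-adic norm and the ultrametric inequality concrete on the dense subfield $\mathbb{Q} \subset \mathbb{Q}_q$, here is a minimal Python sketch using exact rational arithmetic. The function names `q_adic_norm` and `ultrametric_holds` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction


def q_adic_norm(x, q):
    """q-adic norm of a rational x: |q^a * b/c| = q^(-a) with q coprime to b, c."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    a = 0
    num, den = x.numerator, x.denominator
    while num % q == 0:   # powers of q in the numerator raise the valuation
        num //= q
        a += 1
    while den % q == 0:   # powers of q in the denominator lower the valuation
        den //= q
        a -= 1
    return Fraction(q) ** (-a)


def ultrametric_holds(xi, eta, q):
    """Check |xi + eta| <= max(|xi|, |eta|), with equality when the norms differ."""
    nx, ne = q_adic_norm(xi, q), q_adic_norm(eta, q)
    ns = q_adic_norm(Fraction(xi) + Fraction(eta), q)
    if nx != ne:
        return ns == max(nx, ne)
    return ns <= max(nx, ne)


if __name__ == "__main__":
    print(q_adic_norm(12, 3), q_adic_norm(Fraction(1, 9), 3))
    print(ultrametric_holds(3, 1, 3), ultrametric_holds(1, 2, 3))
```

Note that the inequality can be strict when $|\xi| = |\eta|$: with $q = 3$, $|1| = |2| = 1$ but $|1 + 2| = 1/3$.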
For δ P q ´N and any interval I Ă Q q with length ě δ, let P δ pIq be a partition of I into intervals of length δ.Write P δ for P δ pZ q q.To each interval I Ă Z q , one associates a parallelepiped ˆ¨¨¨ˆ|I| k where a P I; this parallelepiped is independent of the choice of a P I.Note that Ť KPP δ θ K is a covering of a δ k neighborhood of the unit moment curve (in fact it covers a suitable anisotropic neighborhood of that curve).One also associates to each K P P δ a cube τ K :" tpξ 1 , . . ., ξ k q P Q k q : |ξ j ´aj | ď δ for all 1 ď j ď ku (1.4) of side length δ, where a P K; again this is independent of the choice of a P K.Note that for each K Ă P δ , the ultrametric inequality gives that θ K Ă τ K .For an interval I Ă Z q , let f I be defined such that p f I :" p f ¨1IˆQ k´1 q .For p ě 2 and δ P q ´N, let D p pδq be the smallest constant such that the inequality holds for every Schwartz function f on Q k q with its Fourier transform p f supported on Bourgain, Demeter, and Guth [3] showed that and this estimate is sharp.Strictly speaking [3] proves a decoupling theorem over R rather than over Q q , but the same proof can be used to derive (1.6).Choosing f to be a sum of Dirac deltas immediately implies (1.2).
1.3. The main result. By interpreting the refinement of Karatsuba's argument for (1.3) into decoupling language, we obtain as our main result the following Fourier decoupling analogue of (1.3). In the same way that (1.3) is a weaker partial result towards (1.2), Theorem 1.1 and Corollary 1.2 should be viewed as the analogous weaker counterparts of the sharp bound (1.6).
Theorem 1.1.Let p 0 P 2N be an even integer and let cpp 0 q ě 0 be such that for all δ P q ´N (1.7) where C 1 is independent of δ.If p P p 0 `2kN and 0 ă ε ă 1, then ´kpk`1q 2p q´c pp 0 q p p1´1 k q p{p2kq ´ε for all δ P q ´N (1.8) where app, p 0 q :" p p ´p0 (1.9) Since D p pδq ě 1 for all p, (1.7) implies that cpp 0 q, k, and p 0 are such that It is also known that D 2k pδq À ε δ ´ε for any ε ą 0, see for example [8,Exercise 11.19] for the Euclidean case; we provide a proof for the case over Q q in the appendix for the convenience of the reader.We also remark that [21] proved, in the case of local fields, a related square function estimate with a bound independent of δ if the f K 's are Fourier supported in a δ k neighborhood of γpKq; see also [13] and [2] for similar estimates.Choosing p 0 " 2k and cpp 0 q " k 2 {2 `ε for any ε ą 0 in applying Theorem 1.1 we obtain: Corollary 1.2.Let p P 2kN and 0 ă ε ă 1.Then p1´1 k q p{p2kq ´ε for all δ P q ´N where the implied constant in the exponent of q is absolute (and independent of k).
The exponent of q in Corollary 1.2 is more precisely app,2kq p " p 1 2k ´1 p q k 2 `9k´4 2 `1 4 p p 2k ´1q, but we opt to write it as above since it more clearly illustrates what the main terms are.Note that the hypothesis in Theorem 1.1 is always satisfied if p 0 is any fixed exponent ě 2 and cpp 0 q is chosen large enough.One can view Theorem 1.1 as a way of upgrading trivial l 2 L p 0 decoupling at say some subcritical p to l 2 L p decoupling for all large p with only a loss that decreases exponentially as p Ñ `8.Of course, if one already knew the sharp estimate in the critical p 0 " kpk `1q case, then Theorem 1.1 implies that we know the sharp decoupling estimate for all p P kpk `1q `2kN.However this already follows from interpolating the critical estimate with the trivial l 2 L 8 decoupling estimate.
Corollary 1.2 implies (1.3) with an extra $X^{\varepsilon}$, which comes from the $\delta^{-\varepsilon}$ factor in Corollary 1.2. However, Corollary 1.2 is more general: this extra $\delta^{-\varepsilon}$ term comes from needing some additional uniformity in the case of a general $f$ Fourier supported in $\bigcup_{K \in P_\delta} \theta_K$, and from an application of the broad-narrow argument to get around the use of the Prime Number Theorem in the proof of (1.3) (see Section 4.1.1). See Sections 3.5 and 5.1 for some more discussion comparing the VMVT case and the general $f$ decoupling case.
We end with some discussion about how the proof of Corollary 1.2 (and Theorem 1.1) contrasts with the modern proofs of degree $k$ moment curve decoupling [3,16], which prove (1.6). Unlike the arguments in [3,16], we do not use any lower dimensional decoupling input, and while we do use induction on scales, the iteration itself is distinctive in that it iterates on the $p$ in $\ell^2 L^p$ decoupling. Schematically, the iteration to prove Theorem 1.1 controls $\ell^2 L^p$ decoupling by $\ell^2 L^{p-2k}$ decoupling at a larger scale. After $O(p/k)$ steps, we are reduced to $\ell^2 L^{2k}$ decoupling for the degree $k$ moment curve, which follows (essentially) from the Newton-Girard identities. The iteration is surprisingly efficient when it controls $\ell^2 L^p$ decoupling by $\ell^2 L^{p-2k}$ decoupling, as long as both $p$ and $p - 2k$ are supercritical. However, after about $\frac{1}{2k}\bigl(p - \frac{k(k+1)}{2}\bigr)$ steps, we enter the subcritical regime, for which the iteration becomes inefficient, and this is why we accrue an additional $\delta^{-\frac{k^2}{2p}(1 - \frac{1}{k})^{p/(2k)}}$ term.

When $k = 2$, the argument for Corollary 1.2 uses $O(p)$ steps to prove a weak non-sharp $\ell^2 L^p$ decoupling estimate. This is to be compared to the modern proof of decoupling for the parabola where, to prove the sharp critical $\ell^2 L^6$ decoupling, one uses $O(\varepsilon^{-1})$ many steps (see for example the proof of [23, Lemma 2.12]). In the harmonic analysis literature, iterating on $p$ is not a new idea, as such an argument was already used by Drury [9] to prove cubic moment curve restriction, though we believe this is the first time such an argument has appeared in the decoupling literature. See also [25] by the fourth author for a similar idea in the additive combinatorics literature, which was recently used to obtain diameter free estimates for the quadratic VMVT. Additionally, at each iterative step, three scales are key: the smallest scale $\delta$, the intermediate scale $\delta^{1/k}$, and the largest scale $1$ (though strictly speaking in our proof the largest scale is actually $\delta^{\varepsilon}$ rather than $1$ for technical reasons). This can be compared to [3,16], which use the scales $\delta$, $\delta^{\varepsilon}$ and $1$.
This paper is organized as follows. In Section 2, we review some basic geometric and harmonic analysis facts in $\mathbb{Q}_q$ that will be used throughout this paper. In Section 3, we review the refinement of the 1973 argument of Karatsuba at a high level. In Section 4, we prove Lemma 4.2, which is the main lemma used to prove Theorem 1.1. This is accomplished by combining a standard broad-narrow argument in Section 4.1.1 with some geometric properties of the moment curve that use the Newton-Girard identities, see Lemma 4.4. In Section 5, we dyadically pigeonhole to obtain some uniformity in our estimates and prove Theorem 1.1 and Corollary 1.2. Finally, in the appendix, we include a proof of $D_{2k}(\delta) \lesssim_{\varepsilon} \delta^{-\varepsilon}$ for completeness.
Acknowledgements. This question was first posed to the third and sixth author by Shaoming Guo when the third author was visiting the Department of Mathematics at the Chinese University of Hong Kong in July 2019. This question was posed again by Shaoming Guo during a problem session at the Arithmetic (and) Harmonic Analysis workshop held (virtually) at the Mittag-Leffler Institute in early June 2021, and this current collaboration arose from that particular workshop.
KH is supported by the Additional Funding Programme for Mathematical Sciences, delivered by EPSRC (EP/V521917/1) and the Heilbronn Institute for Mathematical Research, ZL is supported by NSF grant DMS-1902763, AM is supported by Ben Green's Simons Investigator Grant, ID 376201, OR is supported by the joint FWF-ANR project Arithrand: FWF: I 4945-N and ANR-20-CE91-0006, and P-L.Y is supported by a Future Fellowship FT20010039 from the Australian Research Council.ZL would also like to thank the National Center for Theoretical Sciences (NCTS) in Taipei, Taiwan for their kind hospitality during his visit, where part of this work was written.We also acknowledge kind support from the American Institute of Mathematics through the Fourier restriction research community.

2. Wavepacket decomposition and some basic geometric facts
Throughout this paper, we will make use of wavepacket decomposition, which allows us to decompose a function $f$ that is Fourier supported in some $\theta_K$ into a linear combination of indicator functions of translates of the parallelepiped "dual" to $\theta_K$. The fact that the $q$-adic character $\chi$ is trivial on $\mathbb{Z}_q$ gives a much cleaner wavepacket decomposition when working over $\mathbb{Q}_q$ than over $\mathbb{R}$. See [30, Section 3] or [17, Section 2.4] for some discussion about wavepacket decomposition over $\mathbb{R}$ in the context of the paraboloid (though the same ideas apply for the degree $k$ moment curve).
Fix $\delta \in q^{-\mathbb{N}}$. It will be convenient to introduce the shorthand $\theta_\delta := \delta\mathbb{Z}_q \times \delta^2\mathbb{Z}_q \times \cdots \times \delta^k\mathbb{Z}_q$ and $T_\delta := \delta^{-1}\mathbb{Z}_q \times \delta^{-2}\mathbb{Z}_q \times \cdots \times \delta^{-k}\mathbb{Z}_q$.
They are dual to each other in the sense that T δ " tx P Q k q : |x ¨ξ| ď 1 for all ξ P θ δ u.Since for any 1 ď j ď k, any interval in Q q of length δ j is the disjoint union of δ ´pk´jq many intervals of length δ k , it follows that θ δ is the disjoint union of δ ´kpk´1q 2 many cubes of side lengths δ k in Q k q .Similarly, any cube in Q k q of side length δ ´k is a disjoint union of δ ´kpk´1q 2 many translates of T δ .Now for a P Z q , let M a be the k ˆk lower-triangular matrix given by M a " pγ 1 paq γ 2 paq ¨¨¨γ pkq paqq where we view γ pjq paq as a column vector.Then for any K P P δ , we have for any a P K.In fact, the right hand side is independent of a P K since if b P K, then γpbq " γpaq `k ÿ j"1 pj!q ´1γ pjq paqpb ´aq j P γpaq `Ma θ δ , and where the second matrix on the right hand side preserves θ δ " δZ q ˆδ2 Z q ˆ¨¨¨ˆδ k Z q (here we have used the fact that |k!| " 1 in Q q since q ą k).
For K P P δ and any a P K, let T 0,K be the dual parallelepiped to θ K centered at the origin given by T 0,K " tx P Q k q : |x ¨pξ ´γpaqq| ď 1 for all ξ P θ K u.Using (2.1), it is not hard to see that T 0,K " tx P Q k q : |x ¨γpjq paq| ď δ ´j for all 1 ď j ď ku " tx P Q k q : M T a x P T δ u " M ´T a T δ for any a P K.This parallelepiped depends only on K but not on the choice of a P K, since (2.2) shows that where Opδ j q is some number in Q q with norm ď δ j , and the second matrix on the right hand side is a bijection that preserves T δ by the ultrametric inequality.
Lemma 2.1. Let $\delta \in q^{-\mathbb{N}}$ and fix $K \in P_\delta$. Then
(i) $\theta_K - \theta_K$ is the disjoint union of $\delta^{-\frac{k(k-1)}{2}}$ many cubes of side length $\delta^k$ in $\mathbb{Q}_q^k$;
(ii) every cube of side length $\delta^{-k}$ in $\mathbb{Q}_q^k$ is the disjoint union of $\delta^{-\frac{k(k-1)}{2}}$ many translates of $T_{0,K}$.
Proof. (i) Recall that $\theta_\delta$ is the disjoint union of $\delta^{-\frac{k(k-1)}{2}}$ cubes of side length $\delta^k$. Since $M_a$ is a bijection that maps cubes of side length $\delta^k$ to cubes of side length $\delta^k$ for any $a \in K$, and $\theta_K - \theta_K = M_a \theta_\delta$ for any $a \in K$, the assertion follows. Note that $\theta_K - \theta_K$ is just a translation of $\theta_K$ to the origin. (ii) Recall that any cube in $\mathbb{Q}_q^k$ of side length $\delta^{-k}$ is a disjoint union of $\delta^{-\frac{k(k-1)}{2}}$ many translates of $T_\delta$. Since $M_a^{-T}$ is a bijection that maps cubes of side length $\delta^{-k}$ to cubes of side length $\delta^{-k}$ for any $a \in K$, and $T_{0,K} = M_a^{-T} T_\delta$ for any $a \in K$, the assertion follows.
From Lemma 2.1(ii), we may deduce that translates of $T_{0,K}$ tile $\mathbb{Q}_q^k$; we denote the collection of such translates by $\mathbb{T}(K)$. We are now ready to state the version of wavepacket decomposition that we will use.

Lemma 2.2 (Wavepacket decomposition). Let $\delta \in q^{-\mathbb{N}}$ and fix $K \in P_\delta$. Let $g$ be a Schwartz function with Fourier transform supported in $\theta_K$. Then $|g|$ is constant on every $T \in \mathbb{T}(K)$, and $\widehat{g 1_T}$ is supported on $\theta_K$ for every $T \in \mathbb{T}(K)$. Hence it is natural to write
\[ g = \sum_{T \in \mathbb{T}(K)} g 1_T, \tag{2.3} \]
where each term $g 1_T$ (which we will call a "wavepacket") is Fourier supported on $\theta_K$ and has constant modulus on every $T \in \mathbb{T}(K)$. It also follows that if $\mathcal{T}$ is any subset of $\mathbb{T}(K)$, then $\sum_{T \in \mathcal{T}} g 1_T$ is Fourier supported in $\theta_K$.
Proof.First, to prove that |g| is constant on any translates of T 0,K , one only needs to prove the case when δ " 1, K " Z q , and then apply a change of variables, but we opt for a more explicit proof.We will show that |gpxq| is constant for all x P A `T0,K for any A P Q k q .By Fourier inversion we have that where we have used that y 1 ¨t P Z q , and so χpy 1 ¨tq " 1.The right hand side is then independent of y 1 and so the above equality is true for all x P A `T0,K .In particular this shows that |g| is a constant on A `T0,K .This constant depends on K, g and A, but is a constant nonetheless.Next, to prove that y g1 T is supported on θ K , it suffices to observe that y g1 T " p g ˚x 1 T , and that x 1 T is supported on θ K ´θK for every T P TpKq: in fact, for every T P TpKq, x 1 T is a modulation of z 1 T 0,K , and if a is any point in K, then T 0,K " M ´T a T δ .It follows that is supported on M a θ δ " θ K ´θK .Finally, the decomposition (2.3) follows since parallelepipeds in TpKq tile Q k q .This completes the proof of the lemma.

3. Sketch of the Karatsuba argument
Before we dive into the proof of Theorem 1.1, we review the proof of (1.3) with an eye towards interpreting each step into decoupling language. See also, for example, [31, Section 5.1] or [29, Theorem 13 - Lemma 21] for more details of the number theoretic argument. Just for this section, we revert back to calling $p$ a prime so as to best match these references.

3.1. Step 1: Introducing some $p$-adic separation. Given $X \geq 1$, one finds, using the Prime Number Theorem, a prime $p \sim X^{1/k}$ such that $J_{s,k}(X)$ is controlled by $J_{s,k}(X, p)$, where $J_{s,k}(X, p)$ is defined to be the number of solutions $(x_1, \ldots, x_s, y_1, \ldots, y_s) \in ([1, X] \cap \mathbb{N})^{2s}$ to (1.1) with the additional condition that $x_1, \ldots, x_k$ are pairwise distinct mod $p$ and $y_1, \ldots, y_k$ are pairwise distinct mod $p$. Since $p$ is rather large, this is a rather mild condition and so we heuristically should still expect $J_{s,k}(X) \approx J_{s,k}(X, p)$. The benefit of this extra $p$-adic separation (transversality) in these $2k$ variables is that we will get to apply Linnik's Lemma (in Step 3, (3.3) below), which will determine these variables uniquely up to permutation.

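3.2. Step 2: Applying the union bound/Hölder. Write $e(t) := e^{2\pi i t}$ and $\gamma(t) := (t, t^2, \ldots, t^k)$. We now write $J_{s,k}(X, p)$ as
\[ \int_{[0,1]^k} \Bigl| \sum_{\substack{1 \leq x_1, \ldots, x_k \leq X \\ x_i \text{ distinct mod } p}} e\bigl(\alpha \cdot (\gamma(x_1) + \cdots + \gamma(x_k))\bigr) \Bigr|^2 \, \Bigl| \sum_{a \ (\mathrm{mod}\ p)} \ \sum_{\substack{1 \leq n \leq X \\ n \equiv a \ (\mathrm{mod}\ p)}} e(\alpha \cdot \gamma(n)) \Bigr|^{2s-2k} \, d\alpha \]
and apply Hölder's inequality to control the above by
\[ p^{2s-2k} \max_{a \ (\mathrm{mod}\ p)} \int_{[0,1]^k} \Bigl| \sum_{\substack{1 \leq x_1, \ldots, x_k \leq X \\ x_i \text{ distinct mod } p}} e\bigl(\alpha \cdot (\gamma(x_1) + \cdots + \gamma(x_k))\bigr) \Bigr|^2 \, \Bigl| \sum_{\substack{1 \leq n \leq X \\ n \equiv a \ (\mathrm{mod}\ p)}} e(\alpha \cdot \gamma(n)) \Bigr|^{2s-2k} \, d\alpha. \tag{3.1} \]
Denote the integral above by $J_{s,k}(X, p, a)$. This expression counts the number of solutions $(x_1, \ldots, x_s, y_1, \ldots, y_s) \in ([1, X] \cap \mathbb{N})^{2s}$ to (1.1) with $x_1, \ldots, x_k$ pairwise distinct mod $p$, $y_1, \ldots, y_k$ pairwise distinct mod $p$, and $x_{k+1} \equiv \cdots \equiv x_s \equiv y_{k+1} \equiv \cdots \equiv y_s \equiv a \pmod{p}$.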

3.3. Step 3: Solution counting. Translation invariance of the Vinogradov system implies that we may bound $J_{s,k}(X, p, a)$ by $J_{s,k}(X, p, 0)$. Rearrange the Vinogradov system (1.1) as
\[ x_1^j + \cdots + x_k^j - y_1^j - \cdots - y_k^j = y_{k+1}^j + \cdots + y_s^j - x_{k+1}^j - \cdots - x_s^j, \qquad 1 \leq j \leq k, \tag{3.2} \]
where $x_1, \ldots, x_k$ are distinct mod $p$ and $y_1, \ldots, y_k$ are distinct mod $p$ and, since we are considering $J_{s,k}(X, p, 0)$, we have that $x_{k+1}, \ldots, x_s, y_{k+1}, \ldots, y_s \equiv 0 \pmod{p}$. Each choice of $x_1, \ldots, x_k, y_1, \ldots, y_k$ gives $\leq J_{s-k,k}(X/p)$ many solutions $(x_{k+1}, \ldots, x_s, y_{k+1}, \ldots, y_s)$. To see this, write the count for (3.2) as an integral and use the triangle inequality; the basic idea is that shifts of the Vinogradov system can only give fewer solutions.
Next, fixing one of the at most J s´k,k pX{pq many tuples px k`1 , . . ., x s , y k`1 , . . ., y s q, how many valid x 1 , . . ., x k , y 1 , . . ., y k are there?Since requiring y 1 , . . ., y k to be distinct mod p is a rather mild condition, there are ď X k such py 1 , . . ., y k q.Any valid px 1 , . . ., x k q P pr1, Xs X Nq k must satisfy where the x i are pairwise disjoint mod p for some H j that depends on py 1 , . . ., y k q (of which there are ď X k many possibilities) and px k`1 , . . ., x s , y k`1 , . . ., y s q (of which there are ď J s´k,k pX{pq many possibilities).Since p k " X, instead of counting integers between 1 and X, we can count the x i mod p k .Thus it remains to count the number of residue classes px 1 pmod p k q, . . ., x k pmod p k qq such that and x i pmod p k q are pairwise distinct mod p. Linnik's Lemma [24] then says that there are at most k!p kpk´1q{2 many such k-tuples of residue classes and the proof follows from first upgrading all residue classes mod p j in (3.3) to mod p k (by paying a cost of p kpk´1q{2 ) and then using the Newton-Girard identities which essentially uniquely determine the x 1 , . . ., x k (up to permutation).This bound is efficient since probabilistic heuristics suggest that we should expect « pp k q k {p kpk`1q{2 " p kpk´1q{2 many solutions.Thus we have that

3.4. Step 4: Iteration. Putting Steps 1 to 3 together we obtain the iteration that
\[ J_{s,k}(X) \lesssim p^{2s - 2k + \frac{k(k-1)}{2}} X^k J_{s-k,k}(X/p). \tag{3.5} \]
Running this iteration for about $O(s/k)$ many steps reduces to an estimate on $J_{k,k}(X)$, for which one can easily compute that there are $O(X^k)$ many solutions by the Newton-Girard identities. The iteration (3.5) is sharp if both $s$ and $s - k$ are supercritical. If they are, then heuristically, we expect $J_{s,k}(X) \approx X^{2s - k(k+1)/2}$ and $J_{s-k,k}(X/p) \approx (X/p)^{2(s-k) - k(k+1)/2}$. Then the right hand side of (3.5) becomes $X^{2s} X^{-3k/2 - k^2/2} p^{k^2}$, which is equal to $X^{2s - k(k+1)/2}$ since $p \sim X^{1/k}$. However, the two sides do not agree if one of $s$ or $s - k$ is subcritical. This is where the inefficiency of $X^{\frac{k^2}{2}(1 - \frac{1}{k})^{s/k}}$ comes from.
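The exponent bookkeeping in this iteration can be checked with exact rational arithmetic. The Python sketch below is illustrative only: it assumes that the iteration has the form $J_{s,k}(X) \lesssim p^{2s-2k+k(k-1)/2} X^k J_{s-k,k}(X/p)$ with $p \sim X^{1/k}$ (this form is read off from the heuristic computation above and is an assumption of the sketch), starts from $J_{k,k}(X) \lesssim X^k$, and compares the resulting exponent with the closed form in (1.3). The function names are our own.

```python
from fractions import Fraction


def iterated_exponent(k, l):
    """Exponent e with J_{kl,k}(X) <~ X^e, obtained by iterating the assumed
    recursion l - 1 times from the base case J_{k,k}(X) <~ X^k."""
    e = Fraction(k)  # base case s = k
    s = k
    for _ in range(l - 1):
        s += k
        # power of p on the right-hand side; p ~ X^(1/k) converts it to a power of X
        p_power = Fraction(2 * s - 2 * k) + Fraction(k * (k - 1), 2)
        # J_{s-k,k}(X/p) <~ (X/p)^e = X^((1 - 1/k) e)
        e = p_power / k + k + (1 - Fraction(1, k)) * e
    return e


def closed_form_exponent(k, l):
    """Exponent 2s - k(k+1)/2 + (k^2/2)(1 - 1/k)^(s/k) with s = k*l."""
    s = k * l
    return 2 * s - Fraction(k * (k + 1), 2) + Fraction(k * k, 2) * (1 - Fraction(1, k)) ** l


if __name__ == "__main__":
    for k in (2, 3, 4):
        for l in (1, 2, 3, 6):
            print(k, l, iterated_exponent(k, l), closed_form_exponent(k, l))
```

In terms of the excess $\eta_s$ over the conjectured exponent $2s - k(k+1)/2$, each step multiplies $\eta$ by $(1 - 1/k)$, starting from $\eta_k = k(k-1)/2 = \frac{1}{2}k^2(1 - \frac{1}{k})$.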

3.5. Interpreting Steps 1-4 into decoupling. Having briefly summarized the number theoretic argument in four steps, we now sketch the main points of its interpretation into decoupling. First we discuss the scales needed in the proof. From Steps 1 and 3, there are three scales: the largest scale $X$, the intermediate scale $p \sim X^{1/k}$, and the smallest scale $1$. Correspondingly in our proof, we use three scales: the smallest scale $\delta$, the intermediate scale $\nu := q^{\lfloor \log_q \delta^{1/k} \rfloor} \sim \delta^{1/k}$, and the largest scale $1$. For some technical reasons surrounding the broad-narrow reduction, in lieu of the scale $1$, we will actually use the scale $\kappa := q^{\lfloor \log_q \delta^{\varepsilon} \rfloor}$ where $\varepsilon$ is as in (1.8). Next, we discuss the reduction to the decoupling analogue of (3.1). In Step 1, two residue classes being distinct mod $p$ means they are $p$-adically separated by a distance $1$, and so this should correspond to two intervals which are $1$-separated. To get around the use of the Prime Number Theorem, we instead make use of the broad-narrow reduction due to Bourgain and Guth in [4], which will allow us to reduce to controlling a multilinear decoupling expression.
Third, the loss of p 2s´2k in Step 2 above deserves some mention.This loss comes from essentially having applied the union bound Heuristically we expect this inequality to be efficient since each ř n"a pmod pq contributes equally to the entire sum as the exponential sum should not bias one residue class mod p over another.This however is not necessarily true in the decoupling case and will require us to obtain some extra uniformity via dyadic pigeonholing, see Section 5.1, later.
Finally, to interpret the solution counting Step 3, we make use of the simple identity
\[ \int_{\mathbb{Q}_q^k} f(x)\, dx = \widehat{f}(0), \]
which converts the integral of $f$ into a question of whether $0$ is contained in the support of $\widehat{f}$. This is done in Lemma 4.4 below and the proof relies on the Newton-Girard identities, much like in the proof of Linnik's Lemma. This part of the argument requires that $p$ is even and is reminiscent of a Córdoba-Fefferman argument (see for example [8, Section 3.2] or [6,7,10]).
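For very small parameters, the count in Linnik's Lemma from Step 3 can be verified by brute force. The Python sketch below assumes the system being counted is $x_1^j + \cdots + x_k^j \equiv H_j \pmod{p^j}$ for $1 \leq j \leq k$, with the $x_i$ residue classes mod $p^k$ that are pairwise distinct mod $p$, and with $p > k$; the function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial


def linnik_counts(k, p):
    """For each (H_1 mod p, ..., H_k mod p^k), count the k-tuples of residues
    mod p^k, pairwise distinct mod p, with sum_i x_i^j = H_j (mod p^j)."""
    modulus = p ** k
    counts = Counter()
    for xs in product(range(modulus), repeat=k):
        if len({x % p for x in xs}) < k:
            continue  # skip tuples that are not pairwise distinct mod p
        key = tuple(sum(x ** j for x in xs) % p ** j for j in range(1, k + 1))
        counts[key] += 1
    return counts


def linnik_bound(k, p):
    """The bound k! * p^(k(k-1)/2) from Linnik's Lemma."""
    return factorial(k) * p ** (k * (k - 1) // 2)


if __name__ == "__main__":
    counts = linnik_counts(2, 3)
    print(max(counts.values()), linnik_bound(2, 3))
```

For $k = 2$ and $p = 3$ the bound $2! \cdot 3 = 6$ is attained, for instance at $(H_1, H_2) = (0, 5)$, where the solutions are $(1,2), (4,5), (7,8)$ and their reversals.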

4. The main lemma
One standard property about the moment curve decoupling constant that we use is affine rescaling.This property plays the analogue of translation-dilation invariance of the Vinogradov system (1.1).Lemma 4.1 (Affine rescaling).Let g be a Schwartz function on Q k q Fourier supported in Ť KPP δ θ K .Then for any interval I Ă Z q of length κ ě δ, we have Proof.This proof is standard and follows from a change of variables which can be found for example in [8, Section 11.2].
Our main lemma in proving Theorem 1.1 is the following: Lemma 4.2.Let p P 2k `2N, δ P q ´N and κ P q ´N X rδ, 1q.Let ν " q tlog q δ 1{k u P q ´N so that ν ď δ 1{k .If g is a Schwartz function with Fourier support in Ť KPP δ θ K , then we have ż where N is the number of J P P ν for which g J ‰ 0 and C depends only on k and p.
Here κ is a somewhat technical parameter that is chosen to be roughly δ ε later in Section 5.However, on a first reading, it might be more convenient for the reader to take κ " 1{q to better grasp the moving parts of the argument.The somewhat non-standard decoupling right hand side in Lemma 4.2 is reminiscent of the right hand side used in Theorem 1.2 of [18].To give more context to the above lemma, the following estimate is true: Lemma 4.3.For any p ą 2k, we have where N is as defined in Lemma 4.2.
Proof.Hölder's inequality gives us q q , and so, applying with a K " b K " }g K } L 8 pQ k q q and c K " }g K } L p´2k pQ k q q , we get p ÿ It remains to observe that Suppose for a moment that in Lemma 4.3, we had an equality instead of an inequality.This is indeed the case when gpxq is the exponential sum X ´100k 2 1 |x|ďX 100k ř X j"1 epγpjq ¨xq that arises in using decoupling to estimate the number of solutions in (1.3).As N ď ν ´1 (and taking, for convenience, κ " 1{q), Lemma 4.2 would give us Heuristically, we expect this iteration to be efficient as long as p ´2k (and so also p) is supercritical.
To see this, if r is supercritical, then we heuristically expect that D r pδq r « δ ´r 2 `kpk`1q 2 for all δ.Thus the iteration should be efficient if with this assumption on the size of D r pδq r , both sides of (4.1) are the same.The right hand side of (4.1) is then which is comparable to the left hand side of (4.1).A similar calculation shows that this iteration is not efficient if at least one of p or p ´2k is subcritical.
Unfortunately, the reverse inequality in Lemma 4.3 fails to hold for general $g$. This is because we lack the uniformity in the exponential sum that one considers when one counts solutions to the Vinogradov system. This uniformity can be restored by pigeonholing, which only produces $\delta^{-\varepsilon}$ losses. This pigeonholing must be done before one applies induction on scales and iterates on the Lebesgue exponent $p$. The full argument is carried out in detail in Section 5.

4.1.
Proof of Lemma 4.2.The proof of Lemma 4.2 uses a broad/narrow dichotomy, due to Bourgain and Guth [4] combined with some basic geometric geometric properties of the moment curve.See also for example [8,  As a result, we obtain the pointwise bound that for each x P Q k q , we have which, upon raising both sides to power 2k and applying pA `Bq 2k ď 2 2k´1 pA 2k `B2k q (a consequence of the convexity of x Þ Ñ x 2k ), yields Using this pointwise bound while integrating we find that ż for some C depending on k.Hölder's inequality followed by Minkowski's inequality implies that the first term satisfies for some C 1 that depends on k and p.The last inequality uses Young's inequality and the fact that p ě 2k.Therefore, ż Using affine rescaling (Lemma 4.1) and applying the definition (1.5) of our decoupling constant, we deduce that ´ÿ IPPκ }g I } 2 Plugging this into the above yields ż This inequality (4.3) is the analogue of Step 1 in Section 3.1.The requirement that we analyze solutions to the Vinogradov system with x 1 , . . ., x s and y 1 , . . ., y s being distinct mod p corresponds to the requirement that we analyze ş with dpI i , I j q ą κ for all 1 ď i ‰ j ď k with κ " 1{q.
Next, we mimic Step 2 in Section 3.2.Recalling our definition of N in the statement of Lemma 4.2, Hölder's inequality gives Applying this in the second term in (4.3), we get ż which is the analogue of Step 2 in Section 3.2.
To analyze the second term in (4.4), we fix I 1 , . . ., I k P P κ with dpI i , I j q ą κ for all i ‰ j, and fix J P P ν with g J ‰ 0. To estimate the integral ş , first note that the Fourier transform of |g J | 2 " g J g J is supported in the parallelepiped θ J ´θJ , of dimension ν ˆν2 ˆ¨¨¨ˆν k .Since our hypothesis guarantees that p ´2k is an even positive integer, the same is true for the Fourier transform of |g J | p´2k .Lemma 2.1(i) applied to J P P ν instead of K P P δ shows that the Fourier support of |g J | p´2k is the disjoint union of ν ´kpk´1q{2 many cubes of side lengths ν k , and we denote this collection of cubes by tlu.This corresponds to the fact that we have a k-tuple of residue classes pH 1 pmod pq, H 2 pmod p 2 q, . . ., H k pmod p k qq which we can upgrade to p kpk´1q{2 many k-tuples of the form pH 1 1 pmod p k q, H 1 2 pmod p k q, . . ., H 1 k pmod p k qq.Note that the side length ν k of the cubes l is ď δ.
We now apply Fourier inversion and turn products into convolutions. We have $\int$ For each fixed $\square$ and $\overline{K}_1 \in P_\delta(I_1), \ldots, \overline{K}_k \in P_\delta(I_k)$, let $S(\overline{K}_1, \ldots, \overline{K}_k, \square)$ be the set of all $(K_1, \ldots, K_k)$ with $K_i \in P_\delta(I_i)$ such that
$$0 \in \operatorname{supp}\bigl(\widehat{g_{K_1}} * \cdots * \widehat{g_{K_k}} * \widehat{\overline{g_{\overline{K}_1}}} * \cdots * \widehat{\overline{g_{\overline{K}_k}}} * (\widehat{|g_J|^{p-2k}}\, 1_{\square})\bigr). \tag{4.5}$$
We will prove in Lemma 4.4 below that $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square) \le (q\kappa)^{-k(k-1)}$. If we think of the model case when $\kappa = 1/q$, this would say that the $\overline{K}_i$ and $\square$ uniquely determine the $K_i$ in (4.5). This is analogous to the situation in Linnik's lemma where, once we upgrade (3.3) to residue classes mod $p^k$, the remaining variables are essentially uniquely determined. We now write $\int$ Applying affine rescaling shows that this is One can think of (4.6) as the analogue of (3.4) in Section 3.3 in the following way: the term $\nu^{-k(k-1)/2} (q\kappa)^{-k(k-1)} \max_{K \in P_\delta} \|g_K\|_\infty^k$ plays the role of $p^{k(k-1)/2}$ from Linnik's lemma, the term $\bigl(\sum_{K \in P_\delta} \|g_K\|_\infty\bigr)^k$ plays the role of $X^k$, and finally the term plays the role of $J_{s-k,k}(X/p)$.
Plugging (4.6) back into (4.4), we then obtain $\int$

4.1.2. Geometry of the moment curve. The proof of Lemma 4.2 is now complete modulo the proof of the following lemma, which provides the key geometric input that enables one to count $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square)$. This is the analogue of Linnik's Lemma ([29, Corollary 17] and the estimate for $B(g)$ in the proof of [31, Lemma 5.1]); see also [13, Proposition 1.3] and [2, Proposition 3.1].
Both proofs use the Newton-Girard identities in essentially the same way. The hypothesis that $q > k$, where $q$ is the residue characteristic of our base field $\mathbb{Q}_q$ and $k$ is the degree of the moment curve, plays a role in the following lemma.
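As a reminder of what the Newton-Girard identities do, the following Python sketch recovers the elementary symmetric polynomials $e_1, \ldots, e_k$ from the power sums $p_1, \ldots, p_k$ via $m\, e_m = \sum_{i=1}^{m} (-1)^{i-1} e_{m-i}\, p_i$. The function name is hypothetical, and exact rational arithmetic stands in for $\mathbb{Q}_q$. The recursion divides by $m = 1, \ldots, k$, and each of these integers has $q$-adic absolute value $1$ when $q > k$; this is presumably one place where such a hypothesis is convenient, although the sketch makes no claim about how the paper's proof is organized.

```python
from fractions import Fraction


def elementary_from_power_sums(power_sums):
    """Given power sums [p_1, ..., p_k] of some numbers, return the
    elementary symmetric polynomials [e_1, ..., e_k] of those numbers,
    using the Newton-Girard recursion m*e_m = sum_i (-1)^(i-1) e_{m-i} p_i."""
    k = len(power_sums)
    e = [Fraction(1)]  # e_0 = 1
    for m in range(1, k + 1):
        acc = Fraction(0)
        for i in range(1, m + 1):
            acc += (-1) ** (i - 1) * e[m - i] * power_sums[i - 1]
        # Division by m: harmless over the rationals, and by a q-adic unit if q > k.
        e.append(acc / m)
    return e[1:]


if __name__ == "__main__":
    roots = [1, 2, 3]
    sums = [sum(r ** j for r in roots) for j in range(1, len(roots) + 1)]
    print(elementary_from_power_sums(sums))
```

In particular, two $k$-tuples with the same first $k$ power sums have the same elementary symmetric polynomials, hence are roots of the same monic degree $k$ polynomial.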
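Lemma 4.4. Let $p \in 2k + 2\mathbb{N}$, $\delta \in q^{-\mathbb{N}}$, $\kappa \in q^{-\mathbb{N}} \cap [\delta, 1)$, and $\nu = q^{\lfloor \log_q \delta^{1/k} \rfloor} \in q^{-\mathbb{N}}$, so that $\nu \le \delta^{1/k}$. Suppose that $I_1, \ldots, I_k \in P_\kappa$ with $d(I_i, I_j) > \kappa$ for all $i \ne j$. Let $\square$ be a cube of side length $\nu^k$ and $\overline{K}_1 \in P_\delta(I_1), \ldots, \overline{K}_k \in P_\delta(I_k)$. Define $S(\overline{K}_1, \ldots, \overline{K}_k, \square)$ to be the set of all ordered $k$-tuples $(K_1, \ldots, K_k)$ with $K_i \in P_\delta(I_i)$ such that
$$0 \in \operatorname{supp}\bigl(\widehat{g_{K_1}} * \cdots * \widehat{g_{K_k}} * \widehat{\overline{g_{\overline{K}_1}}} * \cdots * \widehat{\overline{g_{\overline{K}_k}}} * (\widehat{|g_J|^{p-2k}}\, 1_{\square})\bigr).$$
Then $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square) \le (q\kappa)^{-k(k-1)}$.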
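Proof. Assume for the sake of contradiction that $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square) > (q\kappa)^{-k(k-1)} \ge 1$. We can find two $k$-tuples of intervals $(A_1, \ldots, A_k)$ and $(B_1, \ldots, B_k)$ with each $A_i, B_i \in P_\delta(I_i)$ such that
$$0 \in \operatorname{supp}\bigl(\widehat{g_{A_1}} * \cdots * \widehat{g_{A_k}} * \widehat{\overline{g_{\overline{K}_1}}} * \cdots * \widehat{\overline{g_{\overline{K}_k}}} * (\widehat{|g_J|^{p-2k}}\, 1_{\square})\bigr), \tag{4.7}$$
$$0 \in \operatorname{supp}\bigl(\widehat{g_{B_1}} * \cdots * \widehat{g_{B_k}} * \widehat{\overline{g_{\overline{K}_1}}} * \cdots * \widehat{\overline{g_{\overline{K}_k}}} * (\widehat{|g_J|^{p-2k}}\, 1_{\square})\bigr), \tag{4.8}$$
and such that there exists an $i_0$ with $d(A_{i_0}, B_{i_0}) > (q\kappa)^{-(k-1)} \delta$. Indeed, if not, picking an arbitrary $(C_1, \ldots, C_k) \in S(\overline{K}_1, \ldots, \overline{K}_k, \square)$ shows that any other $(D_1, \ldots, D_k) \in S(\overline{K}_1, \ldots, \overline{K}_k, \square)$ must satisfy $d(C_i, D_i) \le (q\kappa)^{-(k-1)} \delta$ for every $i$. This gives at most $(q\kappa)^{-k(k-1)}$ many $k$-tuples, which violates our initial assumption that $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square) > (q\kappa)^{-k(k-1)}$. Without loss of generality, we may assume that $i_0 = 1$. Since for each $i = 1, 2, \ldots, k$ we have $A_i, B_i \subset I_i$ and $d(I_i, I_j) > \kappa$ for all $i \ne j$, this implies
$$d(A_i, A_j) \ge q\kappa, \quad d(B_i, B_j) \ge q\kappa, \quad d(A_i, B_j) \ge q\kappa \quad \text{whenever } j \ne i \tag{4.9}$$
(thus the only distances we do not have any control over are the ones of the form $d(A_i, B_i)$, $i \ne 1$). By (4.7) and (4.8), we have that where here we recall the definition of $\tau_K$ in (1.4). Each of $\tau_{A_i}$, $\tau_{B_i}$, and $\tau_{\overline{K}_i}$ is a cube in $\mathbb{Q}_q^k$ of side length $\delta$, and $\square$ is a cube in $\mathbb{Q}_q^k$ of side length $\nu^k \le \delta$. Thus by the ultrametric inequality, both $\sum_{i=1}^k \tau_{A_i} - \sum_{i=1}^k \tau_{\overline{K}_i} + \square$ and $\sum_{i=1}^k \tau_{B_i} - \sum_{i=1}^k \tau_{\overline{K}_i} + \square$ are cubes in $\mathbb{Q}_q^k$ of side length $\delta$. Furthermore, by the ultrametric inequality, since two cubes of side length $\delta$ are either completely disjoint or exactly the same, we must have and hence Therefore (after another application of the ultrametric inequality) there exist $\xi_{A_i} \in A_i$ and $\xi_{B_i} \in B_i$ such that for $j = 1, 2, \ldots, k$.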
Applying this conclusion to (4.15) then yields that $(q\kappa)^{k-1} |\xi_{A_1} - \xi_{B_1}| \le \delta$. But this contradicts the fact that $d(A_1, B_1) > (q\kappa)^{-(k-1)} \delta$. Therefore we must have $\# S(\overline{K}_1, \ldots, \overline{K}_k, \square) \le (q\kappa)^{-k(k-1)}$, which completes the proof of the lemma.

$D_p(\delta_0)$ (5.1) instead of $D_p(\delta)$, as $D_p(\delta)$ is defined for all real $\delta \in (0, 1]$ (rather than just for $\delta \in q^{-\mathbb{N}}$) and is monotonic, that is, $D_p(\delta_L) \le D_p(\delta_S)$ if $\delta_L \ge \delta_S$.
Proposition 5.1. For even integers $p > 2k$, there exists a constant $C > 0$, depending only on $k$ and $p$, such that for every $0 < \varepsilon < 1$, we have for all $0 < \delta < 1$.
Proof. To bound $D_p(\delta)^p$, suppose $0 < \delta < 1$ and $\delta_0 \in q^{\mathbb{Z}}$ with $\delta_0 \in [\delta, 1]$. We need to bound $D_p(\delta_0)^p$ by decoupling down to frequency scale $\delta_0$. Let $f$ be a Schwartz function on $\mathbb{Q}_q^k$ with Fourier support in $\bigcup_{K \in P_{\delta_0}} \theta_K$. We want to prove the existence of $C > 0$ so that for any $0 < \varepsilon < 1$, $\int$ $\bigr)^{p/2}$. (5.3) In fact, we will prove that for any translate $Q$ of $B_{\delta_0^{-k}} := \{x \in \mathbb{Q}_q^k : |x| \le \delta_0^{-k}\}$, we have $\int$ (5.4) The estimate (5.3) then follows by summing over all such $Q$'s that tile $\mathbb{Q}_q^k$, and applying Minkowski's inequality to bring an $\ell^{p/2}$ norm over $Q$ on the right hand side into the sum over $K \in P_{\delta_0}$.
Thus we now turn to the proof of (5.4). Note that for any translate $Q$ of $B_{\delta_0^{-k}}$, we have that As a result, to prove (5.4), it suffices to prove (5.3) under the additional assumption that $f$ is supported on $Q$. Since $f$ is an arbitrary Schwartz function with Fourier support in $\bigcup_{K \in P_{\delta_0}} \theta_K$, we may assume $Q = B_{\delta_0^{-k}}$. Thus from now on, we assume additionally that $f$ and all the $f_K$ are supported on $B_{\delta_0^{-k}}$ and prove (5.3).

We first dyadically pigeonhole $f$ by wavepacket height. Write $H^* = \max_{K \in P_{\delta_0}} \|f_K\|_{L^\infty(\mathbb{Q}_q^k)}$. For $K \in P_{\delta_0}$ and $H$ where here $f_K : \mathbb{Q}_q^k \to \mathbb{C}$, $|f_K|$ is the absolute value of $f_K$, and the last characteristic function is meant to be the indicator function of the set $\{x \in \mathbb{Q}_q^k$ where the last equality is because $|f_K 1_T|$ is constant on every $T \in \mathbb{T}(K)$. Again by Lemma 2.2, note that $f^{(H)}_K$ is Fourier supported in $\theta_K$. Using the terminology of Lemma 2.2, the nonzero wavepackets that make up $f^{(H)}_K$ are all of height $\sim H$. Then where the second inequality follows from writing $f$

Next we dyadically pigeonhole so that each relevant $f^{(H)}_K$ is made up of about the same number of wavepackets. Let now $\nu = q^{\lfloor \log_q \delta_0^{1/k} \rfloor} \le \delta_0^{1/k}$ if the number of nonzero terms in (5.6) (that is, the number of nonzero wavepackets in $f^{(H)}_K$) is in $(\alpha/2, \alpha]$, and $0$ otherwise. Thus now we have that and each $f^{(H,\alpha)}_K$ is a function which is supported in $B_{\delta_0^{-k}}$ and Fourier supported in $\theta_K$, and which has $\sim \alpha$ many nonzero wavepackets of height $\sim H$.
Finally, we dyadically pigeonhole so that given a $K$, the parent interval $J$ of length $\nu$ has about the same number of children $K'$ of length $\delta_0$ such that $f^{(H,\alpha)}_{K'} \ne 0$. To be more precise, fix a $K$ and let $J$ be the unique parent interval of length $\nu$ containing $K$. This parent $J$ contains $\nu/\delta_0$ many intervals $K'$ of length $\delta_0$ and hence $J$ has at most $\nu/\delta_0$ many children $\ne 0\} \in (\beta/2, \beta]$, and $0$ otherwise. Thus we now have and each $f^{(H,\alpha,\beta)}_K$ is a function which is supported in $B_{\delta_0^{-k}}$, Fourier supported in $\theta_K$, has $\sim \alpha$ many nonzero wavepackets of height $\sim H$, and $K$'s parent $J$ has $\sim \beta$ children each of which also are supported in $B_{\delta_0^{-k}}$, Fourier supported in $\theta_K$, and have $\sim \alpha$ many nonzero wavepackets of height $\sim H$.
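The mechanism common to the three pigeonholing steps above is to sort quantities into dyadic ranges $(H/2, H]$ and then work with one range at a time. The following Python sketch illustrates this on plain numbers rather than wavepackets; the function names, and the convention that the ranges are $(\mathrm{top}/2^{j+1}, \mathrm{top}/2^{j}]$ for $j = 0, 1, 2, \ldots$, are assumptions made for the illustration only.

```python
from collections import defaultdict


def dyadic_bins(values, top):
    """Group the nonzero values v with |v| <= top by the index j such that
    top / 2**(j+1) < |v| <= top / 2**j. Zero values are discarded."""
    bins = defaultdict(list)
    for v in values:
        a = abs(v)
        if a == 0:
            continue
        j = 0
        # Halve the threshold until |v| lies strictly above the lower endpoint.
        while a <= top / 2 ** (j + 1):
            j += 1
        bins[j].append(v)
    return dict(bins)


def heaviest_bin(bins):
    """Return (j, members) for a bin with the largest number of members."""
    j = max(bins, key=lambda idx: len(bins[idx]))
    return j, bins[j]


if __name__ == "__main__":
    bins = dyadic_bins([8, 5, 4, 3, 1, 0.5], top=8)
    print(bins)
    print(heaviest_bin(bins))
```

By the pigeonhole principle, the heaviest bin contains at least a $1/(\text{number of nonempty bins})$ fraction of the nonzero values, which is the kind of loss a dyadic pigeonholing step incurs.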
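For each $\overline{K}_1 \in P_\delta(I_1), \ldots, \overline{K}_k \in P_\delta(I_k)$, we count the number of ordered $k$-tuples $(K_1, \ldots, K_k)$ with $K_i \in P_\delta(I_i)$ for $i = 1, \ldots, k$ and $0 \in \operatorname{supp}\bigl(\widehat{g_{K_1}} * \cdots * \widehat{g_{K_k}} * \widehat{\overline{g_{\overline{K}_1}}} * \cdots * \widehat{\overline{g_{\overline{K}_k}}}\bigr)$. The proof of Lemma 4.4 shows that the number of such ordered $k$-tuples is $\le (q\kappa)^{-k(k-1)}$ (in fact, here we only need that $\widehat{g_{K_j}}$ is supported in the cube $\tau_{K_j}$ rather than the smaller parallelepiped $\theta_{K_j}$). So using Cauchy-Schwarz, $\sum_{K_i \in P_\delta(I_i),\, i = 1, \ldots, k} \sum_{\overline{K}_j \in P_\delta(I_j),\, j = 1, \ldots, k} \int_{\mathbb{Q}_q^k} g_{K_1} \cdots g_{K_k}\, \overline{g_{\overline{K}_1} \cdots g_{\overline{K}_k}} \le (q\kappa)^{-k(k-1)} \sum$ It follows that $\sum_{I_1, \ldots, I_k \in P_\kappa,\ d(I_i, I_j) > \kappa\ \forall i \ne j} \int$ Alternatively, the multilinear restriction estimate and $L^2$ orthogonality say that for any ball $B_{\delta^{-1}}$ of radius $\delta^{-1}$ in $\mathbb{Q}_q^k$, one has $\int$ and since each $|g_{K_j}|$ is constant on $B_{\delta^{-1}}$, we have Summing over all $B_{\delta^{-1}} \subset \mathbb{Q}_q^k$ and all $I_1, \ldots, I_k \in P_\kappa$, we have $\sum_{I_1, \ldots, I_k \in P_\kappa,\ d(I_i, I_j) > \kappa\ \forall i \ne j} \int$ which for the purposes below is as good as (A.3). Putting (A.2) and (A.3) back into (A.1), we have
$$S(\delta)^{2k} \le 2^{2k-1} k^{2k}\, S(\delta/\kappa)^{2k} + 2^{2k-1} \kappa^{-(4k-2)} (q\kappa)^{-k(k-1)}.$$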


Fourier supported in $\theta_K$ and supported in $B_{\delta_0^{-k}}$. Since a $T \in \mathbb{T}(K)$ is either completely contained in or completely disjoint from $B_{\delta_0^{-k}}$, we can then write
$$f^{(H)}_K = \sum_{T \in \mathbb{T}(K),\ T \subset B_{\delta_0^{-k}}} (f_K 1_T)\, 1_{H/2 < |f_K 1_T| \le H}. \tag{5.6}$$
Furthermore, the $T \in \mathbb{T}(K)$ which are contained in $B_{\delta_0^{-k}}$ perfectly partition $B_{\delta_0^{-k}}$ into $\delta_0^{-k(k-1)/2}$ many translates of $T_{0,k}$. Thus (5.6) has at most $\delta_0^{-k(k-1)/2}$ many nonzero terms. Therefore for $\alpha \in 2^{\mathbb{N}} \cap [1, \delta$

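${}^{(H,\alpha,\beta)}_{\nu} = \{J \in P_\nu : \#\{K'' \in P_{\delta_0}(J) : f^{(H,\alpha)}_{K''} \ne 0\} \in (\beta/2, \beta]\}$

4.1.1. The broad-narrow argument. First, we have the pointwise bound $|g(x)| \le \sum_{I \in P_\kappa} |g_I(x)|$. At every point $x \in \mathbb{Q}_q^k$, let $\mathcal{I}_x$ be the set of all intervals $I' \in P_\kappa$ such that $|g_{I'}(x)| \ge \kappa \max_{I \in P_\kappa} |g_I(x)|$. Suppose first that $\mathcal{I}_x$ contains at least $k$ (disjoint) intervals, say $I'_1, \ldots, I'_k$ (all dependent on $x$) of length $\kappa$, with $|g_{I'_1}(x)| = \max_{I \in P_\kappa} |g_I(x)|$: in this case we have
$$|g(x)| \le \kappa^{-1} \max_{I \in P_\kappa} |g_I(x)| \le \kappa^{-(2k-1)/k} \max_{\substack{I_1, \ldots, I_k \in P_\kappa \\ d(I_i, I_j) > \kappa\ \forall i \ne j}} |g_{I_1}(x) \cdots g_{I_k}(x)|^{1/k},$$
since $P_\kappa$ has $\kappa^{-1}$ elements and $|g_{I'_1}(x)|^k \le \kappa^{-(k-1)} |g_{I'_1}(x) \cdots g_{I'_k}(x)|$. Here we used that in $\mathbb{Q}_q$, two distinct intervals of the same length are separated by at least that length. Alternatively, $\mathcal{I}_x$ contains at most $k - 1$ intervals, in which case $\sum_{I \in P_\kappa} |g_I(x)| < k \max_{I \in P_\kappa} |g_I(x)|$.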