Cliques in dense inhomogeneous random graphs

The theory of dense graph limits comes with a natural sampling process which yields an inhomogeneous variant G(n, W) of the Erdős–Rényi random graph. Here we study the clique number of these random graphs. We establish the concentration of the clique number of G(n, W) for each fixed n, and give examples of graphons for which G(n, W) exhibits wild long-term behavior. Our main result is an asymptotic formula which gives the almost sure clique number of these random graphs. We obtain a similar result for the bipartite version of the problem. We also make an observation that might be of independent interest: every graphon avoiding a fixed graph is countably-partite.


INTRODUCTION
The Erdős–Rényi random graph G(n, p) is a random graph with vertex set [n] = {1, . . . , n}, where each edge is included independently with probability p. Since Gilbert, and independently Erdős and Rényi, introduced the model in 1959, this has been arguably the most studied random discrete structure. Here, we recall the basic fact about cliques in G(n, p): for fixed p ∈ (0, 1), the clique number asymptotically almost surely satisfies

ω(G(n, p)) = (2/log(1/p) + o(1)) · log n . (1)

This should be contrasted with the behavior of general dense graph sequences: the complete graphs K_n have clique number n and converge to the constant graphon W ≡ 1, while there are other sequences of finite graphs with clique numbers growing much slower than logarithmically with a limit graphon W ≡ 1.
So, suppose that we have suppressed these pathological examples by looking at "typical graphs that are close to W" rather than "all graphs that are close to W", and let us see what value motivated by the clique number can be associated to W. To this end, suppose that W : Ω² → [0, 1] is a graphon such that W(x, y) ∈ [p_1, p_2] for every x, y ∈ Ω, where 0 < p_1 ≤ p_2 < 1 are fixed. Then the edges of G(n, W) are stochastically sandwiched between those of G(n, p_1) and G(n, p_2). Thus, (1) tells us that the clique number ω(G(n, W)) asymptotically almost surely satisfies

(2/log(1/p_2) + o(1)) · log n ≥ ω(G(n, W)) ≥ (2/log(1/p_1) + o(1)) · log n . (2)

Thus, it is actually plausible to believe that

ω(G(n, W)) / log n (3)

converges in probability. In this paper, we study this and related questions.
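In code, the sampling procedure behind G(n, W) can be sketched as follows. This is a minimal illustration with our own naming (`sample_G` and the pinched graphon below are not from the paper), taking the base space to be [0, 1]:

```python
import random

def sample_G(n, W, seed=0):
    """Sample the inhomogeneous random graph G(n, W) for a graphon W
    given as a symmetric function [0,1]^2 -> [0,1].  Returns the edge
    set as a set of frozensets {i, j}."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]        # latent vertex positions
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # edge ij is present with probability W(x_i, x_j)
            if rng.random() < W(xs[i], xs[j]):
                edges.add(frozenset((i, j)))
    return edges

# A graphon with values pinched between p1 = 0.3 and p2 = 0.6, as in the
# sandwiching discussion above.
W = lambda x, y: 0.3 + 0.3 * x * y
G = sample_G(200, W)
```

Since 0.3 ≤ W ≤ 0.6 pointwise, the resulting sample can be coupled between G(n, 0.3) and G(n, 0.6), which is exactly the stochastic sandwiching used in the text.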

Related Literature on Inhomogeneous Random Graphs
Inhomogeneous random graphs allow one to express different intensities of bonds between the corresponding parts of the base space. This is obviously useful in modeling phenomena in biology, sociology, computer science, physics, and other settings. The price one has to pay for the flexibility of these models is the extra difficulty in the mathematical analysis of their properties. This is one of the reasons why the literature on G(n, W) is fairly scarce compared to that on G(n, p); another reason apparently being that the inhomogeneous model is much more recent. Actually, in the inhomogeneous model, most work has been done in the sparse regime, which we shall introduce now. To get a counterpart to sparse random graphs G(n, p_n), p_n → 0, one introduces the rescaling G(n, p_n · W). In this setting, W need not be bounded from above anymore (even though the question of "how unbounded" W can be is rather subtle and we neglect it here). The most impressive example of work concerning sparse inhomogeneous random graphs is [2], in which the existence and the size of the giant component in G(n, (1/n) · W) were determined. This work has initiated a large amount of further work on G(n, (1/n) · W), such as [26, 27], as well as on related percolation models [1].
The threshold for connectivity of inhomogeneous random graphs was investigated in [6]. The diameter of inhomogeneous random graphs was studied in [9].
A particular subclass of the random graph models G(n, W) are the so-called stochastic block models, introduced already in the 1980s in the field of mathematical sociology [13]. They are used extensively in many areas of mathematics, computer science, and physics. In our language, (the dense version of) stochastic block models correspond to the case when W is a step-function with finitely many or countably many steps. The stochastic block model is mathematically much more tractable. For example, the study of criticality in stochastic block models in [14] seems to be much more tractable than in the case of general inhomogeneous random graphs.
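The correspondence "stochastic block model = step-function graphon" can be made concrete with a small sketch (the helper `step_graphon` and the 2-block example are ours, not the paper's):

```python
def step_graphon(block_sizes, P):
    """Build the step-function graphon on [0,1] corresponding to a
    stochastic block model with the given block sizes (summing to 1)
    and symmetric matrix P of edge intensities."""
    cuts, acc = [], 0.0
    for s in block_sizes:
        acc += s
        cuts.append(acc)

    def block(x):
        # index of the step containing x
        for i, c in enumerate(cuts):
            if x <= c:
                return i
        return len(cuts) - 1

    return lambda x, y: P[block(x)][block(y)]

# Two equal blocks: dense within blocks, sparse across.
W = step_graphon([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

Sampling G(n, W) for such a W is then exactly sampling from the corresponding dense stochastic block model.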

OUR CONTRIBUTION
In this section we present our main results. The notation in this section is standard. We refer the reader to Section 3 for formal definitions.
We saw that for many natural graphons W, ω(G(n, W)) grows logarithmically. It is easy to construct a graphon for which ω(G(n, W)) grows, for example, as log log n, or another graphon for which ω(G(n, W)) grows, for example, as n^{0.99}. More surprisingly, our next proposition shows that we can have an oscillation between these two regimes even for one graphon.

Proposition 2.1.
For an arbitrary function f : N → R_+ with lim_{n→∞} f(n) = +∞ there exist a graphon W and a sequence of integers 1 = ℓ_0 < k_1 < ℓ_1 < k_2 < ℓ_2 < . . . such that asymptotically almost surely,

ω(G(k_i, W)) < f(k_i) , (4)

and

ω(G(ℓ_i, W)) > ℓ_i / f(ℓ_i) . (5)

While Proposition 2.1 shows that the long-term behavior of ω(G(n, W)) can be quite wild, for a fixed (but large) n, the distribution of ω(G(n, W)) is concentrated. The proofs of Proposition 2.1 and Theorem 2.2 are given in Section 4. In the proof of Theorem 2.2, we need to consider separately the case of graphons W for which E[ω(G(n, W))] is bounded (as n → ∞). Investigation of such graphons led to a result that is of independent interest. Let us give the details. Clearly, for each graphon W and each n ∈ N, ω(G(n, W)) is stochastically dominated by ω(G(n + 1, W)). As a consequence, the sequence E[ω(G(1, W))], E[ω(G(2, W))], E[ω(G(3, W))], . . . is nondecreasing. We say that W has a bounded clique number if lim_{n→∞} E[ω(G(n, W))] < +∞. Note that one class of examples of graphons with bounded clique number consists of graphons W which have zero homomorphism density of H (see (11) for the definition) for some finite graph H. A subclass of these are k-partite graphons. These are graphons W : Ω² → [0, 1] for which there exists a measurable partition Ω = Ω_1 ∪ Ω_2 ∪ . . . ∪ Ω_k such that for each i ∈ [k], W↾Ω_i×Ω_i = 0 almost everywhere. In the following example, we show that the structure of graphons with a bounded clique number can be more complicated. We consider a sequence of triangle-free graphs G_1, G_2, . . . whose chromatic numbers tend to infinity (it is a standard exercise that such graphs indeed exist). Let W_1, W_2, . . . be their graphon representations. We now glue these graphons into one graphon W. Clearly, ω(G(n, W)) ≤ 2 with probability one, but W is not k-partite for any k. Here, we show that the structure of graphons with a bounded clique number cannot be much more complicated than in the example above.
We call a graphon W : Ω² → [0, 1] countably-partite if there exists a measurable partition Ω = Ω_1 ∪ Ω_2 ∪ . . . such that for each i ∈ N, W↾Ω_i×Ω_i = 0 almost everywhere.

Let us turn our attention to the main subject of the paper, that is, to the behavior of the clique number in G(n, W) scaled as in (3). As a warm-up for studying (3), we first deal with its bipartite counterpart. To this end, we shall work with bigraphons. Bigraphons, introduced first in [20], arise as limits of balanced bipartite graphs. A bigraphon is a measurable function U : Ω_1 × Ω_2 → [0, 1]. Here, Ω_1 and Ω_2 are probability spaces which represent the two partition classes, and the value U(x, y) represents the edge intensity between the parts corresponding to x and y. This suggests the sampling procedure for generating the inhomogeneous random bipartite graph B(n, U): we sample uniformly and independently at random points x_1, . . . , x_n from Ω_1 and y_1, . . . , y_n from Ω_2. In the bipartite graph B(n, U) with colour classes {a_i}_{i=1}^n and {b_j}_{j=1}^n, we connect a_i with b_j with probability U(x_i, y_j). We define the natural bipartite counterpart to the clique number. Given a bipartite graph G = (A, B; E), we define its biclique number as the largest ℓ such that there exist sets X ⊆ A, Y ⊆ B, |X| = |Y| = ℓ, that induce a complete bipartite graph. We denote the biclique number of G by ω_2(G).
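The sampling procedure for B(n, U) and the biclique number can be sketched in code (helper names are ours; both Ω_1 and Ω_2 are taken to be [0, 1], and the brute-force search is only meant for tiny examples):

```python
import random
from itertools import combinations

def sample_B(n, U, seed=0):
    """Sample B(n, U): points x_i from Omega_1 and y_j from Omega_2,
    with a_i b_j an edge with probability U(x_i, y_j).  Returns the
    bipartite adjacency matrix as a list of boolean rows."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(n)]
    return [[rng.random() < U(xs[i], ys[j]) for j in range(n)]
            for i in range(n)]

def biclique_number(adj):
    """omega_2: the largest l such that some X, Y with |X| = |Y| = l
    induce a complete bipartite graph.  Exponential-time brute force."""
    n = len(adj)
    for l in range(n, 0, -1):
        for X in combinations(range(n), l):
            for Y in combinations(range(n), l):
                if all(adj[i][j] for i in X for j in Y):
                    return l
    return 0
```

For the constant bigraphon U ≡ p this reduces to the balanced bipartite Erdős–Rényi model B(n, p).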
The main result concerning the biclique number is the following.

Theorem 2.4. Let U : Ω_1 × Ω_2 → [0, 1] be a bigraphon whose essential supremum p = ess sup U is strictly between zero and one. Then we asymptotically almost surely have

ω_2(B(n, U)) = (2/log(1/p) + o(1)) · log n .

The proof of Theorem 2.4 is given in Section 5.
We turn to our main result, which determines the quantity (3). Suppose that W is a graphon with strictly positive essential infimum. Define

κ(W) := sup { 2‖h‖_1² / ∫_Ω ∫_Ω h(x) h(y) log(1/W(x, y)) dx dy : h is a nonnegative L¹-function on Ω } . (6)

Here, we set 0/0 = 0 and a/0 = +∞ for a ∈ R \ {0}. We can now state our main result.

Theorem 2.5. Suppose that W is a graphon whose essential infimum is strictly positive. Then asymptotically almost surely,

ω(G(n, W)) = (κ(W) + o(1)) · log n .

Theorem 2.5 is consistent with (1). Indeed, suppose that W ≡ p ∈ (0, 1) is a constant graphon. Then for any h in (6) (which is not constant zero), the term in the supremum equals 2‖h‖_1² / (‖h‖_1² log(1/p)) = 2/log(1/p). We provide heuristics for Theorem 2.5 for more complicated graphons in Section 6. Unfortunately, we were unable to turn these relatively natural heuristics into a rigorous proof. The actual proof of Theorem 2.5 is given in Section 8, building on tools from Section 7.
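The consistency check for the constant graphon can be written out in one line; the following computation is a sketch using our reading of the definition of κ (any nontrivial histogram h works, since the value does not depend on h):

```latex
% For W \equiv p \in (0,1) and any nontrivial histogram h on \Omega:
\kappa(W)
  = \sup_h \frac{2\|h\|_1^2}{\int_\Omega\!\int_\Omega h(x)\,h(y)\,\log\tfrac{1}{W(x,y)}\,dx\,dy}
  = \frac{2\|h\|_1^2}{\|h\|_1^2\,\log\tfrac{1}{p}}
  = \frac{2}{\log(1/p)},
\qquad\text{matching}\quad \omega(G(n,p)) = \Bigl(\tfrac{2}{\log(1/p)}+o(1)\Bigr)\log n .
```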
There are several alternative ways of expressing κ(W). For example, when we heuristically derive Theorem 2.5, we make use of the following identity.

Fact 2.6. We have

κ(W) = sup { ‖f‖_1 : f is a histogram on Ω with Γ(f, W) ≥ 0 } , (7)

where

Γ(f, W) := ‖f‖_1 − (1/2) ∫_Ω ∫_Ω f(x) f(y) log(1/W(x, y)) dx dy . (8)

Other expressions of κ(W) are given in Proposition 3.8.

Remark 2.7.
While we formulate all the problems in terms of cliques, we could have worked with the complementary notion of independent sets instead. Indeed, investigating one of these notions with respect to G(n, W ) is equivalent to investigating the other with respect to G(n, 1 − W ).
Using this observation, we get from Theorem 2.5 the following corollary for the size of the maximum independent set α(G(n, W )).

Corollary 2.8.
Suppose that W is a graphon whose essential supremum is strictly less than 1. Then asymptotically almost surely,

α(G(n, W)) = (κ(1 − W) + o(1)) · log n .

Recall that (the dense versions of) stochastic block models correspond to the case when W is a step-function with finitely many or countably many steps. There exists a conceptually simpler proof of Theorem 2.5 when restricted to stochastic block models. This simplification occurs both on the real-analytic side (i.e., general measurable functions versus step-functions) and on the combinatorial side. See our remark in Section 6.3.

Notation
For n ∈ N, we write [n] = {1, . . . , n} and [n]_0 = {0, 1, . . . , n}. As always, we denote by (n choose k) the binomial coefficient n!/(k!(n − k)!); for multinomial coefficients of higher order we employ the analogous notation. We omit rounding symbols where this does not affect the correctness of the calculations.
We shall always assume that Ω is a standard Borel probability space without atoms. We always write ν for the probability measure associated with Ω.
We shall sometimes make use of tools from real analysis which are available only for R and R^d. The Lebesgue measure will be denoted by λ. It should always be clear from the context whether we mean the one-dimensional Lebesgue measure on R, the two-dimensional Lebesgue measure on R², or any higher-dimensional Lebesgue measure.
We write ‖·‖_1 to denote the L¹-norm of functions (or of vectors in a finite-dimensional space). Non-negative vectors and non-negative L¹-functions are called histograms (see our explanation at the end of Section 6.1). For a histogram f, we write Box(f) for the set of all histograms g for which g ≤ f pointwise. We say that a histogram is non-trivial if it is not almost everywhere zero.
We recall the notions of essential supremum and essential infimum. Suppose that Ω is a space equipped with a measure ν. For a measurable function f : Ω → R we define ess sup f as the least number a such that ν({x ∈ Ω : f(x) > a}) = 0. The quantity ess inf f is defined analogously.

Random Graphs H(n, W )
There is a natural intermediate step in obtaining the random graph G(n, W) from a graphon W, often denoted by H(n, W). To obtain H(n, W) we sample n random independent points x_1, . . . , x_n from the probability space underlying W. The random graph H(n, W) has vertex set [n]; it is a complete graph equipped with edge-weights, the weight of the edge ij being W(x_i, x_j). Self-loops are not included.
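A sketch of this intermediate sampling step in code (the helper name `sample_H` is ours; the underlying space is taken to be [0, 1]):

```python
import random

def sample_H(n, W, seed=0):
    """Sample H(n, W): the complete graph on [n] with edge ij carrying
    the weight W(x_i, x_j), for i.i.d. uniform samples x_1, ..., x_n.
    No self-loops, so only pairs i < j appear."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return {(i, j): W(xs[i], xs[j])
            for i in range(n) for j in range(i + 1, n)}

weights = sample_H(5, lambda x, y: x * y)
```

Flipping an independent coin with success probability equal to each edge-weight then turns a sample of H(n, W) into a sample of G(n, W).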

Graphons
The above crash course in graph limits almost suffices for the purposes of this paper, and we need only a handful of additional standard definitions. See [18] for further references.
All (non-discrete) probability spaces in this paper are standard Borel probability spaces without atoms. Recall that the Isomorphism Theorem (see e.g. [15, Theorem 17.41]) tells us that there is a measure-preserving isomorphism between any two such spaces (i.e., a bijection between the spaces such that both the bijection and its inverse are measurable and preserve the measures). In particular, suppose that W : Ω² → [0, 1] is a graphon defined on a probability space Ω, and let X be another probability space. Let us fix an isomorphism ψ : X → Ω between X and Ω. By a representation of W on X we mean the graphon W′ : X² → [0, 1], (x, y) → W(ψ(x), ψ(y)). Of course, the representation depends on the actual choice of the isomorphism ψ. Note however that the distribution of G(n, W′) does not depend on the choice of ψ, as it is the same as the distribution of G(n, W). Note also that from the Isomorphism Theorem above we get the following fact.

We shall need to "zoom in" on a certain part of a graphon. The next definition is used to this end.

Definition 3.2. Suppose that W : Ω² → [0, 1] is a graphon on a probability space Ω with a measure ν. By a subgraphon of W obtained by restricting to a set A ⊆ Ω of positive measure we mean the function U : A² → [0, 1] which is simply the restriction W↾A×A. When working with this notion, we need to turn A into a probability space. That is, we view U as a graphon on the probability space A endowed with the measure ν_A(B) := ν(B)/ν(A) for every measurable set B ⊆ A.
Observe that in the above setting, for every B ⊆ A of positive measure we have ν_A(B) = ν(B)/ν(A) > 0. Note that a lower bound on ω(G(n, W↾A×A)) readily provides a lower bound on ω(G(n, W)). More precisely, suppose that we can show that asymptotically almost surely, ω(G(n, W↾A×A)) ≥ c log n. Consider now sampling the random graph G(n, W). By the Law of Large Numbers, out of the n sampled points x_1, . . . , x_n ∈ Ω, there will be (ν(A) − o(1))n > (1/2)ν(A)n many of them contained in A. In other words, there is a coupling of G = G(n, W) and G′ = G((1/2)ν(A)n, W↾A×A) such that with high probability, G′ is contained in G as a subgraph. We conclude that asymptotically almost surely,

ω(G(n, W)) ≥ c log((1/2)ν(A)n) = (c − o(1)) log n .

The homomorphism density of a graph H = ({v_1, . . . , v_ℓ}, E) in a graphon W is defined by

t(H, W) := ∫_{Ω^ℓ} ∏_{v_i v_j ∈ E} W(x_i, x_j) dx_1 · · · dx_ℓ . (11)

Suppose that Ω is an atomless standard Borel probability space. Let W_1, W_2 : Ω² → [0, 1] be two graphons. We then define the cut-norm distance of W_1 and W_2 by

d_□(W_1, W_2) := sup_{S,T} | ∫_{S×T} (W_1(x, y) − W_2(x, y)) dx dy | ,

where S and T range over all measurable subsets of Ω. Strictly speaking, d_□ is only a pseudometric, since two graphons differing on a set of measure zero have zero distance. Based on the cut-norm distance we can define the key notion of cut distance by

δ_□(W_1, W_2) := inf_φ d_□(W_1^φ, W_2) ,

where φ : Ω → Ω ranges through all measure-preserving automorphisms of Ω, and W_1^φ stands for the graphon defined by W_1^φ(x, y) = W_1(φ(x), φ(y)). Then δ_□ is also a pseudometric.

Suppose that H = ({v_1, . . . , v_ℓ}, E) is a graph (which is allowed to have self-loops), and let Ω be an arbitrary atomless standard Borel probability space. By a graphon representation W_H of H we mean the following construction. We consider an arbitrary partition Ω = A_1 ∪ A_2 ∪ . . . ∪ A_ℓ into sets of measure 1/ℓ each. We then define the graphon W_H as 1 or 0 on each square A_i × A_j, depending on whether v_i v_j forms an edge or not. Note that W_H is not unique, since it depends on the choice of the sets A_1, . . . , A_ℓ. However, all the possible graphons W_H are at zero distance in the δ_□-pseudometric. So, when writing W_H we refer to any representative of the above class.
With this in mind, we can also define the cut distance of H and any graphon W : Ω² → [0, 1], denoted by δ_□(H, W), as δ_□(W_H, W). Also, all of this extends in a straightforward way to weighted graphs with a weight function w : E → [0, 1].

Remark 3.3.
Suppose that H is a finite graph and I is the unit interval (open or closed). Then in the above construction we can take the sets A i ⊆ I to be intervals.
The key property of the sampling procedures described earlier (both G(n, W) and H(n, W)) is that the samples are typically very close to the original graphon W in the cut distance. Let us state this fact (for the sampling procedure H(n, W)), proven first in [4], formally. In some situations, we shall consider a wider class of kernels, the so-called L∞-graphons. These are just symmetric L∞-functions W : Ω² → R_+. That is, we do not require L∞-graphons to be bounded by 1, but rather by an arbitrary constant. The random graph H(n, W) makes sense even for L∞-graphons. 2 By a simple rescaling, we get the following from Lemma 3.4.

Corollary 3.5.
Let W be an arbitrary L∞-graphon. Then for each k ∈ N, with high probability the sample H(k, W) is close to W in the cut distance, with the quantitative bound inherited from Lemma 3.4 by rescaling. Let us note that the proof of Lemma 3.4 is quite involved.

Lebesgue Points and Approximate Continuity
Let f : (0, 1)² → R be an integrable function. Recall that (x, y) ∈ (0, 1)² is called a Lebesgue point of f if we have

lim_{δ→0} (1/(2δ)²) ∫_{Q_δ(x,y)} |f(u, v) − f(x, y)| du dv = 0 ,

where Q_δ(x, y) := (x − δ, x + δ) × (y − δ, y + δ). Recall also that a function f : (0, 1)² → R is said to be approximately continuous at (x, y) ∈ (0, 1)² if for every ε > 0, the point (x, y) is a point of density of the set

{(u, v) ∈ (0, 1)² : |f(u, v) − f(x, y)| < ε} .

We will use the following classical result (see e.g. [23, Theorem 7.7]).
2 But G(n, W ) does not make sense.
An easy consequence of the previous theorem is also the following classical result.

Alternative Formulas for κ(W )
First, let us prove Fact 2.6 which gives our first identity for κ(W ).
Proof of Fact 2.6. If there exists a nonzero histogram h such that ∫_Ω ∫_Ω h(u) h(v) log(1/W(u, v)) du dv = 0, then clearly both suprema in (6) and (7) are +∞.
Let h be an arbitrary histogram in (6) and suppose that ∫_Ω ∫_Ω h(u) h(v) log(1/W(u, v)) du dv > 0. Define

f := ( 2‖h‖_1 / ∫_Ω ∫_Ω h(u) h(v) log(1/W(u, v)) du dv ) · h .

This way, the function f is a histogram on Ω, and ‖f‖_1 equals the term in the supremum in (6). So, to justify that the right-hand side of (7) is at least as big as that of (6), we only need to show that Γ(f, W) ≥ 0. Indeed, plugging f into the definition of Γ, the linear and the quadratic term cancel, so that Γ(f, W) = 0.

On the other hand, let f be an arbitrary histogram appearing in (7), that is, with Γ(f, W) ≥ 0. For c ∈ R, let us denote by cf the c-multiple of the function f. Then the map c → Γ(cf, W) is clearly a quadratic function with the limit −∞ at +∞. And since Γ(f, W) ≥ 0, there exists c ≥ 1 with Γ(cf, W) = 0; for this c, the term in the supremum in (6) with h := cf equals c‖f‖_1 ≥ ‖f‖_1. Hence the right-hand side of (6) is at least as big as that of (7).

In Fact 2.6 we gave an alternative formula for κ(W). In Proposition 3.8 we give two further expressions. These expressions require the graphon W to be changed on a nullset. Note that we have the liberty of making such a change, as the distribution of the model G(n, W) remains unaltered.
Proposition 3.8. Suppose that W : (0, 1)² → [0, 1] is an arbitrary graphon. Then W can be changed on a nullset in such a way that the following holds. We have the identity (16) for κ(W), in which the infimum ranges over all measurable sets A ⊆ (0, 1) of positive measure. Moreover, we have the identity (17); more precisely, for each ε > 0, each set A from (16) satisfying the stated condition yields a pair admissible for (17), where h ranges over all indicator functions on (0, 1).

Our proof of Proposition 3.8 could easily be modified to show that in the infimum we can range over all histograms h instead. Since the Radon–Nikodym theorem gives us a one-to-one correspondence between non-negative L¹-functions and absolutely continuous measures, we get a version of (16) in which the infimum ranges over all probability measures π on (0, 1) that are absolutely continuous with respect to the Lebesgue measure. While we shall not need this identity, we remark that it can be used to derive a heuristic, slightly different from that given in Section 6, for Theorem 2.5.
Proof of Proposition 3.8. Let us replace the value of W at every point (x, y) ∈ (0, 1)² that is not a point of approximate continuity by c. By Theorem 3.7, this changes W only on a set of measure zero. Let us deal with the first part of the statement, postponing (17) until later.
Proof of Claim 3.8.1. Let h be an arbitrary function appearing in (6) (not constant zero). Fix r ∈ {2, 3, . . .}, and let F ⊆ (0, 1) be a random set consisting of r independent points sampled from (0, 1) according to the density function d = h/‖h‖_1. Then by linearity of expectation we can compute the expected value of the sum in question; this shows that there exists a deterministic r-element set F for which the sum, normalized by 2/(r(r − 1)), is at least its expectation. This concludes the proof of Claim 3.8.1.
Let us denote the right-hand side of (16) as −P.
Proof of Claim 3.8.2. Suppose that r ∈ {2, 3, . . .} is given, and let F = {x_1 < x_2 < . . . < x_r} be an arbitrary set of points in (0, 1) as in (15). Let ε ∈ (0, c/2) be arbitrary. Firstly, note that the concavity of the logarithm gives the bound (18) for each a ∈ [c, ∞). Secondly, note that C → 0 as ε → 0. Let us take δ > 0 such that for each i ∈ [r] the sets in question are pairwise disjoint, and such that the measure of each of these sets is as required; the latter property can be achieved since each x_i is a point of approximate continuity of W. Now, let us use (18) for the first term and the fact that λ(D_ij) ≤ ε(2δ)² for the second term. Letting ε → 0 (which means that also C → 0), we get the claim.
By Claim 3.8.1, we have lim inf_{r→∞} 2 log P_r / (r(r − 1)) ≥ −2/κ(W). By Claim 3.8.2, we have P ≥ lim sup_{r→∞} 2 log P_r / (r(r − 1)). Further, it is obvious that −2/κ(W) ≥ P: indeed, the supremum in (6) ranges over all histograms, of which indicators of measurable sets are just a particular case. The combination of the three above inequalities proves the fact.
So, it remains to deal with (17). Positive multiples of indicator functions are histograms, so (7) tells us that

κ(W) ≥ sup { a λ(A) : a ∈ R_+ , A ⊆ (0, 1) , Γ(a · 1_A, W) ≥ 0 } .

It remains to deal with the opposite inequality, which we shall prove in the "more precisely" form. Let ε > 0 be arbitrary and take a set A as in (16), with the corresponding value a off by a factor of at most (1 + ε)². We claim that the pair (a, A) is admissible for the supremum in (17). Indeed, Γ(a · 1_A, W) ≥ 0, and since ε > 0 was arbitrary, this finishes the proof.

Exhaustion Principle
We recall the principle of exhaustion (see e.g. [8, Lemma 11.12] for a more general statement).

Proof of Proposition 2.1
We shall need the following well-known crude bound on the minimum difference between uniformly random points.
Consider a sequence of positive numbers 1 = a_1 > a_2 > a_3 > . . . > 0, with lim_{n→∞} a_n = 0, to be determined later. Define a graphon W : [0, 1]² → [0, 1] by setting W(x, y) := 1 if |x − y| ∈ (a_{2i+1}, a_{2i}] for some i, and W(x, y) := 0 otherwise (that is, when |x − y| ∈ (a_{2i}, a_{2i−1}] for some i).

Let us show how to achieve (4). Suppose that the numbers a_1, . . . , a_{2i−1} were already set. Fix an arbitrary number n large enough such that n^{−3} < a_{2i−1} and f(n) > 1 + 1/a_{2i−1}. Then, set a_{2i} := n^{−3}. We claim that with high probability, there is no set of f(n) vertices in G(n, W) forming a clique. Indeed, consider the representation of the vertices of G(n, W) in the interval [0, 1]. By Fact 4.1 we can assume that the mutual distances between these points are more than a_{2i}. Consider an arbitrary set S ⊆ (0, 1) of these points of size f(n). By the pigeonhole principle there are two points x, y ∈ S with |x − y| ≤ 1/(f(n) − 1) < a_{2i−1}.
On the other hand, |x − y| > a_{2i}. We conclude that W(x, y) = 0, and thus S does not induce a clique.

Next, let us show how to achieve (5). Suppose that the numbers a_1, . . . , a_{2i} were already set. Fix a large number n. In particular, suppose that n^{−3} < a_{2i} and f(n) > 2/a_{2i}, and let a_{2i+1} := n^{−3}. Now, consider the process of generating vertices in G(n, W). By the Law of Large Numbers, out of the n vertices, with high probability, at least (1/2) a_{2i} n vertices fall in the interval (1/2 − a_{2i}/2, 1/2 + a_{2i}/2). By Fact 4.1, with high probability, the differences of pairs of these points are bigger than a_{2i+1}. In particular, the said set of vertices forms a clique of order at least (1/2) a_{2i} n > n/f(n), as needed for (5).

Remark 4.2.
It may seem that by replacing the values 0 and 1 by some constants 0 < p_1 < p_2 < 1 in the construction in the proof of Proposition 2.1 we get an oscillation between c_1 log n and c_2 log n. Theorem 2.5 tells us however that this is not the case: the clique number normalized by log n will converge in probability.

Proof of Theorem 2.2
The proof of Theorem 2.2 was suggested to us by Lutz Warnke.
First, we handle the case when E[ω(G(n, W ))] is bounded.

Lemma 4.3. Let W be a graphon, and suppose that L := lim_{n→∞} E[ω(G(n, W))] < +∞. Then L is an integer and asymptotically almost surely, ω(G(n, W)) = L.

Proof. The statement follows from the following claim. Suppose that W is a graphon. Then for each ℓ ∈ N we have

lim_{n→∞} P[ω(G(n, W)) ≥ ℓ] ∈ {0, 1} .

Indeed, suppose that for some ℓ and n we have that P[ω(G(n, W)) ≥ ℓ] = δ > 0. Then, for each k, we have that P[ω(G(kn, W)) ≥ ℓ] ≥ 1 − (1 − δ)^k, since G(kn, W) contains k independent copies of G(n, W).

Suppose that W is represented on a probability space Ω. By Lemma 4.3, we can assume that E[ω(G(n, W))] → ∞.
To prove the concentration, we shall use Talagrand's inequality. For this, we need to represent G(n, W) on a suitable product space J. It turns out that the right product space corresponds to the "vertex-exposure" technique known in the theory of random graphs. Let J := ∏_{i=1}^{n} (Ω × [0, 1]^{i−1}); an element of J, sampled according to the product measure, corresponds to an instance of G(n, W). To see this, consider an arbitrary element x ∈ J. We can write

x = (x_1; x_2, p_{1,2}; x_3, p_{1,3}, p_{2,3}; . . . ; x_n, p_{1,n}, . . . , p_{n−1,n}) ,

where x_i ∈ Ω, and p_{i,j} ∈ [0, 1]. In the instance of G(n, W) corresponding to x, vertices i and j are connected if and only if W(x_i, x_j) ≥ p_{i,j}. It is straightforward to check that this gives the right distribution on G(n, W).
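The decoding step of this representation can be sketched in code (a toy version with base space [0, 1]; the flat bookkeeping of the coordinates p_{ij} as a dictionary is our simplification):

```python
import random

def graph_from_point(n, W, x_coords, p_coords):
    """Decode a point of the product space J into an instance of
    G(n, W): vertices i < j are joined iff W(x_i, x_j) >= p_{ij}."""
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if W(x_coords[i], x_coords[j]) >= p_coords[(i, j)]:
                edges.add((i, j))
    return edges

rng = random.Random(1)
n = 6
xs = [rng.random() for _ in range(n)]
ps = {(i, j): rng.random() for i in range(n) for j in range(i + 1, n)}
G = graph_from_point(n, lambda x, y: 0.5, xs, ps)
```

Changing a single coordinate moves at most one vertex or one edge indicator, which is the 1-Lipschitz property exploited with Talagrand's inequality below.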
Consider the clique number, this time on the domain J. That is, we have a function g : J → R, where g(x) is the clique number of the graph corresponding to x. Then g is a (discrete) 1-Lipschitz function. That is, if x, y ∈ J are such that they differ in one coordinate, then |g(x) − g(y)| ≤ 1. 3 Further, g satisfies the so-called small certificates condition. This means that whenever g(x) ≥ ℓ, there exists a set C of at most ℓ many coordinates such that g(y) ≥ ℓ for each y ∈ J which agrees with x on each coordinate from C. (In other words, the values of x on the coordinates from C alone certify that g(x) ≥ ℓ.) Indeed, it is enough just to reveal the values at the coordinates of one maximum clique. Talagrand's inequality (see [22, Remark 2 following Talagrand's Inequality II, p. 81]) 4 then states that there exists an absolute constant β > 0 such that for t_n := (E[ω(G(n, W))])^{3/4}, we have (for every large enough n) that

P[ |ω(G(n, W)) − E[ω(G(n, W))]| > t_n ] ≤ exp(−β (E[ω(G(n, W))])^{1/2}) .

The conclusion immediately follows by letting n go to infinity.

Graphons With a Bounded Clique Number

First of all, we will show that there is a set B ⊆ Ω of positive measure such that W↾B×B = 0 almost everywhere. We may assume that L ≥ 2 (the case L = 1 is trivial). It follows from the equality t(K_{L+1}, W) = 0 that for every (x_1, . . . , x_{L+1}) ∈ Ω^{L+1} (up to a set of measure zero) we have ∏_{i<j} W(x_i, x_j) = 0. Next, observe that for every A ⊆ Ω of positive measure, there is B ⊆ A of positive measure such that W↾B×B = 0 almost everywhere. This follows by the previous considerations applied to the subgraphon W* = W↾A×A (for which we still have that sup{k ∈ N : t(K_k, W*) > 0} ≤ L). Finally, let W′ : (0, 1)² → [0, 1] be a representation of the graphon W on (0, 1). Then the statement follows by an application of Lemma 3.9.

BICLIQUE NUMBER
In this section, we prove Theorem 2.4. First, we introduce some additional notation. When we refer to a bipartite graph H = (V, W; E) as a bigraph, we consider a distinguished order of the colour classes V = {v_1, . . . , v_p} and W = {w_1, . . . , w_q}. In such a case we define the bipartite density of H in a bigraphon U : Ω_1 × Ω_2 → [0, 1] by

t_B(H, U) := ∫_{Ω_1^p} ∫_{Ω_2^q} ∏_{v_i w_j ∈ E} U(x_i, y_j) dy_1 · · · dy_q dx_1 · · · dx_p . (19)

Note that for a bigraph H = (V, W; E) and its conjugate H* = (W, V; E), the quantities t_B(H, U) and t_B(H*, U) are not necessarily equal.
As we will see, the upper bound in Theorem 2.4 is trivial. For the lower bound, we need to make a small detour to Sidorenko's conjecture.

Sidorenko's Conjecture
A famous conjecture of Simonovits and Sidorenko ("Sidorenko's conjecture") [24, 25] asserts that among all graphs of a given (large) order n and fixed edge density d, the density of a fixed bipartite graph is minimized by a typical sample from G(n, d). The conjecture can be phrased particularly neatly in the language of graphons (as observed already by Sidorenko, a decade before graphons themselves were introduced), in which case it asserts that for every bigraph H and every bigraphon U,

t_B(H, U) ≥ ( ∫ U )^{e(H)} .

Bicliques in Almost Constant Bigraphons
As a preliminary step for our proof of Theorem 2.4, we study bicliques in random bipartite graphs sampled from almost constant bigraphons. This condition is formalized by the following definition.
Proof. Let α ∈ (0, 1) be arbitrary. Suppose that ε > 0 is sufficiently small (we will make this precise later), let d ∈ (d_1, d_2), and let U : Ω_1 × Ω_2 → [0, 1] be a (d, ε)-constant bigraphon. Let X_k be the number of bicliques in B(k, U) whose two colour classes have size ℓ = (1 − α) · (2/log(1/d)) · log k; multiplicities caused by automorphisms of K_{ℓ,ℓ} are not counted. By Proposition 5.1 we have the lower bound (21) on E[X_k]. Next, we are going to show by a second moment argument that X_k ≈ E[X_k] a.a.s. For p, q = 0, 1, . . . , ℓ, we define the bigraph K_{[ℓ,p,q]} as the result of gluing two copies of K_{ℓ,ℓ} along p vertices in the first colour class and q vertices in the second colour class. Alternatively, K_{[ℓ,p,q]} can be obtained by erasing the edges of two disjoint copies of the bigraph K_{ℓ−p,ℓ−q} from K_{2ℓ−p,2ℓ−q}. We have e(K_{[ℓ,p,q]}) = 2ℓ² − pq.
We have

X_k² = ∑_{p,q=0}^{ℓ} Y_{p,q} ,

where Y_{p,q} counts the number of copies of K_{[ℓ,p,q]} which preserve the order of the colour classes. We expand the second moment E[X_k²] accordingly. It is now enough to compare this with (21).

Claim 5.3.2.
There exist numbers C, ε_2 > 0 such that the stated bound on E[Y_{p,q}] holds, where o(1) → 0 as k → ∞ uniformly for any choice of p and q.
It remains to compare the terms t_B(K_{[ℓ,0,0]}, U) and t_B(K_{[ℓ,p,q]}, U). First, we claim that for each i, j, h ∈ N we have

t_B(K_{i+j,h}, U) ≥ t_B(K_{i,h}, U) · t_B(K_{j,h}, U) . (27)

Indeed, by Hölder's inequality, we have

t_B(K_{i,h}, U) ≤ t_B(K_{i+j,h}, U)^{i/(i+j)}   and   t_B(K_{j,h}, U) ≤ t_B(K_{i+j,h}, U)^{j/(i+j)} ,

where the integrations are over T = (t_1, . . . , t_h) ∈ (Ω_2)^h, and x ∈ Ω_1, and deg(x, T) = ∏_{r=1}^{h} U(x, t_r). A double application of (27), followed by an application of Proposition 5.1, gives a lower bound on t_B(K_{[ℓ,0,0]}, U). In the defining formula (19) for t_B(K_{[ℓ,p,q]}, U) we use that d + ε is an upper bound on some of the factors U(x_i, y_j), as in Fig. 1. Observe that after the removal of the ℓp + ℓq − 2pq edges indicated in Fig. 1, the graph K_{[ℓ,p,q]} decomposes into a disjoint union of K_{ℓ,ℓ} and K_{ℓ−p,ℓ−q}. Note that for a disjoint union H_1 ⊕ H_2 of two bigraphs H_1 and H_2 we have t_B(H_1 ⊕ H_2, U) = t_B(H_1, U) · t_B(H_2, U). Therefore,

t_B(K_{[ℓ,p,q]}, U) ≤ (d + ε)^{ℓp+ℓq−2pq} · t_B(K_{ℓ,ℓ}, U) · t_B(K_{ℓ−p,ℓ−q}, U) ,

which yields the claim for C = 1/log(1/d_1) and ε_2 < (1/20) d_1 log(1/d_2). Let C > 0 and ε_2 > 0 be given by Claim 5.3.2. Let ε_1 > 0 be given by Claim 5.3.1 for c = C. Now if ε < min(ε_1, ε_2), we are ready to provide upper bounds on the summands in (23). Note that we have O(log² k) many of these summands. We get the required estimate.

Proof of Theorem 2.4
The upper bound is easy, since it claims that ω_2(B(n, U)) is typically not bigger than the biclique number in the balanced bipartite Erdős–Rényi random graph B(n, p) (which clearly stochastically dominates B(n, U)). For completeness, we include the calculations. We write Y_k(B(n, U)) for the number of complete balanced bipartite graphs on k + k vertices inside B(n, U). For k = (1 + ε) · (2/log(1/p)) · log n, we have

E[Y_k(B(n, p))] ≤ (n choose k)² · p^{k²} ≤ n^{2k} p^{k²} = (n² p^k)^k .

The statement now follows from Markov's Inequality, provided that we can show that n² p^k → 0. Indeed,

n² p^k = n² · p^{2 log n / log(1/p)} · p^{2ε log n / log(1/p)} = p^{2ε log n / log(1/p)} → 0 .
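The convergence n²p^k → 0 used here is easy to check numerically; the following small sanity check is ours (function name and parameter values are arbitrary):

```python
import math

def log_base(n, p, eps):
    """log of the quantity n^2 p^k for k = (1+eps) * 2 log n / log(1/p),
    i.e. the logarithm of the base of the first-moment bound (n^2 p^k)^k.
    Algebraically this equals -2 * eps * log n."""
    k = (1 + eps) * 2 * math.log(n) / math.log(1 / p)
    return 2 * math.log(n) + k * math.log(p)

vals = [log_base(n, 0.5, 0.1) for n in (10**2, 10**4, 10**6)]
```

The values are negative and decrease linearly in log n, confirming that the base n²p^k (and hence the expected number of large bicliques) tends to zero.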
We now turn to the lower bound. Let α ∈ (0, 1) be arbitrary and let ε_0 > 0 be given by Proposition 5.3 for d_1 = p/2 and d_2 = p. Let ε < min(ε_0, p/2) be arbitrary. We denote by ν_i the measure given on Ω_i, i = 1, 2. The definition of the essential supremum, together with Theorem 3.6, gives that there exist two measurable sets A ⊆ Ω_1 and B ⊆ Ω_2 of positive measure such that U(x, y) > p − ε for almost every (x, y) ∈ A × B. We put δ = min(ν_1(A), ν_2(B)). By rescaling the measures ν_1, ν_2, we get probability measures ν*_1 on A and ν*_2 on B. Then we can view U↾A×B as a bigraphon, which we denote by U*. Note that U* is (p − ε, ε)-constant (and thus also (p − ε, ε_0)-constant, as ε < ε_0).
Consider now the sampling process generating B ~ B(n, U) as described above. A standard concentration argument gives that with high probability, at least δn/2 of the points x_i sampled in Ω_1 lie in the set A, and at least δn/2 of the points y_j sampled in Ω_2 lie in the set B. In other words, with high probability we can find a copy of B(⌈δn/2⌉, U*) inside B(n, U). Looking at the biclique number, we get that for ℓ = (1 − α) · (2/log(1/(p − ε))) · log(δn/2), asymptotically almost surely,

ω_2(B(n, U)) ≥ ω_2(B(⌈δn/2⌉, U*)) ≥ ℓ ,

where the last inequality follows from Proposition 5.3. Since log(δn/2) = (1 + o(1)) log n, and since α ∈ (0, 1) and ε < min(ε_0, p/2) were arbitrary, the claim follows.

FORMULA FOR GRAPHONS WITH LOGARITHMIC CLIQUE NUMBER
In this section we try to informally justify Theorem 2.5. While we believe that the derivation here captures the essence of the problem, the actual proof, presented in Section 8, is quite different. At the end of this section we comment on what fails in turning the heuristics into a rigorous argument.

First Moment for a 2-Step Graphon
Let us try to gain some intuition on Theorem 2.5 by looking at one of the simplest nonconstant graphons.
Our aim is to determine for which numbers $c \in \mathbb{R}^+$ there typically exists a clique of order $c \log n$ in $G \sim G(n, W)$, and for which $c$'s there is typically none. So let us fix $c \in \mathbb{R}^+$ and let $X_n$ count the number of cliques of order $c \log n$ in $G \sim G(n, W)$. By Markov's Inequality, $\omega(G)$ will typically be smaller than $c \log n$ in the regime when $\mathbb{E}[X_n] \to 0$. On the other hand, it is plausible that a second moment argument will give that typically $\omega(G) \ge c \log n$ when $\mathbb{E}[X_n] \to +\infty$. With this belief (supported by the success of a second-moment argument in the proof of Theorem 2.4), let us estimate $\mathbb{E}[X_n]$. Actually, we rather look at a refined quantity $Y^{\alpha_1, \alpha_2}_n(G)$, defined as the number of cliques in $G$ that consist of $\alpha_1 \log n$ vertices whose representation lies in $\Omega_1$ and $\alpha_2 \log n$ vertices represented in $\Omega_2$. We expect the quantities $Y^{\alpha_1, \alpha_2}_n$ to be either super-polynomially small or super-polynomially large. Since the sum expressing $X_n$ has only $(\log n)$-many summands, we expect that $\mathbb{E}[X_n] \to +\infty$ if and only if there exist $\alpha_1, \alpha_2 \ge 0$ such that $\alpha_1 + \alpha_2 = c$ and $\mathbb{E}[Y^{\alpha_1, \alpha_2}_n] \to +\infty$.
For a clique contributing to $Y^{\alpha_1, \alpha_2}_n(G)$ to be present, $\binom{\alpha_1 \log n}{2}$ edges in the $(\Omega_1 \times \Omega_1)$-part of $W$, $\binom{\alpha_2 \log n}{2}$ edges in the $(\Omega_2 \times \Omega_2)$-part, and $\alpha_1 \alpha_2 \log^2 n$ edges in the $(\Omega_1 \times \Omega_2)$-part must be present in the specific locations of the prospective cliques or complete bipartite graphs. By the Law of Large Numbers, approximately $\beta_1 n$ points in the sampling process for $G(n, W)$ end up in $\Omega_1$ and approximately $\beta_2 n$ points end up in $\Omega_2$. We get the formula (34) for $\mathbb{E}[Y^{\alpha_1, \alpha_2}_n]$, and the rescaled logarithm (35) tells us whether this expectation is super-polynomially small or large, according to whether (35) is negative or positive, respectively. It is straightforward to generalize this formula to graphons with more steps. Observe also that the values of $\beta_1$ and $\beta_2$ get lost in the transition between the first and the second line of (34), and are consequently immaterial in (35) (provided that they are positive). In particular, the step sizes $\beta_i$ could have been "infinitesimally small". Thus, we can see a direct correspondence between ($*$) and ($**$) in (8) and (35), where the integration corresponds to passing to infinitesimal steps. In view of this, we will denote the term in (35) by $(\alpha, W)$, where $\alpha = (\alpha_1, \alpha_2)$. The optimization over $\alpha_1$ and $\alpha_2$ in (33) corresponds to taking the supremum in (7). This is why we call the functions $f$ in (7) (or vectors, in the case of step-graphons) histograms: they specify the densities of the vertices of the anticipated cliques over the space $\Omega$. Also, motivated by (7) and its interpretation above, we say that a histogram $f$ is admissible for a graphon $W$ if $(f, W) \ge 0$.
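For a concrete handle on this quantity, the following stdlib-only sketch evaluates the entropy-plus-energy expression for a step graphon with block densities $p_{ij}$ and histogram $\alpha$. The paper's symbol for the functional did not survive extraction, so the name `step_gamma` is ours; the formula is the natural many-step analogue of the two-step expression appearing later in the text.

```python
import math

def step_gamma(alpha, P):
    """Entropy + energy of a histogram alpha for a step graphon with
    block densities P[i][j]:
        sum_i alpha_i  +  (1/2) * sum_{i,j} alpha_i * alpha_j * log P[i][j].
    The diagonal terms contribute (alpha_i^2 / 2) * log P[i][i], matching
    the two-step expression in the text."""
    k = len(alpha)
    entropy = sum(alpha)
    energy = 0.5 * sum(alpha[i] * alpha[j] * math.log(P[i][j])
                       for i in range(k) for j in range(k))
    return entropy + energy

# Sanity check against the Erdos-Renyi case: for a constant graphon p,
# the functional vanishes exactly at c = 2 / log(1/p), in line with (1).
p = 0.5
c = 2 / math.log(1 / p)
print(step_gamma([c], [[p]]))  # numerically 0
```

For the one-step (constant) graphon, the root $c = \frac{2}{\log(1/p)}$ recovers the classical clique-number constant, which is the consistency check one wants here.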
Last, let us note that a physicist might refer to ( * ) as the "entropy contribution", as it comes from the choice of the vertices of a clique, while ( * * ) could be referred to as the "energy" needed to include all required edges of that clique.

Introducing the Second Moment to the Example
So far, our prediction was based on a first moment argument. Combined with Markov's Inequality, this readily gives an upper bound on the typical clique number of $G(n, W)$. We now want to complement the upper bound with a lower bound based on a second moment argument. Let us first recall the situation in the setting of the Erdős–Rényi random graphs $G(n, p)$. There, a straightforward calculation for the random variable $X_n$ counting cliques of order $c \log n$ (where $c > 0$ is fixed) gives that $\mathbb{E}[X_n]^2 = (1 + o(1)) \mathbb{E}[X_n^2]$ if and only if $\mathbb{E}[X_n] \to +\infty$. Thus, the first and the second moment start working together at the same time.
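As a toy illustration of the Erdős–Rényi behavior recalled above, the following stdlib-only sketch samples $G(n, p)$ and computes its clique number by brute force (exponential time, so only tiny $n$; all names are ours). At such small $n$ the lower-order terms are still large, so the match with the leading-order prediction $\frac{2}{\log(1/p)} \log n$ is loose.

```python
import itertools
import math
import random

def sample_gnp(n, p, rng):
    """Edge set of an Erdos-Renyi graph G(n, p), as a set of frozensets."""
    return {frozenset(e)
            for e in itertools.combinations(range(n), 2)
            if rng.random() < p}

def clique_number(edges, n):
    """Brute-force omega(G); fine only for n up to roughly 15."""
    for k in range(n, 1, -1):
        for S in itertools.combinations(range(n), k):
            if all(frozenset(e) in edges for e in itertools.combinations(S, 2)):
                return k
    return 1 if n >= 1 else 0

rng = random.Random(2024)
n, p = 15, 0.5
omega = clique_number(sample_gnp(n, p, rng), n)
print(omega, 2 * math.log(n) / math.log(1 / p))
```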
The situation is more complicated in the model $G(n, W)$. We will illustrate this on the graphon $W$ from (32). Suppose that $\alpha_1, \alpha_2 \ge 0$ are such that (35) is positive, and we hopefully ask whether the second-moment identity (36) holds, where $K$ and $L$ range over all sets of vertices with $\alpha_1 \log n$ vertices represented in $\Omega_1$ and $\alpha_2 \log n$ vertices represented in $\Omega_2$. Let $K_1$ be the vertices of $K$ represented in $\Omega_1$, and let $K_2$, $L_1$, and $L_2$ be defined analogously. It is clear that $|K_1 \setminus L_1| = |L_1 \setminus K_1|$ and $|K_2 \setminus L_2| = |L_2 \setminus K_2|$. So for fixed sets $K$ and $L$, the joint probability that both cliques are present factorizes accordingly. Thus, grouping the terms of (37) depending on the overlap sizes $m_i = |K_i \cap L_i|$, we arrive at (38), where the approximate equality represents the fact that we assumed on the right-hand side that exactly $\beta_i n$ vertices are represented in $\Omega_i$. Let us write $\gamma_i = m_i / \log n$, and let us write $z^{\gamma_1, \gamma_2}_n$ for the individual summands on the right-hand side of (38). Routine manipulations give that
$$\frac{\log z^{\gamma_1, \gamma_2}_n}{\log^2 n} \approx \gamma_1 + \gamma_2 + 2(\alpha_1 - \gamma_1) + 2(\alpha_2 - \gamma_2) + \cdots$$
Thus, if we want the second moment (36) to work, then comparing the calculations above with (34), we must have for each $\gamma_1 \in [0, \alpha_1]$ and $\gamma_2 \in [0, \alpha_2]$ that
$$2 \Bigl( \alpha_1 + \alpha_2 + \frac{\alpha_1^2}{2} \log p_{11} + \frac{\alpha_2^2}{2} \log p_{22} + \alpha_1 \alpha_2 \log p_{12} \Bigr) \ge \cdots\,,$$
which rewrites as $(\gamma, W) \ge 0$. To summarize, to justify (7) (at least for step-functions), it would suffice to have the following.
This, however, does not hold in general: there is an example with $(\alpha, W) = \frac{1}{8} > 0$ but $(\gamma, W) = -\frac{1}{2} < 0$. It is worth explaining in words what is happening in the example. The parameters $p_{11}$ and $\alpha_1$ are set so that asymptotically almost surely, $G(\frac{n}{2}, p_{11})$ contains no cliques of order $\alpha_1 \log n$. However, in the rare cases when $G(\frac{n}{2}, p_{11})$ (viewed as a subgraph of $G(n, W)$) does contain such a clique, there are typically many ways of extending it on the $\Omega_2$-part by $\alpha_2 \log n$ additional vertices, thus inflating $\mathbb{E}[Y^{\alpha_1, \alpha_2}_n]$ substantially. The above suggests a correction to (7): we should range only over those histograms $f$ for which $(g, W) \ge 0$ for all histograms $g \in \mathrm{Box}(f)$. Note also that the necessity of testing the admissibility condition over all sub-histograms of $f$ has a clear combinatorial interpretation: if cliques with a given histogram typically appear, then for each given sub-histogram, cliques with that sub-histogram must appear (just because a subset of a clique again induces a clique). Now, after all this arguing why (7) should seem wrong, let us explain why it is actually all right. We show in Lemma 7.2 that for any histogram $f$ attaining the supremum in (7), we automatically have that all sub-histograms are admissible (recall that a histogram $h$ is admissible for a graphon $W$ if $(h, W) \ge 0$). If the supremum is not attained, then for each histogram $f$ almost attaining the supremum, we have $(g, W) \ge 0$ for all sub-histograms $g$, which is sufficient for the argument.
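The corrected admissibility condition over $\mathrm{Box}(f)$ can be probed numerically. The sketch below is stdlib only; the block densities and histogram are our own illustrative choices (the parameters of the paper's actual example were lost in extraction, and the functional's symbol is again ours). It exhibits a two-step graphon and a histogram $\alpha$ with positive value whose sub-histogram $(\alpha_1, 0)$ fails admissibility, exactly the phenomenon described above.

```python
import math
from itertools import product

def step_gamma(alpha, P):
    # Entropy + energy of histogram alpha for block densities P (name ours).
    k = len(alpha)
    return sum(alpha) + 0.5 * sum(alpha[i] * alpha[j] * math.log(P[i][j])
                                  for i in range(k) for j in range(k))

def box_admissible(alpha, P, grid=20):
    """Grid search (a heuristic check, not a proof) that every
    sub-histogram g <= alpha coordinate-wise has step_gamma(g, P) >= 0."""
    axes = [[a * t / grid for t in range(grid + 1)] for a in alpha]
    return all(step_gamma(g, P) >= -1e-12 for g in product(*axes))

# A sparse block p11 glued to a dense block p22: the full histogram passes
# the first-moment test, but its restriction to the first part does not.
P = [[0.1, 0.9], [0.9, 0.9]]
alpha = [1.2, 2.0]
print(step_gamma(alpha, P))       # positive
print(step_gamma([1.2, 0.0], P))  # negative
print(box_admissible(alpha, P))   # False
```

Here the rare cliques inside the sparse block get inflated by the many extensions into the dense block, so the first-moment value of the full histogram is misleadingly positive.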

Failure of Turning the Above Heuristics into a Rigorous Argument
There are two types of errors that we introduced in the above argument. Firstly, there are the "little imprecisions" introduced when we replaced a sum by its maximal term (such as in (33)) or when we used the $\approx$-symbol. Each such step introduces an error of $o(1)$ to the quantities $\frac{\log X_n}{\log^2 n}$ and $\frac{\log Y^{\alpha_1, \alpha_2}_n}{\log^2 n}$. That means that we can actually only conclude bounds up to $o(1)$ errors, which is too crude for the second moment argument to work. Secondly, the notion of a "set of vertices following a certain histogram" makes sense only in the stochastic block model, but not when we have a finite set of vertices in an uncountable probability space. Let us jump ahead and note that in the rigorous proof in Section 8 we are, in a sense, able to make use of histograms in the continuous setting. Namely, Lemma 3.4 allows us to discretize a given graphon in an appropriate sense, after which it does make sense to talk about histograms.
Let us remark that for the stochastic block model, the first issue (which is the only one in that case) can be dealt with by pedestrian calculations, thus yielding a routine proof of Theorem 2.5 for the special class of stochastic block models.

TOOLS FOR THE PROOF OF THEOREM 2.5
In this section we prepare tools for the lower bound in Theorem 2.5. In Section 7.1 we state and prove Lemma 7.2, which asserts that if $f^*$ is a histogram almost attaining the supremum in (7), then $(f, W)$ is almost positive for all sub-histograms $f \le f^*$. The need for this lemma was motivated in Section 6.2. In Section 7.2 we introduce a new graphon parameter $\xi(W)$. This parameter is motivated by controlling the second moment of the number of cliques of a given size. All the work in Section 7 steers towards the two main results of this section, Lemma 7.7 and Lemma 7.9. The former asserts that each graphon $W$ contains a subgraphon $U$ with $\xi(U) \approx \frac{1}{\kappa(W)}$. The latter asserts that $\omega(G(n, U)) \gtrsim \frac{1}{\xi(U)} \cdot \log n$. These two lemmas combine easily to give the proof of the lower bound in Theorem 2.5 (as is shown in Section 8).

Subhistograms of Optimal Histograms are Admissible
The main result of this section, Lemma 7.2, tells us that if $f^*$ is a histogram almost attaining the supremum in (7), then $(f, W)$ is almost positive for all sub-histograms $f \le f^*$. We showed that this particular case of the (in general false) Dream Lemma is needed for the second moment to work. The proof of Lemma 7.2 is technical, building on Lemma 7.1. It turns out that in those situations when the supremum in (7) is attained, Lemma 7.2 has a much shorter (but conceptually the same) proof. We offer this simplified statement in Lemma A.1 in the Appendix.

Lemma 7.1. Suppose that $W$ is an arbitrary graphon with $0 < \operatorname{ess\,inf} W \le \operatorname{ess\,sup} W < 1$. Then there is a constant $K > 0$ depending only on the graphon $W$ such that the following holds: Let $g$ be an arbitrary histogram admissible for $W$ and let $\delta \in (0, 1)$. Suppose that $a \in (0, 1)$ and that $g = g' + g''$ for some non-trivial histograms $g'$ and $g''$ such that $\|g''\|_1 < \|g\|_1 - \delta$. Then either $(g'') \ge -a$, or there exists a histogram $g^*$ which is admissible for $W$ and for which we have (39).

Proof. Since we shall work exclusively with the graphon $W$, we write $(\cdot)$ as a shortcut for $(\cdot, W)$. Let us write $m_- = \operatorname{ess\,inf} W$ and $m_+ = \operatorname{ess\,sup} W$. Let us fix numbers $a, \delta \in (0, 1)$ and a decomposition $g = g' + g''$ of $g$ into non-trivial histograms $g'$, $g''$ such that $\|g''\|_1 < \|g\|_1 - \delta$. For $\varepsilon_1 \in (0, 1)$ and $\varepsilon_2 > 0$, let us write $g^*(\varepsilon_1, \varepsilon_2) = (1 - \varepsilon_1) g' + (1 + \varepsilon_2) g''$. Let us also define the quantities $A$, $B$, $C$, $D$, $E$ as in (40), and note that $A, B, C, D, E > 0$. We have $(g'') = A - C$, and a corresponding expression for $(g)$. There is nothing to prove when $(g'') \ge -a$. Thus, assume otherwise. Upper-bounding $C$ by $\frac{1}{2} \|g''\|_1^2 \log(1/m_-)$ and using that $C > a$, we get a lower bound on $\|g''\|_1$. For each $\varepsilon_1 \in (0, 1)$ and $\varepsilon_2 > 0$, the difference $(g^*(\varepsilon_1, \varepsilon_2)) - (g)$ can be expressed as a sum of terms T1, T2, and T3 (where $\varepsilon_1 \in (0, 1)$ and $\beta > 0$ will be determined later). Let us expand the term T1, using that $2C > 2a$.
Now, set $\varepsilon_1 = \frac{a}{4} \min\bigl(\frac{1}{C}, \frac{B^2}{2 A^2 D}\bigr)$ and $\beta = \min\bigl(1, \frac{aB}{4AD}, \frac{aB}{2AE}\bigr)$. Routine calculations give that each of the terms T2 and T3 is smaller than $a$. Plugging the bounds above into (43) and chaining the resulting estimates yields (39). This finishes the proof.

The Graphon Parameter ξ(·)
In Section 6.2 we outlined why the second moment argument for counting cliques should go through. (Recall that the second moment argument is needed to prove the lower bound in Theorem 2.5, which is the more difficult half of the statement.) For the actual execution of this step, however, we need to introduce a new graphon parameter. This parameter is a version of the cut norm with an exotic scaling. Given an arbitrary graphon $W$ represented on a probability space $\Omega$, we define $\xi(W)$ by (48). The key feature of $\xi(W)$, which we prove in Lemma 7.9, is that the second moment argument for counting cliques of order almost $\frac{1}{\xi(W)} \log n$ works. More precisely, in the proof of Lemma 7.9 we set up a random variable $Y$ which essentially counts the number of individually-weighted cliques of the said order. This allows us to conclude that there must be at least one clique of such an order.
Of course, Lemma 7.9 itself is not enough to establish the lower bound in Theorem 2.5: we need to connect the new quantity $\xi(W)$ to the original quantity $\kappa(W)$. Given Lemma 7.9 as described above, we would hope that $\kappa(W) = \frac{1}{\xi(W)}$. Unfortunately, in general we only have $\kappa(W) \ge \frac{1}{\xi(W)}$; see Fact 7.3. Not all is lost, though. In Lemma 7.7 we prove that every graphon $W$ contains a subgraphon $U$ for which $\frac{1}{\xi(U)} > \kappa(W) - \varepsilon$ (here, $\varepsilon > 0$ is arbitrarily small). After picking a subgraphon $U$ for which $\frac{1}{\xi(U)} \approx \kappa(W)$, we continue with the proof of the lower bound in Theorem 2.5 as follows. We prove in Lemma 7.9 that asymptotically almost surely $\omega(G(n, U)) \gtrsim \frac{1}{\xi(U)} \log n$. As described in (10), this combination of Lemma 7.7 and Lemma 7.9 concludes the desired proof of the lower bound in Theorem 2.5. Now, let us state and prove the already advertised Fact 7.3. This fact will not be needed in our proof of Theorem 2.5; however, since it is so basic, we record it here. Proof. By considering the set $B = A$ in (48), the claimed inequality follows. In the rest of this section we prove Lemmas 7.9 and 7.7. The paths towards these lemmas are shown in Fig. 2.
For the next lemma, note that if $G$ is a finite graph, then the value of $\xi(W_G)$ does not depend on the particular representation $W_G$ of $G$. In the lemma we work with the graphon $W'$ defined pointwise by $W'(x, y) = \log(1/W(x, y))$ and with a weighted graph $G'$ with $V(G') = V(G)$ and weight function $w'(i, j) = \log(1/w(i, j))$. Then for an arbitrary $\gamma \in (0, 1]$ we have the stated bound. Thus, the term S1 can be bounded from above by $-\frac{\gamma}{2} \log(\operatorname{ess\,inf} W) \le \frac{\gamma}{2} \log\frac{1}{c}$, as needed. Suppose next that $\nu(A) > \gamma$. Suppose first that $\delta_\square(W, G') = 0$. Using the invertible measure preserving maps from (12), we know that for each $\varepsilon > 0$ there exists a graphon representation $W_{G'}$ of $G'$ on $\Omega$ such that $d_\square(W, W_{G'}) < \varepsilon \nu(A)$, which gives the required bound. Suppose next that $\delta_\square(W, G') > 0$. Using the invertible measure preserving maps from (12), we know that for each $\varepsilon > 0$ there exists a graphon representation $W_{G'}$ of $G'$ on $\Omega$ such that $d_\square(W, W_{G'}) < (1 + \varepsilon) \delta_\square(W, G')$. We shall fix such a representation $W_{G'}$ for $\varepsilon = \frac{\nu(A)}{\gamma} - 1$. Then (49) can be bounded from above as needed. Actually, the assertion of Lemma 7.5 is violated only with probability at most $\exp(-\frac{n^2}{\log n})$, as can be seen from the proof of Lemma 7.5. We shall not need this refinement, though. For the proof of Lemma 7.5 we shall need the following well-known fact, which we include here for the reader's convenience. Proof. Let us first bound the probability that one distinguished ball is placed into a bin which contains some other balls. This probability can be computed exactly, and the claim then follows by summing this error probability over all $m \le n^{1/3}$ balls.
Proof of Lemma 7.5. Let $\Omega$ be the probability space underlying $W$. Let $W' = \log(1/W)$ be the negative logarithm of $W$. Note that $W'$ is bounded from above by $\log(1/c)$. Sampling the random graph $G \sim H(n, W)$ can be naturally coupled with sampling a random graph $G' \sim H(n, W')$. So, for the first part of the argument, we shall analyze the graph $G'$. Suppose first that $n$ is fixed. Corollary 3.5 implies that the required cut-distance bound holds with probability at least $1 - \exp(-\frac{n^2}{\log n})$. We shall prove the statement for each weighted graph $G'$ satisfying this property (provided that $n$ is sufficiently large). That means that we assume that $G'$ is fixed, and $G$ is its exponentiated version. In particular, all the probabilistic calculations below are only with respect to later randomized steps. Let $\mathcal{K}$ be the family of all subsets $X$ of $V(G') = V(G)$ of size $k_n$ for which $\delta_\square(G', G'[X])$ exceeds the given threshold; the bound from the first part shows that the proportion of such sets is smaller than $\exp(-\frac{k_n^2}{\log k_n})$. Thus, we get (for $n$ sufficiently large) that $|\mathcal{K}| \le \varepsilon \binom{n}{k_n}$. So, the lemma will follow provided that we prove that every set $X \notin \mathcal{K}$ lies in $\mathcal{H}$, which we prove next.
Indeed, let $X \notin \mathcal{K}$ be an arbitrary vertex set of size $k_n$. Then $\delta_\square(G', G'[X])$ is small. We take $\gamma = 1/\sqrt[4]{\log k_n}$, and see that the right-hand side is, for large enough $n$, smaller than $\varepsilon$. This proves that $X \in \mathcal{H}$ and consequently concludes the lemma. Our next two lemmas are crucial in proving the lower bound in Theorem 2.5. The first lemma, Lemma 7.7, tells us that in every graphon $W$ there exists a subgraphon $U$ of $W$ for which we have $\frac{1}{\xi(U)} \approx \kappa(W)$. The second lemma, Lemma 7.9, then tells us that in $G(n, U)$ we can typically find cliques of order almost $\frac{1}{\xi(U)} \log n$.
Proof. Let us write $m = \log(1/\operatorname{ess\,inf} W)$. Consider the number $\gamma_0 > 0$ and the function $q : (0, \gamma_0) \to \mathbb{R}^+$ given by Lemma 7.2 for the graphon $W$. Let $\delta > 0$ be fixed such that $\delta \kappa(W) < \gamma_0$. We use (16) to find a set $A$ of positive measure satisfying (51). Consider now the subgraphon $U = W_{A \times A}$ on the probability space $A$ endowed with the measure $\nu_A$. We now turn to obtaining the bound on $\xi(U)$. To this end, we want to control each term in (48). Claim 7.7.1. Suppose that $B \subseteq A$ is an arbitrary set of positive measure. Then the bound (52) holds: rewriting the relevant term, it is at most $\frac{2}{\kappa(W)}$ plus an error term, as required.
The error term $\frac{q(\delta \kappa(W))}{(1 - \delta) \kappa(W)}$ in (52) does not depend on the choice of the set $A$. Thus, it tends to zero as we let $\delta \searrow 0$. We conclude that for $\delta > 0$ sufficiently small, if we select $A$ as in (51), the bound of Claim 7.7.1 holds for each $B \subseteq A$ of positive measure. By (48), this yields an upper bound on $\xi(U)$. If $\delta > 0$ is sufficiently small, then the right-hand side (which tends to $\frac{1}{\kappa(W)}$ as $\delta \searrow 0$) is smaller than $\frac{1}{\kappa(W) - \varepsilon}$, and so $\frac{1}{\xi(U)} \ge \kappa(W) - \varepsilon$, as was needed.
We shall need the following observation.
Let us split the integration above according to the partition $(D_i \times D_j)_{i, j \in [n]}$. We can neglect the terms for which $i = j$, since then the integrand is $\log(1/1) = 0$. So, suppose that $i < j$. Then for each $x \in D_i$, $y \in D_j$ we have $W_G(x, y) = w(i, j)$, and thus in this case the integrand equals $\log(1/w(i, j))$. We conclude that the stated identity holds; the lemma follows after exponentiation. Lemma 7.9. Suppose that $W$ is a graphon with $\operatorname{ess\,inf} W > 0$. Suppose that $\alpha < 1/\xi(W)$. Then asymptotically almost surely, $G(n, W)$ contains a clique of order $\alpha \log n$.
Proof. Choose $\delta > 0$ such that $\alpha(\xi(W) + \delta) < 1$. Let $H \sim H(n, W)$. Set $k := \alpha \log n$. Let $\mathcal{A}$ be the family of all sets $A \subseteq V(H)$ of size $k$ for which $|\xi(W) - \xi(W_{H[A]})| < \delta$. Lemma 7.5 tells us that with high probability, the graph $H$ has the property that $|\mathcal{A}| \ge (1 - \delta) \binom{n}{k}$. Condition on this event, and fix a realization of the weighted graph $H$ with a weight function $w : \binom{V(H)}{2} \to [0, 1]$ having the above property. We shall now obtain from $H$ an unweighted graph $G$ by including each edge $ij$ with probability $w(i, j)$. It is our task to show that with high probability, $G$ contains a clique of order $k$. (Recall that this probability is only with respect to obtaining $G$ from $H$.) For each $A \in \mathcal{A}$, let $X_A$ be the indicator of the event that $G[A]$ is a clique, and define $Y_A = X_A / \prod_{\{i, j\} \in \binom{A}{2}} w(i, j)$ (note that the denominator is not zero because $\operatorname{ess\,inf} W > 0$). Let $Y = \sum_{A \in \mathcal{A}} Y_A$. To conclude the proof, we want to prove that for each $\varepsilon > 0$ (which we now consider fixed), we have $Y > 0$ with probability at least $1 - \varepsilon$, provided that $n$ is sufficiently large.
We have $\mathbb{E}[Y_A] = 1$ for each $A \in \mathcal{A}$, and consequently $\mathbb{E}[Y] = |\mathcal{A}|$, which tends to infinity as $n \to +\infty$. Below we shall prove that $\mathbb{E}[Y^2] \le (1 + \varepsilon) \mathbb{E}[Y]^2$, which will establish (59) via the usual second-moment argument. Recall that $\mathbb{E}[Y] = |\mathcal{A}| \ge (1 - \delta) \binom{n}{k}$. Expanding $\mathbb{E}[Y^2]$ and grouping the pairs $A, A' \in \mathcal{A}$ according to their overlap, the resulting expression is dominated by a geometric series whose quotient tends to $0$ as $n \to \infty$; therefore the sum of the series tends to $1$. Thus for sufficiently large $n$ and for sufficiently small $\delta > 0$ we get $\mathbb{E}[Y^2] \le (1 + \varepsilon) \mathbb{E}[Y]^2$. Therefore, (59) follows from Chebyshev's Inequality.
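The final step, spelled out (a routine Chebyshev computation; only the bound $\mathbb{E}[Y^2] \le (1 + \varepsilon)\,\mathbb{E}[Y]^2$ from above is used):

```latex
\mathbb{P}(Y = 0)
  \;\le\; \mathbb{P}\bigl(|Y - \mathbb{E}[Y]| \ge \mathbb{E}[Y]\bigr)
  \;\le\; \frac{\operatorname{Var}(Y)}{\mathbb{E}[Y]^2}
  \;=\; \frac{\mathbb{E}[Y^2] - \mathbb{E}[Y]^2}{\mathbb{E}[Y]^2}
  \;\le\; (1 + \varepsilon) - 1
  \;=\; \varepsilon .
```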

PROOF OF THEOREM 2.5
Let $c = \operatorname{ess\,inf} W$. Suppose that $W$ is represented on the unit interval $I = (0, 1)$ equipped with the Lebesgue measure $\lambda$. Let us replace the value of $W$ at every point $(x, y) \in (0, 1)^2$ that is not a point of approximate continuity by $c$. This is a change on a set of measure zero by Fact 3.7. In particular, $\kappa(W)$ does not change, and neither does the distribution of the model $G(n, W)$.

Upper Bound
Let $\varepsilon \in (0, \kappa(W)/4)$ be arbitrary and let $n$ be sufficiently large. We want to show that a.a.s. $G \sim G(n, W)$ contains no clique of order $k = (\kappa(W) + \varepsilon) \log n$. Let $X_n(G)$ count such cliques, and expand $\mathbb{E}[X_n(G)]$ as a sum over the sample points $(x_1, x_2, \ldots, x_n) \in I^n$ and over the $k$-element vertex sets $A$. This summation has $\binom{n}{k} < n^k = \exp\bigl((\kappa(W) + \varepsilon) \log^2 n\bigr)$ terms. By (15), each of these terms is bounded by $P_k^{\binom{k}{2}}$, where $\lim_{k \to \infty} P_k = \exp\bigl(-\frac{2}{\kappa(W)}\bigr)$. So if $n$ is sufficiently large, then multiplying these bounds gives $\mathbb{E}[X_n(G)] \to 0$ as $n$ goes to infinity. Markov's Inequality concludes the proof.
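For completeness, here is one way the two displayed bounds combine (a reconstruction consistent with the stated limit for $P_k$, not necessarily the paper's exact display): with $k = (\kappa(W) + \varepsilon) \log n$,

```latex
\log \mathbb{E}[X_n(G)]
  \;\le\; k \log n + \binom{k}{2} \log P_k
  \;=\; (1 + o(1))\, \log^2 n \left[ (\kappa(W) + \varepsilon)
        - \frac{(\kappa(W) + \varepsilon)^2}{\kappa(W)} \right]
  \;=\; -(1 + o(1))\, \frac{\varepsilon\, (\kappa(W) + \varepsilon)}{\kappa(W)}\, \log^2 n
  \;\longrightarrow\; -\infty .
```

The bracketed term is negative for every $\varepsilon > 0$, which is exactly why the threshold $\kappa(W) \log n$ cannot be exceeded.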

Lower Bound
We shall assume that $\operatorname{ess\,sup} W < 1$. Let us justify this step. Suppose that $W$ is an arbitrary graphon. We can then take a sequence of graphons $W_1, W_2, \ldots$, where $W_j = \min(W, 1 - \frac{1}{j})$ (pointwise). Then (16) tells us that $\kappa(W_j) \to \kappa(W)$ (even in the case $\kappa(W) = +\infty$). Thus, it suffices to prove the lower bound for each of the graphons $W_j$.
Let $\varepsilon > 0$ be arbitrary. We apply Lemma 7.7 to find a set $A \subseteq \Omega$ of positive measure such that for the subgraphon $U = W_{A \times A}$ we have $\frac{1}{\xi(U)} \ge \kappa(W) - \varepsilon$. Lemma 7.9 then tells us that asymptotically almost surely, $\omega(G(n, U)) \ge (\kappa(W) - 2\varepsilon) \log n$. Since there is a coupling of $G = G(n, W)$ and $G' = G(\frac{\lambda(A) n}{2}, U)$ such that $G$ asymptotically almost surely contains a copy of $G'$, we obtain (cf. (10)) that $\omega(G(n, W)) \ge (\kappa(W) - 3\varepsilon) \log n$ asymptotically almost surely.
Since ε > 0 was arbitrary, this completes the proof of Theorem 2.5.

CONCLUDING REMARKS
Our concluding remarks concern possibilities of extending the main result, Theorem 2.5.

Sharpening the Results
As mentioned in Section 1, Matula, and Grimmett and McDiarmid, proved for $p \in (0, 1)$ an asymptotic concentration of $\omega(G(n, p))$ on two consecutive values, for which they provided an explicit formula. It is possible that when, say, $0 < \operatorname{ess\,inf} W \le \operatorname{ess\,sup} W < 1$, then $\omega(G(n, W))$ is asymptotically concentrated on two consecutive values as well.

Sparse Inhomogeneous Random Graphs
Let us look at our set of problems for $G(n, p_n \cdot W)$, where $p_n \to 0$, i.e., the model introduced in Section 1.1. Note that Remark 2.7 is no longer valid: the problems of maximum clique and maximum independent set in $G(n, p_n \cdot W)$ are genuinely different. It turns out that the more interesting problem is that of the independent set. For the Erdős–Rényi random graph $G(n, p_n)$, the problem of determining the independence number is essentially solved by the above mentioned work [11, 21], and by the work of Frieze [10] down to the range $p_n \gg \frac{1}{n}$. Note that the regime $p_n \ll \frac{1}{\sqrt{n}}$ is more subtle, as the second moment argument does not work; indeed, Frieze's contribution was in establishing concentration of the count of large independent sets by alternative means. The regime $p_n = C/n$ seems to require further methods.

APPENDIX

Proof of Claim A.1.1. For $\varepsilon_1, \varepsilon_2 \in (0, 1)$, let us write $g^*(\varepsilon_1, \varepsilon_2) = (1 - \varepsilon_1) g' + (1 + \varepsilon_2) g''$. We define numbers $A$, $B$, $C$, $D$, and $E$ as in (40). Note that $A, B, C, D, E \ge 0$. For any $\varepsilon_1, \varepsilon_2 \in (0, 1)$, the difference $(g^*(\varepsilon_1, \varepsilon_2)) - (g)$ can be expressed as in (A1). Now let us assume that $(g'') < 0$, i.e., $A < C$. Then we have (A2). By (A2), the right-hand side of (A1) is a quadratic expression (in the variable $\varepsilon_1$) with a positive linear coefficient. Therefore there is $\varepsilon_1 > 0$ (which we fix now) small enough such that $\varepsilon_1, \frac{A}{B} \varepsilon_1 \in (0, 1)$ and the expression is positive. Since the function $(\varepsilon_1, \varepsilon_2) \mapsto (g^*(\varepsilon_1, \varepsilon_2))$ is obviously continuous, we can find $\varepsilon_2 \in \bigl(\frac{A}{B} \varepsilon_1, 1\bigr)$ such that $(g^*(\varepsilon_1, \varepsilon_2))$ is still nonnegative, which gives the desired conclusion. This finishes the proof.