The impact of heterogeneity and geometry on the proof complexity of random satisfiability

Satisfiability is considered the canonical NP-complete problem and is used as a starting point for hardness reductions in theory, while in practice heuristic SAT-solving algorithms can solve large-scale industrial SAT instances very efficiently. This disparity between theory and practice is believed to be a result of inherent properties of industrial SAT instances that make them tractable. Two characteristic properties seem to be prevalent in the majority of real-world SAT instances: heterogeneous degree distribution and locality. To understand the impact of these two properties on SAT, we study the proof complexity of random k-SAT models that allow us to control heterogeneity and locality. Our findings show that heterogeneity alone does not make SAT easy, as heterogeneous random k-SAT instances have superpolynomial resolution size. This implies intractability of these instances for modern SAT solvers. In contrast, modeling locality with an underlying geometry leads to small unsatisfiable subformulas, which can be found within polynomial time.


Introduction
Propositional satisfiability (SAT) is arguably among the most-studied problems for both theoretical and practical research. Nonetheless, the gap between theory and practice is huge. In theory, SAT is the prototypical hard problem and hardness of other problems is shown via reductions from SAT. Achieving even a running time of O(2^(cn)) for any c < 1 and n variables would be a major breakthrough and a somewhat surprising one at that. On the contrary, reductions to SAT are used to solve various problems appearing in practice, as state-of-the-art SAT solvers can easily handle industrial instances with millions of variables.
This theory-practice gap does not come from the lack of a sufficiently precise theoretical analysis of modern SAT solvers. They are actually provably slow on most instances, i.e., drawing an instance uniformly at random yields a hard instance with probability tending to 1 for n → ∞, if the clause-variable ratio is not too low or way too high [9,21]. Instead, the discrepancy comes from the fact that instances drawn uniformly at random do not resemble the instances appearing in practice. We focus on unsatisfiable instances, i.e., on the case where a solver has to prove that no satisfying assignment exists. This is typically much harder than finding a satisfying assignment, making the unsatisfiable regime arguably more relevant. Besides these results on SAT, we provide insights on the complexity of weighted higher-order Voronoi diagrams in higher dimensions, which is of independent interest.
The power-law and geometric models both mimic specific properties observed in industrial instances while trying to make as few additional assumptions as possible. Though this makes the resulting instances arguably more realistic than, e.g., instances drawn uniformly at random, we want to stress that even the geometric model is far from a perfect representation of industrial instances. Thus, our results do not claim to completely explain the efficiency of modern SAT solvers on industrial instances. However, to the best of our knowledge, we provide the first theoretical result that links a high level of locality to provably more tractable instances, which we believe to be a first step towards closing the theory-practice gap.

Outline
We state and discuss our main results and technical contributions in Section 2. Formal definitions are in Section 3. A short outline of our core arguments is in Section 4, followed by the formal proofs: lower bounds for the power-law model in Section 5, upper bounds on the complexity of Voronoi diagrams in Section 6, and upper bounds for the geometric SAT model in Section 7.
To not distract from the core arguments, results that were either known before or are straightforward to prove are deferred to Appendix A.

Results, Technical Contribution, Discussion
In this section, we state our results and discuss our contribution, also in the context of previous results. To make the results understandable, we briefly describe, e.g., the probability distributions over SAT formulas we study. These descriptions are short and not meant to be formal definitions. For complete definitions, see Section 3.

Power-Law SAT
The power-law SAT model has four parameters: the number of variables n, the number of clauses m, the number k of variables appearing in each clause, and a power-law exponent β. To draw a formula, power-law weights with exponent β are assigned to the variables and then each clause is generated independently by drawing k variables without repetition using probabilities proportional to the weights. Each literal is negated with probability 1/2.
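As an illustration, the clause-sampling step described above can be sketched in Python. This is a minimal sketch, not the paper's reference implementation; the function names and the plain weighted sampling without replacement are our own choices.

```python
import random

def draw_clause(weights, k, rng=random):
    """Draw one clause: k distinct variables chosen with probability
    proportional to their weights, each negated with probability 1/2."""
    available = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        total = sum(weights[i] for i in available)
        r = rng.uniform(0, total)
        acc = 0.0
        for pos, i in enumerate(available):
            acc += weights[i]
            if acc >= r:
                chosen.append(available.pop(pos))
                break
        else:  # guard against floating-point rounding at the boundary
            chosen.append(available.pop())
    # Represent a literal as (variable index, sign); sign -1 means negated.
    return [(v, rng.choice((1, -1))) for v in chosen]

def draw_formula(weights, m, k, rng=random):
    """Draw m clauses independently."""
    return [draw_clause(weights, k, rng) for _ in range(m)]
```

With uniform weights this degenerates to the classical uniform random k-SAT model; heterogeneous weights bias clauses towards heavy variables.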
To discuss our first main contribution, let Φ be a formula drawn from the power-law model with density at or above the satisfiability threshold, i.e., Φ is unsatisfiable with at least constant probability. We show that, although it is likely that Φ is unsatisfiable, it is highly unlikely that modern SAT solvers can figure that out in polynomial time. We prove this using resolution proof complexity.
Resolution is a refutation technique for propositional and first-order logic introduced by [24]. If an application of resolution steps leads to a contradiction, the formula is unsatisfiable. The sequence of resolved clauses then serves as a proof of unsatisfiability, also called a refutation of the formula. The resolution proof system exhibits a strong connection to modern Davis-Putnam-Logemann-Loveland (DPLL) and conflict-driven clause learning (CDCL) SAT solvers: DPLL is polynomially equivalent to tree-like resolution [58] and CDCL with unlimited restarts is polynomially equivalent to resolution [7,54]. Thus, the minimum number of steps necessary to derive a contradiction also yields a lower bound on the running time of solvers simulating the same process. This number of steps is also called the resolution size of a formula, i.e., the minimum number of resolution steps necessary to arrive at a contradiction. Analogously, the width of a resolution proof is the size of the largest clause appearing in the proof and the resolution width of a formula is the smallest width of any proof refuting that formula. Interestingly, a lower bound w on the resolution width of a formula also implies a lower bound on its resolution size [9]: every resolution proof of a formula in k-CNF has size exp(Ω((w − k)^2/n)) and every tree-like resolution proof has size 2^(w−k).
We will show a lower bound for the resolution width of unsatisfiable formulas drawn from the power-law model. Our results translate to lower bounds on the resolution size and thus to matching lower bounds on the running time of conflict-driven clause learning (CDCL) solvers. For DPLL solvers, which use tree-like resolution, the bounds are even stronger. We only consider the resolution width of unsatisfiable instances. Thus, the probability bound we get is actually a conditional probability conditioned on instances being unsatisfiable. Note that our bound does not only hold above the satisfiability threshold, where a random formula Φ is a. a. s. unsatisfiable, but also at the threshold, where it is unsatisfiable with constant probability.
(iv) If β > (2k−2)/(k−2) and ∆ ∈ o(n^(ε_3) / log^(ε_3) n), then w ∈ Ω(n · ∆^(−1/ε_3)).
The above lower bounds allow the density ∆ to be super-constant (even polynomial), which is asymptotically above the satisfiability threshold. For the sake of simplicity, assume ∆ to be constant in the following. Starting at the bottom (iii, iv), we get a linear bound for w if β is sufficiently large, i.e., greater than 3 or (2k − 2)/(k − 2). For β = 3 (ii), the bound is still almost linear. Note that these results in particular imply exponential lower bounds on the resolution size and thus on the running time of CDCL and DPLL. For smaller β (i), we get a polynomial bound for the width with exponent ε_2/ε_1; see Figure 1 for a plot with ε close to 0.
Interestingly enough, our bounds only hold for power-law exponents β > (2k−1)/(k−1). This is complemented by a previous result [34], which shows that the satisfiability threshold of power-law random k-SAT is at density ∆ = Θ(1) for power-law exponents β > (2k−1)/(k−1) and that asymptotically almost surely instances with constant constraint densities are trivially unsatisfiable for power-law exponents β < (2k−1)/(k−1). Thus, the resolution width is constant in the latter case. Part (iv) of Theorem 5.8 is derived via lower bounds on the bipartite expansion of the clause-variable incidence graph of these instances. These results can be of independent interest for hypergraphs with edge size k and for random (0, 1)-matrices. Additionally, these expansion properties yield lower bounds for the clause space complexity, which in turn gives lower bounds on the tree-like resolution size of such formulas (Section 5.2). More precisely, this results in an exponential lower bound on the tree-like resolution size for β > (2k−3)/(k−2). This is an improvement over the bound obtained via resolution width.
It is interesting to note that this result on the non-geometric model supports the claim that locality is a crucial factor for easy SAT instances. The lower bounds for the power-law model are solely based on the fact that every set of clauses covers a comparatively large set of variables. In other words, we only use that there are no clusters of clauses with similar variables, i.e., we explicitly use the lack of locality.

Geometric SAT
The geometric model has the following parameters: n, m, and k have the same meaning as for the power-law model. Moreover, w is a weight function assigning each variable v a weight w_v and T is the so-called temperature that controls the strength of locality by varying the impact of the geometry. As underlying geometric space, we use the d-dimensional torus T^d = R^d/Z^d (see Section 3) equipped with a p-norm with p ∈ N⁺ ∪ {∞}. To draw a formula, the variables and clauses are assigned random positions in T^d. Then, for each clause, k variables are drawn without repetition with probabilities depending on the variable weight and on the geometric distance between clause and variable. In the extreme case of T = 0, each clause deterministically includes the k closest variables (where closeness is a combination of geometric distance and weight), while increasing the temperature T increases the probability for the inclusion of more distant variables. For T → ∞, the model converges to uniform random SAT. Note that the weights are a parameter of the model and not drawn randomly. We have the following theorem, where W denotes the sum of all variable weights. The condition on the weights is in particular satisfied by power-law distributed weights.
Theorem 7.12. Let Φ be a formula with n variables and m ∈ Θ(n) clauses drawn from the weighted geometric model with ground space T^d equipped with a p-norm, temperature T < 1, W ∈ O(n), and w_v ∈ O(n^(1−ε)) for every v ∈ V and any constant ε > 0. Then, Φ a. a. s. contains an unsatisfiable subformula of constant size, which can be found in O(n log n) time.
To briefly explain how we prove this, consider a simplified version where variables and clauses are points in the Euclidean plane and each clause contains the k variables geometrically closest to it (temperature T = 0). Now consider the equivalence relation obtained by defining two points of the plane equivalent if and only if they have the same set of k closest variables. The equivalence classes of this relation are the regions of the order-k Voronoi diagram of the variable positions. With this connection, we can use upper bounds on the complexity of order-k Voronoi diagrams [46] to prove the existence of small and easy-to-find unsatisfiable subformulas. We note that this result is of an asymptotic nature. In particular for small densities, the number of variables n has to be very large before the instances actually become as easy as stated in Theorem 7.12. Nevertheless, this result strongly suggests that an underlying geometry makes SAT instances more tractable.
To extend the above argument to the general statement in Theorem 7.12, we extend the complexity bounds for order-k Voronoi diagrams in various ways; see next section for more details. Moreover, for non-zero temperatures, clauses no longer include exactly the k closest variables but can, in principle, consist of any set of k variables. However, we can show that, with high probability, a linear fraction of clauses behaves as in the T = 0 case. We note that analyses of similar structures, such as hyperbolic random graphs, are often restricted to the simpler but less realistic T = 0 case, e.g., [11,12,14,48]. We believe that our analysis provides insights on the non-zero temperature case that can be helpful for such related questions.
We note that our results seem to contradict the results of Mull et al. [49], stating that (i) a strong community structure is not sufficient to have tractable SAT instances and that (ii) the community attachment model [38], which enforces a community structure, generates hard instances. However, at a closer look, this is not a contradiction at all. Though measuring the community structure, e.g., via modularity, is a good indicator for locality, the concept of locality goes deeper. If the instance can be partitioned such that there are strong ties within each partition and loose ties between partitions, then the instance has a strong community structure. However, to have a high level of locality, this concept has to repeat hierarchically on different levels of magnitude, i.e., there needs to be community structure within each partition and between the partitions. To state this slightly differently, consider locality based on a notion of similarity between objects (here: variables or clauses). In this paper, we use distances between random points in a geometric space as a measure for similarity, which gives us a continuous range of more or less similar objects. In contrast to that, in the above-mentioned papers focusing on a flat community structure [38,49], similarity is a binary equivalence relation: two objects are either similar or they are not.

Voronoi Diagrams
Consider a finite set of points, called sites, in a geometric space. The most commonly studied type of Voronoi diagram assumes the 2-dimensional Euclidean plane as ground space and has one Voronoi region for each site, containing all points closer to this site than to any other site. We deviate from this default setting in four ways: (i) We allow an arbitrary constant dimension d, where the ground space is the torus or a hypercube in R^d. (ii) We consider the order-k Voronoi diagram, which has for every subset A of sites with |A| = k a (possibly empty) Voronoi region containing all points for which A are the k nearest sites. The number of non-empty order-k Voronoi regions is called the complexity of the diagram. (iii) The sites have multiplicative weights that scale the influence of the different sites. Without loss of generality, we assume the weights to be scaled such that the minimum is 1. (iv) We allow the p-norm for arbitrary p ∈ N⁺ ∪ {∞}.

Theorem 6.9. Let S be a set of n sites with minimum weight 1, total weight W, and random positions on the d-dimensional torus equipped with a p-norm, for constant d. For every fixed k, the expected number of regions of the weighted order-k Voronoi diagram of S is in O(W). The same holds for random sites in a hypercube.
To set this result into context, we briefly discuss previous work on the complexity of Voronoi diagrams in different settings. See the book by Aurenhammer et al. [6] for a general overview of Voronoi diagrams. To this end, we use the following theorem, which relates the complexity in terms of Voronoi regions (which is what we are concerned with in this paper) to the complexity in terms of vertices.

Theorem 6.2. Let S be a set of n weighted sites in general position in R^d equipped with a p-norm. If the order-k Voronoi diagram has ℓ vertices, then the order-(k + d) Voronoi diagram has Ω(ℓ) non-empty regions.
We note that, using insights from previous work, this theorem is not hard to prove. One basically has to generalize the result by Lê [45], bounding the number of d-spheres going through d + 1 points in d-dimensional space, to weighted sites, and then observe how the Voronoi diagram changes in the construction by Lee [46] for d = 2 when going from order k to order (k + 1). However, we are not aware of previous work stating this connection between vertices and non-empty regions in higher orders explicitly.

Figure 2: (a) Weighted Voronoi diagram (order-1) of the colored sites. Continuing the construction with n/2 high-weight sites on the left and n/2 low-weight sites towards the right yields Θ(n^2) vertices (small black dots). Note that each vertex lies on the boundary of three regions and thus has equal weighted distance to its three closest sites. (b) The order-3 Voronoi diagram for the same sites (excluding one). The colored boxes indicate the three closest sites. The order-1 diagram is shown in the background. Each order-1 vertex lies in the interior of an order-3 region as it has equal weighted distance to its three closest sites. As at most two order-1 vertices share an order-3 region, we get Ω(n^2) order-3 regions. Theorem 6.2 generalizes this observation.
The four above-mentioned generalizations of the basic Voronoi diagram (higher dimension, higher order, multiplicative weights, and different p-norms) have all been considered before. However, to the best of our knowledge, not all of them together.
Higher-order Voronoi diagrams were introduced by Shamos and Hoey [57]. Lee [46] showed that the order-k Voronoi diagram in the plane (unweighted, with the Euclidean metric) has complexity O(k(n − k)) (in terms of the number of regions), which is linear for constant k. For the 1- and ∞-norms, Liu et al. [47] improved this bound to O(min{k(n − k), (n − k)^2}). Closely related to the 1-norm, Gemsa et al. [37] showed similar complexity bounds for higher-order Voronoi diagrams on transportation networks of axis-parallel line segments. Bohler et al. [15] show an upper bound of 2k(n − k) for the much more general setting of abstract Voronoi diagrams. There, the metric is replaced by curves separating pairs of sites such that certain natural (but rather technical) conditions are satisfied. One obtains normal Voronoi diagrams when using perpendicular bisectors for these curves. This in particular shows that the 2k(n − k) bound on the number of regions in the order-k Voronoi diagram holds for arbitrary p-norms in 2-dimensional space and for the hyperbolic plane. As the hyperbolic plane is closely related to 1-dimensional space with sites having multiplicative power-law weights [18], we suspect that the bound by Bohler et al. [15] also covers this case.
In general one can say that higher-order Voronoi diagrams of unweighted sites in 2-dimensional space are well-behaved in that they have linear complexity. This still holds true for arbitrary p-norms. However, this picture changes for weighted sites or higher dimensions.
Voronoi diagrams with multiplicative weights were first considered by Boots [17] due to applications in economics. Beyond that, multiplicatively weighted Voronoi diagrams have applications in sensor networks [23], logistics [36], and the growth of crystals [25]. However, even in the most basic setting of 2-dimensional Euclidean space and order 1, weighted Voronoi diagrams can have quadratic complexity [5] (in terms of the number of vertices). This comes from the fact that Voronoi cells are not necessarily connected; see Figure 2a for the construction of Aurenhammer and Edelsbrunner [5] that proves the lower bound. With Theorem 6.2, and as illustrated in Figure 2, this implies that even the order-3 Voronoi diagram of weighted sites in 2-dimensional Euclidean space has a quadratic number of non-empty regions. As a special case, Theorem 6.9 shows that this complexity is only linear in the total weight for sites positioned randomly in the unit square. Moreover, this also implies that the number of vertices of the corresponding order-1 Voronoi diagram is linear. This nicely complements the result by Har-Peled and Raichel [42], who show that the expected complexity of order-1 Voronoi diagrams of sites in 2-dimensional Euclidean space with random weights is O(n polylog n). Only recently, Fan and Raichel [32] showed that sites with weights chosen randomly from a constant-sized set of possible weights yield Voronoi diagrams with linear complexity. Moreover, more closely related, they show that the Voronoi diagram of sites with arbitrary weights and with random positions chosen in the unit square has linear complexity in expectation. We are not aware of any results concerning the complexity of Voronoi diagrams when combining multiplicative weights with higher dimension, higher order, or other norms.
For higher dimensions, even normal (first-order, unweighted) Voronoi diagrams in 3-dimensional Euclidean space can have Θ(n^2) vertices [43,56]. Theorem 6.2 thus implies that the order-4 Voronoi diagram has a quadratic number of non-empty regions. Moreover, the complexity of higher-order Voronoi diagrams in higher dimensions has been considered before by Mulmuley [50], who obtains polynomial bounds with the degree of the polynomial depending on the dimension. Our Theorem 6.9 in particular shows that this complexity is much lower, namely linear, for the hypercube with randomly positioned sites. Moreover, via Theorem 6.2, this gives a linear bound on the number of vertices of the normal order-1 Voronoi diagram in higher dimensions. We note that this special case of our result coincides with a previous result by Bienkowski et al. [10]. Similarly, Dwyer [28] showed that sites drawn uniformly from a higher-dimensional unit sphere (instead of a hypercube) yield Voronoi diagrams of linear complexity in expectation. Moreover, due to Golin and Na [41] and Driemel et al. [26], the same is true for random sites on 3-dimensional polytopes and random sites on polyhedral terrains, respectively. Thus, though higher-dimensional Voronoi diagrams can be rather complex in the worst case, these results indicate that one can expect most instances to be rather well behaved. An alternative explanation of why the complexity of practical instances is lower than the worst case indicates is given by Erickson [29,30], who studies the complexity of 3-dimensional Voronoi diagrams depending on the so-called spread of the sites.
The above results for higher-dimensional Voronoi diagrams consider the Euclidean norm. For general p-norms, Lê [45] showed that the complexity of the Voronoi diagram is bounded by O(n^c), where c is a constant independent of p but dependent on the dimension d. With the same argument as above, Theorem 6.9 together with Theorem 6.2 implies a linear bound for this complexity that holds in expectation. Moreover, Boissonnat et al. [16] show more precise bounds of Θ(n^(d/2)) and Θ(n^2) for the ∞- and the 1-norm, respectively. Again, our result implies linear bounds for random sites in this setting.

Formal Definitions
Here we provide formal definitions for all concepts we use throughout the paper, including the power-law and geometric random SAT models, Resolution, and Voronoi diagrams.

k-SAT
We let x_1, x_2, . . . , x_n denote Boolean variables that can be either true or false. A clause is a disjunction of literals ℓ_1 ∨ . . . ∨ ℓ_k, where each literal is a possibly negated variable. For a literal ℓ_i, let |ℓ_i| denote the variable of the literal. A formula Φ in conjunctive normal form (CNF) is a conjunction of clauses c_1 ∧ . . . ∧ c_m, and a formula in k-CNF is a conjunction of clauses where each clause contains exactly k distinct literals. We conveniently interpret a Boolean formula in CNF as a set of clauses and a clause c both as a Boolean formula and as a set of literals. We say that Φ is satisfiable if there exists an assignment of the variables x_1, . . . , x_n such that the formula evaluates to true.
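To make the definitions concrete, here is a minimal sketch in Python: clauses as sets of DIMACS-style integer literals (+i for x_i, −i for its negation) and a brute-force satisfiability check. The representation is our own choice for illustration, not one used by the paper.

```python
from itertools import product

def satisfiable(formula, n):
    """Check all 2^n assignments; formula is a list of frozensets of
    nonzero ints, where literal +i (resp. -i) means x_i (resp. not x_i)."""
    for bits in product((False, True), repeat=n):
        # bits[i-1] is the truth value of variable i (1-indexed)
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
               for clause in formula):
            return True
    return False

# (x1 or x2) and (not x1 or x2) and (not x2) is unsatisfiable:
phi = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
```

For example, `satisfiable(phi, 2)` is False, while dropping the last clause makes the formula satisfiable.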

Power-Law Random k-SAT
The power-law model can be defined via the more general non-uniform model. To draw a k-SAT formula from the non-uniform model, let n and m be the number of variables and clauses, respectively, and let w 1 , . . . , w n be variable weights. We sample m clauses independently at random. Each clause is sampled by drawing k variables without repetition with probabilities proportional to their weights. Then each of the k variables is negated independently at random with probability 1/2.
The power-law model for a power-law exponent β > 2 is an instantiation of the non-uniform model with discrete power-law weights
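The explicit weight formula is elided above. A common choice for discrete power-law weights with exponent β, used in related power-law models and assumed here purely for illustration, is w_i = (n/i)^(1/(β−1)):

```python
def power_law_weights(n, beta):
    """Discrete power-law weights w_i = (n/i)^(1/(beta-1)).
    This particular normalization is an assumption for illustration,
    not taken from the paper."""
    assert beta > 2
    return [(n / i) ** (1.0 / (beta - 1)) for i in range(1, n + 1)]
```

With this choice the weights are non-increasing, the smallest weight is 1, and larger β yields a more homogeneous weight sequence.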

Resolution
The resolution proof system uses two rules, the resolution rule and the weakening rule. Given two clauses a ∨ x and b ∨ ¬x, where a and b are clauses and x is a Boolean variable, the resolution rule derives a ∨ b, i.e., the clause a ∨ b is a logical consequence of the two given clauses. The weakening rule derives a ∨ b from a for any two clauses a and b, i.e., if a holds, then a ∨ b holds as well. For a formula Φ = {c_1, c_2, . . . , c_m} in CNF, a resolution derivation of a clause c from Φ is a sequence of clauses (d_1, d_2, . . . , c) such that each clause d_i is either one of the initial clauses c_1, . . . , c_m or derived from previous clauses with either the resolution rule or the weakening rule. A resolution refutation is a resolution derivation of the empty clause. The size of a derivation is the number of clauses it contains. The size of a formula in CNF is the size of a smallest refutation of it. The width of a derivation is the size of the largest clause in it. The width of a formula in CNF is the smallest width of any refutation of it.
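Both rules are easy to state on the set representation of clauses. A minimal sketch with our own helper names, using nonzero integer literals (+v for a variable, −v for its negation):

```python
def resolve(c1, c2, x):
    """Resolution rule: from a∨x and b∨¬x derive a∨b.
    Clauses are frozensets of nonzero integer literals."""
    assert x in c1 and -x in c2
    return (c1 - {x}) | (c2 - {-x})

def weaken(clause, extra):
    """Weakening rule: from a derive a∨b."""
    return frozenset(clause) | frozenset(extra)

# A two-step refutation of {x1∨x2, ¬x1∨x2, ¬x2}:
step1 = resolve(frozenset({1, 2}), frozenset({-1, 2}), 1)  # yields {x2}
step2 = resolve(step1, frozenset({-2}), 2)                 # empty clause
```

Here `step2` is the empty clause, so the derivation is a refutation; counting the three initial clauses it has size 5, and its width is 2 (the largest clause has two literals).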

Graph Representation and Expansion
Let Φ be a SAT formula with variable set V and clause set C. The clause-variable incidence graph G(Φ) of Φ has vertex set C ∪ V, with an edge between a clause and a variable if and only if the clause contains the variable. Clearly, G(Φ) is bipartite. It is an (r, c)-bipartite expander if every subset S ⊆ C of clauses with |S| ≤ r has at least c · |S| neighbors in V.

Geometric Ground Space
We regularly deal with points with random positions in some geometric space. With random point, we refer to the uniform distribution in the sense that the probability for a point to lie in a region A is proportional to its volume vol(A). For this to work, the volume of the ground space has to be bounded. Canonical options are, e.g., a unit hypercube or a unit ball. These, however, lead to the necessity of special treatment for points close to the boundary, which makes the analysis more tedious without giving additional insights. To circumvent this, we use a torus as ground space, which is completely symmetric. The d-dimensional torus T^d is defined as the d-dimensional hypercube [0, 1]^d in which opposite borders are identified, i.e., a coordinate of 0 is identical to a coordinate of 1. It is equipped with the p-norm as metric, for arbitrary but fixed p ∈ N⁺ ∪ {∞}. To define it formally for the torus, let p = (p_1, . . . , p_d) and q = (q_1, . . . , q_d) be two points in T^d. The circular difference between the ith coordinates is |p_i − q_i|• = min{|p_i − q_i|, 1 − |p_i − q_i|}. With this, the distance between p and q is ‖p − q‖_p = (Σ_{i=1}^d (|p_i − q_i|•)^p)^(1/p), where for p = ∞ we take the maximum of the circular differences as usual.
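The circular difference and the resulting torus metric translate directly into code. A small sketch (with p = ∞ handled as the maximum norm):

```python
def torus_distance(p, q, norm_p=2):
    """p-norm distance on the d-dimensional unit torus, using the circular
    difference min(|p_i - q_i|, 1 - |p_i - q_i|) in each coordinate."""
    diffs = [min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q)]
    if norm_p == float("inf"):
        return max(diffs)
    return sum(x ** norm_p for x in diffs) ** (1.0 / norm_p)
```

For example, on the 1-dimensional torus the points 0.1 and 0.9 are at distance 0.2, not 0.8, because opposite borders are identified.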

Random Points
We obtain the uniform distribution for a point p = (p_1, . . . , p_d) by drawing each coordinate p_i uniformly at random from [0, 1]. For two random points p and q, their distance ‖p − q‖ is a random variable. Let F_dist(x) be its cumulative distribution function (CDF), i.e., F_dist(x) = Pr[‖p − q‖ ≤ x]. To determine F_dist(x), fix the position of p. Then, for x ≤ 0.5, the set of points of distance at most x to p is simply the ball B_p(x) of radius x around p, yielding F_dist(x) = vol(B_p(x)) = Π_{d,p} · x^d with Π_{d,p} = (2Γ(1 + 1/p))^d / Γ(1 + d/p), where Γ is the gamma function. Note that Π_{d,p} only depends on d and p but is constant in x. Moreover, Π_{2,2} = π (thus the name Π), and Π_{d,∞} = lim_{p→∞} Π_{d,p} = 2^d. For distances x > 0.5, the formula for F_dist(x) is more complicated (we basically have to subtract the parts reaching out of the hypercube). However, for our purposes, it suffices to know F_dist(x) for x ≤ 0.5 and use the obvious bound F_dist(x) ≤ 1 for x > 0.5.
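The constant can be computed with the gamma function. Under the standard formula for the volume of a unit p-norm ball in R^d, it is Π_{d,p} = (2Γ(1 + 1/p))^d / Γ(1 + d/p); the sanity checks Π_{2,2} = π and Π_{d,∞} = 2^d from the text both hold for this expression. A sketch:

```python
import math

def Pi(d, p):
    """Volume of the unit p-norm ball in R^d (standard formula), so that
    F_dist(x) = Pi(d, p) * x**d for x <= 0.5."""
    if p == float("inf"):
        return 2.0 ** d
    return (2.0 * math.gamma(1.0 + 1.0 / p)) ** d / math.gamma(1.0 + d / p)

def F_dist(x, d, p):
    """CDF of the distance of two uniform random points on the torus,
    valid for x <= 0.5; beyond that only the trivial bound <= 1 is used."""
    assert 0 <= x <= 0.5
    return Pi(d, p) * x ** d
```

For instance, Pi(3, 2) recovers the familiar Euclidean ball volume constant 4π/3.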

Weighted Points and Distances
We regularly consider a fixed set of n points equipped with weights, which we call sites. For a site s_i with weight w_i, the weighted distance of a point p to s_i is ‖s_i − p‖ / w_i^(1/d). For a fixed value x, the set of points with weighted distance at most x is the set of points with ‖s_i − p‖ ≤ x · w_i^(1/d), i.e., a ball of radius x · w_i^(1/d) around s_i. Note that the volume of this set is proportional to w_i. Intuitively, the region of influence of a site is thus proportional to its weight. To simplify notation in some places, we define normalized weights ω_i = w_i^(1/d).

Geometric Random k-SAT
In the geometric model, we sample positions for the variables and clauses uniformly at random in the d-dimensional torus T^d. For v ∈ V and c ∈ C, we use v and c to denote their positions, respectively. Let w_1, . . . , w_n be variable weights that are normalized such that the smallest weight is 1. Moreover, let W = Σ_{v=1}^n w_v. For a clause c and a variable v, define the connection weight X(c, v) = (‖c − v‖ / w_v^(1/d))^(−d/T), i.e., the reciprocal of the weighted distance between v and c raised to the power d/T. The k variables for the clause c are drawn without repetition with probabilities proportional to X(c, v).
Among all possible combinations, we choose which of the k variables to negate uniformly at random, without repetition if possible, i.e., we only get the same clause twice if we have more than 2^k clauses with the same variable set. For T → 0, the model converges to the threshold case where c contains the k variables with smallest weighted distance.
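The clause-sampling step at temperature T > 0 can be sketched as follows. The exact form of the connection weight is reconstructed here from the verbal description (reciprocal weighted distance raised to the power d/T), so treat it as an assumption; the helper names are ours.

```python
import random

def torus_dist(p, q):
    """Euclidean (p = 2) distance on the unit torus."""
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2
               for a, b in zip(p, q)) ** 0.5

def connection_weight(dist, w_v, d, T):
    """X(c, v): reciprocal of the weighted distance, raised to d/T
    (reconstructed formula, see lead-in)."""
    return (dist / w_v ** (1.0 / d)) ** (-d / T)

def draw_geometric_clause(c_pos, var_pos, weights, k, T, rng=random):
    """Draw k distinct variables with probabilities proportional to X(c, v)."""
    d = len(c_pos)
    avail = list(range(len(var_pos)))
    x = {v: connection_weight(torus_dist(c_pos, var_pos[v]), weights[v], d, T)
         for v in avail}
    chosen = []
    for _ in range(k):
        r = rng.uniform(0, sum(x[v] for v in avail))
        acc = 0.0
        for pos, v in enumerate(avail):
            acc += x[v]
            if acc >= r:
                chosen.append(avail.pop(pos))
                break
        else:  # floating-point boundary guard
            chosen.append(avail.pop())
    return chosen
```

As T shrinks, the exponent d/T grows and nearby variables dominate the draw, approaching the T = 0 threshold behavior.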
The connection weight X(c, v) is a random variable. We denote the CDF of X(c, v) by F_X(x). With the CDF for the distance between two random points in Equation (1), we obtain the expression for F_X(x) given in Equation (2); see Lemma A.2 for a proof.

Voronoi Diagrams
Let S = {s_1, . . . , s_n} be a set of sites with weights w_1, . . . , w_n. A point p belongs to the (open) Voronoi region of a site s_i if its weighted distance to s_i is smaller than its weighted distance to any other site. The collection of all Voronoi regions is the Voronoi diagram of S. Order-k Voronoi regions are defined analogously for subsets A ⊆ S with |A| = k, i.e., the region of A contains a point p if and only if the weighted distance of p to every site in A is smaller than the weighted distance to any site not in A. More formally, p belongs to the order-k Voronoi region of A if there exists a radius r such that ‖s_i − p‖ ≤ ω_i r for s_i ∈ A and ‖s_i − p‖ > ω_i r for s_i ∉ A. Note that the order-k Voronoi region of A is potentially empty. The order-k Voronoi diagram is the collection of all non-empty order-k Voronoi regions. Its complexity is the number of such non-empty regions.
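Which order-k region a point belongs to can be determined by sorting sites by weighted distance. A small sketch for the Euclidean case in the unit square (ignoring the torus wrap-around for brevity; the function name is ours):

```python
def order_k_set(point, sites, weights, k):
    """The k sites with smallest weighted distance to `point`; the order-k
    Voronoi region containing `point` is the region of exactly this set."""
    d = len(point)
    def weighted_dist(i):
        eucl = sum((a - b) ** 2 for a, b in zip(point, sites[i])) ** 0.5
        return eucl / weights[i] ** (1.0 / d)
    return frozenset(sorted(range(len(sites)), key=weighted_dist)[:k])
```

For points in general position no ties occur, so the region is well defined; increasing a site's weight enlarges its region of influence.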

Core Arguments
Before delving into the technical details of our proofs in the subsequent sections, we briefly discuss the core arguments.

Power-Law SAT
We use a framework that Ben-Sasson and Wigderson [9] introduced for the uniform SAT model. We prove lower bounds for the resolution width, which imply lower bounds for the resolution size and the tree-like resolution size, which then imply lower bounds for the running times of CDCL and DPLL solvers, respectively.
To bound the resolution width, we essentially have to show that different clauses do not overlap too heavily. Specifically, a formula has resolution width Ω(w) if (1) every set S of at most w clauses contains at least |S| different variables and (2) every set S of clauses with w/3 ≤ |S| ≤ 2w/3 contains at least a constant fraction of unique variables.
We achieve the bounds in Theorem 5.8 (i-iii) by showing the above two properties directly. For the bound in Theorem 5.8 (iv), we first observe that both properties are fulfilled if the clause-variable incidence graph of a k-CNF formula Φ has high enough bipartite expansion. Recall the definition of bipartite expansion from Section 3 and note how the requirement that the neighborhood of clause vertices is large resembles the requirement that clauses do not overlap too heavily. We show that G(Φ) is a bipartite expander asymptotically almost surely if Φ is drawn from the power-law model, which yields the lower bound of Theorem 5.8 (iv).
Compared to the uniform case, the weights make the properties required for the lower bounds less likely. Variables with high weight appear in many clauses, making the clauses less diverse. Thus, it is less likely that every clause set covers a large variety of variables.

Geometric SAT
To explain the core idea of our proof, consider the following simplified geometric model. Map n variables and m clauses to distinct points in the 2-dimensional Euclidean plane (randomly or deterministically). Build the SAT instance by including in each clause c the k variables with the smallest geometric distance to c. Now consider the order-k Voronoi diagram defined by the positions of the n variables. As a clause c contains the k closest variables, the k variables contained in c are exactly the k variables defining the Voronoi region of c's position. Independent of the positions of the n variables, there are only at most 2k(n − k) regions in the order-k Voronoi diagram [15]. Thus, if we have at least 2^k·2k(n − k) clauses, then, by the pigeonhole principle, at least one Voronoi region contains 2^k clauses. As k is considered to be a constant, this number of clauses is linear in n, i.e., we still have constant density. Moreover, as repeating the same clause (with the same variable negations) is avoided whenever possible, there is a set of k variables that has a clause for every combination of literals. Thus, we have an unsatisfiable subformula of constant size 2^k, which implies low proof complexity.
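The pigeonhole argument is easy to observe experimentally. The following toy sketch of the simplified model (with random polarities, as in the variant discussed below, rather than the repetition-avoiding rule; the function name is hypothetical) places variables and clauses uniformly at random and reports a variable set that received all 2^k polarity patterns, i.e., a constant-size unsatisfiable subformula:

```python
import math
import random
from collections import defaultdict

def find_unsat_core(n, m, k, seed=0):
    """Toy sketch of the simplified geometric model: variables and clauses
    are random points in the unit square, and each clause consists of the
    k geometrically closest variables with random polarities. Returns a
    k-tuple of variables that received all 2**k polarity patterns (an
    unsatisfiable subformula of size 2**k), or None if none was found."""
    rng = random.Random(seed)
    var_pos = [(rng.random(), rng.random()) for _ in range(n)]
    patterns = defaultdict(set)  # variable set -> polarity patterns seen
    for _ in range(m):
        c = (rng.random(), rng.random())
        closest = tuple(sorted(range(n),
                               key=lambda v: math.dist(var_pos[v], c))[:k])
        patterns[closest].add(tuple(rng.random() < 0.5 for _ in range(k)))
        if len(patterns[closest]) == 2 ** k:
            return closest
    return None
```

With n = 5, k = 2, and a few thousand clauses, such a core appears with overwhelming probability, mirroring the pigeonhole bound.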
This result can be varied and strengthened in multiple ways, e.g., by allowing weighted variables, a higher dimensional ground space, or by softening the requirement that every clause contains the k closest variables (model with higher temperature). In the following, we briefly discuss how these generalizations can be achieved.

Abstract Geometric Spaces
The result by Bohler et al. [15] on the complexity of order-k Voronoi diagrams is very general in the sense that it holds for abstract Voronoi diagrams. Roughly speaking, abstract Voronoi diagrams are based on separating curves between pairs of points that take the role of perpendicular bisectors. In this way, one can abstract from the specific geometric ground space. Whether a point p is closer to site s_1 or to site s_2 is no longer determined by comparing the distances ‖s_1 − p‖ and ‖s_2 − p‖ but by the curve separating s_1 from s_2. For this to work, the separating curves have to satisfy a handful of basic axioms, which are for example satisfied by perpendicular bisectors in the Euclidean or the hyperbolic plane. Thus, the above argument for low proof complexity directly carries over to the hyperbolic plane, or more generally, to any abstract geometric space satisfying the axioms.

Lower Density Via Random Clause Positions
Assume the variable positions are fixed. Now choose random positions for the clauses and observe in which regions of the order-k Voronoi diagram they end up. We want to know whether there is a region that contains at least 2^k clauses. This comes down to a balls-into-bins experiment: each Voronoi region is a bin and each clause is a ball. Thus, there are O(n) bins and m balls. Moreover, we are interested in the maximum load, i.e., the maximum number of balls that land in a single bin. Due to a result by Raab and Steger [55], the maximum load is a. a. s. in Ω(log n / log log n) if we throw Ω(n / polylog(n)) balls. Thus, even for a slightly sublinear number of balls, the maximum load is superconstant. We note that this result holds for uniform bins. In our case, we have non-uniform bins, as the probability for a clause to end up in a particular Voronoi region is proportional to the area of the region. However, it is not hard to see that the result by Raab and Steger [55] remains true for non-uniform bins; see Section A.5. Thus, even if the number of clauses m is slightly sublinear in the number of variables n, we get a small unsatisfiable subformula asymptotically almost surely if the Voronoi diagram has low complexity.
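The balls-into-bins step can be simulated directly. The sketch below (uniform bins only, which the text notes is the simpler case; the helper name is ours) reports the largest load observed over several independent trials:

```python
import random
from collections import Counter

def max_load(balls, bins, trials=100, seed=0):
    """Throw `balls` balls into `bins` uniform bins, `trials` times, and
    return the largest load (balls in a single bin) observed."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(bins) for _ in range(balls))
        worst = max(worst, max(counts.values()))
    return worst
```

For m = n = 1000 the observed maximum load is already noticeably super-constant, consistent with the Θ(log n / log log n) behavior.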

Positive or Negative Literals with Repetition
Above we assumed that we get the exact same clause with coinciding negations twice only if we already have more than 2^k clauses with the same set of k variables. Although this is arguably a reasonable assumption for the model, we can make a similar argument without it. Assume instead that for each variable, we choose the positive or negative literal uniformly at random, independently of all other choices. Moreover, assume for an increasing function f that there are f(n) clauses that have the same set of k variables. With the above balls-into-bins argument, we, e.g., have f(n) ∈ Ω(log n / log log n). Then the probability that there is a combination of positive and negative literals that we did not see at least once is at most 2^k·(1 − 2^{−k})^{f(n)}. This probability goes to 0 for n → ∞, i.e., a. a. s. there is an unsatisfiable subformula of constant size 2^k.
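The union bound at the end of this argument is a one-liner and can be evaluated numerically (the function name is ours):

```python
def missing_pattern_bound(k, f):
    """Union bound from the text: an upper bound on the probability that
    some of the 2**k polarity combinations is missing among f clauses
    over the same k variables, with polarities chosen independently and
    uniformly at random."""
    return 2 ** k * (1 - 2 ** (-k)) ** f
```

For fixed k the bound decays exponentially in f, so any superconstant f(n) drives it to 0.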

Higher Dimension and Weighted Variables
At the core of our argument lies the fact that order-k Voronoi diagrams have linear complexity in the plane. As already mentioned in Section 2.3, this is no longer true for order-k Voronoi diagrams in higher dimensions or if the variables have multiplicative weights. A formal argument for why this property breaks is in Section 6.1. However, for sites distributed uniformly at random, we show in Section 6.2 that the complexity can be expected to be linear in the total weight, even in the more general setting. Thus, using that the variables have random positions (a requirement we did not need before), we can apply the above argument to obtain low proof complexity.

Non-Zero Temperature
Non-zero temperatures make it so that clauses do not necessarily contain the k closest variables. Instead, variables are included with probabilities depending on the distance. Thus, we cannot simply look at the order-k Voronoi diagram to determine which variables are contained in a given clause. We resolve this issue in Section 7. For this, we call a clause nice, if it behaves as it would in the T = 0 case, i.e., if it includes the k closest variables. In Section 7.1 we show that, in expectation, a constant fraction of clauses is actually nice. Moreover, in Section 7.2, we show that the number of nice clauses is concentrated around its expectation. With this, we can apply the same arguments as before to only the nice clauses, of which we have linearly many, to obtain a low proof complexity.
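To get a feeling for niceness, one can simulate a toy non-zero-temperature model. The inclusion rule below (sampling variables with probability proportional to exp(−distance/T) until k distinct ones are drawn) is a hypothetical stand-in, not the paper's model, and the function `nice_fraction` and its parameters are illustrative only:

```python
import math
import random

def nice_fraction(n, m, k, T, seed=0):
    """Monte Carlo sketch of 'nice' clauses under a toy temperature-T
    inclusion rule (NOT the paper's model): each clause samples variables
    with probability proportional to exp(-dist/T) until it holds k
    distinct ones. A clause is nice if it ends up with exactly the k
    geometrically closest variables. Returns the fraction of nice clauses."""
    rng = random.Random(seed)
    variables = [(rng.random(), rng.random()) for _ in range(n)]
    nice = 0
    for _ in range(m):
        c = (rng.random(), rng.random())
        dists = [math.dist(v, c) for v in variables]
        d_min = min(dists)
        # Shift by d_min so the largest weight is exactly 1 (numerical safety).
        weights = [math.exp(-(d - d_min) / T) for d in dists]
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choices(range(n), weights=weights)[0])
        closest = set(sorted(range(n), key=lambda i: dists[i])[:k])
        nice += chosen == closest
    return nice / m
```

As T approaches 0 in this toy rule, sampled clauses almost always coincide with the k closest variables, i.e., almost all clauses are nice.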

Voronoi Diagrams
The worst-case lower bounds for the complexity of order-k Voronoi diagrams follow from existing lower bounds on the number of vertices together with Theorem 6.2, which connects the complexity in terms of regions with the complexity in terms of vertices. This connection is obtained by observing how the order-k Voronoi diagram changes when increasing k.
For the average-case linear upper bound on the number of regions, the argument works roughly as follows, assuming the unweighted case for the sake of simplicity. For each size-k subset A of the sites, we devise an upper bound on the probability that A has a non-empty order-k Voronoi region. This region is non-empty if and only if there are points that have A as the k closest sites, i.e., if there is a ball that contains the sites of A and no other sites. With this observation, we can use a win-win-style argument. Either the radius of this ball is small, which makes it unlikely that all sites of A lie in the ball, or the ball is large, which makes it unlikely that it contains no other sites.

The Direct Approach
As stated in Section 4.1, a formula has resolution width Ω(w) if (1) every set S of at most w clauses contains at least |S| different variables and (2) every set S of (1/3)·w ≤ |S| ≤ (2/3)·w clauses contains at least a constant fraction of unique variables. In this section we are going to show that both conditions are satisfied for power-law exponents β > (2k−1)/(k−1) and clause-variable ratios ∆ ∈ Ω(1). The first condition can also be interpreted in terms of bipartite expansion: it states that the clause-variable incidence graph G(Φ) is a (w, 0)-bipartite expander. The following lemma states bounds on w for which G(Φ) is a (w, 0)-bipartite expander asymptotically almost surely. These bounds depend on the power-law exponent β as well as on the clause-variable ratio ∆. Note that our choices of k and β in the lemma ensure ε_1, ε_2 > 0.
This implies that every variable in N(Ĉ) has to appear at least twice. Otherwise, one could delete a clause with a unique variable from Ĉ to get a set Ĉ′ with |Ĉ′| = i − 1 and |N(Ĉ′)| ≤ i − 2. This would violate the minimality of Ĉ. Also, Ĉ must contain exactly i − 1 different variables. Otherwise, we could remove any clause from Ĉ and violate minimality.
where P_i is the probability to draw i clauses which contain at most i − 1 different variables, each of them at least twice. We can now imagine the k·i variables of the i clauses to be drawn independently with replacement. This only increases the probability that the i clauses contain at most i − 1 different variables, each at least twice; thus, the probability we consider is an upper bound. Now we consider the i − 1 different variables drawn. Then, we choose the i − 1 pairs of positions where each variable appears for the first and second time. As a rough upper bound, we have at most ((k·i choose 2) choose i − 1) many possibilities by simply choosing i − 1 of all (k·i choose 2) possible pairs. Now we bound the probability that the same variable appears at both positions of such a pair, which is at most Σ_{j=1}^{n} p_j^2 per pair of positions. At the remaining k·i − 2·(i − 1) positions we can only choose from at most those i − 1 variables. Thus, the probability at each remaining position is at most the sum of the i − 1 variable probabilities, which is at most the sum of the i − 1 highest variable probabilities. Let F(i) be the sum of the i highest variable probabilities. The resulting bound contains a constant κ = κ(k, β) > 0 that might depend on other parameters, which are fixed to constants as well; we use κ to collect all constant factors. According to Lemma A.1, Σ_{j=1}^{n} p_j^2 and F(i) can be bounded, so our result depends on the power-law exponent β. For β < 3, the resulting bound holds since we assume m = ∆·n and (m choose i) ≤ (e·m/i)^i. In order to have a sum which is o(1), we want to ensure that κ·∆·n^{−ε_1}·i^{ε_2} is at most a constant smaller than 1. It is easy to check for which w this holds; thus, we can set w to this value and split the sum in Equation (3) accordingly. For β > 3, the resulting bound is at most a small constant for w ∈ Θ(n·∆^{−1/ε_2}) sufficiently small. By splitting the sum as before, we can show (Θ(n·∆^{−1/ε_2}), 0)-expansion with probability at least 1 − Θ(∆·log^{ε_2}(n)/n^{ε_2}), i.e., a. a. s. for ∆ ∈ o(n^{ε_2}/log^{ε_2}(n)).
For β = 3 we get the same result as for β > 3, except for an additional factor of ln n. Choosing w small enough, we can ensure that this sum is at most O(∆·ln n·(log(n)/n)^{(k−2)/2}) by splitting the expression at i_0 = ln n again. Hence, we get (Θ(n·(∆·log n)^{−2/(k−2)}), 0)-expansion with probability at least 1 − O(∆·ln n·(log(n)/n)^{(k−2)/2}), i.e., a. a. s. for ∆ ∈ o(n^{(k−2)/2}/log^{(k−2)/2+1}(n)).
Now we want to show the second requirement of Theorem 5.3, that every set S of (1/3)·w ≤ |S| ≤ (2/3)·w clauses contains at least a constant fraction of unique variables. Again, our choices of k and β in the lemma ensure that we can always choose an ε > 0 with ε_1, ε_2 > 0.
The upper bounds on ε ensure ε_1 > 0 and ε_2 > 0. We want to bound the probability that there is a set of clauses C′ with (1/3)·w ≤ |C′| ≤ (2/3)·w and at most ε·|C′| many unique variables. Let P_i be the probability that there is a set C′ of size i with that property. We assume the k·i Boolean variables to be drawn independently at random, i.e., we allow duplicate variables inside clauses. This only decreases the probability of having unique variables. Additionally, we split the probability into parts depending on the number j of different variables that appear in C′ in addition to the ε·i unique ones. The resulting bound contains a constant κ = κ(k, ε, β) > 0 that might depend on other parameters, which are fixed to constants. Note that we estimated the probability to draw a new (unique) variable by 1. Thus, this also accounts for the probability to draw a variable that is not actually new; in particular, it accounts for the probability to draw one of the j non-unique variables. This means the expression we have is an upper bound on the probability to draw at most ε·i unique variables. As in the proof of Lemma 5.1, we have to distinguish three cases depending on the power-law exponent β, using Lemma A.1. For β < 3, it remains to bound the inner sum. In order to do so, we split it at j_0 = ((3−β)/4)·(k−ε)·i. It is easy to see that 0 < (3−β)/4 < 1/4 for 2 < β < 3, so this choice of j_0 is valid. For the first part of the sum, additional factors of at most c^i for positive constants c are hidden in κ^i. For the second part of the sum, we use j ≥ ((3−β)/4)·(k−ε)·i in the second line and a geometric series in the third line. The base of the series is i^{(3−β)/(β−1)} ≥ 1. Thus, the last term, with j = ((k−ε)/2)·i, dominates, and we get the shown estimate, again with factors c^i for positive constants c hidden in κ^i.
Plugging this into Equation (4) and summing over all i, the sum is o(1) as soon as κ·∆·n^{−ε_2}·w^{ε_1} is a suitably small constant and w is super-constant; in our case, this holds for a suitable choice of w. For β = 3, we want to show that the inner sum is at most κ^i·(i·ln n)^{((k−ε)/2)·i}. As before, we can split the sum, this time at j_0 = ((k−ε)/4)·i. The first part contains a geometric series with base ln n ≥ 1 again, which we estimate by its dominating term. Plugging this into Equation (5), we see as before that the result is at most κ^i for some constant κ ∈ (0, 1) for a suitable choice of w. For β > 3, we show a corresponding bound on the inner sum by splitting it once more. In the first part of the sum all exponents are positive, and the resulting constant factor can be incorporated into the κ we already have. In the second part of the sum, the exponent ((k−ε)·i − 2j)·(β−2)/(β−1) − j is negative. However, since 2·(β−2)/(β−1) − 1 > 0, the base is at least one, so we again have a geometric series that we estimate by its dominating term, i.e., the term with j = ((k−ε)/2)·i. Plugging our estimate into Equation (6), we can find a w ∈ Θ(n·∆^{−1/ε_1}) small enough such that the property holds as desired.
In all three cases we can choose w in such a way that the probability for the property not to hold is at most κ^{w/3} for some constant κ ∈ (0, 1). This means the property holds a. a. s. for w ∈ ω(1).
The two properties we showed in Lemma 5.1 and Lemma 5.2 can be used to derive lower bounds on resolution width via the following theorem by Ben-Sasson and Wigderson [9].
Theorem 5.3 (Ben-Sasson and Wigderson [9]). Let Φ be an unsatisfiable k-CNF formula with k ≥ 3. If there is a w ∈ N such that (i) for all sets of clauses C′ with |C′| ≤ w it holds that C′ contains at least |C′| different Boolean variables, and (ii) for all sets of clauses C′ with (1/3)·w ≤ |C′| ≤ (2/3)·w it holds that C′ contains at least ε·|C′| unique variables for some constant ε > 0, then the resolution width of Φ is Ω(w).
Lemma 5.1 and Lemma 5.2 together with Theorem 5.3 imply Corollary 5.4. However, Theorem 5.3 only works for unsatisfiable instances. Since the two lemmas do not condition on instances being unsatisfiable, we also need to make sure that the probability of generating unsatisfiable instances is large enough. In particular, we have to guarantee that this probability is asymptotically larger than the error probabilities of Lemma 5.1 and Lemma 5.2; then the conditional probability of our width lower bounds holding, conditioned on instances being unsatisfiable, approaches one. Since the error probabilities of the two lemmas are o(1), we want the clause-variable ratio ∆ to be high enough for instances to be unsatisfiable with at least constant probability. The resulting corollary is stated below. It only holds for unsatisfiable instances as well, i.e., the probability bound on resolution width is actually a conditional probability, conditioned on instances being unsatisfiable.
For β > 3 we have to show a corresponding inequality. This shows that in all three cases the bounds from Lemma 5.2 are smaller, thus giving us the lower bounds on resolution width as stated in the corollary. This is nearly the statement of Theorem 5.8. However, via bipartite expansion we can already show linear resolution width at constant clause-variable ratios for β > (2k−2)/(k−2) instead of β > 3. This gives a better bound for k ≥ 5. The bounds on bipartite expansion and the resulting bounds on resolution width will be derived in the next section.

A Lower Bound on Bipartite Expansion
In this section we show an improved bound on the bipartite expansion. We will use it to obtain a linear lower bound on resolution width for β > (2k−2)/(k−2), which is potentially smaller than 3 and therefore improves the previous bound. Recall that linear resolution width implies exponential resolution size, and thus also exponential tree-like resolution size. Moreover, our bound on the bipartite expansion can also be used to bound the so-called resolution clause space, which additionally yields an exponential lower bound on tree-like resolution size for β > (2k−3)/(k−2), as we will see at the end of this section. The following lemma shows the bipartite expansion property.
Lemma 5.5. Let Φ be a random power-law k-SAT formula with n variables, m clauses, k ≥ 3, and power-law exponent β > (2k−3)/(k−2), and let ε ∈ (0, (k−1)·(β−2)/(β−1) − 1) be constant. If ∆ = m/n ∈ o(n^ε / log^ε n), then there exists an r ∈ Θ(n·∆^{−1/ε}) such that the clause-variable incidence graph G(Φ) is an (r, c)-bipartite expander asymptotically almost surely. Proof. First, note that our choice of β > (2k−3)/(k−2) guarantees that the interval (0, (k−1)·(β−2)/(β−1) − 1), from which we choose ε, is not empty. This interval is chosen in such a way that c > 0 is guaranteed. As in the proof of [8, Lemma 5.1], we define a bad event E that G(Φ) is not an (r, c)-bipartite expander. If E happens, then there is a set C′ ⊆ C with 1 ≤ |C′| ≤ r such that |N(C′)| < (1 + c)·|C′|. Given a set C′ ⊆ C = [m] of clause indices with |C′| = i, we want to bound the probability P_i that the k·i indices of variables appearing in those clauses contain at most (1 + c)·i different variables. Since clauses contain variables without repetition, P_i is dominated by the probability to draw at most (1 + c)·i different variables when drawing k·i Boolean variables independently at random. Now imagine sampling these k·i variables in some arbitrary but fixed order. The probability to draw a new variable is at most 1, while the probability to draw an old variable is at most the probability to draw one of the (1 + c)·i variables of maximum probability. As before, the sum of these probabilities is denoted by F((1 + c)·i). This gives us an upper bound; note that this expression also captures the case that we draw fewer than (1 + c)·i different variables, since the probability to draw a new variable is bounded by one and thus also captures the probability that this new variable is in fact an old one. In the case of a power-law distribution, due to Lemma A.1, we obtain a bound for some constant κ(c, β, k) > 0, with m = ∆·n and c = (k−1) − (1+ε)·(β−1)/(β−2).
Summing over all i ≥ 1 now yields the desired probability bound. We split this sum into two parts, the first from i = 1 to ε·log n and the second from ε·log n to r. The first part is small since Σ_{i=1}^{m} α^i ≤ 2·α for all m ≥ 1 and α < 1/2, which holds for big enough values of n and for ∆ ∈ o(n^ε / log^ε n). The second part can be bounded similarly. This notion of bipartite expansion is connected to the resolution width of a formula. The following corollary, implicitly stated by Ben-Sasson and Wigderson [9], formalizes this connection.
Proof. Due to the definition of bipartite expansion, (k+ε)/2 > 1 ensures the first condition of Theorem 5.3. We will show that the second condition is fulfilled as well. Let G(Φ) = (C, V, E) and let C′ ⊆ C with (1/3)·r ≤ |C′| ≤ (2/3)·r. Let δC′ denote the set of unique variables of C′, i.e., δC′ = {v ∈ N(C′) | |N(v) ∩ C′| = 1}. As Ben-Sasson and Wigderson state in [9, proof of Theorem 6.5], a lower bound on |δC′| holds due to the (r, (k+ε)/2 − 1)-bipartite expansion. These two properties imply a resolution width of Ω(r).
This result on the bipartite expansion of power-law random k-SAT allows us to derive the following corollary on resolution width. Again, we require the clause-variable ratio ∆ to be high enough for instances to be unsatisfiable with at least constant probability.
Together with Corollary 5.4 the former corollary implies Theorem 5.8.
Additionally, Ben-Sasson and Galesi [8] state a theorem that directly connects bipartite expansion and tree-like resolution size. An application of this theorem yields a slightly better bound on tree-like resolution size than the ones derived from resolution width. Proof. [8, Theorems 4.2 and 3.3] together state that any bipartite graph that is an (r, c)-bipartite expander has a resolution clause space of at least c·r/(2 + c). Thus, with [31, Theorem 1.6], it holds that the resolution size for formulas whose clause-variable incidence graph is an (r, c)-bipartite expander is at least exp(c·r/(2 + c)).
This leads to the following corollary, which already asserts exponential tree-like resolution size for constant clause-variable ratios at β > (2k−3)/(k−2).

The Complexity of Voronoi Diagrams
We first show quadratic lower bounds on the complexity (number of non-empty regions) of order-k Voronoi diagrams that already hold in rather basic settings. Afterwards, we consider random point sets and prove a linear upper bound.

Worst-Case Lower Bounds
In this section, we show worst-case lower bounds on the number of non-empty regions of higher-order Voronoi diagrams. As already mentioned in Section 2.3, our lower bounds are based on previously known lower bounds on the number of vertices of Voronoi diagrams, in conjunction with a new theorem connecting the number of vertices with the number of regions in higher orders. This theorem relies on the fact that there are not too many different points with equal distance to a set of d + 1 sites in d-dimensional space. For the unweighted case and for p ≠ ∞, the result in the next lemma was shown by Lê [45]. We extend it to weighted sites and p = ∞, following along the lines of Lê's proof [45] (at least for p ≠ ∞): (i) Observe that the points with equal distance to the d + 1 sites form the set of solutions to a system of polynomial equations.
We note that this polynomial has the same form in the unweighted case [45, Equation 10], except that we have the additional factors 1/ω_0 and 1/ω_i. Concerning (ii), it thus suffices to note that these additional factors do not significantly increase the so-called additive complexity. We do not fully define the additive complexity here, but rather cite the properties crucial for this proof. The additive complexity L⁺(P) of a polynomial P is defined to be 0 if P is a monomial. Moreover, by [45, Lemma 4], it holds that L⁺(P_1 + ··· + P_n) ≤ n − 1 + L⁺(P_1) + ··· + L⁺(P_n), that L⁺(P^m) ≤ L⁺(P) for any m ∈ N, and that L⁺(PQ) ≤ L⁺(P) + L⁺(Q), where all P_i, P, and Q are polynomials. With this, it is easy to see that the additive complexity of the polynomial in Equation (8) is bounded by a constant only depending on d. In fact, the last bound, L⁺(PQ) ≤ L⁺(P) + L⁺(Q), in conjunction with the property that constants are monomials with additive complexity 0, makes it so that the additional constant factors ω_0 and ω_i do not increase the additive complexity at all. Thus, the additive complexity is bounded by 4d − 1 [45, Lemma 5].
Finally, applying [45, Proposition 3] directly yields the claim, which concludes the proof for p ≠ ∞.
For p = ∞, we cannot use the same argument, as Equation (8) is not polynomial: ‖s_i − p‖ involves the maximum over all coordinates. However, for each s_i, there are only d possibilities for which coordinate attains the maximum, leading to d^{d+1} combinations. For each of these combinations, we consider its own system of equations. Denote the resulting set of systems of equations by E. Clearly, every solution to the system of equations in (8) is a solution to at least one system in E. Thus, the number of solutions to (8) is bounded by the total number of solutions to systems in E. With the same argument as above, the number of solutions to each system of equations in E is bounded by a constant only depending on d. As E contains only d^{d+1} systems, this bounds the number of solutions to (8) by a constant only depending on d.
With this, we can now prove the theorem establishing the connection between vertices and non-empty regions.
Theorem 6.2. Let S be a set of n weighted sites in general position in R^d equipped with a p-norm. If the order-k Voronoi diagram has V vertices, then the order-(k + d) Voronoi diagram has Ω(V) non-empty regions.
Proof. We first show that a vertex of the order-k Voronoi diagram is an interior point of a non-empty region of the order-(k + d) Voronoi diagram. Afterwards, we show that only a constant number of different vertices can end up in the same region.
Let p ∈ R^d be a vertex of the order-k Voronoi diagram. Then p has equal weighted distance to exactly d + 1 sites (as the sites are in general position). Let {s_1, ..., s_{d+1}} = A ⊆ S be these sites and let P be the ε-environment of p, i.e., a ball with sufficiently small radius ε centered at p. For a point p′ ∈ P, sort all sites in S by weighted distance from p′. Then all sites in A appear consecutively in this order. Moreover, we obtain almost the same order of S for every p′ ∈ P; the only difference is that the sites of A might be reordered. Also, as p is a vertex of the order-k Voronoi diagram, at least one site from A belongs to the k sites with smallest weighted distance to p. It follows that the first k + d sites in this order completely include all sites from A. Thus, the k + d closest sites are the same for all points in the ε-environment P around p; let B be the set of these sites. It follows that B has a non-empty Voronoi region in the order-(k + d) Voronoi diagram, as this region has p in its interior.
It remains to show that only a constant number of vertices of the order-k Voronoi diagram can be contained in the same region of the order-(k + d) Voronoi diagram, i.e., that the order-(k + d) region belonging to B includes only a constant number of order-k vertices. As stated above, every order-k vertex belongs to a subset A ⊆ B with |A| = d + 1. There are only (|B| choose |A|) ≤ (k + d choose d + 1) such subsets A, which is constant for constant k and d. Moreover, every fixed subset A of d + 1 sites is responsible for only a constant number of vertices due to Lemma 6.1. Thus, only a constant number of order-k vertices end up in the same order-(k + d) region, which concludes the proof.
Theorem 6.2 transfers some known lower bounds on the number of vertices of Voronoi diagrams to lower bounds on the number of non-empty regions of order-k Voronoi diagrams. In particular, we get the following corollaries. Corollary 6.3. In the worst case, the order-4 Voronoi diagram of n (unweighted) sites in 3-dimensional Euclidean space has Ω(n^2) non-empty regions.
Proof. In the worst case, the ordinary (order-1, unweighted) Voronoi diagram of n sites in 3-dimensional Euclidean space has Ω(n^2) vertices [43,56]. Applying Theorem 6.2 yields the claim. Corollary 6.4. In the worst case, the order-3 Voronoi diagram of n weighted sites in 2-dimensional Euclidean space has Ω(n^2) non-empty regions.
Proof. In the worst case, the order-1 Voronoi diagram of n weighted sites in 2-dimensional Euclidean space has Ω(n^2) vertices [5]; also see Figure 2. Applying Theorem 6.2 yields the claim.

Upper Bounds for Sites with Random Positions
Let S = {s_1, ..., s_n} ⊆ T^d be n randomly positioned sites with weights w_1, ..., w_n. In the following, we bound the complexity of the weighted order-k Voronoi diagram in terms of non-empty regions. Recall from Section 3 that the torus T^d is the hypercube [0, 1]^d that wraps around in every dimension, in the sense that opposite sides are identified. However, the following arguments do not require this property. Thus, the exact same results hold for Voronoi diagrams in hypercubes.
For the normalized weights ω_1, ..., ω_n, recall from Section 3 that a point p ∈ T^d belongs to the Voronoi region corresponding to A ⊆ S with |A| = k if there exists a radius r such that ‖p − s_i‖ ≤ ω_i·r for s_i ∈ A and ‖p − s_i‖ > ω_i·r for s_i ∉ A. Thus, A has a non-empty order-k Voronoi region if and only if such a point p exists. Our goal in the following is to bound the probability for its existence.
Our general approach to achieve such a bound is the following. The condition ‖p − s_i‖ ≤ ω_i·r for s_i ∈ A basically tells us that the sites in A are either close together or that r has to be large. In contrast, the condition ‖p − s_i‖ > ω_i·r for s_i ∉ A tells us that many sites (namely all n − k sites in S \ A) have to lie sufficiently far away from p, which is unlikely if r is large. How unlikely this is of course depends on r and thus on how close the sites in A lie together. Therefore, to follow this approach, we first condition on how close the sites in A lie together.
To formalize this, consider a size-k subset A ⊆ S and assume without loss of generality that A = {s_1, ..., s_k}. The site in A with the lowest weight, without loss of generality s_1, will play a special role. We define the random variable R_A to be

    R_A = max_{i ∈ [k]} ‖s_1 − s_i‖ / (ω_1 + ω_i).    (9)

The intuition behind the definition of R_A is the following. The weighted center between s_1 and s_i is the point p on the line between them such that ‖s_1 − p‖ = ω_1·r and ‖s_i − p‖ = ω_i·r for a radius r ∈ R. Then R_A is the maximum value of r over i ∈ [k]. In the unweighted setting, R_A is just half the maximum distance between s_1 and any other site s_i. In a sense, R_A describes how close the sites in A lie together. Thus, it provides a lower bound on r.
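The quantity R_A is straightforward to compute. The helper below (a hypothetical name) follows the intuition given above: for each s_i it computes the weighted-center radius ‖s_1 − s_i‖/(ω_1 + ω_i) and takes the maximum over the sites of A, with s_1 the minimum-weight site:

```python
import math

def radius_RA(sites, weights):
    """R_A = max_i ||s_1 - s_i|| / (w_1 + w_i), where s_1 is a site of
    minimum weight in A. Requires at least two sites."""
    i0 = min(range(len(sites)), key=lambda i: weights[i])
    s1, w1 = sites[i0], weights[i0]
    return max(math.dist(s1, sites[i]) / (w1 + weights[i])
               for i in range(len(sites)) if i != i0)
```

In the unweighted case with two sites at distance 1, this returns 0.5, i.e., half the maximum distance, as stated above.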
Based on R_A, we slightly relax the condition on A having a non-empty Voronoi region. We call A relevant if there exists a point p ∈ T^d and a radius r ≥ R_A such that ‖s_1 − p‖ ≤ ω_1·r and ‖s_i − p‖ > ω_i·r for i > k. The following lemma states that being relevant is in fact a weaker condition than having a non-empty order-k Voronoi region. Thus, bounding the probability that a set is relevant from above also bounds the probability for a non-empty Voronoi region from above. Lemma 6.5. A subset of k sites that has a non-empty order-k Voronoi region is relevant.
Proof. Assume A = {s_1, ..., s_k} has a non-empty order-k Voronoi region. Then there exists a point p and a radius r such that ‖s_i − p‖ ≤ ω_i·r if and only if i ≤ k. Thus, ‖s_1 − p‖ ≤ ω_1·r and ‖s_i − p‖ > ω_i·r for i > k clearly hold, and it remains to show r ≥ R_A. From ‖s_i − p‖ ≤ ω_i·r for i ∈ [k] it follows that ‖s_1 − p‖ + ‖s_i − p‖ ≤ ω_1·r + ω_i·r holds for any i ∈ [k]. Thus, by rearranging and applying the triangle inequality, we obtain r ≥ (‖s_1 − p‖ + ‖s_i − p‖)/(ω_1 + ω_i) ≥ ‖s_1 − s_i‖/(ω_1 + ω_i). This immediately yields r ≥ R_A. Now we proceed to bound the probability that a set A is relevant. The following lemma bounds this probability conditioned on the random variable R_A. At its core, we have to bound the probability of the event ‖s_i − p‖ > ω_i·r for s_i ∉ A. For a fixed point p and a fixed radius r, this is rather easy; thus, most of the proof is concerned with eliminating the existential quantifiers for p and r. Lemma 6.6. For constants c_1 and c_2 depending only on d and p, the stated bound holds. Proof. As before, we assume that A = {s_1, ..., s_k} and that s_1 has minimum weight among the sites in A, i.e., min_{s_i ∈ A} w_i = w_1. By definition, A is relevant conditioned on R_A if and only if there exists a radius r ≥ R_A and a point p ∈ T^d such that ‖s_1 − p‖ ≤ ω_1·r and ‖s_i − p‖ > ω_i·r for i > k; this is the event in Equation (10). The core difficulties in bounding the probability of this event are the existential quantifiers over the continuous variables r and p. In both cases, we resolve this by using an appropriate discretization, to which we then apply the union bound. We get rid of the existential quantifier for r by dividing the interval [R_A, ∞), which covers the domain of r, into pieces of length at most R_A. More formally, we split the event ∃r ≥ R_A with the desired property into the disjoint events ∃r ∈ [jR_A, (j + 1)R_A) for j ∈ N_+. For a fixed j, r ≥ jR_A and ‖s_i − p‖ > ω_i·r imply ‖s_i − p‖ > ω_i·jR_A.
Moreover, r ≤ (j + 1)R_A and ‖s_1 − p‖ ≤ ω_1 r imply ‖s_1 − p‖ ≤ ω_1(j + 1)R_A. Note that this completely eliminates the variable r from the event, which lets us drop the existential quantifier for r. Thus, the event in Equation (10) implies the event in Equation (11). Note that the new existential quantifier for j is not an issue: as j is discrete, we can simply use the union bound and sum over the probabilities we obtain for the different values of j. We will later see that this sum is dominated by the first term, corresponding to j = 1.
To deal with the existential quantifier for p, fix j ∈ N_+. First note that ‖s_1 − p‖ ≤ ω_1(j + 1)R_A implies that p lies somewhat close to s_1. We discretize the space around s_1 using a grid such that the point p is guaranteed to lie inside a grid cell. By choosing the distance between neighboring grid vertices sufficiently small, we guarantee that p lies close to a grid vertex. Then, instead of considering p itself, we deal with its closest grid vertex. To define the grid formally, let ω_min = min_{i=k+1,…,n} ω_i be the minimum weight of the sites not in A and let x = ω_min jR_A / d^{1/p} (x will be the width of our grid cells). To simplify notation, assume that s_1 is the origin; otherwise, we can simply translate the grid defined in the following to be centered at s_1 to obtain the same result. Let Γ = {ℓx | ℓ ∈ Z ∧ |(ℓ − 1)x| ≤ ω_1(j + 1)R_A} be the set of all multiples of x that are not too large. We use the grid defined by the Cartesian product Γ^d. The following three properties of Γ^d are easy to verify.
(i) A point p with ‖s_1 − p‖ ≤ ω_1(j + 1)R_A lies in a grid cell.
(ii) The maximum distance between a point in a grid cell and its closest grid vertex is d^{1/p} x/2 = ω_min jR_A / 2.
(iii) Γ^d has at most c′_1 (ω_1 jR_A / x)^d = c_1 (ω_1 / ω_min)^d ≤ c_1 ω_1^d vertices, for constants c_1 and c′_1 depending only on d and p.
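To make the discretization concrete, the following Python sketch builds the grid Γ^d for assumed example parameters (d = 2, Euclidean norm, unit weights, j = 1, R_A = 0.1; all values are hypothetical) and checks properties (ii) and (iii) numerically.

```python
import itertools

def build_grid(d, p, omega1, omega_min, j, R_A):
    """Discretization grid Gamma^d around s1 (placed at the origin).

    Cell width x = omega_min * j * R_A / d**(1/p) guarantees that every
    point of a cell is within p-norm distance omega_min * j * R_A / 2 of
    its nearest grid vertex (property (ii))."""
    x = omega_min * j * R_A / d ** (1 / p)
    limit = omega1 * (j + 1) * R_A          # radius that must be covered
    ell_max = int(limit / x) + 1            # multiples of x up to the limit
    gamma = [ell * x for ell in range(-ell_max, ell_max + 1)]
    return x, list(itertools.product(gamma, repeat=d))

# Assumed example parameters: d = 2, Euclidean norm (p = 2), unit weights.
x, verts = build_grid(d=2, p=2, omega1=1.0, omega_min=1.0, j=1, R_A=0.1)
# Property (ii): the worst-case distance to the nearest vertex is half the
# p-norm cell diagonal, d**(1/p) * x / 2 = omega_min * j * R_A / 2 = 0.05.
worst = (2 * (x / 2) ** 2) ** 0.5
```

The number of vertices depends only on the weight ratio ω_1/ω_min (property (iii)), not on R_A, since both the grid extent and the cell width scale with R_A.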
Going back to the event in Equation (11), let p be a point with ‖s_1 − p‖ ≤ ω_1(j + 1)R_A and ‖s_i − p‖ > ω_i jR_A for all i > k. By the first inequality and Property (i), p lies in a grid cell of Γ^d. Let p′ ∈ Γ^d be the grid vertex with minimum distance to p. Then, by Property (ii), ‖p − p′‖ ≤ ω_min jR_A / 2. Thus, using the triangle inequality and ‖s_i − p‖ > ω_i jR_A, we obtain ‖s_i − p′‖ ≥ ‖s_i − p‖ − ‖p − p′‖ > ω_i jR_A − ω_min jR_A / 2 ≥ ω_i jR_A / 2. It follows that the event in Equation (11) implies the corresponding event for the grid vertex p′ with halved radii. For this event, we can now bound the probability. First note that ‖s_i − p′‖ > ω_i jR_A / 2 implies that the ball of radius ω_i jR_A / 2 around p′ does not contain s_i. By Lemma A.3, the volume of this ball intersected with [−0.5, 0.5]^d is at least min{1, c_2 (ω_i jR_A)^d} for a constant c_2 depending only on d and p. As the s_i are chosen independently, and using that 1 − x ≤ exp(−x) for 0 ≤ x ≤ 1, we obtain the desired bound for fixed j and p′. We resolve the two existential quantifiers for j and p′ using the union bound; recall from Property (iii) that the grid Γ^d contains only c_1 ω_1^d vertices. To conclude the proof, it remains to show that the sum over j is dominated by the first term, corresponding to j = 1. As x is positive in our case, the sum is bounded by a constant due to the convergence of the geometric series. This concludes the proof.

Now that we know the probability that A ⊆ S is relevant conditioned on R_A, we want to understand how R_A is distributed. The following lemma gives an upper bound on its density function.

Lemma 6.7. There exists a constant c depending only on k, d, and p, such that the density function f_{R_A}(x) of the random variable R_A satisfies the stated bound.

Proof. The density is the derivative of the CDF Pr[R_A ≤ x]; thus, we have to upper bound the slope of Pr[R_A ≤ x]. As before, we assume that A = {s_1, …, s_k} and that s_1 has minimum weight among the sites in A, i.e., min_{s_i ∈ A} ω_i = ω_1. Recall the definition of R_A in Equation (9). It follows directly that R_A ≤ x if and only if ‖s_1 − s_i‖ ≤ (ω_1 + ω_i)x for all i ∈ [k]. Note that this clearly holds for i = 1.
For i ≥ 2, this is the case if and only if s_i lies in the ball B_{s_1}((ω_1 + ω_i)x) of radius (ω_1 + ω_i)x around s_1. To simplify notation, we denote this ball by B_i(x) in the following. Note that the volume vol(B_i(x)) is exactly the probability for s_i to lie sufficiently close to s_1. As the positions of the different sites s_i are independent, we obtain Pr[R_A ≤ x] = ∏_{i=2}^{k} vol(B_i(x)). To upper bound the derivative of this product, we have to upper bound the growth of vol(B_i(x)) depending on x. For sufficiently small x, this volume is given by the volume of a ball in R^d.
For larger x, the growth of this volume slows down, due to the fact that our ground space is bounded. Thus, to get an upper bound on the derivative, we can simply use the volume of a ball in R^d. For appropriate constants c_1 and c_2 depending only on d and p, this bounds the growth of vol(B_i(x)), and the claimed bound on the density immediately follows.
By Lemma 6.6, we know the probability for a set A to be relevant conditioned on R_A, and by Lemma 6.7 we know how R_A is distributed. Based on this, we can bound the unconditional probability that A is relevant.

Lemma 6.8. Let A ⊆ S. For a constant c depending only on k, d, and p, the probability that A is relevant satisfies the stated bound.

Proof. Let A ⊆ S and let R_A be the random variable defined before; see Equation (9). Note that 0 ≤ R_A ≤ d^{1/p} (this holds for the torus as well as for the hypercube). By the law of total probability, combining Lemma 6.6 and Lemma 6.7, we obtain an integral expression
for constants c_1 and c_2 depending only on k, d, and p. Ignoring the factors independent of x for now, this expression has the form required by Lemma A.4, which we apply to bound the integral. As k is an integer, Γ(α) = Γ(k − 1) = (k − 2)!, which is constant. Thus, substituting α and β by their corresponding values and aggregating all constant factors into c yields exactly the bound we wanted to prove.
Having bounded the probability that a specific subset of sites A ⊆ S of size k is relevant, we can now bound the expected total number of relevant subsets. By Lemma 6.5, this also bounds the number of non-empty Voronoi regions.

Theorem 6.9. Let S be a set of n sites with minimum weight 1, total weight W, and random positions on the d-dimensional torus equipped with a p-norm, for constant d. For every fixed k, the expected number of regions of the weighted order-k Voronoi diagram of S is in O(W). The same holds for random sites in a hypercube.
Proof. For every subset A ⊆ S with |A| = k, let X_A be the indicator random variable that has value 1 if and only if A has a non-empty order-k Voronoi region. Moreover, let X be the sum of these random variables. Note that E[X] is exactly the quantity we are interested in. Due to Lemma 6.5, a subset A with non-empty Voronoi region is also relevant. Thus, using linearity of expectation, E[X_A] ≤ Pr[A is relevant], and Lemma 6.8 yields an upper bound on E[X]. For technical reasons, we assume c to be the maximum of 1 and the constant from Lemma 6.8. We continue by proving the following claim: the sum of Pr[A is relevant] over all A ⊆ S with |A| = k is at most 4^{k²} cW (Equation (13)). In addition to implying the theorem, this claim specifies a constant factor that comes on top of c, which is crucial for the rest of the proof.
We first prove the claim for the situation in which W is not dominated by the k highest weights; afterwards, we deal with the other, somewhat special case. More formally, let the weights w_1, …, w_n be sorted increasingly and consider the case that Σ_{i=1}^{n−k} w_i ≥ 4^{−k} W, i.e., if we leave out the k largest weights, we still retain a significant portion of the total weight. We can use this to estimate the denominator in Equation (12). To bound the fraction by W, observe that the binomial theorem yields the required inequality, as each summand on the right-hand side also appears on the left-hand side. This proves the claim in Equation (13) in this case.

For the remaining case, Σ_{i=1}^{n−k} w_i < 4^{−k} W, assume for contradiction that the claim in Equation (13) does not hold for every set of n weights. Then there exists a minimum counterexample, i.e., a smallest number n of weights such that the expected number of non-empty regions exceeds 4^{k²} cW. We show that, based on this assumption, we can construct an even smaller counterexample; a contradiction. First note that n > 2k for every counterexample, as there are fewer than 4^{k²} cW subsets otherwise (recall that c ≥ 1). Now let w_1, …, w_n be the minimum counterexample and again assume that the weights are ordered increasingly. Moreover, fix the coordinates of the sites s_1, …, s_n and consider two order-k Voronoi diagrams: one on the set of all sites S = {s_1, …, s_n}, and one on all but the k heaviest sites S′ = {s_1, …, s_{n−k}} (note that this is well defined as n > 2k). In the following, we call the former Voronoi diagram V and the latter V′. We define a mapping from the non-empty regions of V to non-empty regions of V′. Let A ⊆ {s_1, …, s_n} be a subset of size k with a non-empty region in V and let p be an arbitrary point in this region. Moreover, let A′ be the set of sites corresponding to the region of V′ containing p. Then we map the region of A to the region of A′. Note that A and A′ share all sites that have not been deleted: A ∩ A′ = A ∩ S′.
Thus, any set A that is mapped to A′ must satisfy A ⊆ A′ ∪ (S \ S′). This limits the number of different regions in V that are mapped to the same region of V′ to at most 4^k. Thus, the number of regions in V′ is at least 4^{−k} times the number of regions in V. As this holds for arbitrary coordinates, it also holds for the expected number of non-empty regions when choosing random coordinates.
As we assumed w_1, …, w_n to be a counterexample for Equation (13), the expected number of regions with these weights is more than 4^{k²} cW. Thus, by the above argument, the expected number of regions for the weights w_1, …, w_{n−k} is at least 4^{−k} · 4^{k²} cW. As we consider the case Σ_{i=1}^{n−k} w_i < 4^{−k} W, we can substitute W to obtain that the weights w_1, …, w_{n−k} lead to more than 4^{k²} c Σ_{i=1}^{n−k} w_i non-empty regions in expectation. Thus, the weights w_1, …, w_{n−k} also form a counterexample for the claim in Equation (13), which contradicts the assumption that w_1, …, w_n is the minimum counterexample, and thus the assumption that there is a counterexample at all.
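As a sanity check (not part of the proof), the number of non-empty regions of a multiplicatively weighted order-k Voronoi diagram can be estimated by random probing. The parameters below (n = 30 sites on the unit 2-torus, k = 2, unit weights, 2000 probe points) are hypothetical; probing only discovers regions, so the count is a lower bound on the true number.

```python
import random

def torus_dist(a, b):
    """Euclidean distance on the unit 2-torus."""
    return sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(a, b)) ** 0.5

def count_regions(sites, weights, k, samples=2000, rng=random):
    """Estimate the number of non-empty order-k regions by random probing:
    a point p lies in the region of the k sites minimizing dist/weight."""
    regions = set()
    for _ in range(samples):
        p = (rng.random(), rng.random())
        order = sorted(range(len(sites)),
                       key=lambda i: torus_dist(sites[i], p) / weights[i])
        regions.add(frozenset(order[:k]))
    return len(regions)

rng = random.Random(0)
n, k = 30, 2
sites = [(rng.random(), rng.random()) for _ in range(n)]
weights = [1.0] * n                    # unit weights, so W = n
num_regions = count_regions(sites, weights, k, rng=rng)
```

With unit weights, W = n, so Theorem 6.9 predicts O(n) non-empty regions rather than the worst-case Θ(n^k) subsets.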

Geometric SAT with Non-Zero Temperature
In the case with temperature T = 0, we used the fact that every clause contains the k variables with smallest weighted distance; recall Section 4.2. This is no longer true for higher temperatures: for T > 0, a clause can, in principle, contain any variable. However, the probability of containing a variable that is far away is rather small. In the remainder of this section, we show that a constant fraction of the clauses actually behaves just like in the T = 0 case, i.e., these clauses contain the k closest variables. With this, we can then apply the argument outlined in Section 4.2.

Expected Number of Nice Clauses
Recall that a clause c is generated by drawing k variables without repetition, with probabilities proportional to the connection weights. We call c nice if the ith variable drawn for c has the ith highest connection weight with c, i.e., c not only contains the k variables with highest connection weight, but they are also drawn in descending order. This is a slightly stronger property than just requiring c to contain the k variables with lowest weighted distance.
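The probability of being nice can be estimated empirically. The following sketch assumes a fixed clause with hypothetical heavy-tailed connection weights and repeatedly draws k = 3 variables proportionally to these weights without repetition, counting how often they come out in descending weight order.

```python
import random

def draw_clause(weights, k, rng):
    """Draw k distinct variables with probabilities proportional to the
    (assumed) connection weights, as in the model's clause sampling."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r < acc:
                chosen.append(i)
                remaining.remove(i)
                break
    return chosen

def is_nice(chosen, weights):
    """Nice: the k heaviest connection weights, drawn in descending order."""
    top = sorted(range(len(weights)), key=lambda i: -weights[i])[:len(chosen)]
    return chosen == top

rng = random.Random(1)
# Hypothetical heavy-tailed connection weights for one clause position.
weights = [1 / (i + 1) ** 2 for i in range(50)]
trials = 5000
nice = sum(is_nice(draw_clause(weights, 3, rng), weights) for _ in range(trials))
p_nice = nice / trials
```

For these weights, the exact probability is the product of w_(j) over the remaining total weight for j = 1, 2, 3 (about 0.07), a constant independent of the number of variables, matching the Ω(1) bound shown below.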
Let x̄ be the connection weight of a variable v that has a rather high connection weight with c. To show that the probability for v ∈ c is reasonably high, we prove that x̄ is large compared to the sum of connection weights over all variables with smaller weight. The following lemma bounds this sum for a given x̄. We use the Iverson bracket to exclude the variables with weight larger than x̄ from the sum, i.e., [X(c, v) ≤ x̄] evaluates to 1 if X(c, v) ≤ x̄ and to 0 otherwise.

Lemma 7.1. Let c be a clause at any position and let V be a set of n weighted variables with random positions in T^d. For T < 1 and x̄ ∈ Ω(W^{1/T}), the expected sum of connection weights smaller than x̄ is in O(x̄).

Proof. Using linearity of expectation, the term in the lemma's statement equals the sum over the expectations E[X(c, v) · [X(c, v) ≤ x̄]]. To bound this expectation, we consider the three events X(c, v) ≤ (2^d w_v)^{1/T}, (2^d w_v)^{1/T} < X(c, v) < x̄, and x̄ ≤ X(c, v). Note that [X(c, v) ≤ x̄] is 0 in the last event and 1 in the former two. We bound the first term from above by assuming X(c, v) = (2^d w_v)^{1/T} whenever X(c, v) ≤ (2^d w_v)^{1/T}, and then apply the CDF of X(c, v) in Equation (2). For the second term, we have to integrate over the probability density function (PDF) f_X(x) of the connection weights X(c, v), which is the derivative of F_X(x) in Equation (2). Thus, f_X(x) = T Π_{d,p} w_v x^{−T−1} for x ≥ (2^d w_v)^{1/T}, and we can evaluate the resulting integral for T < 1. Putting these bounds together, the first term is in O(x̄), as x̄ ∈ Ω(W^{1/T}) implies W^{1/T} ∈ O(x̄). The second term is also in O(x̄), as x̄ ∈ Ω(W^{1/T}) implies W ∈ O(x̄^T). Thus, this yields the claimed bound of O(x̄).
This lets us show that each clause is nice with constant probability. The only assumption we need for this is the fact that no single weight is too large, i.e., every weight w i has to be asymptotically smaller than the total weight W .
Theorem 7.2. Let Φ be a random formula drawn from the weighted geometric model with ground space T^d equipped with a p-norm, with temperature T < 1, and with w_v/W ∈ o(1) for v ∈ V. Let c be a clause of Φ. Then c is nice with probability Ω(1).
Proof. We prove two things. First, we show that, with probability Ω(1), there are at least k variables sufficiently close to c to have connection weight Ω(W^{1/T}). Second, we use Lemma 7.1 to show that the k variables with highest connection weight are chosen for c with constant probability (in descending order).
For the first part, we show that there is a constant a such that, with constant probability, at least k variables have connection weight at least aW^{1/T}. For a fixed variable v, we can use the CDF of X(c, v) (Equation (2)) to compute the probability of this event. Note that this is a valid probability, as w_v/W ∈ o(1) implies that it is below 1. For an appropriate choice of a, the expected number of variables with connection weight at least aW^{1/T} is 2k. As the connection weights of the different variables are independent, we can apply the Chernoff–Hoeffding bound in Theorem A.7 to obtain that at least k variables have connection weight at least aW^{1/T} with constant probability.

For the second part of the proof, let x̄ be the connection weight of the kth closest variable. By the argument above, we can assume x̄ ∈ Ω(W^{1/T}) with constant probability, which lets us apply Lemma 7.1. To do so, consider the experiment of drawing the first variable for our clause c. Let v be the variable that maximizes the connection weight X(c, v). The probability of drawing v equals X(c, v) divided by the sum of all connection weights. By Lemma 7.1, the sum of all connection weights smaller than x̄ is in O(x̄). Thus, the sum of all connection weights is in O(X(c, v)), which implies that v is chosen with constant probability. As we draw variables without repetition, the exact same argument applies to the second closest variable, and so on. Thus, the probability that c contains the k closest variables, drawn in order of descending connection weights, is at least a constant, provided there are k sufficiently close variables. As the latter holds with constant probability, c is nice with constant probability.
By linearity of expectation, this immediately yields the following bound on the expected number of nice clauses.

Corollary 7.3. Let Φ be a random formula with m clauses drawn from the weighted geometric model with ground space T^d equipped with a p-norm, with temperature T < 1, and with w_v/W ∈ o(1) for v ∈ V. The expected number of nice clauses in Φ is Θ(m).

Concentration of Nice Clauses
We show that the number of nice clauses is concentrated around its expectation, i.e., with high probability, a constant fraction of clauses is nice. Our main tool for this will be the method of typical bounded differences [61]; see Section A.6.2. To this end, we consider several random variables, e.g., the coordinates of clauses and variables, that together determine the whole process of generating a random formula. The number of nice clauses is then a function f of these random variables and its expectation is Θ(m), due to Corollary 7.3. Roughly speaking, the method of bounded differences then states that the probability that f deviates too much from its expectation is low if changing a single random variable only slightly changes f .

The Random Variables
So far, we viewed the generation of a random formula as a two-step process: first, sample coordinates for the variables and clauses; second, sample the variables contained in each clause based on their distances. The first step can easily be expressed via random variables. Let V_1, …, V_n and C_1, …, C_m be the coordinates of the n variables and m clauses, respectively. Though the second step heavily depends on the distances determined by the first, we can determine all random choices in advance. For all i ∈ [m] and j ∈ [k], let X_i^j be a random variable uniformly distributed in [0, 1). The variable X_i^j determines the jth variable of the ith clause c_i in the following way. We partition the interval [0, 1) such that each variable v not already chosen for c_i corresponds to a subinterval of length proportional to the connection weight X(c_i, v). We order these subintervals by length such that the largest interval comes first. The jth variable of c_i is then the variable whose interval contains X_i^j. Note that this samples k different variables for each clause, with probabilities proportional to the connection weights X(c_i, v). Note further that the whole generation process of a random formula is determined by evaluating the independent random variables V_1, …, V_n, C_1, …, C_m, X_1^1, …, X_m^k.

To formalize the concept of nice clauses in this context, we require some more notation. For i ∈ [m], let V_i be the sequence of all variables ordered decreasingly by connection weight with the clause c_i. Moreover, let V_i[a, b] denote the subsequence from the ath to the bth variable in this sequence, including the boundaries. To simplify notation, we abbreviate the unique element of V_i[a, a] with V_i[a]. Recall that clause c_i is nice if, in each of the k steps, we choose the variable with the highest connection weight that has not been chosen before.
With respect to the random variables, this happens if, for each j ∈ [k], X_i^j is smaller than the connection weight of V_i[j] divided by the sum of the connection weights of the remaining variables V_i[j, n]. We thus define the indicator variable N_i = [∀j ∈ [k] : X_i^j < X(c_i, V_i[j]) / Σ_{v ∈ V_i[j,n]} X(c_i, v)] (Equation (16)), which is 1 if and only if the ith clause is nice. With this, we can define the number of nice clauses as f(V_1, …, V_n, C_1, …, C_m, X_1^1, …, X_m^k) = Σ_{i ∈ [m]} N_i.
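The coupling can be implemented directly. The following sketch (with hypothetical connection weights and coin values) maps each uniform coin to a variable via the length-ordered interval partition; a clause is nice exactly when every coin falls into the first interval of its step.

```python
def pick_variable(u, weights, excluded):
    """Map a uniform draw u in [0, 1) to a variable: the remaining
    variables get subintervals of [0, 1) with lengths proportional to
    their connection weights, ordered by decreasing length."""
    remaining = sorted((i for i in range(len(weights)) if i not in excluded),
                       key=lambda i: -weights[i])
    total = sum(weights[i] for i in remaining)
    acc = 0.0
    for i in remaining:
        acc += weights[i] / total
        if u < acc:
            return i
    return remaining[-1]  # guard against floating-point rounding

# Hypothetical connection weights of one clause to three variables.
weights = [0.5, 0.3, 0.2]
coins = [0.3, 0.5, 0.9]  # the coins X_i^1, X_i^2, X_i^3
chosen = []
for u in coins:
    chosen.append(pick_variable(u, weights, set(chosen)))
# Each coin lands in the first (largest) interval, so the variables are
# drawn in descending weight order: the clause is nice.
```

Here every coin X_i^j satisfies the inequality from Equation (16), i.e., it is below the jth-largest weight divided by the total weight of the still-available variables.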

Bounding the Effect on the Number of Nice Clauses
To apply the method of bounded differences (Theorem A.9 or the more specific Corollary A.10), we have to bound the effect that changing the value of a single random variable has on f. For the variables C_1, …, C_m, this is easy: changing C_i moves the position of the clause c_i, which only makes a difference for c_i. Thus, the number of nice clauses changes by at most 1. Similarly, changing X_i^j only impacts the clause c_i, which implies that it changes the number of nice clauses by at most 1.
For the variables V_1, …, V_n, one can actually construct situations in which changing only a single position drops f from m to 0. There are basically two situations in which this can happen. First, if a single variable is close to many clauses, changing its position potentially impacts many clauses. Second, if many inequalities in Equation (16) are rather tight, then moving a single variable slightly closer to many clauses can increase the denominator on the right-hand side by enough to change N_i for many clauses. We exclude both situations by defining unlikely bad events. Assuming these bad events do not happen, we can bound the effect of moving a single variable v by a quantity δ_v. The following bound gives a simpler estimate for δ_v that will be useful later.

Lemma 7.4. For w_v ∈ O(n^{1−ε}) with an arbitrary constant ε > 0, it holds that δ_v ∈ O(√(w_v n) / log n).

Proof. We ignore logarithmic factors and show that δ_v / √(w_v n) converges polynomially to 0 for n → ∞. As logarithmic factors grow more slowly than any polynomial, this proves the claim. Rearranging the exponents yields δ_v / √(w_v n) = (w_v / n)^c for a positive constant c. As w_v ∈ O(n^{1−ε}), this yields the claim.
The following lemma states that, with overwhelming probability, no point (and therefore no variable) is too close to too many clauses. This eliminates the first problematic situation (and will also help with the second). Note that this statement only assumes random clause positions and holds for arbitrary variable positions, i.e., when moving a variable, we can assume that it holds before and after the movement.

Proof. As there are uncountably many points p, it is hard to argue about them directly. Thus, we first reduce the statement to one about finitely many positions, namely the positions of the clauses. Then it remains to show the statement for these positions.

Consider a fixed point p. As B_p(r) has diameter 2r, the pairwise distance between clauses in B_p(r) is at most 2r. Thus, if there exists a point p such that B_p(r) contains too many clauses, then there exists a clause that has too many other clauses at distance at most 2r. Hence, it suffices to show that, for every clause c ∈ C, the number of clauses at distance at most 2r from c is in O(δ_v).
Let c_0 be a fixed clause (we later apply the union bound over all clauses). We want to bound the probability for another clause c to be closer than 2r to c_0. For this, we use the CDF of the distance in Equation (1). Note that the restriction of Equation (1) to the interval [0, 0.5] is not an issue here, as w_v ∈ O(n^{1−ε}) implies r ∈ o(1) and thus 2r ≤ 0.5. As there are m ∈ O(n) clauses, the expected number of clauses with distance at most 2r to c_0 is already within the claimed bound of O(δ_v). As 0 < T < 1, this upper bound grows polynomially in n. Thus, by the Chernoff–Hoeffding bound in Corollary A.8, it holds asymptotically with overwhelming probability. Applying the union bound over all O(n) clauses yields the claim.
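The reduction from arbitrary points to clause positions can be illustrated numerically: if any ball B_p(r) contained many clauses, some clause would have many others within distance 2r, so the maximum 2r-neighborhood size over the clauses controls the densest ball. The sketch below uses hypothetical parameters (200 random clause positions on the unit 2-torus, r = 0.05).

```python
import random

def torus_dist(a, b):
    """Euclidean distance on the unit 2-torus."""
    return sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(a, b)) ** 0.5

def max_neighborhood(points, r):
    """For each point c0, count the other points within distance 2r.
    Any ball B_p(r) holding many points forces some point to see many
    others within 2r, so this maximum reflects the densest ball."""
    return max(sum(1 for c in points
                   if c is not c0 and torus_dist(c0, c) <= 2 * r)
               for c0 in points)

rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(200)]
dense = max_neighborhood(points, r=0.05)
```

For uniform positions, the expected neighborhood size is the ball area times the number of points (roughly 6 here), and the maximum over all points concentrates not far above that, matching the Chernoff-based argument.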
The above lemma is stated in terms of distances. In the following, it will be useful to think of it in terms of connection weights instead. The following lemma translates the radius r in Lemma 7.5 to the corresponding connection weight between a clause and a variable at distance r.

Lemma 7.6. Let v ∈ V be a variable and let c ∈ C be a clause at the distance considered in Lemma 7.5.

Proof. Using the definition of the connection weight and inserting the above distance, we obtain the stated connection weight.

Combining Lemma 7.5 and Lemma 7.6, we obtain that, for arbitrary variable positions (and random clause positions), no variable has a high connection weight to too many clauses, as summarized by the following corollary.
Corollary 7.7. Let m ∈ O(n), 0 < T < 1, and w_v ∈ O(n^{1−ε}) for every v ∈ V and an arbitrary constant ε > 0. With overwhelming probability, for every variable v and every possible position of v, the number of clauses with connection weight above the corresponding threshold is in O(δ_v).

For the second problematic situation mentioned above, consider, for a clause c_i, the k inequalities in Equation (16). We call c_i v-critical if, for one of these inequalities, the difference between the left- and right-hand side is at most δ_v/n. In the following lemma, we first bound the number of v-critical clauses. Afterwards, we show that the concept of critical clauses works as intended, in the sense that moving the variable v only changes the niceness status of v-critical clauses.

Lemma 7.8. With overwhelming probability, the number of v-critical clauses is in O(δ_v).

Proof. A clause c_i can only be v-critical if one of the random variables X_i^j for j ∈ [k] differs by at most δ_v/n from the right-hand side of the corresponding inequality in Equation (16). The probability for this to happen for a single X_i^j is 2δ_v/n. As k is constant, c_i is v-critical with probability O(δ_v/n). Thus, as m ∈ O(n), the expected number of v-critical clauses is in O(δ_v). As the event of being v-critical is independent for the different clauses, and as this bound is polynomial in n for T > 0 (see Equation (17)), the Chernoff–Hoeffding bound in Corollary A.8 yields the claim.
To prove that the movement of a single variable does not change the niceness status of too many clauses, we argue along the following lines. Let v be the variable we move and consider a clause c. If, before or after the movement, v is so close to c that we get a very high connection weight X(c, v), we basically give up on c and assume that c changes its status (from being nice to not being nice or the other way round). By Corollary 7.7, this only happens for at most O(δ_v) clauses. Similarly, if c is v-critical, we also give up on c, which happens for at most O(δ_v) clauses by Lemma 7.8. It then remains to show that in all other cases (i.e., when X(c, v) is low before and after the movement and c is not v-critical), the status of c remains unchanged. This is done as follows. As c is not v-critical, the difference between the right- and left-hand side of the inequality in Equation (16) is somewhat large. Thus, if moving v does not change the right-hand side by too much, then c keeps its niceness status. To show this, we use the fact that X(c, v) is low before and after the movement, so it cannot change by too much. This change of X(c, v) has to be considered relative to the other connection weights, i.e., changing X(c, v) has less impact if there are other variables with higher connection weight. The following lemma establishes that these other variables with higher connection weight indeed exist.

Lemma 7.9. Let w_v ∈ O(n^{1−ε}) for every v ∈ V and an arbitrary constant ε > 0. With overwhelming probability, every clause has k variables with connection weight at least W^{1/T} log^{−2/T} n.

Proof. Let x_0 = W^{1/T} log^{−2/T} n be the above connection weight and let c be a clause with fixed position. For every variable v, the probability for X(c, v) ≥ x_0 is Π_{d,p} w_v x_0^{−T} = Π_{d,p} w_v W^{−1} log² n by Equation (2). Note that we can apply Equation (2), as x_0 ≥ (2^d w_v)^{1/T} due to the condition w_v ∈ O(n^{1−ε}) and the fact that W ≥ n.
Summing this over all variables yields that the expected number of variables with connection weight at least x_0 is Π_{d,p} log² n. By Corollary A.8, c has Ω(log² n) variables with connection weight at least x_0 with overwhelming probability. Applying the union bound over all clauses and using the fact that k is constant while log² n grows with n yields the claim.

Now we are ready to bound the effect that moving a single variable has on the number of nice clauses (Lemma 7.10). After giving up on the clauses above, every remaining clause c is not v-critical, and X(c, v) stays below the threshold from Corollary 7.7 before and after the movement. In the following, we show that a clause c with these two properties is nice after the movement if and only if it is nice before the movement.
We first observe that v does not belong to the k variables closest to c, due to Lemma 7.9: with overwhelming probability, there are k variables with connection weight at least W^{1/T} log^{−2/T} n, which is asymptotically larger than X(c, v), as w_v ∈ O(n^{1−ε}).
Thus, on the right-hand side of the inequality in Equation (16), the connection weight X(c, v) only appears in the denominator. To show that the right-hand side does not change by too much, let x be the numerator, let y be the denominator before the movement, and let y′ be the denominator after the movement. Note that |y′ − y| is exactly the change in X(c, v) caused by the movement of v. With this, the right-hand side of the inequality in Equation (16) changes by x|y′ − y|/(yy′). Note that x (the numerator) is the connection weight of one variable, which also appears in the sum of the denominator (before and after the movement). Thus, x/y′ ≤ 1 and the above change is upper bounded by |y′ − y|/y. Note that the upper bound on X(c, v) holds before and after the movement, and thus X(c, v) can only change by less than this upper bound, which in turn bounds |y′ − y|. Moreover, y is the sum of multiple connection weights, including the weight of one of the k closest variables. Thus, by Lemma 7.9 and the fact that W ≥ n, we can assume that y ≥ n^{1/T} log^{−2/T} n. Putting this together yields that the movement changes the right-hand side of the inequality in Equation (16) by less than δ_v/n. As c is not v-critical, the difference between the left- and right-hand side of the inequality in Equation (16) is at least δ_v/n before the movement. Thus, the clause c is nice after the movement if and only if it was nice before.
With this we are ready to prove concentration using the method of typical bounded differences.
Theorem 7.11. Let Φ be a random formula with n variables and m ∈ Θ(n) clauses drawn from the weighted geometric model with ground space T^d equipped with a p-norm, with temperature 0 < T < 1, with W ∈ O(n), and with w_v ∈ O(n^{1−ε}) for every v ∈ V and an arbitrary constant ε > 0. With high probability, Θ(m) clauses are nice.
Proof. We want to apply Corollary A.10. As defined in Section 7.2.1, the random variables are the variable positions V_1, …, V_n, the clause positions C_1, …, C_m, and the coin flips X_1^1, …, X_m^k, and the function f is the number of nice clauses. For N = n + m + km, note that |f(X)| ≤ m ≤ N.
For the nice event Γ, we assume that the statement from Lemma 7.10 holds. Due to Lemma 7.10, the probability for this is Pr[Γ] ≥ 1 − N^{−c} for any constant c and sufficiently large N. Thus, when choosing c ≥ 3, we satisfy the condition |f(X)| ≤ N^{c−2} of Corollary A.10.
Now we have to bound the change of f when changing only one of the random variables, assuming we start with an event in Γ, i.e., we have to determine the ∆_i from Corollary A.10. As mentioned before, changing a clause position C_i or one of the X_i^j impacts only one clause and thus changes f by at most 1. Moreover, as we start with a configuration satisfying Lemma 7.10, f changes by only O(δ_v) when changing the position of a variable v. Due to Lemma 7.4, we have δ_v ∈ O(√(w_v n)/log n), which bounds the sum in Corollary A.10. As E[f] ∈ Θ(m) = Θ(n), this is exactly the bound required by Corollary A.10, and thus the number of nice clauses is in Θ(m) with high probability.

Putting Things Together
Now we are ready to prove our main theorem for the geometric model.
Theorem 7.12. Let Φ be a formula with n variables and m ∈ Θ(n) clauses drawn from the weighted geometric model with ground space T^d equipped with a p-norm, temperature T < 1, W ∈ O(n), and w_v ∈ O(n^{1−ε}) for every v ∈ V and any constant ε > 0. Then Φ asymptotically almost surely contains an unsatisfiable subformula of constant size, which can be found in O(n log n) time.
Proof. Let m′ be the number of clauses in Φ that consist of the k variables with minimum weighted distance. By Theorem 7.11, we have m′ ∈ Θ(m) = Θ(n). In the following, we consider only these clauses. Consider the weighted order-k Voronoi diagram of the n variables and let n′ be the number of non-empty regions. By Theorem 6.9 and due to W ∈ O(n), we have E[n′] ∈ O(n). Moreover, it follows from Markov's inequality that n′ ≤ n log n holds asymptotically almost surely: Pr[n′ ≥ n log n] ≤ E[n′]/(n log n) ∈ O(1/log n).
Now, determining the k variables of a clause c is equivalent to observing which region of the order-k Voronoi diagram contains c, or, more precisely, which k variables define this region. Thus, choosing random positions for the clauses is like throwing m′ balls into n′ (non-uniform) bins. Hence, if m′ ∈ Ω(n′/polylog n′), we can apply Corollary A.5. With the above bounds, which hold asymptotically almost surely, it is not hard to see that this condition in fact holds: if n′ ≤ n, it clearly holds, as m′ ∈ Ω(n). Otherwise, we have n < n′ ≤ n log n, which implies n ≥ n′/log n, and thus m′ ∈ Ω(n′/log n).
Applying Corollary A.5 tells us that, asymptotically almost surely, there is a bin with a superconstant number of balls. In other words, there is a superconstant number of clauses that share the same set of k variables. For sufficiently large n, this number exceeds 2^k, which implies an unsatisfiable subformula consisting of only 2^k clauses. Clearly, it can be found in O(n log n) time by sorting the clauses lexicographically with respect to the contained variables.
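The final step can be sketched as follows: group the clauses by their variable set and look for a set of k variables that carries all 2^k sign patterns; such a group is unsatisfiable, since every assignment falsifies the clause whose literals it sets entirely to false. The clause encoding (signed integers) and the toy formula below are assumptions for illustration.

```python
from collections import defaultdict

def find_unsat_core(clauses, k):
    """Group clauses by their variable set; a set of k variables carrying
    all 2**k sign patterns yields an unsatisfiable subformula of 2**k
    clauses. Grouping via sorting/hashing runs in O(n log n)."""
    groups = defaultdict(set)
    for clause in clauses:                       # clause: tuple of signed ints
        groups[frozenset(abs(l) for l in clause)].add(clause)
    for var_set, group in groups.items():
        patterns = {tuple(l > 0 for l in sorted(clause, key=abs))
                    for clause in group}
        if len(patterns) == 2 ** k:              # all sign patterns present
            return sorted(var_set)
    return None

# Toy example: all four sign patterns on variables {1, 2} -> unsatisfiable.
clauses = [(1, 2), (1, -2), (-1, 2), (-1, -2), (3, 4)]
core_vars = find_unsat_core(clauses, k=2)
```

In the geometric model, the superconstant number of clauses sharing one variable set guarantees (with random signs) that all 2^k patterns appear among them for large n.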
Proof. Since j −1/(β−1) is monotonically decreasing, it holds that . This proves all statements of the lemma.

A.2 CDF of Connection Weights in the Geometric Model
The CDF F_X(x) of the connection weights X(c, v) in the geometric SAT model satisfies the following lemma.
Proof. Inserting the definition of the connection weight and rearranging slightly yields As c and v are two random points, we can use the CDF for the distances between random points in Equation (1) to obtain , which concludes the proof.

A.3 Volume of Balls in a Hypercube
We are regularly concerned with the asymptotic behavior of a ball's volume depending on its radius. The following lemma helps us to deal with the edge case, where the ball stretches beyond the boundary of our ground space.
Lemma A.3. Let H be a d-dimensional unit-hypercube in R d equipped with a p-norm. There exists a constant c > 0 such that, for every p ∈ H and r > 0, the intersection of H with the ball B p (r) of radius r around p has volume at least min{1, cr d }.
Proof. In the following, we assume H = [−0.5, 0.5]^d (rather than [0, 1]^d), as it makes the proof more convenient. If r is sufficiently small, then B_p(r) is completely contained in H, and the claim follows from the fact that the volume of a ball with radius r in d-dimensional space is proportional to r^d. It remains to prove that the parts of B_p(r) outside of H are asymptotically not relevant. Let p_1, …, p_d be the coordinates of p and assume without loss of generality that p lies in the all-negative orthant, i.e., p_i ≤ 0 for i ∈ [d]. We prove the claim by defining a box B with the following three properties. First, the box B has volume proportional to r^d. Second, B is a
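The lower bound of the lemma is easy to check numerically. The following is an illustrative Monte Carlo sketch (not part of the proof; the function name and parameters are hypothetical), using the unit hypercube [0, 1]^d: even in the worst case of a corner point, where only a 1/2^d fraction of the ball lies inside H, the intersection volume is still proportional to r^d.

```python
import random

def ball_volume_in_cube(p, r, norm_p, samples, rng):
    """Estimate vol(H ∩ B_p(r)) for H = [0, 1]^d under the p-norm by
    sampling uniform points of H and counting those within distance r of p."""
    hits = 0
    for _ in range(samples):
        x = [rng.random() for _ in range(len(p))]
        dist = sum(abs(xi - pi) ** norm_p for xi, pi in zip(x, p)) ** (1 / norm_p)
        if dist <= r:
            hits += 1
    return hits / samples

rng = random.Random(7)
# Corner point, d = 2, Euclidean norm, r = 0.5: the intersection is a
# quarter disk of area (pi/4) r^2 ≈ 0.196 -- a constant times r^d.
print(ball_volume_in_cube([0.0, 0.0], 0.5, 2, 100_000, rng))
# Center point: the full disk of area pi r^2 ≈ 0.785 fits inside H.
print(ball_volume_in_cube([0.5, 0.5], 0.5, 2, 100_000, rng))
```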

A.5 Balls Into Heterogeneous Bins
Consider throwing m balls into n uniform bins, i.e., for each ball we draw one of the n bins uniformly at random and place the ball into the drawn bin. The maximum load L is the random variable that describes the maximum number of balls that are together in the same bin. From the analysis by Raab and Steger [55, Theorem 1], we immediately get the following corollary.
Corollary A.5 ([55], Theorem 1). Throw m balls into n uniform bins and let L be the maximum load. If m ∈ Ω(n/polylog n), then L ∈ Ω(log n/log log n) asymptotically almost surely.

Now assume we have non-uniform bins, i.e., the probability for each ball to end up in the ith bin is p_i with ∑_i p_i = 1. Intuitively, Corollary A.5 should still hold in this setting, as increasing the probability of some bins only makes it more likely that a bin gets many balls. Making this argument formal yields the following theorem.
Theorem A.6. Corollary A.5 also holds for non-uniform bins.
Proof. Let B = [n] be the set of all bins and let B′ be the subset of bins with probability at least 1/(2n). These are the bins whose probability either increased or decreased by a factor of at most 2. Without loss of generality, let B′ = [n′]. Note that the probability for a ball to land in a bin of B′ is at least a constant, as every bin not in B′ has probability at most 1/(2n), so the bins outside B′ have total probability at most 1/2. Thus, by the Chernoff-Hoeffding bound in Corollary A.8, a constant fraction of the balls end up in a bin of B′ with high probability. We make a case distinction on how large n′ is.
First, assume n′ ≤ m/log n. Then, with high probability, we end up with Θ(m) balls in at most m/log n bins, which means that at least one bin contains Ω(log n) balls. Thus, clearly L ∈ Ω(log n/log log n).
Second, assume n′ > m/log n. Recall that each bin in B′ has probability at least 1/(2n). We consider the alternative experiment where, for every ball, each bin in B′ has probability exactly 1/(2n) to get the ball. Balls not landing in B′ are discarded. Let L′ denote the maximum number of balls that share a bin in B′. Clearly, we can couple the two experiments such that L ≥ L′ holds in every outcome. It remains to show that L′ ∈ Ω(log n/log log n). For this, let m′ be the number of balls ending up in B′. Note that m′ is a random variable. However, if we condition on m′, then we are back to the normal homogeneous balls-into-bins setting, except that we throw m′ balls into n′ bins. If we show that m′ ∈ Ω(n′/polylog n′), then Corollary A.5 tells us that L′ ∈ Ω(log n′/log log n′). First note that this is sufficient for our purpose: as n′ > m/log n and m ∈ Ω(n/polylog n), we get

log n′/log log n′ > (log m − log log n)/log(log m − log log n) ∈ Ω((log n − log polylog n − log log n)/log(log n − log polylog n − log log n)) ⊆ Ω(log n/log log n).
It remains to show that m′ ∈ Ω(n′/polylog n′) so that we can actually apply Corollary A.5. To do so, recall that B′ has n′ bins, each with probability 1/(2n). Thus, the probability that a single ball lands in B′ is n′/(2n), which shows that E[m′] = m n′/(2n). As n′ is almost m (up to logarithmic factors) and m is almost n, this expectation is almost linear in n. Thus, by the Chernoff-Hoeffding bound in Corollary A.8, we can assume that m′ ∈ Θ(m n′/n) holds with high probability. Using that n′ > m/log n and m ∈ Ω(n/polylog n), we obtain m n′/n > m²/(n log n) ∈ Ω(n/polylog n).
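The intuition behind Theorem A.6, that skewing the bin probabilities only increases the maximum load, is easy to observe empirically. The following is an illustrative simulation sketch (the chosen skewed distribution and all parameters are hypothetical, not from the paper):

```python
import random

def max_load(m, probs, rng):
    """Throw m balls into len(probs) bins, where ball lands in bin i
    with probability probs[i]; return the maximum load."""
    load = [0] * len(probs)
    for b in rng.choices(range(len(probs)), weights=probs, k=m):
        load[b] += 1
    return max(load)

rng = random.Random(0)
n, m, trials = 1000, 1000, 20
uniform = [1 / n] * n
# Put half the probability mass on 10% of the bins.
skewed = [5 / n] * (n // 10) + [0.5 / (0.9 * n)] * (n - n // 10)

avg_uni = sum(max_load(m, uniform, rng) for _ in range(trials)) / trials
avg_skew = sum(max_load(m, skewed, rng) for _ in range(trials)) / trials
print(avg_uni, avg_skew)  # the skewed bins see a larger maximum load
```

With m = n, the uniform maximum load concentrates around log n / log log n (Corollary A.5), while the skewed instance consistently produces a larger one, in line with the coupling argument above.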
A.6.2 Method of Typical Bounded Differences

Theorem A.9 (Theorem 2 in [61]). Let X = (X_1, …, X_N) be a family of independent random variables with X_k taking values in Λ_k and let Λ = ∏_{j∈[N]} Λ_j. Let Γ ⊆ Λ be an event and assume that the function f: Λ → R satisfies the following typical Lipschitz condition.
(TL) There are numbers (c_k)_{k∈[N]} and (d_k)_{k∈[N]} with c_k ≤ d_k such that, whenever x, x̃ ∈ Λ differ only in the kth coordinate, we have |f(x) − f(x̃)| ≤ c_k if x ∈ Γ, and |f(x) − f(x̃)| ≤ d_k otherwise.

We derive the following corollary from this, which is more convenient for our purpose and uses a notation more compatible with the rest of the paper.
Corollary A.10. Let X = (X_1, …, X_N) ∈ Λ be a family of independent random variables and let Γ ⊆ Λ be an event with Pr[Γ] ≥ 1 − N^{−c}. Moreover, let f: Λ → R with |f(X)| ≤ N^{c−2} and let (∆_i)_{i∈[N]} ∈ Ω(1) be numbers such that, for any two x ∈ Γ and x̃ ∈ Λ that differ only in the ith coordinate, we have |f(x) − f(x̃)| ≤ ∆_i.

Proof. We want to apply Theorem A.9. First note that |f(X)| ≤ N^{c−2} implies that f satisfies the typical Lipschitz condition when setting c_i = ∆_i and d_i = 2N^{c−2} for every i. We set γ_i in Theorem A.9 to γ_i = 1/d_i, yielding e_i ≤ 1. Thus, we get the event B such that

Pr[f(X) ≥ E[f(X)] + t and ¬B] ≤ exp(−t² / (2 ∑_{i∈[N]} (∆_i + e_i)²)).
As ∆_i ∈ Ω(1) and e_i ≤ 1, we get that the sum in the denominator is, up to constants, equal to ∑_{i∈[N]} ∆_i².