Multivariate boundary regression models

In this work, we consider a nonparametric regression model with one-sided errors, multivariate covariates and a regression function in a general Hölder class. We work under the assumption of regularly varying independent and identically distributed errors that are also independent of the design points. Following Drees, Neumeyer and Selk (2019), we estimate the regression function, i.e., the upper boundary curve, via minimization of the local integral of a polynomial approximation lying above the data points. The main purpose of this paper is to show uniform consistency and to provide rates of convergence of such estimators for both multivariate random covariates and multivariate deterministic design points. To demonstrate the performance of the estimators, the small sample behavior is investigated in a simulation study in dimensions two and three.


Introduction
We consider nonparametric regression models with one-sided errors that take the general form
$$Y_i = g(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (1.1)$$
where $Y_i$ is the response variable, $X_i$ is the covariate, g is the unknown regression function corresponding to the upper boundary curve and $\varepsilon_i$ is a nonpositive random error term. The statistical task in such a model is hence frontier estimation, in other words, the estimation of g based on the observations $(X_i, Y_i)$, i = 1, . . . , n, where n is the sample size of the available data.
Models described in Equation (1.1) are also called boundary regression models (BRM) and have received increasing attention in recent years. BRM are closely related to production frontier models (PFM). Both share the objective of estimating the frontier -the boundary curve -and contribute to the same applications. PFM appeared first in the field of data envelopment analysis (DEA). Introduced in the 1950s with the seminal contribution of Farrell (1957), DEA addresses the need for nonparametric methods to evaluate productivity and to assess the efficiency of a system. A production unit is technically efficient if it produces the maximum output which is technically feasible for given inputs, or uses minimal inputs for the production of a given level of output. Relaxing the fundamental convexity hypothesis on the hull of the data in DEA models, Deprins, Simar and Tulkens (1984) later introduced free disposal hull (FDH) models; see also Lovell et al. (1994) for further developments. DEA and its extensions are now recognized as powerful quantitative, analytical tools for measuring and evaluating the performance of a system and have countless applications, for instance in social sciences, health-care evaluation systems, banking sectors, supply chain management, energy system assessment or when analysing the performance of public services like hospitals and education. We refer to the books of Cooper, Seiford and Zhu (2011) and Ramanathan (2003) for a comprehensive treatment of such methods and an exhaustive development of the applications. On the other hand, stochastic frontier analysis (SFA), originally formulated independently by Aigner, Lovell and Schmidt (1977) and Meeusen and van den Broeck (1977), offers an interesting alternative with parametric estimation of the frontier; see also the books of Kumbhakar and Lovell (2003) and Cornwell and Schmidt (2008) for more recent references.
In contrast to the aforementioned methods, we concentrate in this paper on an alternative approach that consists in approximating the regression function g locally by a polynomial lying above the data points. Polynomial estimators in frontier estimation have been widely studied in the literature and are employed in several works, for instance in Hall, Park and Stern (1998), Hall and Park (2004), Girard, Iouditski and Nazin (2005), Knight (2001) and in Hall and Van Keilegom (2009) (local-linear estimator); see also the literature using piecewise polynomials (e.g. Korostelëv and Tsybakov (1993), Korostelëv, Simar and Tsybakov (1995) and Härdle, Park and Tsybakov (1995)).
While consistency of estimators based on local constant approximations (local polynomial estimators of order 0) of the boundary curve may be attained under mild assumptions on the boundary curve -such as continuity of g in Neumeyer, Selk and Tillier (2019) -rates of convergence require stronger regularity assumptions on g. A classical framework is to assume that g belongs to some Hölder class of order β. This assumption essentially means that g is k-times differentiable, where k denotes the largest integer strictly smaller than β, and moreover that the derivatives of order k satisfy a Hölder condition of order β − k. Further assumptions are also needed on the error distribution, such as regular variation, meaning that the distribution of the errors has polynomial tails. These two assumptions are common in the context of boundary models and are met in several papers, see for instance Meister and Reiß (2013), Jirak, Meister and Reiß (2014), Müller and Wefelmeyer (2010), Drees, Neumeyer and Selk (2019), Girard et al. (2013), Härdle et al. (1995) and Hall and Van Keilegom (2009); see also the book of de Haan and Ferreira (2006) for the applications and the motivation of regularly varying errors.
In the context of nonparametric regression models with a Hölder boundary curve and regularly varying nonpositive errors, Jirak et al. (2014) suggested an adaptive estimator for the boundary curve g using a local polynomial estimation based on local extreme value statistics. An adaptive procedure -a fully data-driven estimation procedure -is constructed by applying a nested version of Lepski's method, which incurs no loss in the convergence rates with respect to the general $L_q$-risk. Drees et al. (2019) estimated the regression function similarly to Jirak et al. (2014) via minimization of the local integral of a polynomial approximation in the context of equidistant design points. By showing uniform rates of convergence for the regression estimator, they proposed distribution-free goodness-of-fit tests of error distributions, where the test statistics are based on empirical processes of residuals. They also discussed asymptotically distribution-free hypothesis tests for independence of the error distribution from the points of measurement and for monotonicity of the boundary function.
More often than not, fixed (deterministic) covariates, and more specifically equidistant fixed design points, are considered. In contrast, in the paper at hand we investigate the two cases of random and deterministic covariates, which are both of particular interest. Deterministic covariates are often used in real-life applications when time is involved in the data set. For instance, Jirak et al. (2014) studied monthly sunspot observations and annual best running times over 1500 meters; see also the plentiful applications in energy and environmental performance analysis provided in the review of Mardani et al. (2018). Besides, deterministic designs appear across a number of papers on regression models, see for instance Brown and Low (1996), Meister and Reiß (2013) and the references therein. The case of random covariates is arguably the most relevant and appears naturally in many applications of boundary models, among others in non-life insurance mathematics and financial risk modelling when analyzing optimality of portfolios and efficiency strategies; see also the extensive literature on modern portfolio theory (e.g. the recent books of Francis and Kim (2013) and Goetzmann et al. (2014)).
In contrast to kernel estimation methods, all the literature cited above concerning the asymptotic study of local polynomial approximations of the boundary curve deals with univariate covariates, and, to the best of our knowledge, the topic of providing rates of convergence of such estimators based on a multivariate sample is new and challenging. Beyond the theoretical interest this generalisation raises, we think that extending to the multivariate setup is also interesting from an applied point of view. In light of this motivation, the main aim of this paper is to extend the results of Drees et al. (2019) to the multivariate case with arbitrary covariates. That is, under the main assumptions of regular variation of the nonpositive errors and a β-Hölder boundary curve, we aim at showing uniform consistency and providing rates of convergence of an estimator based on the minimisation of the local integral of a polynomial lying above the data points, for both multivariate random covariates and multivariate deterministic design points.
The remaining part of the manuscript is organized as follows. In Section 2 the model is explained, while in Section 3 the estimation procedure is described. In Section 4 we show uniform consistency and provide rates of convergence of the estimator of the regression function for both random and deterministic multivariate covariates. Section 5 is dedicated to a simulation study investigating the small sample behavior in dimensions two and three. Proofs are postponed to the appendix.

Notation and shortcuts
Notation: $\mathcal{B}$ stands for the Borel sets; $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ are the floor and ceiling functions, respectively; $\underline{x}$ means the largest natural number that is strictly smaller than x; $\bar F = 1 - F$ denotes the survival function associated to a cdf F; $\|\cdot\|_\infty$ denotes the supremum norm; $X_1 \stackrel{d}{=} X_2$ means that the two random variables $X_1, X_2$ share the same distribution; $a_n \sim b_n$ as $n \to \infty$ holds if $\lim_{n\to\infty} a_n/b_n = 1$ for two sequences $(a_n)_{n\ge 1}$ and $(b_n)_{n\ge 1}$ of nonnegative numbers. Generally, vectors are highlighted in bold. For vectors $x \in \mathbb{R}^q$ let $x^{(r)}$ denote the r-th component of x for r = 1, . . . , q. By $\|x\|$ we mean the maximum norm, that is $\|x\| := \max_{r\in\{1,\dots,q\}} |x^{(r)}|$.
Shortcuts: cdf stands for cumulative distribution function and iid for independent and identically distributed.

The model
We focus on nonparametric boundary regression models with univariate observations $Y_i$ and multivariate covariates $X_i$ of the form
$$Y_i = g(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (2.1)$$
where the errors $\varepsilon_i$ are iid non-positive univariate random variables that are independent of the $X_i$, and n stands for the sample size. The unknown regression function g thus corresponds to the upper boundary curve. Independence of errors $\varepsilon_i$ and covariates $X_i$ is a typical assumption in regression models and is met among others in Müller and Wefelmeyer (2010), Meister and Reiß (2013), Reiß and Selk (2017) and Drees et al. (2019). Such an assumption is crucial and often needed in the framework of frontier models when using statistical methods such as bootstrap procedures; see Wilson (2003) and Simar and Wilson (1998). In case this hypothesis does not hold, one may consider parametric transformations, e.g. exponential, Box-Cox (Box and Cox (1964)) or sinh-arcsinh transformations (Jones and Pewsey (2009)), of the response variable $Y_i$ in order to retrieve the framework of independence between errors and covariates; see Neumeyer et al. (2019) for the univariate case and also Linton, Sperlich and Van Keilegom (2008).
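As a quick illustration of model (2.1), data lying below a boundary curve can be simulated as follows. This is a minimal sketch; the particular boundary g and the standard exponential errors are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # hypothetical smooth boundary curve on [0, 1]^q (illustrative only)
    return 1.0 + np.sin(2 * np.pi * x.sum(axis=-1))

n, q = 500, 2
X = rng.uniform(size=(n, q))      # random covariates on [0, 1]^q
eps = -rng.exponential(size=n)    # iid nonpositive errors, independent of X
Y = g(X) + eps                    # all observations lie on or below the boundary
```

By construction every observation satisfies $Y_i \le g(X_i)$, which is the defining feature of the one-sided error model.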
In the paper at hand, we investigate both cases of random and fixed design points. The former means that the covariates $X_i$ are q-dimensional random variables, while the latter assumes that the $X_i$ are deterministically spread over $[0, 1]^q$. Without loss of generality, we work for ease of reading with design points lying in $[0, 1]^q$, but the results extend effortlessly to any Cartesian product of one-dimensional closed intervals.

The random design case
In the random design case we consider the nonparametric boundary regression model with independent and identically distributed observations $(X_i, Y_i)$, i = 1, . . . , n, defined by
$$Y_i = g(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (2.2)$$
corresponding to model (2.1), where the design points $X_i$ are multivariate random covariates distributed on $[0, 1]^q$ that fulfill assumption (K4). The errors $\varepsilon_i$ are assumed to be iid nonpositive random variables that satisfy (K2). The precise statements of assumptions (K2) and (K4) are given in Section 3.1.

The fixed design case
In the fixed design case we consider a triangular array of independent observations $Y_{i,n}$ and deterministic design points $x_{i,n}$ in $[0, 1]^q$ for i = 1, . . . , n. Thus we consider the nonparametric boundary regression model
$$Y_{i,n} = g(x_{i,n}) + \varepsilon_{i,n}, \qquad i = 1, \dots, n, \qquad (2.3)$$
corresponding to model (2.1), with errors $\varepsilon_{i,n}$ that are univariate non-positive independent and identically distributed random variables satisfying assumption (K2).
We allow for fixed equidistant as well as fixed nonequidistant designs. In the first case we consider $x_{1,n}, \dots, x_{n,n}$ that form a grid on $[0, 1]^q$, where we assume that $n^{1/q}$ is an integer. Note that when q = 1 the univariate equidistant design simplifies to $x_{i,n} = i/n$ for i = 1, . . . , n. In the second case the points are not necessarily equidistant, but we assume that they are sufficiently evenly distributed on $[0, 1]^q$; see Assumption (K4') below.
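For illustration, such an equidistant grid can be built as follows. This is a sketch of ours (the function name and layout are not from the paper): points are placed at multiples of $1/m$ with $m = n^{1/q}$ along each axis, so that the q = 1 case reduces to $x_{i,n} = i/n$.

```python
import numpy as np

def equidistant_design(n, q):
    """Equidistant grid x_{1,n}, ..., x_{n,n} on (0, 1]^q; n**(1/q) must be an integer."""
    m = round(n ** (1 / q))
    if m ** q != n:
        raise ValueError("n**(1/q) must be an integer")
    axes = [np.arange(1, m + 1) / m] * q          # multiples of 1/m along each axis
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([a.ravel() for a in mesh], axis=1)

points = equidistant_design(16, 2)                # a 4 x 4 grid on (0, 1]^2
```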

Estimating the regression function
To estimate the boundary curve, we use an estimator that locally approximates the regression function g by a polynomial lying above the data points; see Theorem 2.2 in Drees, Neumeyer and Selk (2019) for further details in the univariate case.

The random design case
For $x \in [0, 1]^q$, we consider the regression function estimator $\hat g$ defined as
$$\hat g(x) := p(x), \qquad (3.1)$$
where p is a multivariate polynomial of total degree $\beta^* \in \mathbb{N}_0$ that minimizes the local integral
$$\int_{\{t : \|t - x\| \le h_n\}} p(t)\, dt \quad \text{subject to } p(X_i) \ge Y_i \text{ for all } i \text{ with } \|X_i - x\| \le h_n. \qquad (3.2)$$
Remark 3.1 The polynomial p is the solution to the linear optimization problem
$$\min_p\, v^\top p \quad \text{subject to } Ap \ge y, \qquad (3.3)$$
where p is represented by its vector of coefficients, v is the vector representing the linear functional $p \mapsto \int_{\{t : \|t - x\| \le h_n\}} p(t)\, dt$, the matrix A is the multivariate Vandermonde matrix whose i-th row has as its entries all the monomials of degree at most $\beta^*$ in the entries of $X_i$, and y is the vector with $y_i = Y_i$. For the estimator $\hat g$ to be well-defined, it is necessary that this problem is bounded from below, that is, that the objective is bounded from below on the polytope defined by the constraints. This need not always be the case, as the example in Jirak et al. (2014) with $\beta^* = 2$ and two support points demonstrates; moreover, the alternative optimization problem proposed in Jirak et al. (2014) suffers from the same unboundedness issue. When q = 1, Problem (3.3) is bounded whenever we have at least $\beta^* + 1$ points. This follows from the fact that the univariate Vandermonde matrix is totally positive when the support points $X_i$ are positive. For higher dimensions, however, it is not as simple: there, the Vandermonde matrix with $\binom{q+\beta^*}{q}$ rows need not be invertible. As of now, we believe that for q > 1 the boundedness of (3.3) needs to be checked on a case-by-case basis using linear optimization algorithms. By duality theory we know that (3.3) is bounded if and only if there exists a vector $g \ge 0$ with $A^\top g = v$. This linear program can become very large as q and $\beta^*$ grow. For instance, if $q = \beta^*$ and A has $N = \binom{2q}{q}$ rows, then the interior point algorithm of Vaidya (1989) runs in $O(N^{2.5})$ time in the worst case, which in terms of q grows faster than $4^{2.5q} \cdot q^{-1.25}$. Nevertheless, this is a theoretical worst case and the average case might be better, possibly using another algorithm.
For implementations, the authors suggest using the Python module scipy.optimize.linprog, which by default uses an interior point algorithm based on the MOSEK interior point optimizer by Andersen and Andersen (2000).
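The following is a minimal univariate (q = 1) sketch of the linear program from Remark 3.1 using scipy.optimize.linprog. The boundary $g(x) = \sin(x)$, the bandwidth and the sample size are illustrative choices of ours, not from the paper. Note that the coefficient bounds must be set to (None, None), since linprog restricts variables to be nonnegative by default.

```python
import numpy as np
from scipy.optimize import linprog

def ghat(x, X, Y, h, beta_star):
    """Local polynomial boundary estimator at x for q = 1 (sketch of (3.1)-(3.2))."""
    idx = np.abs(X - x) <= h
    Xw, Yw = X[idx], Y[idx]
    lo, hi = max(x - h, 0.0), min(x + h, 1.0)
    degs = np.arange(beta_star + 1)
    # objective: integral over the window of p(t) = sum_j b_j (t - x)^j
    c = ((hi - x) ** (degs + 1) - (lo - x) ** (degs + 1)) / (degs + 1)
    # constraints p(X_i) >= Y_i, written as -A b <= -Y for linprog
    A = (Xw[:, None] - x) ** degs[None, :]
    res = linprog(c, A_ub=-A, b_ub=-Yw, bounds=[(None, None)] * len(degs))
    return res.x[0] if res.success else np.nan    # hat g(x) = p(x) = b_0

rng = np.random.default_rng(1)
X = rng.uniform(size=400)
Y = np.sin(X) - rng.exponential(size=400)         # boundary g(x) = sin(x), alpha = 1
est = ghat(0.5, X, Y, h=0.15, beta_star=1)        # local linear fit at x = 0.5
```

With design points on both sides of x, the local linear program is bounded and the returned value approximates $g(0.5) = \sin(0.5)$ from below up to the extreme-value bias.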
Remark 3.2 To illustrate the estimation procedure we take a look at the simplest case of $\beta^* = 0$, which results in a local constant approximation. The estimator defined in (3.1)-(3.2) then simplifies to $\hat g(x) = \max\{Y_i \mid i = 1, \dots, n \text{ with } \|X_i - x\| \le h_n\}$. In Figure 1 an example for q = 1 and uniformly distributed $X_i$ is shown. For each x-value a constant function is fitted to the data in the neighborhood. We work under the following assumptions (K1)-(K4).
(K1) Regression function: g belongs to some Hölder class of order $\beta \in (0, \infty)$, that is, g is $\underline\beta$-times differentiable on $[0, 1]^q$ and all partial derivatives of order $\underline\beta$ satisfy the Hölder condition (3.4).
(K2) Error distribution: The errors $\varepsilon_i$ are independent and identically distributed on $(-\infty, 0]$ with common cdf F that satisfies $1 - F(y) = c|y|^\alpha + r(y)$ with $\alpha, c > 0$ and $r(y) = o(|y|^\alpha)$ as $y \uparrow 0$.
(K4) Design points: The covariates $X_1, \dots, X_n$ are iid random variables defined on a probability space $(\Omega, \mathcal F, P)$ with values in $[0, 1]^q$, with cdf $F_X$ and a density $f_X$ that is bounded and bounded away from zero. Besides, they are independent of the errors $\varepsilon_1, \dots, \varepsilon_n$.
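The standard exponential errors used later in the simulation study give a simple worked instance of the regular variation condition in (K2), with $\alpha = c = 1$: writing $\varepsilon = -E$ with $E \sim \mathrm{Exp}(1)$,

```latex
F(y) = \mathbb{P}(-E \le y) = \mathbb{P}(E \ge -y) = e^{y}, \qquad y \le 0,
\qquad\text{so that}\qquad
1 - F(y) = 1 - e^{y} = |y| - \tfrac{1}{2}|y|^{2} + O(|y|^{3}) = |y| + r(y),
```

with $r(y) = o(|y|)$ as $y \uparrow 0$.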

The fixed design case
Similarly to the random design case, for $x \in [0, 1]^q$ we consider the regression function estimator $\hat g$ defined as
$$\hat g(x) := p(x), \qquad (3.5)$$
where p is a polynomial of total degree $\beta^* \in \mathbb{N}_0$ that minimizes the local integral
$$\int_{\{t : \|t - x\| \le h_n\}} p(t)\, dt \quad \text{subject to } p(x_{i,n}) \ge Y_{i,n} \text{ for all } i \text{ with } \|x_{i,n} - x\| \le h_n. \qquad (3.6)$$
We work for the fixed design case under (K1)-(K2) and the following modified assumptions (K3') and (K4').
(K4') Design points: Let $I_n \subset [0, 1]^q$ be a q-dimensional interval which is the Cartesian product of one-dimensional closed intervals of length $dh_n$ with d > 0. We assume that at least $\underline d_n(d)$ and at most $\overline d_n(d)$ design points lie in every such $I_n$.
Remark 3.4 It is clear from assumption (K2) that the errors do not depend on the covariates, in both the random and the fixed design setup. Still, it has to be noticed that when dealing with the triangular scheme defined in (2.3), the errors depend on n too, which justifies the second index in $\varepsilon_{i,n}$. Indeed, the i-th design point $x_{i,n}$ may vary with the sample size; see the construction of the fixed multivariate equidistant design in Section 2.2.
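The counting requirement of (K4') can be checked numerically for a concrete design. The sketch below (helper name and all constants are our own) counts the grid points of an equidistant design falling in subcubes of edge length $dh_n$ and confirms that the counts are of order $nh_n^q$, uniformly over the cube's position.

```python
import numpy as np

def count_in_cube(points, corner, edge):
    """Number of design points in the cube corner + [0, edge]^q."""
    inside = np.all((points >= corner) & (points <= corner + edge), axis=1)
    return int(inside.sum())

# equidistant design with n = 30**2 points on (0, 1]^2
m, q = 30, 2
axes = np.meshgrid(*([np.arange(1, m + 1) / m] * q), indexing="ij")
grid = np.stack([a.ravel() for a in axes], axis=1)

n, h, d = m ** q, 0.2, 1.0
counts = [count_in_cube(grid, np.array([c, c]), d * h) for c in (0.0, 0.3, 0.7)]
# each count is of order n * h**q = 36, whatever the position of the cube
```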
Remark 3.5 The size of $\alpha$ in assumption (K2) is an important factor for the performance of the estimators defined in (3.1)-(3.2) and (3.5)-(3.6), respectively. Simply speaking, the smaller $\alpha > 0$, the better the estimator performs. For $\alpha < 2$ the error distribution is irregular and in this case the rate of convergence of the estimator is faster than the typical nonparametric rate; see the considerations below Theorem 4.1. In Figure 2 we show some examples of different error distributions to highlight the effect of the size of $\alpha$. To simplify the presentation we restrict the display to q = 1.
Figure 2: Scatter plots of $(i/n, Y_{i,n})$, i = 1, . . . , n (fixed equidistant design for q = 1) and the true regression function g.

Main results
In this section we give the uniform consistency as well as the convergence rates of our estimatorĝ separated for the two considered cases.

The random design case
In the next theorem, we provide the uniform consistency as well as the rate of convergence of the estimator of the regression function defined in (3.1)-(3.2) for the random design case.
Note that the deterministic part $O(h_n^\beta)$ stems from the approximation of the regression function by a polynomial, whereas the random part $O_P\bigl((\log(n)/(nh_n^q))^{1/\alpha}\bigr)$ results from the observational error. Balancing the two error rates by setting $h_n \approx (\log(n)/n)^{1/(\alpha\beta+q)}$ gives $\sup_{x\in[0,1]^q}|\hat g(x) - g(x)| = O_P\bigl((\log(n)/n)^{\beta/(\alpha\beta+q)}\bigr)$. For the case of an irregular error distribution, i.e. $\alpha \in (0, 2)$, this rate improves upon the typical optimal rate $O_P\bigl((\log(n)/n)^{\beta/(2\beta+q)}\bigr)$ for the nonparametric estimation of mean regression functions in models with regular errors. The proof of the error rate that stems from the observational error follows along similar lines as the proof in Drees et al. (2019), but major adaptations are needed to deal with the multivariate and random case. The proof of the deterministic error rate that is due to approximating the boundary curve g by a polynomial is based on the proof of Theorem 3.1 in Jirak et al. (2014). In the univariate equidistant fixed case that is treated in Drees et al. (2019) this theorem can be applied directly, whereas in the multivariate, possibly random case treated in the paper at hand the proof has to be substantially modified. Indeed, the original proof in Jirak et al. (2014) fully relies on the fundamental theorem of algebra, which states that every polynomial equation in one variable with complex coefficients has at least one complex solution. As far as we know there is no extension of this result to higher dimensions, hence moving to multidimensional covariates requires completely different arguments. See the proof of Proposition A.1 and especially the proof of Lemma A.2 for this modification. This also gives an alternative proof of Theorem 3.1 in Jirak et al. (2014) and extends it to the multivariate and possibly random case.
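The bandwidth choice balancing the two error terms follows from a one-line computation:

```latex
h_n^{\beta} \;\asymp\; \Bigl(\frac{\log n}{n h_n^{q}}\Bigr)^{1/\alpha}
\;\Longleftrightarrow\;
h_n^{\alpha\beta+q} \;\asymp\; \frac{\log n}{n}
\;\Longleftrightarrow\;
h_n \;\asymp\; \Bigl(\frac{\log n}{n}\Bigr)^{1/(\alpha\beta+q)},
```

and substituting this bandwidth back into either term yields the rate $(\log n/n)^{\beta/(\alpha\beta+q)}$.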

The fixed design case
We give in the next theorem the uniform consistency as well as the rate of convergence of the estimator of the regression function defined in (3.5)-(3.6) for deterministic design points. When $\beta^* = 0$, the Hölder class defined in Assumption (K1) reduces to the so-called class of β-Hölder uniformly continuous functions with $\beta \in (0, 1]$. In this framework, the boundary curve g may be estimated by the local constant approximation
$$\hat g(x) = \max\{Y_{i,n} \mid i = 1, \dots, n \text{ with } \|x_{i,n} - x\| \le h_n\}. \qquad (4.1)$$
This local constant estimator has been studied in Neumeyer et al. (2019) in the univariate setup for both random and fixed design points. Under the weaker assumption of continuity of the boundary curve g, they showed the uniform consistency of the estimator defined in (4.1) on the whole unit interval [0, 1]. Strengthening this to the β-Hölder continuity assumption and iid regularly varying innovations, we obtain from Theorem 4.3, for the multivariate case, the uniform consistency as well as the rate of convergence in the deterministic design case on the whole unit cube $[0, 1]^q$. We sum this up in the following corollary.

Simulations
To study the small sample behavior, we generate data according to the model with $X_1, \dots, X_n$ iid $\sim \mathrm{Unif}([0, 1]^q)$ and errors $\varepsilon_1, \dots, \varepsilon_n$ iid with $-\varepsilon_i \sim \mathrm{Exp}(1)$, for q = 2, 3 and sample sizes $n = 10^q, 20^q, 30^q$. Thus in the simulated model the error distribution fulfills $\alpha = 1$, and we consider different values $\beta^* = 0, 1, 2, 3$, which means that we investigate local constant, local linear, local quadratic and local cubic approximation. Since our estimator is computed by minimizing an integral over a polynomial, the estimation procedure consists of solving a linear programming problem (compare Remark 3.1). The bandwidth is chosen as $h_n = n^{-1/(\beta^*+1+q)}$, which corresponds (up to a log term) to the theoretically optimal bandwidth for $\alpha = 1$ that balances the two error rates; see the considerations below Theorem 4.1. We ran the same simulations with fixed equidistant design points $x_{1,n}, \dots, x_{n,n}$ as described in Section 2.2. The results are very similar and thus we only present the results for the random design case.
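For the special case $\beta^* = 0$ the linear program reduces to a local maximum, which allows a compact Monte Carlo sketch of such an MSE study. This is our own simplified setup with a flat boundary $g \equiv 1$ instead of the sine boundary used in the paper's study; the replication count is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)

def mse_at_center(n, q, h, reps=200):
    """Monte Carlo MSE of the local constant estimator at x = (0.5, ..., 0.5)."""
    x = np.full(q, 0.5)
    errs = np.full(reps, np.nan)
    for r in range(reps):
        X = rng.uniform(size=(n, q))
        Y = 1.0 - rng.exponential(size=n)            # flat boundary g == 1, alpha = 1
        inside = np.max(np.abs(X - x), axis=1) <= h  # maximum-norm window
        if inside.any():
            errs[r] = np.max(Y[inside]) - 1.0        # hat g(x) - g(x)
    return float(np.nanmean(errs ** 2))

n, q = 400, 2
mse = mse_at_center(n, q, h=n ** (-1 / (0 + 1 + q)))  # bandwidth n^{-1/(beta*+1+q)}
```

The resulting MSE is small because the estimation error is driven by the minimum of roughly $nh_n^q$ exponential variables, mirroring the fast rate for $\alpha = 1 < 2$.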
In Table 1 the results of 1000 replications are shown, where we display the estimated mean squared error of our estimator $\hat g(0.5, 0.5)$ ($\hat g(0.5, 0.5, 0.5)$, respectively). It can be seen that the estimator works reasonably well and the results improve as $\beta^*$ grows. The results also improve as n grows; both are expected effects that correspond to the theoretical result in Theorem 4.1. In Figure 3 we show the true boundary curve and the estimated curve $\hat g$ in comparison. Since in dimension one the presentation of two curves in one plot is clearer than in higher dimensions, we display a cut through the two- respectively three-dimensional surface of the functions. To be precise, we plot the functions $x \mapsto 0.5 \sin(2\pi(x + 0.5)) + 4(x + 0.5)$ and $x \mapsto \hat g(x, 0.5)$ ($x \mapsto 0.5 \sin(2\pi(x + 1)) + 4(x + 1)$ and $x \mapsto \hat g(x, 0.5, 0.5)$, respectively). It can be seen that the approximation gets better as $\beta^*$ grows, both for q = 2 and q = 3. For $\beta^* = 2, 3$ the approximation is very good in both considered cases of two- and three-dimensional covariates. As an example we show the results for $n = 20^q$, since the effect is very similar in the other cases.
To evaluate the performance of the estimator on the whole cube $[0, 1]^q$, we display in Table 2 the arithmetic mean of the estimated mean squared errors of $\hat g(x_1), \dots, \hat g(x_N)$, where $x_1, \dots, x_N$ form a grid on $[0, 1]^q$ with $N = 20^q$. It can be seen that the performance is surprisingly poor for $n = 10^2$ and $\beta^* = 2, 3$. In Figure 4 we show plots of the estimated mean squared error of $\hat g$ on $[0, 1]^2$ for these cases. From the picture it can be deduced that the problem lies at the boundaries. The reason could be the smaller number of observations in this area, which is of higher significance for larger $\beta^*$.

A.1 Proof of Theorem 4.1
Proposition A.1 Assume that model (2.2) holds under (K2) and (K4) and consider the regression function estimator $\hat g$ defined in (3.1)-(3.2), where g fulfills condition (K1) for some $\beta \in (0, \beta^* + 1]$ and some $c_g \in [0, c^*]$. Then, there exist constants $C_{\beta^*,q,c^*}$, $C_{\beta^*,q}$ and a natural number $J(\beta^*)$, which depend only on the respective subscripts, such that for all
Proof: We first highlight that throughout the proof the design points $X_i : (\Omega, \mathcal F, P) \to ([0, 1]^q, \mathcal B[0, 1]^q)$ are random elements. To make the reading easier, we omit the argument $\omega$, but the proof has to be understood $\omega$-wise, that is, for any realisation $X_i(\omega)$, $\omega \in \Omega$. Besides, unless specified otherwise, $n \in \mathbb{N}$ is arbitrary.
Let $x \in [0, 1]^q$ be fixed and for $n \ge 1$ define the local neighborhood $I_n^*$. Note that assumption (K3) is not required here; nevertheless, we assume that the $h_n$ are positive bandwidths such that $\lim_{n\to\infty} h_n = 0$. The idea of the proof is based on Theorem 3.1 in Jirak, Meister and Reiß (2014), but comprehensive adaptations are needed to deal with the multivariate case. We consider random design points satisfying assumption (K4), where the Riemann approximation in the aforementioned paper is replaced by the integral defined in (3.2).
This means that we consider the coefficients $(b_j)_j$, for all multi-indices j with $|j| \in \{0, \dots, \beta^*\}$, which minimize the objective function (A.1). Now, a Taylor-Lagrange expansion of g around x up to order β yields a representation with remainder term $r_\beta$, defined for $t, x \in [0, 1]^q$. By assumption (K1) we can make use of the Hölder property (3.4) and bound this remainder with constants $c_g, c_1(\beta, q) < \infty$.
Consider now the Taylor coefficients $b_j$ of g at x. With these we define the following two quantities. Then one may rewrite the data points $Y_i$ in model (2.2) and, from what precedes, under the constraint $X_i \in I_n^*$ the corresponding bounds follow. Since the errors $\varepsilon_i$, i = 1, . . . , n, are non-positive, this holds for any n. Thus, since the coefficients $(b_j)_j$ minimize the local integral (A.1), we have (A.3) for all n. Define now the polynomial Q as the difference of the integrands of the last two quantities. From (A.3) it follows that (A.4) holds for any n and for any boundary curve g.
First note that either $\lambda(Q^0) = 0$ or $Q \equiv 0$, where λ is the Lebesgue measure. In the latter case Proposition A.1 is trivially true, since the assertion then follows directly from (A.2). For $Q \not\equiv 0$ we have
$$\lambda(Q^+) > 0, \qquad (A.5)$$
which we prove by contradiction. If $\lambda(Q^+) = 0$, this would imply $\lambda(Q^-) = \lambda(I_n^*)$, since $\lambda(Q^0) = 0$ and $Q^- \cup Q^+ \cup Q^0 = I_n^*$. But $\lambda(Q^-)/\lambda(I_n^*) = 1$ is a contradiction to (A.4), so (A.5) must be true. This is needed for the application of Lemma A.2. Note that by definition the rescaled version of Q is a polynomial on $[0, 1]^q$ and inherits from Q the property of a nonnegative integral; applying Lemma A.2 yields a δ-ball $B_\delta$, determined by some $\gamma_1, \gamma_2 \in [0, 1]^q$. Thus $B_\delta$ has a volume of at least $c_2(\beta^*, q)h_n^q$ for some $c_2(\beta^*, q) > 0$. On the other hand, for any design point $X_i \in I_n^*$, by definition the corresponding bound holds, where the last inequality comes from the constraint $Y_i \le p(X_i)$ and $|\varepsilon_i| \le \varepsilon_n^*$ for all i = 1, . . . , n.
□
Lemma A.2 Let $\beta^*$ and q be natural numbers. There exist positive real numbers δ, c such that for every polynomial function $P : [0, 1]^q \to \mathbb{R}$ of degree $\beta^*$ with non-negative integral over $[0, 1]^q$ there exists a δ-ball $B_\delta \subseteq [0, 1]^q$ w.r.t. the maximum norm such that $P \ge 0$ on $B_\delta$.
Proof: For all polynomial functions $P : [0, 1]^q \to \mathbb{R}$ of degree $\beta^*$ we have
$$\|\nabla P\|_{[0,1]^q,\mathrm{Eucl}} \le 4(\beta^*)^2 \|P\|_{[0,1]^q}, \qquad (A.10)$$
where $\|\cdot\|_{[0,1]^q,\mathrm{Eucl}}$ denotes the supremum of the Euclidean norm of its argument over the set $[0, 1]^q$ and $\nabla P$ stands for the gradient of P. This result follows from Theorem 3.1 in Wilhelmsen (1974); see Remark A.3. By the mean value theorem and the Cauchy-Schwarz inequality, this implies that for all $x, y \in [0, 1]^q$ we have
$$|P(x) - P(y)| \le L\|x - y\| \|P\|_{[0,1]^q}, \qquad (A.11)$$
where $\|\cdot\|$ is the maximum norm and $L = 4\sqrt{q}(\beta^*)^2$. The factor $\sqrt{q}$ stems from the change of norms. Define $\varepsilon = 1/(2L)$, $\delta = |B_\varepsilon|/(4L)$ and $c = |B_\varepsilon|/4$. Let P be any such polynomial function. Let P attain its supremum over the set $V^+ = \{x \mid P(x) > 0\}$ at the point $x_0$, and denote by $x_1$ the point where $|P|$ attains its supremum on the whole cube $[0, 1]^q$. First, we show (A.12), where $|B_\varepsilon|$ denotes the volume of an ε-ball inside $[0, 1]^q$. This step is necessary in case $x_1 \notin V^+$; otherwise it is not needed, but the statement is still trivially true. We note that P is non-zero on the ε-ball $B_\varepsilon$ centered at $x_1$. Indeed, taking y to be the nearest point to $x_1$ such that $P(y) = 0$, Equation (A.11) gives $\|x_1 - y\| > \varepsilon$. Next, by the mean value theorem there exists $t_0 \in B_\varepsilon$ such that the corresponding identity holds. Applying Equation (A.11) to $x_1$ and $t_0$ gives $|P(t_0)| \ge \tfrac{1}{2}|P(x_1)|$. Hence (A.12) follows; in the second inequality we used that the integral of P over $[0, 1]^q$ is non-negative. Next, let $B_\delta$ denote the δ-ball centered at $x_0$, and let P attain its infimum over $B_\delta$ at the point $x_2 \in B_\delta$. Note that $P > 0$ on $B_\delta$. Indeed, taking y to be the nearest point to $x_0$ such that $P(y) = 0$, Equations (A.11) and (A.12) give $\|x_0 - y\| > \delta$.
Finally, applying Equations (A.12) and (A.11) to $x_0$ and $x_2$ gives $P(x_2) \ge c|P(x_1)|$, as required. This concludes the proof. □
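The gradient bound (A.10) can be checked numerically. The shifted Chebyshev polynomial $T_k(2t - 1)$ on [0, 1] nearly saturates it: its derivative reaches $2k^2$ against the Wilhelmsen-type bound $4k^2$. This is a numerical sketch of ours; the degree and grid resolution are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Shifted Chebyshev polynomial P(t) = T_k(2t - 1) on [0, 1]: sup |P| = 1 and
# sup |P'| = 2 k^2, within a factor 2 of the bound 4 k^2 from (A.10).
k = 5
coef = [0] * k + [1]                      # coefficients of T_k in the Chebyshev basis
t = np.linspace(0.0, 1.0, 20001)
P = C.chebval(2 * t - 1, coef)
dP = 2 * C.chebval(2 * t - 1, C.chebder(coef))   # chain-rule factor 2

sup_P = np.max(np.abs(P))
sup_dP = np.max(np.abs(dP))
bound = 4 * k ** 2 * sup_P                # the bound holds with room to spare
```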
Remark A.3 The bound given in the Markov inequality (A.10) in Lemma A.2 is far from optimal. For the sake of completeness, we recall the full statement of Theorem 3.1 in Wilhelmsen (1974). With the above notation, it reads
$$\|\nabla P\|_{T,\mathrm{Eucl}} \le \frac{4(\beta^*)^2}{\omega(T)} \|P\|_T, \qquad (A.13)$$
where T is a compact and convex set with non-empty interior and $\omega(T)$ stands for the thickness of T, that is, the minimum distance between two parallel supporting hyperplanes of T. Note that in the context of Lemma A.2, $\omega(T) = \omega([0, 1]^q) = 1$. The bound of Equation (A.13) was later improved by Kroó and Révész (1999). They showed that for any convex body (that is, a compact convex set with nonempty interior) the constant $\frac{4(\beta^*)^2}{\omega(T)}$ may be replaced by $\frac{4(\beta^*)^2 - 2\beta^*}{\omega(T)}$. One may go further, observing that the space $[0, 1]^q$, under an adequate normalisation, may be seen as a centrally symmetric convex set (a set is centrally symmetric if and only if, after a proper shift, it is the unit ball of some norm on $\mathbb{R}^q$), so one may use the result of Sarantopoulos (1991) to get an even smaller bound. We refer to the book of Rassias and Tóth (2014) for an exhaustive treatment of Markov-type inequalities for multivariate polynomials. For the sake of simplicity, we have presented the result with the bound given in Wilhelmsen (1974).
Similarly, Propositions A.6 and A.7 respectively give rise to the expected rates in the main result of Theorem 4.3. Now observe that we only have to consider those I for which there exists a d > 0 such that $|(x^{(r)} + h_n I_r) \cap [0, 1]| \ge dh_n$ for all r = 1, . . . , q, where $I = I_1 \times \cdots \times I_q$ and by $|\cdot|$ we mean the length of the one-dimensional intervals. Next we will show that, with probability converging to one, there are at least $\underline d_n = O(nh_n^q)$ and at most $\overline d_n = O(nh_n^q)$ random design points in every hypercube $I_n \subseteq [0, 1]^q$ with edge length $dh_n$. This implies that at least $\underline d_n$ random design points lie in $x + h_n I$.
The number of points in $I_n$ can be written as $\sum_{i=1}^n I\{X_i \in I_n\}$, and for all $I_n$ the empirical frequency $\frac{1}{n}\sum_{i=1}^n I\{X_i \in I_n\}$ is compared with $P(X_1 \in I_n)$. For the term $P(X_1 \in I_n)$ we get the upper bound $(dh_n)^q \sup_{t\in[0,1]^q} f_X(t)$ and the lower bound $(dh_n)^q \inf_{t\in[0,1]^q} f_X(t)$ by the consideration about the length of $I_n$. Both bounds on $P(X_1 \in I_n)$ are of order $O(h_n^q)$ by assumption (K4). Thus it remains to prove (A.14) with probability converging to one. Set therefore $P_n f_{n,x} := \frac{1}{n}\sum_{i=1}^n f_{n,x}(X_i)$ and $P f_{n,x} := E[f_{n,x}(X_1)]$ with $f_{n,x}(t) := I\{t \in I_n\}$. Note that $P f_{n,x}^2 = P(X_1 \in I_n) \le (dh_n)^q \sup_{t\in[0,1]^q} f_X(t)$. Note further that $f_{n,x}(t) = I\{t \in I_n\} = I\{\|t - \frac{a_n + b_n}{2}\| \le \frac{d}{2} h_n\}$ -where $a_n = (a_{n,1}, \dots, a_{n,q})$, $b_n = (b_{n,1}, \dots, b_{n,q})$ and $I_n = [a_{n,1}, b_{n,1}] \times \cdots \times [a_{n,q}, b_{n,q}]$ -and thus the conditions of Example 38 and Problem 28 in Pollard (1984) are fulfilled. Then, since $|f_{n,x}| \le 1$, Theorem 37 in Pollard (1984) can be applied and thus $\sup_{I_n} |P_n f_{n,x} - P f_{n,x}| = o(h_n^q)$ a.s.,
which proves (A.14). Now the assertion of the Proposition follows with the same arguments as in the proof of Proposition A.6 with the modification from the proof of Proposition A.7.

□
A.2 Proof of Theorem 4.3
The proof of Theorem 4.3 is similar to the proof of Theorem 4.1 and is based on the following Propositions A.5, A.6 and A.7.
Proposition A.5 Assume that model (2.3) holds under (K2) and (K4') and let $\hat g$ be defined in (3.5)-(3.6), where g satisfies (K1) for some $\beta \in (0, \beta^* + 1]$ and some $c_g \in [0, c^*]$. Then, there exist constants $C_{\beta^*,q,c^*}$, $C_{\beta^*,q}$ and a natural number $J(\beta^*)$ such that for all
Proof: The proof is almost identical to the proof of Proposition A.1 and is skipped here for the sake of conciseness. □
Figure 5: Example of building the sets for q = 2, $\underline d_n = 4$. On the left-hand side the first three sets are illustrated; in the middle and on the right-hand side the transition from one row to the next is shown. In each picture the smaller green rectangle indicates $D_{j-1,n}$, the oval $D_{j,n}$ and the larger purple rectangle $D_{j+1,n}$, respectively.
It is obvious that we only have to consider those I for which there exists a d > 0 such that $|(x^{(r)} + h_n I_r) \cap [0, 1]| \ge dh_n$ for all r = 1, . . . , q, where $I = I_1 \times \cdots \times I_q$ and by $|\cdot|$ we mean the length of the one-dimensional intervals. Note that at least $\underline d_n = \lfloor dh_n n^{1/q} \rfloor^q$ design points lie in such an I. Now we want to arrange the design points $x_{i,n}$ in sets of size $\underline d_n$ such that we can replace $\sup_{x\in[0,1]^q}$ by a maximum over these sets. To this end set $D_{1,n} := \{x_{i,n} : x^{(r)}_{i,n} \le dh_n, r = 1, \dots, q\}$.
For j = 2, 3, . . . the set D j,n is built out of D j−1,n by removing one design point from the set and adding a new one in an appropriate way, so that all sets D j,n fulfill: (E1) For every x ∈ [0, 1] q there exists D j,n with D j,n ⊆ {x i,n : x i,n ∈ x + h n I}.
(E2) Each set D j,n contains exactly d n design points.
(E3) For every j ∈ {2, . . . , d n } there exists exactly one i with x i,n ∈ D j−1,n \D j,n , where d n is the total number of sets.
An example of building these sets is given in Figure 5. With this procedure, note that for every $j \in \{2, \dots, \bar d_n\}$ there exists exactly one i with $x_{i,n} \in D_{j-1,n}\setminus D_{j,n}$; we denote this index by $i(j)$. The implication (A.16) ⇒ (A.15) follows from several Taylor expansions. Let $z_n \to 0$ and $y_n \to \infty$ denote sequences with $y_n z_n \to 0$. By a second order Taylor expansion of $\log(1 + z_n)$ we get, for n sufficiently large,
$$k_n(1 - (1 + z_n)^{y_n}) = k_n\bigl(1 - \exp(y_n \log(1 + z_n))\bigr) = k_n\bigl(1 - \exp(y_n z_n + O(y_n z_n^2))\bigr) = k_n\bigl(1 - \exp(0) + O(y_n z_n) + O(y_n z_n^2)\bigr) = O(k_n y_n z_n),$$
where the third equality follows by a first order Taylor expansion of the exponential. This means that
$$k_n\Bigl(1 - \bigl(1 - (1 + \underline d_n U(Lr_n))(1 - U(Lr_n))^{\underline d_n}\bigr)^{(l_n+1)/k_n}\Bigr) = O\Bigl(k_n \frac{l_n}{k_n}(1 + \underline d_n U(Lr_n))(1 - U(Lr_n))^{\underline d_n}\Bigr),$$
which proves our claim (A.16) ⇒ (A.15).
Note that
$$U(Lr_n) = aL^\alpha \frac{|\log(n^{-1}(nh_n^q)^{1/q})|}{nh_n^q}(1 + o(1))$$
for some positive constant a, and thus we have
$$(1 - U(Lr_n))^{\underline d_n} = \exp\bigl(\underline d_n \log(1 - U(Lr_n))\bigr) \sim_{n\to\infty} \exp\bigl(-aL^\alpha |\log(n^{-1}(nh_n^q)^{1/q})|\bigr)$$
by a second order Taylor expansion of $\log(1 - U(Lr_n))$ and $\underline d_n \sim_{n\to\infty} nh_n^q$. Further,
$$(1 + \underline d_n U(Lr_n))(1 - U(Lr_n))^{\underline d_n} \sim_{n\to\infty} \exp\bigl(\log|\log(n^{-1}(nh_n^q)^{1/q})| - aL^\alpha|\log(n^{-1}(nh_n^q)^{1/q})|\bigr) = |\log(n^{-1}(nh_n^q)^{1/q})|\,(n^{-1}(nh_n^q)^{1/q})^{aL^\alpha} = h_n n^{(1-q)/q}\,|\log(n^{-1}(nh_n^q)^{1/q})|\,(n^{-1}(nh_n^q)^{1/q})^{aL^\alpha - 1} = o(h_n n^{(1-q)/q})$$
for sufficiently large L. This concludes the proof, since for n sufficiently large the required bound holds. Now the assertion follows with the same arguments as in the proof of Proposition A.6. □