A compact formulation for the base station deployment problem in wireless networks

This article considers the base station deployment problem in a wireless network. The natural formulation of this problem usually leads to numerical and memory issues, preventing users from dealing with real‐world cases. We provide a compact reformulation that allows us to get beyond the drawbacks of the natural formulation. Tests are done on ten instances derived from realistic LTE scenarios. The computational results show that the proposed reformulation enables mixed‐integer programming solvers to provide an optimal solution in a short amount of time.


INTRODUCTION
Wireless network design (WND) is the problem of configuring a set of transmitters to provide service in a target area. The term configuring refers both to the optimal identification of the positions, the so-called base station deployment problem, and some parameters of the transmitters (e.g., power emission, frequency). This article addresses the base station deployment problem in LTE networks, thus assuming power emission and frequency as fixed.
The increasing traffic and the densification of the base stations along the territory have led to an increase in interfering signals, making the problem under examination more relevant and challenging. Hence, careful planning of the base station location becomes crucial to reduce infrastructure costs that operators will face and meet the service quality requirements demanded by users.
The research was carried out in collaboration with Fondazione Ugo Bordoni (FUB) [18], a higher education and research institution in telecommunications under the supervision of Ministero dello Sviluppo Economico, providing innovative services for government entities. The collaboration with the FUB was specifically intended to identify a compact formulation manageable by practitioners and implemented using standard modeling languages.
Several optimization models have been proposed in the literature for WND and the specific problem of base station deployment. However, most of the available compact models are affected by numerical issues, becoming more evident when tackling medium/large-size instances.
This article presents a novel compact formulation that partially overcomes the numerical issues affecting the most commonly used formulations. Our formulation has three main advantages: • it is tractable by commercial mixed-integer programming (MIP) solvers, which provide heuristics leading to excellent and fast results (for a discussion of how heuristics of MIP solvers have a crucial impact on solving complex problems, see [9]); • it is reduced in size compared to the natural formulation; hence we can deal with large instances arising from real-world problems.
The remainder of this article is organized as follows. Section 2 discusses the main contributions in the wireless network planning literature. Section 3 reports the formal statement of the base station deployment problem and describes a basic natural formulation. In Section 4, we introduce a compact reformulation that overcomes the main drawbacks of the basic. Section 5 compares the computational results obtained using the basic formulation and the proposed reformulation on realistic instances provided by the FUB. Conclusions are given in Section 6.
Concerning the exact methods, the authors of [27] tackle the problem of optimal base station locations and power emissions, maximizing the service provider's profit. Their mixed-integer linear programming (MILP) formulation is solved via an exact solution method that combines combinatorial Benders decomposition, classical Benders decomposition, and valid cuts in a nested way. In [6], the authors aim to maximize mobile operator profits by maximizing coverage and minimizing costs. This article takes its basis on the formulations presented in previous works [1,2,16] related to LTE-RAN applications. The formulations proposed in [1,2,6,16] are solved using standard exact solvers, and tests have been done on randomly generated instances.
In [26], a two-stage heuristic algorithm is proposed to solve large instances of MILP network planning. In the first stage emission powers are assigned, while in the second stage frequencies are optimized. In [20], the minimization of the total number of deployed base stations is pursued. The model uses binary variables and is solved with a meta-heuristic approach based on swarm intelligence. Another heuristic contribution is [8], which considers the use of constraints on electromagnetic emissions and the distance from sensitive areas. The pursued objective is a weighted function of installation costs and service coverage. The authors propose a heuristic based on a decomposition step, followed by a partial enumeration of the solutions. In [17], a planning methodology is introduced that minimizes interference while maintaining the coverage. The resulting non-convex model is tackled by a two-stage heuristic, made up of a local search followed by an integer programming heuristic.
Some contributions investigate the aspect of numerical problems [7,[10][11][12][13][14][15]25] that arise in the natural formulations of WND due to a constraints matrix with coefficients that greatly vary in their order of magnitude. The numerical issues make the problem intractable for real-world instances, which are typically large. Consequently, most exact MIP-based approaches that aim to overcome numerical drawbacks have been oriented towards non-compact formulations and row generation methods. In particular, a non-compact binary formulation is proposed in [14], in which both locations and power emissions of the antennas are optimized to maximize population coverage. The authors discretize the power emissions getting knapsack inequalities with generalized upper bounds (GUBs, defined in [33]), which are reformulated using GUB covers. They also investigate an exact algorithm consisting of row generation embedded in a branch-and-cut framework. In [15], the same authors of [14] introduce a 0-1 model for a variant of WND related to the feasible server assignment problem. In [10], numerically stable 0-1 formulations and a robust optimization model managing signal propagation uncertainty are investigated. In [7], the maximum link activation problem is considered, and a non-compact formulation is proposed that uses cover inequalities to replace the source of numerical instability. In [12], the source of numerical issues in WND is deeply investigated, and the use of numerically safe linear programming (LP) solvers is suggested to make the solutions reliable.
As for the heuristic approaches proposed to overcome the numerical instabilities, in [11], a MILP problem for the optimal allocation of power emissions and frequency channels is tackled through a genetic algorithm suitable for large-scale problems. The model maximizes the service coverage. In [25], a MILP formulation for the power assignment problem in wireless networks is employed, and the signal orientation of the antenna is also considered. A constructive heuristic followed by an improving local search is proposed. In [13], a matheuristic is investigated, which combines a genetic algorithm exploiting a suitable linear relaxation of the problem and an integer linear programming heuristic, improving the solutions found with the relaxation.
A summary classification of the cited works can be found in Table 1, where NI stands for numerical issues, indicating the contributions that address them. We refer the reader to [24] for a complete view of optimization and numerical challenges in modern wireless network design.

PROBLEM STATEMENT AND FORMULATION
The base station deployment problem aims at selecting a subset of locations for base stations providing wireless connection to a target area. The target area is usually partitioned into elementary areas of identical size, in line with the recommendations of the telecommunications regulatory bodies. Each elementary area is called testpoint, and it is assumed to represent all users within the corresponding elementary area, meaning that a testpoint is a representative receiver. Each testpoint receives signals from all the transmitters. The power received by each receiver is classified as serving power if it relates to the power emitted by the transmitter serving that receiver; otherwise, it is classified as interfering power, according to the physical layer specifications of the LTE standard [30]. A receiver is regarded as served by a base station if the ratio of the serving power to the sum of the interfering powers and noise power (Signal-to-Interference Ratio or SIR) is above a threshold [29], whose value depends on the desired quality of service. We also consider the presence of received noise.
In the planning problem addressed here, we assume that frequency channel and power emissions are given and equal for all transmitters. Regarding the frequency, this is a straightforward assumption because, for different frequency channels, the problem decomposes as there is no interference among non-co-channel signals. The assumption of a unique power level is motivated by the indications of the FUB on the practice followed in the planning phase, in which each activated transmitter uses a power emission equal to a standard value, typically the mean value of the feasible range.

A basic compact formulation
The basic formulation is derived from the formulation reported, for example, in [14]. Let  be the set of potential transmitters and  the set of receivers located at the testpoints. We introduce the variables The quality of the received signal at a testpoint can be measured through several criteria; we adopt the Signal-to-Interference-plus-Noise Ratio (SINR). Let a tb > 0 be the power value measured in t ∈  of the signal emitted by b ∈ , then a receiver t is served by ∈ , if the SINR of the serving power to the sum of the interfering powers and noise > 0 is above a given SINR threshold ≥ 0, namely Following [14], we can rewrite the SINR condition through the big-M constraints where M t is a large positive constant. When x t = 1, (2) reduces to (1); when x t = 0 and M t is sufficiently large, (2) becomes redundant. We can set, for example, M t = + ∑ b∈⧵{ } a tb . A constraint to express a minimum service coverage of the territory is included. We assume that each testpoint weights r t ∈ R to account for the fact that testpoints can represent a variable number of users or be more or less crucial in service coverage. A minimum coverage level c ∈ R of the testpoint is enforced by the constraint: When r t are all equal, for example, to 1, c represents the minimum number of testpoint to be covered, and the constraint above corresponds to a territorial coverage. When r t ∈ [0, 1] can be interpreted as the fractions of the population living in the elementary area t ∈  , as done in our numerical experiments, then c ∈ [0, 1] represents the fraction of population to be covered by the service and the constraint represents a population coverage. Each testpoint must be covered by at most one serving base station, namely ∑ b∈ Under the FUB's purpose, we aim to minimize the number of activated base stations. Thus the basic formulation is the following 0-1 linear programming model: min where S is the feasible region defined as Sets and parameters used in the model are summarized in Table 2. In principle, model (5) can be solved by commercial solvers. However, it is well-known that numerical and memory issues arise when solving practical instances, as widely described in [12,14]. The main issues encountered are: • The power received in each testpoint ranges in a large interval, from small values (order of 10 −7 ) to huge (10 5 ), which makes the range of coefficients a tb in the constraints matrix large and the solution process numerically unstable and possibly affected by error; • The big-M coefficients lead to poor quality bounds that impact the effectiveness of standard solution procedures; • Practical problems lead to models with a large number of variables and constraints.
As a result, realistic base station deployment problems are hard to solve using standard optimal procedures.

A PRIORI REFORMULATION
We present a reformulation tightening the basic formulation and reducing the number of constraints in order to overcome the numerical criticalities outlined in Section 3.1.

VUB-based tightening
First, we tighten the basic formulation by adding the variable upper bounds (VUBs): enforcing that a testpoint t ∈  can be assigned to the transmitter b ∈  only if b is activated. VUBs turned out to be computationally very effective, as reported in Section 5. Furthermore, we can also strengthen each SINR constraint (2) by replacing the z with the x t : Theorem 1. Inequalities (7) are valid for S.
Proof. For each t ∈  and ∈ , inequalities (2) and (7) can be written respectively as z ≥ h a t and using a t > 0. Inequality (6) can be written as z − x t ≥ 0. Hence we get that inequality (2) can be obtained summing up inequalities (6) and (7), meaning that it is implied by them. ▪

Constraint aggregation
In order to reduce the size of the problem, we aggregate the constraints (7) using (4), producing the aggregate SINR constraints where the big-M term depends only on the testpoint t and could be set to M t = + ∑ b∈ a tb ≥ M t . Theorem 3. Inequalities (8) are valid for S.
Proof. Given t ∈  , we can have the following cases according to (4): t is covered by exactly one base station , or t is not covered by any base station. In the former case, thanks to (4), the sum ∑ b∈ a tb x tb reduces at most to a single element a t x t , being the unique transmitter serving testpoint t, and the same is for Since x t = 1 implies z = 1, we get that (7) is satisfied. In the latter case, no transmitter is serving testpoint t and we get ∑ b∈ a tb x tb = 0 and which satisfy (7) when x t = 0. ▪ Observe that Theorems 1-3 allow to use the aggregate constraints (8) for replacing all the SINR constraints (2) in problems (5). Since for each receiver t the aggregate SINR constraint is only one, whereas for each receiver t the SINR constraints are m = ||, we get a formulation with a reduced number of constraints.

Coefficient tightening
We also perform a heuristic sparsification to deal with the numerical issues arising from the coefficients of the SINR inequalities. We set a minimum threshold on the received power below which the received power can be considered null. Namely, we set This allows us to reduce the size of the problem by eliminating some x tb variables a priori.
Furthermore, we can tighten the formulation by computing the smallest possible big-M. Consider the big-M suitable for the aggregate SINR constraints, namely M t = + ∑ b∈ a tb for t ∈  .
To decrease its value, we replace the sum of the signal power received from all the transmitters in testpoint t with the sum of the strongest interferers. In particular, only the strongest interferers might be considered, where is an upper bound of the optimal number of activated base stations, that is, ≥ ∑ b∈ z * b with (x * , z * ) optimal solution. Given , the big-M can be computed as where  t is the set of the base stations emitting the strongest signals received in t, that is, | t | = . The better is the bound , the smaller is the big-M. Hence, the estimate of must be as accurate as possible.

The overall reformulation
The new strengthened and reduced formulation differs from the basic formulation since it contains the aggregate SINR constraints (8) instead of the SINR constraints (2) and considers the addition of VUBs (6). It can be formulated as follows where the new big-M coefficient can be defined asM t = + ∑ b∈ t a tb , in which  t is the set of the transmitters emitting the strongest signals received in t and the coefficient is an upper bound of the optimal value. We have also carried out the heuristic sparsification in Section 4.3.

COMPUTATIONAL RESULTS
We compare the results obtained using the basic formulation (5) and the final reformulation (10) to solve the base station deployment problem. The code has been implemented in Python, and the experiments have been carried out on a Ubuntu server with an Intel(R) Xeon(R) Gold 5218 CPU with 96 GB RAM. Gurobi optimizer 9.5.1 [21] with default settings has been employed as a MIP solver. We set a time limit of four hours for computation time.

The test-bed
The FUB provided us with the test-bed of an existing LTE network, in which the set of transmitters consists of the 135 actual potential locations for the transmitters in the Municipality of Bologna (Italy). Since the technology we refer to is 4G LTE 800 MHz, the elementary areas should have a square shape with a side of 100 m following the Italian Resilience Plan [22], which is based on the guidelines given by the Body of European Regulators for Electronic Communications [4,5]. Thus, the set of the testpoints provided by the FUB corresponds to the centers of the 14 078 squared elementary areas that can be obtained on Bologna.
The system noise and the power received in each receiver by each possible transmitter have been simulated by the FUB using the Cost Hata model [3, 19, 30] with a transmitter power emission set at the mean value of 40 W (being the feasible range [20,80]). The power values expressed by the parameters a tb are in W and scaled by a factor of 10 10 to avoid numerical issues and obtain better accuracy on optimal solutions, as suggested in [12]. Accordingly, the threshold on the quality of service and the system noise are expressed in W; is also scaled by 10 10 . The threshold in (9) has been set around −110 dBmW [31,32]. In (3), the parameter r t has been set to the fraction of the population living in the elementary area t ∈  , and c to the minimum fraction of the population to be covered by service.
Since the number of the testpoints, which is strictly related to the size of the elementary areas, significantly impacts the size of the problem, we decided to aggregate the testpoints and consequently reduce the number of variables and constraints of the model. Specifically, we used k-means clustering to smartly partition testpoints into groups (or clusters) such that intra-cluster data points are as similar as possible while keeping the clusters as different (far) as possible. We considered the sum of the squared distance between the cluster centroid and the vectors describing the data points as a measure of similarity. In particular, each testpoint t ∈  is characterized by a power vector p t = [a tb ] b∈ ∈ R || of the received power values from each transmitter b ∈ . Intuitively, we assume that two points with similar power vectors have similar behavior with respect to the SINR, and hence they can be aggregated in the same cluster.
We fixed the number of clusters to K and used the k-means++ algorithm implemented in the Scikit-learn library [28] with the default setting values for the maximum number of iterations and the relative tolerance, enabling a multistart procedure to improve the local solution. The solution returned by the algorithm is made up of the clusters  i and their centroids p c i with i ∈ 1, … , K, that generally do not correspond to any point p t . We selected as the representative for each cluster  i , the receiver t ∈  i with the p t closest to p c i . Depending on the urban and territorial morphology and transmitter positions, clusters may have different cardinality. Specifically, it is possible to identify areas split into many small clusters as the variation of the power received differs significantly even in neighboring points (e.g., in dense urban areas). In other areas, a coarse cluster is sufficient to represent the behavior of the receivers within it (e.g., in rural areas). This procedure resembles the widespread coarsening and refinement approach of the METIS algorithm [23], which suggests a first selection and subsequent refinement where necessary.
To further reduce the size of the problem, we selected a subset of transmitter-receiver pairs, which any feasible solution would exclude. Indeed, excluding the possibility of serving the full target area with a single transmitter, we can consider the best-case scenario as the one with only two transmitters deployed, where one transmitter works as a server and the other as an interferer. Hence, for each testpoint t, we fix x tb = 0 for all those transmitters b such that the SINR is below the threshold for each possible interferer, namely a tb + a th < ∀ h ∈ ⧵{b}.
It is worth noting that this is not the same as setting the coefficient a tb = 0. The territorial distribution of the testpoints of the Municipality of Bologna is depicted in Figure 1. The blue dots are the 14 078 original testpoints of the Municipality of Bologna identified by the FUB according to [4,5,22]. The black dots are the 4693 testpoints selected with the clustering procedure when the number of clusters is fixed to K = 4693 (about one-third of the original testpoints).
By reducing the number of testpoints and the number of base stations serving each testpoint, we got a reduced network. From this network, we derived ten instances (BO1-BO10), each of them differing in the quality of service required (increasing with the number) in the receivers and in the fraction of the population to be served, as reported in Table 3. Not to affect the solution time, we fixed a priori the upper bounds to small percentages of activated transmitters (15%, about 10% and 5%); a posteriori, we verified that the optimal number of activated transmitters was lower than . An estimate of can also be obtained using a heuristic. Average number of constraints (A) and non-zeros (B) in the tested formulations (see Table 4).

FIGURE 3
Box plots of the relative gap at the root node in the tested formulations (see Table 4).

Results
Here we show the impact of the individual reformulating operations discussed in the previous section (see Figures 2 and 3). The tested formulations are described in Table 4. In particular, we focus on three evaluation criteria, namely size, sparsity, and quality of root bounds. Figures 2 show the average number of constraints and non-zeros of each formulation respectively. Figure 3 shows five box plots, one for each formulation, displaying the distribution of the data concerning the relative gap at the root node based on a summary of five numbers (minimum value, first quartile, median value, third quartile, and maximum value).
Graphs in Figures 2 and 3 show that: • Formulation C, leads to a significant improvement in the root relaxation value, revealing the decisive effect of adding VUBs; • As regards the comparison between C and D, a little sparsification is observed in the decrease of the number of non-zeros (NumnNZs) with the only replacement of the variables in the SINR constraints; • The aggregation of the SINR constraints in formulation E involves, in some instances, a slight worsening of the bounds but leads to a definitely reduced and sparser formulation; • Sparsity continues to increase after carrying out coefficient tightening operations: formulation F is characterized by fewer non-zeros and also by fewer constraints.
A comparison between formulation B (5) and formulation F (10) is reported in Table 5. The evaluation metrics considered are: number of variables (NumVars), constraints (NumConstrs) and non-zeros (NumNZs); lower bound  produced at the root node (RootLB), relative optimality gap at the root node (RootGap), best upper bound (BestUB), final relative optimality gap (Gap); number of explored nodes (Nodes); time needed to produce a lower bound at the root node (RootLBTime) and the total solution time (TotTime); coverage gained on the testpoints selected with the clustering (ClustCov) and the original testpoints (OrigCov). Denoting the optimal value with Opt and the value of the best lower bound with BestLB, the RootGap is computed as 100(Opt-RootLB)/Opt, and the gap as 100(Opt-BestLB)/Opt. When it happens that the best upper bound coincides with the optimal value, the former is shown in bold. Results show that formulation F is definitely reduced in size and sparser than B. This is mainly due to the aggregation of the constraints and the coefficient tightening, as previously discussed. The quality of the bounds at the root node is better in F, mainly due to the use of VUBs. In some instances (e.g., BO6, BO8-BO10), the problem is even solved at the root node. The time needed to get the lower bound at the root node may be higher in B, but it leads to a much better bound. Thanks to a good root bound, the number of the explored nodes are heavily reduced in F, and consequently, the time spent on the branch-and-cut tree search. Overall, total solution times are significantly reduced if we compare F to B. Indeed, none of the basic formulations have been successfully solved within the time limit, whereas all the final formulations have been solved to optimality within the time limit. Note that by eliminating the time limit of 4 h, the solution times to solve formulation B are never less than 11 h on the ten instances and over 24 h for six out of ten.
In the end, formulation F turns out to be much more competitive than the basic formulation B.
For what concerns the accuracy of the solution, we evaluated the coverage of the optimal solutions found. To do this, we simulated the optimal solutions and verified the coverage of the testpoints, that is, verified the satisfaction of the SINR threshold. We can distinguish between the minimum fraction of the population to be covered (c) and the coverage actually obtained in correspondence to the optimal solution. The latter can be evaluated on the testpoints selected with the clustering procedure and is denoted by ClustCov. Otherwise, it can be evaluated on the original testpoints and, in this case, is denoted by OrigCov. The results of simulations show that the coverage actually gained on the instance is never less than the required minimum coverage c. This is probably due to the scaling operation made in advance on the coefficients and the ability of Gurobi to deal with numerical issues. The coverage returned by formulation F on the clustered testpoints is very close to the coverage computed on the original testpoints, validating its effectiveness.
We refer the reader to Appendix A for the figures that show the assignment of testpoints to each activated base station according to the optimal solutions found in the instances.

CONCLUSIONS
In this study, we addressed the base station deployment problem. We introduced a new compact reformulation that overcomes some of the numerical and memory issues of the existing models. The main scope was to provide a compact formulation suitable for large instances. Starting from a basic natural optimization model, we intervened in the formulation by carrying out both strengthening operations (i.e., additions of variable upper bounds, variable replacement, heuristic coefficient tightening, big-M reduction) and size reduction operations (i.e., constraints aggregation). The new reformulation was tested to solve the problem in different cases concerning the LTE technology in the Municipality of Bologna. The proposed reformulation allows us to efficiently solve large instances of the problem to optimality, in solution times consistent with planning operations.

APPENDIX A
Figures A1-A10 depict the optimal assignment of testpoints to each activated base station via color code. Each figure refers to the optimal solution found on an instance using the final formulation. The black dots represent the uncovered testpoints, whereas the gray dots represent the original testpoints not included in the instance.
The activated base stations have the same color as the testpoints they serve. The direction of the signal emitted by the antennas is illustrated in a simplified way with an arrow. The representations show that the assignment of the testpoints to the base stations appears to be consistent with their respective positions.

FIGURE A1
Optimal assignment of testpoints to the base stations on BO1. N/A, not available since the testpoint is not in the instance.

FIGURE A2
Optimal assignment of testpoints to the base stations on BO2. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A3
Optimal assignment of testpoints to the base stations on BO3. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A4
Optimal assignment of testpoints to the base stations on BO4. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A5
Optimal assignment of testpoints to the base stations on BO5. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A6
Optimal assignment of testpoints to the base stations on BO6. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A7
Optimal assignment of testpoints to the base stations on BO7. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A8
Optimal assignment of testpoints to the base stations on BO8. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A9
Optimal assignment of testpoints to the base stations on BO9. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.

FIGURE A10
Optimal assignment of testpoints to the base stations on BO10. N/A, not available since the testpoint is not in the instance; None, the testpoint is covered by none of the activated antennas.