Adaptive pursuit learning for energy‐efficient target coverage in wireless sensor networks

With the proliferation of technologies such as wireless sensor networks (WSNs) and the Internet of things (IoT), we are moving towards an era of automation without any human intervention. Sensors are the principal components of the WSNs that bring the idea of IoT into reality. Over the last decade, WSNs have been used in many application fields such as target coverage, battlefield surveillance, home security, health care monitoring, and so on. However, the energy efficiency of the sensor nodes in a WSN remains a challenging issue due to the use of small batteries. Moreover, frequently replacing the batteries of sensor nodes deployed in a hostile environment is not a feasible option. Therefore, intelligent scheduling of the sensor nodes to optimize their energy-efficient operation, and thereby extend the lifetime of the WSN, has received a lot of research attention lately. In particular, this article investigates extending the lifetime of the WSN in the context of target coverage problems. To tackle this problem, we propose a scheduling technique for WSNs based on a novel concept within the theory of learning automata (LA) called pursuit LA. Each sensor node in the WSN is equipped with an LA so that it can autonomously select its proper state, that is, either sleep or active, with the aim of covering all targets with the lowest energy cost possible. Our comprehensive experimental testing of the proposed algorithm not only verifies the efficiency of our algorithm, but also demonstrates its ability to yield a near-optimal solution. The results are promising, given the low computational footprint of the algorithm.


RELATED WORKS AND CONTRIBUTIONS
A comprehensive survey on energy-efficient coverage protocols in WSNs was carried out in References 15,17. In Reference 12, an energy-efficient scheduling algorithm called EC2 was proposed to address the problems of energy efficiency, coverage, and connectivity in military surveillance applications using fuzzy graphs. An adaptive energy-efficient target detection scheme based on mobile WSNs was proposed in Reference 18. The authors proposed a clustering algorithm based on k-means++, and an energy control mechanism was used to reduce the energy consumption of nodes. A genetic algorithm (GA) approach for k-coverage and m-connected node placement in target-based WSNs was proposed in Reference 19. In order to fulfill both coverage and connectivity requirements, the authors' scheme found a minimum number of potential positions at which to place the sensor nodes for a given set of target points. The detailed working of GA is presented in Reference 20. In Reference 21, a node-failure- and battery-aware coverage protocol called the optimized discharge-curve-based coverage protocol (ODCP) for WSNs was proposed. The authors' ODCP protocol used battery discharge rate, failure probability, and coverage overlap information to determine optimal sleep schedules for redundant nodes, thereby reducing the energy consumption of the network.
Furthermore, an energy-efficient distributed algorithm for weak-barrier coverage using event-based clustering and adaptive sensor rotation was proposed in Reference 22. In Reference 23, a system using the GA to optimize k-coverage criteria in WSNs, ensuring continuous monitoring of targets with limited energy resources for the longest possible time, was considered. The authors' proposed model extends the network lifetime in a range of 23% to 45.7%. Moreover, through the proposed algorithm, the authors also showed that the running time to form the network structure was reduced by 12%. A minimum-energy target tracking scheme with coverage guarantee in WSNs was proposed in Reference 24. The authors showed that the residual capacities of the sensors could be preserved and balanced to carry out additional target tracking missions using the same WSN. Some WSN monitoring applications may require that at least k directional nodes in m-connected WSNs track the targets. Therefore, the authors in Reference 25 proposed a target tracking system using directional nodes. Network coding is an effective method used to improve network throughput, and thus efficiency, by combining packets from multiple sources at the intermediate nodes, thereby reducing the number of transmissions. Therefore, the authors in Reference 26 proposed a novel approach for target coverage in WSNs based on the network coding technique.
While the methods described in the above literature address the problem of target coverage in WSNs, these methods are complex and hence not scalable. One of the scheduling methods for energy-efficient WSNs is through learning automata (LA). 14,27 LA enables each sensor node to learn and select its correct state, that is, either active or sleep mode, so as to extend the lifetime of the WSN. 16 In Reference 28, a pDCDS algorithm was proposed to find a minimum number of nodes to monitor p-percent coverage in the deployed network.
The authors investigated the connected p-percent coverage problem in WSNs and devised a degree-constrained minimum-weight connected dominating set to solve the coverage problem. The authors' pDCDS algorithm used LA to select the best dominator and dominatee sensor nodes to achieve p-percent coverage in the network. Reference 29 proposed a topology control protocol based on LA, named LBLATC. Using distributed LA, the authors reduced energy consumption by dynamically optimizing the transmission range while maintaining connectivity. In Reference 30, a new distributed approach using an LA scheme was proposed to maximize the number of barriers and minimize the energy consumption for border surveillance in WSNs. The authors proposed a distributed border surveillance (DBS) algorithm that aims to find the minimum possible number of nodes in each barrier to monitor the network borders. In the DBS approach, an LA was used to find the best nodes to assure barrier coverage at any moment.
In Reference 31, the authors proposed an imperialist competitive algorithm-based method called ICABC to solve the barrier coverage problems in WSN. ICABC algorithm managed to find the best sensor nodes to cover the network barriers in the deployed network. In Reference 32, the authors focused on the problem of partial coverage, targeting scenarios in which the continuous monitoring of a limited portion of the area of interest is enough. The authors presented a novel algorithm called partial coverage learning automata that relies on LA to implement sleep scheduling approaches to minimize the number of active sensors for covering the desired portion of the region of interest, preserving the connectivity among sensors. Similarly, in Reference 33, the authors proposed a greedy heuristic algorithm to deal with the coverage problem of heterogeneous WSNs in the case when the complete coverage of the network is not needed. In Reference 16, an energy-efficient scheduling algorithm based on LA for the target coverage problem was proposed. In this scheme, a sensor node selected its appropriate state (either active or sleep) with the help of LA-based technique.
An LA-based scheduling algorithm was designed in Reference 34 to find an optimal solution to the problem of target coverage, which can generate both disjoint and nondisjoint cover sets within the WSNs. In this scheme, one LA is responsible for selecting the sensor nodes that should be enabled at each point to cover all the targets. This is similar to pursuit learning. The authors use a dynamic threshold to determine a reward: the reward is granted whenever the size of the current set of active sensors is reduced. However, this leads to slow convergence. As time proceeds, it becomes ever harder to obtain better solutions (lowering the threshold), and the reward probability decreases. In Reference 35, the authors extended the previous work in Reference 34 with an algorithm in which a pruning rule was designed to manage critical targets in order to maximize network lifetime. The pruning rule was specifically designed to avoid the selection of redundant sensors. The authors also demonstrated that selecting the learning rate with high accuracy leads to a longer network lifetime. An LA-based algorithm for solving the coverage problem in directional sensor networks was proposed in Reference 36. The authors proposed a fully distributed algorithm based on irregular cellular LA to find a near-optimal solution for selecting the correct working path for each sensor node. Then, a centralized approximation algorithm based on distributed LA was proposed that can cover all targets with the minimum number of active sensors. The proposed algorithm is fully distributed: each sensor node chooses its optimal direction solely based on the directions selected and the targets covered by its adjacent sensor nodes. Different from other works, Reference 37 tackled the issue of maximizing network lifetime with adjustable ranges (MNLARs) and proposed a heuristic MNLAR algorithm for directional sensor networks.
In situations where the sensors have several directions and different sensing ranges, the authors suggested two greedy-based scheduling algorithms aimed at selecting the correct sensor directions and sensing ranges so as to meet the target coverage requirement and, at the same time, optimize the lifetime of the network. Similar to Reference 37, a new LA-based approach equipped with a pruning rule for maximizing network lifetime in WSNs with adjustable sensing ranges was proposed in Reference 38. Furthermore, a new genetic-based approach for maximizing network lifetime in directional sensor networks with adjustable sensing ranges was proposed in Reference 39. The authors suggested a target-oriented GA-based algorithm that could form cover sets consisting of sensors with suitable directions and sensing ranges so that all network targets could be tracked desirably. In Reference 40, the authors explored the priority-based target coverage issue, where the targets have different requirements for coverage quality and, as a result, may need to be tracked by more than one sensor direction. In order to solve this problem, they suggested an LA-based scheduling algorithm to find a minimum subset of sensor directions that tracks all targets and satisfies their coverage requirements. The authors' algorithm used a pruning rule to avoid selecting redundant directions and selecting more than one direction of the same sensor for each cover set. In Reference 41, the authors explored the issue of target coverage where a subset of sensors should be allowed to track all targets while extending the lifetime of the network. Three centralized algorithms for sensor scheduling were proposed, in which an LA decides the sensors to be triggered to cover the available targets. In their proposed algorithm, the LA attempts to select the sensors that cover more targets and manages the critical targets to maximize the lifetime of the network.
The list of available solutions and their proposed methods is summarized in Table 1. Since LA enables a sensor node to learn and select its correct state, that is, either active or sleep mode, it can be used to extend the lifetime of the WSN. This article therefore leverages the simplicity of LA and proposes a novel adaptive pursuit learning based energy-efficient target coverage scheme for WSNs. Although the theory of LA has been applied before to solve the problem of area coverage, our solution enjoys some desirable design properties compared with Reference 14. In fact, we apply an LA to each of the sensors to determine its state: active or sleep. We then opt to pursue the joint action of the team of LA corresponding to the best solution found so far, using the concept of pursuit learning. 42-44 Our proposed scheme is different from the work of Reference 14 in that the LA update of the latter does not track the best solution found so far. In Reference 14, the authors implemented a reward and penalty mechanism in the following manner:
• Reward action sleep: if all targets of the sensor are covered. In other words, taking the opposite action, which is active, would not result in any benefit.
• Reward action active: if the sensor in question covers at least one target exclusively. In other words, taking the opposite action, which is sleep, will result in at least one target that cannot be covered.
In summary, the principal contributions of this article are as follows:
• We leverage the simplicity of LA and propose a novel adaptive pursuit learning based energy-efficient target coverage scheme for WSNs.
• Our proposed solution enjoys some desirable design properties compared with existing studies. In fact, we apply an LA to each of the sensors to determine its state: active or sleep. We then opt to pursue the joint action of the team of LA corresponding to the best solution found so far, using the concept of pursuit learning.
• In order to increase the speed of the algorithm and improve its scalability when the number of sensors is large, we propose a clustering algorithm that divides the main coverage problem into a set of smaller independent subproblems. In this manner, the subproblems can be solved in parallel.
The remainder of this article is organized as follows. In Section 3, we introduce some background concepts about the theory of LA, which is an essential component of our solution. In Section 4, we explain our proposed novel adaptive pursuit learning based energy-efficient target coverage algorithm for WSNs. Theoretical convergence results of the pursuit-LA and the discretized learning automata (DLA) are also given in Section 4, along with the clustering technique employed in this article. Performance evaluation is carried out in Section 5. Finally, conclusions and future works are drawn in Section 6.

BACKGROUND
In this section, we shall review some basic concepts about the theory of LA, which is fundamental for the understanding of our article.

Learning automata
In the field of automata theory, an automaton 45-49 is characterized as a quintuple ⟨Q, A, B, F(·,·), G(·)⟩ made out of a set of states, a set of outputs or actions, an input set, a function that maps the present state and input to the following state, and a function that maps the present state (and input) into the current output. The following definitions are widely used in the field of LA. 50
1. Q = {q_1, q_2, …} is the set of internal states of the automaton.
2. A = {α_1, α_2, …, α_r} is the set of actions (outputs) of the automaton.
3. B = {β_1, β_2, …, β_m} is the set of inputs (responses) to the automaton.
4. F(·,·): Q × B → Q is a mapping in terms of the state and input at time t, such that q(t + 1) = F(q(t), β(t)). It is known as the transition function, that is, the function that decides the state of the automaton at the subsequent time instant t + 1. This mapping can be either deterministic or stochastic.
5. G(·): Q → A is known as the output function. G decides the action taken by the automaton when it is in a given state, as α(t) = G(q(t)).
If the sets Q, B, and A are all finite, the automaton is said to be finite.
The environment, E, ordinarily refers to the medium in which the automaton functions. The environment comprises all the external variables that influence the actions of the automaton. Mathematically, an environment can be abstracted as a triple ⟨A, C, B⟩, where A, C, and B are characterized as follows:
1. A = {α_1, α_2, …, α_r} is the set of actions offered to the automaton, which form the inputs to the environment.
2. B = {β_1, β_2, …, β_m} is the output set of the environment. Once more, we consider the situation when m = 2, that is, with β = 0 representing a "Reward," and β = 1 representing a "Penalty."
3. C = {c_1, c_2, …, c_r} is a set of punishment or penalty probabilities, where component c_i ∈ C relates to an input action α_i.
The learning process is based on a learning loop involving two entities: the random environment (RE) and the LA, as depicted in Figure 1. In the learning procedure, the LA continuously interacts with the environment to obtain responses to its various actions (ie, its decisions). Finally, through sufficient interactions, the LA attempts to learn the optimal action offered by the RE. The actual process of learning is represented as a set of interactions between the RE and the LA.
The automaton is offered a set of actions and is obliged to pick one of them. When an action is chosen from the pool of actions, the environment gives out a response β(t) at time t. The automaton is either penalized or rewarded with an unknown probability c_i or 1 − c_i, respectively; this feedback loop is depicted in Figure 1. Based on the response β(t), the state of the automaton φ(t) is updated and a new action is picked at time t + 1. The penalty probability c_i satisfies 0 ≤ c_i ≤ 1 for all i.

We now present a few of the significant definitions utilized in the field. P(t) is referred to as the action probability vector, where P(t) = [p_1(t), p_2(t), …, p_r(t)]^T, in which every component of the vector satisfies p_i(t) = Pr[α(t) = α_i], with Σ_{i=1}^{r} p_i(t) = 1 for all t.
Given an action probability vector P(t) at time t, the average penalty is M(t) = E[β(t) | P(t)] = Σ_{i=1}^{r} c_i p_i(t). The average penalty for the "pure-chance" automaton, which selects every action with equal probability, is M_0 = (1/r) Σ_{i=1}^{r} c_i. As t → ∞, if the average penalty M(t) < M_0, at least asymptotically, the automaton is viewed as superior to the pure-chance automaton. An LA that performs better than pure chance is said to be expedient.
It should be noted that no truly optimal LA exists. Marginally suboptimal performance, also termed ε-optimal performance, is what LA researchers endeavor to accomplish: an LA is ε-optimal if lim_{t→∞} E[M(t)] < c_min + ε, where c_min = min_i c_i, and where ε > 0 can be made arbitrarily small by a suitable choice of some parameter of the LA.
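As a concrete illustration of these definitions, the following sketch, using hypothetical penalty probabilities c_i, computes the average penalty M(t) for a given action probability vector and checks expediency against the pure-chance automaton:

```python
# Sketch with hypothetical penalty probabilities c_i: compute the average
# penalty M(t) = sum_i c_i * p_i(t) and compare against the pure-chance M_0.
def average_penalty(p, c):
    """Expected penalty of the LA under action probabilities p."""
    return sum(pi * ci for pi, ci in zip(p, c))

c = [0.8, 0.4, 0.1]               # hypothetical penalty probabilities; c_min = 0.1
m0 = sum(c) / len(c)              # pure-chance automaton: uniform action selection

p_learned = [0.05, 0.10, 0.85]    # action probabilities after some learning
m_t = average_penalty(p_learned, c)

assert m_t < m0                   # expedient: better than pure chance
```

Here M(t) ≈ 0.165 while M_0 ≈ 0.433, so the automaton is expedient; driving p toward the action with c_min would bring M(t) within ε of 0.1, that is, ε-optimality.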

Problem formulation
Let us consider a set of M targets denoted by T = {T_1, T_2, …, T_M}, which are monitored by a set of N sensor nodes denoted by S = {S_1, S_2, …, S_N}. These two sets are deployed in an X × X area. All sensor nodes have a fixed sensing range R. In addition, we assume that a sensor node covers a target if the Euclidean distance between them is at most R.
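A minimal sketch of this formulation, using hypothetical coordinates and the distance-based coverage assumption:

```python
import math

# Hypothetical deployment: a sensor covers a target iff the Euclidean
# distance between them is at most the sensing range R.
R = 50.0

def covers(sensor, target, r=R):
    (sx, sy), (tx, ty) = sensor, target
    return math.hypot(sx - tx, sy - ty) <= r

sensors = {"S1": (10.0, 10.0), "S2": (100.0, 100.0)}
targets = {"T1": (30.0, 40.0), "T2": (120.0, 90.0)}

# Coverage sets: which sensors can monitor each target.
coverage = {t: [s for s in sensors if covers(sensors[s], targets[t])]
            for t in targets}
assert coverage == {"T1": ["S1"], "T2": ["S2"]}
```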

Adaptive learning algorithm
We have used the LA algorithm for scheduling the sensor nodes. We shall now delineate the details of our algorithm, which helps in finding the best active set of sensors that are monitoring the maximum number of targets at any given instant. As in Reference 14, the actual flow of the algorithm is divided into three phases, which include an initial phase, a learning phase, and the target monitoring phase. We shall now focus on the initial phase and the learning phase. The monitoring phase is identical to the one presented in Reference 14, and therefore, it is omitted here for the sake of brevity.

Initial phase
In this phase, each sensor node in the network is provided with an LA, which helps the sensor node select its state, either active or sleep.
In the initial stage, both states are equally probable, that is, the probability of selecting either of the active or sleep state is initialized to be 0.5.
Here, sensor nodes are endowed with a certain level of autonomy, permitting them to establish communication and exchange messages, including their ID, position, and list of covered targets, with their neighbor nodes autonomously. This phase is followed by the learning phase and the target monitoring phase.

Learning phase
In the learning phase, each of the sensor nodes is equipped with an LA, and each node uses its LA to select its state; initially, each node selects its state at random.
The node then broadcasts a message packet, including all its information, to the rest of the sensor nodes. Formally, we attach an LA to each of the N sensors. For instance, let us consider the sensor S_i. The automaton's state probability vector at node i at time t is P_i(t) = [p_(i,0)(t), p_(i,1)(t)]. Thus, p_(i,j)(t) is the probability at time instant t of selecting action j. In our settings, we have two actions: sleep or active. For the sake of notation, let 0 denote the action sleep and 1 denote the action active.
The feedback function is a binary function that yields a reward whenever the coverage has been improved. This is denoted by |Current coverage set| < |Best coverage set| in Algorithm 1. In simpler terms, if the aggregate state of the sensors chosen by the team of N LA yields an improvement in the coverage, meaning full coverage is ensured with fewer sensors, the joint action of the LA that yielded that solution is rewarded by increasing the probability of the actions that formed that particular solution.
Let a*(t) be the best aggregate action of the team of LA so far, that is, the joint action yielding the highest coverage.
Thus, the idea of pursuit here is to reward the LA whose aggregate action is the best found so far, that is, up to the time instant t.
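The learning phase described above can be sketched as follows. The coverage sets, sensor names, parameter value, and the helper `full_coverage` are hypothetical illustrations, not the article's exact implementation:

```python
import random

# Sketch on hypothetical coverage data: each sensor's LA holds p_active; the
# team's joint action is pursued whenever it yields full coverage with fewer
# active sensors than the best solution found so far.
GAMMA = 0.9                                    # learning parameter, < 1

coverage = {"T1": {"S1", "S2"}, "T2": {"S2", "S3"}, "T3": {"S3"}}
sensors = sorted({s for ss in coverage.values() for s in ss})
p_active = {s: 0.5 for s in sensors}           # initial phase: both states equiprobable

def full_coverage(active):
    return all(active & candidates for candidates in coverage.values())

best = set(sensors)                            # all-active trivially covers all targets
random.seed(1)
for _ in range(2000):
    active = {s for s in sensors if random.random() < p_active[s]}
    if full_coverage(active) and len(active) < len(best):
        best = active                          # improved joint action: pursue it
        for s in sensors:
            j_star = 1.0 if s in best else 0.0 # state of s in the best solution
            p_active[s] = GAMMA * p_active[s] + (1 - GAMMA) * j_star

assert full_coverage(best) and len(best) == 2  # two sensors suffice here
```

Rewards only ever pull the probability vectors toward the best joint action found so far, which is the pursuit idea described above.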

Update mechanism of the continuous learning automata
We consider the LA update equations at node i. Upon a reward, the update is given by:

p_(i,j)(t + 1) = γ p_(i,j)(t), if j ≠ j*_i(t),
p_(i,j)(t + 1) = γ p_(i,j)(t) + (1 − γ), if j = j*_i(t),

where γ (0 < γ < 1) is the update parameter, also called the learning parameter, which is time-independent. We shall also give some theoretical results for the case where γ is time-dependent; however, generally within the theory of LA, the most used schemes possess a learning parameter that is fixed. The informed reader will observe that the above update scheme corresponds to the linear reward-inaction LA update. 27 This pursuit paradigm is an adaptation of the PolyLA-Pursuit scheme 52 recently proposed by Goodwin and Yazidi in the context of machine learning classification problems. The pursuit LA used in this article is more suitable for solving deterministic optimization problems than the classical LA commonly used in the literature to solve coverage problems, which work better for stochastic optimization problems. Classical LA suppose that the environment responses are stochastic, which translates in our context to "stochastic" coverage results for a deterministic configuration. As this is not the case under our problem assumptions, meaning that the result of a configuration is deterministic in terms of coverage, it is more suitable to use our pursuit LA method. The pursuit-LA biases the search probability vector towards the optimal solution found so far. Furthermore, pursuit LA is faster and more accurate, as it is by design suitable for solving deterministic optimization problems.
If j ≠ j*_i(t), then p_(i,j)(t + 1) is reduced by multiplying by γ, which is less than 1. However, if j = j*_i(t), then p_(i,j)(t + 1) is increased by (1 − γ)(1 − p_(i,j)(t)). In the rest of this article, we shall call the above update scheme continuous learning automata (CLA), to differentiate it from the DLA, which will be presented in Section 4.4.
The update scheme is called pursuit-LA 52 and obeys the rules of the so-called linear reward-inaction LA. The idea is to always reward the transition probabilities along the best solution obtained so far.
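For the two-action case used here, the reward step can be sketched as follows (γ = 0.9 is an arbitrary illustrative value):

```python
# Two-action linear reward-inaction pursuit step (a sketch): on reward, the
# probability of the pursued action j* moves toward 1, the other toward 0.
def cla_update(p, j_star, gamma=0.9):
    """p = [p_sleep, p_active]; j_star is the state in the best solution so far."""
    return [gamma * pj + (1 - gamma) * (1.0 if j == j_star else 0.0)
            for j, pj in enumerate(p)]

p = [0.5, 0.5]
for _ in range(50):
    p = cla_update(p, j_star=1)          # repeatedly pursue "active"
assert abs(sum(p) - 1.0) < 1e-9          # p remains a probability vector
assert p[1] > 0.99                       # converges toward the pursued action
```

Note that the update preserves the probability simplex by construction: the loser is multiplied by γ while the winner gains exactly the mass the loser sheds.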

Theoretical convergence results of the Pursuit-LA
We will now state some theoretical results that catalogue the properties of the Pursuit-LA for both the time-varying update parameter and the fixed update parameter. The proofs of the theorems can be found in the Appendix.

Theorem 1. The optimal solution is generated with probability 1 only if the time-dependent update parameter γ_t obeys the condition given in the Appendix.
Proof. The proof of Theorem 1 is given in the Appendix. ▪
In Theorem 1, we give the convergence result of the Pursuit-LA for the case of a time-dependent update parameter.
Theorem 2. The optimal solution is generated with probability 1 only if the update parameter γ → 1.
Proof. The proof of Theorem 2 is given in the Appendix. ▪
In Theorem 2, we give the convergence result of the Pursuit-LA for a fixed step-size parameter γ.

Introducing a new variant: DLA
Within the theory of LA, there are two main families of schemes, CLA and DLA. 27 In this regard, we present here a counterpart version of the CLA introduced in Section 4.2.
where the quantity for the pair (i, j) is given by Equation (8), and Π_H denotes the projection onto the discrete probability space H. The discretized pursuit LA follows the principles of the Pursuit-LA and is merely a discretized version of it that operates on a discrete probability space, where the probability can take only a finite number of values, given that the increase and decrease have a fixed step.
The informed reader will observe that in the classical DLA scheme one rather uses a resolution parameter that describes, in loose terms, the number of discrete values that the LA probability can take. In our case, it is easy to note that the number of discrete values is in the order of ⌈1/Δ⌉ + 1, where Δ denotes the fixed step size.
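A sketch of the discretized counterpart for the two-action case, with an illustrative step size Δ = 0.05 (the clipping used below is one simple way to realize the projection onto the discrete space):

```python
# Discretized pursuit sketch: probabilities live on the grid {0, Δ, 2Δ, ..., 1};
# each reward moves probability mass by one fixed step Δ toward the pursued
# action, so only about 1/Δ + 1 values are reachable per component.
DELTA = 0.05

def dla_update(p, j_star, delta=DELTA):
    """p = [p_sleep, p_active]; min/max clip keeps each component in [0, 1]."""
    return [min(pj + delta, 1.0) if j == j_star else max(pj - delta, 0.0)
            for j, pj in enumerate(p)]

p = [0.5, 0.5]
for _ in range(100):
    p = dla_update(p, j_star=1)          # repeatedly pursue "active"
assert p == [0.0, 1.0]                   # absorbed at the pursued action
```

Unlike the CLA, which only approaches the pursued action asymptotically, the DLA reaches it exactly after a finite number of rewarded steps.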
The convergence proofs of the DLA are similar to those of the CLA found in the appendix and therefore are omitted for the sake of brevity.

Clustering for reducing problem complexity
The number of possible configurations of N sensors in active and sleep modes is 2^N, and this makes it practically impossible to find exact solutions for target coverage problems with more than a few dozen sensors, as the computational time grows exponentially as a function of the number of sensors. However, in many cases it is possible to simplify the problem by splitting the sensors and their targets into smaller sets of independent clusters. For instance, if a target is covered by just a single sensor, these two objects may be separated from the rest and identified as a single cluster, because the target coverage problem of this small cluster can be solved independently of the other targets and sensors. In this simplest possible case, the only solution is that the sensor must be in active mode. In Figure 3, one such cluster can be seen colored in blue, where the sensor S_5 covers the target T_5. Another slightly more complex cluster is the yellow one, where the sensors S_3 and S_7 cover target T_3. Again the solution of the coverage problem can be separated from the rest: only one of these two sensors needs to be active. When a single sensor covers two or more targets, all other sensors covering one or more of these targets must also be part of the same cluster, since a solution to the coverage problem of these targets must involve all these sensors. Such a case can be seen in Figure 3, where the sensors S_1 and S_4 cover the targets T_1 and T_4.
As sensor S_10 also covers target T_4, this sensor must also be part of this pink cluster, as its state is crucial in determining the optimal sensor configuration to cover the targets T_1 and T_4. In this way, it is in general possible to split a complex configuration of sensors and targets into independent clusters and thereby reduce the coverage problem to subproblems, which can be solved in parallel by virtue of the clustering mechanism. In the case presented in Figure 3, a target coverage problem with a solution space of 2^10 solutions has been reduced to 5 coverage problems with solution spaces of 1, 2, 2, 2, and 8, respectively, for the five clusters. This enables us to find exact solutions for much larger problems than without clustering, and obviously also improves the capabilities of the other algorithms when the complexity of the coverage problem at hand is first reduced.
We present Algorithm 2, which finds all independent clusters for a given set of targets and sensors. Targets may in some cases be linked together in long chains leading to large clusters, and it is essential that the algorithm loops through all possible targets that might be candidate members of a cluster. In general, the solution space of the target coverage problems discussed in this article is too large to allow exact solutions. For small sets of sensors and targets, the proposed algorithms can be precisely evaluated by comparing to brute force (BF) algorithms computing the optimal exact solution. A major advantage of the clustering method is that it is possible to compute exact solutions to problems consisting of a large number of sensors and targets by generating subsets and finding exact solutions for each of them. Other nonexact algorithms may then be evaluated by addressing the initial large problem and comparing their results to the optimal solution known thanks to the clustering. Without the clustering method, these approximate algorithms could only be compared with each other, without knowing how far from the exact optimum their results were.
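The clustering step can be sketched with a union-find over sensors that share a target; this is one way to realize the idea, not necessarily Algorithm 2 itself. The coverage data below is a hypothetical configuration echoing the Figure 3 example (S_5 alone covering T_5; S_3 and S_7 covering T_3; and S_1, S_4, S_10 entangled through T_1 and T_4):

```python
from collections import defaultdict

# Sensors that share a target must end up in the same cluster, so we union
# sensors over the targets they jointly cover (union-find with path halving).
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def clusters(coverage):
    """coverage: target -> set of sensors able to monitor it."""
    all_sensors = {s for ss in coverage.values() for s in ss}
    parent = {s: s for s in all_sensors}
    for ss in coverage.values():
        ss = sorted(ss)
        for other in ss[1:]:
            parent[find(parent, other)] = find(parent, ss[0])
    groups = defaultdict(set)
    for s in all_sensors:
        groups[find(parent, s)].add(s)
    return sorted(map(sorted, groups.values()))

coverage = {"T5": {"S5"}, "T3": {"S3", "S7"}, "T1": {"S1", "S4"},
            "T4": {"S1", "S4", "S10"}}
assert clusters(coverage) == [["S1", "S10", "S4"], ["S3", "S7"], ["S5"]]
```

Each resulting cluster is an independent subproblem that can be handed to the LA or to a BF solver in parallel.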

Distribution of the cluster sizes
The clustering operation is a novel feature in the proposed solution that can speed up any sensor coverage algorithm. In this subsection, we would like to characterize the distribution of the cluster sizes, which gives us an idea about how much gain we could achieve by using clustering. Figure 4 depicts the distribution of the number of clusters obtained for 1000 random configurations using a fixed number of targets and sensors, that is, 100 and 50, respectively, for a sensor range of 50. The sensors were deployed in a 600 by 600 grid. As seen in Figure 4, the most common case is that approximately 20 clusters are generated when applying the clustering algorithm. This means that the huge target covering problem of 50 targets and 100 sensors is typically reduced to 20 small problems containing just a handful of targets and sensors. Of course, if the sensor range is larger, the number of clusters will be smaller and the method will not be as efficient.
[Figure 4: The distribution of the number of clusters generated for 1000 random deployments of 100 sensors and 50 targets.]
[Figure 5: Cluster size distribution of all clusters created for 1000 random deployments of 100 sensors and 50 targets.]
[Figure 6: Dependence of cluster size on the sensor range. For each sensor range, 1000 deployments of 100 sensors and 50 targets have been carried out. The error bars indicate the standard deviation of the distribution of cluster sizes.]
Figure 5 shows the distribution of cluster sizes for all the clusters generated in the same 1000 experiments of random deployment of 100 sensors and 50 targets. Most of the clusters are seen to be quite small; almost all of them contain fewer than 20 sensors. The largest cluster contains 32 sensors and is a bit beyond the size that can be solved using a BF algorithm. However, in most cases, the size is less than 20, making it possible to reduce the large coverage problem to many smaller problems that can be solved easily.
The size of the clusters generated when applying the clustering algorithm to a deployment of sensors and targets strongly depends on the sensor range. If the range is large, a single sensor will cover many targets, and this will lead to more entanglement and larger clusters. In Figure 6, the dependence on sensor range when generating the clusters is shown. For each of the sensor ranges shown in the figure, a thousand deployments of sensors were carried out, and clustering applied. As the sensor range is increased, the average number of clusters generated is reduced from almost 50 to close to 1. When the sensor range reaches 100, all sensors and targets are entangled, and this leads to a single cluster, meaning that it is not possible to reduce the complexity of the coverage problem. When solving a large coverage problem that has been reduced to many small problems in the form of clusters, the size of the largest of the clusters determines how hard it is to solve the entire problem. When the size of the largest cluster is 20 or less, the coverage problem can be solved using BF, computing all possible combinations of sensor activation. The green line shows how the size of the largest cluster depends on the sensor range. For a sensor range of 50, the average size of the largest cluster is 20. In the case of a cluster of 20 sensors, a BF solution is possible, and this will also be the case for smaller sensor ranges. As the sensor range increases, the error bars show that the standard deviation of the size of the largest cluster becomes quite large, and the same is the case for the average cluster size of the 1000 experiments. This makes sense: for small sensor ranges, clustering will systematically lead to many small clusters, while for a very large sensor range, the result will mostly be just a single cluster with less variance. In between, the clustering will lead to both small and large clusters and a large variance in the cluster size.
By conducting similar clustering experiments, one can establish which coverage problems can be addressed for a given number of sensors and targets, and to what extent they can be reduced to smaller, tractable problems.
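The clustering idea above can be sketched in code. This is a minimal illustration, assuming point sensors and targets with disc coverage; the function names and the union-find implementation are ours, not taken from the article:

```python
from collections import defaultdict

def coverage_sets(sensors, targets, sensor_range):
    """For each sensor, compute the set of target indices within sensor_range."""
    covers = []
    for (sx, sy) in sensors:
        covers.append({j for j, (tx, ty) in enumerate(targets)
                       if (sx - tx) ** 2 + (sy - ty) ** 2 <= sensor_range ** 2})
    return covers

def cluster_sensors(covers):
    """Group sensors into clusters: two sensors are entangled if their
    coverage sets share a target; clusters are the connected components
    of that relation, found with a small union-find structure."""
    n = len(covers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link every pair of sensors that cover at least one common target.
    target_to_sensors = defaultdict(list)
    for i, cov in enumerate(covers):
        for t in cov:
            target_to_sensors[t].append(i)
    for group in target_to_sensors.values():
        for i in group[1:]:
            union(group[0], i)

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    return list(clusters.values())
```

With a large sensor range, more coverage sets overlap, so more union operations fire and the components merge into fewer, larger clusters, matching the behavior reported in Figure 6.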

BF algorithm
The BF algorithm finds the best solution by trying every possibility and selecting the one that gives the best result. It is a straightforward method for finding the optimal result. For example, suppose we have four sensor nodes and five targets that need to be covered.
Our goal is to find the minimum number of sensors that covers all five targets. For the BF algorithm, the total number of possible solutions is 2^n, where n represents the input size. In our example we have four sensors, so n = 4, which gives 2^4 = 16 solutions ranging from 0000, 0001, 0010 … to 1111. Here, 0 indicates that a sensor is off and 1 that it is active. By trying every solution, the BF algorithm finds the one that covers all the targets with the minimum number of active sensors.
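The enumeration described above can be sketched as follows. The coverage sets in the example are hypothetical, chosen only to illustrate the 2^n search; they are not from the article:

```python
from itertools import product

def brute_force_cover(covers, num_targets):
    """Try all 2**n activation patterns (0 = off, 1 = active) and return
    the pattern covering every target with the fewest active sensors,
    or None if no pattern achieves full coverage."""
    n = len(covers)
    best = None
    for pattern in product((0, 1), repeat=n):
        covered = set()
        for i, bit in enumerate(pattern):
            if bit:
                covered |= covers[i]
        if len(covered) == num_targets:
            if best is None or sum(pattern) < sum(best):
                best = pattern
    return best

# Four sensors, five targets (indices 0..4); assumed coverage sets.
covers = [{0, 1}, {1, 2, 3}, {3, 4}, {0, 4}]
print(brute_force_cover(covers, 5))  # → (0, 1, 0, 1)
```

Here sensors 1 and 3 together cover all five targets, so the minimum is two active sensors; the loop over `product` is exactly the 0000-to-1111 enumeration of the example.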
From this example, we see that the total number of possible solutions depends on the input size. Since it grows exponentially with the input size, BF cannot be used to solve a complex problem involving a large number of sensors, as the number of alternatives becomes too large for any computer to handle. To measure the execution time of the BF algorithm, we conducted 50 experiments for numbers of sensors between 5 and 30, recorded the time needed to find the optimal solution, and present the averaged results in Figure 7. The shaded area of the graph corresponds to a 95% confidence interval. Figure 7 shows that the BF execution time is very short for a small number of sensors; for instance, for five sensors it takes only 10^-4 seconds to find the optimal result.
For 18 sensors the algorithm spends 1 second, while for 30 sensors it takes slightly less than 10^4 seconds, almost 10^8 times longer than for five sensors. Most importantly, Figure 7 shows that the execution time of the BF algorithm grows exponentially with the number of sensors, as the logarithm of the execution time is close to linear as a function of the number of sensors.

EXPERIMENTS
In this section, we present our experimental results that demonstrate the effectiveness of our approach. The simulations were performed using a customized simulation environment implemented in Python. For the sake of clarity, we summarize all the abbreviations in Table 2. In all the experiments, we use a grid size of 600 × 600.

Accuracy of the algorithm
In this section, we compare the accuracy of the cluster-based -CLA algorithm with its counterpart CLA (which does not involve clustering) and with the optimal solution obtained with the brute-force algorithm (-BF). We report two major experiments: a small-scale experiment with a maximum of 100 sensors and a large-scale experiment with a maximum of 300 sensors.

Small scale experiments
In these experiments, the learning parameters are set to λ = 0.9999 and 0.01, while the sensor range is set to 50. We vary the number of sensors from 10 to 100 in steps of 10 and choose the number of targets to be half the number of sensors. To obtain reliable results, we test 30 different random configurations for each sensor pool size.
Our proposed -CLA algorithm yields an optimal number of active sensors even as the size of the sensor pool increases to 100, as seen in The difference of accuracy between -CLA and CLA is seen in Figure 9 to increase from zero for 30 sensors to a maximum of 10% at 100 sensors.
For this number of sensors, the accuracy of the CLA algorithm is around 90% while the accuracy of -CLA is a 100%, the performance gap being 10%.

Large-scale experiments
Although the -CLA algorithm achieves optimal performance in the previous experiment when the maximum number of sensors is 100, we cannot claim that it is an optimal heuristic. In this experiment, by resorting to a number of sensors beyond 100 sensors, we increase the scale of the experiment and measure its performance. Using 100, 200, and 300 sensors with respective ranges 50, 40, and 30, we compare the -CLA, CLA, and -BF algorithms as shown in Figure 10. For these relatively small sensor ranges it is still possible to find exact solutions using the -BF algorithm. For each setup, the experiments were done 100 times. To gain a better insight into the CLA's performance, we report both the CLA's average performance as well as the best-achieved result characterized by MIN-CLA, that is, the minimum number of sensors among the 100 runs obtained. Interestingly, the -CLA algorithm remains ideal and have an accuracy that is identical to that of the -BF. The performance of the -BF algorithm corresponding to a blue line is identical to the performance of -CLA presented by a purple color line and hence covered by the latter line. CLA performance tends to decline, however, and even MIN-CLA's best-case scenario for CLA is far from optimal. For instance, when the number of sensors is 300, the CLA

Effect of the parameter on the performance in terms of accuracy and execution time
In these experiments, we test the effect of the learning parameter λ on the convergence time and accuracy of our LA algorithms. We use values ranging over λ = 0.9, 0.99, 0.999, and 0.9999. The sensor pool size is varied from 10 to 100 sensors in steps of 10. The sensor range is kept constant at 50, and the number of targets is chosen to be half the number of sensors. For each configuration, 100 experiments are performed. Figure 11 shows the average number of active sensors for different values of λ for the -BF, CLA, and -CLA algorithms. Note that each number in the graphs is the average number of active sensors obtained from 1000 experiments.
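The role of λ can be illustrated with a generic two-action, pursuit-style linear update. This is an assumed textbook form of the update, not necessarily the article's exact rule; the function name and the string action labels are ours:

```python
def pursuit_update(p_active, rewarded_action, lam):
    """One pursuit-style update of a sensor's probability of being active.
    The probability drifts toward 1 if the 'active' action was rewarded,
    and toward 0 if 'sleep' was rewarded; lam close to 1 gives small
    steps, hence slow but stable convergence."""
    target = 1.0 if rewarded_action == "active" else 0.0
    return lam * p_active + (1.0 - lam) * target
```

With λ = 0.9999 each step moves the probability only 0.01% of the remaining distance toward the rewarded action, which is consistent with the slower but more accurate convergence reported below for large λ.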
As seen in Figure 11, the -BF algorithm obtains an average of 18.6 active sensors as the optimal solution covering all targets. For λ = 0.99, our proposed -CLA achieves an average of 18.8 active sensors, only 1% above the optimal solution, whereas CLA achieves 25.6, about 37.6% above the optimal solution. For both λ = 0.999 and λ = 0.9999, -CLA averages 18.6 sensors, which is the optimal performance, while CLA gives 21.2 and 19.4, respectively. Based on this, we conclude that λ = 0.999 is the smallest tested value for which the -CLA algorithm achieves 100% accuracy. Similarly, let t1 be the minimum number of iterations required for the probability to reach 1 − from 0.5, which can be found by solving the corresponding equation.
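The equation referred to above did not survive extraction. It can be reconstructed under the assumption of a linear pursuit-style update in which each step moves the probability a fraction (1 − λ) toward the rewarded action; we also write ε for the convergence threshold, a symbol we introduce here:

```latex
% Assumed update: p(t+1) = \lambda\, p(t) + (1-\lambda),
% so the gap to 1 shrinks geometrically:
% 1 - p(t) = \lambda^{t}\,\bigl(1 - p(0)\bigr).
% Starting from p(0) = 1/2 and requiring p(t_1) \ge 1-\epsilon:
\frac{1}{2}\,\lambda^{t_1} \le \epsilon
\quad\Longrightarrow\quad
t_1 \ge \frac{\ln(2\epsilon)}{\ln \lambda}.
```

Since ln λ < 0 for λ < 1, t1 grows as λ approaches 1, consistent with the slower convergence observed for λ = 0.9999.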

Network life-time
Network life-time is considered one of the most important parameters in WSN applications because of the use of small batteries; recharging or replacing batteries is not always feasible in a hostile environment or in a network with a large number of nodes. 11, 53 We define each sensor to have a battery capacity of 1 time unit and the network life-time to be the number of time units over which the sensor network can cover all of its targets. 54 Prolonging the network life-time then reduces to maximizing the number of disjoint groups of sensors (that is, groups with no common element), each of which covers all the targets. 14 In this article, we adopt a "peeling-off" procedure to find those disjoint groups and thereby compute the life-time. In simple terms, we use our pursuit-LA solution to find a group of sensors to be active for 1 time unit; those sensors are then removed and the network life-time is increased by one. The procedure continues by "peeling off" each group whose batteries are exhausted and moving on to the next group, until full coverage can no longer be ensured with the remaining sensors. To check the efficiency of our proposed -CLA algorithm in terms of network life-time, we conducted an experiment in which we varied the number of sensors from 100 to 500 in steps of 50.
In each case 50 experiments were performed, with the sensor range and the number of targets fixed at 50 throughout. Figure 14 clearly shows that when the number of sensors is 100, the difference between the average network life-times of the two algorithms is quite small, but the gap widens as the number of sensors increases. For 500 sensors, the network life-time computed by our proposed -CLA algorithm is almost 10, whereas for the CLA algorithm it is nearly 6; that is, our proposed -CLA algorithm adds on average four more time units to the network life-time. Another difference is the variation of the computed life-times: the shaded area in Figure 14 represents a 95% confidence interval, and the variation of the results of our proposed -CLA algorithm is smaller than that of the CLA algorithm.
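The peeling-off procedure above can be sketched in code. For illustration only, a greedy set-cover heuristic stands in for the pursuit-LA solver; the function names are ours:

```python
def greedy_cover(covers, targets):
    """Stand-in cover solver (greedy set cover): repeatedly pick the sensor
    covering the most still-uncovered targets. Returns the chosen sensor
    indices, or None if full coverage is impossible."""
    uncovered = set(targets)
    chosen = []
    avail = set(range(len(covers)))
    while uncovered:
        best = max(avail - set(chosen),
                   key=lambda i: len(covers[i] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            return None  # no remaining sensor adds coverage
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

def network_lifetime(covers, num_targets):
    """Peel off disjoint cover groups: each group is active for one time
    unit, after which its sensors (batteries exhausted) are removed."""
    targets = set(range(num_targets))
    remaining = {i: c for i, c in enumerate(covers)}
    lifetime = 0
    while True:
        idx = list(remaining)
        group = greedy_cover([remaining[i] for i in idx], targets)
        if group is None:
            return lifetime  # full coverage no longer possible
        for g in group:
            del remaining[idx[g]]  # peel off the exhausted group
        lifetime += 1
```

Each loop iteration corresponds to one time unit of life-time; replacing `greedy_cover` with a stronger solver (such as the pursuit-LA solution) can only find more disjoint groups and hence a longer life-time.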

Comparison of CLA and DLA approach
In this section, we compare -CLA with its counterpart -DLA, where -DLA stands for cluster-based DLA. To allow a fair comparison of both algorithms, we follow the same experimental approach for comparing learning algorithms as in Reference 55 and determine, for each of -CLA and -DLA, the tuning parameter that yields an optimal solution 99% of the time over a set of 100 experiments. We then report the number of iterations needed to achieve this result. As seen in Section 5.2, the value of λ plays an important role in finding optimal results, so we built an automated procedure that increases or decreases λ until the 99% accuracy target is met, following the experimental procedure of Reference 55. Table 4 presents the results of the CLA and DLA experiments.
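The automated λ-search can be sketched as follows. The refinement rule (moving λ geometrically toward 1 after each failed batch) is our assumption, and `run_trial` is a hypothetical stand-in for one solver run that reports whether it found an optimal solution:

```python
def tune_lambda(run_trial, lam=0.9, step=0.5, tol=1e-5, runs=100):
    """Increase lambda toward 1 until at least 99% of 'runs' trials are
    optimal. run_trial(lam) -> True iff that trial found the optimum.
    Each failed batch moves lambda halfway toward 1."""
    while 1.0 - lam > tol:
        optimal = sum(run_trial(lam) for _ in range(runs))
        if optimal >= 0.99 * runs:
            return lam  # smallest tested lambda reaching 99% accuracy
        lam = 1.0 - (1.0 - lam) * step  # geometric refinement toward 1
    return lam
```

Because larger λ improves accuracy at the cost of more iterations (Section 5.2), searching for the smallest λ that reaches 99% accuracy balances the two.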
Comparison with the MM-LA approach
In this section, we compare our work to the approach presented by Mostafaei and Meybodi. 14 We call the latter algorithm MM-LA, where the acronym MM stands for Mostafaei and Meybodi. We use a number of sensors ranging from 10 to 50 (in steps of 10) and let the number of targets be half the corresponding sensor pool size. Ten different random deployments of sensors and targets are used to obtain statistically reliable results. See Figures 15 and 16 for the average results.
According to Figure 15, the execution time of the proposed -CLA algorithm is always lower than that of MM-LA for λ = 0.9 and λ = 0.945 across the different sensor pool sizes. Likewise, Figure 16 shows the number of iterations needed for the sensor probabilities to converge to 1 or 0; interestingly, our proposed algorithms are also much better than MM-LA in this respect. We note that MM-LA requires about 640 and 13 300 iterations for λ = 0.9 and λ = 0.945, respectively, for a sensor pool size of 50, whereas -CLA needs only about 85 and 135 iterations per cluster, respectively.
Furthermore, in Figure 17

CONCLUSION AND FUTURE WORKS
In this article, we addressed the problem of target coverage in WSNs and proposed the -CLA algorithm, a novel solution that clusters sensors and targets and uses the pursuit-LA concept to let each sensor autonomously decide whether to be active or to sleep. Not only does our proposed algorithm find the optimal solution for covering all targets, it also requires minimal execution time compared with existing solutions.
The proposed -CLA algorithm can provide 100% accuracy, based on the result presented in Section 5. Similarly, in different situations, the proposed -CLA algorithm is very fast in terms of execution time including small to high sensor pool size and different lambda values. Interestingly, -CLA needs a fewer number of iterations and spends less time to determine the optimal results compared with other existing 14 approaches.
Through our experimental testing, it is evident that our proposed adaptive learning algorithm is a feasible and viable approach for the sensor nodes to select their appropriate "sleep" and "active" state autonomously for energy-efficient target coverage in the WSNs.
For future work, it would be interesting to apply our approach to the sensor connectivity problem with mobile targets. In addition, applying game theory to the existing approach would be an interesting research direction.

ACKNOWLEDGMENT
A preliminary version of this article was presented in Reference 56.

APPENDIX
Proof of Theorem 1. The proof follows similar arguments as in Reference 57. Using recurrence, we can obtain a lower bound on p(i,j)(t). Let p_min(0) > 0 be a lower bound on p(i,j)(0).
Let A_t = {C(t) ≠ C*} be the event that, at iteration t, the candidate solution does not contain the optimal solution C*.
Let B_T be the event that the optimal solution has not been found up to instant T.