### SUMMARY

- Top of page
- SUMMARY
- 1 INTRODUCTION
- 2 RELATED WORKS
- 3 EVOLUTIONARY ALGORITHMS
- 4 MULTI-OBJECTIVE TWO-NESTED GENETIC ALGORITHM
- 5 SIMULATION RESULTS
- 6 CONCLUSIONS
- REFERENCES
- Biographies

In this study, an optimal method of clustering homogeneous wireless sensor networks using a multi-objective two-nested genetic algorithm is presented. The top level algorithm is a multi-objective genetic algorithm (GA) whose goal is to obtain clustering schemes in which the network lifetime is optimized for different delay values. The low level GA is used in each cluster in order to get the most efficient topology for data transmission from sensor nodes to the cluster head. The presented clustering method is not restrictive, whereas existing intelligent clustering methods impose certain conditions such as performing two-tiered clustering. A randomly deployed model is used to demonstrate the efficiency of the proposed algorithm. In addition, a comparison is made between the presented algorithm, other GA-based clustering methods and the Low Energy Adaptive Clustering Hierarchy protocol. The results obtained indicate that the proposed method extends the network's lifetime considerably more than the other methods. Copyright © 2011 John Wiley & Sons, Ltd.

### 1 INTRODUCTION

Military and civilian applications of wireless sensor networks are expanding rapidly. In a wireless sensor network (WSN) a large number of small wireless sensor devices are usually deployed into a particular field to perform certain tasks of sensing or tracking and return the information back to a data sink [1]. There are several challenges in designing WSNs because the sensor nodes have limited resources of energy, processing power and memory.

Several energy-efficient routing protocols have been proposed in the literature to deal with the limited battery life of sensor nodes in order to increase the network lifetime. Al-Karaki and Kamal in [2] classified routing protocols based on network structure into flat, hierarchical and location-based ones. All nodes of flat networks are assigned the same role. When a node needs to send data, it may find a route to the data sink. In location-based protocols, the geographic information of nodes is used for relaying data.

In hierarchical routing protocols, an attempt is made to conserve energy by arranging sensor nodes in clusters. Each cluster has one head node called the cluster head (CH). Other nodes send their data to the CH. This is usually performed based on time division multiple access to avoid collisions. The CH aggregates all the data received from the nodes of that cluster and forwards it to the data sink to which all data is to be reported. One may cite the Low Energy Adaptive Clustering Hierarchy (LEACH) [3], Power-Efficient GAthering in Sensor Information Systems (PEGASIS) [4], Threshold sensitive Energy Efficient sensor Network (TEEN) and AdaPtive TEEN (APTEEN) [5] as a few well-known examples of hierarchical routing protocols.

In LEACH, the sensor nodes form clusters based on the received signal strength. CHs are used as routers to the sink. Random rotation of CHs is used to prevent the drainage of the battery of any given node. Using a very simple randomization procedure, each node chooses a random number between 0 and 1. If the number is less than a given threshold — indicating the percentage of nodes expected to act as CHs — the node would introduce itself as CH to which other nodes in that cluster would directly send their data. Otherwise, the node should join the nearest CH to form a cluster. However, if none of the CHs are in the radio range of the node, it would introduce itself as a CH [3].
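The simplified election step described above can be sketched as follows. This is only an illustration of the randomization as summarized in this paragraph: the function name, the fixed seed and the node count are assumptions, and the sketch uses a constant threshold, whereas the threshold in the LEACH protocol of [3] also varies with the round number so that the CH role rotates.

```python
import random

def elect_cluster_heads(num_nodes, threshold, rng):
    """Return the indices of nodes that declare themselves cluster heads."""
    heads = []
    for node in range(num_nodes):
        # Each node draws a number in [0, 1) and compares it with the threshold.
        if rng.random() < threshold:
            heads.append(node)
    return heads

rng = random.Random(0)  # fixed seed only so that the run is repeatable
heads = elect_cluster_heads(num_nodes=40, threshold=0.1, rng=rng)
print(len(heads), "of 40 nodes elected themselves as CH")
```

Nodes that are not elected would then join the nearest CH within radio range, a step that is left out of this sketch.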

Nodes in PEGASIS form chains. Each node in the chain communicates only with a neighbor, and only one node is selected from that chain to transmit data to the sink. In contrast to LEACH, PEGASIS uses multihop routing by forming chains and selects only one node, instead of multiple nodes, to transmit to the data sink. Results of NS-2 simulations indicate that PEGASIS performed approximately 200% better than LEACH for a 50 m × 50 m network and 300% better for a 100 m × 100 m network [4].

In TEEN and APTEEN the sensor network architectures are based on a hierarchical clustering where closer nodes form clusters and this process goes on in the second level until the data sink is reached [5].

Arranging sensor nodes in clusters in such a way that the network parameters, such as lifetime and delay, are optimized is an optimization problem. Several heuristic methods, for example, genetic algorithm [6] and ant colony optimization method [7], are used in the literature to solve this problem.

Genetic algorithm is a population-based optimization algorithm that uses random search to propose the best possible design. We use this algorithm in order to get the most efficient clustering scheme. The reason for choosing this algorithm is its convergence and its flexibility in solving multi-objective optimization problems.

Conventional multi-objective algorithms use weighting methods to get just one optimum answer. However, using population-based multi-objective optimization algorithms, a set of optimal answers, generally denoted as Pareto-optimal [8], can be reached, from which one answer is chosen.

In this paper, the Strength Pareto Evolutionary Algorithm (SPEA) presented in [8] is used to obtain the optimum clustering schemes in WSNs. Our objective functions are the network lifetime, expressed as the time at which the first node of the network fails because of battery exhaustion, and the network delay, expressed as the maximum number of hops during data transmission from the nodes of the network to the data sink.

The rest of the paper is organized as follows: Related works are discussed in Section 2. Section 3 describes evolutionary algorithms. The proposed multi-objective two-nested genetic algorithm (GA) is discussed in Section 4. The results of simulation and a comparison between the proposed method and the other methods are presented in Section 5. Finally, the paper is concluded in Section 6.

### 2 RELATED WORKS

There are several works suggesting the use of intelligent algorithms in WSNs in order to extend the network lifetime. Most of these algorithms use two-tiered clustering methods in which each node in each cluster sends its data directly to the head node of that cluster [9-14]. A hierarchical cluster-based routing protocol is proposed by Hussain and Martin in [9] and [10]. In this protocol, nodes self-organize into clusters. Each cluster is managed by a set of associates called head set. Each associate acts as the head of the cluster using a round-robin technique. In each cluster, the nodes send their data directly to the CH, which transmits the aggregated data to the data sink. In addition, energy-efficient clusters which are identified using a heuristics-based approach are retained for a longer period of time. Hussain *et al.* in [11] improved the hierarchical cluster-based routing protocol using GA. In their work, the data sink performed the simultaneous optimization of total dissipated transmission energy and delay by clustering the nodes.

Jin *et al.* in [12] proposed another GA-based clustering method in which the GA minimizes the total dissipated energy and also allows the formation of predefined clusters to reduce the total communication distance. Their results showed that the optimal ratio of CHs to the number of nodes in the network was about 10%. Also the formation of predefined clusters reduced the communication distance by 80% in comparison with the direct distance.

Hussain *et al.* in [13] improved on their proposed idea in [11] by improving the fitness function used for GA. They have shown that their method conserves somewhat more energy than the GA-based clustering method proposed by Jin *et al.* in [12]. Hussain *et al.* [13] compared their proposed method with the LEACH protocol in different layouts. They showed that it saves more energy than LEACH, especially with an increase in the number of nodes in the network.

Seo *et al.* in [14] presented an interesting GA-based algorithm for efficient clustering of sensor nodes. In their algorithm, called LA2D-GA, the chromosomes are considered as two-dimensional matrices that contain information about the relative positions of the nodes. However, the algorithm is limited to a two-tiered clustering scheme.

In this paper, an optimal method of clustering homogeneous sensor nodes using a multi-objective genetic algorithm is proposed to maximize the network lifetime. Each chromosome defines the head of each cluster (CH) and the cluster to which each node belongs. The most efficient topology for sending data from the nodes to the CH in each cluster is obtained using another GA similar to that presented in [15]. The second GA, in contrast to the first, can be considered as the lower level GA whose task is to compute the fitness values of the chromosomes of the top level GA.

Because the role of the CH is energy consuming, this burden is rotated among nodes by the clustering algorithm to be executed periodically. At each round of execution of the clustering algorithm, the nodes that have previously acted as CHs are not selected again as CH in order to balance the energy consumption between the nodes.

The advantage of the proposed algorithm, in addition to its multi-objective nature, is its generality. The GA-based clustering methods already proposed assume that nodes of each cluster send their data directly to the CH. This may not be the most energy-efficient topology in each cluster. Heinzelman *et al.* in [3] showed that when the transmission energy is nearly the same as the receive energy, which occurs when the transmission distance is short and/or when the radio electronics energy (the energy consumed in the electronic circuitry) is high, direct transmission is more energy-efficient. Thus the most energy-efficient protocol depends on the network topology and the radio parameters of the system. In this paper, it is assumed that the energy consumption of the nodes during transmission is much more than the radio electronics energy.

The proposed clustering method is centralized, that is, clustering is carried out by the data sink. We assume that the sensor nodes are homogeneous, that is, all nodes have similar processing and energy resources. It is also assumed that the data sink has complete knowledge about every node's geographic location in the network. Such information can be obtained by any of the various localization techniques already proposed in the literature (e.g. see [16-19]). It is also assumed that the network is static, that is, the nodes have no mobility.

### 4 MULTI-OBJECTIVE TWO-NESTED GENETIC ALGORITHM

In this section, a new efficient method of clustering the sensor nodes is presented. The presented algorithm tries to obtain clustering schemes in which energy consumption and delay are optimized. To achieve this, an implementation of SPEA as a multi-objective optimization method is used at the top level. In addition, another GA is used to get the most efficient topology of data transmission from the nodes to their CHs. Because two nested GAs are used, we call the algorithm the multi-objective two-nested genetic algorithm or M2NGA.

The M2NGA is a centralized algorithm, that is, it is carried out by the base station (data sink). The flowchart of the algorithm is shown in Figure 1. Here, M2NGA is discussed for two-level clustering of nodes. However, it is easily extendable to multilevel clustering where the CHs are also divided into clusters. In this case, M2NGA should be executed twice, or more times if more levels of clustering are desired. The following subsections discuss the M2NGA implementation in detail.

#### 4.1 Chromosome coding

At the top level genetic algorithm, all the chromosomes are represented by an *m*×2 matrix in which *m* is the number of nodes (Figure 2). The number in the first column of row *i* states which cluster node *i* belongs to. The elements of the second column of the chromosome are selected randomly from {0, 1}, where ‘1’ states that node *i* is the head node of the cluster to which it belongs, that is, it acts as the CH. A value of ‘0’ means that node *i* acts as a normal node and is not the CH. Chromosomes are generated such that there is only one CH in any cluster. As an example, node 3 in the chromosome shown in Figure 2 belongs to cluster 2 and is a normal node.
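To make the encoding concrete, the following minimal sketch stores a chromosome as a list of (cluster, CH flag) rows and checks the one-CH-per-cluster rule. The six-node chromosome and the function names are hypothetical and are not taken from Figure 2, although node 3 is again a normal node of cluster 2.

```python
def cluster_heads(chromosome):
    """Map each cluster id to the list of nodes flagged as CH in that cluster."""
    heads = {}
    for node, (cluster, is_head) in enumerate(chromosome, start=1):
        heads.setdefault(cluster, [])
        if is_head == 1:
            heads[cluster].append(node)
    return heads

def is_valid(chromosome):
    """A chromosome is valid when every cluster has exactly one CH."""
    return all(len(nodes) == 1 for nodes in cluster_heads(chromosome).values())

# Hypothetical 6-node chromosome: row i is (cluster of node i, CH flag of node i).
chromosome = [(1, 1), (1, 0), (2, 0), (2, 1), (1, 0), (2, 0)]
print(cluster_heads(chromosome))  # {1: [1], 2: [4]}
print(is_valid(chromosome))       # True
```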

#### 4.2 Population initialization

For any given cluster, if there is a node such that none of the other nodes in that cluster are in its radio range, then this cluster is infeasible. The chromosomes in the initial population are generated such that there are no infeasible clusters. To generate the initial population, a node is selected randomly. This node is considered as the CH and all its in-range nodes join the cluster. We have used 50 chromosomes in the population in this study.
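One possible reading of this initialization is sketched below. It assumes that the random CH selection is repeated on the still unassigned nodes until every node belongs to a cluster, which the text does not state explicitly; the node positions, the radio range value and the function name are placeholders.

```python
import math
import random

def init_chromosome(positions, radio_range, rng):
    """Build one chromosome as a list of [cluster_id, ch_flag] rows."""
    m = len(positions)
    chromosome = [[0, 0] for _ in range(m)]
    unassigned = set(range(m))
    cluster_id = 0
    while unassigned:
        cluster_id += 1
        # Pick a random unassigned node and make it the CH of a new cluster.
        head = rng.choice(sorted(unassigned))
        chromosome[head] = [cluster_id, 1]
        unassigned.remove(head)
        # Every unassigned node within radio range of the CH joins the cluster.
        for node in sorted(unassigned):
            if math.dist(positions[head], positions[node]) <= radio_range:
                chromosome[node] = [cluster_id, 0]
                unassigned.remove(node)
    return chromosome

rng = random.Random(1)
positions = [(0, 0), (1, 0), (10, 10), (11, 10)]  # hypothetical layout
population = [init_chromosome(positions, 2.0, rng) for _ in range(50)]
print(population[0])
```

Because every member is within range of its CH by construction, no cluster produced this way contains a node that is out of range of all other nodes in its cluster.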

#### 4.3 Fitness assignment

Our goal is to arrange the wireless sensor nodes in clusters such that the network lifetime and delay are simultaneously optimized. We use the time at which the first node of the network fails to work properly from battery exhaustion as the definition of the network lifetime. This definition is commonly used in the literature. On the basis of this definition, M2NGA finds the clustering schemes where the energy consumption in the most energy-consuming node is minimized in order to extend the network lifetime.

After proposing cluster schemes by the chromosomes in the population, their fitness values are evaluated using another GA. This internal GA attempts to get the most efficient topology of data transmission from the nodes to their CH in each cluster.

The chromosome coding in this GA is depicted in Figure 3 as an example. Each chromosome consists of *n*–1 parts, where *n* is the number of nodes in the cluster. Node 1 is the CH and the others are the member nodes.

The fitness assignment of each chromosome in the internal GA is as follows: First, the energy consumption of the cluster nodes for sending one data bit to the CH and also their delay in terms of the number of hops are computed. Second, the energy consumption in infeasible clusters, that is, clusters with at least one isolated node (a node with no other cluster node in its radio range), is assigned to be zero. Then, the fitness value for each chromosome that suggests a feasible topology is computed as

where *Delay*_{i} is the vector of the delays of data transmission of the cluster nodes in terms of hops, *Energy*_{pop} is the vector of maximum energy consumption of the nodes in the clustering schemes offered by the chromosomes of the population and *Energy*_{i} is the vector of energy consumption in the cluster nodes of chromosome *i*. An exponential function is used to sharply lessen the weight of the chromosomes whose delays are considerable.

The energy consumption of the sensor nodes is computed in two steps. First, the energy consumption in the cluster nodes is computed by the internal GA. The internal GA attempts to suggest a topology in which the maximum energy consumption among the cluster nodes is minimized and hence the network lifetime is maximized. It also considers delay by lowering the fitness of chromosomes whose delay is not acceptable.

At the next step, the energy consumption in the CHs is computed by the same GA. In this step, the internal GA considers the CHs as nodes whose aim is relaying their aggregated data to the data sink in an energy-efficient manner with the least delay, hence the optimum intercluster route is obtained.

The first-order radio model presented in [3] is used for calculating consumed energy. In this model, the transmission energy for transmitting *k* bits of information from node *i* to node *j*, *E*_{T_{ij}}, is calculated by Equation (1).

$$E_{T_{ij}} = E_{elec}\,k + \epsilon_{amp}\,k\,d^{2} \qquad (1)$$

Also, the consumed energy for receiving *k* bits of information, *E*_{R}, is given by Equation (2)

$$E_{R} = E_{elec}\,k \qquad (2)$$

where *E*_{elec} is the energy consumed in the electronic circuit, *ϵ*_{amp} is the energy consumed by the amplifier of the transmitter and *d* is the distance between node *i* and node *j*.
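As a numerical illustration, the sketch below assumes the usual form of the first-order radio model of [3], in which transmitting *k* bits over distance *d* costs *E*_{elec}·*k* + *ϵ*_{amp}·*k*·*d*² and receiving *k* bits costs *E*_{elec}·*k*. The parameter values are placeholders chosen for the example and are not the values used in the simulations of this paper.

```python
E_ELEC = 50e-9     # J/bit, electronics energy (hypothetical value)
EPS_AMP = 100e-12  # J/bit/m^2, amplifier energy (hypothetical value)

def tx_energy(k_bits, distance_m):
    """Energy to transmit k bits over distance d: E_elec*k + eps_amp*k*d^2."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * distance_m ** 2

def rx_energy(k_bits):
    """Energy to receive k bits: E_elec*k."""
    return E_ELEC * k_bits

# One bit sent directly over 100 m versus relayed through a node at 50 m.
direct = tx_energy(1, 100.0)
two_hop = tx_energy(1, 50.0) + rx_energy(1) + tx_energy(1, 50.0)
print(f"direct:  {direct:.3e} J")
print(f"two-hop: {two_hop:.3e} J")
```

With these placeholder values the amplifier term dominates, so the relayed path costs less in total than the direct one. When the electronics energy is high or the distance is short, the comparison can reverse, which is the trade-off noted with reference to [3].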

The output of the internal GA contains the cluster delay and a vector including the energy consumption of each node of the network for transmission of one bit of data. The cluster delay is considered as the maximum hops of data transmission from the cluster nodes to the CH. Similarly, the network delay is considered as the maximum hops of data transmission from the network nodes to the data sink. Thus, M2NGA uses the delay and the maximum of energy consumptions to compute the fitness of the chromosomes. Computation of fitness values of the chromosomes is performed as discussed in SPEA in [8].
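The delay definition can be illustrated with a small sketch. It assumes that a topology is given as a next-hop map from each member node to the node it forwards to; this representation is chosen only for the example and is not the chromosome coding of Figure 3.

```python
def hops_to_head(next_hop, head):
    """Return {node: number of hops from that node to the CH}."""
    hops = {}
    for node in next_hop:
        count, current = 0, node
        while current != head:
            current = next_hop[current]
            count += 1
            if count > len(next_hop):
                raise ValueError("routing loop detected")
        hops[node] = count
    return hops

# Hypothetical 5-node cluster: node 1 is the CH, the others forward as listed.
next_hop = {2: 1, 3: 2, 4: 1, 5: 3}
hops = hops_to_head(next_hop, head=1)
cluster_delay = max(hops.values())
print(hops)           # {2: 1, 3: 2, 4: 1, 5: 3}
print(cluster_delay)  # 3
```

The same function applied to the CH-to-sink routes would give the intercluster hop counts needed for the network delay.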

#### 4.4 Elitism

The nondominated chromosomes are directly transferred to the next generation. As the number of nondominated chromosomes is limited, there is no need for the clustering method proposed in SPEA.
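A minimal sketch of the nondominance test follows, assuming each chromosome is summarized by a pair (maximum energy, delay) and that both objectives are minimized. The sample values are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (max energy per bit in J, delay in hops) pairs.
objectives = [(1.4e-6, 5), (2.0e-6, 3), (2.5e-6, 3), (3.0e-6, 2), (1.4e-6, 6)]
print(nondominated(objectives))
# [(1.4e-06, 5), (2e-06, 3), (3e-06, 2)]
```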

#### 4.5 Selection

Selection of chromosomes is carried out using the ranking selection method [6, 24] to allow the chromosomes with lower fitness values to have the chance to be selected, whereas in the Roulette Wheel method the search space is almost limited to the chromosomes with the highest fitness values.
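The effect of ranking selection can be seen in a short sketch. A linear ranking, in which the selection weight equals the rank, is assumed here; the exact ranking scheme of [6, 24] is not specified in the text, and the fitness values are invented.

```python
import random

def rank_weights(fitness):
    """Weight each individual by its rank (1 = lowest fitness)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    weights = [0] * len(fitness)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = rank
    return weights

def rank_select(population, fitness, rng):
    """Draw one individual with probability proportional to its rank."""
    return rng.choices(population, weights=rank_weights(fitness), k=1)[0]

population = ["A", "B", "C"]
fitness = [0.1, 100.0, 5.0]   # hypothetical, widely spread fitness values
print(rank_weights(fitness))  # [1, 3, 2]
print(rank_select(population, fitness, random.Random(0)))
```

With these values, fitness-proportional (roulette wheel) selection would pick "B" about 95% of the time, whereas the rank weights give it a probability of 3/6 and leave "A" a chance of 1/6.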

#### 4.6 Crossover and mutation

Single-point crossover is used to produce new children from parent chromosomes [24]. Crossover is carried out with a probability of less than one. If crossover occurs, a random number between 1 and *m*–1 is selected. Then, the two selected parent chromosomes are cut at that point and their data are exchanged. After exchanging data, the chromosomes are modified so that there is just one CH in each cluster of each chromosome. An example of crossover is depicted in Figure 4, where a random crossover index is chosen. Then the two parents exchange their data at that point and finally the two chromosomes are modified.

For mutation, a node is selected randomly and is assigned to another random cluster. The probabilities of crossover and mutation used in the proposed algorithm are 0.9 and 0.1, respectively. These values are used to extend the search space sufficiently.
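The crossover and mutation operators of this subsection can be sketched as follows. The repair rule (keep the first flagged CH in a cluster, or promote the first member if none is flagged) is an assumption, because the text only states that the chromosomes are modified to have one CH per cluster. Mutation is likewise assumed to move the node to another existing cluster as a normal node.

```python
import random

def repair(chromosome):
    """Force exactly one CH per cluster (assumed rule, see lead-in)."""
    members = {}
    for idx, (cluster, _) in enumerate(chromosome):
        members.setdefault(cluster, []).append(idx)
    fixed = [list(row) for row in chromosome]
    for nodes in members.values():
        flagged = [i for i in nodes if fixed[i][1] == 1]
        keep = flagged[0] if flagged else nodes[0]
        for i in nodes:
            fixed[i][1] = 1 if i == keep else 0
    return fixed

def single_point_crossover(parent_a, parent_b, rng, p_cross=0.9):
    """Cut both parents at a random row in 1..m-1, swap the tails, then repair."""
    if rng.random() >= p_cross:
        return [list(r) for r in parent_a], [list(r) for r in parent_b]
    cut = rng.randint(1, len(parent_a) - 1)  # both bounds inclusive
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return repair(child_a), repair(child_b)

def mutate(chromosome, rng, p_mut=0.1):
    """Move one random node to a different existing cluster, then repair."""
    child = [list(r) for r in chromosome]
    clusters = sorted({row[0] for row in child})
    if rng.random() >= p_mut or len(clusters) < 2:
        return child
    node = rng.randrange(len(child))
    others = [c for c in clusters if c != child[node][0]]
    child[node] = [rng.choice(others), 0]
    return repair(child)

rng = random.Random(3)
a = [[1, 1], [1, 0], [2, 1], [2, 0]]  # hypothetical 4-node parents
b = [[1, 0], [2, 1], [2, 0], [1, 1]]
print(single_point_crossover(a, b, rng))
print(mutate(a, rng, p_mut=1.0))
```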

### 5 SIMULATION RESULTS

A random deployment of 40 sensor nodes in a 100 m × 100 m field (test model 1), as shown in Figure 5, is used for simulation. The radio range of each sensor node is assumed to be 6 m. The parameters of the radio model are chosen such that the nodes consume much more energy during transmission than the energy consumed in the electronic circuitry (i.e. radio electronics energy consumption).

The bit rate of all nodes is assumed to be the same. Optimization is carried out for sending one bit of data from all nodes to the data sink. The termination condition used is 3000 iterations of GA. The parameters of the first order radio model used are as follows:

The nondominated chromosomes are transferred to the next generation together with their fitness values. Moreover, some of the parent chromosomes will be directly transferred to the next generation and will remain unchanged because the probabilities of crossover and mutation are less than one. Fitness values of these chromosomes are also transferred to the next generation to avoid their recalculation. These techniques considerably improve the speed of the genetic algorithm.
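One way to avoid recalculating the values of unchanged chromosomes is to cache the result of the expensive evaluation, keyed by the chromosome contents. The sketch below is an illustration under that assumption; `evaluate` stands in for the internal GA, and the toy evaluation function is hypothetical.

```python
def make_cached_fitness(evaluate):
    """Wrap an expensive evaluation so each distinct chromosome is evaluated once."""
    cache = {}

    def cached(chromosome):
        key = tuple(map(tuple, chromosome))  # rows become hashable tuples
        if key not in cache:
            cache[key] = evaluate(chromosome)
        return cache[key]

    return cached, cache

calls = []

def toy_evaluate(chromosome):
    """Stand-in for the internal GA: counts the CH flags and records the call."""
    calls.append(1)
    return sum(flag for _, flag in chromosome)

fitness_of, cache = make_cached_fitness(toy_evaluate)
chromosome = [[1, 1], [1, 0], [2, 1]]
print(fitness_of(chromosome), fitness_of(chromosome))  # 2 2
print(len(calls))  # 1, the second lookup reused the stored value
```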

The algorithm suggests that the clustering scheme depicted in Figure 6 is the most energy-efficient one wherein every cluster is distinguished with a different mark and the CHs are identified with a circle around them.

Figure 7 shows how the optimal Pareto converges in M2NGA to the best solutions. In this special example, the Pareto-optimal contains just one solution, that is, both objective functions are minimized. In general, the Pareto-optimal contains several solutions. Then depending on the application, one of the solutions should be selected as the optimal clustering scheme.

M2NGA is a very general clustering scheme for wireless sensor nodes. The GA-based clustering methods already presented are not as general as the proposed algorithm. For example, [11, 12, 14] present GA-based clustering algorithms for two-tiered sensor networks. However, their drawback is that the energy of the clusters may not be optimal because two-tiered clustering is just a special case of clustering. Another important advantage of M2NGA over the other methods is its multi-objective solution.

Hussain *et al.* [13] showed that their GA-based two-tiered clustering algorithm performs better than existing protocols such as LEACH. Different network layouts with different numbers of nodes are used in order to compare M2NGA with existing GA-based clustering methods. A comparison between M2NGA, the two-tiered GA-based clustering method, LA2D-GA and the LEACH protocol is presented in Figure 8 and Table 1, where the maximum amount of dissipated energy for sending one bit from the nodes to the data sink in the four methods is compared. In this comparison, the minimum energy dissipation among the Pareto-optimal solutions is chosen for M2NGA. The goal of the GA in the two-tiered clustering method is to deliver a clustering scheme in which the total consumed energy for sending data from the nodes to the data sink is minimized. However, the nodes of each cluster send their data directly to the head of that cluster in this method. The average energy per bit transmission of ten rounds of the LEACH protocol is used for comparison. In addition, the percentage of nodes expected to act as the CHs in the simulation of the LEACH protocol is set to 10%.

Table 1. Comparison of M2NGA with different clustering methods in terms of maximum amount of dissipated energy for sending one bit from the nodes to the data sink.

| Method \ Number of nodes | 20 | 30 | 40 | 50 |
|---|---|---|---|---|
| LEACH | 2.565e-6 | 3.145e-6 | 1.158e-5 | 2.196e-5 |
| GA-based | 1.461e-6 | 2.205e-6 | 2.429e-6 | 3.381e-6 |
| LA2D-GA | 1.487e-6 | 2.116e-6 | 2.278e-6 | 3.612e-6 |
| M2NGA | 1.461e-6 | 1.346e-6 | 1.355e-6 | 3.0066e-6 |
| Minimum proportional improvement | 0% | 36.39% | 40.52% | 11.07% |
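The last row of Table 1 appears to be the relative energy reduction of M2NGA with respect to the best competing method in each column. This interpretation is an inference, but it reproduces the reported percentages, as the following short check shows.

```python
# Maximum dissipated energy per bit from Table 1, for 20, 30, 40 and 50 nodes.
others = {
    "LEACH":    [2.565e-6, 3.145e-6, 1.158e-5, 2.196e-5],
    "GA-based": [1.461e-6, 2.205e-6, 2.429e-6, 3.381e-6],
    "LA2D-GA":  [1.487e-6, 2.116e-6, 2.278e-6, 3.612e-6],
}
m2nga = [1.461e-6, 1.346e-6, 1.355e-6, 3.0066e-6]

def min_improvement_percent(others, proposed):
    """Improvement over the best competing method, in percent, per column."""
    result = []
    for col, ours in enumerate(proposed):
        best_other = min(values[col] for values in others.values())
        result.append(round(100 * (best_other - ours) / best_other, 2))
    return result

print(min_improvement_percent(others, m2nga))  # [0.0, 36.39, 40.52, 11.07]
```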

It is shown that M2NGA consumes much less energy per bit than the other methods. The last row of Table 1 shows the minimum proportional improvement of energy efficiency obtained using M2NGA compared with the other methods. As can be seen, no improvement is reached in the 20-node network. The reason is that M2NGA suggests two-tiered clustering in networks with a small number of nodes and its solution is the same as the two-tiered GA-based clustering method. However, in networks with a large number of nodes, multihop data transmission would improve energy efficiency, but the amount of improvement depends on network layout.

Here, energy per bit is used as a metric for a comparison of network lifetime in different methods. Both distance transmission and transmission via multihop result in throughput reduction [25], [26, p. 323] and [27, p. 116]. Therefore, transmission via multihop could be used to counteract the effect of reduction of throughput because of increased distance. In addition, techniques such as cross-layer design [19, 28] could be used to overcome the throughput reduction problem in multihop transmission. Therefore, energy consumption per bit is a suitable metric for network lifetime comparison.

It should be mentioned that because the proposed method uses two nested genetic algorithms to maximize the lifetime of the network, the data sink needs much more time than the two-tiered GA-based clustering, depending on the number of generations and the population sizes of the GAs. However, this is not a drawback for the algorithm because the data sink is supposed to be connected to an unlimited power supply. In addition, receiving data from the sensors in the data sink can be carried out in parallel with executing the clustering algorithm to form an efficient clustering scheme for the next round. Although running time is important for any algorithm, the issue of complexity is not critical in M2NGA because in many applications of sensor networks, the clustering scheme is changed once every few days. The execution time of M2NGA for different numbers of nodes is shown in Figure 9. It is shown that the execution time for 50 nodes is less than that for 40 nodes. This is because of the algorithm's dependence on the network layout and the flexible nature of GA.

To obtain a good or at least a suboptimal clustering scheme in networks with a large number of nodes, we can modify population size and number of generations for the algorithm to yield a solution in a reasonable time. The degree of superiority of the clustering scheme proposed by M2NGA over the other methods depends on the network layout and the algorithm parameters such as population size and the number of generations. Thus, M2NGA outperforms conventional clustering methods even with increased number of nodes.

After the clustering scheme is defined in the data sink, it is sent to nodes via messages. The messages inform the nodes whether or not they are chosen as CH. If the node is chosen as CH, the message should contain the cluster's member nodes. Also, the nodes which are defined as the cluster member should be informed to which node their data should be sent. The overhead imposed on the network is these messages, which should be sent through the network only each time the clustering scheme is changed.

In WSNs, sensor nodes are usually switched periodically between sleep and active modes in order to save energy. To perform switching in multihop clusters, a time division multiple access mechanism like that used in 802.15.4 [29, pp. 139–142] could be adopted. In this mechanism, a beacon message is sent from the CH to the cluster nodes for synchronization whenever sensors become active. Then, each sensor sends its data in its allocated time slot to avoid interference.

The simulations were performed in a MATLAB (MathWorks, Natick, MA, USA) environment. The results of the simulations show that the proposed method saves much more energy than the other methods.