On hybrid energy utilization for harvesting base station in 5G networks

In this paper, hybrid energy utilization was studied for the base station in a 5G network. To minimize both AC power usage from the hybrid energy system and solar energy waste, a Markov decision process (MDP) model was proposed for packet transmission in two practical scenarios. In this model, one buffer holding a given number of data packets is transmitted over finite transmission time intervals. In the first scenario, the energy from solar energy harvesting (SEH) is stored, while in the second scenario it is used immediately. The MDP model determines the best actions and decisions for both scenarios. Considering a finite battery size and a finite packet buffer, the MDP model defines the actions, states, state transition probabilities, and cost value for each action. In the first scenario, the received SEH will not be used if it is not enough to transmit the packets in the buffer. In the second scenario, the received SEH will be used immediately. In both scenarios, the cost value is the weighted sum of AC power and SEH wastage. The simulation results showed that the optimum buffer sizes could be determined for the balance between AC power consumption and SEH wastage based on the cost value of the proposed model. Finally, the numerical results indicated that the proposed MDP model could reduce AC power usage by up to 50%.


| INTRODUCTION
One often-mentioned report 1 emphasizes that 2% of the human CO2 footprint is caused by information and communication technology. More precisely, 0.5% of the global energy supply is consumed by mobile communication networks. 2 These statistics testify to the rapid growth of energy demand. Thus, it is important to improve network efficiency, particularly through energy consumption optimization in the cellular communication industry. 3 Such energy optimization delivers tremendous business benefits for mobile operators and contributes to resolving pressing energy and environmental issues. This energy optimization will be supported not only by mobile operators but also by dedicated social programs targeting energy issues.
In recent years, the use of renewable energy sources to supply cellular networks has grown rapidly in order to decrease carbon footprints. 4 The renewable energy comes from diverse resources such as wind, solar, and hydro. Besides, utilizing renewable energy sources in supplying cellular base

| Related works
Various studies are now concentrating on green communication utilization in point-to-point communication systems. 15-18 The first utilization approach focuses on solar energy usage efficiency. 15,16 In energy harvesting wireless sensor networks, 15 a resource allocation with minimum matched marginal increment method was proposed. In 16, an expectation-maximization algorithm was proposed to maximize harvested energy utilization in parallel with an MDP. A second approach, which focuses on hybrid energy solutions, has also been studied. 17,18 The optimal and suboptimal solutions were obtained for minimizing constant energy usage, and a dynamic programming method was considered. 17 In 18, a discrete-time Markov chain was used to analyse the average packet drop probability (PDP). This solution does not consider the optimal transmission policy in two specific retransmission protocols.
On the other hand, several techniques have been proposed to maximize energy utilization for a base station. 19-22 To reduce the energy consumption of a base station, a sleep mode design was proposed for a green base station in 19, based on switching off the base station transmission or using fewer transmission antennas. This design considers both the time and spatial domains according to the LTE standards. In addition, the problem of minimizing the average grid power for the base station in a single cell, powered by both renewable energy and the power grid, was indirectly solved in 20 using a dynamic programming scheme. This scheme obtains a power outage trade-off curve based on statistical information regarding traffic intensity and harvested energy.
To allocate energy resources at an energy harvesting base station, both the statistical characteristics of solar energy arrivals and a greedy source of data traffic were considered, 21 based on Denver's solar radiation data and a utility function, respectively. Besides, an artificial neural network was used to predict solar energy arrivals over a short period. To find the optimal power and bandwidth allocation, an algorithm was designed in 22 to maximize the energy transfer efficiency across the network by taking into account full knowledge of all channel gains, user numbers, and the total energy available in each cell. In 23, energy harvesting for a Super Wi-Fi network was studied to support the demands of a large-scale network. This requires a sophisticated access point selection strategy that relies on the history of solar activity. Basically, the selection utilizes the MDP and the partially observable MDP (POMDP) in an iterative algorithm to obtain the best access point for users. In 24, another efficient harvested energy scheme was proposed that targets different applications such as reconnaissance, home automation, and environmental monitoring. In 25, a POMDP was used to select the access point for a Super Wi-Fi network based on the conditions of the base stations and a battery supplied with the harvested energy.
The aforementioned point-to-point communication works did not study hybrid energy utilization for 5G networks. Moreover, the works in the literature considered statistical information for harvested energy or sleep mode design without taking into consideration the randomness of harvested energy arrivals or online transmission data. In addition, none of the previous works linked practical transmission scenarios for an MDP model with a study of the trade-off among three elements: the dropped packet ratio, the wastage of solar energy harvesting (SEH), and the AC power utilization. In this work, these three elements are minimized for a 5G base station using the proposed MDP method, which aims to balance them. The contributions of this work are summarized as follows:
• The proposed MDP model can determine the optimal action and optimal transmission policy based on the stochastic model. It minimizes AC power usage in the packet buffer. It minimizes energy wastage in the state of the battery. It minimizes dropped packets out of the total packet transmission.
• The proposed model has a practical transmission policy that determines the optimal buffer size in two different scenarios. The scenarios consider the finite and infinite packet buffer sizes. In addition, this paper focuses on the joint impact of the battery and packet buffers. It also addresses the trade-offs between various buffer sizes. Moreover, the simulation results show that the optimum buffer size, as the trade-off between the percentage of wasted energy and the percentage of power consumption, can be determined when the size of the battery is equal to that of the packet buffer.
The rest of this paper is organized as follows. The hybrid system model is presented in Section 2, which describes the MDP formulation for transmission probabilities and the transmission scheme for two practical scenarios. The simulation results are presented in Section 3, and concluding remarks are provided in Section 4.

| HYBRID SYSTEM MODEL
Many parameters affect the accumulated harvested energy, such as weather conditions, sunshine duration, and rechargeable battery storage capacity. Furthermore, solar energy usually varies smoothly over a short time period. In this work, a hidden Markov chain and solar power measurements were used to construct a framework that could characterize the availability of solar power. To do so, we started with real data records taken at 5-minute intervals from a solar site at Elizabeth City State University between 2008 and 2010. 26 In fact, the solar radiation incident on an energy harvesting device is affected by its surrounding obstacles (eg, clouds and terrain), through absorption, reflection, and scattering phenomena, and can be represented as a Gaussian random variable by the law of large numbers. These observations motivated us to represent the irradiance with a finite number of possible states. In each state, the hidden Markov chain was used to determine the normal distribution with an unknown mean and variance. On the other hand, solar panels generate electrical power through the photovoltaic effect in semiconductors. This generated solar energy depends on various factors such as solar intensity, temperature, and the geolocation of the solar panel. In addition, the generated solar energy can be estimated using yearly meteorological weather data for the geolocation. Furthermore, the System Advisor Model (SAM) 27 and PVWatts 28 models can estimate the generated solar energy hourly. Notably, the generated solar energy varies over a short period of time (ie, several minutes) rather than in instantaneous fluctuations. Accordingly, the adopted algorithm optimizes the BS cell size over intervals of several minutes and neglects the effect of the instantaneous generated energy. However, the solar energy generated may not be a sufficient supply for a given BS.
Also, several BSs located in the same geographical area will experience approximately similar weather conditions. Thus, the solar panels will produce the same harvested green energy for all BSs in the same area. Taking into consideration the fluctuations of green energy, the hybrid system considers a downlink for one BS which provides packet transmission for the user equipments (UEs) with 5G, as illustrated in Figure 1. This BS is supplied by two power sources, the constant power source and the battery power from SEH, to transmit data for UEs. If the SEH in the battery is insufficient, the BS will use an AC power source. Otherwise, the BS uses the battery source to transmit data. To make use of the SEH, the MDP model determines the best action, storing or using the SEH, to minimize AC power usage and the wastage of SEH in the long term. SEH wastage generally occurs if the packet buffer is empty but the battery is full, and especially when a new battery unit arrives before a new packet arrives.
FIGURE 1 A hybrid system model architecture
In this work, we aimed to minimize the AC power in the base station using a hybrid supply of energy based on maximum harvesting power and minimum energy wastage, as depicted in Figure 2. This trade-off was investigated using MDP, which is elaborated in Section 2.1.

| Markov decision process
It is well known that the MDP is a mathematical framework used in making decisions when the expected outcomes are partially random and partially controlled by the decision maker. Indeed, the MDP framework is widely used in real-world problems to provide planning and control solutions. In different situations, the MDP is able to capture the essence and the core purposeful activity of a system model. For those reasons, the MDP has formed the basis on which many important studies in the field of learning, planning, and optimization have been built. For example, the MDP is utilized in application fields such as automated control, robotics, economics, planning, and manufacturing. Various approaches exist to solve an MDP. The most common approach is value iteration, in which a dynamic programming algorithm is used to solve the MDP. The second common approach is to compute the optimal policy using linear programming or policy iteration. In these approaches, the key components are the computation of the value function and of the value of each state within its space. With an explicit representation of the value function as a vector of values over the states, the solution algorithms can be expressed as sequences of simple algebraic operations.
In this subsection, the objective of the MDP was to provide the decision maker with an optimal policy $\pi: S \to \mathcal{A}$, which maps every state of the system model to an action. For an optimal transmission policy, the essential function is to determine what action to perform in each state. An optimal policy will optimize (either maximize or minimize) a predefined objective function, which will aim to achieve different targets for different formulations. The optimization technique to use then depends on the characteristics of the process and on the "optimal criterion" of choice, which is the preferred formulation for the objective function. Some literature uses the terms "process" and "problem" interchangeably. In this work, the algorithms assume that the state and action spaces, $S$ and $\mathcal{A}$, are finite. The optimal actions were then calculated iteratively based on Bellman's equation. This equation gives the necessary condition for the optimal policy. The iterative algorithm calculates the expected value of each state using the values of the adjacent states until convergence. In general, smaller tolerance values yield higher-precision results in iterative methods.
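The iterative Bellman update described above can be illustrated with a minimal value-iteration sketch for a cost-minimizing MDP. This is not the authors' implementation: the function name `value_iteration`, the data layout of `P` and `C`, and the two-state toy problem are assumptions chosen only for illustration.

```python
def value_iteration(P, C, gamma=0.99, tol=1e-9, max_iter=100_000):
    """Minimize the expected discounted cost of a finite MDP.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; C[s][a] is the immediate cost of action a in state s.
    (Hypothetical layout, chosen for this sketch.)
    """
    n_actions, n_states = len(P), len(C)
    J = [0.0] * n_states  # initial guess for the value of every state

    def q_value(s, a, values):
        # Immediate cost plus discounted expected value of the next state.
        expected_next = sum(P[a][s][t] * values[t] for t in range(n_states))
        return C[s][a] + gamma * expected_next

    for _ in range(max_iter):
        J_new = [min(q_value(s, a, J) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(new - old) for new, old in zip(J_new, J))
        J = J_new
        if delta < tol:  # stop once the update is below the tolerance
            break

    # Greedy policy with respect to the converged values.
    policy = [min(range(n_actions), key=lambda a, s=s: q_value(s, a, J))
              for s in range(n_states)]
    return J, policy


# Toy problem (assumed): action 0 always leads to state 0, action 1 to state 1.
P_toy = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
C_toy = [[1.0, 0.0], [1.0, 3.0]]
J_toy, policy_toy = value_iteration(P_toy, C_toy, gamma=0.5)
```

A smaller `tol` gives values closer to the fixed point of Bellman's equation at the price of more iterations, which matches the remark on tolerance above.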
Based on the proposed MDP model, we studied the best action for each time slot based on the transmission probabilities of the hybrid energy system. The optimal policy for the long-term case is also considered. The transmission scheme designed for this model defines the states, actions, transition probabilities, and reward function for each action, and can be described as $(S, \mathcal{A}, T, \Re)$. Let $S$ be the state space, which is a composite space of the packet state $\mathcal{Q} = \{0, \dots, N_Q\}$ and the battery state $\mathcal{B} = \{0, \dots, N_B\}$; that is, $S = \mathcal{Q} \times \mathcal{B}$, where $\times$ denotes the Cartesian product. At the state $S = (i, j)$, there are $i$ packets and $j$ available solar energy units in the packet buffer and the battery, respectively. $\mathcal{A} = \mathcal{A}_C \times \mathcal{A}_H$ is the composite action for transmitting packets using hybrid energy, with $\mathcal{A}_C = \{0, \dots, N_Q\}$ and $\mathcal{A}_H = \{0, \dots, N_B\}$, where $N_Q$ and $N_B$ are the maximum allowable action indexes using AC power and harvested solar power, respectively. These indexes represent the size of the packet buffer or battery. Moreover, $N_Q = N_B$ if and only if the packet buffer and the battery have the same size when one type of energy is used. The transmission policy for each scenario is assumed to have a state transition probability for the different actions, and $\Re$ is the reward function that balances the cost value of AC power source usage and the SEH wastage for all actions. To ensure reliable data transmission for the UE, AC power was used, since the SEH is an unpredictable source. However, SEH wastage from the battery occurs when the packet buffer is empty. To minimize AC power usage and SEH wastage, the proposed method decides when to use or store SEH from the battery based on two practical scenarios. The considered scenarios are suitable for the MDP model since it requires making decisions on the best time to transmit packets. 17 Initially, both scenarios store SEH when the packet transmission takes place. However, the second scenario will use SEH immediately if and only if the packet buffer is not empty. As a result, these scenarios will minimize the share of AC power in the total energy consumption. In addition, the joint impact of the battery and packet buffers is also considered. The performance trade-offs between these factors, including the wasted energy and the power consumption, are investigated in the two practical scenarios.
To find the best decisions, the practical scenarios were studied to find the best time to use or store SEH. In the first scenario, the proposed model receives and stores the SEH arrival before transmitting any packets. Besides, there is a probability of using or saving SEH in each time slot, which depends on the decision to transmit a packet or not. In the second scenario, the proposed model receives the SEH while transmitting the packets or before packet transmission. Moreover, it immediately uses the SEH in the battery to transmit the packets expected in the buffer. Figure 3 shows when the packet transmission occurs in both practical scenarios.
In both scenarios, $K$ is the number of transmission intervals; $P_T$ is the new packet arrival probability for new SEH arrivals; $\alpha_k$ is the amount of harvested energy; and $H_k$ is the probability of harvesting energy. In addition, $U_k$ is the probability of using harvested energy immediately to transmit a packet, which applies only in the second scenario. The transmission scheme for both scenarios relies on the proposed model. In this scheme, a Markov chain for the Markov decision process of buffer sizes is described in the following subsections.
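As an illustration of the composite state and action spaces defined above, the following sketch enumerates them as Cartesian products. The helper names `build_spaces` and `feasible_actions` are hypothetical, and the feasibility rule (no more packets sent than are buffered, no more SEH units spent than are stored) is an assumption of this sketch that rests on the one-energy-unit-per-packet convention rather than a rule stated by the authors.

```python
from itertools import product


def build_spaces(n_q, n_b):
    """Enumerate states (i, j) and actions (a_ac, a_h) as Cartesian products.

    i: packets in the buffer (0..n_q); j: SEH units in the battery (0..n_b).
    a_ac: packets sent with AC power; a_h: packets sent with harvested energy.
    """
    states = list(product(range(n_q + 1), range(n_b + 1)))
    actions = list(product(range(n_q + 1), range(n_b + 1)))
    return states, actions


def feasible_actions(state, actions):
    """Keep only actions that the current state can support (assumed rule)."""
    i, j = state
    return [(a_ac, a_h) for (a_ac, a_h) in actions
            if a_h <= j and a_ac + a_h <= i]


states, actions = build_spaces(2, 2)
allowed = feasible_actions((1, 2), actions)
```

With a packet buffer and a battery of size 2 each, the state space has 9 elements; in state (1, 2) only three actions remain under the assumed rule: send nothing, send one packet with SEH, or send one packet with AC power.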

| Action
For each packet transmission, it was assumed that the system consumes one energy unit. The total number of transmitted packets is therefore equal to $a_A + a_C$. Moreover, the action of transmitting the maximum number of the packets is equal  Table 1. In the figures, the black line represents no transmission, and the blue line indicates packet transmission using SEH, AC power, or hybrid power. The violet line indicates no transmission but with a dropped battery unit.

| The system states with transition probabilities
The transition probabilities of both the packet buffer and the battery were defined from the current state $(S_Q, S_B) = (i, j)$ to the next state $(S_Q, S_B) = (i', j')$ with regard to the action $A = (a_A, a_H)$ in the following cases:
Case 1: Transition probability 1 ($T_1$) represents no SEH arrival ($1 - P_H$) and no packet arrival ($1 - P_P$).
Case 2: Transition probability 2 ($T_2$) represents no SEH arrival ($1 - P_H$) and a packet arrival ($P_P$).
Case 3: Transition probability 3 ($T_3$) represents an SEH arrival ($P_H$) and no packet arrival ($1 - P_P$).
Case 4: Transition probability 4 ($T_4$) represents an SEH arrival ($P_H$) and a packet arrival ($P_P$).
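The four cases can be sketched numerically as follows. The sketch assumes that SEH arrivals and packet arrivals are independent within a slot, so that each case probability is a product of the two marginal probabilities; that independence assumption and the function name `arrival_case_probabilities` are not taken from the source.

```python
def arrival_case_probabilities(p_h, p_p):
    """Joint probabilities of the four arrival cases in one time slot.

    p_h: probability of an SEH arrival; p_p: probability of a packet arrival.
    Assumes the two arrivals are independent (an assumption of this sketch).
    """
    if not (0.0 <= p_h <= 1.0 and 0.0 <= p_p <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return {
        "T1": (1 - p_h) * (1 - p_p),  # no SEH arrival, no packet arrival
        "T2": (1 - p_h) * p_p,        # no SEH arrival, packet arrival
        "T3": p_h * (1 - p_p),        # SEH arrival, no packet arrival
        "T4": p_h * p_p,              # SEH arrival, packet arrival
    }


cases = arrival_case_probabilities(0.5, 0.5)
```

Because the four cases are mutually exclusive and exhaustive, their probabilities sum to one for any valid pair of inputs, which is a useful sanity check when building the full transition matrix.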
The equations for the transition probabilities can be expressed for each current state, next state, case, and action under the following conditions:
1. The present and next states are the same for both the packet buffer and the battery.
2. The present and next states are different for the packet buffer but the same for the battery.
3. The present and next states are the same for the packet buffer but different for the battery.
4. The present and next states are different for the packet buffer and for the battery.

Goal | First scenario | Second scenario
Energy wastage | More opportunity to reach a full battery and more power wastage from the battery | Less opportunity to reach a full battery and very little power wastage from the battery
Dropped packets | There is a probability of dropped packets | There is no probability of dropped packets
Packet buffer full percentage | For the simple case of receiving one packet (low data rate), there is a probability of reaching a full buffer before packet transmission | For the simple case of receiving one packet (low data rate), there is no probability of reaching more than half of the buffer before packet transmission
Policy | There is a chance to use different decisions to determine when to use harvested solar energy to transmit packets (trade-off) | There is only one way to determine when to transmit packets using harvested solar energy immediately
Performance | The performance is better when the data rate of harvesting energy is high since there is the probability of saving power consumption in the next buffer state | The performance is better when the data rate of harvesting energy is low because it will not waste battery energy if the buffer is not empty
Buffer size | Infinite packet buffer | Finite packet buffer

| Cost function
The cost value is defined as a weighted sum of three elements: AC power usage ($a_A$), solar energy wastage, and the number of dropped packets. The cost of each packet transmitted using AC power, or of each wasted SEH unit, is one energy unit. To balance the elements of the cost value, we used Bellman's equation, which calculates the minimum of the total cost.
where $\gamma$ is a discount factor, $J^*$ represents a function of the expected cost value to achieve the system's objectives, and $C$ is the cost value for states $S$, actions $A$, and transition probability $T_A$; these parameters are defined for the first scenario in the following equations.
where $K$ is an interval for each time slot, which determines the other parameters. The quantities $\alpha_K$, $H_{K-1}$, $B_K$, $D^{(1)}_{K-1}$, and $T_K$ represent the incoming SEH, the stored SEH, the total number of remaining packets transmitted at time intervals, and the remaining number of time intervals.
For the second scenario, $C$ is defined as follows.
where $U_H$ is the probability of using harvested energy immediately. $U_H$ is equal to one if $S_Q > 0$; otherwise, it is equal to zero.
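The weighted-sum cost and the indicator $U_H$ can be sketched as below. The exact cost equations of the two scenarios are not reproduced here; the function names and the weights `w_ac`, `w_waste`, and `w_drop` are hypothetical and default to one, in line with the statement that an AC-powered packet and a wasted SEH unit each cost one energy unit.

```python
def step_cost(a_ac, wasted_seh, dropped, w_ac=1.0, w_waste=1.0, w_drop=1.0):
    """Weighted sum of AC usage, SEH wastage, and dropped packets for one slot.

    The weights are illustrative assumptions, not values from the source.
    """
    return w_ac * a_ac + w_waste * wasted_seh + w_drop * dropped


def use_immediately(packets_in_buffer):
    """Indicator U_H: 1 if the packet buffer is non-empty (S_Q > 0), else 0."""
    return 1 if packets_in_buffer > 0 else 0


cost_example = step_cost(a_ac=2, wasted_seh=1, dropped=0)
```

Raising one weight relative to the others shifts the optimal policy toward reducing that element, which is how a weighted cost of this kind expresses the trade-off between AC power usage and SEH wastage.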

| SIMULATION RESULTS
The proposed MDP model was solved using the value iteration algorithm, as shown in Figure 6, and the MDP iteration was used to calculate the reward value. This algorithm was evaluated for the two proposed practical scenarios using the following metrics.
• The dropped packet ratio.
• Wastage of SEH.
• The AC power utilization ratio.
• The optimal buffer size.

| Performance of the MDP model
In the proposed scenarios, the buffer sizes for both the battery ($N_B$) and the packet set ($N_Q$) ranged from one to ten. Besides, the action of transmitting one packet in a slot consumes 1 mW. If multiple packets are transmitted, the consumed power is equal to 1 mW multiplied by the number of transmitted packets. Moreover, hybrid power is used for packet transmission. There was also a probability of receiving only one packet or one solar energy unit in each time slot. The reward function could determine the minimum AC power usage using Bellman's equation. First, it calculated the optimal action, that is, the action with the minimum cost among all actions, for each time slot sequentially. Then, each result for the optimal action was multiplied by the discount factor $\gamma = 0.99$ and summed up over 1000 iterations in Equation (5). As shown in Figure 7, we compared the percentages of dropped packets out of the total transmitted packets for the two proposed scenarios. Notably, the percentage of dropped packets for different buffer sizes increased in the first scenario. Also, the performance of the first scenario would improve if the number of battery states increased in this simple case. The percentage of dropped packets in the second scenario was equal to zero because the data rate of the received packets was lower than that of the transmitted packets. In the second scenario, the packet buffer can use SEH, AC power, or hybrid power to transmit a packet directly.
As shown in Figure 8, we compared the percentage of energy wastage out of the total harvested solar energy in both scenarios. For different battery sizes, the percentage of solar energy wastage decreased when the battery size was increased in both scenarios. Moreover, the performance of the second scenario was better than that of the first scenario, because the second scenario used the harvested energy immediately, except when the packet buffer was empty.
As shown in Figure 9, we compared the hybrid power under the MDP harvesting power model with the pure AC power, in terms of the AC power utilization ratio for the MDP transition probability set. In particular, the power consumption for different buffer sizes was investigated. Both scenarios showed a good ability to minimize AC power usage; however, the performance of the second scenario was slightly better than that of the first scenario. For a high data rate of received packets, it was expected that the performance of the first scenario would be better than that of the second scenario; the trade-off between using harvested energy immediately or not would produce a balance between AC power consumption and efficient use of harvested energy. The relationship between buffer size and power consumption was approximately linear because it was assumed that transmitting one packet consumes one battery power unit or one AC power unit. Figure 10 shows that the joint influence of the battery and packet buffers can achieve trade-offs between the percentage of SEH wastage and the percentage of AC power over the total hybrid power usage. Moreover, it illustrates that the percentage of AC power over the total hybrid power usage increased for both scenarios when the model size increased, because accommodating more packets in the packet buffer consumed more power. On the other hand, the percentage of SEH wastage over the total SEH decreased when the model size increased because a larger battery could receive more SEH. Furthermore, the crossing points between the red and green lines occurred at buffer sizes of around 5 and 2, which indicate the optimum buffer sizes needed to balance AC power usage and SEH wastage for the first and second scenarios, respectively.
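To give a feel for the discounting horizon, the short sketch below sums the discount weights for $\gamma = 0.99$ over 1000 iterations. It assumes that iteration $k$ is weighted by $\gamma^k$, which is the usual convention but is not spelled out in the text; the variable names are illustrative.

```python
gamma = 0.99      # discount factor used in the simulations
n_iter = 1000     # number of iterations summed in the simulations

# Total discount weight over the finite horizon (geometric series).
total_weight = sum(gamma ** k for k in range(n_iter))

# Limit of the same series over an infinite horizon: 1 / (1 - gamma).
infinite_weight = 1.0 / (1.0 - gamma)

# Fraction of the infinite-horizon weight that the truncation leaves out;
# analytically this equals gamma ** n_iter.
tail_fraction = 1.0 - total_weight / infinite_weight
```

Under this assumption the truncated sum is within a small fraction of a percent of the infinite-horizon value of about 100, so stopping at 1000 iterations discards very little of the discounted cost.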

| Performance influence between the packet buffer and battery
As shown in the aforementioned figures, the MDP model traded off the performance metrics for both scenarios. Additionally, the second scenario achieved better results than the first scenario in all performance metrics.
In summary, dropped packets could occur only in the first scenario. On the other hand, the energy wastage in the second scenario was less than that in the first scenario. However, the power consumption for both scenarios was approximately equal. Furthermore, after taking into account the two practical scenarios, the optimal buffer size was determined while considering the trade-off between the energy wastage and AC power consumption.
To further investigate the joint influence of the packet buffer and the battery, two more cases for the sizes of the packet buffer and battery were considered. In the first case, the battery size was fixed at five units, and the packet buffer ranged from one to ten units. This case focused on the effect of different packet buffer sizes compared with the fixed battery size. The reason for selecting five as the fixed battery size was that the optimal buffer size was five in the first scenario of Figure 10. The second case used the same fixed size for the packet buffer, while the size of the battery ranged from one to ten. Figures 11 and 12 illustrate the first case and the second case, respectively. In Figure 11, the energy wastage decreased in both scenarios, regardless of whether the size of the packet buffer was less or more than the fixed size of the battery, because the packets in the packet buffer can utilize SEH more efficiently. However, the power consumption decreased when the packet buffer size was less than five because the battery could accommodate more SEH. Besides, the power consumption increased when the size of the packet buffer was more than five because the packet buffer could receive more arriving packets, which use more energy. As shown in Figure 12, the wasted energy decreased in both scenarios when the size of the battery was less than five because increasing the battery size gave the battery more space, especially if the packet buffer was empty. On the other hand, the energy wastage increased when the size of the battery was more than five, because the battery would have more opportunity to be full and would lose some arriving battery units. Also, the power consumption first increased and then decreased when the battery size was five, because the capacity of the battery affects whether AC power or SEH is consumed.
As a result, neither Figure 11 nor Figure 12 yields an optimal buffer size for both the packet buffer and the battery, since the capacities of the packet buffer and the battery are not synchronized as they are in Figure 10.

| CONCLUSION
The maximum utilization of hybrid energy was investigated for the base station in a 5G network. By taking into account the unpredictability of the SEH source, the MDP model was proposed to balance the minimum AC power usage and the minimum amount of wasted SEH. The model sets the state space, the actions, and the state transition probabilities, from which it could determine the best transmission action for each time slot and the best long-term transmission decision. Two practical scenarios were analyzed to study the best time to use and store SEH. The first scenario stored the SEH for later packet transmission, and the second scenario used the SEH to transmit packets immediately. As a result, the proposed model could determine the optimum buffer size needed to balance AC power consumption and SEH wastage.