DRX-based energy-efficient supervised machine learning algorithm for mobile communication networks

The continuous traffic increase of mobile communication systems has the collateral effect of higher energy consumption, affecting battery lifetime in the user equipment (UE). An effective solution for energy saving is to implement a discontinuous reception (DRX) mode. However, guaranteeing a desired quality of experience (QoE) while simultaneously saving energy is a challenge, and both energy efficiency and QoE are essential for the provision of real-time services, such as voice over Internet protocol (VoIP), voice over LTE, and mobile broadband in 4G networks and beyond. This paper focuses on human voice communications and proposes a Gaussian process regression algorithm that is capable of recognizing patterns of silence in human conversations and predicting their duration, with a prediction error as low as 1.87%. The proposed machine learning mechanism saves energy by switching the radio frequency interface OFF/ON, in order to extend the UE autonomy without harming QoE. Simulation results validate the effectiveness of the proposed mechanism compared


INTRODUCTION
With the ever-increasing traffic growth of wireless communication systems, energy efficiency becomes a very important and challenging issue [1]. 5G and 6G wireless networks promise a notable number of innovative services and technologies [2,3], whose incorporation may further increase the power dissipation in the network. The energy demand of battery-powered devices, needed to support new software applications, substantially exceeds the actual capacity of current battery technology. As we move toward new generation networks (NGN), user experience also depends on the ability of the device to save energy. Energy efficiency at the base station (BS) has attracted a lot of attention from the industry and the research community [4], as the BS consumes much more energy than the user equipment (UE). However, energy efficiency at the UE has lately become extremely important as well, partly due to the quest for controlling devices and services remotely by voice [10].
In today's competitive world, VoIP quality of experience (QoE) can pave the way to success for any business [12]. VoIP is increasingly common among providers and is used in diverse industries. As stated by the business consulting company Frost & Sullivan, mobile VoIP alone (such as WhatsApp and Viber, excluding fixed VoIP) constituted a 30 billion dollar business worldwide by 2015 (up from just 600 million dollars in 2008), and this trend continues. Because of this, Frost & Sullivan is encouraging all mobile providers to start applying VoIP [12]. In November 2017, the number of concurrent VoIP users on Skype exceeded 300 million. Research has shown that there are approximately 200 million VoIP hard phone subscribers in the world and that the number of mobile VoIP users exceeds 150 million [13]. Undoubtedly, the deployment of high-speed networks has contributed to the increasing growth and popularity of VoIP [12].
5G and 6G networks have expanded their focus, including not only human-centred communications but also many use cases of machine-type communications (MTC) [14,15]. Since MTC devices are usually battery-operated without frequent human intervention, energy efficiency becomes extremely important. Based on the MTC traffic parameters, in [16] a radio resource control protocol, that makes use of DRX, was designed by taking into account the unique characteristics of MTC devices. In [17] and [18], DRX mechanisms associated with MTC services are discussed. In [17], the average power consumption of the wake-up enabled MTC device is modelled by using a semi-Markov process and then optimized through a delay-constrained problem. In [18] the authors propose a new online learning based DRX mechanism, which aims at improving energy efficiency for MTC services by adapting to different traffic patterns.
IoT applications emerge as new and innovative ways to collaborate, communicate and interact, both from a human and machine perspective [19]. In this sense, as the natural mode of communication, the integration of speech/voice and telephony into IoT applications can offer a versatile method to provide human interaction, communication, and control [19]. Successful integration of IP telephony and IoT technologies is mutually beneficial and even essential for both fields [20]. Such an advancement would enable numerous new IoT applications and products with voice-awareness, which can provide a flexible user experience in a more economical way than traditional methods. Ultimately, all these new applications and features will result in a broader and smarter ecosystem [20].
However, human communication based on voice over IP (VoIP) continues to be one of the most demanded services from wireless networks [25], be it for one-to-one conversations or for online meetings. It is a feature of voice communications that the signal is composed of active periods separated by pauses or silence periods [26]. Interrupting the transmission during pauses or silence intervals has very little impact on the quality of experience (QoE), which could be exploited by a DRX mechanism. By conditioning the switching to the OFF state on the beginning of a silence period, DRX is able to save energy without deteriorating the QoE, provided that the return to the ON state is synchronized with the start of the activity period. However, we must take special care not to prolong the OFF state beyond the beginning of active speech. Therefore, there is an operational trade-off between energy savings and QoE.
Several DRX approaches for VoIP communications can be found in the literature, for instance [22][23][24][27][28][29]. In [27], the authors propose a modified power-saving mechanism (PSM) that reversely applies the state transition of the legacy LTE PSM by considering the attributes of network propagation delay, whereas [28] considers an adaptive DRX method that utilizes service and terminal acknowledgement provided by the deep packet inspection mechanism and modifies the DRX settings on the fly. In [29], the authors model the DRX mechanism using an n-state Markov chain, and evaluate the energy consumption and the delay based on the transition probabilities between states, but unfortunately the proposed method does not guarantee a high level of QoE. Furthermore, [27][28][29] do not exploit the active/silence periods characteristic of VoIP sessions to reduce energy consumption. In [22], the authors exploit the silence periods of a VoIP session to achieve greater energy savings in WiFi networks, but without taking into account the effect of burst losses or ensuring a required QoE value.
Each service has its particular traffic characteristics, and VoIP is no exception. The alternation of silence and active periods is typical of voice communications, and it provides an essential feature to control the DRX mechanism in order to save energy without affecting the QoE. However, to that end, it is of paramount importance to statistically characterize the behaviour of silence periods. Dynamic DRX mechanisms that predict packet arrivals using a simple neural network, in response to traffic conditions, are proposed in [23,24]. An algorithm for multiple beam communications is proposed in [23], which enables dynamic short and long sleep cycles. In [24], a neural network is trained to predict the next packet arrival time based on real wireless traffic. However, [23,24] do not focus on the distinctive features of VoIP that, according to [30], are described more accurately by means of a Gaussian process (GP). It is known that GPs can conveniently be used to specify a very flexible non-linear regression with feasible computational load, working well on small datasets and having the ability to provide uncertainty measurements on the predictions. Gaussian process regression (GPR) is non-parametric, so rather than calculating the probability distribution of the parameters of a specific function, it calculates the probability distribution over all admissible functions that fit the data [31].
The method proposed in this paper performs a trade-off analysis between the average receive energy consumption and the VoIP service QoE. We propose a new DRX mechanism based on a service aware supervised learning algorithm, which predicts the duration of the silence periods by means of a GP model that effectively characterizes the VoIP traffic pattern. To the best of our knowledge, this approach has not been previously proposed. The results show that our novel mechanism significantly reduces energy consumption with respect to previous DRX schemes without appreciably deteriorating QoE, while predicting silence periods with an error of less than 2%. Table 1 summarises the features of each analysed mechanism. The contributions of this paper are summarized as follows: (i) We propose a new DRX-based energy-efficient supervised machine learning algorithm for mobile communication networks, which saves energy by switching the radio frequency interface OFF/ON, in order to extend the UE autonomy without harming QoE; (ii) For the first time, the problem of predicting silence intervals in human voice communications is addressed using a Gaussian process model, with a prediction error as low as 1.87%; (iii) Our proposal allows an improvement in energy savings of up to 30% compared with the other DRX-based mechanisms, and its energy benefits are even greater when high levels of QoE are required.
The rest of this paper is organized as follows. Section 2 introduces the system model, while Section 3 describes Gaussian process techniques to characterize the behaviour of the silence periods through regression. Section 4 designs a model that is capable of predicting the length of the silence periods. Section 5 discusses some performance metrics, while Section 6 evaluates the proposed method in terms of energy saving and QoE. Finally, Section 7 concludes the paper. Table 2 lists the acronyms and Table 3 the symbols used throughout this paper.

SYSTEM MODEL
The period in which the audio does not contain spoken words is called a "period of silence", which can correspond to an interval between two consecutive words in a sentence or to a pause in the conversation. Intuitively, during a period of silence it is not necessary to transmit or receive any packets, since the payload of real-time transport protocol (RTP) packets in such periods does not carry significant data, and this results in energy savings. In VoIP communications it is possible to distinguish RTP packets of silence from RTP packets of voice, since we assume that the transmitter sends a comfort noise (CN) packet at the beginning of a silence period, allowing the receiver to identify the start of the silence period. The main purpose of the CN packet is to notify the receiver so that comfort noise is generated while the silence lasts. In this way, the transmitter avoids sending unnecessary packets and the listener does not think that the communication has been cut off, as happens in absolute silence. However, in our case we take advantage of that information to turn OFF the radio frequency interface, for the sake of saving energy also at the receiver side.
Once the silence period is over, the first incoming packet has a time stamp that indicates an interval larger than expected for two consecutive active packets, but the sequence number has only increased by one, thus indicating a new activity period after the silence [32]. Moreover, the use of voice activity detection techniques makes it possible to distinguish speech from non-speech segments in an audio stream and to delimit the length of the period of silence using comfort noise and normal RTP packets. Opportunistically and dynamically turning the radio frequency interface (RFI) OFF/ON at the transmitter side is a trivial operation. However, the receiver cannot take advantage of the activity restart notifications while its RFI is turned OFF; therefore, predicting with high precision the length of the periods of silence at the receiver side is a considerably complex but very necessary task. Note that increasing the OFF time increases the energy savings in the receiver. However, the risk of packet loss increases if an active period starts during the OFF time: although the new packets are stored in a buffer while waiting for the connection to be re-established, excessive delay causes information loss, ultimately affecting the service QoE. Figure 1 illustrates our system model. Without limiting the relevance of our proposal in 5G networks and beyond, we use as an example a typical Voice over LTE (VoLTE) architecture by which the UE accesses an IP flow of conversational multimedia. The modules involved in the operation of the proposed scheme are represented within the UE and described in Sections 2, 3 and 4. It is necessary to clarify that we assume that the UEs involved in the communication keep a single active conversational session, even though we consider that the scheme could perform in a similar way in conference sessions.
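The time stamp and sequence number rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `RtpHeader` record, the 8 kHz RTP clock and the 20 ms packetization interval (160 time stamp units per packet) are hypothetical choices, not values taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class RtpHeader:
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # 32-bit RTP time stamp, in clock units


# Assumed for illustration: 8 kHz clock and 20 ms frames -> 160 units per packet.
TS_PER_PACKET = 160


def starts_new_talkspurt(prev: RtpHeader, cur: RtpHeader) -> bool:
    """True if `cur` is the first packet received after a silence period.

    The sequence number advances by exactly one (no packet was lost), while
    the time stamp jumps by more than one packet interval (nothing was sent).
    """
    seq_step = (cur.seq - prev.seq) % 2**16          # handles wrap-around
    ts_step = (cur.timestamp - prev.timestamp) % 2**32
    return seq_step == 1 and ts_step > TS_PER_PACKET


def silence_duration_ms(prev: RtpHeader, cur: RtpHeader, clock_hz: int = 8000) -> float:
    """Length of the gap between two consecutive packets, in milliseconds."""
    ts_step = (cur.timestamp - prev.timestamp) % 2**32
    return 1000.0 * (ts_step - TS_PER_PACKET) / clock_hz
```

The silence lengths measured in this way are the kind of observations that a predictor of silence duration would be trained on.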
Finding a proper balance between the opposing aspects of energy savings and QoE is challenging, as predicting the duration of each silence period is non-trivial. Without loss of generality, in this work we assume that voice activity detection is used because most current codecs support it, so whenever the RFI is active we can determine when a silence period begins, but not how long it will last. Therefore, we need a scheme capable of predicting the duration of the silence with accuracy. One technique to achieve this goal is to exploit the traffic pattern. Specifically, in this work we consider a simplex VoIP traffic model [33], where the UE is only aware of downlink traffic.

FIGURE 1 System model: DRX mechanism aimed at saving energy considering the QoE by a service aware supervised learning algorithm, which predicts the duration of the silence periods by means of a GP model that characterizes the VoIP traffic pattern

Then, we can calculate the length of silence periods between consecutive voice packets and use statistical analysis to describe the silence periods. Next, we present an algorithm capable of predicting the length of future silence in new conversation periods based on the history of previously observed silence. When trying to predict the silence duration there are three possible scenarios, which are illustrated in Figure 2. First, (a) the radio frequency interface (RFI) is turned ON before the end of the period of silence, which is inefficient in terms of energy saving. Second, (b) the RFI is turned ON after the end of the current silence period, therefore leading to packet losses and decreasing the QoE. Third, (c) the ideal scenario, in which the system is capable of predicting the exact duration of silence, so that we save the maximum amount of energy without deteriorating QoE. An efficient way to design such a system is to first statistically characterize voice traffic in terms of silence/active periods through regression techniques, making use of training data, and then use the resulting model to make efficient predictions on unknown data. The silence and active periods are analysed in detail in Section 5.2.
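The three scenarios can be made concrete with a small helper that compares a predicted OFF time with the true silence length. The function name and the returned labels are hypothetical and only serve as an illustration of cases (a), (b) and (c).

```python
def classify_wakeup(off_time_ms, silence_ms):
    """Classify one DRX wake-up against the true silence length (both in ms).

    Returns a (label, mismatch_ms) pair.
    """
    if off_time_ms < silence_ms:
        # (a) RFI ON too early: the receiver is awake while nothing arrives.
        return "early", silence_ms - off_time_ms
    if off_time_ms > silence_ms:
        # (b) RFI ON too late: packets wait in the buffer and may be lost.
        return "late", off_time_ms - silence_ms
    # (c) Ideal: the RFI wakes up exactly when the active period starts.
    return "ideal", 0
```

For a 500 ms silence, `classify_wakeup(300, 500)` reports 200 ms of avoidable ON time, while `classify_wakeup(600, 500)` reports 100 ms during which incoming speech packets are delayed.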

GAUSSIAN PROCESS
Supervised learning [34] in the form of regression is an important element of statistics and machine learning, either for data set analysis or as part of a more complex problem. Traditionally, parametric models have been used for this purpose, which can be easily interpreted and used to advance a general theory from the model. In that sense, models like the Auto Regressive-Moving Average (ARMA) model and the Auto Regressive-Integrated-Moving Average (ARIMA) model stand out for stochastic processes. However, according to [35], when the process is only approximately governed by stochastic differential equations, parametric modelling can be weak. Besides, parametric models such as the Normal, Weibull and Poisson distributions require an assumption about the algebraic form of the relationship between two or more variables, as well as a lot of clean, complete, and uncorrelated data to be properly validated [36]. The shortcomings of these parametric techniques are caused by their limited assumptions over the underlying function f; such linear assumptions are inadequate to model a long-term forecasting problem and thus a more powerful model is needed. Non-parametric techniques like GPR provide a solution to the problem, as mentioned earlier. These techniques make much less restrictive assumptions over the function f. Non-parametric methods do not require a prior model. They offer innumerable possibilities to configure weights and to adapt to different non-linear datasets [37]. The neural network (NN) approach is the most popular and widely used non-parametric method. An NN can perform tasks that a linear algorithm cannot and it is usually reliable for highly dynamic and non-linear processes.
However, we have preferred not to use NNs on this occasion, since among their documented limitations are [38]:
• The resulting models are black boxes for which the association between features and classes may not be easily described;
• The training process is typically lengthy, especially using the batch strategy;
• Tuning the configuration parameters, such as the learning rate or the selection of the activation method, can be time consuming;
• Estimating the minimum size of the training set to get accurate results is not obvious.
Alternatively, other non-parametric models, such as Support Vector Machines (SVM) and GP, can overcome these deficiencies and provide efficient and powerful classification algorithms that are capable of dealing with high-dimensional input features, with theoretical bounds on the generalization error and sparseness of the solution provided by statistical learning theory. SVM and GP models are non-parametric models based on kernel functions and Gram matrices. The advantage of GPs with respect to SVMs is that their predictions are truly probabilistic and they provide a measure of the output uncertainty. In addition, there exist algorithms for GP hyperparameter learning, but not for the SVM framework [39]. GPs are Bayesian non-parametric models that are becoming more popular for their capability to capture highly non-linear data relationships in tasks such as dimensionality reduction, time series analysis, and novelty detection [40]. More specifically, GP has the following strong aspects [41]:
• The prior specification of the covariance function makes it possible to accommodate a wide class of non-linear regression functions, while prior knowledge about the regression function can be incorporated;
• The model can be easily applied to address regression problems with multidimensional functional covariates;
• The model provides a natural framework for modelling a non-linear regression function and a covariance structure simultaneously.
A GP, denoted by $f \sim \mathcal{GP}(m, k)$, is fully specified by its mean function $m(x)$ and covariance function $k(x, x')$. It is a natural generalization of the Gaussian distribution, whose mean and covariance are a vector and a matrix, respectively. The main objective of using a GP is to derive certain rules on how to predict future data based on the training. Therefore, we should determine a function (after training the model) that makes accurate predictions on future data. Typically, there is only vague prior information, so we use a hierarchical prior where the mean and covariance functions are parameterized in terms of hyperparameters.
First, we need a prediction function whose parameters must be optimized in order to minimize the prediction error. For this purpose we use

$$L = \log p(y \mid x_i, \Theta) = -\frac{(y - \mu)\,\Sigma^{-1}\,(y - \mu)^T}{2} - \log\left(|\Sigma|^{1/2}\,(2\pi)^{n/2}\right), \tag{1}$$

where $\log p(y \mid x_i, \Theta)$ is the logarithmic marginal probability of $y$ with respect to $x_i$ given the hyperparameters $\Theta$. In this case, $y \sim \mathcal{GP}(m, k)$ and $\mu = m(x_i)$, $i = 1, 2, \ldots, n$, where $n$ is the length of the training set, $\Sigma$ is the covariance matrix, and $(\cdot)^T$ is the matrix transpose. Note that (1) contains two terms. The first, $(y - \mu)\Sigma^{-1}(y - \mu)^T / 2$, depends on the training set, as $\mu$ and $\Sigma$ are calculated based on $n$ training samples. The second, $\log(|\Sigma|^{1/2}(2\pi)^{n/2})$, is a logarithmic term which penalizes the complexity and normalizes the model, as it depends on $n$. The trade-off between penalty and adjustment of the data in the GP model is implicit. There is no weight parameter that needs to be tuned by an external method, such as cross-validation. This is a very convenient feature, as it simplifies training [31]. Now, we can find values of the hyperparameters that optimize $L$ based on its partial derivatives, $\partial L / \partial \Theta_m$ and $\partial L / \partial \Theta_k$, where $\Theta_m$ and $\Theta_k$ are the hyperparameters of the mean and covariance functions, respectively, and $(\Theta_m, \Theta_k) \in \Theta$. Figure 3 contains a flowchart showing the hyperparameters associated with the proposed scheme and where the GP is used in it. The logarithmic marginal probability ($L$) is used as a measure of the difference between the predicted (expected) and empirical (observed) values of silence duration. Therefore, $L$ is a reference used to optimise the hyperparameter values in the Gaussian process regression [31]. The values that minimise the error ($\|L\|$) are those used for the posterior stage of prediction.
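As an illustration of how the logarithmic marginal probability of a Gaussian model can be evaluated numerically, the following minimal sketch computes the quadratic data-fit term and the log-determinant penalty through a Cholesky factorization. It is written in pure Python for self-containment; the function names are hypothetical, and the code assumes the standard multivariate Gaussian form rather than reproducing the authors' implementation.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_marginal_likelihood(y, mu, cov):
    """log N(y; mu, cov) = -0.5 r^T cov^-1 r - 0.5 log|cov| - (n/2) log(2 pi)."""
    n = len(y)
    low = cholesky(cov)
    r = [yi - mi for yi, mi in zip(y, mu)]
    # Forward substitution solves low @ z = r, so that r^T cov^-1 r = z^T z.
    z = []
    for i in range(n):
        z.append((r[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)                      # data-fit term
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))  # complexity term
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)
```

In a training loop, `mu` and `cov` would be rebuilt from the current hyperparameters of the mean and covariance functions before each evaluation, and the hyperparameters would be adjusted to improve the returned value.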

PROPOSED SCHEME
The proposed method aims at predicting future silence periods, in order to control the DRX mechanism, saving energy without compromising the QoE. In order to achieve this goal, we first describe the mean and covariance functions that are used to characterize the statistics of the voice activity, as shown in Figure 3, and thus define an appropriate GP model. Then, this model is used in an algorithm that, given a predefined reliability parameter, is able to determine how long a silence period should last. This algorithm can therefore control a DRX mechanism that is able to operate the trade-off between power consumption and QoE.

Mean and covariance functions
As mean function $m$ to describe the behaviour of the silence periods, we use the one suggested by ITU Recommendation P.59 [42], whose parameters were estimated for silence periods of up to 5 s. This Recommendation has been recently used in [43] to model the interactivity of real-world conversations with distinct levels of features and different transmission delays, and an empirical study confirmed its accuracy. In this way we model the probability density function (PDF) of the pause duration by two weighted geometric PDFs defined for $x = 1, 2, 3, \ldots, 5000$, together with the corresponding cumulative probability distribution function according to [42]. This function depends on $T$, the average duration of pauses, and on the minimum pause time required to consider a pause a silence period. Therefore, these two quantities are the hyperparameters of the mean function that are used to optimize $L$, in order to later find the function that predicts silence periods. To that end, we use (1) to calculate the marginal probability and determine the generalization error; then we proceed to minimize the objective function $L$ using gradient-based optimization techniques [31].
As covariance function, we consider a Matern 5/2 kernel [31], widely used in machine learning to define the statistical covariance between values at two points. The kernel depends on $r = \sqrt{(y - \mu)^T (y - \mu)}$, the Euclidean distance between $y$ and $\mu$, and on two hyperparameters: an amplitude and a scale that determines the relevance of the input parameter. A kernel (or covariance function) describes the covariance of the Gaussian process random variables. Together with the mean function, the kernel completely defines a Gaussian process. As $m(x)$ we can utilize any real function; however, $k(y, \mu)$ must be positive definite, which implies that the covariance matrix $\Sigma$ is symmetric and invertible. The same applies to kernels in general, so any valid kernel function can be used as covariance function [31]. Then, we can state that $k(x, y) = f(x) \cdot f(y)$, where $k$ is the kernel function, $x$ and $y$ are $n$-dimensional inputs, $\cdot$ denotes the dot product, and $f(\cdot)$ is a map from an $n$-dimensional to an $m$-dimensional space such that $m \gg n$.
The Matern 5/2 kernel is a twice differentiable covariance. It is commonly used to define the statistical covariance between measurements made at two points that are d units distant from each other. Since the covariance only depends on distances between points, it is stationary. This kernel often works better than the standard Gaussian kernel because it is "less smooth". It is considered the simplest and most natural covariance in spatial statistics that works well in $\mathbb{R}^2$.
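For reference, the Matern 5/2 covariance in its commonly used form can be written as a short function. The parameter names `amplitude` and `length_scale`, and the convention that the amplitude enters squared, are assumptions for illustration; the paper's own symbols for these two hyperparameters may differ.

```python
import math


def matern52(r, amplitude=1.0, length_scale=1.0):
    """Matern 5/2 covariance between two points at distance r >= 0.

    k(r) = a^2 * (1 + sqrt(5) r / l + 5 r^2 / (3 l^2)) * exp(-sqrt(5) r / l)
    """
    s = math.sqrt(5.0) * r / length_scale
    return amplitude ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def gram_matrix(points, amplitude=1.0, length_scale=1.0):
    """Covariance (Gram) matrix of scalar inputs under the Matern 5/2 kernel."""
    return [[matern52(abs(p - q), amplitude, length_scale) for q in points]
            for p in points]
```

Because the value depends only on the distance `r`, the resulting Gram matrix is symmetric with `amplitude ** 2` on its diagonal, which reflects the stationarity mentioned above.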
The partial derivatives of (4) and (5) are calculated with respect to the hyperparameters in order to obtain the values that optimize the objective function and start the training. The optimization problem (see Figure 3) is formulated over the estimated hyperparameters of the mean function, which belong to $\Theta_m$, and those of the covariance function, which belong to $\Theta_k$. The values that meet this condition are calculated using the partial derivatives of $L$ with respect to each of them, where $\Sigma = k(y_i, \mu_i)$, $i = 1, 2, \ldots, n$, is the covariance considering the training set. Then, solving the resulting systems of two equations with two variables each, (11) and (12), we obtain the trained model $f \sim \mathcal{GP}(m_s, k_s)$. The procedure we used to solve these systems of equations was as follows. In either (11) or (12), we fix the second hyperparameter (the minimum pause time for the mean function, the scale for the covariance function) and calculate the gradient by varying the first one (the average pause duration $\hat{T}$ or the amplitude, respectively); then, with the value of the first hyperparameter that minimizes $L$, we vary the second one to find the global minimum. Finally, with these values the mean and covariance functions used for the prediction are obtained.
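The alternating procedure (fix one hyperparameter, vary the other, then swap) can be sketched as a coordinate-wise search. Note that this sketch swaps the gradient computation for a plain grid search to stay short, and that the function name, grids and toy objective are hypothetical.

```python
def coordinate_search(objective, a_grid, b_grid, b_init, sweeps=5):
    """Minimize objective(a, b) by alternating one-dimensional grid searches."""
    a, b = a_grid[0], b_init
    for _ in range(sweeps):
        # Vary the first hyperparameter while the second one is held fixed.
        a = min(a_grid, key=lambda v: objective(v, b))
        # With that value, vary the second hyperparameter.
        b = min(b_grid, key=lambda v: objective(a, v))
    return a, b


# Toy objective with a single minimum at (2.0, 3.0), for illustration only.
grid = [i * 0.5 for i in range(11)]
best = coordinate_search(lambda a, b: (a - 2.0) ** 2 + (b - 3.0) ** 2,
                         grid, grid, b_init=0.0)
```

For objectives with strongly coupled hyperparameters, several sweeps may be needed, and such a search is only guaranteed to reach a local minimum on the grid.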

Silence duration prediction
Next, based on the GP model in Subsection 4.1, we describe an algorithm that, when a silence period is detected, predicts how long such a period should last under a certain statistical confidence. Let us assume that $X$ denotes the length of the silence period, and $P(X \le \Gamma) = F(\Gamma)$ is the probability that the silence period is shorter than $\Gamma$, while the probability that a silence period is longer than $\Gamma$ is $1 - F(\Gamma)$. Assuming that the silence period has already lasted for $\Gamma$ seconds, the conditional probability that this silence period lasts for longer than $(\Gamma + \Delta)$ is

$$P(X > \Gamma + \Delta \mid X > \Gamma) = \frac{1 - F(\Gamma + \Delta)}{1 - F(\Gamma)}.$$

Using this conditional probability, the proposed Algorithm 1 is able to predict appropriate values of $\Delta$ (new increase in time) that give a conditional probability greater than or equal to a given confidence level. The inputs of the proposed algorithm are the confidence level, the current silence duration $\Gamma$, and the trained GP model $f \sim \mathcal{GP}(m_s, k_s)$. Once a silence period is detected, the algorithm calculates how much longer it is expected to last ($\Delta$) and the OFF time is extended to $\Gamma = \Gamma + \Delta$, updating the value in the cumulative distribution function. Algorithm 1 continues calculating values of $\Delta$ until no $\Delta > 0$ can be found, either because the algorithm is not able to find a value of $\Delta$ with the required probability of occurrence or because the maximum of $f$ is reached; then the algorithm sets the RFI ON.
The model keeps updating the mean and covariance functions by continuing the training process with the new incoming data, so as to achieve greater precision and adapt the system to changes in the traffic distribution. Note that if the training values of successive calls are used in the continuous training process, significant energy can be saved while maintaining a high fidelity; therefore, in the proposed scheme we use the prior knowledge of silence periods to predict future silence based on this self-similar behaviour. The proposed Algorithm 1 can be used to control a DRX mechanism so as to achieve a given trade-off between average power consumption and QoE. To that end, we determine the values of the confidence level and $\Delta$ that minimize the energy consumption while keeping a fixed QoE value $Q_i$, where $P$ denotes the average power consumption. Then, from the fixed QoE value we calculate the R factor, obtaining the packet loss rate (PLR) values needed to reach that QoE through Eqn. (19). Next, we obtain the confidence level and $\Delta$ from the PLR and the CDF since, ultimately, they are inversely proportional. The error (subject to the PLR) is calculated using a quantity obtained from $\Delta$ as a simple calculation from [13]. Processing delays do not affect the performance of our scheme since, due to the low computational cost required, these small delays are absorbed by the playback buffer.
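The iterative extension of the OFF time can be sketched as follows. This is not the paper's Algorithm 1 but a minimal illustration of its described logic, under several assumptions: times are in integer milliseconds, `step_ms` is a hypothetical search granularity, `max_off_ms` reflects the 5 s range of the silence model, and a Weibull-type CDF stands in for the trained model in the usage example.

```python
import math


def conditional_survival(cdf, gamma, delta):
    """P(X > gamma + delta | X > gamma) for a silence-length CDF."""
    tail = 1.0 - cdf(gamma)
    if tail <= 0.0:
        return 0.0
    return (1.0 - cdf(gamma + delta)) / tail


def predict_off_time(cdf, gamma_ms, confidence, step_ms=10, max_off_ms=5000):
    """Extend the OFF time while the silence is still likely to continue."""
    while gamma_ms < max_off_ms:
        # Largest delta (a multiple of step_ms) that meets the confidence.
        delta_ms = 0
        while (gamma_ms + delta_ms + step_ms <= max_off_ms
               and conditional_survival(cdf, gamma_ms, delta_ms + step_ms) >= confidence):
            delta_ms += step_ms
        if delta_ms == 0:
            break  # no positive delta found: the RFI is switched ON
        gamma_ms += delta_ms
    return gamma_ms


# Hypothetical stand-in for the trained silence-duration CDF (scale 500 ms).
def toy_cdf(x_ms):
    return 1.0 - math.exp(-(x_ms / 500.0) ** 2)


off_relaxed = predict_off_time(toy_cdf, 0, confidence=0.8)
off_strict = predict_off_time(toy_cdf, 0, confidence=0.9)
```

With this stand-in CDF, a stricter confidence level yields a shorter OFF time (`off_strict < off_relaxed`), which is the energy versus QoE trade-off that the confidence level is meant to control.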

PERFORMANCE METRICS
In this section we discuss two performance metrics utilized to assess the proposed method and to compare it with competing schemes from the literature: energy consumption and QoE. Additionally, in order to confirm the accuracy of the predictor, we calculate the root mean square error (RMSE) as a metric of how close the expected values are to the observed empirical ones. The error is obtained as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(f_i - f_i^{o}\right)^2},$$

where $f_i$ are the predictions (expected values), $f_i^{o}$ are the observed values (known results) and $N$ is the sample size.
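The RMSE between predicted and observed silence durations can be computed directly from its definition. The function name and the sample values in the usage note are illustrative only.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))
```

For example, `rmse([0.0, 0.0], [3.0, 4.0])` evaluates the square root of (9 + 16) / 2, that is, about 3.54.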

Energy consumption
The energy consumption of mobile devices depends on different components such as active processes, display, and transceiver. For the sake of simplicity, we adopt a simplified model that refers only to the energy consumption of the radio frequency (RF) interface, based on [44]. The RFI consumes the most energy in the active state, during which it is transmitting or receiving packets. In the OFF state, on the other hand, most RF circuits are turned off to reduce energy consumption, leading to a much smaller consumption. Moreover, our goal is to analyze the energy savings due to the proposed scheme, not the specific energy consumption values. Hence, we use a relative energy consumption model, which takes into account the percentage of the total time that the RFI is ON. Note that this consideration allows a fair and effective comparison between different DRX mechanisms, in which energy savings are achieved precisely by turning OFF the RFI. Then, first we compute $T_a$, the time in which the RFI is ON, as

$$T_a = t_a\,N_c,$$

where $t_a$ is the ON time per DRX cycle (the time interval between the beginning of two consecutive instants $t_{on}$) and $N_c$ is the number of cycles that are executed during the call. If $T_t$ is the total call time, then

$$E = \frac{T_a}{T_t}$$

is a normalized measure of energy when using a DRX mechanism. The smaller $E$, the larger the energy savings.
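A short worked example may help to fix ideas. It assumes that the total ON time is the ON time per cycle multiplied by the number of cycles, and that the normalized energy is the ratio of total ON time to total call time, which is our reading of the description above; the numerical values are hypothetical.

```python
def normalized_energy(on_time_per_cycle_ms, cycles, call_time_ms):
    """Fraction of the call during which the RFI is ON (smaller is better)."""
    total_on_ms = on_time_per_cycle_ms * cycles  # total ON time over the call
    return total_on_ms / call_time_ms


# Hypothetical call: 4 ms ON per cycle, 1500 cycles, 60 s call.
energy = normalized_energy(4, 1500, 60_000)
```

Here the RFI is ON for 6 s out of 60 s, so the normalized energy is 0.1, that is, a 90% reduction with respect to an always-ON receiver.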

Quality of experience
In recent years, QoE has become fundamental in telecommunications, taking into account the evaluation of human experience when interacting with technology in a particular context [45]. Although QoE depends on quality of service (QoS) metrics like packet loss and latency, a good (bad) QoS does not necessarily means a good (bad) QoE [46]. QoS metrics are specified by t on (On time) 2, 3, 4, 5, 6 (TTI) [29] t sc (Short cycle time) 16,20,32,40, 64 (TTI) [29] t lc (Long cycle time) 40, 64, 80, 128, 160 (TTI) [29] IT (Inactivity Timer) 50 (TTI) [29] n s (Step) 2 to 16 [29] (Observed silence time) 50 ms [22] (Prediction probability) 0.25 to 0.60 [22] I s (Voice signal impairments) 0 [52] I e (Equipment impairments factor) 10 [52] A (Advantage factor) 0 [52] B pl (Robustness factor) 14.1 [52] R 0 (SNR factor) 129 [52] the service and not by the user [47], while the QoE approach focuses on the user [48]. The QoE leads to a better understanding of user satisfaction with a service in a given context. In [49], the authors discuss why QoE assessment occupies a key role in various multimedia networks and applications and why it is a challenging problem. We use the ITU-T Model E [50] to estimate VoIP session QoE by using a psycho-acoustic factor R ∈ {0, 100}, with the modifications proposed in [51] to take into account the effect of burst losses. Then, the rating factor R is a scalar value defined as a linear combination of the individual impairments and is given by (19): I e f f = I e + (R 0 − I e ) P pl where R 0 is the signal to noise ratio (SNR) factor, I s represents the impairments that occur simultaneously with the voice signal, I d is the delay factor, I e is the equipment impairment factor, caused by the codec and packet loss, and A is the advantage factor. Moreover, P pl is the packet loss ratio (PLR) considering the burst effect, B pl is the robustness factor relative to packet loss.
To calculate $I_d$ we follow [52]:

$$I_d = 0.024\,D_e + 0.11\,d_0\,H(d_0), \tag{20}$$

where $D_e$ is the average packet delay, $d_0 = (D_e - 177.3)$ [ms], and

$$H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0. \end{cases} \tag{21}$$

To calculate the $R$ factor we take typical impairment and advantage values ($I_s$, $A$, $B_{pl}$, $I_e$ and $R_0$) from [52]; see Table 4. From (19), (20) and (21) we can see that the $R$ factor depends on the end-to-end delay and the PLR. Let us now analyze the traces in Figure 4. The traces are representative of the traffic behaviour in LTE mobile networks under different channel conditions. They reproduce essential characteristics of real communications in various networks in the United States and Korea at different times and under different traffic patterns, which characterize the diversity of situations in LTE networks when handling voice traffic. These same four traces have been selected by 3GPP for testing under channel impairments [53]. A histogram analysis of these traces shows that 76% of the active periods last less than 200 ms, while 82% of the silence bursts last less than 500 ms. The mean delay of the traces is between 110 and 168 ms. The jitter also varies, although all traces exhibit self-similar behaviour. We assume in the subsequent analysis that packet loss is only caused by the excessive delay due to the DRX mechanism. Finally, since the traces constitute a continuous flow of packets, we insert silence periods following ITU recommendation P.59 [42], obtaining the resulting traces in Figure 4(b).

FIGURE 5 VoIP satisfaction thresholds [54]
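The dependence of the $R$ factor on delay and loss can be illustrated with a short sketch. It assumes the commonly used simplified E-model expression for the delay impairment (a slope of 0.024 per ms plus an extra slope of 0.11 per ms beyond 177.3 ms) and a loss term of the form $P_{pl}/(P_{pl} + B_{pl})$ with $P_{pl}$ expressed in percent. These formula shapes, the clamping of $R$ to $[0, 100]$ and all function names are assumptions made for illustration and may differ from the exact expressions in [51] and [52]; the constants are the Table 4 values.

```python
# Table 4 values; the formula shapes below are assumptions (see lead-in).
R0, I_S, I_E, A, B_PL = 129.0, 0.0, 10.0, 0.0, 14.1


def delay_impairment(delay_ms: float) -> float:
    """I_d: linear in the delay, with a steeper slope beyond 177.3 ms."""
    d0 = delay_ms - 177.3
    step = 1.0 if d0 >= 0 else 0.0
    return 0.024 * delay_ms + 0.11 * d0 * step


def effective_equipment_impairment(plr_percent: float) -> float:
    """I_eff: equipment impairment inflated by packet loss (PLR in percent)."""
    return I_E + (R0 - I_E) * plr_percent / (plr_percent + B_PL)


def r_factor(delay_ms: float, plr_percent: float) -> float:
    """Rating factor R, clamped to the interval [0, 100]."""
    r = (R0 - I_S - delay_impairment(delay_ms)
         - effective_equipment_impairment(plr_percent) + A)
    return max(0.0, min(100.0, r))


if __name__ == "__main__":
    for delay in (150.0, 250.0):
        for plr in (1.0, 5.0):
            print(f"delay={delay} ms, PLR={plr}% -> R={r_factor(delay, plr):.1f}")
```

The sketch makes the two levers visible: delay only starts to weigh heavily once it crosses the 177.3 ms knee, while loss degrades $R$ from the first lost packet.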
The behaviour of the $R$ factor as a function of the delay can be seen in Figure 5. End-to-end delays below 180 ms have little influence on the $R$ factor, whereas delays above 180 ms have a large impact. According to [54], for delays beyond 280 ms it is not possible to guarantee user satisfaction.
It is well known that audio packets must be played back at the receiver at regular intervals [55] and that, to achieve this, delay variation must be attenuated in the playback buffer. As a consequence, this step also adds delay to the process. Audio packets experience an end-to-end delay $t_e$ composed of several factors, of which the network delay ($d_N$) and the buffer delay ($d_{bf}$) are variable. If the path does not change, $t_e$ is asymptotically constant:

$$t_e = d_{Tx} + d_N + d_{dec} + d_{bf},$$

where $d_{Tx}$ is the transmission delay (including encoding and look-ahead delay), $d_N$ is the network delay, $d_{dec}$ is the decoding delay and $d_{bf}$ is the jitter removal buffer delay, which is variable. Both $d_{bf}$ and $d_{dec}$ are part of the reception delay. The traces in [53] include $d_{Tx}$, which is a constant delay, and $d_N$, which is variable. On the receiver side, $d_{dec}$ is constant and very small compared with $d_{bf}$ and $d_N$, so we can neglect it. We consider a fixed buffer size, large enough to attenuate the delay variations. Therefore, $t_e$ is asymptotically constant because $d_N + d_{bf}$ is asymptotically constant. Based on the above considerations, the mouth-to-ear delay, $d_{MTE}$, is modelled as the typical $t_e$ delay plus the additional time during which the RFI at the mobile is OFF. Note that while the RFI is turned OFF, the packets that are about to be sent cannot be delivered and remain in the system. Therefore, what determines their true late arrival is the playback time conditioned by the size of the playback buffer. In this work, the maximum tolerable delay ($d_{MAX}$) is set to 240 ms, which according to [54] is an adequate threshold for user satisfaction. Hence, $d_{MTE}$ for packets to be reproduced at the UE is bounded by $d_{MAX}$, and packets with $d_{MTE} > d_{MAX}$ are considered lost. Note that this threshold must be greater than the maximum delay in the traces and as close as possible to this value.
If the threshold is taken to be too small, we obtain losses that are not inherent to the DRX mechanism itself and cannot be avoided in any way. Conversely, taking too large a threshold value can directly affect QoE. Packets with a delay greater than this threshold are discarded to avoid QoE degradation. Then, we calculate the PLR as

$$PLR = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left(d_{p_i} > d_{MAX}\right),$$

where $\mathbb{1}(\cdot)$ equals 1 when its argument holds and 0 otherwise, $d_{p_i}$ is the total delay of packet $p_i$ and $N$ is the sample size. Finally, we calculate the corresponding mean opinion score (MOS) value, which provides a subjective measure of the impact that service failures have on the users. The MOS takes values between 1 and 5, with 1 being the lowest QoE and 5 the highest. The MOS value is calculated from the $R$ factor following [52].

The QoE is calculated at regular intervals in order to maintain link quality at all times. If the quality falls at any time, it is because the model describing the silence made wrong predictions or because anomalous traffic was present. To avoid misdetection, the system stops the event prediction and returns to the first stage to derive the prediction function again. To validate the proposed scheme, the results obtained by simulating this algorithm are discussed in the next section. The evaluation also includes previously published algorithms for comparison purposes.
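The loss and MOS computations described above can be sketched as follows. The PLR is taken as the fraction of packets whose total delay exceeds the 240 ms threshold used in this work. For the conversion from the $R$ factor to MOS, the sketch uses the standard ITU-T G.107 mapping, which is assumed here to correspond to the expression adopted from [52]; the function names and the example delays are hypothetical.

```python
D_MAX_MS = 240.0  # maximum tolerable mouth-to-ear delay used in the text


def packet_loss_ratio(delays_ms, d_max_ms=D_MAX_MS):
    """Fraction of packets whose total delay exceeds the threshold."""
    if not delays_ms:
        raise ValueError("need at least one packet delay")
    late = sum(1 for d in delays_ms if d > d_max_ms)
    return late / len(delays_ms)


def mos_from_r(r: float) -> float:
    """Standard ITU-T G.107 R-to-MOS mapping (assumed to match [52])."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


if __name__ == "__main__":
    delays = [120.0, 180.0, 250.0, 300.0]  # hypothetical per-packet delays (ms)
    print("PLR =", packet_loss_ratio(delays))
    print("MOS at R = 80:", round(mos_from_r(80.0), 2))
```

Note that under this mapping the MOS saturates at 4.5, which coincides with the upper end of the MOS range considered in the evaluation.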

Practical implementation considerations
Switching between OFF and ON states is a regular procedure; however, the transitions introduce delays that lead to an increase in power consumption. Without loss of generality, in this paper we assume for all the algorithms that the transitions between OFF and ON states are instantaneous, which introduces an error in the computation of power consumption but guarantees fair comparisons between them. P.59 [42] allows silence times below 40 ms, but in this paper we only consider as silence periods those with a duration of 40 ms or more. Therefore, in all the algorithms we dismiss silence periods shorter than 40 ms. This does introduce an error in the response times; however, following [56], the switching times for the DRX mechanism are in the range of just 1 ms, so at worst the error incurred by assuming instantaneous transitions for minimum-length silence periods is about 1.25%, which is small compared to the other calculations.
On the other hand, although we run the algorithms in a synchronized way, with the receiver clock locked to that of the transmitter, we would like to emphasize that the effect of a clock offset due to lack of synchronism on the subjective quality is minimal because the time scales involved are fairly long [57], representing a negligible error of around 0.01%.

PERFORMANCE EVALUATION
In this section, we analyze the performance of the method proposed in this work. Direct numerical comparisons are made against [29] and [22], since these works are widely recognized in this area of study. The work in [29] aims at saving as much energy as possible, while in [22] the focus is on predicting silence periods. Thus, it is pertinent to compare the accuracy of our predictor with the one in [22] while at the same time comparing the energy saving capability of our system with that in [29].
In the DRX mechanism presented in [29], the system is ON for a time $t_{on}$ with a combination of short and long sleeping cycles of duration $t_{sc}$ and $t_{lc}$. The scheme starts in the ON state and switches OFF according to the traffic characteristics. If no packets are received during an interval $IT$, it goes to a short cycle OFF state. If packets are received, it returns to the ON state; otherwise the short cycle counter is incremented. If $n_s$ short cycle states occur in a row, the system enters a long cycle state, in which it takes longer to wake up and check whether packets are being received. If packets are received, the system returns to the ON state. In the following, the timers are set according to [29] and are listed in Table 4. Moreover, in [22] the authors use the empirical cumulative distribution function (ECDF) obtained from the analysis of previous traces to predict how long the system can be OFF. At the beginning of a silence period, it turns OFF the RFI for a predefined time, and then predicts the length of the silence period using the ECDF. Herein, the observed silence time is set to 50 ms (as recommended in [22]), while the prediction probability is varied according to the QoE/saving trade-off curve. All the parameters used in the simulation are defined in Table 4, unless otherwise specified.
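The timer logic of the DRX mechanism in [29] can be illustrated with a simplified state machine working in TTI units. The sketch is not the reference implementation: it assumes that packets arriving while the RFI is OFF are buffered by the network and detected at the next ON window, and it ignores transition times. The function name, the argument names and the default timer values (picked from the ranges in Table 4) are chosen for illustration only.

```python
from bisect import bisect_left


def simulate_drx(arrivals, horizon, t_on=2, t_sc=20, t_lc=40,
                 inactivity=50, n_s=2):
    """Count the TTIs in which the RFI is ON under a simplified DRX model."""
    arrivals = sorted(arrivals)

    def any_in(lo, hi):
        # True if at least one packet arrives in the interval [lo, hi).
        i = bisect_left(arrivals, lo)
        return i < len(arrivals) and arrivals[i] < hi

    on_ttis, t, idle, short_count = 0, 0, 0, 0
    sleeping, pending_from = False, 0
    while t < horizon:
        if not sleeping:
            # Continuous reception: every TTI is ON; count idle TTIs.
            on_ttis += 1
            idle = 0 if any_in(t, t + 1) else idle + 1
            t += 1
            if idle >= inactivity:          # inactivity timer expired
                sleeping, short_count, pending_from = True, 0, t
            continue
        # One DRX cycle: listen for t_on TTIs, then sleep until the cycle ends.
        cycle = t_sc if short_count < n_s else t_lc
        listen_end = min(t + t_on, horizon)
        on_ttis += listen_end - t
        if any_in(pending_from, listen_end):
            sleeping, idle, t = False, 0, listen_end   # traffic: back to ON
        else:
            short_count += 1
            pending_from = listen_end
            t = min(t + cycle, horizon)
    return on_ttis


if __name__ == "__main__":
    horizon = 1000
    print("E with no traffic:", simulate_drx([], horizon) / horizon)
```

Dividing the returned count by the horizon gives the normalized energy of this simplified model, which makes the effect of each timer on the consumption easy to explore.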
To perform the evaluation, we use traces from [53], available in [21], which were obtained from real-world call logs of RTP packet arrival times collected in different wireless networks in South Korea and the United States, in combination with silence periods inserted following ITU recommendation P.59 [42].

FIGURE 6 (b) Prediction error.

We use the GP model in Subsection 4.1 to obtain a curve that describes the behaviour of the silence periods along the traces, using 75% of the sample data for training while the remaining 25% are used for testing. All algorithm programming and data analysis were carried out using MATLAB [58].

Numerical results
In Figure 6 we illustrate the performance of the proposed method in predicting the statistics of silence periods, using the VoIP traces available in [21]. Figure 6(a) shows the cumulative distribution function (CDF) of silence periods obtained from predicted and empirical results. We calculate the CDF as

$$F_X(x) = P(X \le x), \tag{29}$$

where the right-hand side represents the probability that a random variable $X$ takes on a value less than or equal to $x$. Note that the empirical CDF is also calculated from (29), but with the empirical values read from the traffic pattern rather than the predicted ones. Then, to confirm the accuracy of the predictor, we calculate the root mean square error (RMSE) from (16). From visual inspection of Figure 6(a), it is clear that the proposed method predicts well the statistical behaviour of silence periods. Moreover, Figure 6(b) illustrates the error between predicted and actual silence periods per audio packet, confirming the good accuracy of the proposed method: the RMSE of the predictions is as low as 1.87%.
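The comparison between predicted and empirical silence statistics can be sketched in a few lines. The empirical CDF below estimates the probability that a silence period is less than or equal to a given value, and the RMSE is computed in its plain, unnormalized form; how the error is normalized to obtain a percentage is not reproduced here. The function names and the sample durations are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    return lambda x: bisect_right(ordered, x) / n


def rmse(predicted, actual):
    """Plain root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    observed = [80.0, 120.0, 300.0, 450.0, 900.0]    # silence durations (ms)
    predicted = [85.0, 110.0, 310.0, 440.0, 880.0]   # hypothetical predictions
    f_obs, f_pred = ecdf(observed), ecdf(predicted)
    for x in (100.0, 500.0):
        print(f"x={x} ms: empirical {f_obs(x):.2f}, predicted {f_pred(x):.2f}")
    print("RMSE (ms):", round(rmse(predicted, observed), 2))
```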
Next, we use traces from [53], which have different characteristics in terms of delay, jitter and PLR, to evaluate the performance of the proposed algorithm and compare it to [22,29]. We adjust the parameters of each mechanism by estimating the MOS. To determine the trade-off curve between QoE and energy savings, the range of permissible MOS values is set from 3.6 to 4.5, where users evaluate the communication as satisfactory. Figure 7 shows the QoE/consumption curves of each of the algorithms. The theoretical limit in Figure 7 assumes that the system knows exactly when each silence period begins and how long it will last, so that it can handle the OFF/ON states to reach the lower bound of the energy consumption without exceeding the established maximum threshold of packet losses. Then, while keeping a fixed QoE value, the limit corresponds to the minimum energy consumption for which $D_e$ and $P_{pl}$, introduced previously, do not exceed $d_{max}$ and $PLR_{Q_i}$, the maximum possible values subject to that condition. Note that the theoretical limit is unreachable for our scheme, since the prediction of silence is made from the probability of silence learned from the traffic pattern; our algorithm is therefore strict about maintaining QoE and sacrifices savings when the probability of error is high. Figure 7 shows a normalized measure of energy consumption, relative to the case in which the RFI is turned ON all the time, versus the target MOS. From the results it is clear that the proposed scheme significantly reduces the energy consumed to achieve a certain MOS when compared to [22,29]. This is because the proposed mechanism is able to save a greater amount of energy during silence periods, by adequately estimating their occurrence and duration from the characteristics of the flowing traffic. For example, to reach a MOS equal to 3.6 the proposed method consumes about 2.25% of energy, while [22,29] exceed 3%, leading to a relative saving of approximately 34.5%.
As we increase the target MOS value, the advantage of the proposed scheme becomes even more evident. For MOS values greater than 4.1 the energy consumption in [29] increases exponentially, while the consumption of [22] increases rapidly when it operates at a MOS larger than 4.25.
To facilitate visual comparison in terms of energy consumption, Figure 8 shows the energy consumption of each scheme as a function of the MOS limit values used in the simulations. We also use an intermediate value and the average consumption is shown as well. Note that the benefit of our proposal is even greater when a high level of MOS is required. Unlike 3GPP [29] and SiFi [22], our mechanism is able to adapt itself to service patterns and MOS constraints, being capable of tracking accurately the silence periods without harming the QoE.

CONCLUSION
In this work we introduced a method to predict silence periods through ML, reducing energy consumption with respect to previous schemes without deteriorating QoE, while predicting silence periods with an RMSE of less than 2%. By introducing ML mechanisms, we were able to solve a very complex optimization problem, the trade-off between energy savings and QoE. The proposed scheme characterizes quite accurately the behaviour of silence periods in a VoIP session and reduces energy consumption by more than 30%. Owing to its low computational cost, processing delays do not affect the performance of the scheme, and a desired QoE level is ensured when predicting future silence periods in which the device does not need to remain in a constant ON state. In future work we will focus on two-way conversation with a four-state model to characterise the voice patterns, as well as on the interaction and integration of speech/voice into IoT applications. In addition, traffic-aware DRX schemes for machine-type communications will be proposed to improve energy efficiency.