QoE-centric interference-aware Stackelberg game for cache delegation relay in wireless multimedia communications

Relaying multimedia with decent Quality of Experience (QoE) is paramount for future wireless networks. In this paper, the problem of multimedia cache relay delegation with device-to-device (D2D) communication interference is studied. A game-theoretic cached multimedia relay solution is proposed that considers the mutual interference between the downlink relay and the succeeding uplink communications. The interference between D2D and uplink is modelled in the game competition. In the proposed solution, the base station (BS) and relay device (RD) are treated as two rational game players, where the BS acts as the leader and the RD acts as the follower. The utilities of the BS and the RD are modelled using the expected QoE of the user equipment (UE). Analytical modelling shows that the system achieves the desirable state by reaching the Stackelberg game equilibrium between the BS and the RD. The models show that the RD gets positive incentives for relaying, the UE experiences better QoE through D2D communication, and the BS achieves a higher pay-off. Several cases are analysed quantitatively, with numerical discussion, to determine whether the Nash Equilibrium exists. Various relay-to-uplink ratios, amounts of relay traffic, transmission distances between relay and uplink, and numbers of interfering uplinks are also studied.


INTRODUCTION
Due to the tremendous growth of mobile multimedia data traffic and the adoption of extremely high frequency spectrum in future generations of wireless networks [1], leveraging cached data content on neighboring devices to combat severe path loss and non-line-of-sight communication is necessary to improve Quality of Experience (QoE) [2][3][4][5]. In addition, the huge number of heterogeneous wireless devices, such as smart phones and wearable virtual and augmented reality devices, results in exponential growth of diverse mobile multimedia traffic, forcing us to rethink the design of wireless resource allocation to support the co-existence of new applications such as mobile reality, virtual reality, mixed reality and self-driving vehicles [6]. Furthermore, current lower layer resource allocation technologies are already approaching the upper bound of Shannon capacity, and thus it is necessary to find a new frontier, such as pricing, to alleviate the burden of communication [7][8][9]. The mm-wave and visible light communications are both promising choices in wireless networks for providing large bandwidth and high data rates. Given their short-distance line-of-sight nature, relay participants become a must in future wireless networks [10]. In this work, we assume the relay device (RD) utilises a D2D transmission scheme in the same mm-wave frequency band to help the BS serve UEs with better QoE. In a typical communication scenario, the UE acquires multimedia capacity from the BS via the uplink channel, and the BS serves the UE through the downlink channel directly.
In the communication scenario discussed in this paper, we assume the RD in the vicinity can help the BS by providing the multimedia content relay service to the UE with its pre-cached data [11]. We call the data link between the RD and the UE the D2D link, and it is centrally assigned by the BS. Benefiting from the mm-wave communication, the D2D link that operates over shorter distances with higher speed will provide better QoE and save communication energy. On the other hand, the "high level mutual interference" between the D2D relay link and the succeeding uplink cannot be ignored in practical communication scenarios, especially when the BS needs to offload traffic. The interference becomes more severe with a higher density of devices, and thus eventually affects the QoE of UEs in the wireless network.

FIGURE 1 Illustration of cached multimedia relay with mutual interference. The BS transmits data to the relay device in the first time frame (without interference), but interference occurs between the D2D link (relay to UE-A) and the uplink (UE-B to BS) in the second time frame.
As shown in Figure 1, in this paper multiple uplinks and a single D2D link are modelled as the mutual interference links. Using the case shown in Figure 1 as an example, UEs such as UE-B, UE-C and UE-D have interference with the D2D link that cannot be ignored. Three participants (BS, RD and UE) are considered in the proposed system with the following roles, respectively. The BS is not only improving the QoE of UEs and offloading data traffic, but also seeking high economic revenues for itself from the data service it provides. The RD gets rewards or profits from BS when it provides cached multimedia content relaying service to UE, in order to compensate for the cost of its resource such as energy consumption and channel utilisation time etc. The UE in the wireless network only cares about the multimedia QoE that it experiences from the transmissions.
Numerous contributions have been made in the literature on power control and resource allocation [12][13][14]. The authors in [12] proposed a novel architecture to support 5G spectrum management in order to improve the QoE of users in next generation wireless networks. The dynamic spectrum assignment scheme for small cells was designed to eliminate the communication interference problem in spectrum sharing. In [14], the authors provided an adaptive multimedia delivery approach in multi-rate enabled IoT to improve transmission security and imaging service quality. Using game theory to solve resource allocation problems in multiplayer situations has drawn significant attention in wireless resource allocation research over the last decade [15][16][17]. Based on solid theoretical foundations, game theory acts as a strong mathematical tool for analysing the interaction among rational entities [15]. The authors in [16] used a Stackelberg game to model the interaction between the service provider, subscribers and secondary buyers. It turned out that, at the equilibrium state of the game, the service provider would obtain optimal cost and the end users (i.e. the subscribers and secondary buyers) would simultaneously gain high QoE. A content-aware QoE-sensitive pricing game was studied in [17], where all players gain better utilities by reaching the equilibrium state of the Stackelberg game. A tutorial on the application of game theory to signal processing in wireless networks was given in [18], which also described typical strategic-form and coalition-form games in detail. Game-theoretic techniques have various advantages in solving the selfish incentive problems in modern wireless networks [19,20]. Recent work [29] proposed an incentive-based Stackelberg game allowing the relay devices to determine the price of power allocated to them by the source device. The Stackelberg equilibrium was derived and proved with closed-form equations for the optimal power allocation. The research in [30] modelled the Stackelberg game such that the eNB paid incentives to D2D users to relay the data among its best neighbors. The base station played the leader role to minimise its cost, and the D2D users were the followers maximising their utility. Other recent related works on Stackelberg games and D2D relay can be found in [31] and [32].
Unlike most of the past research in the literature, the mutual interference between the D2D relay link and the succeeding regular uplink is considered in this paper when the mm-wave frequency band is used. From the economics perspective, the BS is interested in knowing how much power it should request from the delegated RD to satisfy the QoE requirement of UEs, in order to offload its traffic burden. In response, the RD should set a proper power cost for providing the QoE-assured multimedia relaying service. If the BS purchases excessive relay power, that relay power becomes interference to its own uplink communication in the next time slot; if the BS purchases too little relay power, the relay power might not be enough to provide decent QoE to the UE. Due to the shorter distance between the RD and the UE, the relay link is likely to provide a higher data rate, lower bit and packet error rates, and reduced energy consumption. To address the utility maximisation problem with mutual interference in delegating the cached multimedia through wireless relay communications, we model the interaction between the BS and the RD as a two-stage Stackelberg game. The objective of this game is to obtain the desirable cost and amount of power to allocate, at an equilibrium state that both players have incentives to follow. More specifically, the BS as the leader determines the amount of power to request from the RD, which directly impacts the follower RD's response with a cost per unit of power in the non-cooperative game. The equilibrium state is achieved such that both the BS and the RD satisfy their objectives of optimising their own profits and utilities. It is worth noting that, even though the role of the RD is to act as a delegate of the BS in completing the downlink multimedia service, the relationship between the BS and the RD tends to be non-cooperative due to their different objectives. Most of the time, their objectives could be competitive or even contradictory. We assume the BS and the RD are both selfish and rational when providing or relaying multimedia service in wireless networks. How to handle the purchasing of the power resource and how to set the proper cost between these two rational participants becomes an essential issue in the proposed system.

The rest of this paper is organised as follows. The system model is introduced in Section 2, where the utility models of the BS and the RD are jointly considered in detail, incorporating power, relay and interference. We then design the Stackelberg game-theoretic model between the BS and the RD, and discuss and analyse the possibility of equilibrium of the game in Section 3. In Section 4, we discuss the game equilibrium conditions based on the QoE service quality requirements of the UE, the multimedia traffic characteristics, and the cost set by the RD. The simulation results that demonstrate the effectiveness and efficiency of the proposed communication scheme are presented in Section 5. Finally, we draw conclusions in Section 6.

TABLE 1 Key notations
h: The wireless channel gain or channel loss.
The wireless channel noise.
c: The cost per unit of power.
d: Physical distance between the transmitter and receiver.
λ: Wavelength of the carrier signal.
L: The total length of data packets, in bits.
The key notations and nomenclature in this paper are summarised in Table 1.
The novelty of this paper is the investigation of a new problem: the interference between the D2D link in the second time slot and the regular uplink. The BS needs to request a "moderate" D2D relay power rather than the maximum amount of power. A further novelty is the identification of the conditions under which the Nash Equilibrium exists. Several conditions are discussed, and the whole of Section 4 is devoted to elaborating when the Nash Equilibrium exists.

Utility modelling of base station
The mutual interference between the D2D relay link and the regular uplink is considered in our model, as shown in Figure 1. We assume the objective of the BS is to ensure that all UEs in its service range obtain relatively high QoE, either through the direct links or through the delegated relay. Therefore, the BS has to ensure that the purchased power in the D2D relay link will not significantly interfere with other UE-BS uplinks. Although the D2D relaying service plays a vital role in guaranteeing the QoE of downlink UEs, excessive mutual interference could deteriorate the quality of the channel. Let $w$ denote the equivalent revenue per unit of multimedia service quality, which is typically a predefined system parameter. Let $Q_{ua}$ denote the overall multimedia service quality from all links. Then the utility of the BS can be modelled as the difference between its revenue and cost:
$$U_{BS} = w\,Q_{ua} - c\,P_R, \tag{1}$$
where $c$ is the cost per unit of transmitting power and $P_R$ is the amount of power that the BS purchases from the RD. The cost of the BS is the product of the unit cost and the transmitting power purchased from the RD. The multimedia service quality of each link is calculated as the summed distortion reduction of all packets multiplied by the data rate of the channel, representing both the application quality and the network throughput of the wireless system. The overall data service quality of all links, $Q_{ua}$, is given in Equation (2), following [28], where $M$ is the number of UEs in the vicinity that could possibly have mutual interference with the D2D relay link. Furthermore, we let $D_i$ denote the cumulative distortion reduction of the transmitted data packets in the $i$th link, and $D_R$ denote the cumulative distortion reduction of all packets in the D2D relay link.
The term $Q_{ua}$ has two parts: the first part includes $D_R$ and the data rate, which reflects both the multimedia service quality and the communication throughput of the D2D relay link; the second part is the sum of the multimedia service quality of all other uplink UEs, representing the uplink communications in the succeeding time frame. Here we calculate the data rate as the Shannon capacity [21], where $B$ is the channel bandwidth, $h$ denotes the high level mutual interference channel gain or loss between the RD and other UEs, $h_{RU}$ denotes the interference channel gain between the RD and a specific user UE-A, and $h_i$ is the channel gain or loss between the $i$th UE and the BS. In a way similar to [22], the mutual interference channel gain or loss is modelled as
$$h = k_0\, d^{-T}, \tag{3}$$
where $d$ is the physical distance between the transmitter and receiver, and $T$ denotes the path-loss exponent. The term $k_0 = (\lambda / 4\pi)^2$ is a constant coefficient and $\lambda$ is the carrier signal wavelength. The flat-top antenna model is considered in our work, meaning the antenna gain is constant. Similar to the antenna model in [23], if a device is located within the beam width it receives the power and interference from the transmitter; otherwise there is no power leakage.
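As a minimal numerical sketch of the link model described above, the following Python computes a path-loss channel gain of the form $k_0 d^{-T}$ and a Shannon data rate with interference added to the noise floor. The function names, argument names and all example numbers (60 GHz carrier, distances, powers) are illustrative assumptions and are not taken from the paper.

```python
import math


def path_loss_gain(distance_m, wavelength_m, path_loss_exp):
    """Channel gain h = k0 * d**(-T) with k0 = (wavelength / (4*pi))**2."""
    k0 = (wavelength_m / (4.0 * math.pi)) ** 2
    return k0 * distance_m ** (-path_loss_exp)


def shannon_rate(bandwidth_hz, signal_w, noise_w, interference_w=0.0):
    """Shannon capacity B * log2(1 + SINR) in bit/s."""
    sinr = signal_w / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical 60 GHz carrier: wavelength = 3e8 / 60e9 = 5 mm.
WAVELENGTH_M = 3.0e8 / 60.0e9
h_relay = path_loss_gain(distance_m=10.0, wavelength_m=WAVELENGTH_M, path_loss_exp=2.0)
h_cross = path_loss_gain(distance_m=50.0, wavelength_m=WAVELENGTH_M, path_loss_exp=2.0)

# The received relay signal competes with noise plus uplink interference (assumed values).
rate_clean = shannon_rate(1.0e9, signal_w=0.1 * h_relay, noise_w=1.0e-10)
rate_interfered = shannon_rate(1.0e9, 0.1 * h_relay, 1.0e-10, interference_w=0.1 * h_cross)
```

With these assumed numbers the interfered rate is lower than the interference-free rate, which is the effect the mutual interference term is meant to capture.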
In the game-theoretic multimedia cache relay delegation service model, the BS pays the RD for delegating or relaying its downlink service data. The payment from the BS is the incentive for the RD to ensure multimedia service quality in the D2D link, in order to satisfy the UE's QoE requirement. The cumulative distortion reduction of transmitted packets ($D_i$) represents the total multimedia data quality of all packets that the receiver obtains from the wireless link. It is approximated as the sum of the distortion reduction of each individual multimedia packet, multiplied by the probability that the packet is successfully transmitted and decoded with regard to the encoding dependency inherited from the source coding codec, in a way similar to [24] (Equation (4)), where $Q_i$ is the distortion reduction of the $i$th packet, $L_i$ denotes the length of the $i$th packet, and $e$ denotes the bit error rate of the physical channel. The total data length $L$ is the sum of the packet lengths $L_i$. The dependency set in Equation (4) represents the $i$th packet's dependency set with respect to multimedia source compression. In this paper, the terms frame and packet are used interchangeably, and we consider the three major frame types (I, P and B) that are widely used in different video compression algorithms [25]. Figure 2 illustrates the dependency within one group of pictures (GOP) in a typical multimedia flow. Here, the I frames are the least compressible and do not require other video frames to decode. The P frames use data from previous frames to decompress and are more compressible than I frames. The B frames can use both the previous and the succeeding frames for motion reference to achieve the highest data compression ratio. This research is cross-layer in nature, tailoring the low layer communication design to the high layer multimedia traffic characteristics. The decoding dependency usually comes from the video codec at the application layer. Such information can be passed to lower layers in the communication protocol stack. At the link and physical layers, this knowledge is crucial for improving the QoE. The link level quality typically refers to the packet loss rate, while the application layer quality usually refers to the multimedia distortion reduction. However, these two qualities are strongly correlated. The application layer video is composed of consecutive frames, and each frame is further fragmented into packets for transmission across the wireless link. Packet loss at the link layer translates to video distortion at the application layer.
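The dependency-aware distortion reduction described above can be sketched as follows. This is a minimal illustration under an assumed form: each packet contributes its distortion reduction $Q_i$ weighted by the probability that it and every packet it depends on arrive without bit errors, using $(1 - e)^{L_i}$ as the per-packet success probability. The exact expression in Equation (4) may differ; the data layout, names and GOP values are hypothetical.

```python
def success_probability(length_bits, bit_error_rate):
    """Probability that all bits of a packet arrive without error."""
    return (1.0 - bit_error_rate) ** length_bits


def expected_distortion_reduction(packets, bit_error_rate):
    """Sum of Q_i weighted by the chance that packet i and its ancestors decode.

    packets maps a name to (distortion_reduction, length_bits, ancestors),
    where ancestors lists every packet that must be decoded first.
    """
    total = 0.0
    for quality, length_bits, ancestors in packets.values():
        prob = success_probability(length_bits, bit_error_rate)
        for name in ancestors:
            prob *= success_probability(packets[name][1], bit_error_rate)
        total += quality * prob
    return total


# Hypothetical three-frame GOP: P depends on I, B depends on I and P.
GOP = {
    "I": (10.0, 8000, []),
    "P": (6.0, 4000, ["I"]),
    "B": (3.0, 2000, ["I", "P"]),
}
```

On an error-free channel the result is simply the sum of all $Q_i$; as the bit error rate grows, frames deep in the dependency chain lose value fastest because every ancestor must also be received.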
From the utility definition of the BS in Equations (1) and (2), we know that two major factors can significantly affect the BS's utility and the system QoE performance: the purchased transmitting power of the RD ($P_R$) and the economic cost of this power. The task of satisfying the UE's QoE requirement is addressed by improving the utility of the BS, that is, by maximising $U_{BS}$ with respect to $P_R$. Thus, our objective narrows down to two players in the relay resource allocation game, considering the equilibrium between the BS and the RD. It is worth noting that we investigate a new problem: the D2D relay in the second time slot may interfere with the regular uplink communication, and thus the BS should request a "moderate" transmission power for the D2D relay rather than the highest possible power. Therefore it is reasonable that only one D2D relay link is considered for each BS request to delegate traffic to a relay. Of course, this does not prevent the existence of other D2D relay links.
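To make the trade-off behind the "moderate" power request concrete, the sketch below evaluates the BS utility, revenue $w Q_{ua}$ minus cost $c P_R$, under an assumed form of the service quality: a relay term plus a sum of uplink terms, each a distortion weight times a Shannon rate, with the relay power appearing as interference in every uplink term. The precise form of Equation (2) is not reproduced here, and all names and normalised parameter values are hypothetical.

```python
import math


def service_quality(p_relay, d_relay, h_relay, uplinks, h_cross, bandwidth, noise):
    """Assumed form of Q_ua: relay-link quality plus the sum over M uplinks.

    uplinks is a list of (D_i, P_i, h_i) tuples; h_cross is the mutual
    interference gain between the RD and the uplink UEs.
    """
    relay_interference = sum(p_i * h_cross for _, p_i, _ in uplinks)
    relay_sinr = p_relay * h_relay / (noise + relay_interference)
    quality = d_relay * bandwidth * math.log2(1.0 + relay_sinr)
    for d_i, p_i, h_i in uplinks:
        uplink_sinr = p_i * h_i / (noise + p_relay * h_cross)
        quality += d_i * bandwidth * math.log2(1.0 + uplink_sinr)
    return quality


def bs_utility(p_relay, cost, revenue_w, **link):
    """U_BS = w * Q_ua - c * P_R."""
    return revenue_w * service_quality(p_relay, **link) - cost * p_relay


# Normalised, hypothetical link parameters with one interfering uplink (M = 1).
LINK = dict(d_relay=1.0, h_relay=2.0, uplinks=[(1.0, 1.0, 1.0)],
            h_cross=0.1, bandwidth=1.0, noise=1.0)
```

Raising `p_relay` increases the relay term but lowers every uplink term, so under this assumed model the utility is not maximised by simply buying the largest available power.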

Utility modelling of relay device
The role of the RD is relaying (retransmitting) multimedia data between a certain UE and the BS. We assume the utility gain of the RD comes exclusively from the BS (i.e. the RD does not charge the UE additional money for downlink or relay data service). Thus, the utility function of the RD can be defined as the difference between its reward and its communication energy consumption:
$$U_R = c\,P_R - \frac{L}{R}\,P_R, \tag{6}$$
where $L$ is the total data length and $R$ denotes the average data rate during the relay service period. The objective of the RD is to achieve a large revenue so as to maximise its utility, that is, to maximise $U_R$ through its choice of the cost $c$. It is necessary to set a proper cost and to request an appropriate amount of transmitting power in order to keep the utility of the RD positive, that is, $c > L/R$. We formulate the above problem as a two-stage Stackelberg game by considering the utility maximisation of the RD and the BS. In the first stage, the leader BS proposes to request a certain amount of relay power based on the UE's QoE requirement. In the second stage, the follower RD sets the cost of the transmitting power based on the QoE requirement and the wireless channel condition. The solution is to find the best amount of power that the BS would like to acquire and the best response cost that the RD sets, so that the system reaches the equilibrium state of desirable QoE that satisfies the UE. Thus the best possible utility of both the BS and the RD can be attained.
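The RD utility and its break-even condition can be written in a few lines. This is a minimal sketch of reward minus transmission energy, where the energy is the relay power multiplied by the airtime $L/R$; the function names are hypothetical.

```python
def relay_airtime(data_bits, rate_bps):
    """Time L / R needed to relay L bits at average rate R."""
    return data_bits / rate_bps


def rd_utility(cost, p_relay, data_bits, rate_bps):
    """U_R = c * P_R - (L / R) * P_R: reward minus transmission energy."""
    return cost * p_relay - relay_airtime(data_bits, rate_bps) * p_relay


def break_even_cost(data_bits, rate_bps):
    """Unit cost at which the RD neither gains nor loses utility (c = L / R)."""
    return relay_airtime(data_bits, rate_bps)
```

For any positive relay power, the utility is positive exactly when the cost exceeds `break_even_cost`, which matches the condition $c > L/R$ stated in the text.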
In fact, the model is dynamic in the long term and static in the short term. The Stackelberg game is a multi-stage game. In each round, the BS as the leader determines the amount of power to request from the RD, which directly impacts the follower RD's response with a cost per unit of power in the non-cooperative game. There can be many rounds of such a leader-follower game, and thus in the long run it is dynamic. Within each round of the game the input factors are static, but they can change between rounds.

Stackelberg game equilibrium analysis
Game theory is chosen as the primary solution to this problem because it is a natural fit for such a D2D cache relay content delegation problem. In the first time slot, the BS broadcasts the multimedia packets to all the mobile users. Some mobile users receive these packets successfully, but others may lose or miss certain packets. Since the packets could have been cached by some mobile users, these users could serve as relays in the second time slot. As the D2D relay consumes communication resources and energy from the relay users, some incentives could be given by the BS. In addition, the BS is relieved of retransmitting the same packets in the second time slot to those mobile users who missed them in the first time slot. This could be a win-win situation, formulated as a Stackelberg game, if mutual benefit can be attained. In certain cases the Nash Equilibrium of the Stackelberg game clearly exists, which can be proved mathematically. In such situations the computational complexity of the game-theoretic solution is fairly practical for resource constrained mobile devices. In other situations the Nash Equilibrium may be hard to find or hard to prove, and the computational complexity of finding a reasonable solution may be high. The next section is dedicated to the discussion of the existence of the Nash Equilibrium.
After the UE sets the quality requirement of the multimedia service for each transmission, the BS determines how much power $P_R$ it requests from the RD so as to maximise its own utility in terms of both multimedia quality and revenue. With the $P_R$ given by the BS, the RD determines the cost $c$ with the target of maximising its utility $U_R$. Therefore, it is a typical two-stage leader-follower game, which can be addressed using the Stackelberg game framework. More specifically, the BS, as the leader of the game, optimises its utility based on knowledge of the impact of its decision on the behaviour of the follower RD. The BS and the RD reach their best possible utilities, and neither wants to deviate from the Stackelberg game equilibrium state. To derive the Stackelberg game equilibrium, we solve the utility optimisation problems for the BS and the RD, respectively.
When the power purchasing cost $c$ is fixed and the purchased relay power $P_R \in [P_{min}, P_{max}]$, the utility function of the BS can be concave for certain choices of parameters. The concavity of $U_{BS}$ is studied in the next section, where six cases with different source coding distortion reduction ratios and various scenarios are discussed in detail. The concavity of the BS's utility and the existence of equilibrium values of $P_R^*$ and $c$ are determined by the actual service setup. The first order derivative of the BS' utility function with respect to $P_R$ is:
$$\frac{\partial U_{BS}}{\partial P_R} = w\,\frac{\partial Q_{ua}}{\partial P_R} - c. \tag{8}$$
The optimal $P_R^*$ is determined by setting the first order derivative of the utility function to zero, $\partial U_{BS} / \partial P_R = 0$. The optimal cost of the power $c^*$ is obtained in a similar way. From $\partial U_{BS} / \partial P_R = 0$, we know that the optimal purchased power has a fixed relationship with the power cost, which can be expressed as a complex function $P_R^*(c)$. For each given cost $c$, the best possible purchasing power can be obtained easily using a global search method or the Newton method.
Furthermore, we can rewrite the relationship between the cost $c^*$ and the optimal purchasing power $P_R^*$, based on $P_R^*(c^*)$, as:
$$c^* = w \left. \frac{\partial Q_{ua}}{\partial P_R} \right|_{P_R = P_R^*}. \tag{9}$$
Since the purchased transmitting power satisfies the min-max power constraint $P_{min} \le P_R^* \le P_{max}$, we can obtain the viable cost interval $[c_{min}, c_{max}]$ accordingly.
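The mapping from a requested power to the cost that makes it a stationary point of the BS utility, and the resulting cost interval, can be sketched numerically. The example assumes a single-uplink service quality with unit bandwidth, unit noise by default and unit distortion weights; the symbols `a`, `s` and `h` and all numeric values are illustrative assumptions rather than the paper's parameters.

```python
import math

LN2 = math.log(2.0)


def quality(p, a, s, h, noise=1.0):
    """Assumed single-uplink Q_ua with unit bandwidth and unit distortion weights.

    a: relay-link gain over its noise-plus-interference floor,
    s: received uplink signal power, h: RD-to-uplink interference gain.
    """
    return math.log2(1.0 + a * p) + math.log2(1.0 + s / (noise + h * p))


def quality_slope(p, a, s, h, noise=1.0):
    """Analytic dQ_ua/dP_R for the assumed form above."""
    relay_part = a / (1.0 + a * p)
    uplink_part = -s * h / ((noise + h * p) * (noise + h * p + s))
    return (relay_part + uplink_part) / LN2


def stationary_cost(p, revenue_w, a, s, h):
    """Cost c = w * dQ_ua/dP_R that makes p a stationary point of U_BS."""
    return revenue_w * quality_slope(p, a, s, h)


def cost_interval(p_min, p_max, revenue_w, a, s, h, samples=101):
    """Range of stationary costs over [p_min, p_max], found by sampling."""
    step = (p_max - p_min) / (samples - 1)
    costs = [stationary_cost(p_min + i * step, revenue_w, a, s, h) for i in range(samples)]
    return min(costs), max(costs)
```

For example, `cost_interval(0.1, 2.0, 10.0, 2.0, 1.0, 0.1)` returns the lowest and highest cost at which some power in that range is a stationary point. Because the slope of the assumed quality function decreases with power over this range, the highest viable cost corresponds to the smallest power.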
Recall that we have to ensure the RD's utility is positive as long as $P_R^* \ge 0$. Since we assume the RD is rational and selfish, $U_R$ must be positive when it provides the cached content delegation relay service, which means $P_R^* \ge 0$. The remaining problem is to obtain the equilibrium power cost $c^*$ that maximises $U_R$. Based on the previous analysis, it is complicated to solve the maximisation problem of $U_R(c^*)$ (here we consider $U_R$ as a function of $c^*$). In fact, $P_R^*(c^*)$ is an implicit function of $c^*$ without a closed-form expression, and it is intricate to obtain the root of $\partial U_{BS} / \partial P_R = 0$. However, we can find it if we treat the RD's utility as a function of $P_R^*$. By substituting Equation (9) into Equation (6), we simplify the problem to a one-dimensional optimisation of $U_R = U_R(P_R^*)$. More specifically, the objective becomes maximising $U_R(P_R^*)$, where
$$U_R(P_R^*) = \left( w \left. \frac{\partial Q_{ua}}{\partial P_R} \right|_{P_R = P_R^*} - \frac{L}{R} \right) P_R^*. \tag{10}$$
In order to find the solution, we introduce the following lemmas in a way similar to [26]. First, a real function which is differentiable must be a continuous function. Second, a continuous real function on a closed interval must attain a maximum value and a minimum value. Therefore we would like to validate whether the utility of the RD has a maximum value on the closed interval $[P_{min}, P_{max}]$. To validate it, we take the first order derivative of $U_R(P_R^*)$ with respect to $P_R^*$. Utilising the introduced lemmas, we can see that the utility function $U_R$ has its maximum value in the interval $[P_{min}, P_{max}]$. Let $P_R^{opt}$ denote the optimal transmitting power in this interval; we obtain the optimal power cost $c^{opt}$ according to Equation (9). We now solve the utility maximisation problem, that is, $\arg\max\{U_{BS}, U_R\}$, by finding the Stackelberg game equilibrium state $\{P_R^{opt}, c^{opt}\}$.

ALGORITHM 1
Step 1: Define Inputs: the consumed data frame length $L_i$; the distortion reduction $Q_i$ of each frame; the fixed revenue coefficient $w$; the wireless communication parameters $\lambda$, $d$, $T$, $B_i$ and $P_i$; the number of interference users $M$; the total consumed data packets $N$; the maximum number of iteration steps $K$.
Step 2: Define Outputs: the equilibrium state of power and cost $\{P_R^{opt}, c^{opt}\}$; the utilities $U_{BS}$ and $U_R$ at the equilibrium state.
Part 2: Stackelberg Game Equilibrium Search:
Step 3: Let p = linspace[$P_{min}$, $P_{max}$, $K$];
Step 4:

Utility maximisation algorithm
Algorithm 1 shows the process of finding the Stackelberg game equilibrium using a lightweight global search method. The computational complexity of this algorithm is mainly determined by the number of users with mutual interference $M$, the number of multimedia packets $N$, and the range of the power interval $[P_{min}, P_{max}]$, which in turn determines the maximum number of iteration steps. Since the gap between $P_{min}$ and $P_{max}$ is small, we reach $P_R^*$ in a few steps under fixed $M$ and $N$. The algorithm converges much faster when the Jacobian can be computed, which is a prerequisite for using the Newton method [27]. As shown in Figure 3, when we apply the Newton method in the case of one uplink interference source ($M = 1$), only a couple of iterations are needed to reach the stationary point where the first derivative equals zero. The utility function meets the following two conditions: first, a real function which is differentiable must be a continuous function; second, a continuous real function on a closed interval must attain a maximum value and a minimum value. The computational overhead of the proposed game-theoretic approach is reasonable for D2D power requests. When the Nash Equilibrium exists, the computational complexity is almost linear in the granularity of the power allocation. The requested D2D power increases in steps from low to high, and at each step the optimal cost is calculated from the power value by setting the first order derivative of the utility function to zero. Therefore the computational cost is manageable when the Stackelberg game-theoretic model is properly formulated.
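The stepwise search described above can be sketched as a plain grid scan: for every candidate power, compute the cost from the first-order condition of the BS utility, evaluate the RD utility at that cost, and keep the best point. This is a sketch under stated assumptions, not the paper's Algorithm 1: the single-uplink quality model, the choice to select the grid point by maximum RD utility, and every parameter value are illustrative, and the helper names are hypothetical.

```python
import math

LN2 = math.log(2.0)


def linspace(lo, hi, count):
    """count evenly spaced samples from lo to hi inclusive."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def relay_rate(p, a):
    """Assumed relay data rate R with unit bandwidth."""
    return math.log2(1.0 + a * p)


def quality(p, a, s, h):
    """Assumed single-uplink Q_ua with unit noise and unit distortion weights."""
    return relay_rate(p, a) + math.log2(1.0 + s / (1.0 + h * p))


def quality_slope(p, a, s, h):
    """Analytic dQ_ua/dP_R for the assumed form above."""
    uplink = s * h / ((1.0 + h * p) * (1.0 + h * p + s))
    return (a / (1.0 + a * p) - uplink) / LN2


def rd_utility(p, revenue_w, data_bits, a, s, h):
    """U_R(P_R) after substituting the stationary cost c = w * dQ_ua/dP_R."""
    cost = revenue_w * quality_slope(p, a, s, h)
    return (cost - data_bits / relay_rate(p, a)) * p


def equilibrium_search(p_min, p_max, steps, revenue_w, data_bits, a, s, h):
    """Scan the power grid and keep the point with the largest RD utility."""
    # p_min must be positive so that the relay rate is non-zero.
    grid = linspace(p_min, p_max, steps)
    p_opt = max(grid, key=lambda p: rd_utility(p, revenue_w, data_bits, a, s, h))
    c_opt = revenue_w * quality_slope(p_opt, a, s, h)
    return {
        "power": p_opt,
        "cost": c_opt,
        "u_bs": revenue_w * quality(p_opt, a, s, h) - c_opt * p_opt,
        "u_rd": rd_utility(p_opt, revenue_w, data_bits, a, s, h),
    }


# Hypothetical normalised parameters.
result = equilibrium_search(0.1, 4.0, 40, revenue_w=10.0, data_bits=5.0, a=2.0, s=1.0, h=0.1)
```

The cost of the scan grows linearly with the number of grid points, which is consistent with the near-linear complexity noted in the text. With the assumed parameters the selected power lies strictly inside the interval, so neither the smallest nor the largest power is the best request.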

DOES THE EQUILIBRIUM EXIST? CASE DISCUSSION OF DISTORTION AND PRICES
The purpose of the proposed strategy is to enhance the QoE of the UE and to improve the utilities of the BS and the RD by investigating the potential interference between the delegated relay and the succeeding uplink traffic. To explore the relationship among the power purchasing cost of the BS, the transmission power of the RD and the distortion reduction gain on the D2D relay link, we discuss six diverse scenarios and two practical activities as follows. We also evaluate the concavity of the BS' utility function by means of mathematical methods and graphical analysis in this section. The rationale behind the scenario discussion can be interpreted as the business transactions among three players. Activity 1: if the multimedia data quality (i.e. distortion reduction) that the UE experiences from the D2D relay link is poorer than what it can experience from the BS directly, the BS would not delegate to any relaying service and there is no economic transaction between the BS and the RD. Activity 2: if the cost of relay power is too high, it affects the BS' utility and the game equilibrium state. In this case, the transaction between the BS and the RD may not take place.
All factors such as transmitting power, relay delegation power cost and the gain of distortion reduction should be analysed quantitatively, since they affect the existence of the equilibrium in the game. Based on the first order derivative of $U_{BS}$, we continue by taking the second order derivative with respect to $P_R$. The discussion starts from a simple case. First we set the number of uplinks to $M = 1$, which means there is only one cached data relay delegation link and a single uplink in the system having high level mutual interference. Based on the equations and analysis, the curves of the BS' utility, its first order derivative and its second order derivative can all be obtained. Based on the wireless system setup parameters shown in Table 2, we discuss the distortion reduction ratio between the delegated relay link and the succeeding regular uplink in six scenarios. Figure 4 shows the shape of the BS' utility function (black curve), its first order derivative (red curve) and its second order derivative (green curve). As we can see from the figure, the first and the second order derivatives hold the same pattern in all three scenarios (Scenarios 1, 2 and 3). In Scenario 1, the importance of the multimedia data carried via the D2D relay link is 7% of the importance of the multimedia data carried via the regular uplink. That is to say, the regular uplink carries much more important multimedia data than the D2D relay link. In Scenario 2 the ratio is 8%, and in Scenario 3 it is 10%. When the ratio is lower than 7%, the trends of the curves are the same as in Scenario 1; when the ratio is higher than 10%, the trends are the same as in Scenario 3. The utility function of the BS remains concave whatever the distortion reduction ratio is, but the optimal utility point (where the first order derivative equals zero) depends on the ratio. More specifically, when the ratio is less than 8%, the delegated relay power $P_R$ would need to be negative to obtain the optimal utility, which does not make sense in practice. In that case $P_R = 0$ is the best possible solution, indicating that no relay power is purchased. When the ratio is larger than 8%, there is one optimal $P_R$ (with regard to the preset cost) that maximises the BS' utility. The BS and the RD reach the subgame perfect Nash equilibrium state with the current cost $c$ and power $P_R$. This case can be interpreted as follows: when the QoE requirement of the UE is low, the BS should not pay the RD for the data relay delegation service. Otherwise, when the QoE requirement of the UE is high, the downlink wireless video service can be delegated to the RD and there is a Stackelberg game between the BS and the RD. The equilibrium is then the best possible solution between the BS and the RD.
In addition, as previously mentioned in Activity 2, the cost of delegated relay power also plays an important role in the equilibrium analysis. Figure 5 shows how the shapes of the BS' utility and its first and second order derivatives change with the cost $c$. The RD determines the cost $c$ according to the BS' power purchasing request $P_R$. In Scenario 4, the price is set at $c = 18$; in Scenario 5, the price is $c = 14.6$; in Scenario 6, the price is set at $c = 10$. The trends of the curves for any condition with $c > 14.6$ follow Scenario 4, and those for any condition with $c < 14.6$ follow Scenario 6. Scenario 5 shows the boundary situation. As illustrated in Figure 5, there is no positive power selection option for the BS when $c$ is greater than 14.6 (this value varies with the preset system parameters). This means the delegation relay power is too expensive: the BS would rather not request the relay service when the RD sets a very high relay power cost. The RD's decision should be rational and should also be constrained by the proposed system model to ensure the existence of a game equilibrium.
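The two thresholds discussed above, a minimum distortion reduction ratio and a maximum tolerable cost, follow from one observation: if the BS utility is concave, a positive optimal power exists only when the utility is increasing at zero relay power. The sketch below applies this to an assumed single-uplink model with unit bandwidth and a normalised uplink weight. It does not reproduce the 8% or 14.6 values of the paper, which depend on the parameters in Table 2; all names and values here are hypothetical.

```python
import math

LN2 = math.log(2.0)


def slope_at_zero(ratio, a, s, h, noise=1.0):
    """dQ_ua/dP_R at P_R = 0 for an assumed single-uplink model.

    ratio is the relay-to-uplink distortion reduction ratio D_R / D_uplink,
    with the uplink weight normalised to 1.
    """
    relay_gain = ratio * a
    uplink_loss = s * h / (noise * (noise + s))
    return (relay_gain - uplink_loss) / LN2


def relay_is_purchased(cost, ratio, revenue_w, a, s, h):
    """For a concave U_BS, a positive optimum exists only if U_BS rises at zero."""
    return revenue_w * slope_at_zero(ratio, a, s, h) - cost > 0.0


def cost_ceiling(ratio, revenue_w, a, s, h):
    """Largest cost the BS tolerates before choosing P_R = 0."""
    return revenue_w * slope_at_zero(ratio, a, s, h)


def ratio_floor(cost, revenue_w, a, s, h, noise=1.0):
    """Smallest distortion reduction ratio for which relaying pays off."""
    return (cost * LN2 / revenue_w + s * h / (noise * (noise + s))) / a
```

The two functions are inverses of each other: `cost_ceiling` evaluated at `ratio_floor(cost, ...)` returns the same `cost`, mirroring the boundary role that Scenario 5 plays in the cost analysis.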
Having shown the concavity of the BS' utility function in these scenarios for the simple case where only one uplink has high-level mutual interference with the D2D relay link, we now consider the more general situation with M ≥ 2. When many uplinks have high-level mutual interference with the D2D relay link, the utility of the BS has the same concavity property as in the simple case where M = 1.
The concavity property of the BS' utility function can be obtained by deriving the first and second order derivatives with respect to the power P R . The terms  (2), since the first order derivative of c * P R is a constant and the second order derivative of c * P R is zero.  In addition, we assume that the high-level mutual interference caused by each newly added uplink UE does not differ much from that caused by the other UEs, because all UEs use similar power to transmit and receive data over similar distances. Hence we can rewrite the multimedia service quality Equation (2) in the following approximate form: where f and g are log functions of the variable P R as shown in Equation (2): When an additional UE is added to the communication system, the high-level mutual interference caused by the new UE slightly changes the denominator terms in f and g, and M increments by 1. This only affects the value of the optimal point of the equilibrium state; the concavity property of the BS' utility function does not change.
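The observation that c * P R has a constant first derivative and a zero second derivative can be checked numerically. The sketch below is only an illustration under assumptions: the quality term log(1 + G P) is a hypothetical stand-in for the log functions of Equation (2), and G, C and the function names are not from the paper. It estimates the second derivative by a central finite difference and shows that subtracting the linear cost leaves the curvature unchanged.

```python
import math

def second_difference(func, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

G = 2.0  # hypothetical channel gain
C = 3.0  # hypothetical unit cost of relay power

def quality(p):
    """Toy stand-in for a log-type service quality term."""
    return math.log(1.0 + G * p)

def utility(p):
    """Toy utility: quality minus the linear power cost C * p."""
    return quality(p) - C * p

# The linear cost contributes nothing to the curvature, so both estimates agree.
curv_quality = second_difference(quality, 1.0)
curv_utility = second_difference(utility, 1.0)
```

Both estimates are negative and agree up to numerical error, so in this toy setting the sign of the second derivative is decided by the quality term alone.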

SIMULATIONS AND RESULTS
In this section, we perform simulations to evaluate the system performance and unveil the impact of system characteristics on the optimal transmission scheme. The "Foreman" standard video sequence at 30 fps is used as the input video data. Other system parameters used in the simulation experiments are listed in Table 2. We first evaluate the impact of different values of data quality ratio, packet size and distance on the utility performance in the system with one interfering uplink. The simulation results are shown in Figures 6-8. We then explore system performance in complex cases (i.e. scenarios with multiple sources of interference). There is a distortion reduction ratio threshold between the D2D relay link and the regular uplink to ensure the game equilibrium. The major comparative study in this paper is between the scenario with D2D interference and the scenario without D2D interference. We are especially interested in the situation in which the D2D relay in the second time slot uses the same frequency as the regular uplink traffic to the BS. The advantage of this approach is that it saves a resource block in the second time slot, which is shared by the D2D relay and the regular uplink. Of course, this D2D link causes interference to the regular uplink traffic back to the BS. Since the BS is the initiator that requests the D2D user to help with its relay, the performance in terms of interference impact deserves investigation.
Based on the simple case with one D2D relay link and a single regular uplink, we explore the utility performance of the BS in the interference and non-interference scenarios. The results are shown in Figure 6. We obtain the optimal P R and c according to the proposed algorithm and use them to calculate the utility of the BS in the interference case. For the non-interference scenario, the interference part (i.e. P i h i ) is ignored when computing the data service quality. As depicted in Figure 6, the utility of the BS without interference is higher than the utility in the interference case. Furthermore, it increases exponentially when the distortion reduction ratio grows from 10% to 50%, and then keeps growing linearly once the ratio is greater than 50%. This result implies that the BS would like to encourage the UE to require higher media service quality from the beginning, since both the UE and the BS benefit more in the system with the higher service requirement (ratio more than 50%).
Figure: The utility gain of the BS and the relay with different quantities of data
Next, based on the utility models of the BS and the RD, we numerically show how the utilities change with the packet size. As illustrated in Figure 7, the utility of the BS increases when the UE handles larger packets, whereas the utility of the RD decreases according to the values from Equation (5). In particular, when a mobile device acts as the RD in the relay service, the service time is strictly limited by its power capacity (eventually determined by both battery life and residual energy). The proposed scheme ensures that the data relay service can continue as long as U R > 0 and U BS increases.
Besides the transmitting power, the distances (the distance between the UE and the BS, and the distance between the RD and the UE) are other factors that affect the system performance. Figure 8 illustrates the effects of different distances on the BS' utility gain. We study the following three cases in our simulation. Case 3: The transmitting distance of the regular uplink (UE-BS) is 1 m shorter than the distance of the D2D relay link.
As depicted in Figure 8, the data service quality decreases rapidly as the distance between transmitter and receiver increases. In the non-interference scenario, the maximum utilities of the BS at various distances have a significant impact on the equilibrium. In the interference case, by contrast, the data service quality of the D2D relay link and that of the regular uplink affect each other, and therefore the overall utility of the BS does not change much when the game equilibrium solution is chosen.
The above simulations consider only one uplink device in the system. We now consider multiple interfering uplinks in the system. Figure 9 shows how the data service quality changes with the increasing number of interference links. In the simulation, we study the data quality in the delegated D2D relay link and in the regular uplinks, as well as the overall service quality of the BS (as defined in Equation 2, where the bandwidth B equals the normalised unit '1'). In the non-interference case, where the x-axis value equals 0, the data quality almost reaches its maximum, but it decreases significantly when interference occurs. Multiple regular uplinks affect the data quality in the D2D relay link significantly, even if we choose the power and relay cost from the equilibrium state. As can be seen from this figure, the overall data service quality gradually decreases when two or more uplinks join the system. It is worth noting that besides interfering with the D2D relay link, the regular uplinks also interfere with each other. When there are more than four sources of interference, the quality of the relay service over the D2D link tends to be the same as the average quality over the regular uplinks. This represents the practical case in which, facing many sources of interference, the UE cannot experience high media quality through the RD's relay service. In other words, the BS would choose to serve the UE directly instead of assigning the D2D relay service when there are too many sources of interference.
Figure: Illustration of data service quality in the various sources of interference scenarios
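The degradation of the relay link with a growing number of interferers can be sketched as follows. This is an illustration under stated assumptions rather than the paper's simulation: the link quality is taken as log2(1 + SINR) with the bandwidth normalised to 1, all interfering uplinks are assumed to contribute the same received power (in line with the equal-interference assumption made earlier), and the function name and numeric values are hypothetical.

```python
import math

def relay_link_quality(p_r, h_r, p_i, h_i, m, noise=1.0):
    """log2(1 + SINR) on the D2D relay link with m equal-strength interferers.

    p_r, h_r: relay transmit power and channel gain (hypothetical values).
    p_i, h_i: transmit power and channel gain of each interfering uplink.
    m:        number of interfering uplinks (m = 0 is the non-interference case).
    """
    interference = m * p_i * h_i
    sinr = (p_r * h_r) / (noise + interference)
    return math.log2(1.0 + sinr)

# Quality for 0..5 interfering uplinks with fixed relay power.
qualities = [
    relay_link_quality(p_r=10.0, h_r=1.0, p_i=2.0, h_i=0.5, m=m)
    for m in range(6)
]
```

With the relay power held fixed, each additional interferer enlarges the denominator of the SINR, so the quality falls monotonically from its non-interference value at m = 0.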

CONCLUSION
In this paper, a new QoE-centric game theoretical resource allocation mechanism is studied considering cached wireless multimedia relay delegation. The proposed cached content relay delegation scheme jointly considers the physical interference of the allocated power introduced by the previous time slot, the QoE of UEs, and the utility gain of the BS. In order to properly analyse the cached content relay delegation strategy and the resource allocation scheme, we model the BS and the RD as rational players in a Stackelberg game and construct their utility functions by considering purchased power interference, communication resource expenditure and the economic rewards of cached content relay delegation. The multimedia quality of the source data and the mutual interference from other regular uplinks are considered in the system model. The best possible QoE and desirable utilities at the Stackelberg equilibrium state are determined in the proposed system. Furthermore, the existence of the Stackelberg equilibrium was discussed and analysed. We analytically determined that the game between the BS and the RD depends on the QoE requirement of the UE and the relay power cost of the RD. The simulation results show that all players benefit from our proposed utility maximisation algorithm in the equilibrium state. The RD gets positive utility rewards for providing the multimedia relay service, the UE experiences higher QoE in the media service, and the BS obtains better utility by offloading certain work to the RD. Future work could focus on various emerging research issues in D2D relay considering interference mitigation. In certain situations, as discussed in this paper, it might be difficult to find the Nash Equilibrium of the Stackelberg game. In some other situations, it is hard to prove the existence of the Nash Equilibrium. Recent advances in machine learning research have attracted attention from the community.
Future work could investigate reinforcement learning or generative adversarial network learning to attain sub-optimal power allocation solutions. Future work could also investigate reliability and fault tolerance issues.