Overall computing offloading strategy based on deep reinforcement learning in vehicle fog computing

Abstract: In order to solve the problem of network congestion caused by the large number of data requests generated by intelligent vehicles in an LTE-V network, a brand-new fog server with fog computing capability is deployed on both cellular base stations and vehicles, and an LTE-V-fog network is constructed to handle delay-sensitive service requests in the Internet of vehicles. The weighted total cost, which combines delay and energy consumption, is taken as the optimisation goal. First, a reinforcement learning algorithm, Q-learning, based on the Markov decision process is proposed to minimise the weighted total cost. Furthermore, this study explains in detail how the three elements of reinforcement learning (state, action and reward) are set in the fog computing system. Then, to reduce the problem scale and improve efficiency, the authors introduce a pre-classification step before reinforcement learning to restrict the possible action values. However, as the number of vehicles in the system increases, the Q-learning method based on recorded Q-values may suffer from the curse of dimensionality. Therefore, the authors propose a deep reinforcement learning method, the deep Q-learning network (DQN), which combines deep learning and Q-learning. Experimental results show the advantages of the proposed method.


Introduction
With the development of intelligent transportation applications, the Internet of vehicles (IoV) technologies, which provide communication support for information transmission between vehicles, are attracting increasing attention. IoV is a specific application of the Internet of things (IoT) technologies in the field of intelligent transportation. It utilises an efficient vehicle-to-vehicle collaborative communication platform to improve the efficiency of traffic operation and effectively improve the safety of vehicle travel [1][2][3]. In recent years, higher requirements have been put forward for vehicle computing and storage capabilities with the development of applications such as vehicle safety early warning, autonomous driving and in-vehicle multimedia. At the same time, the limited resources of vehicles and the uneven distribution of resources between vehicles increase the difficulty of popularising a large number of in-vehicle applications [4,5].
With the increasing demands of network users for network applications, network access terminals are no longer limited to traditional computers and mobile phones. Vehicles can serve as network access terminals to form a vehicle-to-vehicle interconnected IoV, providing network connections for vehicle to vehicle (V2V), vehicle to infrastructure (V2I), vehicle to pedestrian (V2P) and vehicle to network (V2N) communication [6][7][8]. Supporting new intelligent applications such as autonomous driving, in-vehicle entertainment and intelligent traffic guidance has become the main development direction of the future IoV. Research on vehicular networks has thus become a hot topic [9].
Early research on vehicular networks mainly focused on the vehicle ad hoc network (VANET) [10]. The Institute of Electrical and Electronics Engineers (IEEE) has formulated the 802.11p standard for VANET networking. However, the problems of VANET's network reliability, hidden nodes and unbounded delay have not been solved well. Long-term evolution-vehicle (LTE-V) is considered able to solve the above problems effectively, and it provides a solution for real-time IoV application services. Given the huge amount of data generated in the IoV and the network congestion caused by the large number of data requests from intelligent vehicles in an LTE-V network, this paper considers a new fog server with fog computing capability, deployed on both cellular base stations and vehicles. Furthermore, an LTE-V-fog (LVF) network is constructed to handle delay-sensitive service requests in the IoV.

Related work
The authors in [11] defined two communication modes for LTE-V, LTE-V-direct (LVD) and LTE-V-cell (LVC). LVD and LVC can coordinate with each other to provide an integrated vehicle-to-X solution. Rafael and Javier [12] compared LTE-V with IEEE 802.11p and reviewed the standardisation and evolution of LTE-V. They also surveyed the industry development and typical demonstration applications of LTE-V. Finally, the development strategies of LTE-V and its future technological evolution route were discussed.
In the IoV, in order to ensure intelligent and safe driving of vehicles while guaranteeing the passengers' entertainment experience, vehicles need to interact with objects such as base stations, other vehicle nodes and roadside units. In a short period of time, real-time information with a huge amount of data is generated. In a cloud-based network structure, these data must be uploaded from vehicles to the cloud through a wireless access network; after the cloud processes and analyses the data, the results are returned to the vehicles. Uploading data for processing increases the transmission delay of tasks. At the same time, the influx of massive data inevitably leads to the queuing of services, which further increases network delays. For vehicles driving autonomously in the IoV, excessively high delays are likely to cause serious traffic accidents.
Fog computing can effectively solve the above problems by deploying edge servers with storage and computing functions at the network edge, close to users, to receive and process user data. Since fog servers are deployed closer to users, a large amount of information transmission time can be saved. Deploying fog computing nodes in the IoV network has become a popular trend in IoV research. To promote the integration of fog computing technology into the IoV, [13] first introduced a collaborative and distributed computing architecture built with vehicles as fog computing resource nodes. The authors proposed a cooperative task offloading and output transmission mechanism to ensure the network performance of low-latency applications; simulation analysis showed that this solution can reduce the perception-reaction time while ensuring the driving experience in 3D scenes. Li et al. [14] introduced fog computing into the route-sharing service model of the in-vehicle network, with fog nodes preprocessing users' data. Hou et al. [15] discussed four scenarios of moving and parked vehicles as communication and computing infrastructure and quantitatively analysed the fog computing capabilities of vehicles; in addition, the characteristics of parking behaviour patterns were identified for the use of vehicle resources. Yao et al. [16] considered integrating the computing and storage resources of parked vehicles to form a vehicle fog service, and proposed a new mechanism consisting of a vehicle fog construction method and a vehicle fog service access method; this mechanism ensured the reliability and safety of vehicle fog services without sacrificing performance. Sookhak et al. [17] designed an application-layer software architecture for vehicular fog computing, divided into three levels: the application and service layer, the strategy management layer and the abstraction layer.
However, migrating tasks to fog nodes inevitably increases the power consumption of the system. How to choose the task migration method under the condition of limited power also needs in-depth study. To minimise the terminal's energy consumption, [18] jointly optimised wireless resource allocation and computing resource allocation. The energy consumption minimisation problem was expressed as a mixed-integer non-linear programming problem, which is further complicated by the delay constraints of specific applications. To solve this problem, the authors redesigned a branch-and-bound method based on linearisation techniques, so that the optimal or a sub-optimal result can be obtained by setting the solution accuracy. Since the complexity of the above method cannot be guaranteed, the authors further designed a greedy heuristic algorithm based on the Gini coefficient. By transforming the problem into a convex optimisation model and solving it, simulations proved that this algorithm can save system power expenditure effectively.
Wang et al. [19] studied a vehicular edge task offloading scheme based on parked-vehicle relays. To reduce the communication service delay of the IoV network and, at the same time, effectively use the idle resources of parked vehicles, they proposed a solution capable of offloading real-time traffic management that minimised the average response time of vehicle incident reports. Moreover, a distributed urban traffic management system was constructed by using parked vehicles as edge nodes. The system relieved the data transmission pressure on the backhaul link, which reduced network delay effectively. Liu [20] considered an LVF network structure in which a fog computing server is deployed at both macrocell base stations and vehicles. The author also established a mathematical optimisation model that minimises power overhead with time delay and power consumption as constraints, and used the simulated annealing algorithm to solve it.
The authors in [13][14][15][16][17][18] studied how to offload tasks efficiently to base stations in the IoV network. However, in various applications of the future IoV network, users are extremely sensitive to delay requirements, for example in collision warning and lane departure warning. If tasks are migrated to base stations, the migration method in which base stations execute tasks and then return the results is bound to increase network delay. The task migration method in [19] reduced network delay; however, since the engine of a parked vehicle is stopped, it cannot provide a lasting power output, and network deployments that use parked vehicles as relay nodes need further study in terms of network stability. In addition, the method in [20] was coarse. In recent years, deep learning technology has developed rapidly and achieved great results in many research fields. For example, [21, 22] proposed, for the first time in the literature, automatic semantic segmentation based on cell type using a novel deep convolutional neural network (DCNN) structure, and applied the DCNN to image semantic segmentation effectively. Therefore, we propose a deep reinforcement learning method, the deep Q-learning network (DQN), which combines deep learning and Q-learning.
Although fog computing and LTE-V have each become popular research directions, few works combine them. The main contribution of this paper consists of the following two parts.
(i) In the LTE-V network, a brand-new fog server with fog computing capability is deployed on both cellular base stations and vehicles. We construct an LVF network to handle delay-sensitive service requests in the IoV and also describe the delay and energy consumption models of the computing process in detail. (ii) In the above LVF network structure, the weighted total overhead combining delay and energy consumption is taken as the optimisation goal. In view of the fact that the Q-learning method of recording Q-values may suffer from the curse of dimensionality, this paper proposes a deep reinforcement learning method, DQN, which combines deep learning and Q-learning. Finally, simulation results show that our proposed algorithm can better reduce the total system overhead under different system parameters.

System model
The LVF system model is shown in Fig. 1. There are two ways for vehicles in this model to maintain communication: one is the LVC mode from vehicles to cellular base stations, and the other is the LVD mode between vehicles. In LVD mode, vehicles are directly connected to each other for V2V scenarios, and the LTE physical layer parameters can be modified into a distributed architecture. Short-distance direct communication is provided on the basis of ensuring versatility, so that vehicle nodes obtain the low-latency and high-reliability network performance mainly used to ensure the safe driving of vehicles. In LVC mode, the vehicle uses a traditional centralised method to communicate with cellular base stations. LVC, based on a centralised network architecture, can better support the application of V2I and V2P scenarios by optimising wireless resource management [23][24][25][26].
In LTE-V networking mode, vehicles need to provide lane departure warning, forward collision warning, traffic sign recognition, vehicle perception sharing, pedestrian collision warning and dangerous driving reminder functions. To achieve these functions, a variety of sensor devices such as lidar, ultrasonic radar, image sensors and millimetre-wave radar need to be installed on the vehicle body. These devices can generate huge amounts of data in a short time. To reduce the data transmission delay and improve the responsiveness of the network, fog nodes need to be deployed in the IoV network [27,28]. In LVC mode, fog nodes are deployed in cellular base stations using a centralised network architecture. In LVD mode, a distributed network structure model is used to deploy fog nodes in each intelligent vehicle.
Let the coverage of the cellular base station be as shown in Fig. 1, and let the maximum coverage distance of this cell be $L$. Within the coverage of the macrocell base station, the vehicle that initiates the task migration request and the fog-node vehicles travel in the same direction. Suppose the running speed of vehicles is $V_c$. High-speed running causes a Doppler frequency shift, which in turn affects the success rate of task migration. For convenience of analysis, the maximum running speed of vehicles is limited to $V_c^{\max}$, that is, $\forall V_c \le V_c^{\max}$. Under this speed limit, tasks can be migrated to fog nodes successfully. On this basis, a rectangular coordinate system is established for calculating the distances between vehicles and base stations and between vehicles. Let the reference coordinate of the base station be $O(0, 0)$, with the horizontal right direction as the positive $x$-axis and the vertical upward direction as the positive $y$-axis. If the current position of the vehicle requesting task offloading is $L_v(x_v, y_v)$, the distance between the vehicle and the base station can be expressed as

$$d = \sqrt{x_v^2 + y_v^2} \quad (1)$$

If the current position of the $i$th vehicle with fog computing capability is $L_{f_i}(x_{f_i}, y_{f_i})$, the distance between the task-initiating vehicle and the $i$th fog-node vehicle can be expressed as $d_i = \sqrt{(x_v - x_{f_i})^2 + (y_v - y_{f_i})^2}$. Let the computing power of the edge computing servers at the macro base station and at the vehicle fog nodes be $F_c$ and $F_d$, respectively. Assume the number of vehicles with fog-node computing power is $m$, forming a set $M = \{1, 2, \ldots, m\}$. Suppose the set of the $n$ vehicle users in the system is $N = \{1, 2, \ldots, n\}$, with $N \subset M$.
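As a minimal sketch, the two distance calculations above (vehicle to base station at the origin, and vehicle to the $i$th fog-node vehicle) can be written directly; the coordinate values used below are hypothetical.

```python
import math

def dist_to_base(xv, yv):
    """Distance from a vehicle at (xv, yv) to the base station at O(0, 0)."""
    return math.sqrt(xv ** 2 + yv ** 2)

def dist_to_fog_vehicle(xv, yv, xfi, yfi):
    """Distance from the task-initiating vehicle to the i-th fog-node vehicle."""
    return math.sqrt((xv - xfi) ** 2 + (yv - yfi) ** 2)
```

Both are ordinary Euclidean distances in the rectangular coordinate system defined above.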
Let the task of user $n$ be $W_n = (\tau_n, a_n, d_n)$, where $\tau_n$ is the maximum delay tolerance of the task, $a_n$ is the data amount of the task and $d_n$ is the number of CPU cycles required to complete it. Owing to the different delay sensitivities and data volumes of tasks, the tasks generated by a vehicle can be executed by the fog server deployed on the vehicle body. When a task occurs that the on-board fog server cannot process, the vehicle can offload it to the cellular fog node via LVC or to another vehicle's fog node via LVD for execution. To facilitate the analysis, this paper only considers the case in which vehicles receive the returned results from fog nodes before leaving the coverage area of the base station.

LVC mode
In this mode, vehicles migrate tasks to the macrocell base station over wireless links. Within the coverage of the macrocell base station, in addition to vehicles, the macro cell also faces data service requests from other smart IoT devices such as smart wearables and smartphones. Thus, tasks migrated from vehicles to the macro cell also need to wait in a queue. After the computing task is processed, the macro cell returns the results to the vehicles [29][30][31]. Consequently, the total time $T_{tol}^c$ from the initiation to the completion of a task can be expressed as

$$T_{tol}^c = T_{trans}^c + T_{que}^c + T_{pro}^c + T_{back}^c \quad (2)$$

where $T_{trans}^c$ is the time it takes a vehicle to send the task to the fog node, $T_{que}^c$ is the queuing time after the task reaches the fog node, $T_{pro}^c$ is the processing time of the task, and $T_{back}^c$ is the time required for the fog node to return the result to the vehicle.
Suppose the macro base station assigns a channel $i$ to a vehicle $n$; then the transmission rate between vehicle $n$ and the fog edge server is

$$r_n^i = W_i \log_2(1 + \mathrm{SINR}_n^i) \quad (3)$$

where $W_i$ is the bandwidth of channel $i$ and $\mathrm{SINR}_n^i = P_{nc} h_n^i / (\omega_i + I_i)$ is the signal-to-interference-plus-noise ratio of channel $i$. $P_{nc}$ is the transmission power of the vehicle node and $h_n^i$ is the channel gain; $\omega_i$ and $I_i$ are the noise and interference of channel $i$, respectively. In this paper, it is assumed that vehicles move within a certain interval in which the channel gain is constant. Therefore, the time required to transmit the task is

$$T_{trans}^c = a_n / r_n^i \quad (4)$$

Let $y_n^m$ be the computing power assigned to vehicle $n$ by the edge server of the macro base station. Then the time required by the macro base station to complete task $n$ is

$$T_{pro}^c = d_n / y_n^m \quad (5)$$

When the task is processed at the macro edge, the power cost required is

$$E^c = P_{nc} T_{trans}^c \quad (6)$$

Suppose the successive task data streams arrive at the fog server of the macro base station as a Poisson process with rate $\lambda_{c\text{-}task}$, and the service times of the fog server are exponentially distributed with rate $\lambda_c$. The fog node of the macro base station can then be modelled as an M/M/1 queue with utilisation $\rho = \lambda_{c\text{-}task}/\lambda_c$. According to queuing theory, the average queuing time of a task is

$$T_{que}^c = \frac{\rho}{\lambda_c (1 - \rho)} \quad (7)$$

After processing, the fog node returns the computing result to vehicle $n$ over channel $j$. Let the amount of result data be $a_{back}^c = \eta a_n$, where $\eta$ is the ratio of the data amounts after and before task execution.
The return time of the computing result can be expressed as

$$T_{back}^c = \eta a_n / r_n^j \quad (8)$$

where $r_n^j = W_j \log_2(1 + \mathrm{SINR}_n^j)$ is the transmission rate of channel $j$ from the base station to the vehicle. $W_j$ is the bandwidth of channel $j$, and $\mathrm{SINR}_n^j = P_{nb} h_n^j / (\omega_j + I_j)$ is the signal-to-interference-plus-noise ratio of channel $j$. $P_{nb}$ is the transmit power of the base station and $h_n^j$ is the channel gain; $\omega_j$ and $I_j$ are the noise and interference of channel $j$, respectively.
According to formulas (2) and (6), considering the execution delay and energy consumption, the weighted cost when the vehicle selects the fog server of the macro base station is

$$C_n^c = I_{n,t} T_{tol}^c + I_{n,e} E^c \quad (9)$$

Here $0 \le I_{n,t}, I_{n,e} \le 1$ are the weighting parameters of execution delay and energy consumption for vehicle $n$. Different users can choose different weight parameters, and when a user handles application types with different requirements, the parameters of the same user may also change dynamically [32,33]. The weights also serve to unify the delay and energy terms, which have different units.
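The LVC cost pipeline of equations (2)-(9) can be sketched as follows. The Shannon rate, M/M/1 queueing delay and weighted sum follow the text directly; the energy term is an assumption (upload energy $P_{nc} T_{trans}^c$), since the source does not spell out equation (6), and all numeric arguments are hypothetical.

```python
import math

def lvc_weighted_cost(a_n, d_n, W_i, sinr, P_nc, y_nm,
                      lam_task, lam_c, eta, r_back, I_t, I_e):
    """Weighted LVC cost of one task, following equations (2)-(9)."""
    r = W_i * math.log2(1 + sinr)             # (3) Shannon rate of channel i
    t_trans = a_n / r                         # (4) upload time
    t_pro = d_n / y_nm                        # (5) processing time
    rho = lam_task / lam_c                    # M/M/1 utilisation
    t_que = rho / (lam_c * (1 - rho))         # (7) mean M/M/1 queueing delay
    t_back = eta * a_n / r_back               # (8) result return time
    t_tol = t_trans + t_que + t_pro + t_back  # (2) total delay
    energy = P_nc * t_trans                   # (6) energy (assumed: upload energy)
    return I_t * t_tol + I_e * energy         # (9) weighted cost
```

With $I_{n,t} = 1$ and $I_{n,e} = 0$ the function reduces to the pure delay $T_{tol}^c$, which is a quick sanity check on the weighting.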

LVD mode
Similar to the LVC analysis, the execution time of tasks on the micro-edge server also consists of four parts: transmission time, queuing time, processing time and return time, which satisfies

$$T_{tol}^d = T_{trans}^d + T_{que}^d + T_{pro}^d + T_{back}^d \quad (10)$$

where $T_{tol}^d$ is the total time required to offload a task to another vehicle node, complete the calculation and return the processing result to the vehicle; $T_{trans}^d$, $T_{que}^d$, $T_{pro}^d$ and $T_{back}^d$ are the transmission, queuing, processing and return times, respectively.
In the task migration phase, the fog node of the vehicle receiving the task and the vehicle node requesting migration are connected by channel $k$. The transmission rate is

$$r_n^k = W_k \log_2(1 + \mathrm{SINR}_n^k) \quad (11)$$

where $W_k$ is the channel bandwidth and $\mathrm{SINR}_n^k = P_{nd} h_n^k / (\omega_k + I_k)$ is the signal-to-interference-plus-noise ratio of channel $k$. $P_{nd}$ is the transmit power of the terminal and $h_n^k$ is the channel gain; $\omega_k$ and $I_k$ are the noise and interference of channel $k$, respectively. The time taken to transfer the data is

$$T_{trans}^d = a_n / r_n^k \quad (12)$$

Let $y_n^d$ be the computing power allocated to vehicle $n$ by the fog node. Then the time required to perform task $n$ is

$$T_{pro}^d = d_n / y_n^d \quad (13)$$

and the required power overhead is

$$E^d = P_{nd} T_{trans}^d \quad (14)$$

Suppose the task flows arrive at the vehicles with fog computing power as a Poisson process with rate $\lambda_{d\text{-}task}$, and the service times of the fog servers are exponentially distributed with rate $\lambda_m$. Using queuing theory, the task flow can be modelled as an M/M/c queue, where $c$ is the number of vehicles that have deployed fog servers and can provide computing power, each with a fixed service rate $\mu_s$. The utilisation of the fog servers is $\rho_1 = \gamma_1^c / (b_1 \mu_s)$, where $\gamma_1^c$ is the data stream waiting to be processed by the servers and $\rho_1 < 1$. According to queuing theory, the average waiting time $T_{que}^d$ of a task is given by the standard M/M/c (Erlang C) waiting-time formula (15). The vehicle-based fog node returns the computing result to vehicle $n$ over channel $l$. Let the amount of result data be $a_{back}^d = \delta a_n$, where $\delta$ is the ratio of data volume after and before task execution; the return time is

$$T_{back}^d = \delta a_n / r_n^l \quad (16)$$

where $r_n^l = W_l \log_2(1 + \mathrm{SINR}_n^l)$ is the transmission rate of channel $l$ from the fog-node vehicle to the vehicle.
$W_l$ is the bandwidth of channel $l$, and $\mathrm{SINR}_n^l = P_{nv} h_n^l / (\omega_l + I_l)$ is the signal-to-interference-plus-noise ratio of channel $l$. $P_{nv}$ is the transmit power of the fog-node vehicle and $h_n^l$ is the channel gain; $\omega_l$ and $I_l$ are the noise and interference of channel $l$, respectively. Therefore, for a vehicle $n$ that chooses to offload to another vehicle's fog node, the weighted total cost combining delay and energy consumption is

$$C_n^d = I_{n,t} T_{tol}^d + I_{n,e} E^d \quad (17)$$

Problem modelling
From the previous section, we obtained the weighted total cost when a vehicle offloads to the macro base station fog server or to the fog nodes of other vehicles. Each vehicle can only choose one of the two computing methods. The optimisation goal is the sum of the weighted total costs of all users:

$$C_{all} = \sum_{n=1}^{N} \left[ (1 - \alpha_n) C_n^c + \alpha_n C_n^d \right] \quad (18)$$

The work in this section is carried out under the overall (binary) offloading condition: a task cannot be split to execute partly on the fog server of the macro base station and partly on the fog nodes of other vehicles at the same time. That is, each vehicle $n$ can only offload the entire task either to the fog server of the macro base station or to the fog node of another vehicle for processing [34]. We use the 0-1 variable $\alpha_n \in \{0, 1\}$ to represent the offloading decision of vehicle $n$ and define $A = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ as the offloading decision vector of the entire fog server system. If $\alpha_n = 0$, vehicle $n$ offloads its task to the fog server of the macro base station; $\alpha_n = 1$ means vehicle $n$ offloads it to another vehicle's fog node. The offloading decision and resource allocation must satisfy the maximum delay constraints and the server resource constraints. The problem of minimising the weighted total overhead of all users can then be expressed as

$$\min_{A, f} \; C_{all} \quad \text{s.t.} \quad C_1: \alpha_n \in \{0, 1\}; \;\; C_2: T_{tol} \le \tau_n; \;\; C_3: \sum_{n=1}^{N} f_n \le F; \;\; C_4: 0 \le f_n \le F \quad (19)$$

In the above formula, $A = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is the offloading decision vector and $f = (f_1, f_2, \ldots, f_N)$ is the resource allocation vector. The purpose of the optimisation problem is to minimise the weighted total overhead of all users. Constraint $C_1$ ensures that each user offloads its task either to the fog server of the macro base station or to the fog node of another vehicle for processing.
Constraint $C_2$ ensures that, whether the entire task is offloaded to the macro base station fog server or to another vehicle's fog node, the processing delay satisfies the user's maximum tolerated delay. Constraint $C_3$ guarantees that, given the limited computing resources of a single fog computing server, the sum of the computing resources it provides to users does not exceed its own computing capacity. Constraint $C_4$ ensures that the computing resources allocated to a single application do not exceed the computing capacity of the fog computing server itself. By finding the optimal offloading decision vector $A = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and computing resource allocation vector $f = (f_1, f_2, \ldots, f_N)$, the weighted total overhead of the system can be minimised.
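For a toy instance, the 0-1 decision part of problem (19) can be solved by exhaustive search, which also clarifies how constraints $C_1$ and $C_2$ act. The per-vehicle costs and delays below are hypothetical precomputed inputs, and the resource allocation $f$ is taken as fixed for simplicity (so $C_3$/$C_4$ are assumed satisfied).

```python
from itertools import product

def best_offloading(costs_lvc, costs_lvd, delays_lvc, delays_lvd, tau):
    """Exhaustively search the 0-1 offloading vector A of problem (19).

    costs_* and delays_* hold each vehicle's weighted cost and total delay
    in the two modes; tau holds the per-task delay tolerances (C2).
    """
    n = len(costs_lvc)
    best, best_A = float("inf"), None
    for A in product((0, 1), repeat=n):       # C1: alpha_n in {0, 1}
        total = 0.0
        feasible = True
        for i, a in enumerate(A):
            delay = delays_lvd[i] if a else delays_lvc[i]
            if delay > tau[i]:                # C2: delay tolerance violated
                feasible = False
                break
            total += costs_lvd[i] if a else costs_lvc[i]
        if feasible and total < best:
            best, best_A = total, A
    return best_A, best
```

The search space is $2^n$, which is exactly the scaling problem that motivates the reinforcement learning approach of the next section.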

Model solution based on deep Q network
In this section, the three key elements of the DRL agent are first defined in detail: state, action and reward. Then, the reinforcement learning algorithm Q-learning is introduced to transform the above problem of minimising the weighted total cost into a problem of finding the maximum Q-value. Because Q-learning itself is prone to the curse of dimensionality, a DQN method is proposed. This method uses a deep neural network (DNN) to estimate the action-value function of Q-learning and can obtain the approximate maximum Q-value directly.

State, action and reward
There are three key elements in reinforcement learning methods: state, action and reward. For the reinforcement learning environment of this paper, they are defined as follows. State: the system state $s$ consists of two parts, $s = (tc, ac)$, where $tc$ is the total cost of the entire system; $tc = C_{all}$ is the weighted total cost to be minimised, and the cost value of each state is observed to determine whether the best state has been reached. $ac$ is the current amount of idle computing resources of the fog computing server, calculated as $ac = F - \sum_{n=1}^{N} f_n$; observing it serves to satisfy the computing capacity constraint.
Action: in our system, each action $a$ consists of two parts, the offloading decisions $A = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and the resource allocations $f = (f_1, f_2, \ldots, f_N)$ of all users. Combining the possible values of $A$ and $f$, the action vector $(\alpha_1, \alpha_2, \ldots, \alpha_n, f_1, f_2, \ldots, f_N)$ is obtained.
Reward: for each step in the search for the best state, the agent receives a return $R(s, a)$ after performing a possible action $a$ in state $s$. The goal of reinforcement learning is to maximise the return [35]. Generally, the reward function should be related to the objective function: to use reinforcement learning to solve problem (19), the reward function must be connected to the objective function being optimised. Since the optimisation goal is to minimise the total cost while reinforcement learning maximises the return, the reward function should be inversely related to the original objective. The instantaneous return is defined as $R(s, a) = (tc_{LVC} - tc(s, a))/tc_{LVC}$, i.e. the relative reduction of the current weighted total cost compared with the cost when all tasks are computed by the macro base station fog server. The larger $R(s, a)$, the smaller $tc$ in the current state. Obtaining the maximum return is thus equivalent to obtaining the minimum weighted total cost, which transforms the problem successfully.
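The instantaneous return above is a one-liner; the example values below are hypothetical.

```python
def reward(tc_lvc, tc_current):
    """Instantaneous return R(s, a) = (tc_LVC - tc(s, a)) / tc_LVC.

    tc_lvc: weighted total cost when every task runs on the macro base
    station fog server (the full-LVC baseline).
    tc_current: weighted total cost in state s after action a.
    Larger rewards correspond to lower current cost.
    """
    return (tc_lvc - tc_current) / tc_lvc
```

Normalising by the full-LVC baseline keeps the return scale roughly in $[0, 1]$ for actions that improve on the baseline, which is convenient for learning.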
It should be noted that, as the number of users increases, the number of possible actions composed of offloading decisions and resource allocations grows rapidly, which is not conducive to the operation and convergence of the algorithm. To limit the size of the action space, a pre-classification step is introduced before the learning process. For each vehicle, we first determine whether computing on the fog server of the macro base station can meet the time limit. If $T_{tol}^c \le \tau_n$ cannot be satisfied, the vehicle is set to offload to the fog node of another vehicle, i.e. $\alpha_n = 1$. In addition, enough computing resources must be allocated to meet the time limit, i.e. $f_n \ge d_n/(\tau_n - a_n/r_n)$. In this way, the possible values of $A$ and $f$ are reduced, which improves the efficiency of the algorithm.
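The deadline-based part of the pre-classification step can be sketched as below; it assumes the LVC total delays $T_{tol}^c$ have already been estimated per task, and the task tuples follow the $(\tau_n, a_n, d_n)$ convention of the system model.

```python
def pre_classify(tasks, delays_lvc):
    """Fix alpha_n = 1 for every task whose macro-cell delay already
    violates its tolerance, shrinking the action space before learning.

    tasks: list of (tau_n, a_n, d_n) tuples.
    delays_lvc: estimated total LVC delay T_tol_c for each task (assumed
    precomputed from equations (2)-(8)).
    Returns a dict mapping task index -> forced offloading decision.
    """
    forced = {}
    for n, ((tau_n, _, _), t_lvc) in enumerate(zip(tasks, delays_lvc)):
        if t_lvc > tau_n:      # macro base station cannot meet the deadline
            forced[n] = 1      # must offload to a vehicle fog node
    return forced
```

Only the unforced vehicles then contribute free variables to the action vector, so the action space shrinks from $2^n$ to $2^{n - |\text{forced}|}$ before any learning takes place.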

Solution of deep Q network method
Q-learning is a classic reinforcement learning algorithm that records action values (Q-values). Each state-action pair has a value $Q(s, a)$. At each step, the agent calculates and stores $Q(s, a)$ in a Q-table; this value can be regarded as the expectation of the long-term return. The update formula of $Q(s, a)$ can be expressed as

$$Q(s, a) = r(s, a) + \gamma \max_{a'} Q(s', a') \quad (20)$$

where $(s, a)$ is the current state and action and $(s', a')$ is the state and action of the next time slot. $\gamma$ is the discount factor, a constant satisfying $0 \le \gamma \le 1$. If $\gamma$ tends to 0, the agent mainly considers the current instantaneous return; if $\gamma$ tends to 1, the agent is also very concerned about future returns. By iterating the value of $Q(s, a)$, the best $A$ and $f$ can be obtained. The Q-learning algorithm can solve problem (19) by obtaining the maximum return, but it also has serious defects. Using Q-learning means that, for each state-action pair, the corresponding Q-value must be calculated and stored in a table. In practical problems, however, the number of possible states may reach a scale such as $3^{361}$ for the Go board. If all Q-values were stored in a Q-table, the matrix $Q(s, a)$ would be enormous, making it difficult to obtain enough samples to traverse every state, which leads to the failure of the algorithm. Therefore, a DNN is used to approximate $Q(s, a)$ instead of calculating the Q-value of each state-action pair. This is the basic idea of the deep Q-network. The execution process of the DQN algorithm is shown in Algorithm 1 (see Fig. 2).
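The tabular variant that DQN replaces can be sketched in a few lines. This is a generic sketch, not the paper's algorithm: it adds a step size `alpha` to the update (20) for stable convergence and uses epsilon-greedy exploration; the environment `step` function and the toy example in the usage note are hypothetical.

```python
import random
from collections import defaultdict

def q_learning(step, start, actions, episodes=200, gamma=0.9,
               alpha=0.5, eps=0.1, seed=0):
    """Tabular Q-learning with a step-size variant of equation (20):
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    step(s, a) is a user-supplied environment returning (s', r, done).
    """
    rng = random.Random(seed)
    Q = defaultdict(float)                    # the Q-table, default 0
    for _ in range(episodes):
        s, done = start, False
        while not done:
            # epsilon-greedy action selection over the Q-table
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            target = r + (0 if done else
                          gamma * max(Q[(s2, x)] for x in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

On a one-step toy environment where action 1 yields reward 1 and action 0 yields 0, the learned table ranks action 1 above action 0, which is the behaviour the DQN then reproduces with a DNN instead of the table.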

Simulation settings
In this section, the performance of the proposed overall computing offloading strategy based on deep reinforcement learning is evaluated. The channel coefficients follow circularly symmetric complex Gaussian distributions with mean $m_\mu$ and variance $\sigma_\mu^2$. $\sigma$ is the path-loss factor, which is set to 3 in the simulation. The specific calculation formula of the distance $d$ is given by formula (1). The maximum running speed $V_c^{\max}$ of vehicles is set to 30 km/h, the actual running speed $V_c$ follows the uniform distribution on $[15, 30]$ km/h, and the base station coverage radius $L$ is set to 500 m. The transmission power of vehicles with task requests is 2 W, and the required CPU cycles $d_n$ follow the uniform distribution on $[100, 1000] \times 10^9$ cycles. The computing power of all vehicles is $10 \times 10^9$ cycles/s, and that of the macro base station is $500 \times 10^9$ cycles/s. The delay tolerance $\tau_n$ lies in $[0.1, 0.5]$ s.
To simplify the problem, we ignore the differences between the CPUs of different devices and set the maximum CPU frequency to f_l^max = 1 GHz. According to [36], the size of the uploaded data is assumed to obey a uniform distribution on (300, 500). The number of required computing resources follows a uniform distribution on (900, 1100). Each time slot corresponds to one experiment, and the average over multiple experiments is taken for the relevant performance indicators. Fig. 3 evaluates the impact of the learning rate γ, the service rate μ_s and the weight value θ on the system traversal capacity. We run 10,000 iterations and repeat each iteration 100 times to obtain the average result. The dotted line at the top of Fig. 3 indicates the optimal traversal capacity of the network (without interference). As can be seen from Fig. 3, the network performance is best when μ_s = 0.95, γ = 1 and θ = 0.9. This means that, to achieve optimal performance, the immediate return of a given action (γ = 1) must be considered rather than the previous information. μ_s = 0.95 indicates that there is a sufficient difference between the best action and the current action. In addition, when μ_s ≃ γ and θ > 0.5, better system performance can be obtained.
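The optimisation objective throughout these experiments is the weighted total cost combining delay and energy consumption. A minimal sketch of such an objective, assuming a simple θ-weighted sum (the paper's exact formulation in problem (19) may normalise the two terms differently):

```python
def weighted_cost(delay_s, energy_j, theta=0.9):
    """Hypothetical weighted total cost: theta trades delay against energy.
    theta = 0.9 matches the best-performing weight reported for Fig. 3."""
    return theta * delay_s + (1.0 - theta) * energy_j

# Example: a task with 0.2 s delay and 1.5 J energy consumption.
print(round(weighted_cost(0.2, 1.5), 4))  # -> 0.33
```

A larger θ makes the strategy favour delay reduction, which matches the observation above that θ > 0.5 yields better performance for delay-sensitive IoV services.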

Experimental results and analysis
We set different system parameters and compare the performance of the proposed DQN algorithm with the simulated annealing algorithm in [16, 20], full LVC computing and full LVD computing. 'Full LVC computing' means that all users choose to offload to the fog server at the macro base station for computing. 'Full LVD computing' means that all users choose to offload to the fog nodes of other vehicles for computing. In both cases, the computing resources of the fog servers are evenly distributed among the vehicles.
The abscissa in Fig. 4 is the number of user devices, and the ordinate is the weighted total cost. This group of experiments explores how the weighted total cost of the fog server system changes as the number of vehicles increases. The computing capacity limit of the fog servers is F_c = 5 GHz. Overall, as the number of vehicles increases, the total cost of all four methods rises. In Fig. 4, our proposed DQN method achieves the best results. The performance of [16, 20] is slightly worse than DQN, and the performance of these three methods is relatively stable. However, as the number of vehicles increases, the total costs of full LVC computing and full LVD computing grow rapidly. This is because, as the number of vehicles grows, the fog server with limited computing resources cannot provide sufficient computing resources for every user under full LVC or full LVD computing, which drags down the overall performance. This shows that we should make trade-offs among vehicles and choose a reasonable offloading scheme to ensure the overall performance.
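The degradation of the full-offloading baselines can be reproduced with a back-of-the-envelope calculation: when the fog server's capacity F_c is split evenly among N vehicles, the per-task processing delay grows linearly in N. A sketch under these assumptions (the 10^9-cycle task size is an illustrative value; F_c = 5 GHz matches this experiment's setup):

```python
F_C = 5e9          # fog server capacity in cycles/s (F_c = 5 GHz)
TASK_CYCLES = 1e9  # hypothetical per-task workload in CPU cycles

def processing_delay(n_vehicles):
    """Even split of F_c among n vehicles: each task runs at F_c / n cycles/s."""
    return TASK_CYCLES / (F_C / n_vehicles)

for n in (1, 5, 10):
    print(n, processing_delay(n))  # delay scales linearly with n
```

This linear growth in per-task delay is why the full LVC and full LVD curves in Fig. 4 rise much faster than the DQN and simulated annealing curves, which keep some tasks local instead of saturating the shared server.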
The abscissa in Fig. 5 represents the computing resource capacity of the fog computing servers, that is, the capacity limit F_c. The ordinate is the weighted total cost of all users. The number of user devices in this set of experiments is fixed at 5. This group of experiments examines the impact of the fog servers' computing capacity on the weighted total cost. As can be seen from Fig. 5, the distinctive feature is that the weighted total cost of the all-local computing curve does not change with the capacity of the fog computing servers. This is expected, since the fog servers' computing resources have no effect on the local computing process. The other three curves show a downward trend as F_c increases: the larger F_c is, the more computing resources the server can allocate to users, which reduces processing time and energy consumption. The curve of our proposed DQN method stays at the bottom throughout and performs best. The simulated annealing algorithm has a small gap to DQN when F_c < 7 GHz, and thereafter performs as well as DQN. The full LVC computing curve varies greatly: when F_c < 3 GHz, its weighted total cost is much higher than the results of the other four algorithms, whereas for F_c > 8 GHz its performance is almost the same as the proposed reinforcement-learning-based algorithm. This contrast illustrates that full LVC offloading cannot simply be applied when computing resources are limited, while with abundant computing resources even a simple benchmark method may achieve good performance. Therefore, the algorithm must be chosen flexibly according to the actual situation.
The abscissa in Fig. 6 is the size of the uploaded data, and the ordinate is again the weighted total cost. The purpose of this set of experiments is to explore how the algorithms perform under different types of applications (which usually upload different amounts of data). The computing capacity limit of the fog servers is F_c = 5 GHz. As can be seen from Fig. 6, as the size of the uploaded data increases, the curves of all algorithms trend upward, since a larger amount of data means more time spent uploading and processing it. This process also increases energy consumption accordingly, resulting in an increase in overall system overhead. Fig. 6 shows that our proposed DQN method works best, as it rises the slowest among the five lines. The upward trend of the full LVC computing curve is much steeper than that of the other four curves, and the performance gap between the other four algorithms keeps growing. It can be seen that when the application is more complicated and a large amount of data is required to complete the calculation, choosing LVD computing yields a much greater reduction in the total cost than LVC computing.
Moreover, it can be seen that an increase in the size of the uploaded data leads to a significant increase in the total cost of the fog computing system. The reason is that the amount of uploaded data and the amount of required computation are positively correlated, so an increase in the amount of uploaded data correspondingly increases the computation cost.

Conclusion
The amount of data generated in the IoV is extremely large. To process these data effectively, a network structure in which fog computing nodes are deployed at both macrocell base stations and vehicle nodes is considered. In this structure, a vehicle that generates computing tasks can choose to process the relevant data by itself; to further improve processing efficiency, computing tasks can also be migrated to different types of fog nodes. However, the system power consumption and network delay differ across migration methods and migration ratios. We propose a deep reinforcement learning method, DQN, that integrates deep learning and Q-learning. The network considers the optimisation problem in each time slot and is in fact a pseudo-dynamic strategy. Considering that users' positions are dynamic at each moment and that task requirements also change dynamically, future work may consider combining business load forecasting to develop long-term optimisation strategies.
Mobile edge computing has broad application prospects in the coming 5G era, and research on computing offloading and resource allocation algorithms is a hotspot in this field. Based on reinforcement learning and deep learning technologies, a new algorithm different from traditional optimisation is proposed to solve the computing offloading problem, and it achieves good results. However, several directions remain worthy of further research, and there is still room for improvement. Future research will focus on the following aspects: (i) In addition to providing computing power, a fog computing server also has a certain storage capacity and can cache popular resources locally. If the resources that a user expects to obtain through computation offloading already exist in the cache, the computation no longer needs to be executed. Therefore, combining caching with computation offloading is a research direction worth pursuing. (ii) At present, the vehicle sample library only contains cars, buses and trucks, so the number of categories is still small. Subsequent work can collect more categories of vehicles and increase the number of sample images for each vehicle type. (iii) In the field of pattern recognition, the generalisation ability of a model is also very important. We should not only pursue the model's fitting ability on the training data but also improve its generalisation and adaptive ability. This is also an aspect to be studied and improved in follow-up work.

Acknowledgments
This work was supported by the Education Science Planning Project of the 13th Five-Year Plan from the Guangdong Provincial Department of Education (No. 2018GXJK356).