City buses’ future velocity prediction for multiple driving cycle: A meta supervised learning solution

Vehicular velocity prediction is of great significance to intelligent transportation systems, as it provides a possible future velocity sequence for the vehicle's decision-making system. A velocity prediction method via meta learning is proposed, which provides an adaptive and generative framework for multiple driving cycles. The prediction model is devised using a deep neural network structure. The model is trained by the recently proposed meta-supervised learning, which ensures that one trained model can adapt to multiple driving cycles. The complete framework consists of three parts: pre-training, fine-tune-training and real-time prediction. It is tested by predicting the future velocity of hybrid electric city buses in a variable traffic scenario. The average prediction error for the 3, 5 and 10 s horizons is 0.51, 0.63 and 0.88 m s⁻¹, which is 25.9%, 16.78% and 7.47% better than that of a model trained by the conventional supervised learning method. As suggested, the proposed prediction method is effective and could meet the requirement of energy-saving control for hybrid electric city buses. With further study, this method may also find application in the fields of driving behaviour prediction and transportation mode recognition.


INTRODUCTION
Velocity prediction provides a possible future velocity sequence for the vehicle's decision-making system. The predicted velocity is of great significance to vehicle intelligence, safety-assisted driving and vehicle dynamics control [1,2]. Especially in the field of hybrid electric vehicles, a suitable velocity prediction model is very helpful for energy-saving optimization in the framework of model predictive energy management [3][4][5][6]. In this paper, velocity prediction mainly refers to predicting the short-term future velocity during a city bus's driving mission. In the absence of additional traffic information, the future velocity sequence can be inferred from historical driving cycles [7]. Although driving behaviour itself has great uncertainty and randomness, some hidden rules are still involved in the driving process [8]. Such a rule is reflected in the following fact: future velocity sequences that follow similar historical velocity sequences are also likely to be similar [9]. This fact is the theoretical basis for constructing a velocity prediction model. As city buses usually travel in a specific area, the variation of their velocity is believed to follow several fixed patterns. Therefore, the velocity prediction models in existing research are often constructed from one representative driving cycle. When city buses travel in similar traffic scenarios, the performance of such a model satisfies the requirements. However, city buses may experience multiple, different and varying traffic scenarios. When the actual driving cycle is unlike the assumed typical driving cycle, a velocity prediction model based on one single representative driving cycle no longer performs well.
Meanwhile, with the development of intelligent transportation systems, relevant research data is more readily available than ever before, which has led directly to some meaningful research on variable traffic scenarios [10,11]. In ref. [12], the influencing factors of urban carbon emission are studied, which helps energy conservation and emission reduction in traffic systems; in ref. [13], lane-changing behaviour on urban streets is studied, so that more details about lane changing can be added to traffic simulation systems. Thus, from the perspective of practical applications, how to construct a velocity prediction model for multiple driving cycles has become an actual demand that is worthy of further study.

Literature review
Short-term velocity prediction models currently used in energy management strategies are achieved through two main approaches: Markov chains and artificial neural networks. In ref. [14], a Markov Chain Monte Carlo method is adopted to forecast velocity sequences for the energy management system of a plug-in hybrid electric bus. Based on the root-mean-square error of the prediction, the authors conclude that a sub-method called "multi-scale and single-step" is better than the other, called "single-scale and multi-step". In ref. [15], the Markov-based velocity prediction method is studied further; the proposed method with a variable prediction horizon could improve prediction accuracy by about 7% compared with the traditional fixed-horizon method. Typically, the more historical data used, the more accurate the prediction. To make full use of historical information for a better prediction, the number of stages of the Markov chain must be increased. However, a multistage Markov chain usually has a high computation cost [16], which limits the application of such a method. To use more historical traffic information without increasing the computation cost, an artificial neural network should be a better choice.
Artificial neural networks, as well as deep neural networks, are successful methods for time series forecasting [17][18][19]; they have a strong capability for predicting non-linear dynamic behaviours. In ref. [20], three kinds of neural networks are applied to construct the velocity prediction model. The results suggest that the prediction accuracy of all three neural networks is better than that of the Markov chain, and that none of the three differs obviously from the other two. Although the computation cost of training the model cannot be ignored, performing velocity prediction with a trained network takes little time.
In addition to directly using neural networks for velocity prediction models, some combination methods are also proposed [21,22]. In ref. [21], a BP neural network is used to compensate for the prediction error of Markov-based methods. In ref. [22], two Markov-based prediction models for different driving cycles are used alternately; the switching is determined by a driving cycle classifier constructed by neural network.
By using these combination methods, the velocity prediction model becomes more adaptable to multiple driving cycles. In other words, these proposed methods also reflect the demand for constructing velocity prediction models for multiple driving cycles. Taking the city bus as an example, the periodically varying traffic conditions correspond to at least two states of the city bus's driving cycle: a congestion state and a non-congestion state. Under different states, the corresponding typical driving cycles are also different. Thereby, the prediction accuracy of a model constructed from one specific driving cycle is usually lower than expected when the model is used for a different driving cycle. Evidently, existing traditional methods have difficulty solving the problem of velocity prediction for multiple driving cycles. A direct solution for velocity prediction over multiple driving cycles has not been proposed yet.
As an extension of regular machine learning methods, one of the main purposes of putting forward meta-learning theory is improving the generalization ability of machine learning models [23,24]; this primary motivation is consistent with our expectation of the velocity prediction model. According to the latest literature, many related approaches have been proposed for different specific purposes [25][26][27][28]. Among them, a general solution for learning multi-tasks is proposed in ref. [29]. By learning multiple tasks in parallel, the model parameters are updated by the direction of synthesis optimization of multi-objective functions, thus a basic model with the best adaptability for all tasks could be obtained. The idea could be further generalized to the area of velocity prediction.

Motivation and innovation
Stated thus, constructing a velocity prediction model for multiple driving cycles has obvious potential in practical applications. Meanwhile, meta-learning theory provides an effective approach to realize multi-task learning. A multiple-task training method based on meta-learning theory is therefore adopted, for the first time, to train the velocity prediction model, which represents a new exploration of applying machine learning in the field of velocity prediction for hybrid electric vehicles. The main contributions are as follows:
a. An online application framework of city buses' velocity prediction model for multiple driving cycles is proposed. The framework covers a complete workflow from the prediction model's training to the online prediction of future velocity.
b. A multi-task training method for the velocity prediction model based on meta-learning theory is proposed. The method can train the model under the condition of multiple driving cycles.
c. The influence of iteration times on prediction accuracy is fully discussed and a principle for selecting an appropriate iteration time is proposed, which helps keep the prediction performance stable in real traffic scenarios.

Organization of this paper
The rest of this paper is organized as follows. Section 2 describes the whole framework of the velocity prediction model. Section 3 presents the complete training process. In Section 4, training results for two testing driving cycles are given, together with a discussion of how to select a suitable iteration time. Testing results under an actual traffic scenario are shown in Section 5. The conclusions are given in the last section.

Velocity prediction framework
A framework to fully describe the velocity prediction method is shown in Figure 1. The whole process consists of three parts: pre-training step performed offline, fine-tune-training step performed online and velocity prediction in real-time.
In the pre-training step, a multi-task data set is constructed by collecting vehicle's historical driving information in various environments. The data set is applied to the pre-training process through meta-learning and the primitive model is trained as a base model.
In the fine-tune-training step, the vehicle travels in a specific scenario, and a corresponding training data set can be collected from real-time traffic information. By fine-tune-training, the base model is updated to a fine-tune model with higher adaptability to the current driving condition.
In the real-time prediction step, the fine-tune model that meets the accuracy requirement is used as the velocity prediction model for online application. Once the prediction accuracy becomes obviously worse, fine-tune-training is activated again to update the model's parameters using the latest training data collected in the real traffic scenario.
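The retraining trigger described above can be sketched as a small error monitor. This is a minimal illustration rather than the authors' implementation: the class name `OnlineMonitor`, the RMSE metric, the window length and the threshold value are all assumptions introduced here.

```python
from collections import deque


def rmse(pred, ref):
    """Root-mean-square error between two equal-length velocity sequences (m/s)."""
    return (sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)) ** 0.5


class OnlineMonitor:
    """Tracks recent prediction errors and flags when fine-tune-training should re-run."""

    def __init__(self, threshold, window=30):
        self.threshold = threshold          # assumed accuracy requirement, m/s
        self.errors = deque(maxlen=window)  # most recent per-step errors

    def update(self, pred, ref):
        """Record one error; return True once the windowed mean exceeds the threshold."""
        self.errors.append(rmse(pred, ref))
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold


# Synthetic one-step predictions against a constant 10 m/s reference.
monitor = OnlineMonitor(threshold=1.0, window=3)
flags = [monitor.update([v], [10.0]) for v in (10.2, 10.1, 12.5, 13.0, 13.5)]
```

Averaging over a window rather than reacting to a single bad prediction is one way to read "obviously worse"; other criteria would fit the text equally well.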
The mapping from the vehicle's historical velocity sequence to its future velocity sequence is expressed as the function shown in Equation (1), where F(*) is defined as a deep neural network, V(t ± *) is the velocity at different times, and ΔH and ΔP represent the lengths of the historical velocity sequence and the predicted velocity sequence, respectively.
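A form of this mapping that is consistent with the symbol definitions above can be written as follows. The exact index offsets (a history window ending at t and a prediction window starting at t + 1) are an assumption made for illustration.

```latex
\begin{equation}
\bigl[\,V(t+1),\; V(t+2),\; \dots,\; V(t+\Delta P)\,\bigr]
  = F\bigl(V(t-\Delta H+1),\; \dots,\; V(t-1),\; V(t)\bigr)
\end{equation}
```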
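The network consists of one input layer, one output layer and several hidden layers. The specific structure is shown in Figure 2. The numbers of input-layer and output-layer neurons correspond to the lengths of the historical and predicted velocity sequences, ΔH and ΔP. The loss function used for training is defined in Equation (2),
where $V_{t+i}$ and $V_{t+i}^{r}$ are the reference and prediction velocity at time $t+i$, respectively.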
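To make the input and output shapes concrete, the sketch below slices a velocity trace into (history, future) pairs and runs them through a small fully connected network. It is a hedged illustration only: the 1 Hz sampling, the hidden-layer width of 16, the tanh activation and the mean-squared-error loss are assumptions, and the function names are hypothetical.

```python
import math
import random


def make_samples(velocity, n_hist, n_pred):
    """Slice a velocity trace (assumed 1 Hz, m/s) into (history, future) pairs."""
    pairs = []
    for t in range(n_hist, len(velocity) - n_pred + 1):
        pairs.append((velocity[t - n_hist:t], velocity[t:t + n_pred]))
    return pairs


def init_mlp(sizes, seed=0):
    """Random weights and zero offsets for layer sizes such as [n_hist, 16, n_pred]."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for k, (w, b) in enumerate(params):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if k < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x


def mse(pred, ref):
    """Mean squared error over the prediction horizon (an assumed loss form)."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


trace = [0.5 * t for t in range(40)]            # synthetic ramp, not measured data
samples = make_samples(trace, n_hist=20, n_pred=5)
net = init_mlp([20, 16, 5])                     # 20 inputs, 5 outputs
hist, future = samples[0]
loss = mse(forward(net, hist), future)
```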

The training process for multiple driving cycles
The training process mainly includes the first two steps in our framework: Pre-training and fine-tune-training. Pre-training is a task-level training, aiming at updating model's parameters to an optimal state. Fine-tune-training refers to a further training based on the well-trained base model.
Pre-training is achieved through a model-agnostic meta-learning method [29]. The idea was originally used for solving image classification problems; it has not been used for training velocity prediction models. The algorithm in Table 1 guides the training for multiple groups of supervised learning tasks,
where $\theta$ denotes the model's generalized parameters; $\nabla_{\theta}(*)$ is the gradient of $*$ with respect to $\theta$; $f(\theta)$ is the output value with parameters $\theta$; and $L_{T_i}(*)$ is the $i$th task's loss function.
Specifically, our tasks are supervised learning with historical velocity and future velocity from different driving cycles. For this purpose, the training process is described in Figure 3 and the details are as follows.
For a more convenient description, all the weights and offsets of the network are expressed as one unified variable: $\theta$. To begin with, training-data pools should be defined; each driving cycle corresponds to an independent data pool. The data pools are named separately as Pool $i$, $i = 1, 2, 3, \dots, n$. Each velocity vector for training joins a historical velocity sequence with the corresponding future velocity sequence. In order to update $\theta$, the following steps should be performed:
a. For each data pool, perform the following three sub-steps:
(a:1) By random sampling, extract a batch of data vectors (including several sets of velocity vectors) from the training-data pool. Then divide them equally into two sub-sets: a support set and a query set.
(a:2) Using the support set, update the original parameters from $\theta^{k}$ to $\theta_{i}^{k}$ by gradient descent.
(a:3) Based on the updated parameters, use $F(\theta_{i}^{k})$ and the training data provided by the query set to calculate the loss function $L_i$ as Equation (2) shows.
b. With the losses of each data pool, calculate the total loss $L_{task}$ by Equation (3), which includes a positive reference coefficient.
c. Calculate the derivative of $L_{task}$ with respect to $\theta$ and perform gradient descent to update the original parameters as Equation (4) shows, where the coefficient with subscript "meta" is the learning rate.
d. Repeat the above steps until a satisfactory base model is obtained.
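The loop over data pools can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the paper's training code: a linear model stands in for the deep network F, the pools are synthetic, the pool losses are summed with unit weights, and the outer update is a first-order approximation that reuses the query-set gradient at the adapted parameters instead of differentiating through the inner step. All function names and learning rates are hypothetical.

```python
import random


def predict(theta, x):
    """Linear stand-in for F: weights theta[:-1] plus offset theta[-1]."""
    return sum(w * xi for w, xi in zip(theta, x)) + theta[-1]


def loss_and_grad(theta, batch):
    """Mean squared error over (x, y) pairs and its gradient w.r.t. theta."""
    grad = [0.0] * len(theta)
    loss = 0.0
    for x, y in batch:
        err = predict(theta, x) - y
        loss += err * err / len(batch)
        for j, xj in enumerate(list(x) + [1.0]):
            grad[j] += 2.0 * err * xj / len(batch)
    return loss, grad


def meta_step(theta, pools, rng, batch_size=8, lr_inner=0.05, lr_meta=0.05):
    """One pre-training iteration over all data pools."""
    total_loss = 0.0
    total_grad = [0.0] * len(theta)
    for pool in pools:
        batch = rng.sample(pool, batch_size)                  # (a:1) random batch
        support, query = batch[:batch_size // 2], batch[batch_size // 2:]
        _, g = loss_and_grad(theta, support)                  # (a:2) adapt on support set
        theta_i = [p - lr_inner * gj for p, gj in zip(theta, g)]
        l_i, g_i = loss_and_grad(theta_i, query)              # (a:3) loss on query set
        total_loss += l_i                                     # total loss, unit weights
        total_grad = [a + b for a, b in zip(total_grad, g_i)]
    # Outer update (first-order approximation of the derivative w.r.t. theta).
    new_theta = [p - lr_meta * gj for p, gj in zip(theta, total_grad)]
    return new_theta, total_loss


def make_pool(rng, w0, w1, n=200):
    """Synthetic pool: the target is a fixed blend of two normalised inputs."""
    pool = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        pool.append((x, w0 * x[0] + w1 * x[1]))
    return pool


rng = random.Random(0)
pools = [make_pool(rng, 0.1, 0.9), make_pool(rng, 0.5, 0.5)]  # two "driving cycles"
theta = [0.0, 0.0, 0.0]
loss_before = sum(loss_and_grad(theta, p)[0] for p in pools)
for step in range(500):
    theta, task_loss = meta_step(theta, pools, rng)
loss_after = sum(loss_and_grad(theta, p)[0] for p in pools)
```

A full second-order update would differentiate the query loss through the support-set step; the first-order shortcut is used here only to keep the example short.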
As pre-training has completed the learning for multiple driving cycles, the following fine-tune-training only needs to be performed several times with training data collected from specific traffic scenarios. In this step, the parameters are still updated through gradient descent, but the training data used here corresponds to one single data pool. Thus, the fine-tune-training step is a single-task training whose purpose is to enhance the prediction model's adaptability to specific traffic scenarios. Figure 4 describes the relationship between pre-training and fine-tune-training. Through pre-training, the model's parameters are updated to a sub-optimal area in parameter space, where the parameters are very close to the optimal parameters for each driving cycle. Through fine-tune-training, the parameters can then be easily updated to the optimal area.

Driving cycle selection
Normally, the route of city buses is basically fixed. Thus, the variation of velocity is mainly affected by traffic conditions which have periodic variation regularity. To demonstrate the performance of the proposed method, we assume that buses' driving scenario is just switched between two different states: Non-congestion state and congestion state. Based on the analysis of characteristic parameters, the two states could be represented by two different driving cycles: UDDS (urban dynamometer driving schedule) and WVUSUB (West Virginia suburban driving schedule). These two driving cycles and their characteristic parameters are described in detail in Figure 5 and Table 2, respectively. As is shown, both the duration time and driving distance of the two selected driving cycles are basically the same, which indicates that the driving process represented by these two driving cycles is likely to have a greater similarity in space domain; while the proportion of accelerating status, decelerating status, cruising status and parking status are not quite the same, which suggests that the specific traffic condition corresponding to the driving cycles are different.
In particular, the 95th-percentile velocities of UDDS and WVUSUB are 22.71 and 17.23 m s⁻¹, respectively, which shows that the former driving cycle has a higher common velocity than the latter. The RMS accelerations of the two driving cycles are 0.3863 and 0.1840 m s⁻², respectively, which further indicates that the traffic condition of UDDS is better than that of WVUSUB.
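Characteristic parameters of this kind can be computed directly from a velocity trace. The sketch below is illustrative only: the nearest-rank percentile definition, the 0.1 m s⁻² and 0.1 m s⁻¹ thresholds used to separate the four driving statuses, and the function name `cycle_features` are assumptions, and the input is a synthetic trace rather than UDDS or WVUSUB data.

```python
import math


def cycle_features(velocity, dt=1.0, a_eps=0.1, v_stop=0.1):
    """Characteristic parameters of a driving cycle sampled every dt seconds (m/s)."""
    accel = [(b - a) / dt for a, b in zip(velocity[:-1], velocity[1:])]
    ordered = sorted(velocity)
    rank = max(1, math.ceil(0.95 * len(ordered)))   # nearest-rank 95th percentile
    n = len(accel)
    parking = sum(1 for v, a in zip(velocity, accel)
                  if v <= v_stop and abs(a) <= a_eps)
    accelerating = sum(1 for a in accel if a > a_eps)
    decelerating = sum(1 for a in accel if a < -a_eps)
    cruising = n - parking - accelerating - decelerating
    return {
        "v95": ordered[rank - 1],
        "rms_acc": math.sqrt(sum(a * a for a in accel) / n),
        "max_acc": max(accel),
        "share_accelerating": accelerating / n,
        "share_decelerating": decelerating / n,
        "share_cruising": cruising / n,
        "share_parking": parking / n,
    }


# Synthetic stop-go-stop trace in m/s, one sample per second.
feats = cycle_features([0, 0, 1, 2, 3, 3, 3, 2, 1, 0, 0])
```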
In addition, the maximum accelerations of the two driving cycles are similar to each other and do not exceed the maximum limit (1.5 m s⁻²) of urban buses under normal conditions, which shows a similar ultimate dynamic performance of the city bus. Therefore, the two driving cycles selected in this section can be considered to represent the driving law of the city bus in two different traffic scenarios.

Analysis of training results
Figure 6 shows the variation of the loss function. As the iteration time increases, the loss function of meta-training decreases. For conventional supervised learning, a decrease of the loss function usually means an improvement of prediction accuracy, while in a meta-learning process such an inference cannot be made directly. A more convincing approach is to test velocity prediction accuracy with data not involved in meta-training.
Therefore, a periodically executed model-testing step is added to validate whether the model's prediction accuracy for specific driving cycles improves as the iteration times increase. As shown in Figure 6, with the increase of meta-learning's iteration times, the prediction accuracy under both driving cycles improves at the same time.
To facilitate the latter discussion, an indicator called loss difference is defined in Equation (5).
where $L_{\mathrm{train}}$ and $L_{\mathrm{test}}$ are the loss values of the training data and testing data, respectively. During the pre-training process, the loss difference ΔL represents the influence of the training method on the testing results. During the model's training, a ΔL approaching zero means that the testing error is gradually approaching the training error, which further indicates that the adopted training method indeed improves the prediction accuracy. For a ΔL that gradually deviates from zero, the prediction accuracy decreases.
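A form of the loss difference consistent with this description is the plain difference of the two loss values. The sign convention (testing loss minus training loss) is an assumption.

```latex
\begin{equation}
\Delta L = L_{\mathrm{test}} - L_{\mathrm{train}}
\end{equation}
```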
The following two figures show the variation of ΔL in two different cases. In Figure 7, meta-training is performed; prediction accuracy of two driving cycles are tested. In Figure 8, supervised learning is performed; the two driving cycles are used for training and testing in turn.
As seen in Figure 7, both loss differences gradually approach zero, which suggests that updating the parameters via meta-training does help to improve the prediction accuracy for each driving cycle. In Figure 8, the loss difference has the opposite trend: through regular supervised training, the model's prediction accuracy on the driving cycle not used for training does not improve.
Fine-tune-training could be performed with data sets provided by any driving cycle; the main purpose is to further improve the adaptability of the velocity prediction model to the actual driving cycle. The loss functions of both pre-training and fine-tune-training are recorded in Figure 9.
As is shown, the loss function decreases rapidly during pre-training and slowly during fine-tune-training. As a result, the updating of the model's parameters mainly occurs in the pre-training stage, which shows the importance of pre-training in constructing the velocity prediction model.
The effect of fine-tune-training on the loss function is shown in Table 3. For a more general conclusion, the complete training (5000 iterations each of pre-training and fine-tune-training) is repeated 100 times and the average results are recorded. As the results show, the average error under the UDDS and WVUSUB cycles can be reduced by 4.0153% and 4.0113%, respectively. Although the reduction ratio is not large, the continuously decreasing loss functions indicate that the fine-tune-training step can still further reduce the prediction error. Prediction results for the selected driving cycles are shown in the following two figures. Figure 10 gives the prediction results of the vehicle's future velocity and Figure 11 records the error.

Selection of iteration times
Since fine-tune-training should be performed online, its iteration times should be kept within a reasonable range to reduce the time cost. Based on the same initialized parameters, 10 sets of the complete training process are performed, with each set corresponding to a different iteration time of pre-training. The variations of the testing errors during fine-tune-training are shown in Figure 12. For better observation, the results are expressed by fitting curves. With different pre-training iteration times, the effects of fine-tune-training on the variation of the testing error are different. In some cases, although the initial testing error is large, it is effectively reduced through fine-tune-training; in other cases, fine-tune-training seems to be of little help for improving prediction accuracy. Thus, selecting an appropriate iteration time for pre-training directly improves the training efficiency of fine-tune-training, which helps to reduce the time cost of the prediction model's online application.
Here we assume that the maximum iteration times of fine-tune-training for online application is limited to 10,000. Under this condition, testing errors for different iteration times of pre-training are recorded. The pre-training iteration times of the 10 groups lie on the following grid: [5000:1250:16,250], that is, from 5000 to 16,250 in steps of 1250.
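The grid and the selection rule can be illustrated with a short sketch. It assumes one testing-error series per pre-training setting and ranks the settings by the mean of the last 100 values; the averaging over driving cycles is left out for brevity. The error curves below are placeholders constructed to bottom out at 11,250 and are not measured data, and the function name is hypothetical.

```python
def pick_pretrain_iterations(errors_by_iter, tail=100):
    """Return the pre-training iteration count with the lowest mean tail error."""
    summary = {}
    for n_iter, errs in errors_by_iter.items():
        last = errs[-tail:]                       # last `tail` fine-tune iterations
        summary[n_iter] = (min(last), sum(last) / len(last), max(last))
    best = min(summary, key=lambda n: summary[n][1])
    return best, summary


# The 10 pre-training settings: 5000, 6250, ..., 16250.
grid = list(range(5000, 16251, 1250))

# Placeholder error curves (m/s), shaped only so that the example runs.
placeholder = {
    n: [0.6 + abs(n - 11250) / 50000 + 0.01 * (i % 3) for i in range(300)]
    for n in grid
}
best, summary = pick_pretrain_iterations(placeholder)
```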
Testing error of the last 100 iteration times is recorded in Figure 13. The three endpoints on the line segment correspond to the maximum, minimum and average testing error, respectively.
In Equation (6), $n$ is the number of driving cycles and $L_i$ is an equivalent error for the $i$th driving cycle defined by Equation (7), in which $k$ is the grid number for dividing iteration times in pre-training. Based on the testing data in this section, $n = 2$ and $k = 10$; the values of $L$ for different pre-training iteration times are shown in Table 4.
With 11,250 iteration times, both the minimum and the average values of the testing error are the lowest, which suggests that this number of pre-training iterations is suitable.

Testing route selection
Testing data used in this section is collected by a hybrid electric city bus travelling in an actual traffic environment. The testing route is shown in Figure 14. The total distance of this route is over 30 km. Along the route, the city bus passes through several roads with different velocity grades in turn; the traffic conditions of each region along the route are quite different as well. Therefore, this route is suitable for collecting driving data under different traffic conditions, which meets the requirements of this study. The testing data is collected with one hybrid electric city bus driven by the same driver. Thus, it can be considered that both driving style and vehicle status are reflected in the variation of velocity.

Testing results and discussion
Based on the above route, we collected real-time velocity information for several periods. Four short driving cycles (each lasting 1500 s) recorded under different traffic conditions are extracted to generate training data for pre-training; a long driving cycle (lasting 2.5 h, or 9000 s) is extracted to perform fine-tune-training and test the real-time prediction. Before the testing, pre-training is performed offline. In the first 1000 s, only fine-tune-training is performed; in the remaining 8000 s, the complete prediction process is executed. Velocity prediction results at four different times are shown in Figure 15, which indicates that the prediction model maintains high prediction accuracy during the long driving cycle (prediction results for the whole testing period are shown in Figure 18). The cumulative percentage of prediction errors is shown in Figure 16. As can be seen, 90% of the error values are less than 1.7 m s⁻¹, and 60% are lower than 1 m s⁻¹. Compared with the selected driving cycles, the variation of the vehicle's velocity in an actual traffic environment is more complex, which leads to the decline of prediction accuracy in the real traffic scenario.
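Statistics such as "90% of the error values are less than 1.7 m s⁻¹" are read off a cumulative distribution of absolute errors. A minimal sketch of that computation follows; the nearest-rank quantile rule and the function names are assumptions, and the error list is synthetic rather than taken from the test drive.

```python
import math


def share_below(errors, limit):
    """Fraction of absolute prediction errors that fall below a limit (m/s)."""
    return sum(1 for e in errors if abs(e) < limit) / len(errors)


def error_quantile(errors, q):
    """Smallest absolute error that covers at least a fraction q of the samples."""
    ordered = sorted(abs(e) for e in errors)
    rank = max(1, math.ceil(q * len(ordered)))    # nearest-rank rule
    return ordered[rank - 1]


# Synthetic signed prediction errors in m/s.
errors = [0.2, -0.4, 0.9, 1.2, -1.6, 0.3, 0.7, 2.1, 0.5, 1.0]
below_one = share_below(errors, 1.0)
bound_90 = error_quantile(errors, 0.9)
```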
Besides, several comparative tests are performed, and the results are shown in Figure 17. Figure 17(a) gives the effect of the historical horizon on prediction accuracy. As shown, 20 s is the better choice for the historical velocity sequence. According to Figure 17(b), the proposed velocity prediction method via meta-training and periodic fine-tune-training has obvious advantages over the other comparative methods.
In addition, we found that the prediction results of two incomplete training methods are similar (one is meta-training with fine-tune-training; the other is meta-training without fine-tune-training), which suggests that fine-tune-training conducted only once does not help much to further improve the prediction results. Periodically performed fine-tune-training is very useful for velocity prediction that lasts for a long time.
Compared with supervised learning, prediction errors of the proposed method are shown in Table 5. The accuracy of 3, 5 and 10 s horizons are improved by 25.9%, 16.78% and 7.47%, respectively. Such an improvement indicates that the proposed method in this paper could meet the requirements of velocity prediction mission in actual traffic scenarios and the prediction accuracy is better than that of the commonly used supervised learning methods.
In the end, it must be clear that the prediction method proposed in this paper is not only applicable to future velocity prediction of urban buses with hybrid electric system. The method itself is a relatively general method, which may be used for velocity prediction of other types of driving vehicles in the traffic environment. Moreover, it could be applied in other prediction fields, such as, traffic-state recognition based on traffic flow parameters.
As far as the research of predictive energy management is concerned, this method is effective. If the method is applied in other scenarios, the following details should be carefully considered:
1. Generally speaking, the driving rules of public transport buses travelling in an urban environment are easier to recognize than those of passenger cars. As a result, the prediction accuracy for the former may be better than for the latter.
2. The neural network structure proposed in Section 2 is suitable for the velocity prediction problem. If the complete prediction framework is to be migrated to other areas, the network should be adjusted to a new structure more suitable for the new problems.
Last but not least, we would like to explain the application boundary of the proposed method, which is no less important than the method itself. For city buses' velocity prediction tasks in variable traffic scenarios, similarities and differences among tasks co-exist. The similarities among different prediction tasks can be captured in the model during the pre-training process as far as possible, which effectively reduces the time cost of the fine-tune-training process and is conducive to the online application of the whole method. When multi-task prediction problems with almost no common characteristics have to be dealt with, the proposed method may not work.

CONCLUSIONS
A velocity prediction model with adaptability to varying driving cycles is commonly required by model predictive energy management strategies for hybrid electric vehicles. For this purpose, a method for predicting city buses' future velocity over multiple driving cycles via meta learning is proposed in this paper.
The complete prediction framework includes three main parts: pre-training, fine-tune-training and velocity prediction, which are performed offline, online and in real time, respectively. In the pre-training step, a multi-task training method based on meta-learning is proposed to train the model with multiple driving cycles. Training results for the selected driving cycles suggest that meta-training in the pre-training step does help to improve the prediction accuracy for each driving cycle at the same time.
For online application, the influence of iteration times on the prediction accuracy is fully discussed and a principle for selecting a suitable iteration time is given. Then, complete testing is executed for real traffic scenarios. Testing results show that the proposed framework works very well in the actual scenario. The prediction performance remains stable throughout the long testing period.
Compared with supervised learning, prediction errors of the proposed method which includes meta-training and periodic fine-tune-training are much lower in practice. The accuracy for 3, 5 and 10 s horizons is improved by 25.9%, 16.78% and 7.47%, respectively.