Evaluation of emergency driving behaviour and vehicle collision risk in connected vehicle environment: A deep learning approach

In the latest connected vehicle (CV) message standards, including SAE J2735-2016 and T/CSAE 53-2017, the basic safety messages (BSMs) are designed specifically as effective measures for traffic safety management and applications. In this study, a testbed on the Nanchang-Jiujiang Intelligent Highway in Jiangxi, China is presented as an example, and the basic architecture and key technologies are introduced for proactive traffic safety utilisation, where the core BSM data are sorted and implemented to perceive and predict risky driving behaviours in a field environment. On this basis, an accurate insight into time-critical driving safety issues can be achieved by investigating raw BSM data, such as the inter-vehicle distance, driver manipulation, vehicle speed, and acceleration/deceleration. Furthermore, to take advantage of connected vehicle information, perceive the high uncertainty of driving behaviours during an emergency situation, and evaluate the driving safety in mixed traffic scenarios, a long short-term memory (LSTM) based deep learning framework is introduced to build a multi-horizon vehicle crash risk prediction model using continuous BSMs as the inputs. The experimental results demonstrate the significance of connected vehicle data and deep learning algorithms for improving driving safety and promoting widespread deployment and application of connected vehicles.


INTRODUCTION
Sustainable road safety is a high priority for traffic management agencies, auto parts suppliers, and freight services enterprises. To eliminate serious accidents and mitigate the severity of their outcomes, improvements have been made, ranging from enhanced roadside infrastructures to the development of active vehicle-based systems. A typical solution is based on advanced driver assistance systems (ADAS), which employ vehicle-embedded sensors to monitor ego-vehicle movement and surrounding traffic to detect risky driver behaviours and provide appropriate warnings. By comparison, the connected vehicle (CV) system applies vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication to disseminate real-time information between connected vehicles, such as vehicle motions, driver manipulations, and traffic dynamics. This information enhances the driver's perception and allows ADAS to examine emergency situations that unilateral approaches cannot immediately recognise. With access to the information of the surrounding vehicles and infrastructures, safety-related events and accident causation mechanisms can be evaluated and predicted more accurately. Although it has been illustrated that 60% of vehicle accidents could be avoided if predicted 0.5 s ahead of time and the pertinent information forwarded to drivers in advance [1], it is challenging to capture the uncertain and nonlinear movements of the vehicle in complex and changeable road traffic as well as extreme actions of the driver, such as hard braking and turning of the steering wheel during an emergency. Most methods applied in this field evaluate and predict the driving safety at the immediate moment by inferring the probability of a vehicle collision from the vehicle motion and dynamics, driver behaviours, and other factors at the present moment, rather than exploiting time-series information.
A basic safety message (BSM) is defined as an essential subset of connected vehicle standard protocols for collecting and transmitting safety-related information through V2V and V2I communication, which in turn realises proactive vehicle collision avoidance functions, such as forward collision warning [2], intersection collision warning [3], blind spot warning [4], and lane change warning [5]. Although these CV-enabled solutions have been repeatedly inspected and demonstrated in typical emergency scenarios, few of them have made full use of the BSM content. In addition, most of these CV safety applications are implemented based on limited BSM information, that is, the calculated time to collision (TTC) index, or take only instantaneous BSM data into account without comprehensively investigating the complete series of BSMs.
In most human-driving processes, drivers keep their ego-vehicles at a safe distance from the surrounding vehicles mainly based on their visual sight and subjective judgment. In the case of an emergency, drivers brake through a series of measures to increase the safety distance from approaching vehicles so as to mitigate the collision risk immediately. Therefore, in this study, we assume that emergency braking is a straightforward criterion to reflect the possibility of a vehicle crash or pre-collision situation and can be predicted to provide an effective warning to the driver in time and reduce the probability of accidents. To address this problem, a comprehensive BSM dataset was collected and integrated in a real-world connected vehicle testbed, where vehicle dynamics, the motion status, and driver behaviours are recorded. Furthermore, a long short-term memory (LSTM) based deep learning framework is proposed to forecast the driver behaviour series and vehicle collision risk in multiple time horizons. In particular, the memory gate and forgetting gate mechanism of the LSTM model is well suited to extracting meaningful information from a series of continuous BSMs to predict the accelerations/decelerations caused by emergency braking. These findings and implications can be used to deploy a proactive traffic safety system based on BSMs.
The main contribution of this paper, which is the V2X enhanced proactive driving safety solution, will be fully described in Section 4. One of the goals of our study is to present our established testbed on the Nanchang-Jiujiang Intelligent Highway in Jiangxi, China as an example, and introduce how the core BSM data are sorted and implemented to capture the risky driving behaviours in a connected vehicle environment. On this basis, we can achieve an accurate insight into time-critical driving safety issues by investigating raw BSM data, such as the inter-vehicle distance, driver manipulation, vehicle speed, and acceleration/deceleration. Another goal of our research is to take advantage of a long short-term memory (LSTM) based deep learning framework to perceive the high uncertainty of driving behaviours during an emergency situation and evaluate the driving safety in mixed traffic scenarios. This framework has also been developed as a built-in ADAS prototype and tested in a practical situation.
The remainder of this paper is organised as follows: Section 2 summarises related studies on vehicle crash risk detection and prevention. A description identifying the overall connected vehicle framework and BSM dataset necessary to achieve vehicle collision avoidance is provided in Section 3. This is followed by a description of the LSTM algorithm applied for vehicle crash risk prediction in Section 4. Section 5 describes the experimental data and the results of this study. Finally, summary and concluding remarks are given in Section 6.

RELATED STUDIES
A large number of solutions have been designed in a CV environment to address driving safety issues. Most of these applications take advantage of the BSM information to obtain the TTC of the vehicle with the preceding vehicle and provide warning messages to the driver [6]. In addition, these assessment methods have been explored using real-time vehicle kinematics or partial BSM information for driver, vehicle, and environment (D-V-E) surveillance and vehicle collision risk evaluation. Headway safety is one of the most common criteria for evaluating vehicle crash risk. The safety distance and TTC are two typical headway safety indexes that have been extensively applied in vehicular embedded safety systems [7] and CV safety technology [8]. These models have been demonstrated to help maintain a reasonable headway and speed of a vehicle by providing timely reminders to the driver when any abnormal events are detected. However, these algorithms typically depend on vehicle kinematics measurements without considering driving manoeuvres (e.g. a constant speed, acceleration, braking, and steering), such that they are prone to issuing false warnings under complex traffic scenarios.
The BSM dataset involves comprehensive driver, vehicle, and traffic environment information, which provides an opportunity for more precise detection and prediction of safety-related events [9]. Based on this motivation, some practical tools, such as Logit-based models, have been used to express the relationships of partial or comprehensive "driver-vehicle-road" factors with vehicle crash accidents [10]. For example, Yu examined significant impact factors of driver injury severities from a seven-year crash dataset and applied a likelihood ratio test to evaluate the temporal stability of the model specifications. The proposed model associated with multiple independent variables can capture the individual-specific heterogeneity across crash records [11]. An advantage of these formulations is that a matrix-variate distribution is employed to achieve a general, intuitive, and flexible parameterisation. In addition, machine learning models have been widely used to evaluate vehicle crash risk by taking instantaneous or time-series vehicle motion, driver behaviour, or other relevant conditions as inputs. This significantly reduces the false warning probability compared to derivative formulas for estimating the safe inter-vehicle distance [12,13]. These models have also demonstrated the advantages of capturing complex and non-linear driver characteristics from a large volume of sample data and precisely inferring actual driving situations. Several reasoning models have been investigated to extract the significant driving-safety related attributes from the generic driving process, that is, perception, analysis, decision-making, and action. In these studies, faulty driver intentions and behaviours are largely investigated and regarded as one of the remarkable factors in critical vehicle crash accidents [14].
Reasoning models have also been applied to naturalistic driving scrutiny and have provided useful supplements to effectively examine driving safety in simulated and field trials, which further demonstrate the significant impacts of impaired driving on at-fault road traffic accidents [15]. Other studies have developed reasoning models for exploiting driving risk features through a comprehensive D-V-E inspection. For instance, Kim explored an adaptive leading time model for determining the safety threshold in car-following situations by simulating driver behaviours in normal or evasive manoeuvres from static and dynamic test data. The probability of a vehicle head-on collision can then be accurately estimated [16]. Hubschneider applied a supervised learning algorithm to capture human driving patterns based on collected vehicle CAN bus data and recorded video data and found that driving safety conditions and related sensor information can be collocated with intuitive steering [17]. Bender presented a tree-based model framework for extracting a multitude of driving patterns from a large body of naturalistic driving data collected in large-scale real-world trials, which in turn allowed a high-level understanding of driving safety status under each condition [18]. A rough set-based algorithm was also investigated to model a transparent relationship between relevant D-V-E variables, examine the most significant attributes affecting driving safety in emergency cases, and help achieve a retro-design of a proactive collision avoidance system (CAS) [19]. These studies took into consideration driving profiles, such as coasting, turning, acceleration, and braking, and illustrated that driver behaviours can be reliably associated with an upcoming vehicle crash in near-crash scenarios.
An artificial neural network (ANN) is another popular method for evaluating and predicting the potential rear-end crash risk through a high-dimensional, nonlinear, multi-layer architecture, that is, one or two hidden layers. It is capable of extracting underlying risky driver behaviours by inferring different patterns in certain amounts of sample data and adjusting network parameters to infer the input-output relationships [20]. It has also been claimed that any complicated input-output relationship can be determined using a three-layer neural network with a sufficient number of hidden neurons [21]. Several evolved neural networks have been employed for proactive driving safety applications. For example, Selmanaj developed a self-organised neural network model to detect dangerous driving by using accelerometer and gyro samples as inputs [22]. Although the study was conducted under simulated conditions, the neural network model outperforms other benchmark models in fusing different types of sensors, and more accurately recognises the dangerous driving events, particularly with fewer samples for training. Although ANN models have strong nonlinear fitting capabilities, they consider only a partial view of instantaneous vehicle motion or driver action information for a driving safety analysis.
An LSTM neural network is derived from a recurrent neural network, and is able to solve the problem of vanishing or exploding gradients caused by unrolling over a long time series by introducing gate components, that is, an input gate, a forgetting gate, and an output gate [23,24]. Thus, it outperforms other neural network models in inferring the characteristics of time-series data on various time horizons. LSTM- and CNN-based neural network models have also been applied to classify and estimate typical driver behaviours featured by acceleration, deceleration, and steering [25][26][27].
Owing to the capability of the LSTM model to infer the characteristics of time-series data on various time horizons, we can investigate the early awareness of driver behavioural changes and vehicle crash risk continuously through interactions with the surrounding vehicles under complex scenarios. The strengths of the LSTM model compared to existing approaches can be identified as follows: (a) Apart from early contributions in vehicle crash risk prediction, a spatio-temporal data series can be correlated as a coherent structure and capture significant and steady features over short-term time intervals. (b) We predict the extreme vehicle braking behaviour multiple time-steps ahead in the short-term using a recurrent structure and an encoder-decoder architecture that allows the time-steps ahead to follow more complex patterns. (c) The data involved in the model as inputs are easily accessible from raw BSM data that the connected vehicle systems can collect and provide in a continuous way.
Following previous experience, the intuitive reaction of a driver in an emergency is to adopt a braking manoeuvre to avoid potential crashes. In particular, the more likely a crash is to occur, the more urgently the braking manoeuvre is conducted during a near-crash. Thus, we can assess the driving-risk level by considering the braking process characteristics that could affect the drivers and their responses. Clustered data on the braking process characteristics have been investigated to evaluate the involvement of driving risk under different extents of near-crash events [30]. This categorises the driving risk level into low-risk, moderate-risk, and high-risk when the driver's immediate decelerations fall within the ranges of $[-2, 0]$ m/s², $[-5, -2]$ m/s², and $[-8, -5]$ m/s², respectively.
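The deceleration-based risk levels can be illustrated with a minimal sketch. The function name `classify_braking_risk` is hypothetical, and two choices are assumptions rather than part of the cited definition: a boundary value (−2 or −5 m/s²) is assigned to the lower-risk class, and decelerations stronger than −8 m/s² are treated as high-risk.

```python
def classify_braking_risk(decel_mps2: float) -> str:
    """Map a longitudinal acceleration (m/s^2, negative when braking)
    to a driving-risk level using the ranges quoted in the text."""
    # Assumption: boundary values fall into the lower-risk class.
    if decel_mps2 >= -2.0:
        return "low"       # [-2, 0] m/s^2 (also covers non-braking samples)
    if decel_mps2 >= -5.0:
        return "moderate"  # [-5, -2) m/s^2
    # Assumption: anything below -5 m/s^2, including < -8, is high-risk.
    return "high"
```

For example, a predicted deceleration of −3.5 m/s² would be labelled as moderate-risk under these assumptions.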

METHODOLOGY
In this section, we illustrate a time-series model for extreme vehicle deceleration prediction that uses the LSTM framework described in Section 2. Our proposed algorithm based on the LSTM network topology is composed of two parts, that is, BSM data feature extraction and prediction. Note that the vehicle motion and driving behaviours usually present the characteristics of continuity and linearity in a time-series analysis and are relatively dependent on each other in spatial and temporal spaces. For example, if a driver does not change his or her volition (constant speed or acceleration) within a short-term interval, the vehicle movement can be accurately simulated using existing formulations, such as a constant velocity or constant acceleration model. However, if the driver's volition changes, the operation rules of these models will be unable to accurately capture the behavioural nature affecting the vehicle motion, such as heavy acceleration/deceleration or turning. From this perspective, it appears that using BSM sequences to fit the potential driver characteristics together with the vehicle motion status is conducive to modelling real-world driver behaviours. Thus, we examine the continuous BSMs in car-following scenarios, which can be further processed and described as a matrix $x_s = \{x_{s,t+1}, x_{s,t+2}, \ldots, x_{s,t+k}\}$, where the vectors $x_{s,t} = \{v_{s,t}, a_{s,t}, d_{s,t}\}'$, $t \in [1, N_t]$, indicate the vehicle speed, acceleration, and headway to the preceding vehicle at timestamp $t$ when running on road segment $s$. In the V2X environment, the vehicle's driving parameters $v_{s,t}$ and $a_{s,t}$ can be directly estimated and compiled in BSM format, and broadcast to the surrounding vehicles and roadside units (RSUs).
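As a point of reference for the kinematic baselines mentioned above, the following is a minimal sketch of a constant acceleration prediction over a short horizon. The function name and the stop-handling rule (the vehicle does not reverse once it comes to rest) are illustrative assumptions, and a non-negative initial speed is assumed.

```python
def predict_constant_acceleration(speed, accel, horizon):
    """Return (speed, travelled distance) after `horizon` seconds,
    assuming the acceleration stays constant over the horizon.
    Units: m/s, m/s^2, s. Assumes speed >= 0."""
    if accel < 0 and speed + accel * horizon < 0:
        # Assumption: the vehicle stops and stays at rest instead of reversing.
        t_stop = -speed / accel
        return 0.0, speed * t_stop + 0.5 * accel * t_stop ** 2
    new_speed = speed + accel * horizon
    distance = speed * horizon + 0.5 * accel * horizon ** 2
    return new_speed, distance
```

Setting `accel` to zero gives the constant velocity model. Such a model has no way to anticipate a change in the driver's volition within the horizon, which is the limitation that motivates the sequence-based approach.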
To further process the vehicle motion on curved road segments, we express the vehicle movements in Cartesian coordinates $(x, y)$, which can be transformed into vehicle-based coordinates to calculate the real inter-vehicle headway along the road, as shown in Figure 2. We define a time-discrete vector $S_k$ to represent the vehicle motion status $s_k^i$ $(i = 1, 2, \ldots, n)$ and the road geometry parameters $r_k$ on the current segment. The vector $S_k$ is given in Equation (1), where the subscript $k$ marks the $k$th timestamp at which the vehicular OBU generates, transmits, or receives the vehicle position, motion, and manipulation information at time point $t = t_k$; in addition, the time interval between two consecutive samples is denoted as a constant $T_s = t_k - t_{k-1}$, which is often set to 0.1 s. Based on the vehicle trajectory $s_k^i$ and road geometry parameters $r_k$ collected by a connected vehicle system, a clothoid model is used to estimate the road curvature parameters [31], as given in Equation (2). In Equation (2), $c_0(k)$ indicates the road curvature at the vehicle's $k$th position, whereas $c_1(k)$ denotes the segment curvature change rate. The coordinates $(x(k), y(k))$ and $(x(k+1), y(k+1))$ represent the vehicle's location at times $k$ and $k+1$, respectively, and $\Delta d$ is the distance traversed during $[k, k+1]$, as described in Equation (3). Within a connected vehicle system, $c_0(k)$ and $c_1(k)$ can be easily obtained by a vehicular OBU according to the vehicle's location. Thus, the travelled distance during $[k, k+1]$ can be estimated using Equation (3).
The three-dimensional features $x_{s,t} = \{v_{s,t}, a_{s,t}, d_{s,t}\}'$ are extracted from the BSM data and pre-processed using normalisation and windowing before being input to the LSTM neural network, as shown in the left part of Figure 3. The connected vehicle BSM data, such as position, vehicle speed, and driver operation, are collected through the vehicle CAN bus and onboard GPS/accelerometers, processed by the OBU, and continually broadcast as BSMs. When a vehicle equipped with an OBU travels within the communication range of an RSU and receives the MAP broadcast from the RSU, MAP matching begins in order to determine the exact location (segment, direction, lane, etc.) of the vehicle. The BSMs are continually broadcast and received by surrounding vehicles equipped with OBUs and by RSUs within range of that broadcast. These messages are collected to study driving safety awareness and to generate warnings to drivers through HMI apps. Once the connected vehicles receive the message, they will adjust their driving speed to comply with the advisory speed.
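The normalisation and windowing step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not state which normalisation is used, so min-max scaling is an assumption, and the names `min_max_normalise` and `make_windows` are hypothetical. Each sample is a tuple (speed, acceleration, headway), and the prediction target is taken to be the acceleration a given number of steps after the window.

```python
def min_max_normalise(samples):
    """Scale each feature column to [0, 1]; also return the column
    minima and maxima so predictions can be denormalised later."""
    columns = list(zip(*samples))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, lows, highs))
        for row in samples
    ]
    return scaled, lows, highs


def make_windows(samples, k, horizon=1):
    """Slide a window of k consecutive samples over the series and pair
    it with the acceleration `horizon` steps after the window ends."""
    windows = []
    for start in range(len(samples) - k - horizon + 1):
        inputs = list(samples[start:start + k])
        target = samples[start + k + horizon - 1][1]  # index 1 = acceleration
        windows.append((inputs, target))
    return windows
```

The stored minima and maxima are what a denormalisation step would use to map a network output back to m/s².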
We assume that the hidden features of different driving styles are represented distinctively in the feature series. We further apply the LSTM cell structure to predict the vehicle acceleration/deceleration $\tilde{a}_{s,t+k+1}$ in vehicle near-crash scenarios, as illustrated in Figure 3. A sequence of temporal observations $x_{s,\{k\}} = \{x_{s,t}, x_{s,t+1}, \ldots, x_{s,t+k+1}\}$ is generated and fed into a cell of the LSTM network module as input to infer the temporal evolution of these sequences through a non-linear transformation. The output vector $h_t$ is the hidden variable that describes the distribution of the observation, which should be denormalised to reflect the predicted vehicle acceleration/deceleration in the next $k+1$ time interval. Here, $c_{t-1}$ is the previous cell state in time interval $t-1$ and can be passed through for the next prediction, whereas $h_{t-1}$ is related to the estimated vehicle acceleration/deceleration $\tilde{a}_{s,t+k-1}$. The core idea of using the LSTM to forecast the vehicle acceleration/deceleration is mainly based on the special gate structure of this model, which is capable of selectively retaining the correlated historical patterns in previous time-series data through the cell state, and passing them to the next inference step. Thus, this is an applicable means of capturing and predicting the changes in driving behaviour over time.
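For reference, the gate structure can be written in the standard LSTM form found in the literature. This is a generic formulation that may differ in detail from the one used in this study; the weight matrices $W$, $U$ and biases $b$ are notation introduced here for illustration, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forgetting gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden output)}
\end{aligned}
```

In this form, the forgetting gate $f_t$ controls how much of the previous cell state $c_{t-1}$ is carried forward, which is the mechanism by which correlated historical patterns are retained across time steps.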
We train the network weights and biases using the Adam optimiser with a learning rate of 1e-3, and the maximum number of epochs is set to 500. All these parameters can be modified through supervised training to achieve a better prediction performance based on our sample dataset $X = \{x_{s,t}\}_{t=1}^{TL}$, which has been set aside as the model training dataset, where $TL$ is the total sample size. We conduct the vehicle acceleration/deceleration prediction on different time intervals (e.g. 0.5 or 1.0 s) by setting $k$ groups of original sample data as a single training sample $x_{\text{train\_input}} = \{x_{s,t}, x_{s,t+1}, \ldots, x_{s,t+k-1}\}$. The model parameters are optimised for up to max_epoch iterations until the mean square error between $\hat{x}_{\text{train\_output}}$ and $x_{\text{train\_output}}$ reaches its minimum value. In the model training process, the cost function $L(x_{\text{train\_output}}, \hat{x}_{\text{train\_output}})$ is defined as in Equation (6). In addition, we apply the Adam optimiser to iteratively update the network weights based on the training data until the cost is minimised within the expected range.
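To make the training loop concrete, the following toy sketch applies the Adam update rule with the stated learning rate (1e-3) and epoch limit (500) to a mean square error cost. It deliberately fits a single-parameter linear model rather than an LSTM; the model, the data, and the values of `beta1`, `beta2`, and `eps` (commonly used Adam defaults) are assumptions for illustration only.

```python
import math


def mse(w, xs, ys):
    """Mean square error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean square error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train_adam(xs, ys, lr=1e-3, max_epoch=500,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise the MSE with Adam, starting from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for step in range(1, max_epoch + 1):
        g = mse_grad(w, xs, ys)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** step)        # bias correction
        v_hat = v / (1 - beta2 ** step)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


xs = [1.0, 2.0, 3.0]
ys = [0.3 * x for x in xs]   # toy targets generated with w = 0.3
w_fit = train_adam(xs, ys)
```

With a learning rate of 1e-3, each Adam step changes a parameter by roughly 1e-3 at most, so 500 epochs only suffice here because the toy target lies close to the starting point.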

Experiments and data collection
In this study, we conducted field experiments and collected BSM data on a test section of the Nanchang-Jiujiang Intelligent Highway in Jiangxi, China, as shown in  conditions. The communication delay of each roadside unit is evaluated to be less than 100 ms, whereas the stable coverage range in the real-world environment is no less than 500 m. It is also equipped with a 5G base station and millimetre-wave radar monitors, which can realise a high-definition transmission of 5G videos and the detection and analysis of lane-level traffic events. By developing a segment backend management node-frontend roadside infrastructure platform, it further takes advantage of V2X technologies for improving traffic safety and regional traffic efficiency, and widely supports applications such as autonomous driving, high-precision Beidou system-aided tracking and emergency services, and cooperative vehicle and infrastructure systems.
In the practical deployment of V2X RSUs on the Nanchang-Jiujiang Highway testbed, two RSU types, that is, a simple version and an augmented version, were distributed every 1 km along the roadside. The simple version of an RSU with C-V2X antennas is only responsible for transmitting and receiving messages between the RSU and vehicles embedded with a V2X OBU, whereas the augmented RSU, which has both C-V2X and DSRC antennas, is able to identify and access more connected vehicles embedded with different communication modules. The site of the augmented RSU is also equipped with a Beidou positioning system and traffic monitoring sensors, that is, radar, a real-time weather information system (RWIS), a video camera, and a toll collection system, which are connected to the augmented RSU through an ethernet hub. These data are then transmitted back to the traffic management centre (TMC) over the ethernet network. By integrating such multi-source information on an RSU, the data source for the multimodal transportation safety optimisation algorithm will be collected from a set of infrastructures. The data will be from weather sensors, incident management systems, and speed and flow detection systems, among others. In addition, the connected vehicle data are another type of data source for improving the model accuracy. The optimisation results from the multimodal model will then be obtained by the traffic management centre. Subsequently, these results will be transmitted to the RSU and then broadcast to the OBU. The drivers will be informed by messages on the advisory driving speed, weather, incident warnings, ramp metering, and route guidance. The OBU provides wireless connectivity in an automobile environment with high-rate and low-latency communication between vehicles (V2V) or between vehicles and roadside units (V2I).
The OBU helps provide safety and data services to the driver, and has an integrated Beidou receiver, which can be used as a navigation device for the vehicle. By default, the device transmits its position data in a continuous service channel, encoded as per the BSM data format.
The proposed vehicle safety system has been installed on three experimental automatic and connected vehicles (ACV) with LTE-V onboard units, which support vehicle BSM data collection, driving safety advisory, and assisted driving, and receive road traffic information for selected vehicles. The connected vehicle environment will be implemented within both the TMC and roadside unit to support the collection of data along with other V2X applications. The BSMs are continually broadcast and received by surrounding RSUs and vehicles equipped with OBUs within range of the broadcast. These messages will also support situational awareness insights from connected vehicle fleets.
In this study, one-day BSM samples in the field environment were collected by three connected vehicles driving on a test segment of the Nanchang-Jiujiang Highway testbed in the morning (09:00-11:00) and afternoon (13:00-16:00) on July 16, 2019. During the experiment, the overall traffic flow in the morning was heavier than in the afternoon, particularly with more freight vehicles traversing this segment. The quantitative statistics of selected driver behaviour and vehicle motion under accelerations, decelerations, and lane changes were recorded in the BSM dataset at a frequency of 10 Hz. Observations with deceleration hitting a threshold of −1.5 m/s² or a TTC of less than 3 s were marked, and the immediate and previous sampling points were extracted from the raw BSM samples to capture the driver's extreme acceleration as a key criterion to assess the driver's instantaneous decisions under near-crash scenarios, that is, how drivers avoid a risk of a crash. Meanwhile, the video sequences from the vehicular embedded data collection system were also analysed to decide whether an event triggered by kinematic thresholds was actually safety critical; if not, such an event was not treated as an effective sample and was deleted from the dataset.
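The kinematic trigger described above can be sketched as follows. The thresholds (−1.5 m/s² and 3 s) come from the text, whereas the function names and the TTC formula (gap divided by closing speed, infinite when the gap is not closing) are assumptions, since the exact TTC computation is not specified here.

```python
import math


def time_to_collision(gap_m, v_ego, v_lead):
    """TTC in seconds for a car-following pair (speeds in m/s).
    Assumption: TTC is infinite when the ego vehicle is not closing in."""
    closing_speed = v_ego - v_lead
    return gap_m / closing_speed if closing_speed > 0 else math.inf


def is_candidate_event(accel, gap_m, v_ego, v_lead,
                       decel_threshold=-1.5, ttc_threshold=3.0):
    """Mark a BSM observation as a near-crash candidate when either
    kinematic trigger fires; video review would follow in practice."""
    hard_braking = accel <= decel_threshold
    short_ttc = time_to_collision(gap_m, v_ego, v_lead) < ttc_threshold
    return hard_braking or short_ttc
```

A marked observation is only a candidate: as in the experiment, a separate check (here, the video review) decides whether the event is kept as an effective sample.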
In total, 184 groups of BSM series in near-crashes were recorded throughout the field experiment period. Nearly all the near-crashes had large longitudinal deceleration, implying that the drivers tended to adopt a rapid braking manoeuvre to avoid a potential crash. Hence, the driving-risk level was represented by the braking process characteristics. Intuitively, the driving risk is higher if the braking manoeuvre is performed with greater urgency in a near-crash. The clustered braking process characteristics data were investigated to evaluate the involvement of driving risk in a near-crash event.

Performance evaluation
In this section, the performance of the LSTM framework for driver behaviour and vehicle motion prediction in short-term time intervals is evaluated and compared against benchmark methods, that is, an ANN, an extended Kalman filter (EKF), and dynamic vehicle models. Then, the extreme braking events are identified in accordance with instantaneous vehicle motion and driving contexts to understand the driving safety from the perspective of human experience, for example, measuring the driver's heavy braking behaviour to reflect the potential crash risk level. Note that the emergency driver behaviours can also be represented by other BSM variables, such as the throttle position or steering rate. In this work, we illustrate how to use BSM samples in longitudinal emergency cases as an example to infer vehicle collision risk in near-crash scenarios.

FIGURE 5 Instantaneous vehicle longitudinal and lateral acceleration
We compare the overall performance of the vehicle acceleration/deceleration prediction based on the models mentioned above, as shown in Table 2. The experiments are carried out under the optimal model parameter set. In this study, we configured the ANN model as a single hidden layer neural network, and the number of hidden layer neurons is set to match that of the LSTM-based models. The LSTM models were trained using an Adam optimiser with a learning rate of 0.001, a batch size of 20, and 500 epochs. The input features are defined as four consecutive BSM series from previous time intervals, with each time interval set to 0.5 s, based upon which we predict the upcoming vehicle motion and driver behaviour on horizons of 0.5 and 1.0 s, respectively. The prediction results are shown in Table 2, where we can see that the LSTM model generally outperforms the other models in forecasting the vehicle acceleration/deceleration within short-term periods, whereas the vehicle dynamic model achieves the worst prediction performance, particularly on the longer time horizon, that is, 1.0 s. This is most likely due to the LSTM model being more adaptive than other models to the features of driver behaviour under the impact of a dynamic traffic environment. Although no significant difference appears between the LSTM model and the baseline models for the 0.5 s prediction, it should be noted that the evaluation indexes, MAE, RMSE, and $R^2$, are computed over the entire experimental process, where most of the driver behaviours are consistent and smooth, which averages out their errors.
FIGURE 6 (a) Overall performance of vehicle crash risk assessment models compared to the observed dataset in the model training procedure and (b) the testing procedure. The vertical axis is the TPR, which indicates the correctly predicted crash cases, whereas the horizontal axis is the FPR, indicating the incorrectly predicted crashes
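The three evaluation indexes can be computed as in the following minimal sketch, which uses their standard definitions. The function name `regression_metrics` is hypothetical, and the observed series is assumed to be non-constant so that $R^2$ is defined.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # assumes ss_tot > 0
    return mae, rmse, r2
```

Because all three indexes average over every sample, long stretches of smooth driving dominate the totals, which is why differences between models during rare braking events can be hard to see in these figures.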
During our driving experiments, we investigated the effectiveness of using BSM series data to identify driving risk in near-crash scenarios. The near-crash scenarios imply that the driver may perform evasive manoeuvres, i.e. braking and steering operations, and a potential crash may be avoided. With this motivation, we compare the accuracy and reliability of our proposed method against the benchmark models. Among them, the LSTM, ANN, and EKF models are applied to directly predict the vehicle decelerations and place the vehicle near-crash cases into one of the following categories: low-risk, moderate-risk, and high-risk, which are consistent with the definition in Section 3. The TTC model threshold is also considered as a criterion for determining the vehicle collision risk.
The scoring process is based on the receiver operating characteristic (ROC) curve, as shown in Figure 6(a,b), which evaluates how competent a model is at distinguishing collision-risk from non-collision-risk scenarios through a graph of the true positive rate (TPR) versus the false positive rate (FPR). In this study, we assign the low crash risk cases to the negative set, whereas the high and moderate crash risk cases are assigned to the positive set. The scoring results in Figure 6(b) show that the overall accuracy of the LSTM model for vehicle crash risk prediction is approximately 95.1%, whereas the ANN, EKF, and TTC models achieved overall accuracies of 88.7%, 95.1%, and 91.9%, respectively. These findings partially demonstrate that the LSTM framework can more accurately capture hidden patterns in a series of BSM data. We use the ROC curve in this research because the overall accuracy and error rate of a model can be misleading if the collected samples are strongly biased towards the majority class. In this situation, we use ROC curve indexes to evaluate the vehicle collision risk assessment models under different sample distributions. The TPR represents the percentage of risky driving cases that are correctly predicted as such, and the true negative rate (TNR = 1 - FPR) represents the proportion of safe driving cases that are correctly predicted as safe. Thus, a balance between TPR and TNR in the ROC graphs will be consistent with the ground truth even if the positive and negative cases collected are highly skewed. The overall accuracy indicates the total ratio of correctly predicted driving safety statuses, and the area under each ROC curve (AUC) shown in Figure 6(b), that is, AUC_LSTM = 0.8967, AUC_EKF = 0.8883, AUC_ANN = 0.745, and AUC_TTC = 0.83, compares the general usefulness and overall performance of each model.
Consequently, the LSTM model is superior to the benchmark models according to its AUC in the ROC graph. The EKF model scores slightly lower than the LSTM model but still provides a satisfactory performance, whereas the ANN and TTC models rank lowest on this measure.
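The ROC-based scoring can be sketched as follows. This is an illustrative reconstruction rather than the evaluation code used in the study: the function names are hypothetical, each model is assumed to emit a numeric risk score per case, and the AUC is obtained with the trapezoidal rule over all distinct score cut-offs. The mapping of risk categories to the positive and negative sets follows the text.

```python
def to_binary(category):
    """Low risk is the negative class (0); moderate and high risk are positive (1)."""
    return 0 if category == "low" else 1

def confusion_rates(labels, scores, cutoff):
    """Return (TPR, FPR, accuracy) when score >= cutoff is predicted as risky."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_risky = s >= cutoff
        if y == 1 and predicted_risky:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_risky:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr, (tp + tn) / len(labels)

def roc_auc(labels, scores):
    """Trace the ROC curve from the strictest to the loosest cut-off
    and integrate it with the trapezoidal rule."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tpr, fpr, _ = confusion_rates(labels, scores, cutoff)
        points.append((fpr, tpr))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc
```

Unlike the overall accuracy, the AUC does not depend on a single cut-off or on the ratio of positive to negative cases, which is why it is the more informative index when the collected samples are skewed.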
We further examined the reliability of our proposed model and the benchmark models in predicting risky driver behaviours on the 1.0 s time horizon, as shown in Figure 7(a-d). In each sub-figure, we vary the cut-off value that the model uses to identify the vehicle crash risk and show the corresponding accuracy, TPR, FPR, and TNR at each cut-off point. The optimal cut-off value can thus be selected as the threshold at which each model achieves its highest performance in this prediction. By inspecting Figure 7(a), we found that 96.0% of emergency braking manipulations in near-crash scenarios were correctly identified by the LSTM algorithm 1.0 s before the driver acted, and 83.3% of the non-crash cases in these scenarios were also accurately identified, which indicates that the probability of false warnings is approximately 16.7%. In addition, the overall accuracy of the LSTM-based vehicle collision risk prediction is 93.5%, a slight reduction compared with the 0.5 s horizon. The prediction performance of the EKF, ANN, and TTC models is presented in Figure 7(b-d), respectively. In Figure 7(b), almost all of the risky driving cases are correctly identified; however, only half of the true negative cases are correctly identified, which means an increase in false warnings of risky driving. Although the overall accuracy of 90.3% in the 1.0 s prediction with a 50% false positive rate is considered reasonable, the EKF model is less accurate than the LSTM model. The results in Figure 7(c,d) show that the overall performance decreases when the ANN and TTC models are applied to vehicle crash risk prediction on the 1.0 s horizon.
FIGURE 7 The vertical green line represents the case of an optimal prediction in each modelling process, and the corresponding accuracy, TPR, FPR, and TNR change at different cut-off points.
The ANN model correctly identified 90.0% of the harsh deceleration behaviours and the TTC model 98%, whereas only 66.7% and 41.7% of the safe states were correctly predicted by the two models, respectively. The above results illustrate that driving safety under near-crash scenarios is complex and diverse: the uncertainty of driver manoeuvres influences driving safety in most emergency cases, and drivers may take different actions even at the same headway (TTC). The LSTM framework is more effective in taking advantage of BSM series data to capture driver behaviours under realistic traffic situations, consequently improving the assessment of vehicle near-crash risk.
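The cut-off sweep behind Figure 7 can be sketched as below. The selection criterion is an assumption of this sketch: the text states only that the optimal cut-off gives the highest performance, and overall accuracy is used here as one plausible reading. The function names are hypothetical, and both classes are assumed to be present in the labels.

```python
def rates_at(labels, scores, cutoff):
    """Accuracy, TPR, FPR and TNR when score >= cutoff is predicted as risky."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both positive and negative cases are required")
    return {
        "cutoff": cutoff,
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "tnr": tn / (fp + tn),
    }

def best_cutoff(labels, scores):
    """Evaluate every distinct score as a cut-off and keep the one with the
    highest accuracy (assumed criterion); ties go to the lowest cut-off."""
    table = [rates_at(labels, scores, c) for c in sorted(set(scores))]
    return max(table, key=lambda row: row["accuracy"])
```

Reading the returned row alongside the others in the sweep makes the trade-off explicit: lowering the cut-off raises the TPR at the cost of more false warnings (a lower TNR), which is the pattern reported for the EKF and TTC models.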

CONCLUSIONS
In this study, we proposed an LSTM framework for extracting risky driver behaviours and predicting the risk of a vehicle crash based on real-time BSM series in a connected vehicle environment, and illustrated and evaluated it through a study on a testbed along the Nanchang-Jiujiang intelligent highway. We collected a dataset of vehicle motion and dynamics, as well as a series of driver operations, according to the core BSM dataset defined in the SAE J2735 standard, which reflects the typical characteristics of driving behaviours, vehicle motion, and interactions with surrounding vehicles under highway conditions. During the experiment, the drivers were advised to drive naturally to avoid generating abnormal driving samples. The LSTM model relates the involvement of vehicle crashes under typical emergency situations to the BSM series (e.g. driver behaviour and vehicle motion), and can thereby provide effective judgment and warning information. It should also be noted that different types of drivers may weigh safety and comfort very differently during car-following and lane-changing procedures; this has been investigated only to a limited extent in our study, and only longitudinal driving safety was assessed. In addition, the influence of multiple factors in the BSM dataset on driving risk awareness has not been fully addressed. Despite such limitations, the proposed LSTM framework quantifies the prediction of extreme driver behaviour under near-crash situations, and can be extended to driving risk assessment based on BSM datasets. In future work, we will consider involving more BSM attributes in the evaluation of risky driver behaviours and detecting driving safety issues in advance under complex scenarios in urban areas.