Comprehensive analysis of UK AADF traffic dataset set within four geographical regions of England

Traffic flow detection plays a significant part in freeway traffic surveillance systems. Despite significant investment by authorities in monitoring and analysing traffic congestion, effective autonomous traffic analysis remains challenging because of the complexity of traffic delays. This study builds an intelligent analytic method based on machine‐learning algorithms to investigate and predict road traffic flows in four regions of the United Kingdom (London, Yorkshire and the Humber, North East, and North West) using a range of relevant factors. The study uses the UK government dataset ‘estimated annual average daily flows (AADFs) Data—major and minor roads’. Five classification algorithms are applied: Logistic Regression, Decision Trees, Random Forests, K‐Nearest Neighbors, and Gradient Boosting. Each algorithm achieves an accuracy of over 93% and an F1 score of over 95%, with Random Forest outperforming the others. This analytical approach helps to focus attention on critical areas in order to reduce traffic flows on major and minor roads in the regions studied. The findings are discussed in detail to demonstrate the practical insights of this study.


| INTRODUCTION
Traffic flow detection is a component of a freeway traffic surveillance system. It collects data such as traffic volume, vehicle type, speed, and possession ratio in order to keep traffic flowing safely and without obstruction, and to give supervisors an accurate basis for forecasting. Traffic congestion is a major concern in today's metropolises (Lingras et al., 2000). It results in severe economic losses, higher pollution, and longer travel times. Authorities invest heavily in monitoring and analysing traffic congestion, but the task is difficult because of the intricacy of traffic delays.
One difficulty is that congestion is intermittent: delays occur at some times but not at others. This intermittent pattern makes it difficult to devise effective and proactive congestion management strategies. Another issue is that traffic congestion is ever-changing and interrelated. Impediments in traffic can, for example, spread from one road to the next (Lubbe et al., 2018). These complexities make autonomous traffic analysis challenging, and it requires a high level of skill, experience, and awareness.
Therefore, there is a need for a comprehensive understanding of traffic congestion patterns and for identifying specific inefficiencies in congestion management in the crucial regions of the United Kingdom. This research gap has prompted the need to explore the dynamics of traffic congestion and the factors contributing to congestion in urban areas (Ji & Hong, 2019). In addition, the effectiveness and applicability of machine-learning techniques for traffic analysis have not been as well documented as in other domains (Quek et al., 2006). By addressing these research gaps, the study can contribute to the development of more efficient and targeted strategies for managing traffic congestion in urban areas. This study creates a visual analytic technique for investigating traffic sequences and circulation in four regions of the United Kingdom (London, Yorkshire and The Humber, Northeast, and Northwest).
London is by far the largest city, with a population of approximately 9 million, compared to the second largest city in the United Kingdom, Birmingham, with 1.1 million. The population and infrastructure in London can be used as a forecast for future development in other regions of the United Kingdom. Therefore, by observing possible causes of congestion in London, this study can predict and avoid certain inefficiencies when planning the construction of major and minor roads, local congestion taxes and investments in public transport. The local authorities of London can be separated into five sections, Central, North, East, South, and West.
Figure 1 shows the Principal London Road network. The network is made up of two subsidiaries, Transport for London (TfL) and the Road Network represented by the Red Route and Borough Principal Road Network. The combination of the two networks makes up 11% of London roads, but during weekdays, between 7 am and 7 pm, these roads account for 54% of the London traffic. As shown in Figure 1, Central London acts as a focal point, representing the location where all roads converge. Around 790,000 people commute from England and Wales to London; among these numbers, nearly 400,000 workers commute to Westminster, and 230,000 workers commute to the City of London (Chow et al., 2014).
Moreover, London is analysed individually by separating its data in the AADF dataset into five sections, with the 'local_name' of 'London (Central, North, East, South and West)'. Each section is analysed using visual techniques, highlighting regional outliers, and traffic congestion is compared across the five sections. Additionally, the data for the two most congested roads in London, 'A406' and 'A23', are shown. Drivers lose an average of 117 h per year due to traffic congestion between the two roads; thus, highlighting any potential correlation of congestion between 'A406' and 'A23' would provide more insight into the possible causes of congestion in the local authorities.
The method combines intelligent analysis and insights. This study begins by extracting the regions of interest from the 11 regions included in the United Kingdom Road AADF Count dataset and then carrying out exploratory data analysis (EDA) of the annual average daily flow.
Using the major and minor road categories, this study conducts a systematic examination to determine the annual average daily traffic flows in the chosen locations. This method of analysis helps to focus attention on the crucial areas where traffic flow on major and minor roads could be reduced. Five machine-learning classification techniques (Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbor, and Gradient Boosting) are used to model the relationship between road traffic flow and a set of related factors: pedal cycles, two-wheeled motor vehicles, cars and taxis, buses and coaches, light goods vehicles (LGVs), and heavy goods vehicles (HGVs). These models are in turn used to predict the AADF traffic flow within the four regions of the United Kingdom. The F1 score and accuracy of the trained models are then used to assess the effectiveness and the weaknesses of each prediction model.
The ROC curve technique is also used to evaluate the performance of the prediction models. Finally, the improved model may be used to forecast AADF traffic flow within the regions and the local authorities associated with them.
This research contributes to the practical implementation of data-driven decision making in traffic analysis. By using traffic flow data from the AADF and employing machine-learning algorithms, this study provides an evidence-based approach to understanding traffic patterns. By demonstrating the effectiveness of these algorithms in analysing traffic flows and making accurate predictions, this research provides city planners, traffic authorities, and drivers with a valuable tool for improving traffic conditions and optimizing route choice in real-world scenarios, and helps them to make more informed and effective decisions. In addition, this research bridges the gap between academic research and practical applications in the field of traffic analysis. By training and testing machine-learning algorithms, this study provides practical results that can directly influence traffic planning and decision-making processes, which can ultimately lead to more effective solutions for traffic management.
The breakdown of this paper is as follows. Section 2 describes the related works and literature, and Section 3 presents the methodology for traffic analysis research. Section 4 details the implementation and results of traffic analysis, showing different results for four selected regions. Section 5 provides details on performance evaluation and Section 6 presents the discussion related to the findings. Finally, Section 7 sums up this paper and research contributions.

| RELATED WORKS
The literature on traffic flow analysis and forecasting is extensive. This work looks at previous technical assessments in the realm of traffic flow explanatory analysis and prediction models. Andersson and Chapman (2011) employed the Kendall correlation coefficient to evaluate road traffic flow and investigate the link between road networks for road traffic injuries and road obstacle features such as traffic light systems, poor roads, total road distance, and population. In a similar study, Fang and Shen (2012) suggested a regression model for estimating road traffic fatalities in London province based on data including local vehicle parks, route distance, and population. Cai et al. (2015) used the signal strength of cellular phones to calculate the average traffic flow speed by comparing the signal strength traced on the cellular phones to the known trace of roads and computing the average traffic flow. They proceeded to estimate speed using a handover method. They looked for a base station that could handle a high number of road users and calculated the difference in access times between two successive base stations to anticipate traffic flow on the road in real time. Even though this approach might cover most minor roads, it fails to accurately monitor changes in trace speed based on the road category. Ramesh et al. (2019) implemented time-series models for average traffic flow analysis, predicting the future based on historical parameters associated with numerous predictive techniques. Time-series models find patterns in historical data and extrapolate those patterns into the future. Ramesh et al. constructed and evaluated traffic flow based on road latitude, longitude, and direction using optimization algorithms such as historical average count data, time series, neural networks, and nonparametric regression models.
Kalair and Connaughton (2021) proposed an up-to-date nonparametric segmentation method to classify and detect traffic anomalies by identifying the atypical fluctuations in the association between flow and density. They applied their method to the data relating to London's M25 motorway from the UK National Traffic Information Service (NTIS) and evaluated the method by several metrics, including time-to-detect, detection rate, and false alarm rate. The results indicate that their approach outperformed other statistical approaches, particularly on the multi-modal dataset. Ulbricht (1994) used multi-recurrent neural network models to predict the number of traffic flows going through a highway checkpoint between 5 am and 9 am and compared the results to other regression models. Shafiei et al. (2022) introduced a graphical model comprising information from major roads within the local authority, which was the first time Bayesian networks (BN) were used for traffic flow prediction. According to Jomnonkwao et al. (2020), the findings revealed that the BN model outperforms other condensed approaches such as the Random Walk (which considers current traffic flow conditions), the AR model, and a fuzzy-neural model. Likewise, Shepelev et al. (2020) and Monfared et al. (2013) created a graphical model that linked the average daily flow to the congestion status of each road network, as well as a theoretical traffic model that reproduced the delay frequency within a highway.
Azimjonov and Özmen (2021) developed an innovative vehicle-tracking algorithm based on the bounding box (Bbox) to improve the performance of Yolo, a general-purpose object detector, in classifying vehicles. They prepared a dataset containing over 7000 images from highway videos and used them to train 10 machine-learning classifiers, one of which used the CNN algorithm. The classifier with the best performance was then combined with Yolo to develop a new vehicle detector. Their results show that the accuracy of the detector was nearly 40% higher than that of Yolo. In addition, the integration of the detector with the Bbox-based tracking performed well in vehicle counting tasks, contributing to real-time traffic flow detection.

| METHODOLOGY
The methodology followed in this paper for developing an effective traffic analysis model is illustrated in Figure 2. The procedure relies on machine-learning techniques and is divided into five crucial steps: data collection, EDA, feature engineering, model training, and model evaluation.
The dataset used in this study is traffic flow data collected from the UK government. Following data collection, EDA is performed to understand the data's characteristics. This includes data cleaning, data transformation, evaluating data distribution, investigating relationships between variables, and identifying potential anomalies. Feature engineering is the third step, which aims to refine and optimize the input features for the models to enhance predictive performance. After preparing the dataset and dividing it into a training set and a test set, several traffic analytics models, including Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest, and Gradient Boosting, are trained. The final step in the methodology involves assessing the performance of the trained models. Evaluation metrics such as accuracy, F1-score, and the ROC curve with its AUC score are used to quantify how effectively the trained traffic analytics models have learned from the data and predict the target variable.

| Problem statement and dataset
The aim of this research is to predict the road type (major or minor road) that corresponds to a given traffic flow, using classification models (Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor, and Gradient Boosting).
To this end, the dataset employed in this study is a primary dataset collected from the UK government. It includes traffic flow data by vehicle type on different roadways in four separate geographic regions of the United Kingdom. The dataset was subjected to EDA to learn more about it, the key reasons for traffic, and how traffic flow has or has not changed over time. The challenge is to estimate which road type has the largest traffic flow. Major and minor roads are the two categories of roads discussed in this article.

| Exploratory data analysis
EDA is a step in the data analysis process that employs a variety of techniques to understand the dataset properly. It gives the analyst a closer look at the data and at how the attributes affect one another, and yields better insights for the prediction.
To analyse the dataset, this study first extracts the data samples for the four regions under study and then checks for missing and duplicated values. After data cleaning, the values in text format are transformed into numerical values, followed by statistical analysis and visualization. Through the EDA, this study analyses the data balance and attribute distribution to better understand the dataset.

| Feature engineering
After EDA, this study performs feature engineering to understand the relationship between the attributes and the dependent variable. Feature engineering is employed to reduce the number of features when constructing classification and prediction models. It helps to identify the variables that contain the most relevant information for a given task (Thakkar & Lohiya, 2021). It can also be used to remove redundant and irrelevant variables from a dataset. Through feature selection, classification and prediction models can become more cost-effective, faster, and more accurate. This study computes a correlation matrix to learn the relationships and decide whether to drop any variables.
After data pre-processing, we split the dataset into a training set (60%) and a test set (40%), a commonly adopted ratio. The larger share of training data provides diversified inputs from which the model can learn the hidden features in the data.
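The 60/40 split described above can be sketched as follows. This is a minimal illustration only: the paper does not say whether the split was stratified, so stratification by class, the function name `stratified_split`, and the toy records are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, train_fraction=0.6, seed=0):
    """Split rows into train/test sets, keeping class proportions similar.

    Stratification is an assumption; the paper only states the 60/40 ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical toy data: 80 major-road and 20 minor-road records.
rows = (
    [{"road_type": "Major", "id": i} for i in range(80)]
    + [{"road_type": "Minor", "id": i} for i in range(80, 100)]
)
train, test = stratified_split(rows, "road_type")
print(len(train), len(test))
```

Splitting each class separately keeps the major/minor ratio roughly the same in both sets, which matters when one class dominates.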

| Machine-learning classifiers
This study employs five different machine-learning algorithms to build the classifiers using the training set: Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest, and Gradient Boosting. These five algorithms were chosen based on their practical application for traffic analysis. Although they may exhibit slightly lower predictive performance than neural networks such as RNN and ANN, they are generally regarded as easier to interpret and understand (Díaz-Rodríguez et al., 2022; Garre et al., 2020). Interpretability is particularly valuable in transportation planning and decision making, as it helps stakeholders comprehend the reasoning behind the algorithmic recommendations. Moreover, they are cost-effective: these algorithms are computationally efficient and can handle large datasets with reasonable resource requirements compared to deep learning algorithms (Dargan et al., 2020). In addition, they have well-established implementations in popular machine-learning libraries, and their parameter tuning and model selection can be relatively easy. The algorithms are explained in Table 1.
The correctness of an ML model is validated by comparing its output with the true labels in the dataset and analysing the result with a confusion matrix.
Each prediction falls into one of four outcomes: True Positive, True Negative, False Positive, or False Negative. These measures are used to evaluate the efficiency of the models and are primarily used to assist in selecting the best-suited model. The accuracy of a classification model is the number of correct predictions in both classes divided by the total number of samples evaluated. In this case, the positive class is 1 (major road), while the negative class is 0 (minor road). Based on the confusion matrix, this study evaluates the classifiers' performance by comparing their accuracy, F1-score, and ROC curve with its AUC score.
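As a small worked example of the confusion-matrix measures above, the sketch below counts the four outcomes and derives accuracy and F1 from them. The function names and the label vectors are hypothetical and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    # Correct predictions in both classes over all evaluated samples.
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for eight count points.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn, accuracy(tp, fp, tn, fn), f1_score(tp, fp, fn))
```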

| Data cleaning
The first step is to list the names of the dataset's attributes (features). This provides a broad overview of the data and of how normalized it is. There are 489,159 samples in the dataset, with 33 different features (7 categorical features and 26 numerical features).
The road type attribute is the target variable in this study; it is categorical, with values of 'Major' or 'Minor'.
Next, this study extracted the four geographic regions needed for this research from the complete AADF dataset. The extracted dataset was then inspected for errors and null values. Five columns were found to contain missing values. Before these were treated, unnecessary columns were removed to maintain data integrity and usefulness and to eliminate redundant data. Then, the dataset was inspected for duplicates. There was only one duplicate record, and it was dropped. At this stage, the dataset had been reduced to 166 k samples with 26 features.

| Data transformation
Before proceeding to the next stage, the values of the dependent variable need to be in numeric form. They are encoded so that machine-learning algorithms can understand and process the input (Brownlee, 2022).
The first step is converting the road type into the numeric values 0 and 1, where 0 denotes a major road and 1 denotes a minor road. Apart from the road type, all the other categorical attributes are also encoded. Specifically, the categorical values in each column are transformed to numerical values (object type replaced with integer type) for training and testing. The data are also integrated before being split for training and testing.
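The encoding step might look like the sketch below. The 0/1 coding for road type follows the text; the helper names, the sorted-order integer mapping, and the sample region values are assumptions for illustration.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

def encode_column(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[value] for value in values]

# Road type uses the coding stated in the text: 0 = major, 1 = minor.
road_type_codes = {"Major": 0, "Minor": 1}
road_type = ["Major", "Minor", "Major", "Major"]
road_type_encoded = encode_column(road_type, road_type_codes)

# Hypothetical categorical attribute encoded with a fitted mapping.
region = ["London", "North East", "North West", "London"]
region_codes = fit_label_encoder(region)
region_encoded = encode_column(region, region_codes)

print(road_type_encoded, region_encoded)
```

Fitting the mapping once and reusing it keeps the codes consistent between the training and test data.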

| Data balance analysis
This study first checked the balance of the data samples relating to the road type.According to Figure 3, there is an unequal traffic flow; 80% of the average annual daily flow of traffic occurs on the major road in the four regions within this analysis.
As shown in Figure 4, among the four regions, the Northwest has the highest count of road usage and traffic flow, with over 56,000 records. In comparison, the Northeast has the lowest traffic flow count, with over 24,000 records.
As the capital of the United Kingdom, London has the highest local authority count, comprising 32 boroughs and the City of London. Conversely, the North East region has the lowest number of local authorities, with 12 (see Table 2).
AADF is an estimated measure of the full-year average of the number of vehicles passing a particular point in the road network per day.
Investigations were conducted to analyse the AADF of vehicles across different road types.
TABLE 1 Machine-learning models.

ML classifier | Summary
LR | Logistic Regression is a supervised ML algorithm that is used for classification tasks. This algorithm is mostly employed in binary classification. The sigmoid function is applied in the process of Logistic Regression to return the likelihood of a label (Hussein et al., 2021).
DT | The structure of a Decision Tree is similar to a tree. Each internal node represents an attribute test, and each leaf node represents a label. This algorithm builds a tree through the segmentation of the source set into subsets based on the judgements of attributes (Pappalardo et al., 2021). This process is repeated recursively on each derived subset and stops when instances on each node all have the same value or when the partitioning does not add value to the prediction.
KNN | The K-Nearest Neighbour (KNN) is a nonparametric supervised learning algorithm that assumes that similar points can be found in the vicinity of each other. It has been utilized extensively in classification tasks due to its straightforward implementation and few hyperparameters. The KNN algorithm determines the nearest neighbours of a given point by calculating the distance between that point and other data points, thus assigning a category label to that point (Wani & Roy, 2022).
RF | Random Forest is a supervised machine-learning algorithm based on the Decision Tree. It improves the performance of many weak learners by voting for the majority (Gupta et al., 2021). The samples in the original dataset are used as input for each tree in the classifications. After that, features are chosen randomly to be employed in developing the tree at each node. The algorithm does not prune any trees in the forest until the activity has been completed and a decisive prediction is made. By doing so, the Random Forest can construct a powerful classifier based on any classifiers with weak correlations.
GBDT | Gradient Boosting is one technique that excels at capturing high-dimensional relationships. It can automatically identify sophisticated data structures, including non-linearities and higher-order interactions. It can recursively fit a weak learner to the residuals to enhance a model's performance as the number of iterations increases (Zhang et al., 2019). The similarity between GBDT and RF is that they are both ensemble models with Decision Trees as weak learners. The biggest difference is that GBDT uses the boosting technique while RF uses bagging (Li et al., 2021).

| AADF of vehicles in each road type
The analysis in Figures 5-8 shows that, of the regions in this study, London has the highest annual average daily flow on major roads, owing to its high congestion and commercial activity. Buses and coaches also have a higher traffic flow in London than in the other regions, because most citizens prefer public to private transportation as a result of policies and regulations. Yorkshire and the Humber has a high flow of HGVs, as the region is well-known for its industrial activity compared with the other regions. Another reason is its central location: HGVs that transport goods from north to south and south to north use the major roads in Yorkshire and the Humber.
As can be observed from Figures 17 and 18, most vehicles in the four regions travel on major roads rather than minor roads, so the traffic flow on minor roads is lower than on major roads.

| Top 5 busiest locations in each region
This study performed an analysis to identify the top five busiest locations in each region.
FIGURE 2 AADF trends of pedal vehicles.
FIGURE 2 AADF trends of all motor vehicles.

| London regions
Figure 26 shows that the distribution of road types in London is similar to the overall distribution in Figure 3. Thus, this study can draw relevant inferences from London and apply possible findings to the rest of the United Kingdom.
Currently, the most congested roads in the United Kingdom, 'A406' and 'A23', are in London. The most common vehicles traversing these roads are shown in Figures 27 and 28. The A406 and A23 circle central London and connect North, East, South, and West London, acting as a 'ring'. Initially, several 'ring' highways were planned, but because of construction costs and opposition, only one was completed. That 'ring' highway was split into two sections, A406 and A23; these highways therefore accommodate most vehicles, resulting in heavy congestion.
The following Figures 29-31

| Machine learning models

| Logistic Regression
The LR algorithm is shown in Figure 32 below.
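To make the idea concrete, the sketch below trains a one-feature logistic regression with plain gradient descent. It is an illustration under assumed settings (learning rate, epoch count, and a toy standardized feature), not the configuration used in this study.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict(x, w, b, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)

# Hypothetical standardized feature (e.g. a scaled vehicle count) and labels.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, ys)
predictions = [predict(x, w, b) for x in xs]
print(predictions)
```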

| Decision Tree
The DT algorithm is shown in Figure 33 below.
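As a rough illustration of how a single tree node might choose its attribute test, the sketch below scans candidate thresholds on one numeric feature and keeps the one with the lowest weighted Gini impurity. Gini is only one common splitting criterion and is assumed here; the feature values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical HGV flows with a binary road-type label.
hgv_flow = [120, 150, 400, 900, 1100, 1300]
road_label = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(hgv_flow, road_label)
print(threshold, impurity)
```

A full tree repeats this search recursively on each resulting subset until the nodes are pure or a split no longer helps.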

| Random Forest
The RF algorithm is shown in Figure 34 below.

| K-Nearest Neighbors
The KNN algorithm is shown in Figure 35 below. There are several distance metrics to conduct the calculation, including Euclidean distance, Manhattan distance, Minkowski distance, and Hamming distance. In addition to the distance metric, the other aspect is the K value, which defines the number of neighbours to be checked. It is essential to determine a suitable K, as a poor choice can cause overfitting or underfitting and thereby degrade the performance of the classifier.
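The distance metrics and the role of K described above can be sketched as follows. The Minkowski distance covers Manhattan (p = 1) and Euclidean (p = 2) as special cases; the function names and the toy points are hypothetical.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan and p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=minkowski):
    """Label the query by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature points forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = [1, 1, 1, 0, 0, 0]
near_first_group = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
near_second_group = knn_predict(train_X, train_y, (5.0, 5.0), k=3)
print(near_first_group, near_second_group)
```

A very small K follows individual noisy points (overfitting), while a very large K averages over both groups (underfitting).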

| Gradient Boosting Decision Tree
The GBDT algorithm is shown in Figure 36 below.

| Feature selection and engineering
The label encoder turns all the categorical values in each column into numerical values for training and testing. The road type column was then standardized before the data was trained and tested. A heatmap of the correlation matrix was constructed to check whether there was a positive or negative correlation between the attributes and the target (i.e., the type of road), as well as whether there was an association between attributes.
As shown in Figure 37, all the features positively correlate with the target label. Moreover, the correlation coefficient between any two features does not reach 1 or −1, indicating that these features are not perfectly correlated with each other. Therefore, this study keeps all the features because they contain different information related to the dependent variable to different degrees. These features will be used for creating the machine-learning model.
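A correlation matrix of this kind can be computed as in the sketch below, which assumes the Pearson coefficient (the text does not name the coefficient used). The column names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise correlations as a nested dict keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical columns: two vehicle counts and a binary target label.
columns = {
    "cars_and_taxis": [100, 200, 300, 400, 500],
    "lgvs": [12, 18, 33, 41, 52],
    "target": [0, 0, 1, 1, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Off-diagonal values close to 1 or −1 would flag a pair of features carrying nearly the same information, which is the case for dropping one of them.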
FIGURE 36 Gradient Boosting Decision-Tree algorithm.
FIGURE 35 K-Nearest Neighbor algorithm.
After understanding the relationship between the attributes, the dataset was divided into training and testing data. 60% of the data was used to train the model and 40% was used to test its performance. Following that, due to the class imbalance, the training and testing data were oversampled and normalized separately before training the algorithm.
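The text does not state which oversampling or normalization method was used. The sketch below assumes random oversampling of the minority class and min-max scaling, applied to one split at a time; all names and values are hypothetical.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target_size = max(len(members) for members in by_class.values())
    X_out, y_out = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target_size - len(members))]
        for features in members + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out

def min_max_scale(X):
    """Rescale every column to the range [0, 1]; constant columns become 0."""
    columns = list(zip(*X))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]

# Hypothetical imbalanced split: four rows of class 0 and one row of class 1.
X = [[100, 5], [200, 8], [300, 9], [400, 12], [50, 1]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = random_oversample(X, y)
X_scaled = min_max_scale(X_balanced)
print(y_balanced.count(0), y_balanced.count(1))
```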

| Machine learning and analysis of results
After encoding and splitting, the dataset is passed to the machine-learning classification procedure. The classifiers are trained on the training data. Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest, and Gradient Boosting are the methods employed.
The accuracy and consistency of an ML model can be evaluated by comparing its output with the true labels in the dataset and analysing the results using a confusion matrix. A score of 1 means the model predicts perfectly, while 0 means the model has no predictive power. The most widely used assessment metric is accuracy, which measures the proportion of predictions that are correct. Its disadvantage is that, under class imbalance, it is dominated by the majority class and can hide poor performance on the minority class, making it a less-than-ideal metric in that setting. The F1 score reflects the balance between precision and recall and is particularly useful where the dataset has imbalanced classes. It is calculated from the precision and recall values obtained from the machine-learning method. Table 3 and Figure 38 show that the classifier with the highest F1-score is Random Forest, followed by Gradient Boosting, K-Nearest Neighbors, Decision Tree, and Logistic Regression.
The ROC curve is created by plotting the true-positive rate against the false-positive rate at various threshold levels. The true-positive rate is also known as sensitivity, recall, or probability of detection in machine learning. The false-positive rate, which equals 1 − specificity, is also known as the fall-out or probability of a false alarm. The Random Forest classifier appears to be the best on this ROC curve since it has the greatest area under the curve. The Decision Tree model performs worst, as its curve encloses the smallest area.
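The construction of the curve and its area can be sketched as below: the threshold is swept from the highest score downwards, and the area is obtained with the trapezoidal rule. The labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical true labels and classifier scores for six samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print(points, round(area, 4))
```

An area of 1 corresponds to a perfect ranking of positives above negatives, and 0.5 to random guessing.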

| DISCUSSION
It was necessary to research road networks and the classification of local government road networks within each region. The study was narrowed to analyse the five major and minor roads within each region and the determinant factors of traffic flow. It was found that most obstruction of traffic flow on major roads in London is caused by buses and coaches, owing to high commercial activity and strict car-parking policies, whereas in the Northeast obstruction is caused by cars and taxis. The same applies in the Northwest; these two regions are more residential than the other regions in this study. In contrast, obstructions in Yorkshire and the Humber are caused by HGVs, which reflects the region's industrial activity compared with the other regions in the study. The study has broadened the understanding of the British transportation system, including the top five busiest local authorities in each region and certain driving precautions to be aware of, since they differ between regions. The current traffic analysis research is focused on the frequency of usage and congestion status on the major and minor roads in each selected region. Although most vehicles tend to avoid major roads in peak hours, avoiding traffic congestion in some regions such as Yorkshire and the Humber is unlikely, and measures to improve traffic flows will be crucial. Future algorithms should focus on finding the paths that take the least time to reach the destination, rather than the shortest path, which can result in congestion. Introducing policies similar to those of central London may help the current situation to some extent. A better long-term solution is to enable predictions of the traffic status and to use intelligent algorithms to suggest how to use the roads to avoid congestion, rather than finding the shortest path, which is widely used in current traffic analysis and GPS systems.
FIGURE 38 Confusion matrices for ML models.
TABLE 3 ML model performance evaluation.

| CONCLUSIONS
The goal of this study was to understand traffic patterns by developing algorithms that characterize traffic flows in the selected regions in 2021. This study gathered data relating to four geographical areas from the AADF traffic flow data in the United Kingdom and used it to train five machine-learning algorithms. Taken together, these algorithms allowed an in-depth analysis. The Random Forest method achieved the best results, scoring above 96% on all performance measures. These results suggest that predictions would be over 96% accurate if the model were put into production. As traffic congestion can be problematic, it is essential to understand the patterns of traffic flow and to recommend ways of improving traffic conditions through smarter utilization of different routes. In summary, the research contributions of this study are twofold. First, this study provides a comprehensive review of traffic analysis in the United Kingdom and presents insights useful for planning, traffic improvement, and decision making. Second, this study develops intelligent algorithms that provide useful and up-to-date traffic analysis, so that drivers can decide on their routes, particularly in peak hours.
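As a reminder of how performance measures of this kind are derived, the sketch below computes accuracy, precision, recall, and F1 score from the four cells of a confusion matrix. It assumes a binary labelling, which the text above does not confirm, and the counts passed to the function are hypothetical rather than the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard metrics from binary confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives, true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

Because F1 is the harmonic mean of precision and recall, it can exceed accuracy when the positive class dominates the dataset.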
This study also has limitations. One potential limitation is the generalizability of the study. The study focuses on traffic analysis in specific geographical areas within the United Kingdom. The findings and the performance of the algorithms may not be directly applicable to other regions or countries with different road infrastructures, traffic patterns, and driving behaviours. Another limitation is that the execution time of the algorithms was not tested, so their efficiency has not been assessed. Future work may include developing more advanced algorithms to understand traffic better and presenting results in more interactive ways through analytics and visualization. In addition, future research may expand the scope of the study to include traffic analysis in other regions or countries, providing a broader understanding of the performance and applicability of the algorithms across different contexts. Real-time data sources will also be integrated into the analysis, and the execution time of the selected algorithms will be tested to support timely decision making.
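One simple way to carry out the execution-time test mentioned above is a small timing harness around each model's training or prediction call. The sketch below uses only the Python standard library. The function `dummy_predict` and its threshold of 20000 are hypothetical stand-ins for a trained model and are not part of this study.

```python
import time


def time_call(func, *args, repeats=5):
    """Run func(*args) `repeats` times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def dummy_predict(flows):
    # Hypothetical stand-in for a trained model's prediction step.
    return [1 if flow > 20000 else 0 for flow in flows]


elapsed, labels = time_call(dummy_predict, [5000, 25000, 40000])
```

Taking the best of several repeats reduces the influence of unrelated system activity on the measurement.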
Using edge nodes, Chen et al. (2020) developed a traffic flow detection scheme based on deep learning. In their study, a vehicle detection algorithm and a multi-object vehicle tracking system were first constructed based on the You Only Look Once (YOLO) model and Deep Simple Online and Real-time Tracking (DeepSORT), respectively. They then realized traffic flow detection by combining these two algorithms into a real-time vehicle tracking counter. Mehrannia et al. (2023) also employed a deep learning algorithm in their research on traffic accident detection. Based on real traffic flow and accident data from the Twin Cities Metro freeways of Minnesota, they used a long short-term memory (LSTM) network to extract features from the traffic flow data and trained their model to label data samples as crash or non-crash. The LSTM is a variant of the recurrent neural network (RNN) designed to retain information from earlier in a sequence. According to their results, the LSTM model was able to detect accidents within 18 min and achieved better performance than other machine-learning models, such as CNN, RNN, and AdaBoost.
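The counting step of a detect-then-track pipeline of this kind can be sketched as follows. This is not the implementation of Chen et al. (2020). It assumes that a detector and a tracker have already produced, for each video frame, a mapping from track ID to the vertical position of the vehicle's centroid, and it counts each track once when it crosses a virtual line. The frame data and the line position are hypothetical.

```python
def count_line_crossings(frames, line_y):
    """Count tracks whose centroid moves from above line_y to on or below it.

    frames: list of dicts, one per frame, mapping track_id -> centroid y in pixels
            (y grows downwards, as in image coordinates).
    """
    last_y = {}
    counted = set()
    for detections in frames:
        for track_id, y in detections.items():
            previous = last_y.get(track_id)
            crossed = previous is not None and previous < line_y <= y
            if crossed and track_id not in counted:
                counted.add(track_id)  # each vehicle is counted only once
            last_y[track_id] = y
    return len(counted)


# Hypothetical tracker output: tracks 1 and 2 cross the line at y = 100,
# track 3 first appears below the line and is therefore not counted.
FRAMES = [
    {1: 80, 2: 40},
    {1: 95, 2: 60},
    {1: 110, 2: 85, 3: 120},
    {2: 105, 3: 130},
]
vehicle_count = count_line_crossings(FRAMES, line_y=100)
```

Keying the count on track IDs rather than on raw detections is what prevents the same vehicle from being counted in every frame in which it is visible.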

Figures 9 and 10 show that London has the highest traffic flow of LGVs. According to the study, this is due to the local movement of goods and services within the region using the road network. In contrast, other regions with high landmarks use train transportation to supplement road transportation. HGVs use major road networks rather than minor roads to transport industrial products from one location to another, as shown in Figures 11-16. The study also discovered that Yorkshire and the Humber have the highest traffic flow of HGVs, indicating that the region is [...]

Figures 23-25 show that traffic flow on major roadways has significantly increased since 2007, affecting pedal cycles, motor vehicles, and HGVs. The above trends show that traffic congestion on major roads remained steady from 2007 to 2020, while after 2020 it began to decrease due to the [...]

FIGURE 25 AADF trends of all HGVs.
FIGURE 26 Analysis of road type in London.

[...] show a distinct difference between the distribution of vehicles passing major and minor road types. East and West London had more vehicles in all three categories than central London, likely due to the larger population of outer London (outer London has a population of approximately 4.4 million, while inner London has a population of 2.7 million; Brownlee, 2022). Surprisingly, central London does not have the highest number of vehicles passing on its roads in the three categories. This suggests that restrictions on road usage, particularly on HGVs, are managed effectively. High fees are also charged to all vehicles passing through central London in peak hours. Figure 29 shows that the busiest local authority is Westminster, located in central London, followed by Barnet in North London. Next are Hounslow and Hillingdon in West London and Tower Hamlets in East London.

This study then analyses the AADF of all motor vehicles, pedal cycles, and HGVs. Figures 29 and 30 show that East London and West London have the highest and second-highest numbers of vehicles on major roads, and West London has the highest number of vehicles on minor roads. The results show that the traffic controls in central London have worked well on major roads but not on minor roads. More people have moved to East London since before the 2012 London Olympics. Figure 31 shows that East London and North London have the highest and second-highest numbers of HGVs. North London sees more offloading of goods and services because it is close to several industrial areas, which explains why it has the second-highest number of HGVs.

FIGURE 27 Recording of cars and taxis.
FIGURE 28 Recording of large ground vehicles.
FIGURE 29 AADF of all motor vehicles.

FIGURE 30 AADF of all nonmotor transport.
FIGURE 31 Heavy ground vehicles include 2-6-axle ground vehicles.
FIGURE 3 Logistic Regression algorithm.
FIGURE 3 Decision Tree algorithm.
FIGURE 3 Random Forests algorithm.

4.4.4 | K-Nearest Neighbor
The KNN algorithm is shown in Figure 35 below.
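For readers unfamiliar with the method, the core of a K-Nearest Neighbor classifier can be sketched in a few lines. This is a generic illustration rather than the procedure depicted in the study's figure. The two features, the class labels "low" and "high", and all data points are hypothetical, and feature scaling, which KNN normally needs, is omitted for brevity.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (flow in thousands of vehicles, share of HGVs in %).
POINTS = [(5, 1), (6, 2), (7, 1), (30, 8), (32, 9), (35, 7)]
LABELS = ["low", "low", "low", "high", "high", "high"]

quiet_road = knn_predict(POINTS, LABELS, (8, 2))
busy_road = knn_predict(POINTS, LABELS, (31, 8))
```

The choice of `k` trades off sensitivity to noise (small `k`) against blurring of class boundaries (large `k`), and an odd value avoids ties in a two-class problem.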
Figures 19-22 show the top five busiest local authorities by annual average daily traffic flow within each region: Leeds is the busiest in Yorkshire and the Humber, Durham in the North East, Lancashire in the North West, and Westminster in London.
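A ranking of this kind can be produced by averaging the AADF of all count points in each local authority and sorting within each region. The sketch below shows one way to do this with the standard library only. The row layout `(region, local_authority, aadf)` is an assumption about how the data might be arranged, and the placeholder names and values are invented for illustration.

```python
from collections import defaultdict


def busiest_authorities(rows, top_n=5):
    """Rank local authorities within each region by their mean AADF.

    rows: iterable of (region, local_authority, aadf) tuples, one per count point.
    Returns {region: [authority names, busiest first]} truncated to top_n.
    """
    flows = defaultdict(lambda: defaultdict(list))
    for region, authority, aadf in rows:
        flows[region][authority].append(aadf)
    ranking = {}
    for region, authorities in flows.items():
        means = {name: sum(values) / len(values) for name, values in authorities.items()}
        ranking[region] = sorted(means, key=means.get, reverse=True)[:top_n]
    return ranking


# Hypothetical rows with placeholder names, not real AADF records.
ROWS = [
    ("Region X", "LA-1", 12000), ("Region X", "LA-1", 18000),
    ("Region X", "LA-2", 30000), ("Region X", "LA-3", 9000),
    ("Region Y", "LA-4", 7000), ("Region Y", "LA-5", 21000),
]
top_two = busiest_authorities(ROWS, top_n=2)
```

Whether the mean or the sum of AADF across count points is the better measure of "busiest" depends on how evenly the count points are distributed, so this choice is itself an assumption.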
