Robust temporal low-rank representation for traffic data recovery via fused lasso

Obtaining complete and accurate traffic data as input is crucial for most intelligent transportation systems. However, due to hardware or software malfunctions, traffic data inevitably suffers from missing values and noise. Most existing representation-based traffic data recovery methods adopt sparse representation theory, which models the local association properties of traffic data well but ignores their global correlation. To overcome this shortcoming, a robust low-rank representation method that incorporates temporal prior information to impute the missing traffic data is proposed. Specifically, low-rank representation theory is first employed to model the global spatial correlation of traffic data, and fused lasso regularisation is then used to fit the temporal correlation. In addition, to make the proposed model more robust, F-norm regularisation is used to smooth the Gaussian noise in the traffic data. Furthermore, an efficient optimisation algorithm based on ADMM is developed to solve the proposed model. Finally, extensive experiments on a real dataset validate the effectiveness of the proposed method.


INTRODUCTION
With car ownership increasing year by year, urban roads have become more congested and traffic efficiency has fallen. In this context, Intelligent Transportation Systems (ITS) play a major role and have received growing research attention. [1,2] Complete and accurate historical traffic data is the basis of ITS, but in practice it is almost inevitable that the data collected by traffic sensors contains missing or corrupted values. For example, in Melbourne, Australia, about 8% of the traffic flow sensors have a data loss rate of more than 56%, while in Beijing, China, about 10% of traffic data is incorrect. [3] The main causes include storage device failure, transmission failure, and data abnormality. [4] However, most ITS applications, such as short-term traffic flow prediction, [5,6] require complete and precise traffic data as input, so recovering missing data with high accuracy has become a major challenge. To address the missing data problem, scholars have proposed solutions from different perspectives based on the characteristics of traffic data. Among the most representative are Mean Imputation (MI), [7] K-nearest Neighbour Regression (KNNR), [8] Probabilistic Principal Component Analysis (PPCA), [9] and the Singular Value Threshold algorithm (SVT). [10] These methods have achieved good results, and some are regarded as landmark algorithms.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
However, some of these methods do not make full use of the spatial-temporal correlation of traffic flow data, and some do not consider the impact of noise on missing value estimation. Given the low-rank property of traffic flow data, Low-rank Representation theory (LRR) can be used for modelling; LRR also has the advantage of being robust to noise. Further, an in-depth analysis of real traffic flow data shows that it has a temporal stability property: the traffic flow value changes very little, or not at all, between two neighbouring moments. Fused lasso [11] can therefore be adopted to increase the precision of recovery.
Inspired by the above discussion, in this article we propose a novel noise-robust traffic flow estimation model based on LRR. The contributions of this article are summarized as follows:
• By analysing a large amount of real traffic flow data, the hypothesis that the traffic flow matrix has low rank is verified, and further analysis finds that the traffic flow data has a temporal stability property.
• Based on the theory of LRR, and taking full advantage of the low-rank and temporal properties of the traffic flow matrix, a noise-robust completion model for missing traffic data with a temporal regularisation term is proposed, named RTLRR (Robust Temporal Low-rank Representation).
• An optimisation algorithm for solving the proposed model is designed based on ADMM. This algorithm has the advantage of fast convergence.
• Based on the pre-processed raw dataset, the superiority of the proposed model over other methods is verified through comprehensive simulation experiments.
The article is organised as follows. Section 2 introduces related work. Section 3 presents our verification of the prior hypotheses on traffic flow data. Section 4 proposes a novel model for recovering missing traffic flow data and gives an optimisation algorithm. Section 5 compares, by simulation, the estimation results of the proposed model with those of other methods on outlier-free and raw datasets. Finally, Section 6 summarizes the research work of this article.

RELATED WORK
So far, the methods for recovering missing data can be roughly divided into the following three categories.

Methods based on statistical learning
Statistical learning methods include simple interpolation methods, regression methods, and probabilistic model methods. The typical representative of the naive interpolation methods is MI, which replaces the missing values with the average of the observed samples; Tak et al. used such a method. [7] MI is particularly simple, but because all missing values of the same sample are replaced by the same mean, the original distribution of the sample is seriously distorted and the performance is often poor.
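The MI baseline described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [7]: it assumes each column of the matrix is one sample and that a boolean array marks the observed entries; the names `mean_impute`, `X_demo` and `mask_demo` are hypothetical.

```python
import numpy as np

def mean_impute(X, observed):
    """Fill the missing entries of each column with that column's observed mean.

    Assumption: one column = one sample; observed[i, j] is True where
    X[i, j] was actually measured.
    """
    X = np.asarray(X, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    filled = X.copy()
    for n in range(X.shape[1]):
        obs = observed[:, n]
        if obs.any():  # leave a fully missing column untouched
            filled[~obs, n] = X[obs, n].mean()
    return filled

# Toy example: two samples (columns), three time slots.
X_demo = np.array([[10.0, 4.0], [0.0, 6.0], [30.0, 0.0]])
mask_demo = np.array([[True, True], [False, True], [True, False]])
X_mi = mean_impute(X_demo, mask_demo)
```

Every missing entry of a column receives the same value, which is why the imputed sample loses its original shape.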
The goal of regression is to establish a mapping between the observed values and the missing values, so that the missing values can be estimated easily. Regression can be divided into linear regression and non-linear regression. The representative linear regression model is Least Square Regression (LSR). Non-linear regression mainly includes KNNR and Support Vector Regression (SVR). [12] The KNN algorithm finds the k samples closest to an incomplete sample based on that sample's observed variables, and has been shown to achieve good results in traffic flow data recovery. For example, Batista et al. proposed a data-driven improved KNN-based regression algorithm called KNNR to estimate missing traffic values, which achieved good results. [8] However, KNN recovers missing data in isolation, so it does not capture the global characteristics of the data. In addition, KNN is sensitive to noise. Shang et al. [12] proposed a traffic data estimation model based on the Fuzzy Clustering Method (FCM) and SVR, and designed a solution method based on Particle Swarm Optimisation (PSO). This model uses not only the temporal characteristics of traffic flow data but also the correlation between different sensors, so it has achieved good results. However, the model does not consider the noise in the flow data.
The probabilistic model method assumes that the data follow a certain probability distribution, such as the uniform, Gaussian, or chi-square distribution, and then estimates the missing values according to the probability model. PPCA is the representative of the probabilistic model method; Batista et al. [9] used this method for modelling. The model assumes that traffic data $x$ is generated by the hidden variable $z$ through the relationship $x = Wz + \epsilon$, where $\epsilon$ is a noise term, and that the marginal distribution $p(z)$ and the conditional probability $p(x|z)$ are both Gaussian. In such a linear Gaussian model, $p(x)$ and $p(z|x)$ are also Gaussian. The quantities to be optimised are $W$ and the parameter of the noise term. Based on the prior assumptions, the model parameters can be optimised by Maximum Likelihood Estimation (MLE) or Expectation Maximisation (EM). Recently, Bayesian Optimisation (BO) has also been widely used to adjust parameters. [13,14] The disadvantage of PPCA is that it relies too heavily on a priori assumptions, while in practice it is difficult to determine which distribution the true data follows. For example, traffic flow data with two peak periods often does not follow a normal distribution.

Methods based on matrix completion
Matrix completion theory is derived from Compressed Sensing (CS) theory. [15] Candès et al. [16] proved that an incomplete matrix can be recovered with high accuracy only if the matrix to be restored is low-rank. For the traffic flow matrix this assumption is reasonable, because the entries from the same road network are often related, which is called spatial correlation. Spatial correlation provides the basis for our LRR modelling. The traffic flow values at different times on the same road are also related, which is called temporal correlation. In general, the temporal correlation is represented by temporal stability, that is, the values at adjacent moments do not change much, and the spatial-temporal correlation is reflected by the low-rank property of the matrix. This will be demonstrated experimentally in Section 3. Commonly used matrix completion algorithms include SVT, OptSpace, [17] and Subspace Evolution and Transfer (SET). [18] In [19], Luo et al. proposed an improved low-rank matrix decomposition (ILRMD), which fully utilizes the spatiotemporal correlation among traffic data but does not account for the impact of stochastic fluctuations of traffic data on the estimation results. In recent years, research interest has gradually shifted from matrix completion in two-dimensional space to tensor completion in three-dimensional space. The authors of [20] proposed a flow estimation model combining matrix completion and tensor completion, and designed an optimisation model based on CP decomposition. The experimental results show that the method has fairly high estimation accuracy and still performs well even on holidays with complex traffic conditions. However, this method does not consider the local properties of samples and is sensitive to noise.

Methods based on sparse representation
In recent years, Sparse Representation (SR) theory [21] has been applied to the restoration of missing traffic flow data. Chen et al. [22] proposed a missing traffic value interpolation model based on $\ell_p$-norm regularisation, called SSRp. In SR theory, each sample can be represented by a linear combination of other samples, and the goal is to find a sparse coefficient matrix for this representation. Unlike the $\ell_1$-norm constraint used in traditional sparse representation, [22] adopts an $\ell_p$-norm constraint, which achieves greater sparsity. The simulation results show that the model estimates missing values with high accuracy. However, sparse representations have the inherent disadvantages of not making good use of global properties and of poor noise robustness. Moreover, SSRp does not take into account the temporal stability of traffic flow. Table 1 summarizes the related works mentioned in this section.

RESEARCH ON REAL TRAFFIC FLOW DATA
The real traffic flow data set used in this study is derived from 20 sensor nodes in the road network formed by the intersection of the I205 Freeway and SR14 Highway in Portland, Oregon, USA (http://portal.its.pdx.edu/). The map of the road network is shown in Figure 1. To show the characteristics of traffic flow data visually, Figure 2 presents four typical plots of the traffic data collected by 4 different sensors on the same day. The horizontal axis is time, and the vertical axis is vehicle flow. The traffic data collected by the 20 sensors clearly show some common characteristics. First, the traffic data collected by most sensors have two peak periods, in the morning and in the evening, and the two peaks appear at basically the same times, that is, traffic flow is periodic. Second, the change in the traffic values is relatively small at most times, but sudden changes (surges and drops) occur at some moments.

Low-rank property
As mentioned earlier, the spatio-temporal correlation is manifested by the low rank of the matrix, and the low-rank property is the basis for applying LRR. In this section, we experimentally verify that the traffic flow data does have a low-rank property, using Singular Value Decomposition (SVD) to evaluate whether the traffic matrix is low-rank. Any traffic matrix $X \in \mathbb{R}^{T \times N}$, where $T$ is the number of time slots and $N$ is the number of traffic flow detectors set at different locations, can be decomposed into the form

$$X = U \Sigma V^{T},$$

where $U \in \mathbb{R}^{T \times T}$ and $V \in \mathbb{R}^{N \times N}$ are two unitary matrices, and $\Sigma \in \mathbb{R}^{T \times N}$ is a diagonal matrix whose diagonal elements are the singular values. The singular values of $X$, in descending order, are

$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0,$$

where $r$ is the rank of the matrix; the rank is thus the number of non-zero singular values. This definition gives the exact rank of the matrix. It is worth noting that in the real world almost no matrix is exactly low-rank. In addition, calculating the exact rank of a matrix is an ill-conditioned problem, because small changes in the elements may lead to large changes in the rank. [23] According to PCA theory, [24] if the matrix $X$ is low-rank, then the first $k$ ($k \ll T$) singular values account for the majority of the sum of all singular values. [25] To show the low-rank property of the traffic matrix on different working days in detail, Figure 3 shows the proportion accounted for by the first $k$ singular values of the traffic matrix $X$ collected over 5 working days. The top 50 singular values capture over 80% of the total variance captured by all the singular values. In addition, the low-rank property is more pronounced on Tuesday, Wednesday, and Friday. These results indicate that the traffic flow matrix has a good low-rank property.
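The low-rank check described above can be reproduced roughly as follows. This is a sketch on synthetic data, not the Portland data set: the helper name `singular_value_share` and the random test matrix are assumptions made for illustration, and the share is computed on the singular values themselves (squaring them first would give the variance-based share).

```python
import numpy as np

def singular_value_share(X, k):
    """Fraction of the sum of all singular values held by the k largest ones."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)  # descending
    return float(s[:k].sum() / s.sum())

# Synthetic T x N matrix of exact rank r, standing in for a traffic matrix.
rng = np.random.default_rng(0)
T, N, r = 288, 100, 5
X_low = rng.random((T, r)) @ rng.random((r, N))

share = singular_value_share(X_low, r)   # close to 1 for an exactly rank-r matrix
numeric_rank = int(np.linalg.matrix_rank(X_low))
```

For real traffic data the share approaches 1 more gradually, and a curve of `singular_value_share(X, k)` against `k` corresponds to the kind of plot described for Figure 3.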

Temporal stability property
Generally speaking, the difference between the values in each pair of adjacent time slots monitored by the same traffic flow sensor node is not large, and the value may hardly change at all. We call this characteristic temporal stability. To verify it, we define the gap between each pair of adjacent time slots at the same location as

$$\Delta(t, n) = |x_{t,n} - x_{t-1,n}|,$$

where $t$ represents time and $n$ represents the detector location, with $2 \le t \le T$ and $1 \le n \le N$. A smaller $\Delta(t, n)$ indicates that the data collected by the $n$-th sensor node at time slot $t$ is more stable relative to time slot $t - 1$. To better reflect the overall characteristics of $\Delta(t, n)$, we calculate the normalized gap

$$\bar{\Delta}(t, n) = \frac{\Delta(t, n)}{\max\{|x_{2,n} - x_{1,n}|, \dots, |x_{T,n} - x_{T-1,n}|\}},$$

where the denominator is the maximal gap between any two adjacent time slots of the $n$-th sensor. A smaller $\bar{\Delta}(t, n)$ thus indicates that $\Delta(t, n)$ is relatively small.

FIGURE 4 Temporal stability property
Similarly, we have analysed the temporal stability property of the traffic flow over five working days. Figure 4 shows the Cumulative Distribution Function (CDF) of the normalized gap from Monday to Friday. Approximately 90% of the normalized adjacent gaps are less than 0.4. Moreover, the proportion of equal adjacent values exceeds 10%. This indicates that the gap between two adjacent traffic values collected by the same sensor is small, and hence that the traffic flow matrix is temporally stable.
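The gap statistics used in this section can be computed along the following lines. This is a minimal sketch under assumptions: rows are time slots, columns are sensors, the gap is taken as an absolute difference, and the names `normalized_gap` and `empirical_cdf` are hypothetical.

```python
import numpy as np

def normalized_gap(X):
    """Absolute gap between adjacent time slots, scaled by each column's largest gap.

    Assumption: rows are time slots and columns are sensors.
    """
    X = np.asarray(X, dtype=float)
    gap = np.abs(np.diff(X, axis=0))            # shape (T-1, N)
    max_gap = gap.max(axis=0, keepdims=True)
    max_gap[max_gap == 0] = 1.0                 # constant column: all gaps stay 0
    return gap / max_gap

def empirical_cdf(values, threshold):
    """Share of values that do not exceed the threshold."""
    return float(np.mean(np.asarray(values) <= threshold))

# Toy data: sensor 0 varies, sensor 1 is constant.
X_demo = np.array([[1.0, 5.0], [2.0, 5.0], [4.0, 5.0], [4.0, 5.0]])
G = normalized_gap(X_demo)
share_small = empirical_cdf(G, 0.4)
share_equal = empirical_cdf(G, 0.0)   # adjacent values that are exactly equal
```

Evaluating `empirical_cdf` over a grid of thresholds gives a CDF curve of the kind plotted in Figure 4.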

RTLRR
Let the traffic flow matrix be $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{T \times N}$, where $T$ is the number of time slots, that is, $T = 288$. In this article, in order to fully explore the value of the traffic data, $N$ is the product of the number of flow sensors and the number of sampling days, so $N = 63 \times 20 = 1260$; the proposed model therefore takes into account not only the spatial correlation between different sensors but also the temporal correlation of each sensor. In representation theory, each sample in $X$ can be represented by a linear combination of samples in a dictionary $A = [a_1, a_2, \dots, a_Z] \in \mathbb{R}^{T \times Z}$, that is,

$$X = AW, \tag{4}$$

where $W = [w_1, w_2, \dots, w_N]$ is a coefficient matrix and $w_i$ corresponds to $x_i$. Therefore,

$$x_i = \sum_{j=1}^{Z} w_i(j)\, a_j, \tag{5}$$

where $w_i(j)$ represents the $j$-th element of $w_i$. However, the dictionary is often overcomplete, so formula (5) is prone to a trivial solution. A good remedy is to replace $A$ with $X$. This is self-representation theory, [21] and formula (4) is rewritten as

$$X = XW. \tag{6}$$

Since $X$ replaces $A$, it follows that $W \in \mathbb{R}^{N \times N}$.
In sparse representation theory, $W$ is a sparse matrix. However, sparse representation cannot capture the global properties of $X$, which means it cannot make full use of the inherent information of traffic data in our problem; moreover, it is sensitive to noise. Liu et al. [26] therefore proposed LRR theory. Unlike SR, which solves the sparse representation of each sample independently, LRR jointly solves for the representation of lowest rank. Since $W$ is a low-rank matrix, the problem can be modelled as the rank minimisation problem

$$\min_{W}\ \operatorname{rank}(W) \quad \text{s.t.} \quad X = XW. \tag{7}$$

However, the optimisation problem (7) is NP-hard. Fortunately, Candès et al. [16] proved that the rank minimisation problem can be relaxed to the nuclear norm minimisation problem

$$\min_{W}\ \|W\|_{*} \quad \text{s.t.} \quad X = XW, \tag{8}$$

where $\|\cdot\|_{*}$ is the nuclear norm, that is, the sum of the singular values. Further, (8) can be written in unconstrained form through a penalty function [27] as problem (9), where $\lambda_1$ is a hyperparameter. Note that the original $X$ is in fact incomplete. Denote the observed part of $X$ as the sampling matrix $M \in \mathbb{R}^{T \times N}$; then $P_\Omega(X) = P_\Omega(M)$, where $P_\Omega(\cdot)$ is an orthogonal projection operator defined as

$$[P_\Omega(X)]_{ij} = \begin{cases} x_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise,} \end{cases} \tag{10}$$

and $\Omega$ is the index set of the observed elements of $X$. Problem (9) thus becomes problem (11), which carries the constraint $P_\Omega(X) = P_\Omega(M)$. Because the unknown $X$ appears inside the projection operator, which complicates the subsequent solution, the equation $P_\Omega(X) = P_\Omega(M)$ is transformed into $M = X + E$ with $P_\Omega(E) = 0$. Problem (11) is therefore equivalent to problem (12), which is minimised over $X$, $W$ and $E$, where $E$ is the missing error.
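Two building blocks of this formulation, the projection operator onto the observed index set Ω and the nuclear norm, are easy to illustrate. The sketch below is an assumption-laden toy: Ω is encoded as a boolean array, and the names `project_observed` and `nuclear_norm` are hypothetical.

```python
import numpy as np

def project_observed(X, omega):
    """P_Omega: keep the entries indexed by Omega and set all others to zero.

    Assumption: omega is a boolean array, True where the entry is observed.
    """
    return np.where(omega, X, 0.0)

def nuclear_norm(W):
    """Sum of the singular values of W."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

X_full = np.array([[1.0, 2.0], [3.0, 4.0]])
omega = np.array([[True, False], [True, True]])

M_demo = project_observed(X_full, omega)   # sampling matrix with zeros at missing entries
E_demo = M_demo - X_full                   # from M = X + E
E_on_observed = project_observed(E_demo, omega)   # vanishes on Omega
nn_demo = nuclear_norm(np.diag([3.0, 4.0]))
```

The example shows why the substitution M = X + E is convenient: E is zero on the observed entries and absorbs everything that happens on the missing ones.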
In Section 3.2, it was shown experimentally that traffic flow data has temporal stability. We exploit this prior information via fused lasso by adding the L1-norm regularisation term $\|RX\|_1$ to the objective, which is again minimised over $X$, $W$ and $E$. Here $R$ is the Toeplitz matrix listed among the inputs of Algorithm 1, so that $RX$ measures the change between adjacent time slots. Minimising $\|RX\|_1$ encourages the elements of the traffic flow matrix to remain stable along the time dimension.
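The temporal regularisation term can be illustrated with a small first-difference matrix. The exact shape and sign convention of the paper's matrix R are not reproduced here; the (T-1) x T form below and the names `difference_matrix` and `fused_lasso_penalty` are assumptions chosen so that RX equals the difference between adjacent rows.

```python
import numpy as np

def difference_matrix(T):
    """(T-1) x T Toeplitz-structured matrix: row t has -1 at column t and +1 at t+1.

    Assumption: this is one common choice, not necessarily the paper's exact R.
    """
    R = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    R[idx, idx] = -1.0
    R[idx, idx + 1] = 1.0
    return R

def fused_lasso_penalty(X):
    """Entry-wise L1 norm of R @ X, i.e. the total change between adjacent slots."""
    R = difference_matrix(X.shape[0])
    return float(np.abs(R @ X).sum())

X_demo = np.array([[1.0, 5.0], [2.0, 5.0], [4.0, 5.0]])
R_demo = difference_matrix(3)
penalty = fused_lasso_penalty(X_demo)   # |1| + |2| for sensor 0, zero for sensor 1
```

A column that never changes contributes nothing to the penalty, which is how the term rewards temporal stability.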
Furthermore, the sampled data probably contains noise. We denote the noise matrix as $C \in \mathbb{R}^{T \times N}$ and, to make the model robust, add an L2 (Frobenius) norm regularisation on $C$, which gives problem (15), minimised over $X$, $W$ and $E$. In addition, traffic flow data is non-negative, so the constraint $X \ge 0$ must be added. With this constraint, problem (15) is further refined into problem (16), again minimised over $X$, $W$ and $E$.

Optimisation algorithm
For separable convex optimisation problems, a simple and efficient approach is the Alternating Direction Method of Multipliers (ADMM). ADMM converts the complex original problem into several easy-to-solve sub-problems; by solving the sub-problems in turn, the optimal solution of the original problem can be found. We use the ADMM variant proposed by Lin et al. [28] to solve the optimisation problem in this article, which does not require tuning an optimal penalty parameter.
To solve the optimisation problem of Equation (16), we first define two indicator functions, given in Equations (17) and (18). The optimisation problem can then be transformed by a penalty function into the form of Equation (19), where $\lambda_3$ is a penalty parameter for violating the constraint term. The optimisation process can therefore be equivalently converted into iteratively solving the three sub-problems of Equation (20), where $X^k$, $W^k$ and $E^k$ denote the results of the $k$-th iteration.
Sub-problems 2 and 3 have closed-form solutions, while sub-problem 1 is difficult to solve because of the terms $\|RX\|_1$ and $g(X)$.
For sub-problem 1: to solve it efficiently, we introduce the variables $D$ and $S$ with $D = X$ and $S = RX$, so that sub-problem 1 is transformed into an equivalent constrained optimisation problem over $X$, $D$ and $S$. We apply ADMM to this problem. The augmented Lagrange function is defined in Equation (21), where $Y_1$ and $Y_2$ are Lagrange multipliers and $H_1$, $H_2$ are defined from them. The optimisation steps are:
Step 1: Update $X$: see Equation (22).
Step 2: Update $D$.
Step 3: Update $S$: this is the proximal operator of the L1 norm.
Steps 4-5: Update $H_1$, $H_2$.
Step 6: Update the penalty parameter.
The closed-form solution of the $X$ update is derived in Equation (23) (for simplicity, the iteration superscript is omitted), where $I$ is the $N \times N$ identity matrix.
For sub-problem 2: the closed-form solution is obtained as follows.
Proof. To minimise the objective of sub-problem 2, which consists of $\lambda_1 \|W\|_{*}$ plus a Frobenius-norm term, we set its first-order differential to zero, which gives Equation (36). If the SVD of $W$ is $W = U_W \Sigma_W V_W^{T}$, the sub-differential of the nuclear norm of $W$ is given by Equation (37). Substituting (37) into (36) yields Equation (38), and substituting Equations (39) and (40) into Equation (38) gives the result.

For sub-problem 3: its closed-form solution can be obtained directly.
This completes the optimisation algorithm; the above solution process is named RTLRR-ADMM. It is worth mentioning that, in order to speed up the algorithm, we first update $X$ iteratively (this process is called the inner loop), and the resulting $X$ is then used to update $W$ (this process is called the outer loop). The procedure is summarized in Algorithm 1.
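The splitting idea used for sub-problem 1 (auxiliary variables D = X and S = RX, followed by alternating updates) can be sketched on a deliberately simplified problem. The code below is not RTLRR-ADMM: it replaces the low-rank terms with a plain data-fit term and solves min ½‖X − M‖²_F + λ‖RX‖₁ subject to X ≥ 0, with a fixed penalty parameter instead of an adaptive one. The function names, the first-difference form of R and all parameter values are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(V, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def smooth_nonneg_admm(M, lam=1.0, mu=1.0, n_iter=500):
    """ADMM for 0.5*||X - M||_F^2 + lam*||R X||_1 with X >= 0.

    Splitting (as in the text): D = X carries the non-negativity constraint,
    S = R X carries the L1 term. H1 and H2 are scaled multipliers.
    Assumption: R is a (T-1) x T first-difference matrix and mu stays fixed.
    """
    T = M.shape[0]
    R = np.diff(np.eye(T), axis=0)
    X = M.copy()
    D = X.copy()
    S = R @ X
    H1 = np.zeros_like(D)
    H2 = np.zeros_like(S)
    # The X update is a linear system with a constant coefficient matrix.
    A = (1.0 + mu) * np.eye(T) + mu * (R.T @ R)
    for _ in range(n_iter):
        X = np.linalg.solve(A, M + mu * (D - H1) + mu * (R.T @ (S - H2)))
        D = np.maximum(X + H1, 0.0)                # projection onto X >= 0
        S = soft_threshold(R @ X + H2, lam / mu)   # proximal step for the L1 term
        H1 += X - D                                # multiplier updates
        H2 += R @ X - S
    return D

# Toy usage: a noisy step signal for a single sensor.
rng = np.random.default_rng(0)
truth = np.concatenate([np.full(50, 2.0), np.full(50, 6.0)])[:, None]
noisy = truth + 0.5 * rng.standard_normal(truth.shape)
X_hat = smooth_nonneg_admm(noisy, lam=2.0)
```

Each update is cheap: a linear solve, a clipping step and an element-wise shrinkage, which is the reason for introducing the auxiliary variables in the first place.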

Data preprocessing
Due to factors such as travel willingness, weather, and traffic accidents, traffic flow tends to show significant stochastic fluctuations or even surges; [29] this phenomenon is evident in Figure 2. Data with these characteristics can be regarded as outliers. We regard the raw data as "corrupted" data, which certainly contains noise and outliers, and use it to evaluate the estimation performance of each algorithm on a real-world stochastic traffic dataset. In order to reduce the impact of such outliers on the traffic flow estimation performance, we preprocess the raw data by filtering. The original data and the outlier-filtered data within one day are shown in Figure 5. It can be seen that the filtering process significantly suppresses only the fluctuations and surges, that is, the outliers, while maintaining the stochastic nature of the raw data. In this article, raw data preprocessed by the filter is referred to as "filtered" data.

ALGORITHM 1 RTLRR-ADMM
Input: Sampling matrix M, the indexes of sampling entries Ω, Toeplitz matrix R, the maximum iteration number of the inner loop K 1 , the minimum iteration number of the outer loop K 2 , the parameters λ 1 , λ 2 , λ 3 , and the maximum value of the penalty parameter.
Output: Estimated result X opt .
1: for t = 1 to K 1 do
…
3: Update X t according to (…

FIGURE 5 The raw data and filtered data

Experiment configuration
To evaluate the proposed model comprehensively, we compare RTLRR with several well-known baseline methods: MI, KNNR, SVR, PPCA, ILRMD, and SSRp. In addition, to verify the role of the temporal regularisation term, the model without temporal regularisation, named RLRR, is also included. The parameters of the baseline methods are set according to the related work in [30], and the optimal parameters are determined by grid search.
The proposed algorithm and the comparison algorithms were implemented in MATLAB 2016a on an Intel(R) Core(TM) i5-8300H machine with 16 GB RAM. To evaluate how well each model estimates the missing values of traffic data, we used RStudio (package: http://www.stat.boogaart.de/compositions/) to design three patterns of missing data: [22]
(a) Missing Completely at Random (MCAR): the occurrence of missing data is completely random and independent of the existing data or other missing data, so the missing entries generally appear as isolated data points.
(b) Missing at Random (MAR): the occurrence of missing data is related to neighbouring points, so the missing entries generally appear as runs of consecutive data points.
(c) MIXED: half of the missing data is MCAR and the other half is MAR.
Figure 6 illustrates the characteristics of these three patterns of missing data. Each column represents the traffic data sampled by one detector in one day and each row denotes a time slot; the white squares represent the observed data points, and the black squares the missing data points.
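The three missing patterns can be imitated with simple mask generators. This is only a rough sketch and not the R-based procedure used in the experiments: in particular, modelling MAR as one contiguous run per column is an assumption, and the function names are hypothetical. In each mask, True marks a missing entry.

```python
import numpy as np

def mcar_mask(shape, ratio, rng):
    """Isolated missing points drawn uniformly over the whole matrix."""
    T, N = shape
    n_missing = int(round(ratio * T * N))
    flat = np.zeros(T * N, dtype=bool)
    flat[rng.choice(T * N, size=n_missing, replace=False)] = True
    return flat.reshape(T, N)

def mar_mask(shape, ratio, rng):
    """Assumption: one contiguous run of missing time slots in every column."""
    T, N = shape
    run = int(round(ratio * T))
    mask = np.zeros((T, N), dtype=bool)
    for n in range(N):
        start = rng.integers(0, T - run + 1)
        mask[start:start + run, n] = True
    return mask

def mixed_mask(shape, ratio, rng):
    """Roughly half of the missing entries in runs, the rest as isolated points."""
    T, N = shape
    mask = mar_mask(shape, ratio / 2, rng)
    n_extra = int(round(ratio * T * N)) - int(mask.sum())
    free = np.flatnonzero(~mask.ravel())          # entries still observed
    flat = mask.ravel().copy()
    flat[rng.choice(free, size=n_extra, replace=False)] = True
    return flat.reshape(T, N)

rng = np.random.default_rng(0)
shape = (288, 20)            # one day of 288 slots for 20 detectors
mcar = mcar_mask(shape, 0.3, rng)
mar = mar_mask(shape, 0.3, rng)
mixed = mixed_mask(shape, 0.3, rng)
```

Plotting `mcar`, `mar` and `mixed` as black-and-white images gives pictures comparable in spirit to the patterns described for Figure 6.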

Evaluation criterion
The evaluation indicator used in this article is the Root Mean Squared Error (RMSE), which measures the absolute error of the estimation and is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{Total} \sum_{i=1}^{Total} (x_i - \hat{x}_i)^2},$$

where $Total$ is the number of missing data, $x_i$ is the original data, and $\hat{x}_i$ is the estimated data. A smaller RMSE indicates better estimation performance. In addition, in order to compare the estimation performance of each algorithm under different missing rates, we define a missing ratio; the total number of missing data is then the missing ratio multiplied by $N \times T$.
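As a small illustration of the evaluation criterion, the RMSE over the missing entries can be computed as below. The function name `rmse_on_missing` and the boolean-mask encoding of the missing positions are assumptions made for this sketch.

```python
import numpy as np

def rmse_on_missing(X_true, X_est, missing):
    """RMSE restricted to the entries flagged as missing (True in `missing`)."""
    X_true = np.asarray(X_true, dtype=float)
    X_est = np.asarray(X_est, dtype=float)
    diff = X_true[missing] - X_est[missing]
    return float(np.sqrt(np.mean(diff ** 2)))

X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
X_est = np.array([[1.0, 4.0], [3.0, 0.0]])
missing = np.array([[False, True], [False, True]])
err = rmse_on_missing(X_true, X_est, missing)   # errors of -2 and 4 on two entries
```

Only the entries that were hidden from the algorithm enter the average, so the observed entries cannot flatter the score.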

Comparison of estimation performances
In this article, the eight algorithms are each executed 10 times on the same data set, and the average of the results is used as the basis for evaluation. Tables 2-4 list the estimation performance of the algorithms under the three missing patterns and different missing rates. The input sampling matrix M without outliers is denoted "filtered", and M containing noise and outliers is denoted "corrupted". The tables show clearly that the estimation performance of each algorithm tends to worsen as the missing rate increases, which is reasonable because a higher missing rate means a more serious loss of information. Interestingly, MI's estimation performance does not always follow this rule. Second, whichever algorithm is used, outliers reduce the estimation performance: the RMSE of every model on the corrupted dataset is higher than on the filtered dataset. The sensitivity of the algorithms to noise, however, varies greatly. For example, ILRMD and SSRp are very sensitive to noise, while RLRR and RTLRR, which are based on LRR, are robust to noise. MI is also insensitive to noise, but its estimation performance is the worst. Moreover, the MCAR pattern is the easiest to handle, the MAR pattern is the hardest, and the MIXED pattern lies in between. Finally, regardless of the type of data and the missing pattern, MI performs worst among the eight algorithms by a clear margin, while KNNR and SVR perform similarly to each other, as do PPCA and ILRMD.
It is particularly noteworthy that the performance of SSRp, a state-of-the-art method based on sparse representation theory, is close to that of RLRR, which is based on LRR theory, but inferior to that of RTLRR. This indicates that exploiting the temporal correlation of traffic data via fused lasso can improve the estimation accuracy. Overall, the proposed RTLRR achieves the best performance. To compare the run time of the different models, Table 5 shows the run time of the 8 models under the MIXED missing pattern with a missing ratio of 0.3. MI is the fastest of all the methods because of its simplicity, while methods that require iterative optimisation, such as SSRp, RLRR and RTLRR, are clearly slower. The probabilistic model PPCA also requires more run time because of its large amount of computation. Benefiting from the ADMM optimisation algorithm, the proposed RTLRR runs faster than the similar model SSRp.
Considering the estimation errors of the different methods, we conclude that RTLRR achieves a significant improvement in estimation performance with a moderate increase in run time.
To compare visually the effect of the presence or absence of the temporal regularisation term on the estimation performance, Figure 7 shows the estimation performance of RLRR and RTLRR on the filtered data set and the corrupted data set under the MIXED missing pattern. The figure shows that the RMSE of RTLRR is always lower than that of RLRR on both the filtered and the corrupted dataset, which verifies the significance of adding the temporal regularisation term. In addition, the temporal regularisation term improves the robustness of the model to a certain extent.

CONCLUSION
In this article, we propose a novel model, RTLRR, which makes full use of two inherent priors of the traffic matrix, the low-rank property and the temporal stability property, and thereby achieves excellent results on real datasets. Unlike previous modelling methods, we use low-rank representation theory to model the spatial correlation of traffic data and model the temporal correlation via fused lasso. Owing to the noise robustness of LRR, this method can effectively estimate the missing values in noisy traffic data. To solve the model, an optimisation algorithm based on ADMM is developed that solves the sub-problems alternately, which greatly accelerates the optimisation. Simulation results confirm the efficiency of the proposed method.
In future work, we will incorporate the graph structure of road networks into the model in order to further improve estimation performance. We also plan to combine other machine learning models with the proposed RTLRR to further reduce the estimation error.