Keywords:

  • traffic forecasting;
  • real-time predictions;
  • threshold regressions;
  • adaptive LASSO

Abstract

  1. Top of page
  2. Abstract
  3. 1 Introduction
  4. 2 Modeling traffic variables using threshold regressions and penalized estimation
  5. 3 Numerical experiments and analyses
  6. 4 Concluding remarks
  7. Acknowledgements
  8. References

Smart transportation technologies require real-time traffic prediction to be both fast and scalable to full urban networks. We discuss a method that is able to meet this challenge while accounting for nonlinear traffic dynamics and space-time dependencies of traffic variables. Nonlinearity is taken into account by a union of non-overlapping linear regimes characterized by a sequence of temporal thresholds. In each regime, for each measurement location, a penalized estimation scheme, namely the adaptive least absolute shrinkage and selection operator (LASSO), is implemented to perform model selection and coefficient estimation simultaneously. Both the outlier-robust least absolute deviation estimates and conventional LASSO estimates are considered. The methodology is illustrated on 5-minute average speed data from three highway networks. Copyright © 2012 John Wiley & Sons, Ltd.

1 Introduction

Short-term forecasting of traffic variables in a road network is an essential capability for modern traffic control centers, so that decisions on optimal control actions (e.g., signal timing and variable message signs) are based on expected future traffic conditions and not on information soon to be obsolete. In many of the world's most developed cities, the collection of real-time traffic data has been a foremost goal in recent years. Now that they have real-time traffic data and information warehouses, many of these cities are moving to the next level by leveraging their vast stores of data for real-time, forward-looking traffic information and tools.

Because of increasing demand for road traffic predictive tools, the body of literature on traffic forecasting has increased substantially in the past decade. The intensely oscillating behavior of traffic variables, reflected as frequent and sudden shifts to extreme values (and back), leads investigators to assume nonstationarity and nonlinearity as basic characteristics of traffic variables. In fact, as demonstrated in [1], nonlinearity in traffic dynamics can be detected even from data aggregated to 15-minute intervals. Hence, nonlinear models like neural networks are frequently adopted for short-term traffic forecasting. Overview comparisons including a detailed literature review are available in [2, 3]. Unfortunately, calibration of these models is a tedious step, which makes them difficult to handle in a large-scale, real-time scenario that would require relatively frequent recalibration of the forecasting model.

Some recent works decomposed nonlinear features of the daily traffic cycle via classifying traffic states into clusters of distinct characteristics [4]. An alternative approach to neural networks is to adopt fully parametric models, in which different states or regimes are defined and dynamic behavior is regime dependent and linear within a regime. An advantage of such models over neural networks is that their parameters are interpretable and may provide useful insights in the investigation of causal relationships in traffic dynamics at different locations of a road network. For some applications of regime-switching models to traffic variables, the reader may consult [5] and [1].

In this paper, we use fully parametric time series models as a starting point and emphasize the nonlinearity and potential collinearity of the data to develop methods that effectively handle those common features of traffic data. Computability and scalability of the approach are of critical importance here, as in [6]. The next section presents the methodology, whereas Section 3 provides numerical examples based on 5-minute speed readings collected from three highway networks. Lastly, we present our conclusions and recommendations for further work.

2 Modeling traffic variables using threshold regressions and penalized estimation

2.1 Overall approach and research questions

A linear stationary normal process is completely characterized by its mean and autocovariance function, and the reversed process (in time) has the same distribution as the original process (see for instance [7], Chapter 15). In general, traffic variables appear to be time-irreversible. For example, the levels of traffic volumes usually rise precipitously but decay gradually: the typical shape of the daily profile is an asymmetric M (whereas for speed, it is an asymmetric W). The class of threshold regression models [8] has been widely employed in the literature to explain empirical phenomena similar to the ones discussed earlier (a list of applications is presented in [9]). The major features of such models are limit cycles, amplitude dependent frequencies, and jump phenomena [10]. Despite their primary use as forecasting tools in this paper, threshold regressions may be used to examine some important research questions in vehicular traffic dynamics as follows:

  • For a given location in a road network, how many traffic states (regimes with linear dynamics) characterize the traffic cycle in each day and which time intervals correspond to each regime?
  • Do regimes occur at different times at different locations in the network?
  • Do specific regimes within the daily cycle differ significantly for different weekdays?

Given a forecasting horizon, in each measurement location and for each traffic regime, some past information from upstream or downstream neighboring locations is expected to be useful with regard to short-term forecasting performance. In fact, significant neighbors for a specific location may differ across regimes and significant time lags of measurements collected at a neighboring location may also differ across regimes. In previous works, the set of possible statistically significant neighbors was specified a priori using regime-dependent spatial weight matrices (see, for instance [11] and [6]), and the autoregressive order of the model was decided using statistical information criteria [12].

This work avoids a priori specification of spatial weight matrices by using l1-penalized methods that simultaneously perform estimation and model selection. Essentially, given a relatively large autoregressive order and a single matrix that represents general features of the road network topology, penalized estimation shrinks nonsignificant predictors to zero. To the best of our knowledge, this is the first application of penalized estimation methods to space-time regression models; a modified version of the basic l1-penalization algorithm adapted for spatial regression problems, different from the ones considered here, appeared recently in [13]. For some recent applications of penalized estimation methods in transportation problems, the reader may consult [14, 15] and [16].

In the application, we compare the forecasting performance of models estimated using: (i) the adaptive least absolute shrinkage and selection operator (LASSO) which performs l1-penalized minimization of squared residuals and (ii) l1-penalized minimization of the least absolute deviations (LAD) of the residuals (adaptive LAD-LASSO). Although ordinary least squares (OLS) is by far the most popular estimation method in regression models, estimation via LAD is among the most commonly employed robust techniques and has been shown to be particularly effective in terms of forecasting when the distribution of the response variable is prone to outliers [17].

Given some estimated temporal thresholds which define homogeneous traffic regimes, penalized estimation provides interesting insights in addition to its primary use as a generator of parsimonious and well-specified (e.g., free from the symptoms of multicollinearity) forecasting models. Namely, the aforementioned method permits answering the following research questions:

  • What is the spatial extent of the impact from neighboring locations in the network on the forecasted traffic level on a link?
  • Are the spatial and temporal influences of neighboring locations on a road link dependent on other factors such as the distance from the link, congestion level, traffic variability, and so on?

2.2 Decomposition

Our modeling strategy divides traffic dynamics into two basic components: a location specific daily profile and a term that captures the deviation of a measurement from that profile. Forecasting using unobserved components has been frequently adopted as it can provide a better understanding of the dynamic characteristics of the series and the way these characteristics change over time [18]. Within the context of short-term traffic forecasting, such decompositions are expected to lead to superior performance compared with models applied directly to traffic variables [19, 6]. Traffic variables display nonlinear dynamics in both their mean and their variance (e.g., heteroscedasticity, see [11] and [1]); detrending simplifies the dynamics of the modeled series relative to the original ones as the daily profile will capture some of the nonlinearity in mean.

Specifically, let d be the day of the week index, s the location index, and t the time of day index. The overall model structure for a traffic variable y is

  $$ y_{d,s}(t) = \mu_{d,s}(t) + x_{d,s}(t) \qquad (1) $$

where d = 1, … ,D, s = 1, … ,S, and t = 1, … ,T. S represents the number of locations for which we seek to forecast traffic conditions, and T is the total number of time intervals per day. D may be less than seven if there is sufficient evidence of similarity of traffic dynamics for two (or more) days of the week. The profile μd,s captures the daily trend and can be viewed as a baseline forecasting model that uses only historical data and neglects information from the recent past of the process. A weighted average that weighs recent historical data more heavily can be employed for the estimation of μd,s [6]; alternative methods include principal component analysis [20] and wavelet-based decomposition [21]. In what follows, we let D = 7 and estimate daily profiles based on weighted averages with weights fine-tuned for optimal forecasting performance. It should be emphasized that the decomposition described earlier is not absolutely necessary and an alternative modeling strategy would apply models similar to the ones presented in the next subsection to the original (and not the detrended) data.

2.3 The transient model

The second part of the modeling procedure concentrates on the dynamics of the (short-term) deviation from the historical daily profile. In this work, we adopt a regime-switching modeling framework. In particular for each location s, a space-time threshold autoregressive model is employed to capture transient behavior

  • display math(2)

where inline image for rd,s = 1, … , Rd,s + 1 and we use the convention that T0 = 0 and inline image. In (2), rd,s is an index that specifies the operating regime. The thresholds inline image separate and characterize different regimes and, in general, may differ across locations in the road network and across days of the week. In contrast to [6], in this work the number of thresholds and their values are unknown quantities that need to be estimated. It should be emphasized that the predictive part in (2) contains solely past information from the variable being modeled, purely for simplicity of exposition. The approach can be extended in a straightforward manner to include past information from other traffic variables as well.

The aforementioned predictive equation contains an intercept term that varies with location, day of the week, and traffic regime within a day. Ns is the number of neighboring locations of s that may provide useful information (at some previous time instances) with regard to short-term forecasting performance, and p is the autoregressive order (maximum time lag) of the model. Hence, the first sum in (2) contains information on the recent past of the location of interest, whereas the second sum contains information from its neighbors. The α's are unknown coefficients that need to be estimated; the statistically significant ones in the second sum signify which temporal lags of each neighboring location provide useful information with regard to short-term forecasting. Finally, ε is assumed to be a martingale difference sequence with respect to the history of the time series up to time t − 1; hence, it is assumed to be a serially uncorrelated (but not necessarily independent) sequence, and its variance is not restricted to be equal across regimes.
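The display for (2) is not legible in this copy. As a rough guide only, one form consistent with the verbal description is sketched below. The index placement, the superscript r for the regime, the neighbor map n_j(s), and the strictness of the inequalities are all assumptions made for illustration, not the authors' notation.

```latex
x_{d,s}(t) = \alpha^{(r)}_{d,s}
  + \sum_{i=1}^{p} \alpha^{(r)}_{d,s,i}\, x_{d,s}(t-i)
  + \sum_{j=1}^{N_s} \sum_{i=1}^{p} \alpha^{(r)}_{d,s,j,i}\, x_{d,n_j(s)}(t-i)
  + \varepsilon_{d,s}(t),
\qquad T^{(r-1)}_{d,s} < t \le T^{(r)}_{d,s},
\quad r = 1, \dots, R_{d,s} + 1,
\quad T^{(0)}_{d,s} = 0,\; T^{(R_{d,s}+1)}_{d,s} = T .
```

Here the first sum collects the p own lags of location s, the double sum collects the p lags of each of its N_s neighbors, and the regime r is determined solely by where the time-of-day index t falls between consecutive thresholds.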

The threshold model in (2) essentially dictates abrupt transitions between traffic regimes whereas it would have been more natural to adopt (logistic) smooth transitions as in [1]. However, that would have complicated things considerably as estimation of the parameter related to the speed of transition can be problematic, see for instance [22] and [1]. Furthermore, in contrast to some previous approaches (for example [19]), (2) does not contain any moving average terms. Autoregressive models with sufficiently large autoregressive order p may approximate autoregressive moving average processes (as shown for instance in [23], Chapter 2), when some stationarity and invertibility conditions are satisfied by model coefficients. The majority of linear models applied to traffic data satisfy such conditions implicitly [19].

Another feature that separates our modeling approach from previously reported ones is that we do not consider simultaneous estimation of a system of equations with a common covariance matrix (each equation corresponding to a measurement location in the network) as in [24, 11, 25, 12, 6]. Although a system in general is expected to produce more efficient estimates compared with the equation-by-equation approach, such an estimation framework cannot be applied in practice when S is as large as in the applications we consider (where S may easily exceed 300).

2.4 Threshold estimation

The predictive model in (2) defines a threshold regression per measurement location, with an unknown number of regimes; a detailed discussion of such models is presented in [8]. Time of day is the threshold variable that defines subsamples in which the relationship is stable. In general, the threshold variable can be subject to a model building procedure which chooses the traffic variable for which linearity is more strongly rejected; such a procedure is presented in [1]. In [1], it is shown that the choice of the time index (which facilitates the forecasting procedure) is effective in capturing nonlinear dynamics compared with alternative threshold variables (traffic variables in levels and differences). In the application, the number of thresholds per day/segment combination ranges from 0 (which essentially means that a linear model is adequate for capturing traffic dynamics of the particular segment at that particular day) to 4, which corresponds to five traffic regimes. An example of five regimes is two tranquil periods during the night (e.g., from midnight to early morning and from late afternoon to midnight), a morning peak period, an afternoon peak and an intermediate, not heavily congested regime that separates the two peaks.

A linear specification is nested in the threshold regression depicted in (2). Therefore, a first step is to test the linearity of the model against the piecewise linear specification. If the null hypothesis is rejected, one may proceed to estimate a threshold regression and the residuals of the piecewise linear model should be tested for significant remaining nonlinearity that could be captured by adding a regime in the model. For the purposes of our application, we performed a battery of specification tests that include White's test [26], the sequential procedure proposed in [27] and F-tests for structural change [28]. In the next section, we present a sample of results based on the tests proposed in [29] and [30].

Alternatively, the decision on the number of regimes can be based on the values of an information criterion as in [31]. A sample of comparisons based on the Akaike information criterion (AIC) is shown in the numerical experiments that will follow. It should be noted that the aforementioned methods are not based on the out-of-sample predictive performance of the models, and some fine-tuning may be required by comparing via cross-validation a small set of plausible regime selections per xd,s. Indeed, in a number of studies, although nonlinear models are suggested by statistical tests, simpler linear alternatives have been found superior in terms of forecasting performance [32]. In this work, the primary focus is on forecasting performance and although we examine the results of specification tests and information criteria, the final decision on the adequate number of thresholds is based on the results of the forecasting experiment.

Effective methods for threshold estimation for given Rd,s have been proposed in [8, 31] and [33]. These methods apply in a univariate setting, that is, for each measurement location separately. Here, we employ a strategy that focuses on computational tractability; it is based on the multivariate threshold regression models (or threshold vector autoregressions, TVAR) presented in [34]. Instead of treating measurement locations as independent, we classify them into groups of small size (e.g., three to four neighboring locations) that are expected to be characterized by the same thresholds per day of the week. This is a plausible hypothesis that considerably simplifies the computations that need to be performed, as it allows simultaneous threshold estimation for each group of locations. A system of threshold regressions is estimated for each group: it comprises one predictive equation for each group member, whereas lagged traffic variables for each group member appear as explanatory covariates.

Classification of measurement locations that form each TVAR system is based on distance. Essentially, in the application, we divide measurement locations into non-intersecting groups of size 3: starting from the location that is closest in terms of geographical distance to an existing group, the next group is formed by adding its two closest first-order neighbors (in the sense defined in [35]) or its first-order neighbor and the closest second-order neighbor. Then, we estimate one TVAR model for each group. This substantially reduces the number of threshold estimations that need to be performed (and, according to our experiments, the computational time) compared with the alternative that estimates thresholds for each measurement location separately. We performed the same procedure with groups of size 4; results were almost identical and are not reported for brevity.

Univariate threshold regressions and TVAR models comprising up to three linear regimes can be estimated in a straightforward manner in modern statistical/econometric software and have been found adequate for our purposes. To test for the need of additional regimes, we combined standard techniques with a priori knowledge of traffic dynamics. For instance, a four-regime model can be estimated by fixing the threshold that marks the beginning of the morning peak period (which is always identified in practice) and estimating a two-threshold model using the remaining part of the data. Similarly, a five-regime model can be estimated by dividing the day into two halves and estimating a two-threshold model for each half day. The first model estimates the thresholds that define the morning peak period, whereas the second estimates those that define the afternoon peak.

Each equation in the TVAR system approximates (as it contains information from a reduced number of neighbors) the predictive model (2) for the measurement location that appears on the left-hand side of (2). The underlying hypothesis is that the timing of traffic regimes does not differ significantly for measurement locations within each group and that omitted predictors do not significantly influence threshold estimation. TVAR systems for groups of five or more locations would contain a large proportion of coefficients which are not statistically significant and hence they were not considered.

In case one would like to avoid the aforementioned approximation schemes, one may implement a combination of the grid-search procedure à la Hansen [8] with penalized estimation, as described in Section 2.5. We would like to highlight that to the best of our knowledge, such combinations have not appeared in the literature until now. Our implementation of such a procedure led to results that are practically equivalent to the ones reported in Section 3 but was substantially more demanding in terms of computational time.
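To make the grid-search idea concrete, the following is a minimal sketch of a single-threshold search over the time-of-day index. It is deliberately simplified and is not the procedure used in the paper: each regime is fitted by plain least squares rather than penalized estimation, only own lags are used, and only one threshold is sought. The function names, the `trim` fraction, and the default lag order are illustrative assumptions.

```python
import numpy as np

def fit_sse(X, y):
    """Least-squares fit; returns the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def one_threshold_grid_search(x, t_of_day, p=2, trim=0.15):
    """Search for one time-of-day threshold that splits an AR(p) model in two regimes.

    x        : 1-D array of deviations from the daily profile
    t_of_day : 1-D integer array with the time-of-day index of each observation
    trim     : fraction of candidate values excluded at each end (assumption)
    Returns (threshold, total SSE at that threshold).
    """
    # Lagged design: the row for time k holds 1, x[k-1], ..., x[k-p].
    y = x[p:]
    lags = [x[p - i:len(x) - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    tod = t_of_day[p:]

    lo, hi = np.quantile(tod, [trim, 1.0 - trim])
    best_sse, best_thr = np.inf, None
    for c in np.unique(tod):
        if c < lo or c > hi:
            continue
        first = tod <= c  # regime 1: time of day up to and including c
        sse = fit_sse(X[first], y[first]) + fit_sse(X[~first], y[~first])
        if sse < best_sse:
            best_sse, best_thr = sse, int(c)
    return best_thr, best_sse
```

Replacing `fit_sse` with a penalized fit would move this sketch closer to the combination described above, at the cost of one penalized estimation per candidate threshold and regime, which is consistent with the heavier computational burden the text reports.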

2.5 Penalized estimation for automatic model selection

Within regime rd,s, the model in (2) is a linear regression that in theory can be estimated using conventional methods (e.g., OLS or LAD). However, direct estimation may be inefficient as a fraction of the predictors will not contribute significantly to the predictive power of the model. In some cases, direct estimation may be problematic, with the variances of the estimated coefficients being unacceptably high, or even infeasible because of multicollinearity. This happens especially when p and Ns are large. Without the use of any procedure for model building, that is, a selection of significant predictors, the resulting model may be unstable under perturbations and, worse yet, may result in undesirable output because of ill-conditioned matrix inversion.

We thus propose the use of a penalized estimation scheme within the context of threshold regression. In previous studies, model building was either based on exploratory analyses (e.g., plots of the estimated (partial) autocorrelations), see [24, 11] and [25], or on information criteria such as AIC as in [36] and [12]. The former method is practically infeasible for large S, whereas application of the latter through an automatic (for instance general-to-specific [37]) sequential procedure is very demanding in terms of computational power.

In this work, estimation and model selection per regime take place simultaneously for each location using LASSO penalized regression which enforces sparse solutions in problems with large numbers of predictors [38]. LASSO is a constrained version of ordinary estimation methods and at the same time a widely used automatic model building procedure. Compared with classical variable selection methods, such as subset selection, the LASSO has two advantages. First, the selection procedure is continuous and hence more stable than the subset selection which is discrete [39]. Second, the LASSO is computationally feasible for high-dimensional data. In contrast, computation in subset selection is combinatorial and not feasible when the number of predictors is very large [40].

Given a loss function g(.), coefficient estimation within regime rd,s in (2) is performed by minimizing the criterion

  • display math(3)

We consider two variants of the estimation procedure, namely, one in which the sum of absolute residuals is minimized, henceforth referred to as LAD-LASSO, and a least squares objective, which we call conventional LASSO. In the former case,

  • display math

whereas in the latter

  • display math

when historical traffic data from Dw past weeks are available.
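The displays for criterion (3) and the two loss functions are not legible in this copy. A form consistent with the surrounding description is sketched below for a fixed day d, location s, and regime. The coefficient labels (α_i for own lag i, α_{j,i} for neighbor j at lag i), the week index w, and the per-coefficient tuning constants are assumed notation for illustration.

```latex
\min_{\alpha}\;\; g(\varepsilon)
  \;+\; \sum_{i=1}^{p} \lambda_{i}\,\lvert \alpha_{i} \rvert
  \;+\; \sum_{j=1}^{N_s}\sum_{i=1}^{p} \lambda_{j,i}\,\lvert \alpha_{j,i} \rvert ,

g_{\mathrm{LAD}}(\varepsilon) = \sum_{w=1}^{D_w}\;\sum_{t \,\in\, \text{regime}} \lvert \varepsilon_{w}(t) \rvert ,
\qquad
g_{\mathrm{LS}}(\varepsilon) = \sum_{w=1}^{D_w}\;\sum_{t \,\in\, \text{regime}} \varepsilon_{w}(t)^{2} .
```

In this reading, the first term measures fit over all D_w historical weeks, and the two penalty sums act on the own-lag and neighbor-lag coefficients, respectively, leaving the intercept unpenalized.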

The second and third components of the sum in (3) are penalty terms which shrink the coefficients toward the origin and tend to discourage models with large numbers of marginally relevant predictors. The intercept αd,s is ignored in the LASSO penalty, whose strength is determined by the positive tuning constants λ. It is worth emphasizing that the aforementioned criterion performs the adaptive LASSO method which has been shown to be more effective than ordinary LASSO in [41]. In what is presented next, we follow the procedure justified theoretically in [42] and apply the adaptive LASSO as a two-step procedure: in the first step, coefficient estimates are derived using slight penalization and lambdas are calculated as inversely proportional to these estimates; in the second step, (3) is minimized given the lambdas from the first step.
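The two-step idea can be illustrated with a small NumPy sketch for the least-squares variant only (the LAD variant would need a different solver and is not shown). Several choices here are assumptions rather than details from the text: the pilot estimate uses a light ridge penalty as the "slight penalization", the objective is scaled as half the residual sum of squares, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in lasso coordinate updates."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def weighted_lasso(X, y, penalties, n_iter=500, tol=1e-8):
    """Minimize 0.5*||y - b0 - X b||^2 + sum_j penalties[j]*|b_j| by coordinate descent.

    The intercept b0 is left unpenalized by centering X and y first.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0)
    b = np.zeros(X.shape[1])
    resid = yc.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes b_j.
            rho = Xc[:, j] @ resid + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, penalties[j]) / col_sq[j]
            if new_bj != b[j]:
                resid -= Xc[:, j] * (new_bj - b[j])
                max_change = max(max_change, abs(new_bj - b[j]))
                b[j] = new_bj
        if max_change < tol:
            break
    b0 = y_mean - x_mean @ b
    return b0, b

def adaptive_lasso(X, y, lam, ridge=1e-3, eps=1e-8):
    """Two-step adaptive lasso: pilot fit, then penalties inversely proportional to it."""
    # Step 1 (assumption): lightly ridge-penalized pilot estimate on centered data.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    k = X.shape[1]
    pilot = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(k), Xc.T @ yc)
    # Step 2: coefficient-specific penalties; eps guards against division by zero.
    penalties = lam / (np.abs(pilot) + eps)
    return weighted_lasso(X, y, penalties)
```

Because the penalty on a coefficient grows as its pilot estimate shrinks, predictors with weak first-step evidence are pushed to exactly zero, while strongly supported ones are shrunk only slightly.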

The use of penalized estimation allows considerable flexibility with regard to the specification of matrices that define neighboring relationships in a road network. Using a modeling framework similar to the one adopted in previous studies (e.g., [24, 11, 25, 12, 6]), we would have to define different matrices per regime and per time lag of the model at a pre-processing stage. The adaptive LASSO procedure allows that process to be automated: the input to the model need not consist of regime- or location-specific predictors. For this reason, the number of input coefficients is fixed across regimes and days and in (2), we can use p and Ns instead of pd,s and inline image.
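As an illustration of this fixed input structure, the sketch below assembles the same set of candidate predictors (p own lags followed by p lags of every listed neighbor) for a location, regardless of regime. The function name `build_design`, the array layout, and the column ordering are hypothetical choices made for this example.

```python
import numpy as np

def build_design(dev, s, neighbors, p):
    """Build the lagged predictor matrix and response for location s.

    dev       : array of shape (n_times, n_locations) with deviations from the profile
    s         : column index of the location being forecast
    neighbors : column indices of candidate neighboring locations
    p         : maximum time lag
    Columns of X: lags 1..p of s, then lags 1..p of each neighbor in order.
    """
    n = dev.shape[0]
    cols = []
    for loc in [s] + list(neighbors):
        for i in range(1, p + 1):
            # Row k of this slice is the value at time (p + k - i), i.e. lag i.
            cols.append(dev[p - i:n - i, loc])
    X = np.column_stack(cols)
    y = dev[p:, s]
    return X, y
```

With p = 10 and 20 neighbors, this layout yields 10 + 20 × 10 = 210 candidate columns per location, and the penalized fit is then left to decide which of them survive in each regime.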

One expects that, depending on the characteristics of traffic data and the density of measurement locations in a road network, there is a maximum time lag and a maximum number of neighbors above which additional predictors in a piecewise linear regression model such as the one in (2) do not contribute in terms of out-of-sample predictive ability. When p and Ns take very large values, the estimation problem becomes harder to solve and the finite sample performance of the estimator degrades slightly; consequently, the out-of-sample predictive ability weakens.

3 Numerical experiments and analyses

3.1 The data

The proposed modeling strategy is implemented using traffic speed data from three metropolitan areas: on the west coast of the USA (Network I), on the east coast of the USA (Network II), and in the South Pacific (Network III). Network I is a grid-like freeway network that consists of 6414 nodes and 7002 small links with an average length of 0.18 miles. A total of 857 inductive loop detectors are installed on 837 links and cover the entire network evenly (Figure 1). For links with multiple measurement locations, the detectors with the lowest percentage of missing data were selected. Our analysis is based on the archived 5-minute historical data for April, May, and June of 2010. Model calibration was performed using data collected until June 5, whereas data from June 6 to June 13 were used solely for evaluation of the forecasting performance. Missing data (about 2%) in the calibration dataset were imputed using a linear interpolation scheme.

Figure 1. Detector locations in Network I.

Network II is a freeway network that covers five boroughs of a large metropolitan area on the east coast of the USA. The average length of the 153 road segments is about 2.43 miles. The raw data feed reports speed information for each segment every 1 minute, aggregated from measurements of traffic sensors installed on these segments. The live data were received and archived for the 3-month period of February, March, and April of 2011. For the purposes of our analyses, the raw 1-minute data were aggregated to 5-minute average values. The 1-week data from April 1 to April 7 were reserved for testing, and data prior to April 1 were used for model calibration.

Because of data transmission and sensor-related problems, the percentage of missing data in Network II was high, about 26% on average per measurement location for the 3-month historical data. The patterns of missing measurements were non-random, with substantially increased percentages during weekends, especially on Sundays. A more complex data imputation scheme, namely k-nearest neighbor (k-NN) imputation, which has been proposed in [43] for similar situations, was employed in the calibration dataset: for each detector with missing values, we find its k nearest neighbors using a Euclidean metric, confined to rows for which speed data are not missing; missing values for each measurement location are imputed by averaging the corresponding non-missing elements of its 10 nearest detectors [44].
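The imputation step can be sketched roughly as follows. This is an illustrative reading of the description, not the implementation in [43] or [44]: the function name is hypothetical, and the distance between two detectors is taken here as the root-mean-square difference over the time intervals in which both are observed, which is an assumption made so that detectors with different amounts of overlap remain comparable.

```python
import numpy as np

def knn_impute(data, k=10):
    """Impute missing speeds from the k most similar detectors.

    data : array of shape (n_times, n_detectors), with np.nan marking missing values
    Returns a copy in which each missing entry is replaced by the mean of the
    non-missing values of the k nearest detectors at the same time interval.
    Entries stay missing if none of those detectors reported a value.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    n_det = data.shape[1]
    missing = np.isnan(data)
    for c in range(n_det):
        if not missing[:, c].any():
            continue
        # Distance to every other detector over jointly observed intervals.
        dists = np.full(n_det, np.inf)
        for o in range(n_det):
            if o == c:
                continue
            both = ~missing[:, c] & ~missing[:, o]
            if both.any():
                diff = data[both, c] - data[both, o]
                dists[o] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        nearest = [o for o in order[:k] if np.isfinite(dists[o])]
        for t in np.where(missing[:, c])[0]:
            vals = data[t, nearest]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                filled[t, c] = vals.mean()
    return filled
```

Neighbors are always read from the original array rather than the partially filled one, so imputed values never feed into other imputations.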

Network III is a linear freeway stretch which leads to a major city in the South Pacific. The entire 30.45-km long stretch was divided into 39 segments with an average length of 0.78 km. The original traffic data consisted of information on average speed for each of the 39 segments, reported every 1 minute; these data were first aggregated from device/site observations within each segment and then temporally aggregated to 5-minute intervals. The average percentage of missing data (imputed using linear interpolation in the calibration dataset only) was about 1.80% per segment. The analysis uses data collected during August and September of 2010. Model calibration was performed using data collected until September 14, whereas data from September 15 to September 21 were used solely for evaluation of the forecasting performance. Figure 2 shows the networks of the three study regions. The blue crossings in Network III mark the boundaries of the link segments.

Figure 2. The three networks of the study: Network I (left) is a large freeway network from a metropolitan area on the west coast of the USA; Network II (center) covers part of a freeway network from a metropolitan area on the east coast of the USA, whereas Network III (right) is a freeway stretch that leads to a major city in the South Pacific.

The daily profiles for each network were calculated using weighted averages of measurements taken during the model-calibration period. Specifically, a grid-search procedure was implemented for a set of initial weights for the most recent week of traffic data (which ranged from 0.2 to 0.9) and different numbers of training weeks (which ranged from 5 to 8). The weights decline geometrically on a weekly basis for information in the more distant past; this is essentially an exponential smoothing scheme [45]. The combination of initial weight and number of training weeks (found via cross-validation which uses part of the calibration dataset) which maximizes forecasting performance of μ based on mean absolute percentage error (MAPE), is chosen, and the residuals are modeled using the transient model (2).
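A minimal sketch of this profile selection is given below for one location and one day of the week. The specific weight formula w0·(1 − w0)^k with renormalization, the grid step of 0.1 for the initial weight, the use of a single holdout day in place of a fuller cross-validation, and all function names are assumptions made for illustration; the text only states the ranges searched and that weights decline geometrically.

```python
import numpy as np

def geometric_weights(w0, n_weeks):
    """Weights for weeks ordered most recent first, normalized to sum to 1."""
    w = w0 * (1.0 - w0) ** np.arange(n_weeks)
    return w / w.sum()

def daily_profile(history, w0):
    """Weighted-average profile.

    history : array of shape (n_weeks, T) for one weekday and location,
              with the most recent week in row 0
    """
    return geometric_weights(w0, history.shape[0]) @ history

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def select_profile(history, holdout,
                   w0_grid=(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
                   weeks_grid=(5, 6, 7, 8)):
    """Grid search over initial weight and number of training weeks by holdout MAPE."""
    best = None
    for n_weeks in weeks_grid:
        if n_weeks > history.shape[0]:
            continue
        for w0 in w0_grid:
            mu = daily_profile(history[:n_weeks], w0)
            err = mape(holdout, mu)
            if best is None or err < best[0]:
                best = (err, w0, n_weeks, mu)
    return best  # (MAPE, initial weight, number of weeks, profile)
```

The residuals passed on to the transient model would then be the observed speeds minus the selected profile.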

Figure 3 depicts the estimated μ's for four locations in Network III. Although one would expect similar traffic patterns for the sequence of segments in the linear network of the study, substantial differences can be observed. For instance, speeds drop during both the morning and the afternoon peak periods on weekdays in segment #22, whereas in segment #12, only the morning peak can be observed. On the other hand, in segment #32, significant speed drops occur only during the afternoon peak period on weekdays. Finally, segment #2 displays a noisier pattern which is substantially different from the other segments. Traffic dynamics during the weekend are different from those observed on weekdays, in accordance with a priori expectations.

Figure 3. Speed profiles in four segments of Network III. The larger the segment id, the closer it is located to the city. The time index starts on Wednesday midnight with 288 intervals per day; days of the week are separated by dotted vertical lines.

Figure 3 shows that the time-series display asymmetries that are indicators of nonlinear dynamics in mean. For instance, in segment #22, one may clearly observe the asymmetric W-shaped pattern of speed in its daily cycle. A careful observation of the raw data reveals that dynamics are heteroskedastic (nonlinear in variance), which is consistent with the observations in [46]. Figure 4 shows a subsample of the observed deviations from the historical daily profiles for a few selected segments. Some typical characteristics of nonlinear dynamics are evident in these series as well.

Figure 4. Deviations from the historical daily profiles: (a) detector #401215 from Network I, (b) detector #258 from Network II, and (c) detector #6149 from Network III.

The three test data sets are taken from expressway networks, although Network II is a network of urban expressways with conditions closer to those of major urban arterial networks. The primary difference between these and typical urban road networks, however, is the absence of traffic signals. Indeed, on urban road networks with traffic lights, there is an inherent oscillation in the traffic data caused by the timing of the green and red phases. In such cases, traffic data are therefore more volatile and prediction accuracies are lower than in the networks presented here. From our observations, the decrease in accuracy on signalized urban road networks can be roughly 10–20%. The advantage of the method presented here is that it is scalable to large urban networks, and the use of the LASSO procedure makes the method robust to the particularities of the network structure.

3.2 Regime identification

In all computations that will follow, we take p = 10 and max(Ns) = 20 (at most 10 upstream and 10 downstream neighbors per measurement location). These choices comply with preliminary analyses which showed that significant predictors are within this spatio-temporal range in the three networks of our study. A first question is whether deviations from the historical profiles can be adequately described by linear models. Figure 5 compares p-values of tests on remaining nonlinearity in the residuals of linear versus three-regime models in Network III. It can be observed that the null hypothesis of linearity is always strongly rejected in the former case, whereas in the latter, the vast majority of p-values are above conventional significance levels (hence, the tests do not provide evidence against the null hypothesis of linear dynamics).

Figure 5. (a) Chan's test [29] and (b) Teräsvirta's test [30] on nonlinear dynamics for Network III. Each graph corresponds to a day of the week starting from Sunday (top left). Each circle corresponds to a segment; p-values for tests on linear (three-regime) models are shown on the x-axis (y-axis). Dotted lines depict conventional significance levels at 0.05 and 0.01.

Figure 6 depicts the values of the AIC for single-regime versus three-regime models for measurement locations on two major freeways of Network I, and Figure 7 presents the corresponding values for Network III (computations are based on data for Wednesdays in both cases). One may observe that the AIC for the two-threshold scenario is smaller than that for the zero-threshold one for all links, indicating a preference for two-threshold models in virtually all locations. These patterns are typically observed on weekdays in general.
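
To make the comparison concrete, the following sketch computes a Gaussian AIC for a single-regime fit and for a fit with two time-of-day thresholds. It is a deliberately simplified stand-in for the models of this paper: each regime is a univariate AR(1) without intercept rather than a space-time regression, the AIC is taken as n·log(RSS/n) + 2k up to an additive constant, and only regression coefficients are counted in k. The function names and these choices are assumptions made for illustration.

```python
import math


def ar1_rss(pairs):
    """Least-squares fit of y_t = phi * y_{t-1} (no intercept); returns the RSS."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    phi = sxy / sxx if sxx > 0 else 0.0
    return sum((y - phi * x) ** 2 for x, y in pairs)


def aic(series, slots, thresholds=()):
    """Gaussian AIC, n*log(RSS/n) + 2k, for a regime-wise AR(1) fit.

    series[t] is a deviation from the daily profile and slots[t] is its
    time-of-day index (0..287 for 5-minute data). The thresholds split the
    day into len(thresholds) + 1 regimes; none gives a single-regime model.
    """
    cuts = sorted(thresholds)
    groups = [[] for _ in range(len(cuts) + 1)]
    for t in range(1, len(series)):
        regime = sum(slots[t] >= c for c in cuts)  # number of thresholds passed
        groups[regime].append((series[t - 1], series[t]))
    rss = sum(ar1_rss(g) for g in groups if g)
    n = len(series) - 1
    k = len(groups)  # one AR coefficient per regime (assumed parameter count)
    return n * math.log(rss / n) + 2 * k
```

A lower value of `aic(series, slots, (t1, t2))` than of `aic(series, slots)` indicates a preference for the three-regime specification, in the spirit of Figures 6 and 7.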

Figure 6. AICs of three-regime models versus single-regime models for I-80W (left) and I-80E (right) in Network I.

Figure 7. AICs of three-regime models versus single-regime models for Network III.

The thresholds identified by TVAR-based estimation for the road segments of Figure 6 are shown in Figure 8 as a function of location. Specifically, the y-axis marks the two time thresholds, whereas the x-axis identifies the median mile marker of detectors in each TVAR system: detectors with smaller absolute postmile values are closer to the center of the city. Two insights can be gleaned from the estimated thresholds. The first is that traffic towards the city center (westbound, from roughly mile marker 13 and beyond) has a distinct morning peak (lasting from about 6 AM until 10 AM), whereas traffic away from the center of the city, from roughly mile marker 13 and beyond, has a distinct afternoon peak (from 4 PM until 6 PM). Similarly, from the city center westbound to about 13 miles east of the city, the estimated thresholds define a wider traffic regime, which lasts from about 6 AM to 6 PM. Analogous results were observed for northbound Motorway 1 in Network III. In Figure 9, one may clearly observe that close to the city (up to 15 km from the center of the city), the peak period in three-regime models lasts from 6 AM to 8 PM. Moving away from the city, the peak period shrinks dramatically in duration.

Figure 8. Temporal thresholds for three-regime models estimated by TVAR on a subset of Network I: Westbound, towards the city (left) and Eastbound, towards the suburbs, away from the city (right).

Figure 9. Temporal thresholds for three-regime models estimated by TVAR on Network III.

Although specification tests and information criteria suggest that three regimes describe traffic dynamics adequately, we examined whether more complex models could provide superior forecasting performance. The results we obtained were consistently negative: forecasting performance actually worsened with four- and five-regime models. Moreover, in such cases, we frequently observed spatio-temporal instability in the magnitudes of the thresholds. Figure 10 shows a characteristic example in which the thresholds that define the afternoon peak period are unstable in part of Network I (although the ones that define the morning peak behave in accordance with prior expectations). It is noteworthy that three regimes have been found adequate in [4] and [1]; these studies applied methods different from the ones presented here to different types of data (e.g., data on urban arterials at different levels of temporal aggregation).

Figure 10. Temporal thresholds for five-regime models estimated by TVAR on a subset of Network I: I-80 Westbound, towards the city midnight to noon (left) and noon to midnight (right).

3.3 Penalized estimation using adaptive (LAD) LASSO

This section uses the results of the previous analysis and focuses on the implications of three-regime models. Figure 11 shows the distributions of adjusted R² (the proportion of explained variability penalized with a term that is proportional to the number of parameters used) for models estimated via adaptive LASSO across links of the three networks, for different days of the week. Medians of adjusted R² are located around 0.8 for weekdays, indicating satisfactory model fit, whereas substantially lower values are observed during weekends. One is tempted to guess that forecasting performance will also be better during weekdays, but as the next section will show, this is not the case: model fit statistics measure the proportion of variability captured by the predictive models (penalized by the number of model coefficients in our case), and traffic data are substantially more variable on weekdays than on weekends. The adjusted R² also indicates some outlying links for which our models do not perform satisfactorily. Again, this does not necessarily mean that forecasting performance will be unsatisfactory in all such cases, as some of them are links in which traffic data display less variability.
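
For reference, a commonly used form of the adjusted coefficient of determination is given below. The text does not spell out the exact variant used; the form shown, with n observations and k estimated (nonzero) coefficients excluding the intercept, is an assumption.

```latex
\bar{R}^2 \;=\; 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1},
\qquad
R^2 \;=\; 1 - \frac{\sum_{t}\left(y_t - \hat{y}_t\right)^2}{\sum_{t}\left(y_t - \bar{y}\right)^2}.
```

Because the denominator of R² is the total variability of the series, a link with little variability can have a low adjusted R² and still be forecast with small absolute errors, which is the point made above about weekends.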

Figure 11. Distributions of the adjusted R² for different days of the week: Network I (left), Network II (center), and Network III (right). Models are estimated via adaptive least absolute shrinkage and selection operator.

Figure 12 shows some typically observed differences in the magnitude and number of nonzero coefficients for adaptive LASSO versus OLS estimation. In this example, a model from a single link is examined; estimated coefficients are shown on the y-axis and the spatial locations of the predictors included in the model are denoted by their position on the x-axis, for each time lag. In this and the following figures, true spatial location (in terms of miles or kilometers) is not represented; position along the x-axis reflects only the successive nature of the sensors. The link for which the model is estimated is always denoted as position 0, and positive (negative) values on the x-axis represent downstream (upstream) detectors. It is clear that OLS produced more nonzero coefficients both spatially and temporally, in terms of lag orders. On the other hand, the adaptive LASSO procedure produced a far more parsimonious model. In particular, one observes a rather heavy reliance of the OLS-estimated model on lags three and four, which is not found in the LASSO-estimated model. We shall discuss stability implications of this later in this section.

Figure 12. Coefficients estimated via (a) LAD-LASSO and (b) OLS for the peak period (the second regime in a three-regime model) on a link from Network I. Positive (negative) x-values represent downstream (upstream) locations.

Figure 13 provides a more detailed examination of the coefficients produced by adaptive LAD-LASSO. For selected links from Network I, the threshold identification procedure indicated a single peak period that lasts from 5:30 AM to 10 AM. Off-peak for these links is hence defined as the rest of the day. We illustrate the magnitude of coefficients for both periods. As before, the resulting models are parsimonious; lags three and four are used sparingly and predominantly for the more congested peak periods.

Figure 13. Coefficients estimated by LAD-LASSO for two links from Network I. Positive (negative) x-values represent downstream (upstream) locations.

Interestingly, the procedure is able to identify predictors which correspond to traffic engineering theory. Specifically, one observes that in the congested peak period, the predictors selected by adaptive LASSO include a considerable number of downstream links and fewer, smaller in absolute magnitude, coefficients that correspond to upstream locations. This follows traffic engineering theory and the presence of downstream queue build up. On the contrary, LASSO-estimated models for the off-peak period include upstream links rather than downstream ones, again conforming to traffic engineering principles. In addition, higher-lag orders are present primarily in the regime that corresponds to the peak period. This is also a result that is in accordance with prior expectations; traffic dynamics in the congested period are more complex and require a larger number of spatio-temporal predictors, whereas the off-peak dynamics are described by simpler models.
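
The selection behavior discussed above, where weak predictors are set exactly to zero, can be illustrated with a small pure-Python sketch of the squared-error adaptive LASSO [41] fitted by coordinate descent. This is not the LAD-LASSO [42] used for most results in this section, which replaces the squared loss with an absolute loss and is not shown here. The function names, the unpenalized pilot fit, and the defaults for `gamma`, `eps`, and `n_iter` are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam * sum_j weights[j]*|b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - Xb for the current b (b starts at zero)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # skip all-zero columns
            # covariance of column j with the partial residual (b_j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_ss[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-step adaptive LASSO: unpenalized pilot fit, then a reweighted L1 penalty."""
    p = len(X[0])
    pilot = weighted_lasso(X, y, 0.0, [0.0] * p)  # lam = 0: least-squares pilot
    weights = [1.0 / (abs(bj) ** gamma + eps) for bj in pilot]
    return weighted_lasso(X, y, lam, weights)
```

In the setting of this paper, the columns of `X` would hold lagged deviations from upstream and downstream locations; predictors with small pilot coefficients receive large penalties and tend to be dropped.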

Figure 14 illustrates the same analysis for two links from Network III. The results are similar to those observed in Network I. In this case, congestion is not as severe as in Network I; one does observe that both upstream and downstream links contribute to the estimated model in the peak periods, although the balance tends towards downstream links. In the off-peak periods, upstream links are more significant in terms of predictive power. Also, as before, higher-order lags are useful in the regimes that correspond to congested traffic conditions but are suppressed by the penalization procedure in the uncongested regimes.

Figure 14. Coefficients estimated by LAD-LASSO for two links from Network III. Positive (negative) x-values represent downstream (upstream) locations.

In models with large numbers of strongly correlated predictors, the design matrix on which estimation is based can be ill-conditioned. In such cases, conventional estimation methods are unable to invert the design matrix, or in some cases provide inverses that lead to inaccurate estimates. A clear indication of an ill-conditioned matrix is the value of its condition number. Figure 15 shows histograms of the condition numbers of the matrices of predictors included in the models for Networks II and III. One observes unacceptably high condition numbers when conventional OLS is applied, whereas in all cases, adaptive LASSO eliminates very high condition numbers.
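
For reference, a standard definition of the condition number is recalled below. The text does not state which matrix norm is used; the 2-norm form, written in terms of the largest and smallest singular values of the predictor matrix X, is an assumption.

```latex
\kappa(X) \;=\; \frac{\sigma_{\max}(X)}{\sigma_{\min}(X)},
\qquad
\kappa\!\left(X^{\top}X\right) \;=\; \kappa(X)^{2}.
```

The second identity shows why least squares is sensitive to collinearity: the matrix that must be inverted has the square of the condition number of the predictor matrix itself.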

Figure 15. Histograms of the condition numbers of the matrices of predictors with nonzero coefficients: OLS versus LAD-LASSO (off-peak period).

The impact of multicollinearity is visible not only via the condition number of the matrix of predictors but also, more importantly, in the estimated coefficients themselves. Specifically, one can on occasion observe very large (in absolute value) OLS coefficients. Figure 16 illustrates the estimated coefficients that correspond to the second regime (peak period) for selected links of Network II: OLS coefficients can be up to 500 times as large as those estimated by adaptive LASSO. Although we did not observe condition numbers during the peak period as extreme as those depicted in Figure 15 (multicollinearity was reduced because of more volatile traffic dynamics), what one observes in Figure 16 is a direct consequence of the inversion of an ill-conditioned matrix and would obviously lead to highly erroneous results. Such cases are entirely eliminated by penalized estimation.

Figure 16. Estimated coefficients for the peak period in Network II: OLS estimation (left) versus LAD-LASSO (right).

3.4 Forecasting performance

Tables 1–3 provide overall summary statistics of forecast accuracy, measured as 1 − MAPE, so that 100% means that the forecasts match the observations exactly throughout the validation dataset. MAPE was chosen here as it is a more appropriate measure of accuracy when forecasted variables are nonstationary [47]. We observe that LAD-LASSO outperforms both OLS and conventional LASSO. This finding agrees with [17] and with recent studies (see, for instance, [48]), which show that minimizing the absolute deviations of residuals leads to superior forecasting performance for skewed variables that may contain outliers.
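
The accuracy measure can be written down directly. The sketch below is a plain reading of the definition above; the function name is hypothetical, and the input check and the percent scaling are choices made for the example.

```python
def accuracy_pct(actual, forecast):
    """Forecast accuracy in percent, defined as 100 * (1 - MAPE).

    actual values must be nonzero (speeds are strictly positive in practice).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ape = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * (1.0 - sum(ape) / len(ape))
```

For example, forecasts of 90 and 55 against observed speeds of 100 and 50 each miss by 10%, giving an accuracy of 90%.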

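Table 1. Overall prediction performance (1 − MAPE, %) for different estimation methods in Network I.

| Step (minutes) | OLS 1st Qu. | OLS Median | OLS 3rd Qu. | OLS Mean | L2-LASSO 1st Qu. | L2-LASSO Median | L2-LASSO 3rd Qu. | L2-LASSO Mean | LAD-LASSO 1st Qu. | LAD-LASSO Median | LAD-LASSO 3rd Qu. | LAD-LASSO Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 96.81 | 98.57 | 99.40 | 96.68 | 96.92 | 98.62 | 99.42 | 96.77 | 97.25 | 98.74 | 99.47 | 97.16 |
| 10 | 95.83 | 98.21 | 99.26 | 95.26 | 95.95 | 98.26 | 99.28 | 95.37 | 96.37 | 98.41 | 99.34 | 95.87 |
| 15 | 95.19 | 98.03 | 99.18 | 94.24 | 95.32 | 98.07 | 99.21 | 94.37 | 95.86 | 98.23 | 99.27 | 94.97 |
| 30 | 94.37 | 97.82 | 99.10 | 92.70 | 94.41 | 97.87 | 99.12 | 92.73 | 95.03 | 98.03 | 99.19 | 93.37 |
| 45 | 93.97 | 97.73 | 99.07 | 92.04 | 93.99 | 97.77 | 99.09 | 92.12 | 94.60 | 97.93 | 99.16 | 92.64 |
| 60 | 93.75 | 97.68 | 99.05 | 91.61 | 93.77 | 97.72 | 99.06 | 91.65 | 94.28 | 97.86 | 99.13 | 92.21 |

Note: OLS, ordinary least squares; LASSO, least absolute shrinkage and selection operator; LAD, least absolute deviation; Qu., quartile.
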
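Table 2. Overall prediction performance (1 − MAPE, %) for different estimation methods in Network II.

| Step (minutes) | OLS 1st Qu. | OLS Median | OLS 3rd Qu. | OLS Mean | L2-LASSO 1st Qu. | L2-LASSO Median | L2-LASSO 3rd Qu. | L2-LASSO Mean | LAD-LASSO 1st Qu. | LAD-LASSO Median | LAD-LASSO 3rd Qu. | LAD-LASSO Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 90.33 | 95.56 | 98.10 | 89.76 | 90.68 | 95.72 | 98.17 | 90.01 | 91.60 | 96.12 | 98.34 | 91.01 |
| 10 | 86.41 | 94.14 | 97.57 | 85.28 | 87.11 | 94.43 | 97.67 | 85.86 | 88.37 | 94.96 | 97.89 | 87.26 |
| 15 | 83.85 | 93.25 | 97.27 | 82.09 | 84.86 | 93.63 | 97.43 | 83.09 | 86.21 | 94.26 | 97.66 | 84.65 |
| 30 | 78.73 | 91.59 | 96.76 | 73.73 | 80.85 | 92.35 | 97.00 | 77.98 | 82.17 | 92.96 | 97.23 | 79.55 |
| 45 | 75.43 | 90.59 | 96.45 | 64.81 | 78.91 | 91.67 | 96.80 | 75.22 | 79.98 | 92.22 | 96.99 | 76.62 |
| 60 | 73.23 | 90.03 | 96.30 | 55.45 | 77.84 | 91.35 | 96.70 | 73.44 | 78.54 | 91.75 | 96.83 | 74.63 |

Note: OLS, ordinary least squares; LASSO, least absolute shrinkage and selection operator; LAD, least absolute deviation; Qu., quartile.
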
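Table 3. Overall prediction performance (1 − MAPE, %) for different estimation methods in Network III.

| Step (minutes) | OLS 1st Qu. | OLS Median | OLS 3rd Qu. | OLS Mean | L2-LASSO 1st Qu. | L2-LASSO Median | L2-LASSO 3rd Qu. | L2-LASSO Mean | LAD-LASSO 1st Qu. | LAD-LASSO Median | LAD-LASSO 3rd Qu. | LAD-LASSO Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 94.25 | 97.08 | 98.72 | 94.58 | 94.30 | 97.10 | 98.73 | 94.61 | 94.54 | 97.24 | 98.79 | 94.73 |
| 10 | 93.21 | 96.70 | 98.58 | 93.02 | 93.27 | 96.73 | 98.59 | 93.06 | 93.61 | 96.93 | 98.68 | 93.29 |
| 15 | 92.61 | 96.50 | 98.49 | 92.01 | 92.70 | 96.53 | 98.50 | 92.05 | 93.10 | 96.72 | 98.60 | 92.29 |
| 30 | 91.28 | 96.12 | 98.35 | 90.06 | 91.39 | 96.18 | 98.37 | 90.12 | 91.83 | 96.37 | 98.45 | 90.37 |
| 45 | 90.45 | 95.93 | 98.28 | 89.01 | 90.62 | 96.00 | 98.31 | 89.08 | 90.94 | 96.16 | 98.38 | 89.30 |
| 60 | 90.03 | 95.82 | 98.23 | 88.38 | 90.20 | 95.89 | 98.26 | 88.45 | 90.27 | 96.01 | 98.30 | 88.59 |

Note: OLS, ordinary least squares; LASSO, least absolute shrinkage and selection operator; LAD, least absolute deviation; Qu., quartile.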

The benefits of the adaptive LASSO approach are most evident in Network II (where the symptoms of multicollinearity were strongest) for predictions at the more distant forecasting horizons, such as 45 and 60 minutes into the future. The benefits grow with the forecast horizon because forecast errors from models made unstable by multicollinearity propagate across horizons. It should be stressed that the forecasting performance of OLS in Tables 2 and 3 was based on reduced sample sizes (943/1078 for Network II, where 1078 is the number of measurement locations times the days of the week, and 252/273 for Network III); for the omitted measurement locations, the set of predictors was strongly correlated and conventional OLS could not provide any estimates at all.

Results in Tables 1–3 correspond to three-regime threshold models, which performed best (according to mean accuracy across all measurement locations per network) compared with linear, two-regime, and four-regime models. Forecast accuracy degrades if one adopts a single-regime model, but not dramatically: in all the experiments performed here, one-step-ahead mean accuracy from linear models remained above 88% when estimation was performed using adaptive (LAD) LASSO. Performance of single-regime models drops more during the peak period; as an example, if one uses the LAD-LASSO coefficients from the linear model to produce forecasts in Network III during the estimated peak period, one obtains an accuracy of 90.1%, as opposed to 94.3% with the regime-specific coefficients. This difference is magnified as the forecast horizon increases.

For all forecasting horizons in the three experiments we performed, the empirical distributions of accuracies are negatively skewed, with means smaller than the corresponding medians. Means and medians are taken across measurement locations; the observed long left tails of the empirical distributions are essentially related to some ‘outlying’ locations for which the forecasting performance is unsatisfactory. In some of these locations, such performance was caused by unstable estimation such as that depicted in Figure 16. This phenomenon is best observed by comparing mean accuracies across different estimation methods in Network II (as means are not robust to outliers).

Figure 17 shows distributions of forecasting accuracy across links of Network III for different days of the validation period and different forecast horizons. Forecast accuracy of the benchmark method, which is based solely on the estimated daily profiles and neglects information on the recent past, is also presented. Median accuracies for weekends are above 95%; median accuracies drop during weekdays (although they are always above 90%) and the corresponding distributions are more dispersed. One also observes how accuracies decrease as the forecast horizon increases. We produced analogous results for Networks I and II, but the findings are very similar and are omitted for brevity.

Figure 17. Distributions of forecasting accuracy across links of Network III for different days of the validation period and different forecast horizons. The validation period started on Wednesday, September 15, 2010.

Time series models (albeit simpler than the ones we propose) have been applied in practice for missing traffic-data imputation [49]. Given the observed accuracies in the three applications of the model, one could use such forecasts to impute (also called ‘nowcast’) missing traffic data in the testing datasets. This step has not been performed in our application.

4 Concluding remarks

In this paper, we have presented a number of improvements to time series-based traffic prediction to accomplish several objectives. On the one hand, the intrinsic nonlinearity is handled explicitly in an effective, efficient, and automatic manner. On the other hand, issues of model parsimony, stability of the coefficients, and robustness to collinearity and outliers are handled by the methods we propose.

The methods described here also allow for answering a number of important research questions about real-time traffic data, such as whether or not statistically significant distinct traffic states exist within a day, and if so, how many. We can ask and answer questions as to how these traffic states vary with location, day of the week, or seasonality. In addition to issues of temporal regimes, our methods allow for answering research questions about the spatial and temporal correlation of traffic across neighboring locations on a network. As before, this analysis can go a step further by assessing the dependencies of these correlations on time of the day, location, seasonality, and so on. Interestingly, we demonstrate that our method reproduces some known traffic engineering model-based phenomena such as queue spill back through purely statistical means.

We have demonstrated via tests on three real networks the significant benefit of using these approaches as far as the stability of the results is concerned. Indeed, when traditional OLS was employed instead of our method, many models on the congested urban expressway network we studied could not be estimated by standard software. The use of our method resolved that issue in almost all cases examined.

It is worth emphasizing that under the adopted framework, multiperiod forecasting involves the use of a chain rule to generate forecasts at longer horizons, based on a dynamic model for data observed at a high-frequency level (e.g., 5 minutes). Under this ‘iterated’ approach, model specification is the same across all forecast horizons; only the number of iterations changes with the horizon. An alternative approach is to estimate models for traffic variables measured h periods ahead as a function of current information [50]. In this case, the forecasting models and their estimates would typically vary across different forecasting horizons.
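
The iterated scheme can be sketched for a univariate autoregression as follows. The models in this paper are regime-specific and include spatial neighbors; the single-series AR(p) form and the function name used here are simplifying assumptions made for illustration.

```python
def iterated_forecast(history, coefs, horizon):
    """Iterate a one-step AR(p) model, x_t = sum_k coefs[k] * x_{t-1-k}.

    history holds past observations, oldest first. Each one-step forecast
    is fed back as an input, so the same coefficients serve every horizon.
    """
    p = len(coefs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    window = list(history[-p:])  # the p most recent values, oldest first
    out = []
    for _ in range(horizon):
        nxt = sum(c * window[-1 - k] for k, c in enumerate(coefs))
        out.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward one step
    return out
```

With 5-minute data, a 60-minute-ahead forecast corresponds to `horizon=12`; the direct approach of [50] would instead fit a separate set of coefficients for each horizon.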

Our forecasting methodology is illustrated using loop detector data, which may contain a large percentage of missing values and erroneous measurements [51, 52]. Recently, there has been increased interest in data fusion methods that effectively combine data from multiple sources (including GPS data from probe vehicles and data from video cameras [53]). Data fusion aims at improved quality and increased spatial coverage [52], which are important factors that influence the capabilities of traffic prediction tools. The forecasting models we present can use such data as long as they are updated at fixed time intervals.

As a final note, we would like to highlight that our proposed method can be readily automated, which makes it appropriate for large-scale problems.

References

  • 1
    Kamarianakis Y, Gao HO, Prastacos P. Characterizing regimes in daily cycles of urban traffic using smooth-transition regressions. Transportation Research Part C 2010; 18:821–840.
  • 2
    Karlaftis MG, Vlahogianni EI. Statistical methods versus neural networks in transportation research: differences, similarities and some insights. Transportation Research Part C 2011; 19:387–399.
  • 3
    Vlahogianni EI, Golias JC, Karlaftis MG. Short-term traffic forecasting: overview of objectives and methods. Transport Reviews 2006; 24:533–558.
  • 4
    Vlahogianni EI, Karlaftis MG, Golias JC. Temporal evolution of short-term urban traffic flow: a nonlinear dynamics approach. Computer Aided Civil and Infrastructure Engineering 2008; 23:536–548.
  • 5
    Yao ZS, Shao CF, Xiong ZH. A study on short-term traffic flow forecasting based on a nonlinear time-series model. China Civil Engineering Journal 2007; 41:104–109.
  • 6
    Min W, Wynter L. Real-time road traffic prediction with spatio-temporal correlations. Transportation Research Part C 2011; 19:606–616.
  • 7
    Cryer JD, Chan K. Time Series Analysis, 2nd ed. Springer: New York, USA, 2008.
  • 8
    Hansen BE. Sample splitting and threshold estimation. Econometrica 2000; 68:575–603.
  • 9
    Chan WS, Ng MW. Robustness of alternative non-linearity tests for SETAR models. Journal of Forecasting 2004; 23:215–231.
  • 10
    Tsay RS. Testing and modeling threshold autoregressive processes. Journal of the American Statistical Association 1989; 84:231–240.
  • 11
    Kamarianakis Y, Prastacos P. Space-time modeling of traffic flow. Computers & Geosciences 2005; 31:119–133.
  • 12
    Min X, Hu J, Zhang Z. Urban traffic network modeling and short-term traffic flow forecasting based on GSTARIMA model. IEEE Conference on Intelligent Transportation Systems, Madeira Island, Portugal, 2010; 1535–1540.
  • 13
    Huang H, Hsu N, Theobald D, Breidt FJ. Spatial LASSO with applications to GIS model selection. Journal of Computational and Graphical Statistics 2010; 19:963–983.
  • 14
    Gao Y, Sun S, Shi D. Network-scale traffic modeling and forecasting with graphical LASSO. Advances in Neural Networks-ISNN 2011, Lecture Notes in Computer Science; 6676:151–158.
  • 15
    Hofleitner A, El Ghaoui L, Bayen A. Online least-squares estimation of time varying systems with sparse temporal evolution and application to traffic estimation. 50th IEEE Conference on Decision and Control, Orlando, Florida, 2011; 2595–2601.
  • 16
    Wan K, Kornhauser AL. Link-data-based approximation of path travel time distribution with Gaussian copula estimated through LASSO. Transportation Research Board Annual Meeting, Washington, DC 20001 USA, 2010; 25 (1-25). Paper #10–2769.
  • 17
    Dielman TE. A comparison of forecasts from least absolute value and least squares regression. Journal of Forecasting 1986; 5:189–195.
  • 18
    Koopman SJ, Ooms M. Forecasting economic time series using the unobserved components time series models. The Oxford Handbook of Economic Forecasting, Clements and Hendry (eds). Oxford University Press: New York, USA, 2011; 124–162.
  • 19
    Williams BM, Hoel LA. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. Journal of Transportation Engineering 2003; 129:664–672.
  • 20
    Qu L, Li L, Zhang Y, Hu J. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems 2009; 10:512–522.
  • 21
    Boto-Giralda D, Diaz-Pernas FJ, Gonzalez-Ortega D, Diez-Higuera FJ, Anton-Rodriguez M, Martinez-Zarzuela M, Torre-Diez I. Wavelet-based denoising for traffic volume time series forecasting with self-organizing neural networks. Computer-Aided Civil and Infrastructure Engineering 2010; 25:530–545.
  • 22
    van Dijk D, Teräsvirta T, Franses PH. Smooth transition autoregressive models-a survey of recent developments. Econometric Reviews 2002; 21:1–47.
  • 23
    Franses PH, van Dijk D. Nonlinear Time Series Models in Empirical Finance. Cambridge University Press: Cambridge, UK, 2000.
  • 24
    Kamarianakis Y, Prastacos P. Forecasting traffic flow conditions in an urban network: comparison of multivariate and univariate approaches. Transportation Research Record-Journal of the Transportation Research Board 2003; 1857:74–84.
  • 25
    Lin S, Huang H, Zhu D, Wang T. The application of space-time ARIMA model on traffic flow forecasting. IEEE Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Baoding, China, 2009; 3408–3412.
  • 26
    Lee TH, White H, Granger CWJ. Testing for neglected nonlinearity in time series models. Journal of Econometrics 1993; 56:269–290.
  • 27
    Strikholm B, Teräsvirta T. A sequential procedure for determining the number of regimes in a threshold autoregressive model. Econometrics Journal 2006; 9:472–491.
  • 28
    Andrews DWK, Ploberger W. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 1994; 62:1383–1414.
  • 29
    Chan KS. Percentage points of likelihood ratio tests for threshold autoregression. Journal of Royal Statistical Society Series B 1990; 3:691–696.
  • 30
    Teräsvirta T, Lin CF, Granger CWJ. Power of the neural network linearity test. Journal of Time Series Analysis 1993; 14:209–220.
  • 31
    Gonzalo J, Pitarakis J. Estimation and model selection-based inference in single and multiple threshold models. Journal of Econometrics 2002; 110:319–352.
  • 32
    Teräsvirta T. Forecasting economic variables with nonlinear models. Handbook of Economic Forecasting 2006; 1:413–457.
  • 33
    Bai J, Perron P. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 2003; 18:1–22.
  • 34
    Tsay RS. Testing and modeling multivariate threshold models. Journal of the American Statistical Association 1998; 93:1188–1202.
  • 35
    Cheng T, Haworth J, Wang J. Spatio-temporal autocorrelation of road network data. Journal of Geographical Systems 2011. DOI: 10.1007/s10109-011-0149-5.
  • 36
    Borovkova SA, Lopuhaä HP, Ruchjana BN. Consistency and asymptotic normality of least squares estimators in generalized space-time models. Statistica Neerlandica 2008; 62:482–508.
  • 37
    Krolzig HM, Hendry DF. Computer automation of general-to-specific model selection procedures. Journal of Economic Dynamics and Control 2001; 25:831–866.
  • 38
    Tibshirani R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society Series B 1996; 58:267–288.
  • 39
    Zhao P, Yu B. On model selection consistency of LASSO. Journal of Machine Learning Research 2006; 7:2541–2563.
  • 40
    Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics 2004; 32:407–499.
  • 41
    Zou H. The adaptive LASSO and its oracle properties. Journal of the American Statistical Association 2006; 101:1418–1429.
  • 42
    Wang H, Li G, Jiang G. Robust regression shrinkage and consistent variable selection through the LAD-LASSO. Journal of Business and Economic Statistics 2007; 25:347–355.
  • 43
    Liu Z, Sharma S, Datla S. Imputation of missing traffic data during holiday periods. Transportation Planning and Technology 2008; 5:525–544.
  • 44
    Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB. Missing value estimation methods for DNA microarrays. Bioinformatics 2001; 17:520–525.
  • 45
    Gardner ES. Exponential smoothing: the state of the art. Journal of Forecasting 1985; 4:1–28.
  • 46
    Kamarianakis Y, Kanas A, Prastacos P. Modeling traffic flow volatility dynamics in an urban network. Transportation Research Record-Journal of the Transportation Research Board 2005; 1923:18–27.
  • 47
    Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. International Journal of Forecasting 2006; 22:679–688.
  • 48
    Kamarianakis Y, Gao HO, Holmén B, Sonntag D. Robust modeling and forecasting of diesel particle number emissions rates. Transportation Research Part D 2011; 16:435–443.
  • 49
    Zhong M, Sharma S. Matching hourly, daily, and monthly traffic patterns to estimate missing volume data. Transportation Research Record: Journal of the Transportation Research Board 2006; 1957:32–42.
  • 50
    Pesaran MH, Pick A, Timmermann AG. Variable selection and inference for multi-period forecasting problems. CEPR Discussion Paper No. DP7139, 2009.
  • 51
    Bickel PJ, Chen C, Kwon J, Rice J, van Zwet E, Varaiya P. Measuring traffic. Statistical Science 2007; 22:581–597.
  • 52
    Ou Q, van Lint H, Hoogendoorn SP. Fusing heterogeneous and unreliable data from traffic sensors. Interactive Collaborative Information Systems, Vol. SCI 281, Babuska R, Groen FCA (eds). Springer-Verlag: Berlin Heidelberg, 2010; 511–545.
  • 53
    van Lint JWC, Hoogendoorn SP. A robust and efficient method for fusing heterogeneous data from traffic sensors on freeways. Computer-Aided Civil and Infrastructure Engineering 2010; 25:596–612.