M‐quantile regression shrinkage and selection via the Lasso and Elastic Net to assess the effect of meteorology and traffic on air quality

In this work, we intersect data on size-selected particulate matter (PM) with vehicular traffic counts and a comprehensive set of meteorological covariates to study the effect of traffic on air quality. To this end, we develop an M-quantile regression model with Lasso and Elastic Net penalizations. This allows us (i) to identify the best proxy for vehicular traffic via model selection, (ii) to investigate the relationship between fine PM concentration and the covariates at different M-quantiles of the conditional response distribution, and (iii) to be robust to the presence of outliers. Heterogeneity in the data is accounted for by fitting a B-spline to the effect of the day of the year. Analytic and bootstrap-based variance estimates of the regression coefficients are provided, together with a numerical evaluation of the proposed estimation procedure. Empirical results show that atmospheric stability has the most significant effect on fine PM concentration: this effect changes at different levels of the conditional response distribution and is relatively weaker in the tails. On the other hand, model selection allows us to identify the best proxy for vehicular traffic, whose effect remains essentially the same at different levels of the conditional response distribution.


INTRODUCTION
Between 2012 and 2015, an innovative system for urban air-quality monitoring was put in place in Perugia (Italy). Aerosol and gas measurements with very high spatiotemporal resolution were obtained through instruments integrated on one of the cabins of a public conveyance of the town: the Minimetro. The Minimetro path, about 3 km long, starts from a large suburban parking area, crosses various heavy-traffic roads, passes through an urban park, residential areas, and tunnels, and climbs a hill to finally reach the old town centre. The different urban scenarios located along the path therefore provide information about different environmental situations, while the constant and low speed of the cabins allows for easy and precise monitoring. Castellini et al. (2014) provide a thorough description of the PMetro project, while Ranalli et al. (2016), Del Sarto, Ranalli, Bakar, et al. (2016), and Del Sarto, Ranalli, Cappelletti, et al. (2016) use functional data analysis and hierarchical Bayesian spatiotemporal modeling, respectively, to analyze these high-frequency data.
In this paper, we integrate the fine particulate matter (PM) concentration measurements coming from the Minimetro mobile station with data coming from traffic sampling equipment positioned at one of the intersections of the Minimetro path, which counts the number of vehicles every 5 min, 24 h/day. We also collect hourly weather conditions data (such as temperature, wind speed, and rainfall) coming from fixed monitoring stations of the Regional Environmental Protection Agency placed in the surroundings of the path, and merge them with the other two data sets. Since the data are collected at different time frequencies, we convert all values to hourly data. Moreover, we reduce the spatial resolution of the data by using information from only one point, given by the crossroad intersecting the Minimetro path.
For the same data set, Crocchianti et al. (2020) investigate the effect of weather conditions and vehicular traffic by means of additive mixed models, while Del Sarto et al. (2019) apply a finite mixture of M-quantile regression models (Alfò et al., 2017). In both papers, an intensive model selection procedure is put in place to select the variable that best measures the effect of vehicular traffic. In fact, traffic can be measured in terms of the number of vehicles passing at the crossroad every hour, at different lag hours, or by cumulating counts over the h hours before the current time point, with h ranging over a defined interval. To this end, several models are estimated in Crocchianti et al. (2020) and in Del Sarto et al. (2019), each one with a different traffic-related covariate (all other things being unchanged). The model fit is evaluated using classical information criteria such as the Bayesian information criterion (BIC), and the model with the lowest BIC is retained. In particular, in Del Sarto et al. (2019) the variable selection step is preliminary, and the best proxy for traffic is retained in all subsequent model selection analyses and for all levels (quantiles and M-quantiles) of the conditional distribution of the response given the covariates. This can be a limitation, as the effect of traffic can change at different levels of the conditional distribution. In addition, results in Crocchianti et al. (2020) show that the effect of vehicular traffic changes for different fractions of PM, with a larger number of cumulation hours needed to better trace the vehicular effect on smaller particles with respect to larger (coarse) ones. Finally, results from previous analyses are not clear-cut and may be influenced by the hour of the day, which is not accounted for in those papers. For all these reasons, in this paper we are interested in investigating how meteorological conditions and vehicular traffic levels affect fine PM concentrations at different levels of the conditional response distribution. Furthermore, we conduct model selection and estimation at the same time, by including all traffic-related variables and the hour of the day as covariates in the model.
When dealing with a problem where the data set contains a large number of covariates, the selection of predictors is a crucial goal. Recently, there has been active research on sparse representations of regression problems, starting from the work of Tibshirani (1996). In his paper, he introduces the least absolute shrinkage and selection operator (Lasso) as a penalization technique that can simultaneously perform variable selection and parameter estimation. In particular, Lasso estimation represents an ℓ1-regularized least squares estimate of the parameters of the model. When categorical predictors are present in the regression model, Lasso may not be completely satisfactory, since it only selects individual dummy variables instead of the whole predictor. Moreover, it fails to perform grouped selection: it tends to select one variable from a group and ignore the others. To overcome these problems, several solutions and extensions have been proposed in the literature, such as the Elastic Net of Zou and Hastie (2005), which has improved performance in particular for correlated predictors. The Elastic Net uses both the ℓ1- and the ℓ2-norms to penalize the model and usually outperforms the Lasso regularization while enjoying a similar sparsity of representation. In addition, it encourages a grouping effect, whereby strongly correlated predictors tend to be in or out of the model together.
In this paper, we introduce M-quantile regression shrinkage and selection via the Lasso and Elastic Net to analyze the PMetro data. This allows us (i) to identify the best proxy for vehicular traffic via simultaneous model selection and fitting, while accounting for the hour of the day, (ii) to investigate the relationship between fine PM concentration and the covariates at different M-quantiles of the conditional response distribution, and (iii) to obtain estimates that are robust to outliers. In addition, to account for the clustered-time structure of the data, we introduce in the model a smooth function of time (day of the year) and estimate it using penalized B-splines (Eilers & Marx, 1996). This should account for social and generally unmeasured confounders of the distribution of fine PM, as proposed also in Bertaccini et al. (2012).
Regularization has been introduced and investigated in the quantile regression framework (Belloni & Chernozhukov, 2011; Koenker, 2004; Li & Zhu, 2008; Petrella & Raponi, 2019; Slawski, 2012; Tang & Lian, 2016; Wang et al., 2012; Yan & Song, 2019; Zhang et al., 2017), also from a Bayesian perspective (Aghamohammadi & Mohammadi, 2017; Alhamzawi et al., 2012; Bernardi et al., 2018; Kyung et al., 2010; Li et al., 2010; Tian et al., 2019). In this paper, we focus on M-quantiles (Breckling & Chambers, 1988). These extend the ideas of M-estimation to a different set of location parameters of the response conditional distribution that lie between quantiles and expectiles. M-quantile regression can be considered a robustification to outliers of expectile regression (Kneib, 2013; Newey & Powell, 1987) based on influence functions. Moreover, although M-quantiles have a less intuitive interpretation than standard quantiles (Jones, 1994), M-quantile regression offers a number of specific advantages: (i) it allows robustness to be traded for efficiency through the choice of a tuning constant in the influence function; and (ii) it offers computational stability, as a wide range of continuous influence functions can be used (Tzavidis et al., 2016).
With respect to the existing literature on M-quantile regression, we introduce the Lasso and the Elastic Net regularization for shrinkage and model selection. In fact, Spiegel et al. (2017) propose the use of the nonnegative garrote for semiparametric expectile regression; Zhao and Zhang (2018) propose variable selection in expectile regression using the SCAD (smoothly clipped absolute deviation) penalty function, while Xu et al. (2021) introduce the Elastic Net penalty into expectile regression; Waldmann et al. (2017) perform variable selection for covariates with linear effects in Bayesian expectile regression using Bayesian regularization priors. Yi and Huang (2017) introduce Elastic Net penalization for quantile regression and for Huber loss regression. Our approach is similar to Yi and Huang (2017) in that we also consider Huber loss regression, but the latter considers only the central M-quantile. To the best of our knowledge, our paper is the first attempt to consider regularized M-quantile methods for the whole set of location parameters, and it fills a gap in the literature. In addition, we include a nonparametric term to model departures from linearity of the effect of a covariate on the response, and we model it using penalized B-splines. This can be easily extended to additive modeling. To avoid issues in combining the Lasso with penalized predictors (Marra & Wood, 2011), we propose to estimate the coefficients of the B-splines with a penalized least-squares procedure similar to that in Pratesi et al. (2009), outside the Lasso estimation procedure.
The paper is organized as follows. Section 2 describes the data used in the application, and Section 3 introduces the methodology. In particular, we introduce the M-quantile model with Lasso regularization in Section 3.2, and then extend the method to the Elastic Net in Section 3.3. In Section 3.4, we account for nonlinearities through B-splines. The results of the proposed model applied to the PMetro data are reported in Section 4. Conclusions and directions for future research are provided in Section 5.

DATA
Data come from the PMetro project, which ran between 2012 and 2015 in order to assess air quality in the town of Perugia (Italy). The response variable is collected through an optical particle counter (OPC) placed on one of the cabins of the Minimetro transportation system (for more details on the performance of the OPC, see Castellini et al., 2014). These cabins travel on a monorail built 5 m above road level and are driven by a wire rope running at a low speed of about 4-7 m/s. There are seven stations along the path, and a single trip takes about 20 min and is repeated about 40 times per day.
Since PM gravimetric and chemical measurements require long times to achieve reliable results, the OPC measures the particle size distribution in terms of the number of particles in a range of selected aerosol bins. In particular, the concentration measure collected by the OPC is divided into 22 dimensional bins according to the particle diameter, expressed in micrometers (μm, 10⁻⁶ m), from 0.28 to 10 μm. We focus on the concentration of particles with a diameter lower than 1.10 μm, obtained by summing the first nine dimensional bins.
The predictor variables measure traffic and weather conditions. The latter are measured by fixed monitoring stations of the Regional Environmental Protection Agency, while the former are obtained by traffic sampling equipment installed by the Municipality. In particular, the fixed monitoring stations are located along the Minimetro path and provide data about temperature, wind speed, rainfall, total solar radiation, and radon concentration. Traffic is measured by the number of vehicles. These data are collected with different time frequencies and in different locations. In order to exploit all sources of information, the measurements have been referred to the same time frequency, that is, the hour, and to the same location, that is, the intersection between the Minimetro path, the fixed monitoring station, and the crossroad with traffic equipment. The period considered goes from March 2014 to February 2015, for a total of 201 days.

Quantile versus M-quantile regression
When investigating the relationship between a response variable y and p predictors x, quantile regression (Koenker & Bassett, 1978; Koenker & D'Orey, 1987) allows us to analyze the τth quantile of the conditional distribution of y given x. This is particularly useful when the average behavior of y given x, investigated by means of classical regression analysis, does not give a complete picture of the distribution. In particular, quantile regression assumes that the τth quantile of y is a linear function of the predictors, in other words

Q_τ(y | x) = β_{0τ} + xᵀβ_{1τ},

where β_{0τ} is the intercept and β_{1τ} is the vector of regression coefficients. This model leads to a family of hyperplanes indexed by the value of the corresponding quantile coefficient τ ∈ (0, 1). Given a sample of n observations (x_i, y_i) for i = 1, …, n, the vector β_τ = (β_{0τ}, β_{1τ}ᵀ)ᵀ is estimated by minimizing

Σ_{i=1}^n |r_{iτ}(β_τ)| {τ I(r_{iτ}(β_τ) > 0) + (1 − τ) I(r_{iτ}(β_τ) ≤ 0)}

with respect to β_τ through linear programming methods, where r_{iτ}(β_τ) = y_i − β_{0τ} − x_iᵀβ_{1τ}. M-quantile regression (Breckling & Chambers, 1988) generalizes the quantile (Koenker & Bassett, 1978) and the expectile regression idea of Newey and Powell (1987) through the use of influence functions. Specifically, the M-quantile regression line of order τ, MQ_τ(x; ψ) = β_{0τ} + xᵀβ_{1τ}, is defined as the solution of

∫ ψ_τ(y − MQ_τ(x; ψ)) dF(y | x) = 0,    (1)

where F denotes the distribution of y given x underlying the data and ψ_τ denotes the influence function associated with the τth M-quantile. The general M-estimator of β_τ = (β_{0τ}, β_{1τ}ᵀ)ᵀ is obtained by solving the following set of estimating equations with respect to β_τ:

Σ_{i=1}^n ψ_τ(r_{iτ}(β_τ)/σ_τ) (1, x_iᵀ)ᵀ = 0.

It is assumed that

ψ_τ(u) = 2 ψ(u) {τ I(u > 0) + (1 − τ) I(u ≤ 0)},

where r_{iτ}(β_τ) = y_i − β_{0τ} − x_iᵀβ_{1τ} and σ_τ is the scale. We consider throughout the paper the Huber (Huber, 1981) influence function, given by ψ(u) = u if −c ≤ u ≤ c, and ψ(u) = c sign(u) otherwise, where c is a cutoff constant. For ease of notation, we will drop ψ and simply use MQ_τ(x) instead of MQ_τ(x; ψ).
The role of the cutoff constant c is relevant. In particular, robustness increases as c decreases, while efficiency increases as c increases. Note that expectile regression is a particular case of M-quantile regression with c → ∞, so that for τ = 0.5 the solution to (1) coincides with ordinary least squares. In this sense, M-quantiles can be seen as a robustification of expectiles. Also, quantile regression can be seen as a particular case of M-quantile regression when the tuning constant c tends to 0. In practice, the tuning constant c is set by the data analyst in order to provide a trade-off between robustness and efficiency. Huber (1981, p. 18) suggests that "good choices are in the range between 1 and 2, say, 1.5." The default value for c in the rlm function of the R package MASS is 1.345, as suggested by Holland and Welsch (1977), which corresponds to 95% efficiency of the estimates under normality. When the errors are normally distributed, the best choice is to set the tuning constant equal to a large value such as 100, since in this case a smaller value, such as 1.345, will offer unnecessary robustness at the cost of reduced efficiency of the estimates. If some outliers are identified in the diagnostic analysis, a low value for the tuning constant, such as 1.345, is preferred because resistance against outliers is necessary. Bianchi et al. (2018) propose a data-driven method to select the tuning constant in the Huber loss function using the Generalized Asymmetric Least Informative density. Otto-Sobotka et al. (2019) replace the value of c with a function that represents the changes of the error distribution along the covariates to handle heteroskedastic errors, and obtain an estimate of this function from the data.
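To make the estimating equations above concrete, the following sketch fits a linear M-quantile regression by iteratively reweighted least squares with the Huber influence function. It is a minimal illustration under our own naming and defaults, not the authors' implementation.

```python
import numpy as np

def mquantile_fit(X, y, tau=0.5, c=1.345, tol=1e-8, max_iter=200):
    """Fit a linear M-quantile regression at level tau by IRLS.

    Uses Huber weights psi(u)/u = min(1, c/|u|), tilted by tau on
    positive residuals and (1 - tau) on negative ones.
    Returns the coefficient vector with the intercept first."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])          # design with intercept
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]   # OLS starting values
    for _ in range(max_iter):
        r = y - Z @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale
        u = np.abs(r) / max(scale, 1e-12)
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))   # Huber psi(u)/u
        w = 2.0 * w * np.where(r > 0, tau, 1.0 - tau)         # tau tilting
        WZ = Z * w[:, None]
        beta_new = np.linalg.solve(Z.T @ WZ, WZ.T @ y)        # weighted normal eqs
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At τ = 0.5 with c = 1.345 this reproduces a standard Huber regression; letting c grow large recovers expectile regression, as discussed above.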

M-quantile regression shrinkage and selection via the Lasso
Suppose we collect sample data on p predictors in a matrix X ∈ ℝ^{n×p} and sample data on the response variable in a vector y ∈ ℝⁿ. The Lasso-type estimator for the M-quantile regression coefficients at τ ∈ (0, 1) can be defined as

β̂_τ = argmin_{β_τ} L_τ(β_τ)  subject to  Σ_{j=1}^p |β_{jτ}| ≤ t_τ,    (2)

where the loss function L_τ is given by

L_τ(β_τ) = Σ_{i=1}^n ρ_τ(r_{iτ}(β_τ)/σ_τ),    (3)

with r_{iτ}(β_τ) = y_i − β_{0τ} − x_iᵀβ_{1τ}, and σ_τ is the scale of the residuals, which can be estimated by the MAD (median absolute deviation). Here, t_τ ≥ 0 is the tuning parameter that controls the amount of shrinkage applied to the estimates. Setting the derivatives of (3) with respect to β_τ to zero yields the following system of equations:

Σ_{i=1}^n ψ_τ(r_{iτ}(β_τ)/σ_τ) (1, x_iᵀ)ᵀ = 0_{1+p},    (4)

that can also be written as

[1_n X]ᵀ W_τ (y − 1_n β_{0τ} − X β_{1τ}) = 0_{1+p},    (5)

where 1_n is an n-vector of ones, 0_{1+p} is a (p+1)-vector of zeros, W_τ = diag(w_{iτ}), and w_{iτ} = ψ_τ(r_{iτ}(β_τ))/(2 r_{iτ}(β_τ)). As is customary in M-quantile regression, the weighted normal equations (4) and (5) require an iterative procedure because the weights depend on the regression coefficients. In particular, the iterative procedure can be summarized as follows:
1. Start with an initial value for β̂_τ, compute the residuals r_{iτ}(β̂_τ) and the weight matrix W_τ;
2. Given β̂_{0τ} and W_τ, estimate β̂_{1τ} from Equation (5);
3. Recompute W_τ and estimate β̂_{0τ} from Equation (4);
4. Alternate steps 2 and 3 until convergence.
Note that through this procedure we obtain a solution to the unconstrained problem only, that is, to (3) without satisfying the constraint on the sum of the absolute values of the coefficients in (2). In order to obtain the Lasso estimates, let δ_l, l = 1, 2, …, 2^p, be the p-tuples of the form (±1, ±1, …, ±1). Then, the condition Σ_{j=1}^p |β_{jτ}| ≤ t_τ is equivalent to δ_lᵀ β_{1τ} ≤ t_τ for all l. For a given β_{1τ}, let E = {l : δ_lᵀ β_{1τ} = t_τ} and S = {l : δ_lᵀ β_{1τ} < t_τ}, and denote by G_E the matrix whose rows are δ_l for l ∈ E. Now the previous unconstrained procedure can be updated in order to find a solution to the system of equations subject to the Lasso constraint. In particular, step 2 should be modified as follows:
2. Given β̂_{0τ} and W_τ, estimate β̂_{1τ} from Equation (5) subject to G_E β_{1τ} ≤ t_τ 1_m, where m denotes the number of rows of G_E, by using the constrained M-quantile procedure proposed by Fabrizi et al. (2012); if Σ_{j=1}^p |β̂_{jτ}| > t_τ, add the element δ = sign(β̂_{1τ}) to the set E.
Note that the procedure must always converge in a finite number of steps, since one element is added to the set E at each step, and there is a total of 2^p elements. For τ = 0.5, Yi and Huang (2017) consider a semismooth Newton coordinate descent algorithm to obtain parameter estimates. We compare the final estimates obtained with this alternative estimation procedure in a simulation study in Supporting Information Appendix A.2. The Lasso parameter t_τ can be estimated by cross-validation or generalized cross-validation (GCV).
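The alternation between reweighting and a penalized slope update can be sketched as below. For simplicity the sketch uses the penalty (Lagrangian) form of (2) with a coordinate-wise soft-thresholding update, a common alternative to the constrained procedure of Fabrizi et al. (2012) used in the paper; names and the λ parametrization are ours.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used by coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def mquantile_lasso(X, y, tau=0.5, lam=0.1, c=1.345, n_iter=200):
    """Sketch of an L1-penalized M-quantile fit (penalty form of the
    constrained problem). Alternates (a) recomputing the asymmetric
    Huber weights and (b) coordinate-wise soft-thresholding updates on
    the weighted least-squares surrogate; the intercept is unpenalized."""
    n, p = X.shape
    beta0, beta = np.median(y), np.zeros(p)
    for _ in range(n_iter):
        r = y - beta0 - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / max(scale, 1e-12)
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        w = 2.0 * w * np.where(r > 0, tau, 1.0 - tau)
        beta0 = np.sum(w * (y - X @ beta)) / np.sum(w)      # intercept step
        for j in range(p):                                  # slope steps
            r_j = y - beta0 - X @ beta + X[:, j] * beta[j]  # partial residual
            z = np.sum(w * X[:, j] * r_j)
            beta[j] = soft_threshold(z, n * lam) / np.sum(w * X[:, j] ** 2)
    return beta0, beta
```

With a sufficiently large λ, coefficients of irrelevant predictors are set exactly to zero, which is the selection behavior exploited in the application.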
The covariance matrix of the Lasso estimates of the regression coefficients may be approximated following the approach in Tibshirani (1996). In particular, by writing the penalty Σ_{j=1}^p |β_{jτ}| as Σ_{j=1}^p β_{jτ}²/|β_{jτ}|, the Lasso estimate in (3) may be approximated by a ridge regression of the type

β̃_τ = (Zᵀ W_τ Z + λ Λ)⁻¹ Zᵀ W_τ y,    (6)

where Z = [1_n X] is the design matrix including the intercept, W_τ is the diagonal matrix with weights from the last iteration, λ is the value for which Σ_{j=1}^p |β̂_{jτ}| = t_τ, and Λ = diag[0, 1/|β̂_{1τ}|, …, 1/|β̂_{pτ}|] is the penalty matrix. Then, a sandwich-type estimator can be written as

V̂(β̂_τ) = σ̂_τ² (Zᵀ W_τ Z + λ Λ)⁻¹ Zᵀ W_τ² Z (Zᵀ W_τ Z + λ Λ)⁻¹,    (7)

where σ̂_τ² is an estimate of the scale, such as the MAD σ̂_τ = median|r_{iτ}(β̂_τ)|/0.6745. Note that by construction this method provides zero variance estimates when the estimated coefficients are exactly shrunk to zero. In addition, the variance estimates in (7) do not account for the variability due to the selection of the tuning parameter. We investigate the magnitude of the underestimation for these variance estimates in simulated data sets. In particular, we compare them with bootstrap-based variance estimates, as this issue becomes more relevant as we extend the methodology to Elastic Net penalization and penalized B-splines. Bootstrap overcomes both of the above-mentioned issues, but can be computationally very intensive, particularly for a relatively large data set such as the one we deal with. See Supporting Information Appendix A.1.

Extension to Elastic Net
In the previous section, we introduced M-quantile regression with Lasso regularization, which allows us to perform continuous shrinkage and automatic variable selection simultaneously. Nevertheless, there are some situations in which the Lasso does not perform well. In particular, (i) when p > n, it selects at most n variables; (ii) in the case of a group of highly correlated variables, the Lasso selects just one of them; (iii) when n > p and there is high correlation between predictors, the prediction performance is poor compared to ridge regression (Zou & Hastie, 2005). In order to improve on such scenarios, Zou and Hastie (2005) propose the Elastic Net, which introduces a new penalization given by a convex linear combination of the Lasso and the ridge penalizations. We extend these concepts here to M-quantile regression.
We define the Elastic Net estimator for the M-quantile regression coefficients at τ ∈ (0, 1) as

β̂*_τ = argmin_{β_τ} L^{EN}_τ(β_τ)  subject to  Σ_{j=1}^p |β_{jτ}| ≤ t_τ,    (8)

where the loss function is given by

L^{EN}_τ(β_τ) = Σ_{i=1}^n ρ_τ(r_{iτ}(β_τ)/σ_τ) + λ₂ Σ_{j=1}^p β_{jτ}².    (9)

The solution to (8) can be achieved via a Lasso-type optimization problem. In fact, following Lemma 1 of Zou and Hastie (2005), we define a new augmented data set of dimension n* = n + p by

X* = (1 + λ₂)^{−1/2} [Xᵀ, √λ₂ I_p]ᵀ,   y* = (yᵀ, 0_pᵀ)ᵀ,    (10)

where I_p is an identity matrix of dimension p. The estimates can then be obtained by means of the iterative procedure illustrated in the previous section on the augmented data set. We can approximate the covariance matrix of the estimated coefficients β̂*_τ with Elastic Net penalization in (8) using the equivalence in (10), so that

V̂(β̂*_τ) = σ̂_τ² (Z*ᵀ W*_τ Z* + γ̂ Λ*)⁻¹ Z*ᵀ W*_τ² Z* (Z*ᵀ W*_τ Z* + γ̂ Λ*)⁻¹,    (11)

where Z* = [1_{n+p} X*], 1_{n+p} is an (n+p)-vector of ones, W*_τ is the (n+p) diagonal matrix with weights from the last iteration, γ̂ is the final estimate of the shrinkage factor, and Λ* = diag[0, 1/|β̂*_{1τ}|, …, 1/|β̂*_{pτ}|] is the penalty matrix. A first set of final estimates β̂^{naive}_{1τ}, which we call naive estimates, is given by

β̂^{naive}_{1τ} = (1 + λ₂)^{−1/2} β̂*_{1τ}.    (12)

Naive estimates in (12) can suffer from double shrinkage (Zou & Hastie, 2005). For this reason, as suggested in Zou and Hastie (2005), we will use as final estimates

β̂_{1τ} = (1 + λ₂) β̂^{naive}_{1τ} = (1 + λ₂)^{1/2} β̂*_{1τ}.    (13)

The estimate of the covariance matrix of the final estimates in (13) can be obtained simply by V̂(β̂_{1τ}) = (1 + λ₂) V̂(β̂*_{1τ}). The shrinkage parameter λ₂ can be selected by cross-validation or by means of GCV if the data set is very large. In particular, the GCV criterion can be written as

GCV(λ₂) = (1/n) Σ_{i=1}^n ρ_τ(r_{iτ}(β̂_τ)/σ̂_τ) / {1 − α tr(H_τ)/n}²,    (14)

where H_τ = Z*(Z*ᵀ W*_τ Z* + γ̂ Λ*)⁻¹ Z*ᵀ W*_τ is similar to a hat matrix based on an approximated ridge regression-type estimator like that in (6), so that its trace provides an approximate value for the effective number of degrees of freedom used. Traditional GCV is obtained when the parameter α is set equal to 1. Values of α larger (smaller) than one penalize more (less) complex fits and, therefore, provide more (less) shrinkage.
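The augmented-data construction of Zou and Hastie (2005, Lemma 1) can be checked numerically. The sketch below builds the augmented pair and verifies that, with the ℓ1 penalty switched off, plain least squares on the augmented data reproduces the ridge solution on the original data; the data and function names are illustrative only.

```python
import numpy as np

def augment(X, y, lam2):
    """Augmented data set of Zou & Hastie (2005, Lemma 1): an Elastic
    Net problem on (X, y) becomes a Lasso-type problem on the
    (n + p)-row pair (X_star, y_star)."""
    p = X.shape[1]
    X_star = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
    y_star = np.concatenate([y, np.zeros(p)])
    return X_star, y_star

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=50)
lam2 = 0.7

# Sanity check with no L1 penalty: least squares on the augmented data,
# rescaled by 1/sqrt(1 + lam2), equals ridge regression with penalty
# lam2 on the original data.
Xs, ys = augment(X, y, lam2)
b_aug = np.linalg.lstsq(Xs, ys, rcond=None)[0] / np.sqrt(1.0 + lam2)
b_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(3), X.T @ y)
```

In the full procedure, the Lasso-type iteration of the previous section is simply run on (X*, y*), and the resulting coefficients are rescaled as in (12) and (13).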

Accounting for nonlinearities via B-splines
So far we have assumed a linear M-quantile regression model, that is,

MQ_τ(x) = β_{0τ} + xᵀβ_{1τ}.    (15)

In some circumstances, we may face nonlinear relationships between the response and one or more predictors. Therefore, we introduce the possibility of handling nonlinearities. For ease of notation, we treat the case in which only one covariate is included in the model nonparametrically; the extension to the case where more variables are included nonparametrically is straightforward. In particular, we allow a predictor z to have a nonlinear relationship with the response y, and we extend model (15) to the following model:

MQ_τ(x, z) = β_{0τ} + xᵀβ_{1τ} + f_τ(z),    (16)

where f_τ(z) represents the unknown functional relationship between z and y at M-quantile τ. Here, we assume that the latter can be well approximated by means of the following polynomial spline of degree q,

f_τ(z) ≈ Σ_{k=1}^K B_k^{(q)}(z) a_{kτ},

where the B_k^{(q)}(z) are B-spline basis functions (Eilers & Marx, 1996), the a_{kτ} are the corresponding coefficients, and K is the number of knots. If the number of knots K is sufficiently large, the class of functions in (16) can approximate most smooth functions. Using a large number of knots, however, can lead to an unstable fit. In order to overcome this problem, the influence of the number of knots is limited by choosing K to be reasonably high and by putting a constraint on the variation of the spline coefficients. Using P-splines, a penalty on differences of order d is applied to all of the neighboring parameters. Let D^{(d)} denote a difference matrix of order d; then the penalty matrix is P = D^{(d)}ᵀ D^{(d)} and provides a penalty made of squared dth order differences in the sequence of coefficients. Let B be the n × K matrix of basis functions with b_i = (B_1^{(q)}(z_i), …, B_K^{(q)}(z_i))ᵀ on its ith row, and let a_τ = (a_{1τ}, …, a_{Kτ})ᵀ be the K-vector of corresponding coefficients. Then we can extend the Elastic Net loss function in Equation (9) to encompass penalized splines through the following loss function:

L_τ(β_τ, a_τ) = Σ_{i=1}^n ρ_τ(r_{iτ}(β_τ, a_τ)/σ_τ) + λ₂ Σ_{j=1}^p β_{jτ}² + λ₃ a_τᵀ P a_τ,    (17)

where r_{iτ}(β_τ, a_τ) = y_i − β_{0τ} − x_iᵀβ_{1τ} − b_iᵀa_τ and λ₃ is a parameter that controls the level of smoothness of the spline part of the model. Of course, Lasso regularization can be seen as a particular case in which λ₂ is set to zero.
The solution can again be obtained as a Lasso-type optimization problem, using the augmented data set introduced in the previous section and introducing

B* = [Bᵀ, 0_{K×p}]ᵀ,

that is, the basis matrix padded with p rows of zeros. Then the solution to (17) can be obtained on the augmented data. Note that in this case rescaling the vector â*_τ is not needed, because the last p rows of the matrix B*_{(n+p)×K} are all zeros. The tuning parameter t_τ is selected at step 2 by means of cross-validation, while the smoothing parameter λ₃ is chosen at step 3 via GCV, as proposed in Pratesi et al. (2009). Note that the Elastic Net regularization pertains only to the x-variables, while the penalization for the B-splines is conducted outside the Elastic Net shrinkage. For this reason, the estimation procedure alternates step 2 (Elastic Net shrinkage) and step 3 (penalized spline estimation) to obtain the overall fit. Simple truncated splines can be accommodated by changing the basis functions in the matrix B and switching the penalization to ridge shrinkage, a_τᵀ a_τ = Σ_{k=1}^K a_{kτ}². When looking at the Elastic Net estimate in the presence of a spline, we can approximate the variance of β̂*_τ and â*_τ as follows. Let C* = [1_{n+p} X* B*]; then these estimates may be approximated by a ridge regression-type estimator such as

(β̃*_τᵀ, ã*_τᵀ)ᵀ = (C*ᵀ W*_τ C* + Λ̃)⁻¹ C*ᵀ W*_τ y*,    (22)

where Λ̃ = blockdiag{γ̂ Λ*, λ̂₃ P}, so that a sandwich-type estimator analogous to (7) follows as

V̂(β̂*_τᵀ, â*_τᵀ)ᵀ = σ̂_τ² (C*ᵀ W*_τ C* + Λ̃)⁻¹ C*ᵀ W*_τ² C* (C*ᵀ W*_τ C* + Λ̃)⁻¹.    (23)

FIGURE 1 Correlation between meteorological and vehicular traffic variables.
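As an illustration of the P-spline building blocks above (basis matrix B and difference penalty P = D^(d)ᵀD^(d)), the following sketch fits a penalized least-squares smoother. It stands in for the spline step of the procedure without the M-quantile weights, and all names and defaults are ours.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(z, n_knots=20, degree=3):
    """n x K design matrix of B-spline basis functions on [min(z), max(z)]."""
    lo, hi = float(z.min()), float(z.max())
    inner = np.linspace(lo, hi, n_knots)
    knots = np.concatenate([np.repeat(lo, degree), inner, np.repeat(hi, degree)])
    K = len(inner) + degree - 1            # number of basis functions
    B = np.empty((len(z), K))
    for k in range(K):
        coefs = np.zeros(K)
        coefs[k] = 1.0                     # k-th basis function
        B[:, k] = BSpline(knots, coefs, degree)(z)
    return B

def pspline_fit(z, y, lam3=1.0, order=2):
    """Penalized least-squares P-spline fit with a d-th order
    difference penalty P = D'D on the spline coefficients."""
    B = bspline_basis(z)
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)  # difference matrix D^(d)
    P = D.T @ D                                       # penalty matrix
    a = np.linalg.solve(B.T @ B + lam3 * P, B.T @ y)
    return B @ a
```

Increasing `lam3` pulls neighboring spline coefficients together, flattening the fit toward a polynomial of degree d − 1, which is the mechanism used here to stabilize the day-of-the-year effect.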
The GCV criterion for the smoothing parameter can be written similarly to (14) as

GCV(λ₃) = (1/n) Σ_{i=1}^n ρ_τ(r_{iτ}(β̂_τ, â_τ)/σ̂_τ) / {1 − α tr(H̃_τ)/n}²,

where H̃_τ = C*(C*ᵀ W*_τ C* + Λ̃)⁻¹ C*ᵀ W*_τ is a combination of a hat and a smoother matrix based on the same approximated ridge regression-type estimator in (22), so that its trace provides an approximate value for the effective overall (covariate and spline components) number of degrees of freedom used.

APPLICATION
The proposed modeling approach has been applied to the PMetro data presented in Section 2. We are interested in investigating how meteorological conditions and vehicular traffic levels affect fine PM concentrations in Perugia at different levels of the conditional response distribution. Since a large number of traffic predictors is available and multicollinearity can be envisioned among them (see Figure 1), Elastic Net regularization is employed in order to perform variable selection and estimation at the same time. In particular, the response variable is given by fine PM concentration on the log scale, while the predictors can be grouped into two sets: meteorological data and vehicular traffic data. The first group includes radon concentration on the log scale, temperature, wind speed, total solar radiation on the log scale, and rainfall; the second group includes a set of 25 count variables, v₀-v₂₄, that measure the number of vehicles that passed at the current hour, v₀, 1 hour before, v₁, 2 hours before, v₂, up to 24 hours before, v₂₄. These variables are included on the log scale. The log-transformation is used to obtain an approximately equal spread of the variables in the model and an approximately linear relationship with the response when possible. Moreover, a set of dummy variables is added to control for the hour of the day of the observation, h₇-h₂₀, where h₇ denotes 7 a.m. and h₂₀ denotes 8 p.m. The reference hour is 6 a.m. Finally, in order to capture unobserved heterogeneity among days, a smooth function of the Julian day is added to the model as in Bertaccini et al.
(2012). We expect this variable to have a nonlinear relationship with the response, while for the other variables a linear relationship is adequate (see results in Crocchianti et al., 2020; Del Sarto et al., 2019). Temperature is the only variable for which some curvature has been detected in similar data sets, but a quadratic term is not found to be significant in this context. The model can be summarized as follows:

MQ_τ{log(PM)} = β_{0τ} + log(radon) β_{1τ} + temperature β_{2τ} + wind speed β_{3τ} + log(radiation) β_{4τ} + rainfall β_{5τ} + log(v₀) β_{6τ} + log(v₁) β_{7τ} + log(v₂) β_{8τ} + ⋯ + log(v₂₄) β_{30τ} + h₇ β_{31τ} + h₈ β_{32τ} + ⋯ + h₂₀ β_{44τ} + f_τ(day).
The model is fitted at the levels τ = {0.1, 0.2, …, 0.9}. The cutoff constant c is set to 1.345 to grant robustness with respect to a few outlying observations. At each level of τ, the penalization parameter of the ridge part of the model, λ₂, is selected by GCV, the tuning parameter for the Lasso is selected by means of cross-validation, and the smoothing parameter λ₃ for the B-splines is selected using GCV, as illustrated in Section 3. For each level of τ, the estimation procedure requires 3.5 h on an Intel Xeon CPU E5-2420 @ 1.90 GHz server. In particular, the following estimates are obtained for λ₂ at each value of τ: λ̂₂ = {0.00, 0.20, 0.25, 0.35, 0.30, 0.30, 0.30, 0.20, 0.05}.
Variance estimates are obtained by means of the analytic formula in (23) and by means of the bootstrap. In particular, we draw a nonparametric bootstrap sample of the same size randomly with replacement from our original data set. Then, using the bootstrap sample, we estimate the model again with the same configuration as for the original model. That is, we use the final estimates of the tuning parameters t̂_τ, λ̂₂, and λ̂₃ from the original model. We repeat the procedure 1000 times. We adopt this approach to save computing time (Spiegel et al., 2021, also use this type of bootstrap for flexible expectile regression with large data sets). Estimating the tuning parameters in each bootstrap sample would also allow us to account for their uncertainty. However, with the current algorithm on the original data set, optimizing the tuning parameters 1000 times is prohibitive. Moreover, for B-splines, it cannot be ensured that the estimated curve of the main model always lies inside the bootstrap interval. Numerical illustrations in Supporting Information Appendix A.1 show that the underestimation due to keeping the tuning parameters fixed is acceptable, so that this approach is a viable option when dealing with large data sets.
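The fixed-tuning bootstrap just described can be sketched generically: resample rows with replacement, refit with the tuning parameters frozen, and take coordinate-wise standard deviations. The fit function below is a placeholder (plain least squares); in the paper's setting it would be the full Elastic Net / P-spline fit with t̂_τ, λ̂₂, and λ̂₃ held fixed.

```python
import numpy as np

def fixed_tuning_bootstrap(X, y, fit_fn, n_boot=200, seed=0):
    """Nonparametric bootstrap standard errors with tuning held fixed.

    fit_fn(X, y) must return a coefficient vector and keep all tuning
    parameters frozen at the values chosen on the original data, so
    that only resampling variability is propagated."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty((n_boot, len(fit_fn(X, y))))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        reps[b] = fit_fn(X[idx], y[idx])
    return reps.std(axis=0, ddof=1)

# Placeholder fit: OLS with intercept, standing in for the penalized fit.
def ols_fit(X, y):
    Z = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Z, y, rcond=None)[0]
```

Because the tuning parameters are not re-optimized, each replicate costs only one model fit, which is what makes 1000 replicates feasible on a large data set.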
In Table 2, we report the estimated coefficients at q = 0.5, together with standard error estimates obtained using the bootstrap and the analytic formula. The meteorological predictors play an important role for fine PM concentration; the sign and magnitude of the estimated coefficients are in line with those from the literature and those obtained in Del Sarto et al. (2019). In particular, radon concentration has the largest impact. Radon concentration can be used as a proxy for the planetary boundary layer (PBL), which represents the lowest part of the atmosphere. The PBL height is important for the dispersion of pollutants and particles: the lower the PBL, the higher the particle concentration. Radon products can be considered natural tracers of the mixing properties of the low PBL layers: the higher the radon concentration, the lower the PBL. The estimated coefficient of radon concentration is 1.059; on the log-log scale, this implies that if radon concentration increases (and hence the PBL decreases) by 10%, then the conditional M-median of fine PM concentration increases by approximately 11% (s.e. 3%). Wind is inversely related and total solar radiation is positively related to fine PM, as expected. Rainfall and temperature are discarded from the model. The estimates of the effect of the hour of the day are all retained in the model and all positive, with values that increase over the course of the day, providing evidence of an accumulation phenomenon. Finally, we notice that the model performs heavy selection among the vehicular traffic variables. Indeed, of the 25 vehicular traffic predictors, only seven are retained (the log-traffic counts at lags 0, 5, 8, 11, 17, 20, and 24), with a positive effect in all cases except lag 24. Only the association with the concurrent number of vehicles (lag 0) is strong enough to be significant also in view of the corresponding standard error. The negative association detected at lag 24 is only marginally significant and is likely due to a cyclical effect coming from the day of the week.
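The log-log interpretation of the radon coefficient can be checked with a few lines of Python; the value 1.059 is the point estimate quoted above, and the rest is generic arithmetic.

```python
# With fine PM and radon both on the log scale, a coefficient of 1.059
# turns a 10% increase in radon into a 1.10 ** 1.059 multiplicative change
# in the conditional M-median of fine PM concentration.
beta_radon = 1.059                      # point estimate from Table 2
multiplier = 1.10 ** beta_radon
pct_increase = (multiplier - 1.0) * 100.0
print(round(pct_increase, 1))           # about 10.6, i.e., approximately 11%
```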
Figure 2 displays the B-spline estimate of the effect of the day of the year, together with bootstrap-based and analytic 95% confidence bounds. The estimated function follows a seasonal pattern, with relatively higher levels of fine PM concentration during wintertime, reflecting confounders such as social processes like heating, and lower levels in summertime. Had the data covered a multiyear period, a periodic spline, such as those illustrated in Eilers and Marx (2010), should be considered to describe the trend within each solar year.
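As a rough illustration of the kind of fit shown in Figure 2, the snippet below builds a cubic B-spline basis over the day of the year and fits it by unpenalized least squares to a synthetic seasonal signal. The number and placement of knots and the absence of a roughness penalty are simplifying assumptions for illustration; the paper uses penalized B-splines.

```python
import numpy as np
from scipy.interpolate import BSpline

day = np.arange(1.0, 366.0)                      # day of the year, 1..365
degree = 3                                       # cubic B-splines
interior = np.linspace(1.0, 365.0, 12)[1:-1]     # 10 interior knots (a choice, not the paper's)
knots = np.r_[[1.0] * (degree + 1), interior, [365.0] * (degree + 1)]
n_basis = len(knots) - degree - 1

# Design matrix: each column is one B-spline basis function evaluated on `day`
B = BSpline(knots, np.eye(n_basis), degree)(day)

# Synthetic seasonal signal: high in winter, low in summer, plus noise
rng = np.random.default_rng(42)
truth = 1.5 * np.cos(2.0 * np.pi * (day - 15.0) / 365.0)
y = truth + rng.normal(0.0, 0.2, day.size)

coef, *_ = np.linalg.lstsq(B, y, rcond=None)     # least-squares spline fit
fit = B @ coef
print(B.shape)                                   # (365, 14)
```

The fitted curve recovers the smooth seasonal trend while the basis keeps the problem linear in the coefficients, which is what makes the day-of-year effect easy to embed in the regression model.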
Figure 3 provides estimated coefficients for the meteorological covariates at different levels of q, together with the corresponding 95% confidence intervals. Figures A.5 and A.6 in the Supporting Information Appendix show similar plots for the vehicular traffic and hour-of-the-day covariates, respectively. The value of the intercept increases with q; temperature and rainfall remain nonsignificant along all values of q. Wind speed is always negative and significantly different from zero, and decreases in absolute value with q. This implies that higher M-quantiles of the distribution of fine PM counts depend less strongly on wind speed. Radon concentration is always significantly positive and displays an inverse U-shape, with a peak at q = 0.4. This implies that central M-quantiles of the distribution of fine PM counts, between 0.3 and 0.7, depend more strongly on radon concentration than the lower and the upper tails. The effect of concurrent vehicular traffic (lag 0) is estimated to be zero for q = 0.1 and is significantly positive at all other values of q. The estimates display a relatively constant pattern with respect to q. This implies that the relative impact of an increase in vehicular traffic remains the same for lower and for higher M-quantiles of the distribution of fine PM. The features observed at q = 0.5 for the other traffic-related variables remain similar at the other values of q.

Table 2. Parameter estimates and corresponding estimated standard errors by bootstrap and by the analytic formula at q = 0.50. The analytic s.e. estimates are by definition zero when the estimated coefficient is zero.

Figure 3. Estimated coefficients for the meteorological predictors at different levels of q, with 95% confidence intervals obtained from analytic standard error (SE) estimates.
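For readers unfamiliar with how such coefficient paths over q arise, the following is a minimal sketch of unpenalized M-quantile regression via iteratively reweighted least squares with an asymmetric Huber influence function (in the spirit of Breckling and Chambers, 1988). The MAD scale estimate, the tuning constant c = 1.345, and the convergence rule are conventional choices rather than necessarily those of the paper, and the Lasso/Elastic Net penalty is omitted.

```python
import numpy as np

def mquantile_fit(X, y, q=0.5, c=1.345, tol=1e-8, max_iter=500):
    """Unpenalized M-quantile regression via IRLS (illustrative sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # OLS start
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        u = r / s
        # Asymmetric Huber influence: weight 2q above the fit, 2(1-q) below
        psi = np.where(u > 0, 2.0 * q, 2.0 * (1.0 - q)) * np.clip(u, -c, c)
        w = np.where(np.abs(u) < 1e-10, 1.0, psi / u)      # IRLS weights
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy data: intercept 1, slope 2, symmetric noise
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

b_lo, b_md, b_hi = (mquantile_fit(X, y, q=qq) for qq in (0.1, 0.5, 0.9))
print(np.round(b_md, 2))   # intercept/slope near (1, 2) at q = 0.5
```

Refitting over a grid of q values and plotting each coefficient against q yields curves of the kind displayed in Figure 3.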
Supporting Information Appendix A.4 reports a set of figures that compare the estimates obtained with the proposed procedure with the Elastic Net quantile regression estimates proposed by Yi and Huang (2017), as implemented in the hqreg R package with method = "quantile", for q ∈ {0.1, 0.2, ..., 0.9}. Results differ for some effects, for at least two main reasons. First, the Yi and Huang (2017) approach, as implemented in hqreg with method = "quantile", models the quantiles of the conditional distribution of the response, which can be very different from the M-quantiles, which are essentially robust expectiles. Second, hqreg does not allow for nonparametric regression, that is, approximating the effect of one or more covariates via B-splines. Therefore, for the effect of the calendar day we included a polynomial term of order 4 in hqreg, but the final approximation is essentially linear (see Figure A.7 in the Supporting Information Appendix); this, of course, changes the effect of the other covariates, such as temperature (see Figure A.8 in the Supporting Information Appendix) and vehicular traffic (see Figure A.9 in the Supporting Information Appendix). The latter, in particular, seems to be better captured by the proposed method, as the approach of Yi and Huang (2017) sets the effect of concurrent vehicular traffic (lag 0) always equal to zero, which is less coherent with PM dynamics and with previous results on the same data.
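The first of these two reasons can be seen in a one-line example: even at level 0.5, the quantile of a skewed sample (its median) differs from the 0.5-expectile (its mean), and M-quantiles behave like robust expectiles rather than quantiles. The toy data below are arbitrary.

```python
import numpy as np

# A right-skewed toy sample: the 0.5-quantile (median) and the
# 0.5-expectile (mean) disagree, so quantile fits and
# M-quantile/expectile-type fits need not coincide.
x = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
print(np.median(x), np.mean(x))  # 1.0 vs 2.8
```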

CONCLUSIONS
In this paper, we analyze data on air quality, measured by the concentration of fine PM, in order to assess the contribution associated with vehicular traffic. Vehicular traffic can be measured by different variables and can have a different effect at different locations of the response distribution. For this reason, we introduce M-quantile regression models with Lasso and Elastic Net penalizations in order to conduct model estimation and selection at the same time, at different levels of the distribution of fine PM, while remaining robust to outlying observations. We use data for a whole year in order to account for seasonal trends, and include meteorological covariates and other confounders to try and isolate the effect of traffic. The effect of the day of the year is included nonparametrically and estimated from the data by means of penalized B-splines, in order to enhance the flexibility of the predictive part of the model. The effect of traffic for this type of data has usually been evaluated in two steps: first, the best proxy is detected via model selection by including one variable at a time; then, the effect of the detected variable is included in the model, often left unchanged at different locations of the distribution of fine PM concentration. See, for example, Del Sarto et al. (2019). Introducing Lasso and Elastic Net for M-quantile regression makes it possible to perform these two steps simultaneously and to detect possible differences at different levels of q. In addition, the Elastic Net handles predictors that display relevant collinearity, as is the case with the data at hand. Including all traffic-related variables, we identify a major driver in the number of concurrent vehicles. The relative effect of traffic is essentially invariant across values of q; that is, traffic affects the low and high M-quantiles of fine PM concentration equally. As PM concentration is modeled on the log scale, this implies that the effect is higher in absolute terms for the upper tail of the distribution of fine PM.
With respect to the existing methodological literature, our paper fills a gap: Yi and Huang (2017) introduce the Elastic Net for the Huber loss function at q = 0.50; our paper extends it to any value of q. In addition, to better model the data at hand, we allow for the possibility of including covariates nonparametrically via B-splines. Finally, we provide bootstrap-based and analytic variance estimates for the regression coefficients and for the B-spline function approximations.
Despite the wide range of topics covered in this paper, some research problems remain open. For example, our proposed toolkit uses an extension of the algorithm proposed by Tibshirani (1996). Developing algorithms for Lasso and Elastic Net estimators of M-quantile regression coefficients based on gradient descent, such as that used in Yi and Huang (2017) for robust regression at q = 0.5, remains an open research problem and would improve computational time considerably. Finally, an efficient implementation of the proposed procedures could be provided in an R package, including a main function for model fitting and a variety of auxiliary functions for summaries and plotting. This is left for future research.

ACKNOWLEDGMENTS
The work of Ranalli has been carried out with the support of the project AIDMIX, Fondo di ricerca di Ateneo, edizione 2021, Università degli Studi di Perugia. The work of Salvati has been carried out with the support of the project InGRID 2 (Grant Agreement N. 730998) and of the project LOCOMOTION (Grant Agreement N. 821105). The authors are grateful to the PMetro project for providing the data and to David Cappelletti for useful discussions.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article.

OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results; the data are available in the Supporting Information section. This article has also earned a "Reproducible Research" badge for making publicly available the code necessary to reproduce the reported results, which could be fully reproduced.

Table 1. Descriptive statistics of the data (columns: Variable, Unit, Range, Mean, SD).
Let γ = λ1/√(1 + λ2). In this way, we can solve the Elastic Net penalization as a Lasso-type problem and find