In a series of experiments we issue forecasts of fine aerosol concentration over the coterminous USA and southern Canada using the Weather Research and Forecasting – Chemistry model initialized with 3D-VAR or ensemble Kalman filter (EnKF) assimilation methods. Assimilated observations include surface measurements of fine aerosols from the United States Environmental Protection Agency AIRNow Data Exchange program. Evaluation statistics calculated over a month-and-half-long summer period demonstrate the advantage of EnKF over 3D-VAR and point to the limitations of applying a simple aerosol parameterization for predicting air quality over the forecast area. Strategies for further improvement of forecasting aerosol concentrations are discussed.
 Atmospheric aerosols play an important role in shaping climate and weather, and are also a pivotal contributor to air pollution. Studies on environmental damage suggest that chronic exposure to particulates with diameters smaller than 2.5 μm (hereafter, particulate matter 2.5, or PM2.5) is the single most critical factor affecting human mortality due to air pollution [U.S. Environmental Protection Agency, 1999; Muller and Mendelsohn, 2007, 2009; Muller et al., 2011]. Thus improvements to parameterizations of aerosol chemistry on the one hand and to the assimilation of aerosol observations on the other are essential for human well-being and for more accurate environmental prediction.
 In this paper we focus on the assimilation of PM2.5 observations as a component of regional air quality forecasting. Pagowski et al., in a study on the assimilation of ozone and PM2.5, provide a review of applications of data assimilation to modeling atmospheric chemistry; these applications include variational (3D-VAR and 4D-VAR) and Kalman filter methods. To avoid repetition, past accomplishments are only briefly summarized here. In addition, an update is given on other recent studies.
 Below we describe forecasting experiments with the assimilation of surface measurements of PM2.5 using the Weather Research and Forecasting – Chemistry model (WRF-Chem) [Grell et al., 2005] and the Ensemble Square Root Filter (EnSRF), a variant of a deterministic EnKF approach introduced by Whitaker and Hamill. Short-term forecasts were issued for the coterminous USA and southern Canada over a month-and-a-half-long summer period. Forecasts initialized with the EnKF method were compared to forecasts initialized with a 3D-VAR approach as well as to forecasts without any aerosol assimilation.
 In the following we describe observations, our modeling system, experiment design, and results. A discussion concludes the paper.
 Observations of aerosols can be obtained from satellites and ground-based instruments. We refer the interested reader to Shi et al. and Schutgens et al. [2010a, 2010b] for descriptions of satellites and global measurement networks. Our task of assimilating in situ PM2.5 observations is much simpler than satellite retrieval, though not free of difficulties, as elaborated in Section 4.2.
 Surface PM2.5 concentrations are available through the U.S. Environmental Protection Agency (EPA) AIRNow (http://www.airnow.gov/) Data Exchange program. This program provides hourly averaged PM2.5 concentrations over the United States and Canada. The measurements are readily available around the clock with a delay typically ranging from one to three hours and can be used in real time for data assimilation or model evaluation. There are approximately 650 stations measuring PM2.5 concentrations over our domain, which is shown in Figure 1. The highest density of stations is found in the eastern part of the USA, followed by California and eastern Texas, while observations are relatively sparse over the central U.S. Most of the monitors are located in urban and suburban settings. The concentrations are measured with Tapered Element Oscillating Microbalance (TEOM) instruments. Based on the instrument description (Thermo Fisher, Continuous particulate TEOM monitor, Series 1400ab, product detail, 2007, available at http://www.thermo.com/com/cda/product/detail/1,10122682,00.html), the uncertainty of PM2.5 measurements is calculated as 1.5 μg m−3 plus an inaccuracy of 0.75% in the species mass measurement. Hitzenberger et al. noted that, because of the volatility of the species, much larger measurement errors can occur depending on atmospheric conditions. However, no formulations are available to calculate such errors for general applications.
 Observations of speciated aerosols are available through the IMPROVE (Interagency Monitoring of Protected Visual Environments, http://vista.cira.colostate.edu/improve/) and U.S. EPA STN (Speciation Trends Network) networks. These two networks report daily averaged concentrations of fine sulfate, elemental carbon, and organic carbon, but the measurements are performed only once every three days. Measurements of elemental carbon and organic carbon from the STN network were corrected for bias as recommended in Malm et al. A map of the monitors is shown in Figure 2. Measurement sites in the IMPROVE network are distributed evenly over the USA, while the density of the STN network is higher in the eastern part of the USA. IMPROVE sites are usually located in national parks and rural areas, while those in the STN network are located in urban areas. Because of the length of the averaging period, observations from the IMPROVE and STN networks have limited value for assimilation and are used only in the evaluation.
 Meteorological observations assimilated with EnKF are those used in the National Centers for Environmental Prediction (NCEP) Global Data Assimilation System (GDAS) and include rawinsondes, aircraft, and surface observations. Satellite data are excluded. A description of the assimilated observations is available online (http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_2.htm). Since our focus in this work is on the assimilation of PM2.5, meteorological observations were not assimilated in the 3D-VAR experiment but adopted from the EnKF experiment.
3. Modeling System
 WRF-Chem [ibid.] is an atmospheric model that simultaneously predicts weather and atmospheric composition. Multiple parameterizations of varying degrees of sophistication are available in WRF-Chem for representing physical and chemical processes on scales ranging from large eddy simulations to global. Specific choices of parameterizations in this study are given in Section 4.1. Advantages of online or integrated modeling include interaction of chemical species with meteorology (for aerosols, primarily related to radiation and microphysics), availability of all model variables collocated in space and time without the need for interpolation, and possible improvements to the meteorology through assimilation of species [Grell and Baklanov, 2011]. Not all of these advantages are exploited in the current simulations; potential benefits of comprehensive assimilation are discussed in the conclusion of the manuscript.
 The Ensemble Kalman Filter (EnKF) approach was introduced by Evensen for data assimilation with an ocean model and has since become a viable alternative to variational assimilation methods in atmospheric modeling at major meteorological centers [e.g., Houtekamer et al., 2005; Whitaker et al., 2008]. In meteorology, assimilation methods based on EnKF are attractive because their performance is comparable to 4D-VAR [Kalnay et al., 2007], but coding, parallelization and maintenance are much simpler. Similar performance of EnKF and 4D-VAR cannot be assumed for chemical modeling because of serious deficiencies in emissions and in parameterizations, which contribute to model errors that may be non-Gaussian, and because of the weaker sensitivity of some species to the initial conditions. Evensen and Hamill provide comprehensive descriptions of ensemble data assimilation. Only a cursory description of EnSRF is given below. For the derivation and details of the numerical algorithm the reader is referred to Whitaker and Hamill.
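 The analyzed state $\mathbf{x}^a$ is obtained through the Kalman filter update equation [Lorenc, 1986]

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\mathbf{x}^b\right), \qquad (1)$$

$$\mathbf{K} = \mathbf{P}^b\mathbf{H}^T\left(\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R}\right)^{-1}, \qquad (2)$$

where $\mathbf{x}^b$ is the background model forecast, $\mathbf{y}$ is the vector of observations, $\mathbf{P}^b$ is the model error covariance of the ensemble of forecasts, $\mathbf{R}$ is the observation error covariance matrix, $\mathbf{H}$ is the linear operator which maps the model state into the observation space, and $\mathbf{K}$ is the Kalman gain matrix. For consistency with equations (1) and (2), updating ensemble perturbations requires scaling the Kalman gain matrix by a factor $\alpha$ so that

$$\tilde{\mathbf{K}} = \alpha\mathbf{K}, \qquad \alpha = \left(1 + \sqrt{\frac{R}{\mathbf{H}\mathbf{P}^b\mathbf{H}^T + R}}\right)^{-1},$$

with $\mathbf{H}\mathbf{P}^b\mathbf{H}^T$ and $R$ reducing to scalars when observations are processed one at a time.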
 Details on implementing the EnSRF for our application are given in Section 4.3.
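 The serial EnSRF update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the experiments: it assumes uncorrelated observation errors, a linear observation operator stored as a dense matrix, and no covariance localization or inflation. The function name `ensrf_update` and its arguments are hypothetical.

```python
import numpy as np

def ensrf_update(ens, obs, obs_var, H):
    """Serial ensemble square root filter (sketch).

    ens     : (n_state, n_members) background ensemble
    obs     : (n_obs,) observations, assimilated one at a time
    obs_var : (n_obs,) observation error variances (assumed uncorrelated)
    H       : (n_obs, n_state) linear observation operator
    """
    ens = np.array(ens, dtype=float)
    n_members = ens.shape[1]
    for j in range(len(obs)):
        h = H[j]
        mean = ens.mean(axis=1)
        pert = ens - mean[:, None]
        hx_pert = h @ pert                            # perturbations in observation space
        hpht = hx_pert @ hx_pert / (n_members - 1)    # scalar H P^b H^T
        pbht = pert @ hx_pert / (n_members - 1)       # vector P^b H^T
        gain = pbht / (hpht + obs_var[j])             # Kalman gain for this observation
        # Reduced-gain factor applied to the perturbation update only
        alpha = 1.0 / (1.0 + np.sqrt(obs_var[j] / (hpht + obs_var[j])))
        mean = mean + gain * (obs[j] - h @ mean)
        pert = pert - alpha * np.outer(gain, hx_pert)
        ens = mean[:, None] + pert
    return ens
```

 For a scalar state with background variance equal to the observation error variance, the gain is 0.5, so the analysis mean lies halfway between the background mean and the observation and the analysis variance is half the background variance.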
 To benchmark our EnKF experiments, we also initialize parallel simulations using the Grid point Statistical Interpolation (GSI) [Wu et al., 2002; Purser et al., 2003a, 2003b], a tool based on 3D-VAR. In the GSI the optimal state, or analysis, is obtained by minimizing a cost function that expresses the misfit between the analysis and the model background and between the analysis and the observations, each weighted by its respective error. In contrast to EnKF, model errors in the GSI are obtained using climatology. Background error covariances in the GSI are decomposed into vertical and horizontal components and approximated by Gaussian curves using recursive filters. Specific details pertaining to the current simulations are given in Section 4.2.
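 For reference, the generic 3D-VAR cost function that matches this description can be written as follows. This is the standard textbook form, given here as a sketch; the symbol $\mathbf{P}^b_{clim}$ for the climatological background error covariance is introduced only for this illustration, and the GSI itself minimizes a preconditioned form in terms of control variables.

$$J(\mathbf{x}) = \frac{1}{2}\left(\mathbf{x} - \mathbf{x}^b\right)^T \left(\mathbf{P}^b_{clim}\right)^{-1} \left(\mathbf{x} - \mathbf{x}^b\right) + \frac{1}{2}\left(\mathbf{y} - \mathbf{H}\mathbf{x}\right)^T \mathbf{R}^{-1} \left(\mathbf{y} - \mathbf{H}\mathbf{x}\right)$$

 For a linear $\mathbf{H}$, the minimizer of $J$ has the same form as the Kalman filter update, with the flow-dependent ensemble covariance replaced by the static $\mathbf{P}^b_{clim}$.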
 The GOCART model within WRF-Chem simulates 14 aerosol species that include sulfate, dust, organic carbon, elemental carbon, and sea salt, plus an additional aerosol component that includes nitrate and other unspecified particles with radius smaller than 2.5 μm. PM2.5 is calculated as a sum of the additional aerosol component and other fine aerosol species as described in Chin et al. The concentration of the OH radical essential for gaseous sulfur oxidation, which in turn becomes a source for sulfate production, varies according to monthly climatology. Absence of nitrate production parameterization in GOCART is expected to be most detrimental to forecasts over the western part of the domain. No accounting for the formation of secondary organic aerosols would make GOCART most deficient over urban areas in both western and eastern parts of the domain.
 While other aerosol schemes of varying complexity are available within WRF-Chem, the choice of the simple GOCART scheme for our initial studies was dictated by computational considerations.
 Lateral boundary conditions for chemical species could be provided from a global model. However, difficulties in matching the grid resolutions, emission inventories, and chemical speciation of regional and global models, together with the fact that predictions of species by global models can differ substantially, suggest that uncertainties in lateral boundary conditions derived in this way are large. (For example, Tang et al. illustrated large uncertainties in the lateral boundary conditions derived from different global models for ozone forecasts.) Global advection of aerosols can be non-negligible on an episodic timescale, but aerosol concentrations over North America typically remain dominated by local processes. In our simulations we prescribed lateral boundary conditions for chemical species that are invariant in space and time, except for outflow dependence. On inflow, they are specified based on measurements taken on board NASA-sponsored aircraft missions as derived by McKeen et al. The arguments above suggest that these assumptions in specifying lateral boundary conditions are defensible and do not diminish the validity of our study.
 Initial conditions for chemical species at the very beginning of the simulation period are uniform for all the ensembles and based on McKeen et al. Similar to our previous study [Pagowski et al., 2010], a five-day model spin-up period was employed to allow adjustment of chemical species concentrations.
 Emissions of PM2.5 precursors are those described in Kim et al. and are based on the U.S. EPA 1999 National Emissions Inventory (version 3) with updates of major electrical generating facilities to the observed July 2004 emissions from the Continuous Emissions Monitoring Network. Perturbations to emissions for the ensembles are prescribed as outlined in Section 4.3.
 The horizontal grid spacing of the domain is 60 km, and there are 40 vertically stretched levels up to the model top at 50 hPa. Such coarse grid resolution is dictated by the computational demands of our experiments, which include multiple ensemble simulations over an extended period of time.
4.2. GSI and the Forward Model
 The GSI in our study is used to produce innovations for EnKF and also for a benchmark assimilation study. Our implementation of the GSI for assimilating surface PM2.5 observations follows closely that of Pagowski et al. and will be only briefly described here.
 The observation operator for PM2.5 represents linear horizontal interpolation. We assume that surface observations coincide with the first model level. An alternative approach of extrapolating surface measurements to the first model level using similarity theory would require knowledge of the species' surface fluxes. Given the level of uncertainty in estimates of the fluxes we conjecture that it is unlikely that assimilation reliability could be improved with this method.
 Since the large majority of PM2.5 observations are over flat terrain and altitudes of some measurement sites are not available, differences between model topography and reality were neglected. Unlike observations, the observation operator does not include temporal averaging. A one-hour assimilation window was used, and it was assumed that observations were available at 30 min past the hour. For consistency, the same simplifications (with respect to vertical location and temporal averaging) were applied in the model evaluation.
 Representativeness error is dependent on the character of the measurement site (urban, suburban, or rural) and was specified in Pagowski et al. following the formulation of Elbern et al. It is given by $\varepsilon_{repr} = \varepsilon_{abs} \left(\Delta x / L_{repr}\right)^{1/2}$, where $\varepsilon_{abs}$ is the "characteristic absolute error," a tunable parameter, $\Delta x$ is the model grid size, and $L_{repr}$ represents the radius of influence of an observation. Previously, we found $\varepsilon_{abs} = \frac{1}{2}\varepsilon_{obs}$ to be adequate and prescribed $L_{repr}$ equal to 10 km, 4 km, and 2 km for rural, suburban, and urban sites, respectively. The observation error is obtained as the sum of representativeness and measurement errors. The latter were described in the section on observations above.
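 The observation error model above can be sketched as follows. The sketch assumes that εobs denotes the TEOM measurement error (1.5 μg m−3 plus 0.75% of the measured concentration) and uses the 60 km grid spacing of this study as a default; the function and constant names are hypothetical.

```python
import math

# Radius of influence of an observation (km) by site type, as quoted in the text
L_REPR_KM = {"rural": 10.0, "suburban": 4.0, "urban": 2.0}

def measurement_error(conc):
    """Instrument error in ug m-3: 1.5 plus 0.75% of the measured concentration."""
    return 1.5 + 0.0075 * conc

def representativeness_error(conc, site_type, dx_km):
    """eps_repr = eps_abs * sqrt(dx / L_repr), with eps_abs = 0.5 * eps_obs (assumed)."""
    eps_abs = 0.5 * measurement_error(conc)
    return eps_abs * math.sqrt(dx_km / L_REPR_KM[site_type])

def observation_error(conc, site_type, dx_km=60.0):
    """Total observation error as the sum of measurement and representativeness errors."""
    return measurement_error(conc) + representativeness_error(conc, site_type, dx_km)
```

 Under these assumptions, an urban site receives a larger observation error than a rural site reporting the same concentration, so urban observations carry less weight in the analysis on a coarse grid.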
 Background error statistics were obtained for the total PM2.5 using the NMC method [Parrish and Derber, 1992] with 24-h forecast differences. To account for the diurnal variation in the boundary layer, separate 48-h forecasts were issued at 00, 06, 12, and 18 UTC. In the benchmark experiment, the PM2.5 increment was distributed to different species and sizes of aerosols based on their a priori contributions to the total PM2.5 mass.
4.3. EnKF and the Generation of Ensembles
 Because of sampling error due to a limited ensemble size and insufficient dispersion of ensemble systems, the usual measures to prevent filter divergence in an EnKF assimilation include covariance localization and inflation. In this work we use the Gaspari-Cohn piecewise polynomial [Gaspari and Cohn, 1999] to taper background-error covariances to zero at 1000 km in the horizontal, and at a vertical distance –ln(p/ps), where ps is surface pressure, equal to one. We also limit the influence of PM2.5 observations to only model aerosols and of meteorological observations to only model meteorology. Other experiments with more robust localization were performed, but the results require further analysis and will be reported separately. Inflation is specified as in Whitaker and Hamill so that the prescribed ensemble spread is a linear combination of the prior and posterior spreads, where σ is the ensemble spread evaluated at each grid point, α is a positive adjustable parameter, and superscripts b and a denote prior (background) and posterior (analysis), respectively. Through experimentation we assigned α a value of 1.2. This value is common for perturbations to all state variables (in WRF-Chem: potential temperature, wind, geopotential, water vapor mixing ratio, surface dry pressure, and aerosol mixing ratio).
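 The two safeguards described above can be illustrated with a short sketch. The taper is the fifth-order piecewise rational function of Gaspari and Cohn, written so that it reaches zero at the cutoff distance. For the inflation, the sketch assumes the relaxation form in which the target spread is (1 − α)σ^a + ασ^b; this is one linear combination consistent with the description, not necessarily the exact expression used in the experiments. Function names are hypothetical.

```python
import numpy as np

def gaspari_cohn(dist, cutoff):
    """Gaspari-Cohn taper: 1 at zero distance, 0 at and beyond `cutoff`."""
    r = 2.0 * np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / cutoff
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

def inflate_spread(pert_a, sigma_b, alpha):
    """Rescale analysis perturbations toward a mix of prior and posterior spread.

    pert_a  : (n_state, n_members) analysis perturbations
    sigma_b : (n_state,) prior ensemble spread
    alpha   : weight of the prior spread (assumed relaxation form)
    """
    sigma_a = pert_a.std(axis=1, ddof=1)
    target = (1.0 - alpha) * sigma_a + alpha * sigma_b
    return pert_a * (target / sigma_a)[:, None]
```

 With a cutoff of 1000 km, `gaspari_cohn` returns a weight of about 0.21 at 500 km, so observations at half the cutoff distance already have their covariance with a grid point reduced by roughly a factor of five.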
 Two EnKF experiments were performed: for the first experiment (EnKF_TOT) total PM2.5 is used as a state variable and increments are distributed to individual species as in the benchmark GSI experiment, i.e., using a priori contributions of species to the total PM2.5 mass. For the second experiment (EnKF_SPEC) individual aerosol species are defined as state variables.
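 The proportional distribution of the total PM2.5 increment used in EnKF_TOT and in the benchmark GSI experiment can be sketched as below. The array layout and the function name are assumptions made for illustration; handling of negative concentrations after the update is not shown.

```python
import numpy as np

def distribute_increment(species, pm25_increment):
    """Split a total PM2.5 increment among species by their a priori mass fractions.

    species        : (n_species, ...) prior species concentrations
    pm25_increment : (...) analysis increment of total PM2.5
    """
    species = np.asarray(species, dtype=float)
    total = species.sum(axis=0)
    # Mass fraction of each species; left at zero where the prior total is zero
    frac = np.divide(species, total, out=np.zeros_like(species), where=total > 0)
    return species + frac * pm25_increment
```

 Because every species is scaled by the same factor at a given grid point, this approach preserves the prior speciation and changes only the total mass.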
 Initial conditions for meteorology for the ensembles at the very beginning of the simulation period were obtained by perturbing the WRF NMM analysis using background error covariance statistics obtained from simulations in summer 2004 imposed on a normally distributed control variable as in Barker .
 After the assimilation, lateral boundary values of meteorological state variables for the ensembles were updated with increments. The future values of these variables on the lateral boundaries were obtained by perturbing WRF NMM forecasts using background error covariance statistics in the same manner as for the initial conditions.
 Experimentation showed that the ensemble spread for aerosols is too small to account for the aerosol model error in an EnKF assimilation when meteorology alone is perturbed.
 In addition to inaccuracies in predicted meteorology and lateral boundary conditions, errors in chemical forecasts arise from uncertainties in the parameterization of chemical processes and emission sources. To account for errors in chemical parameterizations, a method similar to the stochastic approach of Buizza et al. may be applicable. Wu et al. demonstrated the potential of introducing random perturbations to model parameters for improving ozone forecasting. We believe that the concept of parameterization of model error is very relevant to chemical forecasting and leave it for future investigation. Here, we focus only on perturbing aerosol emission sources.
 We note that errors in emission estimates are largely confined to uncertainties in their spatial representation, while temporal emission factors are more reliable [Elbern et al., 2007]. We assume that the spatial correlation of surface aerosol species emissions reflects its origin and thus the error in the surface emissions should have similar spatial characteristics as the emissions themselves (for any single species temporal factors are constant over the domain). Then, spatial covariances of emission sources at grid points $(i, j)$, $b_{ij} = \mathrm{cov}(e_i, e_j)$, constitute elements of the background error covariance matrix $\mathbf{B} = [b_{ij}]$. $\mathbf{B}$ is represented as $\mathbf{V}\mathbf{B}_x\mathbf{B}_y\mathbf{B}_y\mathbf{B}_x\mathbf{V}$, where $\mathbf{V}$ is the standard deviation of error and $\mathbf{B}_x$ and $\mathbf{B}_y$ are applications of recursive filters on correlation scales determined by analysis of covariances $b_{ij}$. To propagate error covariances from $\mathbf{B}$, perturbations to ensemble emissions are obtained as $\mathbf{E}' = \mathbf{B}_x\mathbf{B}_y\nu$, where $\nu$ follows a lognormal distribution with parameters $\mu$ and $\sigma^2$. We prescribe a lognormal distribution so that perturbations to emissions are multiplicative rather than additive. We realize that the assumption of a lognormal distribution may negatively affect adherence to the Gaussian distribution of model error required by EnKF. Our current implementation imposes a single dominant correlation length-scale over the whole modeling domain; this could be further refined to account for regional variability and, optimally, for source origin of pollutants. Also, we assume that emission errors are correlated for organic carbon and elemental carbon and are uncorrelated between other aerosol species. After experimenting with the EnKF spread we set the two parameters of the lognormal distribution: mean μ = 0 and standard deviation σ = 0.3. For these values, most of the modified emissions fall within half to twice the original field, which is consistent with emission error estimates (S. McKeen, personal communication, 2011).
The pattern of spatial perturbations to emissions for a given ensemble member does not vary throughout the simulation period (i.e., emission inputs are not interchanged between different members). We point out that the generation of random uncorrelated emission perturbations over the modeling domain is not likely to produce a realistic spread in the ensemble because of the cancelling effect of neighboring cells.
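 A simplified sketch of spatially correlated, multiplicative emission perturbations is given below. It departs from the procedure above in ways that should be read as assumptions: it smooths Gaussian noise with a first-order forward and backward recursive filter, renormalizes the field to unit variance, and only then exponentiates, so that the logarithm of the factor has standard deviation σ. The smoothing coefficient `a` stands in for the correlation length-scale and is purely illustrative, as are the function names.

```python
import numpy as np

def recursive_filter(field, a, axis):
    """First-order forward/backward recursive smoother along one axis (0 <= a < 1)."""
    f = np.moveaxis(np.array(field, dtype=float), axis, 0)
    for i in range(1, f.shape[0]):                  # forward pass
        f[i] = a * f[i - 1] + (1.0 - a) * f[i]
    for i in range(f.shape[0] - 2, -1, -1):         # backward pass
        f[i] = a * f[i + 1] + (1.0 - a) * f[i]
    return np.moveaxis(f, 0, axis)

def emission_factors(shape, sigma=0.3, a=0.8, rng=None):
    """Multiplicative emission factors for one ensemble member on a 2-D grid."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(shape)
    smooth = recursive_filter(recursive_filter(noise, a, 0), a, 1)
    smooth = (smooth - smooth.mean()) / smooth.std()    # zero mean, unit variance
    return np.exp(sigma * smooth)                       # log-factor has std sigma

# One fixed perturbation pattern per member, reused for the whole simulation period
factors = emission_factors((40, 60), rng=np.random.default_rng(1))
```

 Multiplying an emission field by `factors` keeps emissions non-negative, and neighboring cells are perturbed in the same direction, which avoids the cancelling effect of uncorrelated noise noted above.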
 Alternatively, to account for emission uncertainties, we could simulate surface emissions of aerosols as red noise processes (similar to parameter estimation in Evensen). That approach would also require tuning of the decorrelation timescale and the standard deviation of the random error. The advantage of any of the above methods for chemical EnKF modeling has yet to be demonstrated.
 We employed an ensemble of 50 members. Cycle frequency and forecast length were set to six hours. The forecasting period began on 26 May 2010 and ended on 15 July 2010 including the spin-up.
Table 1. Experiments
NoDA: No aerosol assimilation; meteorology from EnKF experiments.
GSI: 3D-VAR assimilation of PM2.5; PM2.5 as a state variable; PM2.5 increment distributed to aerosol species based on their a priori contribution to total PM2.5; meteorology from EnKF experiments.
EnKF_TOT: EnKF assimilation of PM2.5 and meteorology; PM2.5 as a state variable; PM2.5 increment distributed to aerosol species based on their a priori contribution to total PM2.5.
EnKF_SPEC: EnKF assimilation of PM2.5 and meteorology; aerosol species as state variables.
5. Results and Evaluation
5.1. Total PM2.5 Aerosol
 In Figure 3, a time series of the Root Mean Square Error (RMSE) of PM2.5 forecasts is compared with the total error, expressed as a sum of the ensemble spread and the observation error. For reference, the ensemble spread is also plotted.
 Ideally, the RMSE should be equal to the total error. Smaller values of the total error point to an insufficient ensemble spread and, consequently, a deficient estimate of the model error. Experiments with larger inflation factors did not result in a decrease in disparity between the RMSE and the total error. Also, no improvement in these and other statistics could be achieved by increasing the variance of emission perturbations. This suggests that other model errors are not accounted for. Sources of such errors can include deficiencies in physical and chemical parameterizations and characterization of emission errors. It can be noted that the statistics show diurnal variation with the largest forecast errors in the early morning when PM2.5 concentrations are at their peak, and the smallest errors in the afternoon when the aerosols are dispersed by turbulence throughout the boundary layer. It can be seen that the ensemble spread follows the RMSE, and that the filter performs satisfactorily and does not diverge during the modeling period. The statistics were plotted for experiment EnKF_TOT. Statistics for EnKF_SPEC look similar (not shown here for clarity in the figure), except for a slightly smaller spread and smaller total error.
 To obtain verification statistics for the analyses, experiments analogous to those listed in Table 1 were performed except for randomly withholding half of the measurement sites from the assimilations on a daily basis. Evaluation was performed against the complement of the withheld observations. Also, because of the length of the assimilation window (one hour) and availability of observations at 30 min past the hour, observations both before and after the assimilation time were used in the calculations. This approach is different from the approach used to obtain evaluation statistics in the following sections, where model forecasts available at full hours were averaged to match observation time.
 The comparison of statistics is shown in Figure 4 and includes bias, pattern root mean square error (pattern RMSE [Taylor, 2001]; see Appendix A for its definition and description of other statistics), and spatial correlation. We note that model biases are small and the positive effects of assimilation on this statistic are not significant. Pattern RMSEs and correlations display a consistent pattern at all hours: EnKF_TOT is slightly superior to EnKF_SPEC, and both EnKF experiments are superior to the GSI experiment. The advantage over GSI is not surprising, as model errors for the EnKF assimilations depend on the current state of the atmosphere while for the 3D-VAR they are climatological. Illustrations of horizontal flow dependency are well documented in the literature [e.g., Buehner, 2005; Whitaker et al., 2008], but the vertical distribution of the increment is less widely shown and worth a demonstration. In Figure 5, sample vertical profiles of the increments are shown for GSI and EnKF_TOT along with the vertical profiles of potential temperature in convective and stable regimes. Regardless of the magnitude at the surface, it can be seen that the semi-Gaussian distributions of the increments using the GSI do not realistically reflect vertical mixing in the boundary layer, while the increments in EnKF_TOT are influenced by stratification. For the convective case, perhaps the strength of the inversion is not sufficiently recognized in the profile of the EnKF_TOT increment, but for both cases the latter profiles are clearly more realistic than those of GSI.
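 The three statistics named above can be computed as in the following sketch, which uses the centered (pattern) RMSE of Taylor and the Pearson correlation across sites. The definitions in Appendix A may differ in detail, and the function name is hypothetical.

```python
import numpy as np

def verification_stats(forecast, observed):
    """Return (bias, pattern RMSE, correlation) for paired forecasts and observations."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    bias = f.mean() - o.mean()
    # Pattern RMSE: RMSE after removing the mean of each field
    pattern_rmse = np.sqrt(np.mean(((f - f.mean()) - (o - o.mean())) ** 2))
    corr = np.corrcoef(f, o)[0, 1]
    return bias, pattern_rmse, corr
```

 With these definitions the squared total RMSE equals the sum of the squared bias and the squared pattern RMSE, which is why a small bias leaves the pattern RMSE as the main measure of forecast error.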
 The better performance of EnKF_TOT compared to EnKF_SPEC is consistent at all assimilation times and is surprising. This is further discussed in Section 6.
 In Figure 6, the diurnal cycle of biases, pattern RMSEs, and spatial correlations are plotted for PM2.5 forecasts (from 1 to 6 h) for the four experiments listed in Table 1. We note that the biases of the original model (NoDA) during the simulation period remain small. As a consequence, the effects of different assimilation strategies on this verification statistic are not significant. Without dedicated observations we cannot assess if positive/negative biases during the night/day result from insufficient/excessive mixing in the boundary layer or are a result of deficient chemical parameterizations in the model. Correlation shows a small diurnal variation for all the experiments while pattern RMSE reaches maximum in the early morning. Consistently, the EnKF_TOT experiment is the most successful followed by EnKF_SPEC, and the GSI. The statistics demonstrate a large positive impact of assimilations on the quality of forecasts but also a fast error growth after the assimilations, similar to the other studies on chemical data assimilation cited in the introduction.
5.2. Speciated Aerosols
 For comparison with IMPROVE and STN observations, the six-hour forecasts, excluding analyses, were concatenated and interpolated to site locations and averaged over a 24-h period beginning at 00 LDT of the current day and ending at 00 LDT on the following day. Time series of domain-averaged observed concentrations of sulfate, organic carbon, and elemental carbon and their standard deviations are shown in Figure 7 for both networks. Note higher values of aerosol concentrations and higher variance for the STN network. Evaluation statistics for the forecasts are presented in Figures 8a, 8b, 9a, 9b, 10a, and 10b, respectively.
 We note in the figures that, except for three days (4 July, STN: SO4; 25 June, STN: OC; 7 July, STN: EC), the model over-predicted concentrations of the observed species. Since the total daily PM2.5 bias was close to zero (Figure 6), total concentration of the remaining model aerosol species was under-predicted. Despite relatively coarse grid resolution, the model biases calculated using STN observations are larger than those calculated for the IMPROVE network. This is a consequence of uneven geographical distribution of STN sites that are concentrated over the eastern part of the continent.
 For all the above aerosol species the model correlations averaged over the simulation period are higher when calculated for the IMPROVE network. This is likely due to the rural character of the sites matching more closely the coarse model resolution. The difference between correlations calculated for the two networks is most pronounced for elemental carbon, which is emitted from primary anthropogenic sources and poorly resolved by the current model. Assimilation has the most positive effect for sulfate forecasts, with lesser improvement for the prediction of organic carbon, and negligible changes for elemental carbon. It is apparent that the coarse model resolution prevents a successful assimilation of aerosols at the urban scale. In general, the EnKF_TOT experiment appears marginally better than the EnKF_SPEC experiment. The GSI performance is similar to that of the EnKF experiments, randomly better or worse for some species and statistics.
6. Discussion and Conclusions
 Several shortcomings of our work require elaboration, and also outline areas for improvement and future work. They are listed in the order of decreasing priority and addressed in turn: (1) aerosol parameterization; (2) model error and its parameterization; (3) range of observations; (4) meteorology-chemistry coupling; (5) resolution of forecasts; (6) length of forecasts.
 We acknowledge that the application of the GOCART aerosol parameterization may be controversial for air quality modeling over our modeling domain. For us the application of GOCART is attractive only because of its speed and the small size of its output. More sophisticated aerosol parameterizations available in WRF-Chem include gaseous chemistry and imply much more extensive computations and storage that reach beyond our current computer resources. The deficiency of GOCART is most apparent in the evaluations of aerosol species. While the bias of total PM2.5 is relatively small, biases of the individual species are of the same order of magnitude as species concentrations. From experience we know that more sophisticated aerosol parameterizations available in WRF-Chem also display large errors for aerosol species. Nevertheless, it would be enlightening to evaluate their behavior in this respect using data assimilation.
 The deficiency of the chosen aerosol module also manifests itself in the results of EnKF assimilation. Distributing the total PM2.5 increment based on a priori species contribution (experiment EnKF_TOT) gives results superior to the distribution determined by EnKF (experiment EnKF_SPEC). This leads us to conclude that regression relationships derived between individual aerosol species priors and observation priors are not realistic. Under circumstances when observations of aerosol species are not available for assimilation, proportional distribution of increments is preferred. Benefits might accrue if realistic regressions were obtained with more sophisticated aerosol parameterizations. The unrealistic distribution of increments would also occur for variational methods using background error statistics derived for a deficient model.
 Specification of model error is crucial to successful data assimilation. For EnKF, model error is determined from the ensemble spread calculated from an ensemble of perturbations, so the realism of the perturbations controls the skill of the assimilation. As noted above, aerosol assimilation using solely perturbations to meteorology and multiplicative inflation was unsuccessful because of the insufficient spread.
 In our opinion the three dominant sources of errors in our simulations are introduced by emissions, aerosol parameterization and boundary layer meteorology. At this time, we cannot establish the order of their importance. Above, we commented on the aerosol parameterization and will elaborate further on parameterization of its errors below. At this time we are unable to assess the impact of model errors due to meteorology on aerosol concentrations. In the manuscript we focused on perturbing emissions.
 Since results of chemical models are particularly sensitive to emissions, and emissions are highly uncertain, it is apparent that they constitute a pivotal component of the model error. Our method of perturbing emissions proved quite effective in generating ensemble spread. In the future we would like to account for regional variability of the correlation scales used for perturbing emission sources. Ideally, spatial error correlations should be derived from inventories based on the origin of the emissions. Also, because of changes in air pollution regulations, lower economic activity, and more widespread use of natural gas in power plants, our emissions are likely overestimated, and revising them would be beneficial.
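 To make the role of the correlation scale concrete, the following sketch generates spatially correlated multiplicative factors for a gridded emission field. It is not the perturbation scheme used in our experiments: the spectral Gaussian filter, the periodic domain, the log-normal form, and all parameter values are assumptions chosen only to keep the example short.

```python
import numpy as np

def correlated_factors(shape, corr_len, sigma, rng):
    """Log-normal multiplicative factors with Gaussian-shaped
    spatial correlation (illustrative assumption).

    corr_len: standard deviation of the smoothing kernel, in grid cells
    sigma:    standard deviation of the log of the factors
    """
    ny, nx = shape
    noise = rng.standard_normal(shape)
    # Gaussian smoothing applied in spectral space; this treats the
    # domain as periodic, which is a simplification.
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    filt = np.exp(-2.0 * (np.pi * corr_len) ** 2 * (kx ** 2 + ky ** 2))
    field = np.fft.ifft2(np.fft.fft2(noise) * filt).real
    # Normalize so that sigma alone sets the perturbation amplitude.
    field = (field - field.mean()) / field.std()
    return np.exp(sigma * field)

rng = np.random.default_rng(1)
factors = correlated_factors((40, 60), corr_len=5.0, sigma=0.3, rng=rng)

# Each ensemble member would draw its own factors and scale the emissions.
emissions = np.ones((40, 60))  # hypothetical emission field
perturbed = emissions * factors
```

 In such a sketch, regional variability of the correlation scale would amount to letting corr_len depend on location or source type, which a single spectral filter cannot represent.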
 Parameterization of model errors is a common practice in meteorology for both 4D-VAR (weak constraint) and EnKF approaches. Previously, Wu et al. introduced perturbations to model parameters and showed the benefits of such an approach for ozone assimilation. Designing stochastic parameterizations for aerosol chemistry would be more intricate than in meteorology, given the complexity of chemical parameterizations and the number and uncertainty of the parameters involved. It is unclear how successful a parameterization of errors in GOCART could be, given the absence of processes crucial for aerosol formation over North America. However, the simplicity of GOCART and its small number of tuning parameters relative to other schemes are appealing, and parameterization of its errors may lead to improved forecasts.
 Because perturbations to the emissions were the main contributor to the ensemble spread in our simulations, it is evident that they are responsible for better statistics for the EnKF experiments versus the GSI experiment. It is not apparent to us why the average spread in the EnKF_TOT experiment was larger than in the EnKF_SPEC experiment, but its magnitude undoubtedly contributed to better verification statistics for the former.
 The lack of vertical profiles of aerosol concentrations is a major obstacle to successful data assimilation. AOD observations cannot substitute for vertically resolved profiles, but the works cited in the introduction make it apparent that forecasts of tropospheric aerosols benefit from AOD assimilation. Schwartz et al. showed that simultaneous assimilation of surface observations and AOD improved forecasts of surface PM2.5 concentrations, mostly in terms of bias, compared to the assimilation of surface observations alone. In our opinion, AOD assimilation would be most beneficial during episodes when aerosols are present throughout the troposphere or when surface observations are sparse. However, for our modeling domain such conditions are not common and are limited to episodic trans-boundary advection and extensive forest fires. Based on these considerations and the fact that biases in our simulations are small, we believe that AOD assimilation would not lead to a significant improvement of our forecasts. Nevertheless, we plan to add assimilation of AOD.
 Currently, prediction of individual aerosol species depends entirely on the reliability of the aerosol scheme. Observations of such species are not available in real time and, except in rare instances, have limited value for assimilation because of the length of their averaging period.
 As an online model that predicts weather and atmospheric composition, WRF-Chem offers an opportunity to study interactions between aerosol species and meteorology. The model also allows assimilation of all observations in a way that can mutually benefit meteorology and chemistry. So far we have not fully used these advantages of the model. In the current work we decided to exclude the influence of meteorological observations on aerosol increments. Initial experiments showed that such influence is detrimental to PM2.5 forecasts because observations do not constrain the vertical aerosol profiles. We defer conclusions on the effect of such localization until assimilation of AOD is implemented. We believe that, because physics and chemistry in the atmosphere are inherently interdependent, a robust localization in a well-performing model should benefit both meteorology and chemistry.
 We plan to increase the current model resolution, which is constrained by the cost of simulating an ensemble of 50 members, by employing a hybrid of a high-resolution deterministic 3D-VAR and a low-resolution EnKF, such as proposed by Hamill and Snyder. The hybrid appears to be an economical way to obtain some benefits of higher resolution and at least to improve the meteorology in the model. Doubtless, because of the advantages of ensemble forecasts over deterministic forecasts, a high-resolution ensemble would be optimal.
 The length of our forecasts is limited to only six hours. To obtain better guidance on model performance, and to support model assessment for regulatory purposes (EPA mandates a 24-h average for PM2.5 compliance), it would be desirable to increase the length of forecasts to 24 or 48 h in a future study. Our past work offers some guidance on the length of the period over which the effects of data assimilation persist. Previously, we noted that the positive impact of assimilation using GSI persisted at 24 h [Pagowski et al., 2010]. Since the time variability of verification statistics for the GSI and EnKF experiments shows a similar pattern during the first six hours after the assimilation, we believe that deterministic forecasts issued from GSI and EnKF analyses would retain an advantage over the NoDA forecasts for a comparable period of time.
 Our evaluation shows that PM2.5 assimilation using EnKF results in an improved, but not significantly better, model performance when compared to assimilation using the GSI. In a separate series of experiments with artificially increased model bias we found that the advantage of EnKF over GSI was more pronounced. Nevertheless, assessing whether the improvement is large enough to warrant the substantially higher computational cost of the EnKF approach requires further examination. This examination could include applying cost-loss analysis for deterministic and also for probabilistic forecasts, such as devised by Richardson and applied to air quality by Pagowski and Grell.
 Given the shortcomings listed above, we believe that assimilation of aerosols using EnKF still offers a wide range of opportunities for further improvement of chemical forecasts, which, because of interactions on different scales, may also positively affect weather prediction [Hollingsworth et al., 2008; Grell and Baklanov, 2011].
where f, r, and overbar denote a test field, a reference field, and averaging, respectively. In our case, f represents the model concentrations interpolated temporally (model output available at the full hour) and spatially to observation locations and r represents observations (available at 30 min after the hour).
 To calculate the correlation between model and observations, a series of observations valid at a given forecast time is matched with a series of interpolated model values. Therefore, this statistic represents spatial rather than temporal correlation between model and observations. Model biases and RMSEs were also calculated for these matched series.
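 The statistics for one matched series can be sketched as follows. The sketch assumes the conventional definitions (bias as the mean of f minus r, RMSE as the root of the mean squared difference, and the Pearson correlation); the function name, the handling of missing values, and the sample numbers are illustrative assumptions.

```python
import numpy as np

def verification_stats(f, r):
    """Bias, RMSE, and correlation between interpolated model values f
    and observations r at stations valid at one forecast time."""
    f = np.asarray(f, dtype=float)
    r = np.asarray(r, dtype=float)
    # Assumed handling of gaps: keep only pairs with both values present.
    valid = np.isfinite(f) & np.isfinite(r)
    f, r = f[valid], r[valid]
    diff = f - r
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    # Correlation across stations, hence spatial rather than temporal.
    fa = f - f.mean()
    ra = r - r.mean()
    corr = np.sum(fa * ra) / np.sqrt(np.sum(fa ** 2) * np.sum(ra ** 2))
    return bias, rmse, corr

# Hypothetical matched values at four stations.
model = [10.0, 12.0, 8.0, 15.0]
obs = [9.0, 13.0, 7.0, 14.0]
bias, rmse, corr = verification_stats(model, obs)
```

 Repeating the calculation for every forecast time gives the time series of verification statistics for each experiment.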
 This research is funded by Early Start Funding from the NOAA Office of Atmospheric Research, the NOAA Air Quality Program, and the NOAA/NWS Office of Science and Technology. The authors thank Jeff Whitaker from NOAA/ESRL/PSD for making the EnSRF computer code available and for guidance in the initial stages of this work. Stu McKeen and Ravan Ahmadov from NOAA/ESRL/CSD prepared the original emissions for the WRF-Chem simulations. PM2.5 observations were obtained through the U.S. EPA AIRNow Data Exchange program, and speciated aerosol data are available thanks to the Western Regional Air Partnership. Comments from three anonymous reviewers helped us improve the paper.