The performance of the Eta-Community Multiscale Air Quality (CMAQ) modeling system in forecasting PM2.5 and chemical species is assessed over the eastern United States with the observations obtained by aircraft (NOAA P-3 and NASA DC-8) and four surface monitoring networks (AIRNOW, IMPROVE, CASTNet and STN) during the 2004 International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) study. The results of the statistical analysis at the AIRNOW sites show that the model was able to reproduce the day-to-day and spatial variations of observed PM2.5 and captured a majority (73%) of PM2.5 observations within a factor of 2, with normalized mean bias of −21%. The consistent underestimations in regional PM2.5 forecast at other networks (IMPROVE and STN) were mainly due to the underestimation of total carbonaceous aerosols at both urban and rural sites. The significant underestimation of the “other” category, which predominantly is composed of primary emitted trace elements in the current model configuration, is also one of the reasons leading to the underestimation of PM2.5 at rural sites. The systematic overestimations of SO42− both at the surface sites and aloft, in part, suggest too much SO2 cloud oxidation due to the overestimation of SO2 and H2O2 in the model. The underestimation of NH4+ at the rural sites and aloft may be attributed to the exclusion of some sources of NH3 in the emission inventory. The systematic underestimations of NO3− may result from the general overestimations of SO42−. Note that there are compensating errors among the underestimation of PM2.5 species (such as total carbonaceous aerosols) and overestimation of PM2.5 species (such as SO42−), leading to generally better performance of PM2.5 mass. 
The systematic underestimation of biogenic isoprene (by ∼30%) and terpene (by a factor of 4) suggests that their biogenic emissions may have been biased low, whereas the consistent overestimation of toluene by the model under the different conditions suggests that its anthropogenic emissions might be too high. The contributions of various physical and chemical processes governing the distribution of PM2.5 during this period are investigated through detailed analysis of model process budgets using the integrated process rate (IPR) analysis along back trajectories at five selected locations in Pennsylvania and Georgia. The results show that the dominant processes for PM2.5 formation and removal vary from site to site, indicating significant spatial variability.
 Fine particulate matter (PM2.5, particles with aerodynamic diameters less than 2.5 μm) pollution is a major concern in the United States since it is linked to adverse human and ecosystem health impact [U.S. Environmental Protection Agency (U.S. EPA), 2004]. Aerosol particles can contribute to visibility degradation and influence the Earth's climate both directly by scattering and absorption of incoming solar radiation and terrestrial outgoing radiation, and indirectly by affecting cloud radiative properties through their role as cloud condensation nuclei (CCN) [Charlson et al., 1992; Yu, 2000; Yu et al., 2001]. PM2.5 can be emitted directly to the atmosphere (primary) or be formed in the atmosphere through atmospheric oxidation of gaseous precursors such as sulfur oxides (SOx), nitrogen oxides (NOx) and volatile organic compounds (VOCs), and subsequent gas-to-particle conversion processes (secondary). High levels of PM2.5 concentrations are typically observed during cool, moist, and stagnant atmospheric conditions at the locations with substantial primary PM emissions and gaseous precursor concentrations [U.S. EPA, 2004]. The 24-h PM2.5 National Ambient Air Quality Standards (NAAQS) promulgated by the U.S. Environmental Protection Agency (EPA) in 1997 is 65 μg m−3 [U.S. EPA, 2004]. To reflect more recent health effect studies and provide increased protection of public health and welfare, EPA revised the level of the 24-h PM2.5 NAAQS to 35 μg m−3, effective on 18 December 2006 [Federal Register, 2006]. The standard is considered to be attained if the 3-year average of the 98th percentile of 24-h PM2.5 concentrations at each population-oriented monitor within an area does not exceed 35 μg m−3, with fractional parts of 0.5 or greater rounding up. 
It is desirable for local air quality agencies to accurately forecast PM2.5 concentration levels to alert the public of the onset, severity and duration of unhealthy air and to encourage people to help reduce emission-producing activities.
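The attainment test described above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the 98th percentile is taken here with a simple nearest-rank rule, which is an illustrative choice and not necessarily the regulatory data-handling procedure, and the function names (`percentile_98`, `design_value`, `attains`) are hypothetical.

```python
def percentile_98(daily_values):
    """Nearest-rank 98th percentile of one year of 24-h PM2.5 values.

    Assumption: nearest-rank is used for illustration only; the regulatory
    procedure for selecting the 98th percentile value may differ.
    """
    ordered = sorted(daily_values)
    n = len(ordered)
    rank = (98 * n + 99) // 100  # integer form of ceil(0.98 * n), 1-based
    return ordered[rank - 1]


def round_half_up(value):
    """Round to the nearest integer, with fractional parts >= 0.5 rounding up."""
    return int(value + 0.5) if value >= 0 else -int(-value + 0.5)


def design_value(three_years):
    """3-year average of the annual 98th percentiles, rounded half up."""
    annual = [percentile_98(year) for year in three_years]
    return round_half_up(sum(annual) / len(annual))


def attains(three_years, standard=35):
    """True if the rounded 3-year average does not exceed the standard (ug m-3)."""
    return design_value(three_years) <= standard


# Hypothetical monitor: 100 daily values per year, in ug m-3.
year_a = [10.0] * 97 + [34.0, 60.0, 70.0]
year_b = [10.0] * 97 + [35.0, 60.0, 70.0]
year_c = [10.0] * 97 + [37.5, 60.0, 70.0]
print(design_value([year_a, year_b, year_c]), attains([year_a, year_b, year_c]))
```

In this hypothetical case the three annual 98th percentiles average to 35.5 μg m−3, which rounds up to 36 μg m−3, so the monitor would not meet a 35 μg m−3 standard.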
 Real-time forecasting systems for O3 with Eulerian models have been publicly available for several years [U.S. EPA, 1999; McHenry et al., 2004; McKeen et al., 2005; Otte et al., 2005; Yu et al., 2007a], whereas real-time forecasts of PM2.5 are for the most part in the developmental stage. McKeen et al.  evaluated the real-time forecasts of PM2.5 from seven air quality forecast models and their ensembles statistically over the northeastern United States and southern Canada during the 2004 International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) study and concluded that the ensemble PM2.5 forecast, created by combining six separate forecasts with equal weighting, can give the best possible forecast in terms of the statistical measures considered. Clearly, there is a need for accurate characterization of the various processes controlling PM2.5 distributions and trends and critical evaluation of PM2.5 forecast capabilities and skill before forecasts of PM2.5 can become routinely available to the general public.
 The regional air quality in New England was a focus of the 2004 ICARTT study. One broad scope of the 2004 ICARTT study was to maximize the resulting advances in our understanding of the transport and chemistry of aerosol, O3, and oxidants during intercontinental transport, as well as their relationship to radiation balance and climate through cooperation with multiple national and international partners [Fehsenfeld et al., 2006; Singh et al., 2006]. Model evaluation studies for aerosols and their precursors are severely limited by the lack of routine data both aloft and at the surface. The performance requirements for current generation air quality models have increased significantly as a result of exceedingly complex issues that the models are now being used to simulate [Mathur et al., 2005]. The 2004 ICARTT experiment resulted in a comprehensive set of measurements of chemical constituents and meteorological variables, both from surface and aircraft based platforms, which can be used to examine in detail the performance of air quality models from a multipollutant perspective, both in terms of their surface concentrations as well as vertical distributions. Such detailed information on model performance, in turn, helps in identifying deficiencies in existing models, providing guidance for further model enhancements, and building robust operational models.
 The Eta-CMAQ air quality forecasting system, created by linking NOAA/NWS's operational weather forecast model Eta with the Community Multiscale Air Quality (CMAQ) model, was applied over a domain encompassing the eastern United States during summer 2004. A detailed evaluation of the Eta-CMAQ forecast model performance for O3, its related precursors and meteorological parameters during the 2004 ICARTT study was described by Yu et al. [2007a]. In this study we extend that analysis to examine the performance for another critical pollutant, PM2.5, made with the NOAA-EPA air quality forecast capability. The purpose of this paper is twofold: First, this study evaluates the Eta-CMAQ forecast model performance for the vertical profiles of the chemical and physical properties (SO42−, NO3−, and NH4+ concentrations) of PM2.5 with the observational data from aircraft (NOAA P-3 and NASA DC-8) flights during the 2004 ICARTT field experiments. The spatial variability and temporal behavior of the modeled surface PM2.5 mass over the eastern United States during this period are examined through comparison with observations from the U.S. EPA Air Quality System (AQS) network. The model spatial performance for PM2.5 chemical constituents (SO42−, NO3−, NH4+, organic carbon (OC) and elemental carbon (EC)) is evaluated with the observational data from the IMPROVE, CASTNet, and STN networks. Recommendations for further research and analysis in the pursuit of improved PM2.5 forecast are also provided on the basis of the current evaluation. Second, the contributions of various physical and chemical processes governing the distribution of PM2.5 during this period are investigated through detailed analyses of model process budgets using the integrated process rate analysis (IPR) algorithm along the paths of back trajectories from selected locations.
2. Description of the Modeling System and Observational Databases
 The developmental Eta-CMAQ air quality forecasting system for PM2.5, created by linking the Eta model [Rogers et al., 1996] and the CMAQ Modeling System [Byun and Schere, 2006], was applied over a domain encompassing the eastern United States (see Figure 1) during summer 2004. The linkage of the two modeling systems is described in detail by Otte et al. . The detailed description of model configurations is given by Yu et al. [2007a]. The model domain has a horizontal grid spacing of 12 km with 22 vertical layers. The boundary conditions for various species were based on a static vertical profile that was uniformly applied along all lateral boundaries. The species profiles are representative of continental “clean” conditions. The primary Eta-CMAQ model forecast for the next day's surface layer PM2.5 is based on the current day's 1200 UTC Eta simulation cycle. The emissions used in the Eta-CMAQ forecasting system are described by Otte et al. . The area source emissions are based on the 2001 National Emission Inventory (NEI). The point source emissions are based on the 2001 NEI with SO2 and NOx projected to 2004 on a regional basis using the Department of Energy's 2004 Annual Energy Outlook issued in January of 2004. The mobile source emissions were generated by EPA's MOBILE6 model using 1999 Vehicle Miles Traveled (VMT) data and a fleet year of 2004. Daily temperatures from the Eta model were used to drive the inputs into the MOBILE6 model using a nonlinear least squares relationship described by Pouliot  and Otte et al. . The biogenic emissions are calculated as described by Otte et al.  using the Biogenic Emissions Inventory System (BEIS) version 3.12. The Carbon Bond chemical mechanism (version 4.2) has been used to represent photochemical reaction pathways.
 The developmental Eta-CMAQ model uses the CMAQ aerosol module described by Binkowski and Roselle  and updates described by Bhave et al.  and Yu et al. [2007b]. Accurately predicting aerosol size distributions with current air quality models remains a challenge. As reviewed by McKeen et al. , the size distribution of aerosols in tropospheric air quality models can be represented by the sectional approach [Zhang et al., 2004], the moment approach [Yu et al., 2003], and the modal approach [Binkowski and Roselle, 2003]. In the aerosol module of the CMAQ, the aerosol distribution is modeled as a superposition of three lognormal modes that correspond nominally to the ultrafine (diameter (Dp) < 0.1 μm), fine (0.1 < Dp < 2.5 μm), and coarse (Dp > 2.5 μm) particle sizes. Each lognormal mode is characterized by total number concentration, geometric mean diameter and geometric standard deviation. The model results for PM2.5 concentrations are obtained by summing aerosol species concentrations over the first two modes. Conceptually, the CMAQ PM trimodal distribution data can be converted into size-resolved PM data. So far, the CMAQ model is able to simulate the integral properties of fine particles, such as PM2.5 mass and visible aerosol optical depth, reasonably well, but it performs less well for PM size distributions. Generally speaking, the modal approach offers the advantage of being computationally efficient, whereas the sectional representation provides more accuracy at the expense of computational cost. To simulate the aerosol size distribution accurately, a bin structure with a large number of bins (such as 50) is necessary to represent the PM size distribution. This will become feasible as more powerful computers become available. In this study, we present the model performance only for PM2.5 mass, not for size distributions.
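The conversion of a trimodal lognormal description into size-resolved data can be sketched as follows. This is a minimal illustration based on the standard properties of lognormal distributions (the Hatch-Choate relations), not the CMAQ implementation; the function names and the mode parameters in the example are hypothetical values chosen only for demonstration.

```python
import math


def mode_volume(number, dg, sigma_g):
    """Total particle volume of one lognormal mode.

    number: total number concentration; dg: geometric mean diameter;
    sigma_g: geometric standard deviation. Units follow number * dg**3.
    """
    return number * (math.pi / 6.0) * dg ** 3 * math.exp(4.5 * math.log(sigma_g) ** 2)


def volume_fraction_below(dg, sigma_g, d_cut):
    """Fraction of a mode's volume carried by particles smaller than d_cut."""
    ln_s = math.log(sigma_g)
    dgv = dg * math.exp(3.0 * ln_s ** 2)  # volume median diameter
    return 0.5 * (1.0 + math.erf(math.log(d_cut / dgv) / (math.sqrt(2.0) * ln_s)))


def binned_volume(modes, edges):
    """Volume in each size bin [edges[i], edges[i+1]), summed over all modes."""
    bins = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        total = 0.0
        for number, dg, sigma_g in modes:
            frac = (volume_fraction_below(dg, sigma_g, upper)
                    - volume_fraction_below(dg, sigma_g, lower))
            total += mode_volume(number, dg, sigma_g) * frac
        bins.append(total)
    return bins


# Hypothetical modes: (number in cm-3, geometric mean diameter in um, sigma_g).
modes = [(1.0e4, 0.03, 1.7), (1.0e3, 0.2, 2.0), (1.0, 1.0, 2.2)]
edges = [1.0e-4, 0.1, 2.5, 1.0e3]  # bin edges in um
print(binned_volume(modes, edges))
```

The same function accepts any number of bin edges, so a finer sectional structure (for example 50 bins) only requires a longer `edges` list.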
 Over the eastern United States, a total of four routine monitoring networks for PM2.5 measurements were employed in this evaluation (Interagency Monitoring of Protected Visual Environments (IMPROVE), Speciated Trends Network (STN), Clean Air Status Trends Network (CASTNet) and Air Quality System (AQS)), each with its own and often disparate sampling protocol and standard operating procedures. In the IMPROVE network, two 24-h samples are collected on quartz filters each week, on Wednesday and Saturday, beginning at midnight local time [Sisler and Malm, 2000]. The observed PM2.5, SO42−, NO3−, EC and OC data are available at 71 rural sites over the eastern United States. The STN network (http://www.epa.gov/air/data/aqsdb.html) follows the protocol of the IMPROVE network (i.e., every third day collection) with the exception that most of the sites are in urban areas. The observed PM2.5, SO42−, NO3−, and NH4+ data are available at 178 STN sites over the eastern United States. The CASTNet (http://www.epa.gov/castnet/) collected the concentration data at predominately rural sites using filter packs that are exposed for 1-week intervals (i.e., Tuesday to Tuesday). The aerosol species at the 34 CASTNet sites used in this evaluation include: SO42−, NO3−, and NH4+. In addition the hourly near real-time PM2.5 data at 309 sites in the eastern United States are measured by tapered element oscillating microbalance (TEOM) instruments at the U.S. EPA's Air Quality System (AQS) network sites (Figure 1). During 5 July to 15 August 2004, measurements of vertical profiles of PM2.5 composition, gas species (CO, NO, NO2, H2O2, HNO3, SO2, isoprene, toluene, terpene, etc.) were taken by instrumented aircraft (NOAA P-3 and NASA DC-8) deployed as part of the 2004 ICARTT field experiment. The PM2.5 composition (SO42−, NH4+, and NO3−) were determined by using the PILS (particle-into-liquid sampler) measurement system [Weber et al., 2001] in the study. 
The detailed instrumentation and protocols for the aircraft measurements are described at http://www.esrl.noaa.gov/csd/ICARTT/fieldoperations/ and by Fehsenfeld et al.  and Singh et al. . In this study, the model performance is examined for the period of 14 July to 18 August 2004 on the basis of the 1200 UTC model run for the target forecast period.
3. Results and Discussions
3.1. Spatial and Temporal Evaluation for PM2.5 Over the Eastern U.S. Domain at the AQS Sites
 For model performance evaluation, regression statistics along with two measures of bias, the mean bias (MB) and the normalized MB (NMB), and two measures of error, the root mean square error (RMSE) and normalized mean error (NME) [Yu et al., 2006] were calculated. Table 1 summarizes the evaluation results for the hourly and daily (24-h) average PM2.5 concentrations. The evaluation results at the urban and rural sites (Figure 1e) are also summarized. Following the protocol of the IMPROVE network, the daily (24-h) PM2.5 concentrations at the AQS sites were calculated from midnight to midnight local time of the next day on the basis of hourly PM2.5 observations. The domain wide mean values of MB and RMSE for daily PM2.5 at the AQS sites during the ICARTT period are −3.2 and 8.8 μg m−3, respectively, and those for NMB and NME are −21.0% and 41.2%, respectively. As can be seen, the model performance for hourly PM2.5 is not as good as that for daily PM2.5. It is of interest to note that the model performed much better at the urban sites than at the rural sites, with greater underpredictions at the rural sites. As shown in section 3.2, for analysis of PM2.5 chemical composition at the IMPROVE rural sites and STN urban sites, the model underestimated NH4+ and NO3− more at the rural sites than at the urban sites, leading to the larger underestimation of PM2.5 at the rural sites compared to those in urban areas. On the other hand, the scatterplot of Figure 1a indicates that the model captured a majority (73.3%) of observed daily PM2.5 within a factor of 2, but generally underestimated the observations in the high PM2.5 concentration range. Since TEOM measurements for PM2.5 should be considered as lower limits because of volatilization of soluble organic carbon species in the drying stages of the measurement [Grover et al., 2005], the underprediction by the model is likely more severe than this evaluation suggests. 
In order to investigate the AQF system's performance over time, the values of mean, NMB, NME and correlation coefficients were calculated (domain wide averages) and plotted as a daily time series for the 24-h concentrations (Figure 1b). The NMB values range from −44.8% (18 July) to 6.0% (17 August), with the best performance from 13 to 16 August and significant underestimations of PM2.5 at the beginning of the period (16 to 26 July). A closer inspection of the results indicates that the majority of the domain was significantly influenced by pollutants from large forest fires in Alaska from 16 to 26 July, as shown by the aerosol index images from the TOMS satellite observations (http://toms.gsfc.nasa.gov) [Yu et al., 2007a]. The significant underprediction of PM2.5 during this period is mainly attributed to inadequate representation of the transport of pollution associated with biomass burning into the model domain. Spatially, the model often slightly underestimated observed PM2.5, especially over the northern part of the domain, where concentrations were low (Figures 1c and 1d).
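The bias and error measures used in this evaluation, together with the fraction of forecasts falling within a factor of 2 of the observations, can be sketched as follows. The definitions are the commonly used forms of these statistics (normalized measures are sums of differences divided by the sum of observations); the function name `evaluation_stats` and the sample values are hypothetical, and paired, strictly positive observations are assumed.

```python
import math


def evaluation_stats(model, obs):
    """Paired model-observation statistics.

    Returns MB and RMSE in the units of the data, and NMB, NME, and the
    within-factor-of-2 fraction in percent. Assumes equal-length sequences
    and positive observations.
    """
    n = len(obs)
    diffs = [m - o for m, o in zip(model, obs)]
    sum_obs = sum(obs)
    within_factor_2 = sum(1 for m, o in zip(model, obs) if 0.5 * o <= m <= 2.0 * o)
    return {
        "MB": sum(diffs) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "NMB": 100.0 * sum(diffs) / sum_obs,
        "NME": 100.0 * sum(abs(d) for d in diffs) / sum_obs,
        "FAC2": 100.0 * within_factor_2 / n,
    }


# Hypothetical daily PM2.5 pairs in ug m-3.
modeled = [8.0, 12.0, 5.0, 30.0]
observed = [10.0, 10.0, 12.0, 20.0]
print(evaluation_stats(modeled, observed))
```

Because NMB and NME are normalized by the sum of observations rather than averaged pair by pair, a few very low observations do not dominate the result.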
Table 1. Operational Evaluation for PM2.5 Concentrations on the Basis of the AQS Data Over the Eastern United States During the ICARTT Period
Domain Mean (μg m−3) | RMSE (μg m−3) | MB (μg m−3)
3.2. Evaluation for PM2.5 and Its Chemical Constituents at the CASTNet, IMPROVE, and STN Sites Over the Eastern United States
 The scatterplots of Figure 2a indicate that at the IMPROVE, CASTNet and STN sites, the model captured a majority of observed daily SO42−, NH4+, and PM2.5 concentrations within a factor of 2. The examination of the domain-wide bias and errors (Table 2) reveals that the model overestimated the observed mean SO42− by 15%, 6% and 11% at the STN, CASTNet and IMPROVE sites, respectively. The model also overestimated the observed mean NH4+ by 21% at the STN sites but underestimated it by 6% and 12% at the CASTNet and IMPROVE sites, respectively. The model overestimated the observed SO2 by 77% at the CASTNet sites. This may be one of the reasons for the general overestimation of observed SO42−. The comparison of the modeled and observed total sulfur (SO42− + SO2) at the CASTNet sites in Figure 2b reveals that the model overestimated the observed total sulfur systematically and that the modeled mean total sulfur is 33% higher than the observed mean. This suggests that SO2 emissions in the emission inventory are too high. The poor model performance for NO3− (see scatterplot in Figure 2a and correlation <0.43 in Table 2) is related in part to volatility issues of measurements associated with NO3−, and their exacerbation because of uncertainties associated with SO42− and total NH4+ simulations in the model [Yu et al., 2005], although the domain-wide bias in Table 2 shows that the model underestimated the observed mean NO3− by only 5–14% at the three networks. Examination of the scatterplot in Figure 2a shows that the model underestimated most of the observed OC and TC at the IMPROVE sites by more than a factor of 2. The model seriously underestimated the observed mean OC and TC concentrations at the IMPROVE sites, by 63% and 57%, respectively. The model slightly underestimated EC concentrations at the IMPROVE sites. The model grossly underestimated the observed TC concentrations at the STN sites, by 60%. 
Note that since the STN network used the thermo-optical transmittance (TOT) method to define the split between OC and EC while the IMPROVE and the model emission inventory use the thermo-optical reflectance (TOR) method, only the determination of total carbon (TC = OC+EC) is comparable between these two analysis protocols [Yu et al., 2004]. Therefore, Table 2 only lists the results for TC comparisons from the STN sites. As pointed out by Yu et al. [2007b], factors contributing to this underestimation of the modeled OC include (1) missing sources of primary OC in the emission inventory used for the summer and (2) underestimation of secondary OC (SOA) formation such as sources from the oxidation of isoprene and sesquiterpenes [Edney et al., 2005] and an aqueous-phase mechanism for SOA formation from the oxidation of VOCs [Carlton et al., 2006] that are not yet included in the CMAQ model. Morris et al.  found that including the SOA formation from sesquiterpene and isoprene improved the CMAQ model performance for OC. The model underestimated the observed mean PM2.5 by 15% and 20% at the STN and IMPROVE sites, respectively. The underestimation of NH4+ at the IMPROVE rural sites leads to the greater underestimation of PM2.5 than at the STN urban sites (see Table 2).
Table 2. Summary Statistics for PM2.5 and Its Components for Each Network Over the Eastern United States During the ICARTT Period (a)
(a) The units of Mean, MB, and RMSE are μg m−3.
Figure 3 shows comparisons of stacked bar plots for observed and modeled concentrations for each chemical constituent of PM2.5 at the IMPROVE and STN sites. Note that “OTHER” species in Figure 3 refers to unspecified anthropogenic mass which comes from the emission inventory in PM2.5, i.e., [PM2.5] = [SO42−] + [NH4+] + [NO3−] + [TCM] + [OTHER]. The significant underestimation of PM2.5 at the IMPROVE sites (most of which are located in rural areas) mainly results from the underestimations of the total carbonaceous mass (TCM = EC + 1.4 × OC) and OTHER components. Since the organic compounds comprising ambient particulate organic mass are largely unknown, an average multiplier is frequently used to convert measurements of OC (typically reported as μg C/m3) to organic carbonaceous aerosol mass (OCM). The value of 1.4 has been widely used to estimate particulate organic mass from measured OC [e.g., Turpin and Lim, 2001] and is also used in our analysis. On the other hand, the underestimation of PM2.5 at the STN sites (most of which are located in urban areas) mainly results from the underestimation of the TCM component. These results suggest a need to improve the accuracy of TCM at both rural and urban sites and of OTHER in rural areas. On the basis of analysis of the diurnal cycles from the AIRNow PM2.5 monitors and comparison with model median diurnal cycles over the northeastern United States during the 2004 ICARTT study, McKeen et al.  found some inconsistencies between certain processes within the models and the observations. The median observed diurnal cycles at urban and suburban monitor locations show very little diurnal variation, whereas some models, such as the Eta-CMAQ, exhibit significant diurnal variability and do not capture the decrease of observed PM2.5 from 0100 to 0600 LT, indicating a reduced role for aerosol loss during the late night and early morning hours [McKeen et al., 2007]. 
The large scatter in Figure 2 for PM2.5 may also arise from inadequate representation of the diurnal evolution of observed PM2.5 by the Eta-CMAQ [McKeen et al., 2007]. As shown by Yu et al. [2007a], the Eta model tended to overestimate the observed PBL heights derived from radar wind profilers at all times at Concord, NH, during the 2004 ICARTT study. Further investigation is needed to understand the reasons for the high diurnal variability in the Eta-CMAQ model.
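The mass closure relation used for Figure 3, [PM2.5] = [SO42−] + [NH4+] + [NO3−] + [TCM] + [OTHER] with TCM = EC + 1.4 × OC, can be written as a short sketch. The function names and the sample concentrations are hypothetical and serve only to show how the OTHER component follows as a residual once the speciated masses are known.

```python
OC_TO_OCM = 1.4  # multiplier converting OC (ug C m-3) to organic mass, as used in the text


def tcm(ec, oc, multiplier=OC_TO_OCM):
    """TCM = EC + multiplier * OC, in ug m-3."""
    return ec + multiplier * oc


def other_component(pm25, so4, nh4, no3, ec, oc):
    """OTHER as the residual of total PM2.5 after the speciated components."""
    return pm25 - (so4 + nh4 + no3 + tcm(ec, oc))


# Hypothetical 24-h sample in ug m-3.
sample = {"pm25": 12.0, "so4": 4.0, "nh4": 1.5, "no3": 0.5, "ec": 0.5, "oc": 2.0}
print(tcm(sample["ec"], sample["oc"]), other_component(**sample))
```

With these illustrative values, TCM is 3.3 μg m−3 and the residual OTHER component is 2.7 μg m−3.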
3.3. Evaluation of Vertical Profiles for PM2.5 Chemical Components (SO42−, NH4+, and NO3−) and Its Related Gas Species
3.3.1. Vertical Profiles of SO42−, NH4+, and NO3−
 Comparisons of modeled vertical profiles against observed vertical profiles obtained by aircraft provide an assessment of the model's ability to simulate the vertical structure of air pollutants. Following Mathur et al. , modeled results were extracted by “flying” the aircraft through the 3-D modeling domain by mapping the locations of the aircraft to the model grid indices (i.e., column, row, and layer). In addition, hourly resolved model outputs were linearly interpolated to the corresponding observational time. The flight tracks of aircraft show that measurements on board the P-3 airplane covered a regional area over the northeast around New York and Boston (Figure 4a) with profiles up to ∼5 km, whereas the DC-8 aircraft covered a broader regional area over the eastern United States (see Figure 4b) with profiles up to 12 km. All DC-8 measurements were conducted in the daytime (∼0700 to ∼1900 EST (EST = local time −1 h for the summer of eastern daylight time)), and P-3 also conducted most of its measurements during the daytime, except on 31 July and 3, 7, 9, and 11 August when the P-3 measurements were conducted at night (∼2000 to ∼0600 EST). Table 3 summarizes the specific missions and weather conditions encountered during the flight. To compare the modeled and observed vertical profiles, the observed and modeled data pairs were grouped according to the model layer for each day and each flight; that is, both the observations and predictions were averaged along a particular aircraft transect at an approximate altitude (layer height). Thus, the vertical profiles from both model and observations obtained in this manner can be regarded to represent average conditions encountered over the study domain. We refer to these average regional vertical variations as composite vertical distributions in the subsequent discussion. 
Figures 5–8 present modeled and observed daily composite vertical distributions for PM2.5 chemical components (SO42−, NH4+, and NO3−) and related gaseous species (HNO3, SO2, H2O2, VOC (isoprene, toluene, terpene)) during the ICARTT period. Mean composite vertical distributions according to the model layer for the model and observation for the whole period are summarized in Table 4.
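The procedure for building composite vertical distributions (mapping each aircraft position to a grid column, row, and layer, interpolating hourly model output linearly to the observation time, and averaging the pairs by model layer) can be sketched as follows. This is a simplified illustration, not the processing code used in the study: it assumes that aircraft positions have already been projected to kilometers from the southwest corner of a regular 12-km grid, that layer tops are given in meters, and that model output is stored as nested lists indexed `[hour][layer][row][col]`. All names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def grid_index(x_km, y_km, z_m, layer_tops_m, dx_km=12.0):
    """Map a projected aircraft position to (column, row, layer) indices."""
    col = int(x_km // dx_km)
    row = int(y_km // dx_km)
    # First layer whose top is at or above the aircraft; clamp to the top layer.
    layer = min(bisect_left(layer_tops_m, z_m), len(layer_tops_m) - 1)
    return col, row, layer


def sample_model(hourly, t_hours, col, row, layer):
    """Linearly interpolate hourly model output to the observation time."""
    h0 = min(int(t_hours), len(hourly) - 2)
    weight = t_hours - h0
    v0 = hourly[h0][layer][row][col]
    v1 = hourly[h0 + 1][layer][row][col]
    return (1.0 - weight) * v0 + weight * v1


def composite_profile(track, hourly, layer_tops_m, dx_km=12.0):
    """Average observed and modeled values by model layer along a flight track.

    track: iterable of (t_hours, x_km, y_km, z_m, observed_value).
    Returns {layer: (mean_observed, mean_modeled)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for t_hours, x_km, y_km, z_m, observed in track:
        col, row, layer = grid_index(x_km, y_km, z_m, layer_tops_m, dx_km)
        modeled = sample_model(hourly, t_hours, col, row, layer)
        entry = sums[layer]
        entry[0] += observed
        entry[1] += modeled
        entry[2] += 1
    return {k: (v[0] / v[2], v[1] / v[2]) for k, v in sorted(sums.items())}


# Tiny hypothetical case: 2 output hours, 2 layers, 1 row, 2 columns.
hourly = [
    [[[1.0, 2.0]], [[3.0, 4.0]]],
    [[[3.0, 4.0]], [[5.0, 6.0]]],
]
layer_tops_m = [500.0, 2000.0]
track = [
    (0.50, 6.0, 3.0, 200.0, 2.5),
    (0.25, 18.0, 3.0, 1000.0, 4.0),
    (0.75, 18.0, 3.0, 300.0, 3.0),
]
print(composite_profile(track, hourly, layer_tops_m))
```

Averaging by layer in this way collapses all horizontal variation along the track, which is why the resulting profiles represent average conditions over the sampled region rather than conditions at any single location.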
Table 3. Flight Observation Summary for WP-3 and DC-8 Aircraft
The flight occurred above the cloud at ∼8 km during 1730–2000 UTC in the NW of Boston city.
18 Jul 2004: Characterization of North American pollution outflow, possible characterization of Alaskan fires, and a flyby over the NOAA ship Ron Brown in the NE of Boston city.
20 Jul 2004: Characterization of smoke from Alaskan fires transported over the United States, boundary layer pollution over the southeast and midwest.
22 Jul 2004: Sampling polluted boundary layer outflow along the eastern seaboard both to the north and south of Pease.
25 Jul 2004: Convective outflow from southeast United States, and Ohio River Valley emissions in northerly flow.
28 Jul 2004: Sample the chemical evolution of the U.S. continental outflow out over the Atlantic Ocean.
31 Jul 2004: Aged air sampling/recirculation, low-level outflow, and possible Asian influences.
2 Aug 2004: Sample low-level North American outflow and aged air pollution aloft.
6 Aug 2004: Flew over the Ohio River Valley.
7 Aug 2004: Sample North American outflow, a stratospheric intrusion, and perform P-3 intercomparison.
11 Aug 2004: NA outflow and WCB lifting, frontal crossing and low-level pollution.
13 Aug 2004: Outflow from major industrial cities (Houston, New Orleans) with clear skies for most of the time.
14 Aug 2004: Flight above the cloud over Missouri-Kansas during 1900 and 2000 UTC.
Table 4. Comparison of Layer Means From Model and Observations on the Basis of All P-3 and DC-8 Aircraft Measurements Over the Eastern United States During the ICARTT Period
Aircraft P-3 (μg m−3) | Aircraft DC-8 (μg m−3)
 As shown in Figures 5 and 6, the model generally overestimated SO42− on most days, except on 22 and 31 July relative to the P-3 observations and on 18 July and 11, 13 and 14 August relative to the DC-8 observations (see Table 4). These overestimates of SO42− are believed to arise in part from possible overestimation of H2O2 predicted by the CB-IV mechanism, which results in overestimation of aqueous SO42− production, as shown in section 3.3.2. The model consistently overestimated observed H2O2 by more than a factor of 2 at altitudes below 1000 m most of the time compared to the P-3 observations (see section 3.3.2), leading to the overestimation of SO42− due to excessive SO2 cloud oxidation. In contrast, Figures 5 and 6 reveal that the model generally underestimated NH4+ on most days, except on 15, 25, and 28 July and 6, 14, and 15 August relative to P-3 observations. A closer inspection of the model results at these locations indicates that the modeled NH3 is near zero, revealing that the model has underestimated total ammonium (TNH4 = NH4+ + NH3). The systematic underestimation of NH4+ is likely due to missing sources of NH3 in the model emission inventory. On 6 and 9 August, the P-3 sampled the fresh plumes of Ohio Valley power plants at ∼1000 m, as shown in Table 3. The results of Figure 5 show that the model reproduced the SO42− and NH4+ concentrations at ∼1000 m on these 2 days very well. On 21, 27 and 28 July, the P-3 encountered the fresh city plume (Boston or New York) shortly after takeoff, as summarized in Table 3, with low SO42− and NH4+ at low altitudes (<200 m). The model estimations are close to the observations at low altitudes, as shown in Figure 5. The model performance for aerosol NO3− is poor, as it is at the surface (see section 3.2). One of the reasons is that NO3− concentrations from the aircraft observations are very low and the model results at low NO3− concentrations are very sensitive to any errors in SO42− and TNH4 in the simulation [Yu et al., 2005]. 
The large systematic underestimations of NO3− relative to P-3 and DC-8 observation as summarized in Table 4, in part, result from the general overestimations of SO42− because too much of H2SO4 in aerosol leaves less NH3 available to react with HNO3, leading to lower aerosol NO3− if HNO3 is not overestimated.
3.3.2. Vertical Profiles for H2O2, SO2, and HNO3
Figure 7 shows the comparison of the modeled and observed daily composite vertical distributions for HNO3, SO2 and H2O2. As summarized in Table 3, the P-3 aircraft sampled the plume of Ohio Valley power plants at ∼1000 m during 6 August from 1530 to 2030 UTC and 10 August from 0030 to 0330 UTC. Figure 7 indicates that the model reproduced the SO2 and HNO3 concentrations well relative to P-3 observations in the power plant plumes at this height on these 2 days. However, the model overestimated SO2 in the New York City (NYC) and Boston plumes at low altitudes (<700 m) on these 2 days. Generally, the modeled SO2 concentrations are higher than the observations at low altitudes (< ∼300 m) but close to the observations at higher altitudes. Specifically, when the P-3 sampled the urban plumes of NYC and Boston, except on 21 July and 7 August, the modeled SO2 concentrations are generally higher than the observations at low altitudes (<200 m) most of the time, indicating that the model may have overestimated some of the emission sources of SO2 in the New York and Boston areas. Possible errors in the meteorological fields, such as horizontal transport and vertical mixing, can also cause the overestimations of SO2. The general overestimation of SO2 is believed to be one of the reasons leading to the general overestimation of SO42− in the model.
 On 27 July, the surface weather map showed convective activity associated with a surface cold front that stretched from the center of a surface low over the West Virginia-Pennsylvania state line to the Southwest along the Appalachian Mountains with thunderstorms. There was pollution accumulation ahead of the cold front. The pollution upwind and downwind of the Washington and Baltimore metropolitan area between 600 and 2000 m altitudes was sampled by the P-3 from 1730 to 1830 UTC, 27 July, with very high SO2 (>5 ppb), CO (>180 ppb), and HNO3 (>3 ppb) but relatively low O3 ( ∼60 ppb) concentrations. The model underestimated HNO3 and SO2 below 2 km for this pollution accumulation event ahead of the cold front as shown in Figure 7. The model shows good agreement with HNO3 most of the time except on 9, 21, 22, 27, and 28 July and 11 August relative to P-3 observations and except on 18, 20 and 22 July relative to DC-8 observations as shown in Figure 7. The model overestimated the HNO3 concentrations at the low altitudes in the air masses containing fresh plumes such as on 9 and 21–22 July.
 A noticeable discrepancy in Figure 7 is the consistent overestimation of H2O2 by more than a factor of 2 at altitudes below 1000 m most of the time relative to DC-8 measurements. These overestimations are attributed to the CB-IV mechanism used to represent photochemical reaction pathways in the model. Previous chemical mechanism intercomparisons [e.g., Dodge, 1989; Zaveri and Peters, 1999] suggest significantly higher H2O2 yields predicted by the CB-IV mechanism. Recent simulations with alternative chemical mechanisms during the ICARTT period also show higher H2O2 levels with CB-IV relative to both the SAPRC and CB05 mechanisms [Mathur et al., 2007]. The overestimations of H2O2 contribute to the overestimation of in-cloud SO42− formation, which may be further magnified in situations where SO2 is also overestimated.
3.3.3. Vertical Profiles for Terpene, Toluene, and Isoprene
 Anthropogenic sources from the Washington, DC/New York City/Boston urban corridor and biogenic emissions in New Hampshire and Maine significantly impact the sampled atmospheric aerosols along the coast of New England. Biogenic monoterpene and isoprene emission rates are high over the coniferous forests of northeastern North America, especially in the summer months [Guenther et al., 2000], providing gas precursors for the formation of biogenic secondary organic aerosols (SOA). Anthropogenic toluene stems predominantly from automotive emissions. In the CMAQ aerosol module, biogenic and anthropogenic SOA formation occurs exclusively by absorptive partitioning of condensable oxidation products of aromatic (mainly toluene) and monoterpene compounds into a preexisting organic aerosol phase [Yu et al., 2007b]. Biogenic isoprene can also produce SOA, especially under high-NOx conditions [Kroll et al., 2005], although this is not currently modeled in CMAQ. Therefore, it is useful to evaluate the model performance for these SOA precursors.
 The model's ability to simulate the composite vertical distributions of isoprene, terpene and toluene, as measured by the P-3, is illustrated in Figure 8. In general, the model captured the vertical variation patterns of the observed isoprene quite well on most days, except on 20 and 27 July and 3 August, although it underestimated the observed isoprene concentration by ∼30% on average. The model captured the very low isoprene concentrations on 21 and 28 July and 11 and 14 August very well, but missed the high concentrations on 27 July and 3 August at the middle altitudes. A noticeable discrepancy is the consistent underestimation of terpene by a factor of 4 (the mean modeled and observed terpene concentrations for all data are 3.5 and 16.2 ppt, respectively) from low to high altitudes on most days except 15 August, as shown in Figure 8. The flight track of the P-3 in Figure 4 shows that on 15 August, observations were taken during the transit flight from Pease Tradeport, NH, to Tampa, FL, via Atlanta, GA, covering a broader area of the eastern United States instead of only the northeastern United States, which may explain the different model performance on that day. This implies that the model may underestimate terpene only over the northeastern United States but not over the southeastern United States. The model overestimated toluene by a factor of 3 (the mean modeled and observed toluene concentrations for all data are 198.2 and 65.6 ppt, respectively) on most days except at high altitudes (>2000 m), as shown in Figure 8. Table 3 indicates that there was a widespread biomass burning plume signature (i.e., the observed acetonitrile, a biomass burning plume tracer, was strongly enhanced) over the studied areas during the analysis period. 
A notable portion of the model domain was significantly influenced by pollutants from large Alaska forest fires from 16 to 26 July, as shown by the aerosol index images from the TOMS satellite observations (http://toms.gsfc.nasa.gov). The observations on 15, 20, 21, 22, 27, and 31 July and 3, 9, 11, 14, and 15 August were significantly affected only by the urban (New York, Boston or Washington and Baltimore) plumes, whereas the observations on 25 July and 6 August were mainly affected by power plant plumes. The relatively high observed toluene concentrations on 15, 20, 21, 22, and 31 July and 3, 9, and 11 August are due to urban plume effects, whereas the power plant plumes on 25 July and 6 August do not exhibit high toluene concentrations. This is reasonable because toluene in the atmosphere comes predominantly from automotive emissions. The consistent overestimates of toluene by the model under these different conditions suggest that the overestimations may be attributed to an emission inventory with systematically high toluene emissions. In addition, the emission inventory for biogenic emissions of isoprene and monoterpenes is highly uncertain, possibly explaining the general underestimations of isoprene and monoterpenes. The underestimation of terpenes will cause underestimation of biogenic SOA, leading to the underestimation of OC shown in section 3.2, although the overestimation of toluene leads to overestimation of anthropogenic SOA. Thus, improvement of the VOC emission inventory is recommended in order to provide better model results for these species.
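The "factor of 4" and "factor of 3" statements above follow directly from the quoted mean concentrations. A minimal sketch of that arithmetic is given below; the function name bias_factor is hypothetical and is not part of the Eta-CMAQ evaluation software.

```python
def bias_factor(mean_model, mean_obs):
    """Return (factor, direction) for a model mean relative to an observed mean.

    factor is always >= 1; direction is "over" or "under".
    This helper is illustrative only.
    """
    if mean_model <= 0 or mean_obs <= 0:
        raise ValueError("means must be positive")
    ratio = mean_model / mean_obs
    if ratio >= 1.0:
        return ratio, "over"
    return 1.0 / ratio, "under"


# Mean concentrations (ppt) quoted in the text for all P-3 data.
terpene = bias_factor(3.5, 16.2)    # modeled, observed
toluene = bias_factor(198.2, 65.6)  # modeled, observed

print(f"terpene: {terpene[1]}estimated by a factor of {terpene[0]:.1f}")
print(f"toluene: {toluene[1]}estimated by a factor of {toluene[0]:.1f}")
```

With the quoted means, the terpene ratio is about 4.6 and the toluene ratio about 3.0, consistent with the rounded factors stated in the text.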
3.4. Process Analysis (PA) of PM2.5 Formation at Selected Sites
 To study the contributions of various physical and chemical processes to the formation of PM2.5, we employ a simple approach: we use the Hybrid Single Particle Lagrangian Integrated Trajectory model (HYSPLIT version 4.7, http://www.arl.noaa.gov/ready/hysplit4.html [Draxler, 2003]) to determine a back trajectory linking a downwind receptor to upwind source areas, and then apply process analysis (PA) to the CMAQ grid cells along the air mass transport path. This provides quantitative information about the relative importance of each process in changing the PM2.5 concentrations along the trajectory and enables us to determine the relationship between sources and receptors with respect to PM2.5 formation within the moving air mass. In the CMAQ output, the results of PA provide the hourly time series of vertical advection/diffusion (ZADV/VDIF), horizontal advection/diffusion (HADV/HDIF), dry deposition (DDEP), cloud process (CLD), aerosol process (AERO) and emission (EMIS) at each grid cell. In CMAQ, the aerosol process (AERO) refers to the effects of the aerosol module, which includes nucleation, condensation, coagulation, and equilibrium thermodynamics. Note that wet deposition is included in the cloud process, and the effects on aerosol formation of gaseous aerosol precursors generated by the gas-phase chemistry, such as H2SO4 and HNO3, are included in the aerosol process.
 For the back trajectory analysis, the same meteorology applied in the Eta-CMAQ simulation was used to generate input data sets for the HYSPLIT back trajectory calculation. By using an ensemble approach to estimate uncertainty in the HYSPLIT back trajectory calculation, Draxler found that the trajectory ensemble approach accounted for about 41% to 47% of the variance in the measurement data, although a cumulative distribution of the ensemble probabilities compared favorably with the measurement data. This uncertainty needs to be kept in mind when HYSPLIT back trajectories are calculated and used. Figure 9 shows the 24-h backward trajectories ending at 1100 UTC 17 August 2004 at the South Allegheny High School (SAHS) and John sites in Pennsylvania, and ending at 1100 UTC 19 August 2004 at the South Dekalb (SD), McDonough (MD) and Newnan (NN) sites in Georgia. These sites and times were chosen because their PM2.5 concentrations were high (>40 μg m−3) relative to other sites, as shown in Figures 9a and 9b. Another reason for these choices is to illustrate two different scenarios, in the Northeast and the Southeast, of how high PM2.5 concentrations were formed. The mean primary emissions of PM2.5 and SO2 over the domain during the period of 6 to 18 August 2004 are shown in Figure 10. Additionally, since pollutants are well mixed vertically through the PBL during the daytime, we examine vertically integrated process tendencies; we choose 2 km as being representative of the mean daytime PBL height for this analysis. Yu et al. [2007a] indicated that average daytime PBL heights during the ICARTT period in the Eta model at Concord, NH, can be ∼2 km. In a study of the summertime atmospheric boundary layer over the eastern United States, Rao et al. found that the PBL heights can vary from <200 m (nighttime) to ∼2.5 km (afternoon). 
Additionally, since efficient long-range transport occurs above the nocturnal PBL, we use a height of 2 km to integrate the process tendencies along the back trajectories. Also, since dispersion is irreversible, the 2 km layer should be maintained for the full duration of the trajectory. Figures 11 and 12 show the accumulated contributions of each process to PM2.5 formation along the 24-h back trajectories at the SAHS and John sites, Pennsylvania, and the South Dekalb and Newnan sites, Georgia. Table 5 summarizes the total accumulated contributions of each process to PM2.5 formation along the 24-h back trajectories (see Figure 9) at the five sites. As can be seen, the dominant processes for PM2.5 formation and removal vary from site to site.
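The accumulation described above (column means over the lowest ∼2 km, summed hour by hour along the back trajectory) can be sketched as follows. This is only an illustration under stated assumptions: the data layout, the function names, and the numbers in the example are hypothetical and do not reproduce the CMAQ process analysis output format or the HYSPLIT trajectory format. The choice of 14 layers follows the Table 5 caption.

```python
from itertools import accumulate

N_LAYERS = 14  # layers 1-14 (~2 km), following the Table 5 caption


def column_mean(profile, n_layers=N_LAYERS):
    """Mean of the lowest n_layers values of one vertical tendency profile."""
    column = profile[:n_layers]
    return sum(column) / len(column)


def accumulate_along_trajectory(ipr, path):
    """Accumulate hourly process tendencies along a trajectory.

    ipr:  {process: {(hour, row, col): [tendency per layer]}}
          (hypothetical layout for the hourly process-rate output)
    path: list of (hour, row, col) grid cells crossed by the trajectory,
          ordered in time from the upwind start to the receptor.
    Returns {process: [running total after each hour]}.
    """
    result = {}
    for process, field in ipr.items():
        hourly = [column_mean(field[cell]) for cell in path]
        result[process] = list(accumulate(hourly))
    return result


# Made-up example: three trajectory hours, two processes, uniform profiles.
path = [(0, 10, 12), (1, 10, 13), (2, 11, 13)]
ipr = {
    "AERO": {cell: [1.0] * 20 for cell in path},
    "DDEP": {cell: [-0.5] * 20 for cell in path},
}
totals = accumulate_along_trajectory(ipr, path)
print(totals["AERO"][-1], totals["DDEP"][-1])
```

The last element of each running total corresponds to the kind of 24-h accumulated contribution summarized per process and site in Table 5.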
Table 5. Summary of the Total Accumulated Contributions of Each Process to PM2.5 Formation for 24-h Back Trajectories (See Figures 8, 9, 10 and 11) on the Basis of Column Means From Layer 1 to Layer 14 (PBL Height) at the Five Sites (Two in Pennsylvania (PA) and Three in Georgia (GA))
Processes for PM2.5 (μg m−3) | NN, GA (19 Aug) | SD, GA (19 Aug) | MD, GA (19 Aug) | SAHS, PA (17 Aug) | JOHN, PA (17 Aug)
Aerosol process (AERO) | 4.58 | 2.90 | 3.10 | 3.12 | 2.49
Cloud process (CLD) | 1.06 × 10−4 | 6.67 × 10−5 | 8.32 × 10−5 | 6.48 | 7.28
Dry deposition (DDEP) | −1.44 × 10−4 | −1.11 × 10−4 | −1.30 × 10−4 | −2.31 × 10−4 | −2.20 × 10−4
Horizontal advection (HADV) | | | | |
Horizontal diffusion (HDIF) | | | | |
Vertical diffusion (VDIF) | | | | |
Vertical advection (ZADV) | | | | |
 There are noticeable differences in PM2.5 formation between the Pennsylvania and Georgia sites. For example, the horizontal advection (i.e., transport) process contributes to the loss of PM2.5 at most sites except the SAHS site, where it increases PM2.5 significantly (see Figure 11 and Table 5). In most cases, the vertical diffusion and vertical advection processes make small contributions to the loss of PM2.5, and the effects of horizontal diffusion on PM2.5 formation are negligible, as shown in Table 5. As expected, the aerosol process is one of the major sources of PM2.5 at all sites. Table 5 shows that the total changes in PM2.5 concentration along the 24-h trajectory due to the aerosol process are 4.58, 2.90, 3.10, 3.12 and 2.49 μg m−3 for the NN, SD, MD, SAHS and John sites, respectively, contributing 53, 70, 68, 20 and 24% of the total sources of PM2.5, respectively. Emissions are another significant source contributing to the PM2.5 burden. The total changes in PM2.5 concentration along the 24-h trajectory due to emissions are 4.04, 1.25, 1.47, 1.21 and 0.71 μg m−3 for the NN, SD, MD, SAHS and John sites, respectively, contributing 47, 30, 32, 8 and 7% of the total sources of PM2.5, respectively. The aerosol process and primary emissions are the major sources of PM2.5 at the Georgia locations. On the other hand, it is of interest to note that the integrated process budgets along the trajectories at the Pennsylvania sites in Table 5 indicate large contributions from cloud processing to PM2.5. In contrast, the trajectories reaching the sites in Georgia are characterized by negligible contributions of the cloud process to PM2.5 formation in this case. The total changes in PM2.5 concentration along the 24-h trajectory due to the cloud process are 6.48 and 7.28 μg m−3 for the SAHS and John sites of Pennsylvania, respectively, contributing 42 and 69% of the total sources of PM2.5, respectively. 
At the SAHS site of Pennsylvania, SO42−, NH4+ and OCM comprise 80, 6 and 4% of PM2.5, respectively, on average, whereas at the NN site of Georgia, they comprise 58, 13 and 13% of PM2.5, respectively (also see Figures 11c and 11d). This suggests a relatively large contribution of SO42− from the cloud process to PM2.5 in the air masses reaching the Pennsylvania sites.
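The percentage contributions quoted above can be reproduced, to rounding, by dividing each positive (source) term by the sum of all source terms along the trajectory. The sketch below assumes that every process not listed for a site is either a sink or negligible; that assumption, the function name source_shares, and the dictionary layout are illustrative and are not taken from the paper. Only values stated in the text or in Table 5 are used.

```python
def source_shares(budget):
    """Percent share of each source (positive term) in the total of all sources.

    budget: {process name: accumulated change in PM2.5 along the trajectory}.
    Sinks (negative terms) are excluded from the total.
    """
    sources = {name: value for name, value in budget.items() if value > 0}
    total = sum(sources.values())
    return {name: 100.0 * value / total for name, value in sources.items()}


# Newnan (NN), GA: aerosol process and emissions, with dry deposition as a sink.
nn = source_shares({"AERO": 4.58, "EMIS": 4.04, "DDEP": -1.44e-4})

# John site, PA: cloud processing added as a third source.
john = source_shares({"AERO": 2.49, "EMIS": 0.71, "CLD": 7.28, "DDEP": -2.20e-4})

for site, shares in (("NN", nn), ("John", john)):
    print(site, {name: round(pct) for name, pct in shares.items()})
```

For the SAHS site the same calculation would also need the horizontal advection term, which the text identifies as a significant source there but does not quantify.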
4. Summary and Recommendations
 A detailed evaluation of the real-time forecast of PM2.5, its chemical components and its related precursors by the Eta-CMAQ model has been carried out over the eastern United States by comparing the model results with the observations from a variety of surface monitoring networks and aircraft obtained during the 2004 ICARTT study. The results at the AIRNOW surface sites show that the model was able to reproduce day-to-day variations of observed PM2.5 and captured the majority (>70%) of observed PM2.5 within a factor of 2, with NMB = −21%. The significant underestimation of PM2.5 by a factor of 2 between 16 and 26 July is mainly attributed to inadequate representation of the transport of pollution associated with large forest fires in Alaska. Similar to the results at the AIRNOW sites, the model also generally underestimated PM2.5 at the STN (NMB = −15%) and IMPROVE (NMB = −20%) sites. A closer inspection indicates that the significant underestimations of PM2.5 at the IMPROVE rural sites mainly result from the underestimation of TCM by more than a factor of 2 and of the OTHER component (unspecified anthropogenic mass, which represents trace elements in the current CMAQ configuration), whereas the underestimation at the STN urban sites mainly results from the TCM underestimation. The underestimations of PM2.5 are typically more pronounced at rural (IMPROVE) than urban (STN) sites because the NH4+ concentrations were underestimated at the rural sites (by 12%) but overestimated at the urban sites (by 21%). In contrast, the model overestimated the observed SO42− by 15%, 6% and 11% at the STN, CASTNet and IMPROVE sites, respectively. A comparison with the aircraft observations reveals that the model overestimated SO42− aloft on most days relative to P-3 and DC-8 observations over the northeastern United States. 
The consistent overestimations of SO42− both at the surface sites and aloft possibly reflect, in part, too much in-cloud SO2 oxidation because of overestimated H2O2 concentrations in the model. The underestimation of NH4+ at the rural sites and aloft may be attributed to the exclusion of some sources of NH3 from the real-time model emission inventory. The large systematic underestimations of NO3− relative to P-3 and DC-8 observations may result from the general overestimations of SO42−, since too much H2SO4 in the aerosol leaves less NH3 to react with HNO3, leading to less aerosol NO3− if HNO3 is not overestimated. The systematic underestimation of biogenic isoprene (by ∼30%) and terpene (by a factor of 4) suggests that the emission inventory may have been systematically low for both isoprene and terpene emissions. On the other hand, the consistent overestimations of toluene by the model under the different conditions suggest possible overestimation of anthropogenic toluene emissions in the real-time forecast emission inventory.
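The two summary statistics used throughout this evaluation, the normalized mean bias (NMB) and the percentage of model values within a factor of 2 of the observations, can be sketched as below. The sketch assumes the usual definition NMB = Σ(model − obs)/Σ(obs) × 100%; the function names and the paired values are purely illustrative and are not ICARTT data.

```python
def normalized_mean_bias(model, obs):
    """NMB in percent, assuming NMB = sum(model - obs) / sum(obs) * 100."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


def percent_within_factor_of_two(model, obs):
    """Percent of model/observation pairs with 0.5 <= model/obs <= 2."""
    pairs = [(m, o) for m, o in zip(model, obs) if o > 0]
    hits = sum(1 for m, o in pairs if 0.5 <= m / o <= 2.0)
    return 100.0 * hits / len(pairs)


# Illustrative paired PM2.5 values (ug m-3); not measurements from the study.
obs = [10.0, 20.0, 30.0, 40.0]
model = [8.0, 15.0, 33.0, 15.0]

nmb = normalized_mean_bias(model, obs)
fac2 = percent_within_factor_of_two(model, obs)
print(f"NMB = {nmb:.0f}%, within a factor of 2: {fac2:.0f}%")
```

A negative NMB indicates net underestimation, as reported for PM2.5 at the AIRNOW, STN and IMPROVE sites. Because NMB sums over all pairs, large errors of opposite sign can cancel, which is why the factor-of-2 percentage is reported alongside it.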
 To investigate the details of PM2.5 formation and evolution over the eastern United States, a model process budget analysis using the integrated process rate (IPR) analysis along back trajectories was carried out at five selected locations in Pennsylvania and Georgia. The results show that the dominant processes for PM2.5 formation and removal vary from site to site. Over the Pennsylvania sites, in addition to the aerosol process and primary emissions, the cloud process made a significant contribution to PM2.5 formation. In contrast, over the Georgia sites, the cloud process made a negligible contribution to PM2.5 formation.
 Given the uncertainties in the photochemical mechanisms, emission inventories, and prognostic model forecasts of meteorological fields, and the difficulties of real-time forecasting, the above performance of the Eta-CMAQ forecast model for PM2.5 during the ICARTT period is to be expected. Nevertheless, a systematic evaluation of the model with the detailed observations from ICARTT revealed several systematic trends in model errors and biases that need to be addressed to improve the model's forecast skill. The following recommendations for improvements are suggested on the basis of these results: (1) The significant underestimation of PM2.5 by a factor of 2 during events influenced by biomass burning from outside the model domain suggests the need for event-based representation of these emissions in real-time modeling applications. (2) The significant underpredictions of organic aerosol by more than a factor of 2 at the IMPROVE rural and STN urban sites highlight the need for improving the representation of secondary organic aerosol formation processes. (3) The significant underpredictions of the “OTHER” component at the IMPROVE rural sites highlight the need for improving the representation of primary PM2.5 emissions in rural regions. (4) The systematic overprediction of SO42− both at the surface and aloft with the CB-IV chemical mechanism, which produces too much H2O2, highlights the need for more detailed analysis of gas-phase oxidant chemistry and its impact on in-cloud SO2 oxidation. (5) The systematic underestimation of biogenic isoprene (by ∼30%) and terpene (by a factor of 4) and the consistent overestimations of toluene by the model under the different conditions suggest the need for accurate representation of both biogenic and anthropogenic emissions in real-time applications.
 The authors would like to thank R. Pinder, J. Godowitch and the anonymous reviewers for the constructive and very helpful comments that led to a substantial strengthening of the content of the paper. We thank Paula Davidson for programmatic support, Jeff McQueen, Pius Lee, and Marina Tsidulko for collaboration and critical assistance in performing the forecast simulations. We are grateful to the 2004 ICARTT investigators for making their measurement data available. The research presented here was performed under the Memorandum of Understanding between the U.S. Environmental Protection Agency (EPA) and the U.S. Department of Commerce's National Oceanic and Atmospheric Administration (NOAA) and under agreement DW13921548. This work constitutes a contribution to the NOAA Air Quality Program. Although it has been reviewed by EPA and NOAA and approved for publication, it does not necessarily reflect their policies or views.