This study develops probabilistic estimates of ozone (O3) sensitivities to precursor emissions by incorporating uncertainties in photochemical modeling and evaluating model performance based on ground-level observations of O3 and oxides of nitrogen (NOx). Uncertainties in model formulations and input parameters are jointly considered to identify factors that strongly influence O3 concentrations and sensitivities in the Dallas-Fort Worth region in Texas. Weightings based on a Bayesian inference technique and screenings based on model performance and statistical tests of significance are used to generate probabilistic representations of O3 response to emissions and model input parameters. Adjusted (observation-constrained) results favor simulations using the sixth version of the carbon bond chemical mechanism (CB6) and scaled-up emissions of NOx, dampening the overall sensitivity of O3 to NOx and increasing the sensitivity of O3 to volatile organic compounds in the study region. This approach of using observations to adjust and constrain model simulations can provide probabilistic representations of pollutant responsiveness to emission controls that complement the results obtained from deterministic air-quality modeling.
Secondary air pollutants like ozone (O3) are formed as a result of complex nonlinear chemistry between various primary pollutants emitted directly into the atmosphere due to anthropogenic and natural activities. Understanding the responses of ambient pollutant concentrations to emission changes (sensitivity) is therefore crucial for the development of effective pollution abatement strategies. Photochemical models are used to estimate the sensitivity of secondary air pollutants to their precursor emissions, and thus serve as useful tools for determining the amount of emission reduction needed to attain ambient air-quality standards and informing the selection of control strategies.
Models for informing air-quality management are typically run deterministically with a single best-available setting for model formulation and inputs. However, there has been a growing interest in probabilistic representations of model results that account for model uncertainty [Dennis et al., 2010; Hogrefe and Rao, 2001]. Uncertainties in pollutant-emission sensitivity may arise from choices of numerical representations of atmospheric processes such as chemical mechanism, vertical mixing scheme, horizontal transport, and emission model (structural uncertainty), and/or from the values of input parameters such as emission rates, reaction rate constants, boundary conditions, and deposition velocities (parametric uncertainty) [Deguillaume et al., 2008; Fine et al., 2003; Pinder et al., 2009].
Recent work by Digar and Cohan  and Tian et al.  introduced efficient Monte Carlo techniques for characterizing parametric uncertainties in O3 and particulate matter (PM) responses to emission controls. Pinder et al.  jointly considered parametric and structural uncertainties to develop probabilistic estimates of O3 concentrations. However, none of these studies evaluated the relative likelihoods of the various Monte Carlo cases.
Previous work by Bergin and Milford  had shown that a Bayesian inference approach can weight the relative likelihood of each Monte Carlo model formulation based on its performance in simulating observed concentrations, and thus yield probability distributions for predicting the actual values of pollutant-emission sensitivities as well as model inputs. That study used a simplified two-dimensional trajectory model, and only a handful of studies have applied Bayesian Monte Carlo approaches to characterize O3 responsiveness in more computationally intensive three-dimensional regional models [Beekmann and Derognat, 2003; Deguillaume et al., 2008].
The aim of this study is to develop probabilistic representations of O3 responsiveness to emission changes constrained by actual measurements of pollutant concentrations. The Monte Carlo Reduced Form Model (RFM) approach of Digar and Cohan  is used to generate a large ensemble of model predictions of O3 concentrations and responsiveness to emission controls in the Dallas-Fort Worth (DFW) region of Texas, which is currently a nonattainment area for the 1997 8 hour O3 National Ambient Air Quality Standard (NAAQS). The simulated concentrations of O3 and its precursor nitrogen oxides (NOx ≡ NO + NO2) are compared against observations to yield adjusted (observation-constrained) probabilistic representations of photochemical model inputs and output predictions. Using both Bayesian and non-Bayesian statistical techniques allows us to evaluate the consistency of our results across various observational metrics and methods of comparison. Sections 2 and 3 describe the modeling and measurements used for this work, and section 4 describes the statistical methodology and metrics considered here. Important findings are presented in Results and Discussion (section 5), followed by the Conclusion.
2 Photochemical Model Description
2.1 Base Case Modeling
The Comprehensive Air Quality Model with Extensions (CAMx) v5.32 [ENVIRON, 2010] is used here to study a 2006 summer episode in DFW spanning 31 May to 2 July, which includes 17 days with meteorological conditions favoring O3 formation. This period was identified by the Texas Commission on Environmental Quality (TCEQ) for its prevalence of observed daily maximum 8 hour O3 concentrations exceeding the 1997 8 hour O3 NAAQS of 84 ppb [Texas Commission on Environmental Quality (TCEQ), 2011a, 2011b]. Results for the first 5 days were discarded as model spin-up. Sensitivity of O3 to its precursor emissions is computed using the high-order decoupled direct method [Dunker, 1984; Hakami et al., 2003] within CAMx. The modeling domain covers 69 × 67 grid cells in the eastern United States with a horizontal resolution of 36 km and encompasses nested finer domains with 12 km (East Texas) and 4 km (DFW subdomain) resolution (Figure 1). Back trajectory analysis shows southerly and easterly flow into the DFW region on most episode days, with air entering the DFW region coming overwhelmingly from within the 36 km model domain. The vertical configuration of the model domain consists of 28 layers of varying thickness, sufficient to examine the effect of vertical mixing within the typical planetary boundary layer height (for details, see Table 2-2 of TCEQ, 2011a, Appendix C).
The CAMx model inputs (emissions, meteorological conditions, initial and boundary concentrations, chemical mechanism, and deposition scheme) were taken from TCEQ's Base Case Modeling for the 8 hour O3 State Implementation Plan (SIP) in DFW [TCEQ, 2011a]. The mobile (on-road and nonroad) emission inputs were obtained from the U.S. Environmental Protection Agency (EPA) MOBILE6.2 emission factor model, EPA's National Mobile Inventory Model (NMIM), and the Texas NONROAD (TexN) mobile source models, and were processed into model-ready format by the Emissions Processing System version 3 (EPS3) [ENVIRON, 2007]. Base case biogenic emissions were derived from the Global Biosphere Emissions and Interactions System (GloBEIS3.13.1) model [Yarwood et al., 1999]. The Fifth-Generation Meteorological Model (MM5 version 3.7.4) [Dudhia, 1993] was used to generate the meteorological inputs to CAMx, including wind speed, wind direction, temperature, and humidity, among other fields (C. Emery et al., MM5 Q21 meteorological modeling of Texas for June 2006. Final report prepared for Texas Commission on Environmental Quality, unpublished report, 2009, http://www.tceq.texas.gov/assets/public/implementation/air/am/contracts/reports/mm/5820783986FY0802_June2006MM5_Final.pdf). Details of the meteorological and emission modeling and their performance evaluations can be found in Appendixes A and B of TCEQ [2011a]. Specifically, TCEQ found that the benchmarks for error in wind direction (≤30°), wind speed (≤2 m/s), and temperature (≤2 °C) were achieved 92%, 99%, and 92% of the time, respectively, for the Dallas region. The base case model uses the Carbon Bond version 05 (CB05) chemical mechanism [Yarwood et al., 2005], a dry deposition scheme based on the works of Wesely  and Slinn and Slinn , and the global Model for Ozone and Related Chemical Tracers (MOZART) to generate episode-specific boundary condition concentrations for the coarse-grid (36 km) modeling domain [ENVIRON, 2008].
2.2 Model Uncertainty Scenarios
This study jointly considers uncertainties in both model formulation (structural uncertainties) and in model input parameters (parametric uncertainties).
2.2.1 Structural Scenarios
Past studies have shown that alternate chemistry and emission models strongly influence photochemical sensitivities [Pinder et al., 2009; Fine et al., 2003; Russell and Dennis, 2000; Bergin et al., 1999]. Uncertainty in the representation of dry deposition can also affect O3 modeling [Mallet and Sportisse, 2006]. Earlier TCEQ-funded work showed that boundary conditions from alternate global models could significantly influence background O3 concentrations, but did not explore the impact on O3 sensitivities [ENVIRON, 2009]. Therefore, structural scenarios were constructed by choosing either the base case setting described in section 2.1 or the alternate setting described below for each of four features: chemical mechanism, biogenic emissions model, dry deposition scheme, and boundary conditions model. Although additional structural uncertainties, most notably in the meteorological model, also influence O3 sensitivities, alternate meteorological simulations and other inputs were not available within the scope of this study.
2.2.1.1 Alternate Chemical Mechanism
In this setting, the 2005 version of the Carbon Bond chemical mechanism (CB05) in the base model is replaced by the sixth version (CB6) [Yarwood et al., 2010]. CB6 explicitly adds several long-lived, abundant organic compounds, namely propane, acetone, benzene, and ethyne (acetylene), to improve the representation of regional-scale oxidant formation from these slowly oxidized compounds. Compared to CB05, CB6 increases the number of model species from 51 to 76 and the number of reactions from 156 to 218. We adjust the rate constant for the OH + NO2 reaction in CB6 to be consistent with the most recent findings of Mollner et al.  (CB6 also includes several updates for organic and inorganic aerosol chemistry). The differences between CB05 and CB6 are discussed in detail by D. S. Cohan et al. (Factors influencing ozone-precursor response in Texas attainment modeling, final report, Texas Air Quality Research Program, Project 10-008, 2011, unpublished report, http://aqrp.ceer.utexas.edu/projectinfo%5C10-008%5C10-008%20Final%20Report.pdf).
2.2.1.2 Alternate Biogenic Emissions
The GloBEIS-derived biogenic inventory is replaced by alternate biogenic emissions (BIO) from the Model of Emissions of Gases and Aerosols from Nature (MEGAN) [Guenther et al., 2006], which uses updated land cover data based on satellite and ground observations. Guenther et al.  report that global annual isoprene emissions estimated by MEGAN range from approximately 500 to 750 Tg. Strong differences (about a factor of 2) between biogenic emission estimates from BEIS and MEGAN have been documented by Carlton and Baker . For the 12 km CAMx modeling domain overall, MEGAN estimated 47% lower biogenic NOx emissions (ENOx) and 24% higher biogenic nonmethane volatile organic compound (NMVOC) emissions than GloBEIS (for detailed differences, see Cohan et al., unpublished report, 2011). Within the DFW region, both models estimated only about 10 tons per day (tpd) of biogenic NOx, but MEGAN estimated twice as much biogenic NMVOC as GloBEIS.
2.2.1.3 Alternate Dry Deposition Scheme
The base case land-use inputs and dry deposition scheme, based on the work of Wesely  and Slinn and Slinn , are replaced in the alternate deposition case (DEP) by an updated approach [Zhang et al., 2001, 2003]. The Zhang scheme incorporates vegetation density effects via leaf area index, includes an updated representation of nonstomatal deposition pathways, has more land-use categories, and has been tested extensively through its use in daily air-quality forecasting.
2.2.1.4 Alternate Boundary Conditions
Here, the MOZART boundary conditions used in the base case are replaced by alternate boundary conditions (BC) from the GEOS-Chem global model [Bey et al., 2001], whose O3 concentrations are 0.7–8 ppb higher than those of MOZART at all model layers, with the differences increasing aloft (Figure 2) [Cohan et al., unpublished report, 2011].
2.2.2 Parametric Uncertainties
For parametric uncertainties, we target the model input parameters identified by Digar and Cohan  as most likely to influence model predictions of O3 concentrations and their sensitivities to NOx and volatile organic compound (VOC) emission controls. These parameters include specific emission rates, reaction rate constants, and boundary conditions (Table 1). The uncertainty estimates were derived from an extensive literature review of experimental and model-based studies. In particular, for the uncertain rate constant of the NO2 + OH reaction, we used the uncertainty factor given in the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory (JPL) evaluation [Sander et al., 2006], which compiles findings from multiple laboratory experiments. For the remaining input parameters, we applied uncertainties based on Deguillaume et al.  and Hanna et al. , who estimated uncertainty from Monte Carlo simulations of air-quality models in which the ranges of input perturbations were specified a priori by expert elicitation. Deguillaume et al. [2007, 2008] also used Bayesian analysis to constrain the a priori input distributions by comparing the Monte Carlo outputs with actual measurements.
Table 1. Screening Test for the Selection of Uncertain Input Parameters
Parameters selected based on the impact analysis by Digar and Cohan  and Digar et al. .
All distributions are assumed lognormal.
Impact factor: The fractional change in concentrations and first-order sensitivity of ozone to emissions, due to a 1σ change in an input parameter as detailed in section 4.1.2. Uncertainty factors are based on ±2σ (i.e., 95%) confidence interval. Underlined terms were chosen for the parametric Monte Carlo sampling.
Sections 4.1.1 and 4.1.2 describe additional screening that was conducted to further narrow the structural cases and input parameters that most influence O3 concentrations and sensitivities for the episode considered here.
3 Ground-Level Measurements of Ozone and Its Precursors
Measurement data for ground-level O3 and NOx concentrations were obtained from the U.S. Environmental Protection Agency's (EPA's) Air Quality System (AQS) database. These monitors record hourly concentrations of ambient air pollutants through a nationwide monitoring network (http://www.epa.gov/ttn/airs/airsaqs/index.htm); the monitors in Texas are operated by TCEQ. The raw data were postprocessed to obtain daily maximum 8 hour O3 and 24 hour average NOx concentrations at all monitors within the nine-county DFW nonattainment area (as designated under the 1997 8 hour O3 NAAQS): Denton, Collin, Parker, Tarrant, Dallas, Rockwall, Kaufman, Johnson, and Ellis Counties. We considered 11 monitors that measure both O3 and NOx concentrations (Figure 1).
Measurements of O3 use well-established techniques, so instrumental error is relatively small [EPA, 2006]. However, because routine monitors lack a direct measurement technique for nitrogen dioxide (NO2), NOx measurements tend to have significant instrumental bias and interferences [Demerjian, 2000; Dunlea et al., 2007]. NOx concentrations are therefore bias-corrected for interference from other nitrogen species. We apply a bias-correction factor (β), adapted from Lamsal et al.  and computed from modeled species concentrations, to correct the reported NOx observations:
where PAN is peroxyacetyl nitrate and PNA is peroxynitric acid. The factor β was computed for each monitor from the episode average of the daily 24 hour mean concentrations of the modeled NOy (= NOx + HNO3 + PAN + HONO + N2O5) species.
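The bias correction can be illustrated with a short sketch. It assumes a Lamsal-style form in which β is the ratio of modeled NOx to modeled NOx plus a weighted sum of modeled interfering species (for example PAN, PNA, and HNO3). The exact expression and weights used in this study are not reproduced here, so the species weights and the function names `episode_average`, `bias_factor`, and `correct_observations` are hypothetical placeholders.

```python
from statistics import mean
from typing import Dict, List, Sequence


def episode_average(daily_means: Sequence[float]) -> float:
    """Episode average of daily 24 hour mean concentrations at one monitor (ppb)."""
    return mean(daily_means)


def bias_factor(nox: float, interferents: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical Lamsal-style factor: modeled NOx / (modeled NOx + weighted interferents).

    nox          -- episode-average modeled NOx at the monitor (ppb)
    interferents -- episode-average modeled interfering species, e.g. PAN, PNA, HNO3 (ppb)
    weights      -- assumed fraction of each species detected as NOx (placeholders)
    """
    interference = sum(weights.get(name, 0.0) * conc for name, conc in interferents.items())
    return nox / (nox + interference)


def correct_observations(reported: Sequence[float], beta: float) -> List[float]:
    """Scale reported 24 hour NOx observations at one monitor by its factor beta."""
    return [beta * value for value in reported]


# Hypothetical episode-average values for a single monitor (ppb).
nox_model = episode_average([18.0, 22.0, 20.0])
beta = bias_factor(nox_model, {"PAN": 1.0, "PNA": 0.5, "HNO3": 2.0},
                   weights={"PAN": 1.0, "PNA": 1.0, "HNO3": 0.5})
corrected = correct_observations([24.0, 30.0], beta)
```

Because β ≤ 1 whenever the weighted interferent terms are nonnegative, this form can only lower the reported NOx, consistent with the positive interference described above.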
4.1 Model Uncertainty Analysis
This section details the methodology adopted for incorporating structural and parametric uncertainties in the photochemical air-quality modeling.
4.1.1 Screening for Structural Uncertainty
To assess the effect of model structural uncertainty, we first run the photochemical model with the base-case scenario (BASE) and then with each of the alternate assumptions of atmospheric processes detailed in section 2.2.1: alternate chemical mechanism (CHEM), biogenic emission inventory (BIO), dry deposition scheme (DEP), and boundary conditions (BC). Figure 3 shows how the diurnal patterns of DFW O3 sensitivities to DFW anthropogenic emissions change under each of these model assumptions. Here, "sensitivity" denotes the seminormalized local first-order sensitivity coefficient, which measures the responsiveness of concentrations (C) to a fractional perturbation in precursor j:

$$S_j^{(1)} = P_j^0 \frac{\partial C}{\partial P_j} = \frac{\partial C}{\partial \varphi_j}, \qquad P_j = \varphi_j P_j^0$$

where $P_j^0$ is the unperturbed (base) input value, $P_j$ is the perturbed parameter value, and $\varphi_j$ is a scaling variable with a nominal value of 1 [Cohan et al., 2005]. The unit of sensitivity is, therefore, the same as that of concentration. Afternoon O3 in DFW is primarily NOx limited in all of the structural cases, with O3 about an order of magnitude more sensitive to DFW anthropogenic NOx (ANOx) than to anthropogenic VOC (AVOC). In general, use of MEGAN biogenic emissions increases O3 sensitivity to ANOx (SNOx) and decreases sensitivity to AVOC (SVOC) relative to the base case during daytime, because of its stronger biogenic VOC emissions (EBVOC). The alternate CB6 chemical mechanism also affected daytime O3 sensitivities, but in the opposite direction, yielding stronger sensitivities to AVOC, although conditions remain predominantly NOx sensitive under either structural configuration. The alternate BC case did not significantly affect O3 sensitivities, and DEP affected sensitivities mostly at night.
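As a numerical illustration of the seminormalized sensitivity described above, the sketch below estimates dC/dφ_j at φ_j = 1 (equivalently P_j^0 ∂C/∂P_j) with a central finite difference applied to a hypothetical toy concentration function. This only checks the definition; CAMx computes these coefficients with HDDM rather than by differencing, and `toy_ozone` is an invented response, not a model result.

```python
import math
from typing import Callable


def seminormalized_sensitivity(conc: Callable[[float], float], p0: float, h: float = 1e-4) -> float:
    """Central-difference estimate of dC/dphi at phi = 1, where the input is P = phi * p0.

    Since dC/dphi = p0 * dC/dP, the result carries the units of C (e.g., ppb).
    """
    return (conc((1.0 + h) * p0) - conc((1.0 - h) * p0)) / (2.0 * h)


def toy_ozone(emission: float) -> float:
    """Hypothetical concave O3 response (ppb) to an emission rate (arbitrary units)."""
    return 40.0 + 30.0 * math.sqrt(emission)


s_toy = seminormalized_sensitivity(toy_ozone, p0=4.0)
print(round(s_toy, 3))  # 30.0
```

With this toy response, a 10% emission cut (φ = 0.9) would lower O3 by roughly 0.1 × 30 = 3 ppb to first order.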
To select the most important structural factors that influence predictions of O3 concentrations, we compare each structural scenario against the observations. For screening the factors that most strongly affect O3 sensitivities, we compare each alternate scenario against the base-case simulation results. The statistical measures that serve as the bases for the comparisons are as follows:
where N is the number of observations (site-days), and Yj denotes concentrations or sensitivities from each of the model structural cases considered earlier. For comparisons of concentrations, Oj represents the observations; for comparisons of sensitivities, Oj represents the base-case simulation results.
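Table 2 reports NMB, NME, and RMS values. Their expressions are not reproduced in this section, so the sketch below assumes the standard definitions, with Y the structural-case values and O either the observations or the base-case values, as described above. The function names and example values are illustrative.

```python
import math
from typing import Sequence


def _check(y: Sequence[float], o: Sequence[float]) -> None:
    if len(y) != len(o) or not o:
        raise ValueError("Y and O must be non-empty and of equal length")


def nmb(y: Sequence[float], o: Sequence[float]) -> float:
    """Normalized mean bias (%): 100 * sum(Y - O) / sum(O)."""
    _check(y, o)
    return 100.0 * sum(yi - oi for yi, oi in zip(y, o)) / sum(o)


def nme(y: Sequence[float], o: Sequence[float]) -> float:
    """Normalized mean error (%): 100 * sum(|Y - O|) / sum(O)."""
    _check(y, o)
    return 100.0 * sum(abs(yi - oi) for yi, oi in zip(y, o)) / sum(o)


def rms(y: Sequence[float], o: Sequence[float]) -> float:
    """Root mean square difference: sqrt(sum((Y - O)**2) / N)."""
    _check(y, o)
    return math.sqrt(sum((yi - oi) ** 2 for yi, oi in zip(y, o)) / len(o))
```

Because NMB lets positive and negative differences cancel while NME does not, a case can show near-zero NMB but a nonzero NME, which is why both are reported.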
The comparison results (Table 2) show that the alternate chemical mechanism (CB6 vs. CB05) and biogenic model (MEGAN vs. GloBEIS) most strongly influence the predicted O3 concentrations and sensitivities. Therefore, we build an ensemble of models with the following structural members: (1) BASE, (2) CHEM, (3) BIO, and (4) a combination of the alternate chemical mechanism (CB6) and biogenics (MEGAN) (hereafter abbreviated as CHEM + BIO). Figure 4 shows the spatial distribution of O3 sensitivities for each of these four structural members. NOx-limited conditions for daily maximum 8 hour O3 persist even in the urban center, regardless of which structural scenario is considered.
Table 2. Screening Test for the Selection of Uncertain Model Structural Assumptions
ANOx, anthropogenic NOx; AVOC, anthropogenic volatile organic compound; BC, boundary condition; BIO, alternate biogenic emissions; CHEM, alternate chemical mechanism; DEP, dry deposition scheme; DFW, Dallas-Fort Worth; NMB, normalized mean bias; NME, normalized mean error; RMS, root mean square.
Comparison of each structural case against the observations for 8 hour O3 concentration in DFW
Comparison of each alternate case against the base case for 8 hour DFW O3 sensitivity to DFW ANOx
Comparison of each alternate case against the base case for 8 hour DFW O3 sensitivity to DFW AVOC
4.1.2 Screening for Parametric Uncertainty
Uncertainties in input parameters (parametric uncertainties) are characterized by Monte Carlo analysis, in which the value of each input parameter is drawn randomly from the probability distribution assumed for that input, as characterized by its standard deviation. For computational efficiency, we use an RFM to compute adjusted concentrations (C*) and sensitivities (S*) under the sampled input parameters, using the relationships given by Cohan et al.  and Digar and Cohan ,
where C0 is the concentration modeled under the default parameter settings, and $\varphi_j$ and $\varphi_k$ are the perturbations in parameters j and k, respectively. $S_j^{(1)}$ and $S_j^{(2)}$ denote the seminormalized first- and second-order sensitivities of concentrations to parameter j, and $S_{j,k}^{(2)}$ denotes the cross-sensitivity between input parameters j and k. In the RFMs, the value of each $\varphi$ is restricted to within a 2σ range for that parameter to avoid extreme input values.
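The reduced form model can be sketched as a second-order Taylor expansion in the parameter perturbations. This assumes the common HDDM-based form with Δφ_j = φ_j − 1, first- and second-order self terms, and pairwise cross terms; it illustrates that general technique rather than transcribing the study's equations, and the function name and sensitivity values are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def rfm_concentration(
    c0: float,
    phi: Dict[str, float],
    s1: Dict[str, float],
    s2: Dict[str, float],
    s2_cross: Dict[Tuple[str, str], float],
) -> float:
    """Second-order Taylor estimate of C* for scaling factors phi (nominal value 1).

    C* = C0 + sum_j d_j S1_j + 0.5 * sum_j d_j**2 S2_j + sum_{j<k} d_j d_k S2_jk,
    with d_j = phi_j - 1. Cross-sensitivity keys are alphabetically sorted (j, k) pairs.
    """
    d = {name: value - 1.0 for name, value in phi.items()}
    c_star = c0
    for name, dj in d.items():
        # First- and second-order self terms for parameter j.
        c_star += dj * s1[name] + 0.5 * dj * dj * s2[name]
    for j, k in combinations(sorted(d), 2):
        # Cross term; pairs without a supplied cross-sensitivity contribute nothing.
        c_star += d[j] * d[k] * s2_cross.get((j, k), 0.0)
    return c_star


# Hypothetical sensitivities (ppb) for two uncertain inputs.
c_star = rfm_concentration(
    c0=60.0,
    phi={"ENOx": 1.2, "EBVOC": 0.9},
    s1={"ENOx": 10.0, "EBVOC": 4.0},
    s2={"ENOx": -6.0, "EBVOC": 1.0},
    s2_cross={("EBVOC", "ENOx"): 2.0},
)
```

Evaluating this polynomial is far cheaper than rerunning CAMx, which is what makes thousands of Monte Carlo draws per structural member practical.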
As discussed in section 2.2, we use a suite of uncertain model input parameters listed in Table 1. Each parameter was assumed to have a lognormal probability distribution, characterized by the uncertainty value (1σ) reported in Table 1. To screen parameters that strongly influence O3 concentrations and sensitivity to emissions, we perform an impact analysis where relevant “impact factors” were evaluated as follows:
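The Table 1 notes define an impact factor as the fractional change in a quantity due to a 1σ change in an input parameter. A minimal sketch follows, assuming that a 1σ change of a lognormal parameter corresponds to φ = exp(σ_ln), that σ_ln is derived from a 2σ (95%) uncertainty factor UF as ln(UF)/2, and that the change is evaluated from first- and second-order sensitivities. The screening threshold, parameter names, and example values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigma_ln_from_uncertainty_factor(uf: float) -> float:
    """Assumed: UF is the 2-sigma multiplicative bound, so sigma_ln = ln(UF) / 2."""
    return math.log(uf) / 2.0


def impact_factor(c0: float, s1: float, s2: float, sigma_ln: float) -> float:
    """Fractional change |C* - C0| / |C0| for a 1-sigma increase, phi = exp(sigma_ln)."""
    d = math.exp(sigma_ln) - 1.0
    return abs(d * s1 + 0.5 * d * d * s2) / abs(c0)


def screen(params: Dict[str, Tuple[float, float, float]], c0: float, threshold: float) -> List[str]:
    """Return the names whose impact factor meets a (hypothetical) threshold.

    params maps name -> (S1, S2, uncertainty factor UF).
    """
    selected = []
    for name, (s1, s2, uf) in params.items():
        if impact_factor(c0, s1, s2, sigma_ln_from_uncertainty_factor(uf)) >= threshold:
            selected.append(name)
    return selected


example = {"ENOx": (10.0, 0.0, 2.0), "BC_NOy": (0.2, 0.0, 2.0)}
print(screen(example, c0=60.0, threshold=0.01))  # ['ENOx']
```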
Although there was considerable overlap in the selected parameters, there were also some differences in those found to have a greater impact on concentrations and the two sensitivities (Table 1). Domain-wide ENOx and EBVOC, photolysis rates (hν), and the reaction rate constants R(NO2 + OH) and R(NO + O3) significantly impacted all three categories. Meanwhile, boundary conditions (BC) of NOy were not major influences on any of the results. However, the BC(O3) parameter significantly impacted concentrations and sensitivity to VOC, but not to NOx, whereas anthropogenic VOC emissions (EAVOC) impacted sensitivities, but not concentrations.
4.1.3 Joint Consideration of Structural and Parametric Uncertainty
We construct an ensemble consisting of the four structural members selected by the screening test in Table 2 (BASE, CHEM, BIO, and CHEM + BIO), each coupled with 1000 Monte Carlo samples drawn from the probability distributions of the selected model input parameters underlined in Table 1. The final ensemble therefore contains 4000 members. The final set of parametric factors considered in this study is summarized as follows:
For O3 concentration: ENOx, EBVOCs, photolysis frequencies, R(NO2 + OH), R(NO + O3), and BC(O3)
For O3 sensitivity to ANOx emissions: ENOx, EBVOCs, EAVOC, photolysis frequencies, R(NO2 + OH), R(NO + O3), and R(all VOCs + OH)
For O3 sensitivity to AVOC emissions: ENOx, EBVOCs, EAVOC, photolysis frequencies, R(NO2 + OH), R(NO + O3), R(all VOCs + OH), and BC(O3).
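The ensemble construction described above can be sketched as follows. The sketch assumes each selected parameter is scaled by a lognormal factor φ = exp(σ_ln z), with z drawn from a standard normal truncated to |z| ≤ 2 by rejection (one way to impose the 2σ restriction), and it draws independent samples for each structural member; whether the study reused the same 1000 draws across members is not specified in this section. The σ_ln values and names such as `build_ensemble` are hypothetical placeholders, not the Table 1 values.

```python
import math
import random
from typing import Dict, List, Tuple

STRUCTURAL_MEMBERS = ["BASE", "CHEM", "BIO", "CHEM+BIO"]


def sample_phi(rng: random.Random, sigma_ln: float, max_sigma: float = 2.0) -> float:
    """Lognormal scaling factor exp(sigma_ln * z), z ~ N(0, 1) truncated to |z| <= max_sigma."""
    while True:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= max_sigma:
            return math.exp(sigma_ln * z)


def build_ensemble(sigmas: Dict[str, float], n_samples: int = 1000,
                   seed: int = 0) -> List[Tuple[str, Dict[str, float]]]:
    """Pair every structural member with n_samples parametric draws."""
    rng = random.Random(seed)
    ensemble = []
    for member in STRUCTURAL_MEMBERS:
        for _ in range(n_samples):
            phi = {name: sample_phi(rng, s) for name, s in sigmas.items()}
            ensemble.append((member, phi))
    return ensemble


# Placeholder log-space standard deviations (not the study's values).
SIGMAS = {"ENOx": math.log(1.5) / 2.0, "EBVOC": math.log(2.0) / 2.0}
ensemble = build_ensemble(SIGMAS)
print(len(ensemble))  # 4000
```

Each (member, φ) pair would then be fed to the reduced form model of the corresponding structural case to obtain C* and S*.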
4.2 Constraining Model Predictions Using Measurements
A key limitation of the traditional Monte Carlo analysis of the model ensemble [e.g., Pinder et al., 2009; Digar et al., 2011] is the assumption that each of the cases is equally likely. This study uses actual observations to prioritize cases that show good agreement with measured concentrations over those that do not perform well. Figure 5 shows the framework of the observation-constrained Monte Carlo analysis. Concentration estimates from each of the 4000 simulations are compared with actual measurements at the monitors to evaluate the adjusted (observation-constrained) probability distribution of the ensemble. Various techniques are used to weight (Bayesian) or screen (model performance and hypothesis testing) the best-performing model cases to characterize adjusted probability distributions of pollutant concentrations and sensitivities. The methods and observation metrics used in this study are elaborated below.
4.2.1 Metric 1: Bayesian Analysis
A Bayesian inference approach [Bergin and Milford, 2000; Deguillaume et al., 2007] is applied to assign relative weightings to each case based on its performance in simulating observed O3 and NOx. To evaluate the likelihood of the model prediction ($Y_n^m$) of the mth simulation for the nth observation (n = 1, 2, …, N, where N denotes the total number of observations), a Gaussian likelihood function is used (as defined by Bergin and Milford, 2000). Errors (θ) in the observed O3 and NOx concentrations are assumed to be independent and normally distributed with mean zero. The likelihood of model prediction $Y_n^m$ given observation $O_n$ can then be expressed as

$$L(Y_n^m \mid O_n) = \frac{1}{\sqrt{2\pi}\,\theta} \exp\left[-\frac{(O_n - Y_n^m)^2}{2\theta^2}\right]$$
The total likelihood $L_m$ of simulation m given all observations of a species is the product of its likelihoods for the individual observations, because errors in the observed O3 and NOx concentrations are assumed to be independent: $L_m = \prod_{n=1}^{N} L(Y_n^m \mid O_n)$. $L_m$ is computed separately for O3 and NOx, and the two are then multiplied to obtain the overall likelihood based on both species. Finally, Bayes' theorem is applied to compute the a posteriori probability ($p'_m$) of each simulation from its a priori probability ($p_m$) and the likelihood computed above: $p'_m = p_m L_m / \sum_{k=1}^{M} p_k L_k$, where M is the ensemble size.
The mean (μ′) and standard deviation (σ′) of the resulting posterior ensemble distribution can be computed by

$$\mu' = \sum_{j=1}^{M} p'_j\,Y_j, \qquad \sigma' = \left[\sum_{j=1}^{M} p'_j\,(Y_j - \mu')^2\right]^{1/2}$$

where Yj denotes the jth value of the simulation, $p'_j$ denotes the posterior probability for that iteration [obtained from equation (10)], and M is the total size of the ensemble (= 4000).
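The Bayesian weighting and posterior moments can be sketched as follows. The sketch works with log-likelihoods and subtracts the maximum before exponentiating, a standard guard against numerical underflow when many Gaussian terms are multiplied; the joint O3 and NOx likelihood is the sum of the two log-likelihoods (that is, the product of the likelihoods). Uniform, strictly positive priors and the example concentrations are assumptions; θ = 7.2 ppb (8 hour O3) and 8.2 ppb (24 hour NOx) are the values selected in this study.

```python
import math
from typing import List, Optional, Sequence, Tuple


def log_likelihood(pred: Sequence[float], obs: Sequence[float], theta: float) -> float:
    """Sum over observations of log N(obs_n | pred_n, theta**2) (Gaussian, zero-mean error)."""
    norm = math.log(math.sqrt(2.0 * math.pi) * theta)
    return sum(-0.5 * ((o - y) / theta) ** 2 - norm for y, o in zip(pred, obs))


def posterior_weights(log_liks: Sequence[float],
                      priors: Optional[Sequence[float]] = None) -> List[float]:
    """Bayes' theorem over a discrete ensemble: p'_m proportional to p_m * L_m (priors > 0)."""
    if priors is None:
        priors = [1.0 / len(log_liks)] * len(log_liks)
    logs = [math.log(p) + ll for p, ll in zip(priors, log_liks)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]  # shift by the max to avoid underflow
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_mean_std(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Posterior mean and standard deviation for weights that sum to 1."""
    mu = sum(w * v for v, w in zip(values, weights))
    sd = math.sqrt(sum(w * (v - mu) ** 2 for v, w in zip(values, weights)))
    return mu, sd


# Hypothetical episode-average observations at two monitors and two ensemble cases.
obs_o3, obs_nox = [60.0, 65.0], [12.0, 20.0]
cases = [([61.0, 64.0], [13.0, 18.0]), ([70.0, 75.0], [12.0, 21.0])]
joint = [log_likelihood(o3, obs_o3, 7.2) + log_likelihood(nox, obs_nox, 8.2) for o3, nox in cases]
weights = posterior_weights(joint)
```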
The observation metric chosen for the Bayesian analysis is highly aggregated, as were the metrics used by Bergin and Milford  and Deguillaume et al. . Here, episode averages of the daily 8 hour O3 and of the 24 hour NOx concentrations at each of the 11 monitors were considered (N = 11). The consideration of episode-average concentrations on a site-by-site basis tests the ability of each model case to simulate overall levels and spatial patterns in O3 and NOx, even if errors in simulating meteorology or emissions variability may have obscured day-to-day comparisons. The errors in the observed episode averages are assumed to be independent across space because we do not expect calibration or representativeness errors to be spatially correlated at these temporal scales.
Errors in applying measurement data to evaluate model results can arise from instrumental error and from the use of a point measurement to represent a grid-cell average concentration. The resulting uncertainty can be quantified jointly by examining the variability between pollutant concentrations measured by multiple monitors within the same grid cell. An analysis of 5 years of summer O3-season (May to September) data centered on the base case model year 2006 (i.e., 2004–2008) showed that the error (θ), characterized as the standard deviation of the differences between observed 8 hour O3 values at three pairs of sites located in the same grid cell, ranged from 3.0 to 7.2 ppb; for bias-corrected 24 hour NOx observations, θ ranged from 2.2 to 8.2 ppb. Because these estimates are based on a limited number of site pairs, we conservatively choose the maximum of each range (i.e., θ = 7.2 and 8.2 ppb for 8 hour O3 and 24 hour NOx, respectively).
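The collocated-monitor error estimate can be sketched as below, assuming θ for a site pair is the sample standard deviation of the daily differences over days when both monitors report, and that the conservative choice is the maximum across pairs. The function names and monitor values are hypothetical.

```python
from statistics import stdev
from typing import Optional, Sequence, Tuple

Series = Sequence[Optional[float]]


def pair_error(a: Series, b: Series) -> float:
    """Sample standard deviation of day-by-day differences between two collocated monitors."""
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    return stdev(diffs)


def conservative_theta(pairs: Sequence[Tuple[Series, Series]]) -> float:
    """Take the largest pair error as a conservative theta."""
    return max(pair_error(a, b) for a, b in pairs)


# Hypothetical daily 8 hour O3 values (ppb); None marks a missing report.
theta = conservative_theta([
    ([50.0, 60.0, 70.0, 80.0], [52.0, 57.0, 71.0, 78.0]),
    ([40.0, None, 55.0], [41.0, 48.0, 53.0]),
])
```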
4.2.2 Metric 2: Screening Based on Model Performance
An alternative approach to developing observation-constrained distributions is to retain only cases that meet specified performance criteria [e.g., Mallet and Sportisse, 2006]. Because the base modeling used here was developed for an SIP attainment plan, we formulate a new metric [metric 2 (M2)] that screens the 4000 cases against the three model performance evaluation criteria recommended by the EPA [1999, 2007] for determining the acceptability of an O3 SIP model (Table 3). This metric uses all available valid observations of daily 8 hour O3 at each monitor (N = 289). Mean normalized bias (MNB) and mean normalized gross error (MNGE) were computed for model results (Model) only when O3 observations (Obs) exceeded the recommended threshold of 60 ppb [EPA, 2006]. The screened cases were assigned equal weights to develop the adjusted (observation-constrained) distribution.
Table 3. Statistics for Evaluating Model Performance in Metric 2
Mean normalized gross error (MNGE) and mean normalized bias (MNB) were computed for model results (Model) when O3 observations (Obs) were greater than the recommended threshold of 60 ppb [EPA, 2006].
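M2 screening can be sketched as a pass/fail function applied to each case's site-day predictions. Assumptions: the unpaired peak accuracy compares the overall maximum predicted and observed values, MNB and MNGE use only site-days with observations above 60 ppb, and the limits in `LIMITS` are placeholders to be replaced by the values in Table 3.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

# Placeholder acceptance limits (%); substitute the criteria listed in Table 3.
LIMITS = {"peak": 20.0, "mnb": 15.0, "mnge": 35.0}


def m2_statistics(model: Sequence[float], obs: Sequence[float],
                  threshold: float = 60.0) -> Tuple[float, float, float]:
    """Unpaired peak accuracy, MNB, and MNGE (all in %) for paired site-day values."""
    pairs = [(m, o) for m, o in zip(model, obs) if o > threshold]
    if not pairs:
        raise ValueError("no observations above the threshold")
    mnb = 100.0 * mean((m - o) / o for m, o in pairs)
    mnge = 100.0 * mean(abs(m - o) / o for m, o in pairs)
    peak = 100.0 * (max(model) - max(obs)) / max(obs)
    return peak, mnb, mnge


def passes_m2(model: Sequence[float], obs: Sequence[float],
              limits: Dict[str, float] = LIMITS) -> bool:
    """True only if all three statistics fall within their limits."""
    peak, mnb, mnge = m2_statistics(model, obs)
    return abs(peak) <= limits["peak"] and abs(mnb) <= limits["mnb"] and mnge <= limits["mnge"]
```

Cases that pass would each receive equal weight; failing cases receive zero weight.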
4.2.3 Metric 3: Screening Based on Nonparametric Test
Nonparametric statistical tests of significance such as the Cramér-von Mises (CvM) criterion and the Kolmogorov-Smirnov test have been used to test for general differences between predicted and observed distributions of air-quality data [Holland and Fitz-Simons, 1982; Taylor et al., 1987]. The CvM criterion [Anderson, 1962] provides a nonparametric test of the null hypothesis (H0) that two samples are drawn from the same (unspecified) distribution. In the CvM two-sample test, the test statistic T is computed as follows:

$$T = \frac{AB}{(A+B)^2}\left[\sum_{i=1}^{A}\left(F_A(x_i) - G_B(x_i)\right)^2 + \sum_{j=1}^{B}\left(F_A(y_j) - G_B(y_j)\right)^2\right] \qquad (14a)$$
where FA(x) and GB(y) are the empirical distribution functions of the two samples x = x1, x2, …, xA (representing model predictions) and y = y1, y2, …, yB (representing observations), of size A and B, respectively. Note that GB(xi) denotes the relative frequency with which the observed concentration is at most xi (i.e., the number of elements in the observed sample less than or equal to xi, divided by the sample size B), and FA(yj) denotes the relative frequency with which the modeled concentration is at most yj.
The null hypothesis is rejected when T is large, indicating that the two samples are significantly different. The advantage of this method is that it detects any differences between the modeled and observed probability distributions, not just differences in their means (e.g., differences in the variance and/or the tails of the samples). Note that the CvM criterion does not pair observations in time and space; instead, it indicates whether the distribution of model predictions is consistent with the distribution of observations. In our case, the two samples represent the modeled and observed distributions of pollutant concentrations, and the sample sizes of the two distributions are equal (i.e., A = B = N, where N denotes the total number of observations). Therefore, equation (14a) reduces to

$$T = \frac{1}{4}\left[\sum_{i=1}^{N}\left(F_N(x_i) - G_N(x_i)\right)^2 + \sum_{j=1}^{N}\left(F_N(y_j) - G_N(y_j)\right)^2\right] \qquad (14b)$$
The test statistic T is computed for each of the 4000 members of the model ensemble, separately for the available 8 hour O3 (N = 289) and 24 hour NOx (N = 303) concentrations, using equation (14b). Next, we compute the p value associated with each test statistic T, defined as the probability of observing a test statistic greater than or equal to T if H0 is true. A small T results in a large p value, indicating insufficient evidence to reject H0. Screening then retains the Monte Carlo cases whose p values exceed the 10% significance level (α = 0.1); for p values below α, we reject the null hypothesis. We select only those cases that pass this test for both observational constraints (O3 and NOx).
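The CvM screening step can be sketched with the standard library alone. The statistic follows the two-sample form based on empirical distribution functions (ECDF values taken as "at most"), while the p value here comes from a permutation test, a technique swapped in for illustration and not necessarily how the study computed it, so p values are approximate. SciPy (version 1.7 or later) provides `scipy.stats.cramervonmises_2samp` as an alternative with exact or asymptotic p values. Function names and sample values are hypothetical.

```python
import random
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_sample: Sequence[float], value: float) -> float:
    """Fraction of the sample that is at most `value`."""
    return bisect_right(sorted_sample, value) / len(sorted_sample)


def cvm_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample CvM statistic T = AB/(A+B)^2 * sum over pooled points of (F_A - G_B)^2."""
    a, b = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = sum((ecdf(xs, v) - ecdf(ys, v)) ** 2 for v in list(x) + list(y))
    return a * b / (a + b) ** 2 * total


def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Approximate P(T >= T_obs | H0) by randomly reassigning the pooled values."""
    rng = random.Random(seed)
    t_obs = cvm_statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cvm_statistic(pooled[:len(x)], pooled[len(x):]) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def passes_m3(p_o3: float, p_nox: float, alpha: float = 0.1) -> bool:
    """Keep a case only if neither the O3 nor the NOx test rejects H0 at level alpha."""
    return p_o3 > alpha and p_nox > alpha
```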
4.3 Adjusted Ozone Sensitivity
To characterize the adjusted O3 response to emission changes, we use the RFM given in equation (7) to generate the a priori (equal-weighted) probability density of O3 sensitivity to any emission j for each of the four structural cases, based on the 1000 samples of input parameters k. Because pollutant sensitivities cannot be evaluated directly against observations, the observation-constrained O3 sensitivities for the full ensemble (all 4000 cases) are estimated from each case's performance in reproducing observed concentrations. For metric 1 (M1), we therefore assume that the a posteriori probabilities estimated for O3 concentrations by equation (11) can also be applied to adjust the a priori probability distribution of O3 sensitivities; for M2 and metric 3 (M3), we assign equal probability to the sensitivities from each simulation that passed the respective screening test, and zero probability to the remaining cases.
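Applying the weights to sensitivities can be sketched as follows: the screening metrics (M2, M3) give equal weight to passing cases and zero to the rest, Bayesian weights (M1) would be used directly, and summary statistics are computed from the weighted sample. The weighted-quantile rule (smallest value whose cumulative weight reaches q) is one common convention and an assumption here; the function names and sensitivity values are hypothetical.

```python
from typing import List, Sequence


def screening_weights(passed: Sequence[bool]) -> List[float]:
    """Equal probability for cases that passed a screening test, zero otherwise."""
    n_pass = sum(passed)
    if n_pass == 0:
        raise ValueError("no case passed the screening test")
    return [1.0 / n_pass if ok else 0.0 for ok in passed]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of the adjusted distribution."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guard against rounding when q is close to 1


s_nox = [1.0, 2.0, 3.0, 4.0]  # hypothetical 8 hour O3 sensitivities (ppb)
w = screening_weights([True, False, True, True])
```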
5 Results and Discussion
In this section, results for input parameter values, O3 concentrations, and sensitivities to emissions are presented to show how the adjusted (observation-constrained) probability distributions generated by the three observational metrics differ from the a priori (equal-weighted) distribution. The quality of the three final adjusted model ensembles is evaluated elsewhere [Digar, 2012].
Application of M1 (Bayesian weightings) to our ensemble of 4000 simulations assigns half of the total weight to the 496 best-performing simulations. Most of the spread in weightings results from evaluation against O3 rather than NOx observations; however, multiplying the O3- and NOx-based weightings according to equation (10) causes the joint weightings to differ substantially from those that would have resulted from considering O3 alone (Table 4).
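The joint weighting and the concentration of weight in a subset of cases can be illustrated as follows. This sketch assumes, as a hypothetical reading of equation (10), that a case's joint weight is proportional to the product of its O3-based and NOx-based weights; it also counts the smallest set of top-weighted cases that holds half of the total weight (496 cases in this study). All inputs and function names are made up for illustration.

```python
def joint_weights(w_o3, w_nox):
    """Multiply per-case O3 and NOx weights, then renormalize to sum to one."""
    products = [a * b for a, b in zip(w_o3, w_nox)]
    total = sum(products)
    return [p / total for p in products]


def n_cases_holding(weights, fraction=0.5):
    """Smallest number of highest-weighted cases whose weights reach `fraction` of the total."""
    target = fraction * sum(weights)
    running = 0.0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        running += w
        if running >= target:
            return count
    return len(weights)


w_o3 = [0.4, 0.3, 0.2, 0.1]      # hypothetical O3-based weights
w_nox = [0.1, 0.4, 0.25, 0.25]   # hypothetical NOx-based weights
w_joint = joint_weights(w_o3, w_nox)
best_o3_only = max(range(4), key=lambda k: w_o3[k])    # case 0
best_joint = max(range(4), key=lambda k: w_joint[k])   # case 1
```

Even though the NOx weights here are less spread out than the O3 weights, the product changes which case ranks first, mirroring the effect described for the joint weightings.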
Table 4. Observation-Constrained Probability of the Structural Ensemble Members^a
M2 selected 1134 cases that satisfied all three of EPA's recommended model performance criteria detailed in Table 3. This selection was restricted mainly by the bias criterion (MNB), which was satisfied by 1137 cases. The other two criteria, unpaired peak accuracy and MNGE, each accepted most of the 4000 cases, rejecting only 15% and 1% of cases, respectively. M3, which selects cases based on the CvM two-sample test, retained 766 model cases that satisfy the test for both O3 and NOx observations. Screening based on O3 or NOx observations alone would have selected 1003 and 2457 cases, respectively.
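The three M2 criteria can be sketched as a single pass/fail check. The metric definitions below are the conventional ones for mean normalized bias (MNB), mean normalized gross error (MNGE), and unpaired peak accuracy; the threshold defaults are placeholders only, to be replaced with the limits in Table 3, and the function names are hypothetical.

```python
def m2_metrics(model, obs):
    """Return (MNB, MNGE, unpaired peak accuracy) as fractions, not percent."""
    n = len(obs)
    mnb = sum((m - o) / o for m, o in zip(model, obs)) / n
    mnge = sum(abs(m - o) / o for m, o in zip(model, obs)) / n
    # Unpaired: compare the peaks regardless of when or where they occur.
    upa = (max(model) - max(obs)) / max(obs)
    return mnb, mnge, upa


def passes_m2(model, obs, mnb_limit=0.15, mnge_limit=0.35, upa_limit=0.20):
    """True if all three criteria are met (limits are placeholder values)."""
    mnb, mnge, upa = m2_metrics(model, obs)
    return abs(mnb) <= mnb_limit and mnge <= mnge_limit and abs(upa) <= upa_limit
```

Because the bias is signed while the gross error is not, a case can have a small MNGE yet fail on MNB if its errors are consistently in one direction, which is consistent with MNB being the binding criterion here.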
Accuracy of the ensemble-mean prediction is tested by evaluating the normalized mean bias (NMB), the normalized mean error (NME), the correlation, and the regression coefficients of the ensemble mean against 8 hour O3 observations for all sites and days (N = 289) in the DFW region (Table 5). As expected, model performance improves when the ensemble is constrained by the observations. All the observational metrics reduce the model bias and error and, to some extent, increase the overall correlation and regression coefficients (Table 5). The base-case model underpredicts O3 concentrations by 6%. The non-Bayesian metrics, on the other hand, tend to slightly overpredict O3 (M2 by 4.5% and M3 by 1%), while reducing the overall error by 11%.
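The ensemble-mean statistics can be computed as in the sketch below. It assumes the conventional definitions NMB = Σ(M − O)/ΣO and NME = Σ|M − O|/ΣO, a weighted ensemble mean at each site-day, and an ordinary least-squares regression of the ensemble mean on the observations; the study's exact regression setup is not given here, so that choice and the function names are assumptions.

```python
import math


def weighted_ensemble_mean(case_series, weights):
    """Per-observation ensemble mean; case_series[k][i] is case k's value at site-day i."""
    total = sum(weights)
    n_obs = len(case_series[0])
    return [sum(w * series[i] for w, series in zip(weights, case_series)) / total
            for i in range(n_obs)]


def evaluation_stats(pred, obs):
    """Return NMB, NME, Pearson r, and OLS slope/intercept of pred regressed on obs."""
    n = len(obs)
    sum_obs = sum(obs)
    nmb = sum(p - o for p, o in zip(pred, obs)) / sum_obs
    nme = sum(abs(p - o) for p, o in zip(pred, obs)) / sum_obs
    mean_o, mean_p = sum_obs / n, sum(pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for p, o in zip(pred, obs))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    r = s_op / math.sqrt(s_oo * s_pp)
    slope = s_op / s_oo
    return nmb, nme, r, slope, mean_p - slope * mean_o
```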
Table 5. Performance of the Base Model and Observation-Constrained Model Ensemble Means against Observed 8 Hour O3 at All Sites and Days in Dallas-Fort Worth^a
^a NMB, normalized mean bias; NME, normalized mean error.
To further evaluate the performance of the ensemble in simulating episode-average conditions (similar to the scenario used in M1) at individual locations, observation-constrained O3 concentrations are examined at the 11 DFW monitors (Table 6). Correcting the overall underprediction of the base model, the ensemble weightings increased the mean O3 concentration at all the sites. As a result, the posterior adjustments substantially improved the prediction accuracy at monitors that had a larger negative bias in the base-case modeling scenario (Table 6).
Table 6. Comparison of Observed and Modeled Episode-Average 8 Hour Ozone Concentrations (ppb) at Dallas-Fort Worth Sites^a
Monitor | 2004–2006 O3 Design Value | Base Model O3 | Prior Ensemble (μ ± σ) | Observation-Constrained M1 (μ ± σ) | Observation-Constrained M2 (μ ± σ) | Observation-Constrained M3 (μ ± σ)
Denton (DENT) | … | … | 65.51 ± 7.33 | 65.53 ± 2.16 | 69.04 ± 2.03 | 68.85 ± 1.87
… | … | … | 66.62 ± 7.21 | 66.76 ± 2.21 | 70.02 ± 2.17 | 69.98 ± 2.03
… | … | … | 66.10 ± 7.17 | 66.23 ± 2.22 | 69.50 ± 2.17 | 69.43 ± 2.01
… | … | … | 64.09 ± 7.00 | 64.25 ± 2.11 | 67.42 ± 1.99 | 67.36 ± 1.84
… | … | … | 62.73 ± 7.07 | 62.75 ± 2.09 | 66.13 ± 1.91 | 65.94 ± 1.74
… | … | … | 65.12 ± 7.27 | 65.12 ± 2.17 | 68.61 ± 2.01 | 68.43 ± 1.86
… | … | … | 63.08 ± 6.75 | 63.33 ± 2.09 | 66.27 ± 2.04 | 66.26 ± 1.88
… | … | … | 59.77 ± 6.76 | 59.77 ± 2.00 | 63.01 ± 1.91 | 62.77 ± 1.75
… | … | … | 59.23 ± 6.50 | 59.35 ± 1.94 | 62.32 ± 1.82 | 62.19 ± 1.71
… | … | … | 57.04 ± 6.70 | 56.96 ± 2.10 | 60.29 ± 2.11 | 59.96 ± 1.79
… | … | … | 58.31 ± 6.88 | 58.20 ± 2.14 | 61.64 ± 2.14 | 61.29 ± 1.81
^a μ and σ denote mean and standard deviation, respectively.
Detailed comparisons are illustrated for the Denton monitor (DENT), which recorded the highest 8 hour O3 design value among all the DFW sites in 2006. Figure 6a shows the probability density functions (PDFs) of episode-average O3 concentrations at Denton. The blue curve in Figure 6a depicts the a priori (equal-weighted) probability density; the other solid curves show the final observation-constrained distributions obtained by applying each of the 3 observational metrics to the full 4000 case ensemble. The deterministic base-case model (BASE) predicts 62.0 ppb, underpredicting the observed episode-average daily 8 hour O3 concentration of 70.1 ppb at Denton during the study period. The a priori equal-weighted ensemble predicts a mean concentration of 65.5 ppb with a standard deviation of 7.3 ppb (Table 6). Application of each of the 3 metrics narrowed the spread of the ensemble predictions, as shown by the curves in Figure 6a and the smaller standard deviations (~2 ppb) in Table 6, indicating greater confidence in the ensemble. M2 and M3 yielded ensemble-mean predictions of episode-average O3 (69.04 and 68.85 ppb, respectively) that more closely matched the observations at Denton. Testing showed that withholding a monitor from the observations used to constrain the ensembles did not substantially alter the posterior results at that monitor (Figure 6b).
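Smooth PDFs like those in Figure 6a can be estimated from weighted ensemble members with a Gaussian kernel density estimate. This is one possible approach, not necessarily the one used to draw the figure; the fixed bandwidth, the sample values, and the function name are hypothetical.

```python
import math


def weighted_kde(values, weights, grid, bandwidth):
    """Gaussian kernel density at each grid point; weights are renormalized to sum to one."""
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        acc = sum(w * math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                  for v, w in zip(values, weights))
        density.append(norm * acc / total)
    return density


o3 = [58.0, 62.0, 65.0, 68.0, 70.0, 72.0]     # hypothetical episode-average O3 (ppb)
grid = [40.0 + 0.5 * i for i in range(101)]   # 40 to 90 ppb in 0.5 ppb steps
prior_pdf = weighted_kde(o3, [1.0] * 6, grid, bandwidth=2.0)
constrained_pdf = weighted_kde(o3, [0.0, 0.0, 1.0, 2.0, 2.0, 1.0], grid, bandwidth=2.0)
```

Setting a case's weight to zero removes its kernel entirely, so screening (M2, M3) and Bayesian weighting (M1) can share the same density routine.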
Although each of these metrics uses different criteria and methods for comparing pollutant concentrations, they yield similar allocations of adjusted probability among the structural scenarios (Table 4). For the study region and episode, each metric tends to prioritize model cases that use the CB6 chemical mechanism. For example, under M1, 384 of the 496 highest-weighted cases used CB6, placing 64% of the overall weight on the CHEM and CHEM + BIO scenarios (Table 4). M2 and M3 also favored the CHEM and CHEM + BIO scenarios relative to their CB05 counterparts. The metrics do not show a consistent preference between the MEGAN and GloBEIS biogenic inventories.
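Aggregating case weights by structural scenario, as summarized in Table 4, amounts to summing probabilities within each scenario label. The sketch below assumes every case is tagged with its scenario name; the labels BASE and BIO for the two CB05 scenarios, the function names, and all numbers are hypothetical placeholders.

```python
from collections import defaultdict


def weight_by_scenario(labels, weights):
    """Share of the total (normalized) weight carried by each structural scenario."""
    total = sum(weights)
    shares = defaultdict(float)
    for label, w in zip(labels, weights):
        shares[label] += w / total
    return dict(shares)


def count_label_in_top(labels, weights, k, wanted):
    """How many of the k highest-weighted cases carry a label in `wanted`."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sum(1 for i in ranked[:k] if labels[i] in wanted)


labels = ["BASE", "BIO", "CHEM", "CHEM + BIO", "CHEM", "BASE"]
weights = [0.05, 0.10, 0.30, 0.25, 0.20, 0.10]
shares = weight_by_scenario(labels, weights)
cb6_share = shares["CHEM"] + shares["CHEM + BIO"]
cb6_in_top3 = count_label_in_top(labels, weights, 3, {"CHEM", "CHEM + BIO"})
```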
Application of the three metrics also generated observation-constrained probability distributions for the scaling factors (1+φ) of the model input parameters listed in Table 1. Figure 7 shows the PDFs for some of the key parameters. The a priori PDFs are derived from the 1000 Monte Carlo cases randomly sampled from the truncated lognormal probability distributions assumed for each input parameter, and the adjusted PDFs are generated by applying the same weightings (M1) and screenings (M2 and M3) used to constrain O3 concentrations. No significant differences were observed between the a priori and observation-constrained distributions of the model input parameters, except for ENOx. M2 and M3 favored slightly higher NOx emissions, as indicated by the positive shifts in the adjusted PDFs in Figure 7a, whereas M1 favored ENOx levels close to the original values.
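One way to generate a priori samples of a scaling factor (1+φ) and compare them with an adjusted distribution is sketched below. It assumes a lognormal with median 1, truncated by rejection sampling to a fixed interval; the σ and truncation bounds are hypothetical and are not the Table 1 values, and the "accepted" weights simply mimic a screening that happened to keep only scaled-up cases.

```python
import math
import random


def sample_scaling_factors(n, sigma, lower, upper, seed=0):
    """Draw n values of (1 + phi) from a median-1 lognormal truncated to [lower, upper]."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.lognormvariate(0.0, sigma)  # ln(x) ~ Normal(0, sigma^2)
        if lower <= x <= upper:
            samples.append(x)
    return samples


def weighted_density(values, weights, edges):
    """Histogram density normalized so that sum(density * bin width) == 1."""
    mass = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for b in range(len(mass)):
            last_bin = b == len(mass) - 1
            if edges[b] <= v < edges[b + 1] or (last_bin and v == edges[-1]):
                mass[b] += w
                break
    total = sum(mass)
    return [m / (total * (edges[b + 1] - edges[b])) for b, m in enumerate(mass)]


sigma = 0.3                                    # hypothetical log-space spread
lower, upper = math.exp(-2 * sigma), math.exp(2 * sigma)
factors = sample_scaling_factors(1000, sigma, lower, upper)
edges = [lower + (upper - lower) * i / 10 for i in range(10)] + [upper]
prior_pdf = weighted_density(factors, [1.0] * len(factors), edges)
# Hypothetical screening that accepted only cases with scaled-up emissions.
accepted = [1.0 if f > 1.0 else 0.0 for f in factors]
adjusted_pdf = weighted_density(factors, accepted, edges)
```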
We now examine how the relative sensitivities of O3 to DFW ANOx (SNOx) and to DFW AVOC (SVOC) change when the observational constraints are considered. Although results are presented here for Denton, similar trends were observed for the other DFW sites. The base-case model (without incorporating uncertainties) predicts that at Denton, SNOx is 6.56 ppb and SVOC is 0.83 ppb, indicating that DFW ANOx controls are approximately 7.9 times as effective per ton as DFW AVOC controls for reducing episode-average 8 hour O3 concentrations (Figure 8). The equal-weighted a priori ensemble yields a distribution of O3 sensitivities and indicates a 93% likelihood that O3 is more sensitive to DFW ANOx than to DFW AVOC, but a 2.3% chance that reducing local ANOx emissions could actually increase episode-average 8 hour O3 concentrations in the region. A sharp negative correlation is observed between the O3 sensitivities to NOx and to VOC, which leads to large variability in the ratio of these two sensitivities. This reflects the tendency of changes in model inputs to push the O3 formation regime toward being either more NOx limited or more VOC limited, and hence less sensitive to the other precursor.
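The probabilities quoted above follow directly from the ensemble of paired sensitivities. The sketch below is a minimal version that assumes the sign convention in which a positive sensitivity means that reducing that precursor lowers O3; passing per-case weights instead of equal ones gives the corresponding observation-constrained values. The inputs are made up and the function name is hypothetical.

```python
import math


def sensitivity_summary(s_nox, s_voc, weights=None):
    """Return P(S_NOx > S_VOC), P(S_NOx < 0), and the weighted Pearson correlation."""
    if weights is None:
        weights = [1.0] * len(s_nox)          # equal-weighted a priori ensemble
    total = sum(weights)
    p = [w / total for w in weights]
    p_nox_more = sum(pk for pk, a, b in zip(p, s_nox, s_voc) if a > b)
    p_nox_disbenefit = sum(pk for pk, a in zip(p, s_nox) if a < 0)
    mean_a = sum(pk * a for pk, a in zip(p, s_nox))
    mean_b = sum(pk * b for pk, b in zip(p, s_voc))
    cov = sum(pk * (a - mean_a) * (b - mean_b) for pk, a, b in zip(p, s_nox, s_voc))
    var_a = sum(pk * (a - mean_a) ** 2 for pk, a in zip(p, s_nox))
    var_b = sum(pk * (b - mean_b) ** 2 for pk, b in zip(p, s_voc))
    return p_nox_more, p_nox_disbenefit, cov / math.sqrt(var_a * var_b)


# Base-case Denton ratio quoted in the text: 6.56 / 0.83 is about 7.9.
base_ratio = 6.56 / 0.83
s_nox = [6.0, 5.0, -1.0, 0.5]     # hypothetical ensemble values (ppb)
s_voc = [1.0, 1.0, 2.0, 1.0]
prior = sensitivity_summary(s_nox, s_voc)
screened = sensitivity_summary(s_nox, s_voc, weights=[0.0, 1.0, 1.0, 0.0])
```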
The observational metrics also yield adjusted distributions of O3 sensitivity to DFW ANOx and AVOC emissions. M1 does not substantially change the mean estimates; however, M2 and M3 shift O3 sensitivity toward slightly higher SVOC and slightly lower SNOx than in the equal-weighted ensemble (Figure 9 and Table 7). This is also seen in the shift toward lower values of the ratio SNOx/SVOC under M2 and M3, even as predictions remain primarily NOx limited (Figure 9). The shift occurs because most of the cases accepted by the M2 and M3 screenings used the alternate (CB6) chemical mechanism and higher domain-wide ENOx (Table 4 and Figure 7), each of which makes O3 slightly more sensitive to VOC relative to NOx (Figures 3 and 4). M1 favored cases with CB6 (Table 4) but gave low weightings to cases with high ENOx (Figure 7).
Table 7. Comparison of Prior and Observation-Constrained Predictions of Episode-Average Sensitivities of 8 Hour Ozone at Denton to Dallas-Fort Worth Emissions^a
Ensemble | SNOx (μ ± σ), ppb | SVOC (μ ± σ), ppb
Prior (equal-weighted) | 6.79 ± 2.59 | 1.09 ± 0.81
Observation-constrained, M1 | 6.98 ± 2.19 | 1.03 ± 0.54
Observation-constrained, M2 | 6.67 ± 3.01 | 1.35 ± 0.74
Observation-constrained, M3 | 6.49 ± 2.83 | 1.28 ± 0.69
^a μ and σ denote mean and standard deviation, respectively.
In this study, measurements of O3 and NOx have been used to adjust probabilistic estimates of O3 concentrations and O3 responsiveness to NOx and VOC emission changes in the DFW region. Three distinct observation-based approaches have been applied to weight or screen an ensemble of model simulations that employ alternate model assumptions (structural uncertainty) and model input values (parametric uncertainty).
Screening analysis of structural uncertainties led us to focus on scenarios involving alternate choices for the biogenic emissions model and chemical mechanism. However, alternate meteorological scenarios were not available for this study, an important limitation that could be explored in further research. The omission of some key structural uncertainties, such as meteorology, made the ensemble spread too narrow (underdispersive) in simulating observed concentrations [Cohan et al., unpublished report, 2011] and may lead to errors in assessing the true values of the uncertain inputs considered here. For parametric uncertainties, impact analysis identified the specific emission rates, reaction rate constants, and boundary conditions that most influence O3 concentrations and their sensitivities to NOx and VOC emissions. Some parameters, such as O3 boundary conditions, were found to affect concentrations far more strongly than sensitivities, whereas the converse was true for other parameters such as EAVOC.
Traditional Monte Carlo analysis of uncertain inputs or model ensembles yields probabilistic (a priori) estimates of model outputs but assumes that each scenario is equally likely. This article has explored three of the many Bayesian and non-Bayesian approaches that could be used to adjust these a priori estimates by evaluating each case against observations. All three metrics tend to favor the CB6 chemical mechanism over CB05 for this region and episode, and two of the metrics favor scaling up NOx emission rates. These preferences enhanced O3 responsiveness to VOC emissions and dampened its sensitivity to NOx, although the region remained predominantly NOx limited.
The Bayesian and non-Bayesian metrics introduced here are just three of the many that could be chosen for observation-constrained analysis, and each has its own strengths and weaknesses. The non-Bayesian M2 and M3 use more of the observational data than the Bayesian M1 to prioritize simulations, but they discard simulations that fail to meet the test criteria. The Bayesian metric, on the other hand, retains all cases but weights them by their likelihood of reproducing the actual measurements. This metric, however, relies on the questionable assumption that the measurement errors are statistically independent and normally distributed. Alternate Bayesian metrics could be developed to avoid this assumption or to use less aggregated observational data. M2 has the practical advantage of mimicking EPA's standard test criteria for acceptable attainment demonstration modeling, but there is little conceptual basis for treating all acceptable cases equally or excluding all others.
A key assumption of this study is that the performance of a model case in simulating observed concentrations indicates the reliability of the input choices and output sensitivity predictions associated with that case. Because ambient monitors observe concentrations but not sensitivities, this assumption is both necessary and unverifiable. Dynamic evaluation of how pollutant concentrations respond to emission changes over weekly (i.e., weekday vs. weekend) or interannual (e.g., before and after a major emission trend) time scales can provide a proxy for ground-truthing sensitivity estimates [Dennis et al., 2010; Gilliland et al., 2008; Pierce et al., 2010; Yarwood et al., 2003].
Methods applied in this study could readily be extended to other regions (e.g., larger domains that include both urban and rural settings), episodes, and observational metrics. In particular, more disaggregated observations could be considered for the Bayesian metric; doing so would more fully capitalize on the spatial and temporal specificity of the available data, but would also tend to yield vastly different weightings among similar model cases because of the multiplicative nature of the Bayesian likelihood function [Cohan et al., unpublished report, 2011]. Future work could also consider observations taken aloft by aircraft, sondes, and satellites [Henderson et al., 2012; Yang et al., 2010]. Other model-constraining methods, such as Bayesian model averaging [Raftery et al., 2005], may be explored to account for errors in both the model and the measurements. Additional structural uncertainties, such as alternate meteorological inputs or model formulations, could expand the ensemble considered here.
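The multiplicative effect of disaggregation can be seen with a generic Gaussian likelihood. The sketch assumes a uniform prior, independent normal errors with a single hypothetical σ, and a log-space shift to avoid numerical underflow; it is not a transcription of equation (11), and the function names and data are hypothetical. Two cases whose predictions differ by only 1 ppb receive nearly equal weight against one episode-average value but very unequal weight against 100 disaggregated values, as measured by the effective sample size 1/Σw².

```python
import math


def gaussian_weights(cases, obs, sigma):
    """Posterior case weights from independent N(0, sigma^2) errors and a uniform prior."""
    log_lik = [-sum((m - o) ** 2 for m, o in zip(case, obs)) / (2.0 * sigma ** 2)
               for case in cases]
    peak = max(log_lik)
    unnorm = [math.exp(ll - peak) for ll in log_lik]  # shift by max to avoid underflow
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights):
    """1 / sum(w^2) for normalized weights: n if uniform, 1 if one case has all weight."""
    return 1.0 / sum(w * w for w in weights)


obs = [60.0] * 100                          # hypothetical disaggregated observations
case_a, case_b = [61.0] * 100, [62.0] * 100
w_agg = gaussian_weights([[sum(case_a) / 100], [sum(case_b) / 100]],
                         [sum(obs) / 100], sigma=5.0)
w_dis = gaussian_weights([case_a, case_b], obs, sigma=5.0)
```

Because the log-likelihood sums over observations, each added independent data point multiplies the weight ratio between the two cases, which is why the disaggregated weights collapse onto one case.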
The preparation of this manuscript is based on work supported by the State of Texas through the Air Quality Research Program administered by The University of Texas at Austin by means of a grant from the TCEQ and by National Science Foundation grant #087386. Although this article has been reviewed by the EPA and TCEQ and approved for publication, it does not necessarily reflect the policies or views of either agency. The baseline modeling for the study was provided by TCEQ.