Abstract
[1] Observations of CO and O_{3} from the Transport and Chemical Evolution over the Pacific (TRACE-P) campaign are compared with modeled distributions from the FRSGC/UCI CTM driven by the Oslo T63L40 ECMWF forecast meteorology. The model-measurement comparison is made within the context of how well the TRACE-P observations represent the springtime chemistry and ozone distributions over eastern Asia and the western Pacific in March 2001 and uses the four-dimensional (4D) extended domain from the model to provide unbiased statistics. A key question is whether the limited sampling density or mission strategy led to a statistically biased sample. To address this question, we examine a diverse range of statistical analyses of the observations of CO and O_{3}. The middle percentiles of the cumulative probability functions for CO in the free troposphere are representative (and reproduced by the CTM), but those in the boundary layer are not. The frequency of low-CO, stratospheric influence is well matched along flight tracks but is atypical of the extended domain. The percentiles of the latitude-by-height distribution of lidar O_{3} show how the CTM reproduces the nonrepresentative, clumpy nature of the observations but has too low a tropopause about the jet region (30–35°N). Adaptive kernel estimation of the 2D probability density of O_{3}-CO correlations shows a very good simulation of two different chemical regimes (stratospheric and polluted) that is quite different from the extended domain but also highlights the failure to predict CO > 400 ppb. Empirical orthogonal function analysis of the O_{3} vertical profiles shows how six EOFs can effectively describe the 4D structures of O_{3} over this entire domain. The latitude-by-longitude maps of the principal components provide an excellent test of the CTM simulation along flight tracks and clearly show the unique sampling of O_{3} events by the TRACE-P flights. In many cases the ability of the model to simulate the nonrepresentative observations implies a clear skill in matching the unique meteorological and chemical features of the region.
1. Introduction
[2] In trying to understand atmospheric composition on a global scale, instantaneous measurements of trace gases made at specific sites are often assumed to be representative of a larger region over a longer period. For example, the NOAA CMDL flask network [Dlugokencky et al., 1994; Conway et al., 1994] has provided the global mean latitudinal gradient as well as trends in the total atmospheric burdens of CH_{4} and CO_{2}, based solely on biweekly sampling at a number of remote sites. The assumption that these limited surface measurements can be used to integrate the total atmospheric burden is necessary but far-reaching. In other cases, many single-location, surface, or sonde measurement programs have argued that their data are representative of the entire boundary layer [e.g., Haszpra, 1999; Inoue and Masueda, 2001] or the free troposphere for a greater region [e.g., Navascues and Rus, 1991; Gallardo et al., 2000]. Compared with these examples, recent airborne regional campaigns have provided an extremely dense latitudinal, longitudinal, and vertical sampling over several weeks (e.g., TRACE-P, PEM, NARE, CEPEX, MINOS). These data sets provide extensive statistics on the atmospheric chemistry of the region and thus are often assumed to be fully representative of the location and period. Nevertheless, even such high-density campaign data are greatly undersampled compared with atmospheric variability, and, further, the design of campaigns to study specific processes may result in statistically biased sampling of the region. For example, the NASA Transport and Chemical Evolution over the Pacific (TRACE-P) measurement campaign [Jacob et al., 2003] had the primary goal of studying the export of pollution from eastern Asia. Understanding the representativeness of these campaign data would greatly strengthen their use in global studies.
[3] In addition, understanding the representativeness of a given set of observations can help evaluate the accuracy of model simulations of those observations. Matching the observed statistics when the observations are representative of the sampling region is one measure of skill; however, if the observations are a statistically biased sampling, then accurate model simulation implies a clear skill in matching the unique meteorological and chemical features of the region. When the sampling from a campaign is representative, a discrepancy between the flight-track observations and the model simulation cannot be dismissed as meteorological error and is more likely due to a fundamental or systematic error in the model, such as emission levels, chemistry, or large-scale spatial gradients. When the sampling is statistically biased, a mismatch between observations and model can perhaps be attributed to errors in the modeled meteorological fields, such as the height of convective outflows or the timing of frontal passage. Here we evaluate the accuracy of our global chemistry-transport model (CTM) simulations of the TRACE-P observations of CO and O_{3} within the context of how well these measurements represent the western Pacific during March 2001.
[4] For TRACE-P the community developed several independent, high-resolution, four-dimensional (4D) CTM simulations for that region and period that do a commendable job of matching many of the measurements [e.g., Kiley et al., 2003; Wild et al., 2003; Carmichael et al., 2003; Pierce et al., 2003; Liu et al., 2003; C. Mari et al., The effect of clean warm conveyor belts on the export of pollution from East Asia, submitted to Journal of Geophysical Research, 2004, hereinafter referred to as Mari et al., submitted manuscript, 2004]. Here we use the FRSGC/UCI/Oslo chemistry-transport model (T63L40 resolution with EC forecast fields) to generate a densely sampled 4D data set for each chemical species for the TRACE-P region. The accuracy of our CTM simulations is determined by comparing a wide range of statistical features of O_{3} (in situ and lidar) and CO (in situ) from (1) the observations and (2) the CTM simulations along flight tracks. The representativeness of the TRACE-P sampling is determined by parallel comparisons between (2) the CTM flight-track data and (3) the CTM simulations over the extended 4D domain (i.e., the eastern Asia-western Pacific region for the month of frequent flight measurements).
[5] A first approach in comparing such TRACE-P species measurements with model simulations is to plot the two overlapping time series for each flight. From parallel measurement-model plots of several species one can visually identify the temporal and spatial scales of variability and also the correlation of different species. Another, more quantitative method plots the modeled versus measured abundances as a scatter plot, yielding a measure of the accuracy of the simulation through a linear regression (for CO from TRACE-P, see Figure 1 in the work of Kiley et al. [2003]). Here we resort to a new range of statistical methods that allow us to compare not only the measurements with the model at the specific measurement locations but also the model sampled along flight tracks versus an extended 4D domain.
[6] Section 2 describes the FRSGC/UCI version of the CTM and our simulations of the extended TRACE-P domain. In addition, two statistical techniques used in this study are described: the adaptive kernel estimation for construction of 2D Probability Density Functions (PDFs) to characterize the O_{3}-CO correlations and Empirical Orthogonal Functions (EOFs) for analysis of vertical structures in the lidar O_{3} profiles. Cumulative probability distributions for in situ CO, in situ O_{3}, and lidar O_{3} are examined in section 3. Latitude-height sections of the O_{3} abundance from the lidar sampling are shown for the 10th, 50th, and 90th percentiles of both observation and model in section 4. The same CTM statistics are also presented for the extended TRACE-P domain rather than just the flight tracks. In section 5 the O_{3}-CO correlations observed along the flight tracks are compared with the model for individual flights. These data are combined into a single, two-dimensional probability distribution for all flights to compare with CTM simulation of both the flight tracks and the extended TRACE-P domain. In section 6, EOFs are used to describe the vertical features of the O_{3} distribution and to show where these features are prevalent off the coast of Asia. Conclusions regarding the accuracy of the CTM simulation of TRACE-P observations in relation to the representativeness of the observing strategy are given in section 7.
3. Cumulative Probability Distributions of CO and O_{3}
[17] The cumulative probability distributions of the observed CO and O_{3} abundances show a population that can be separated into background levels (typically the central 50% or more of the population) and into pollution events or stratospheric intrusions (evident in the extreme abundance ranges). Our CTM-simulated probability distributions of these flight-track data are a direct test of our ability to represent these populations for the TRACE-P sampling of the domain. Moreover, the same probability distributions from the extended 4D domain can identify whether the TRACE-P sampling is representative of the larger domain.
[18] We recognize that different parts of the domain will have different extreme populations and background levels, and thus we have split the domain into tropical (14°N–25°N) and extratropical (25°N–46°N) as well as boundary layer (0–1 km), free troposphere (1–10 km), and region of stratospheric influence (10–12 km), where air of stratospheric origin is more likely to be sampled. The boundaries are somewhat arbitrary, but we found that these choices sufficiently highlighted the different probability distributions in the observations. Table 1 gives the number of data points, N, for CO and O_{3} in situ measurements along the combined DC-8 and P-3B flight tracks, O_{3} measurements from DC-8 lidar sampling, and the sampling from the CTM in the extended 4D domain. N increases by about 2 orders of magnitude from in situ measurement to lidar sampling, and by about 1 order of magnitude from lidar sampling to 4D model data sampling. Table 2 summarizes the 25th percentile (first quartile or Q) and the 50th percentile (median or M) for the observations, the simulated observations along flight tracks, and the simulated distributions of the extended 4D domain. The cumulative probability distributions for CO are plotted in Figure 2a as a function of sigma, σ, the standard deviation of the normal distribution. Vertical dashed lines mark the 25% (Q, σ = −0.675) and 50% (M, σ = 0) probabilities. In each of the six regions, the observed distribution (solid line) is compared with the simulated observations (dashed line). The overlap of these distributions is a measure of the accuracy of the CTM in simulating the TRACE-P observations.
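The σ axis described above is the standard-normal quantile of the cumulative probability, so a normally distributed population plots as a straight line. A minimal Python sketch of that mapping follows. The function names, the plotting position (i + 0.5)/N, the linear-interpolation percentile, and the CO values are illustrative assumptions, not taken from the analysis itself.

```python
from statistics import NormalDist

def percentile(sorted_vals, p):
    """Linearly interpolated percentile (0 <= p <= 100) of a sorted list."""
    if not sorted_vals:
        raise ValueError("no data")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def sigma_of_probability(prob):
    """Map a cumulative probability (0 < prob < 1) to standard-normal sigma."""
    return NormalDist().inv_cdf(prob)

def cumulative_curve(values):
    """Return (sigma, abundance) pairs for a cumulative probability plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed plotting position (i + 0.5) / n keeps probabilities inside (0, 1).
    return [(sigma_of_probability((i + 0.5) / n), v) for i, v in enumerate(ordered)]

# Hypothetical CO abundances in ppb, for illustration only.
co_ppb = [86, 92, 101, 108, 115, 123, 140, 185, 226, 310]
ordered = sorted(co_ppb)
q = percentile(ordered, 25)            # first quartile, Q
m = percentile(ordered, 50)            # median, M
sigma_q = sigma_of_probability(0.25)   # position of the Q reference line
sigma_m = sigma_of_probability(0.50)   # position of the M reference line
curve = cumulative_curve(co_ppb)
```

Applying the same functions to observed, simulated flight-track, and extended-domain samples would give three curves on a common σ axis, which is the kind of overlay the comparison relies on.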
Table 1. Number of Data Points (N) for CO and O_{3} From DC-8 and P-3B In Situ Flight Measurements, O_{3} Lidar Data Along DC-8 Flight Tracks, and CTM Data for the Extended 4D Domain^{a}

| Data set | Latitude | 0–1 km | 1–10 km | 10+ km |
| --- | --- | --- | --- | --- |
| CO in situ data | 14–25°N | 793 | 3135 | 434 |
| CO in situ data | 25–46°N | 1624 | 4337 | 241 |
| O_{3} in situ data | 14–25°N | 821 | 3305 | 486 |
| O_{3} in situ data | 25–46°N | 1736 | 4638 | 271 |
| O_{3} lidar data | 14–25°N | 35814 | 0.29 M | 0.13 M |
| O_{3} lidar data | 25–46°N | 33300 | 0.27 M | 75431 |
| Extended 4D domain | 14–25°N | 1.0 M | 2.8 M | 0.3 M |
| Extended 4D domain | 25–46°N | 1.4 M | 5.2 M | 0.5 M |
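Table 2. TRACE-P In Situ CO and O_{3} Percentile Levels (in ppb) Compared With the CTM Simulation Along the Flight Tracks and for the Extended Domain^{a}

| Q:M for | Data set | Latitude | 0–1 km | 1–10 km | 10+ km |
| --- | --- | --- | --- | --- | --- |
| In situ CO | OBS | 14–25°N | 140:185 | 86:108 | 80:87 |
| In situ CO | OBS | 25–46°N | 205:226 | 114:138 | 68:89 |
| In situ CO | Simulated OBS | 14–25°N | 130:171 | 90:99 | 81:94 |
| In situ CO | Simulated OBS | 25–46°N | 201:214 | 119:139 | 66:101 |
| In situ CO | Extended domain | 14–25°N | 116:149 | 88:100 | 85:95 |
| In situ CO | Extended domain | 25–46°N | 178:207 | 113:137 | 15:45 |
| In situ O_{3} | OBS | 14–25°N | 28:46 | 33:48 | 22:29 |
| In situ O_{3} | OBS | 25–46°N | 53:57 | 53:58 | 65:69 |
| In situ O_{3} | Simulated OBS | 14–25°N | 31:49 | 29:34 | 26:29 |
| In situ O_{3} | Simulated OBS | 25–46°N | 66:70 | 51:59 | 50:94 |
| In situ O_{3} | Extended domain | 14–25°N | 22:39 | 26:36 | 24:33 |
| In situ O_{3} | Extended domain | 25–46°N | 58:62 | 53:60 | 85:225 |
| Lidar O_{3} | OBS | 14–25°N | 28:41 | 32:46 | 24:36 |
| Lidar O_{3} | OBS | 25–46°N | 47:54 | 53:58 | 50:69 |
| Lidar O_{3} | Simulated lidar OBS | 14–25°N | 32:48 | 29:35 | 25:33 |
| Lidar O_{3} | Simulated lidar OBS | 25–46°N | 62:69 | 51:59 | 45:75 |
| Lidar O_{3} | Extended domain | 14–25°N | 22:39 | 26:36 | 24:37 |
| Lidar O_{3} | Extended domain | 25–46°N | 58:62 | 53:60 | 158:500 |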
[19] Below the 50th percentile, the CO distribution is extremely well simulated (typically within 5 ppb) by the model for all regions except the tropical boundary layer, where the CTM uniformly underestimates the observed CO by about 12 ppb. In CTM sensitivity tests with a range of CO-like tracers (not shown here), we find that much of the observed variance (e.g., as measured by M − Q), including fine-scale features, is driven by large-scale and synoptic-scale systems acting on the global-scale latitudinal gradients in CO, rather than by the nearby east Asian emissions. Thus we take this agreement to mean that the large-scale CO gradients and meteorological systems are well simulated. Above the 75th percentile, however, the simulations are uniformly much smaller than observed. One cause might be the failure of the CTM to resolve urban plumes, for example, intense, small-scale pollution events such as the Shanghai plume [Russo et al., 2003; Simpson et al., 2003; Talbot et al., 2003]. However, for the distributions shown in the figure (CO < 300 ppb), the observed probability distributions are unaffected by spatial filtering at the CTM resolution, and hence these probability distributions should be resolved by the model. Thus the uniform underprediction of the CO probabilities at the upper end of the distribution is likely due to an underestimate of CO emissions from east Asian sources [Palmer et al., 2003] or possibly to chemical influences rather than lack of model resolution (more supporting evidence from the O_{3}-CO correlations is presented in section 5).
[20] The difference between the flight-track simulations (dashed line) and the extended-domain probabilities (dotted line) is also shown in Figure 2. These latter distributions have three orders of magnitude more points than the in situ sampling and hence smoother curves. In the free troposphere the Q and M values show no obvious statistical bias, but in the boundary layer there is a preference for sampling higher CO, a possible indication of chasing pollution outflow from the continent. For CO abundances greater than 200 ppb at all heights, the extended-domain sampling includes values over the continent and thus shows greater probabilities for these high-CO events than the simulated flight tracks. The extremely low CO abundances in the extratropics (1–10 km and 10–12 km) indicate air of stratospheric origin, and their frequency ranges from a few percent below 10 km height to as much as 50% in the 10–12 km region. The flight-track sampling greatly underestimates their frequency both above and below 10 km height; this reflects the strong latitude and height gradients over this domain and sampling that is weighted toward the southern and lower parts of the range.
[21] Ozone comparisons show both successful simulations and some obvious model errors. In the free troposphere the observed extratropical probability distribution for the in situ data (Figure 2b) is well matched by the model for the central 50% of the distribution. In the tropics, however, the model accurately matches only the lowest 25% of the distribution and consistently underestimates ozone in the remaining 75% of the distribution by 10 ppb or more. For the boundary layer, the model is biased high for both tropics and extratropics. The offset between boundary layer and free troposphere is large and consistently in error for all latitudes: Observations have a shift of about +5 ppb (boundary layer being less than free troposphere for both Q and M); the model predicts an opposite shift of about −10 ppb. This model error can best be explained if the continental boundary-layer sources of ozone from Asia are exaggerated [Wild et al., 2003]. A separate error, the underestimate of tropical ozone (and also the upper 50% of CO), could be due to a missing source, most likely an underestimate of the episodic emissions from biomass burning and lightning during the TRACE-P period.
[22] For stratospherically influenced regions the comparison with in situ data is erratic due to the small number of points and the large variability induced mostly by stratospheric intrusions. If we expand the comparison to the DC-8 lidar data (Figure 2c), the number of these high-altitude points increases from a few hundred to a hundred thousand and the probability distributions become well defined. For this sampling the model successfully matches the observations for the lowest 50% of the distribution and predicts about the right frequency of the stratospheric influence (O_{3} > 100 ppb). Including the lidar data does not change the previous conclusions and only reemphasizes the systematic error in the boundary layer found with the in situ data (see also later discussion on ozone EOFs).
4. Latitude-Height Distributions of O_{3} From the Lidar Sampling
[23] The latitude-height distribution of tropospheric O_{3} can help identify the tropopause, stratospheric intrusions, pollution events, and the large-scale gradient between tropics and extratropics. The O_{3} lidar data from TRACE-P provide the extensive sampling needed to define this latitude-height section [Browell et al., 2003]. Here, we examine the probability distributions of these latitude-height sections, showing how the sequence from 10th, to 50th, to 90th percentiles (Figure 3) can be used to identify the statistical distribution of tropopause heights, those regions impacted by stratospheric intrusions, and the representativeness of the TRACE-P sampling. Each percentile figure shows the observations (top panel), the CTM simulation along flight tracks (middle panel), and the CTM simulation of the extended 4D domain (bottom panel). The CTM simulation along the flight tracks follows the lidar sampling, including its missing data points, while the extended-domain statistics assume that O_{3} is measured from 0 to 18 km, even in the presence of clouds. The white reference line in the extended-domain plots marks the upper boundary of the flight-track data used in the analysis.
[24] In the tropics, there is clear evidence of a high-ozone region at 5–10 km height near 17°N in the TRACE-P sampling. It is seen in both the observations and the CTM. This region is clearly seen at all percentiles from 10th to 90th, and moreover the ozone abundance increases slowly from about 45 ppb at the 10th percentile to about 75 ppb at the 90th percentile, indicating an extensive region of low variability. Even at the 90th percentile, the ozone abundance remains well below stratospheric intrusion levels. This region was sampled on DC-8 Flight 6, and the high O_{3} levels have been attributed to biomass burning [Browell et al., 2003]. The TRACE-P sampling clearly singles out this event (i.e., it is not seen on the extended domain) and shows the success of the CTM simulation in reproducing it at all statistical levels. Overall in the free troposphere, the CTM underestimates ozone abundance by about 10 ppb, as seen also in the probability distributions in Figures 2b and 2c.
[25] In the midlatitudes, the region of high ozone abundance at 6–12 km near 28°N can be clearly seen as the remnant of a stratospheric intrusion; at the 10th percentile it has similar enhancements to the 17°N region, but the ozone abundance jumps to more than 75 ppb at the 50th percentile and becomes merged into the stratosphere (>100 ppb) at the 90th percentile. Several flights intercepted stratospheric intrusions in this region behind midlatitude cyclones, indicated in in situ measurements by low CO and high wet-bulb potential temperatures (Mari et al., submitted manuscript, 2004). However, the greatest contributions to the O_{3} feature at 28°N come from DC-8 Flight 16, which made a transect at this latitude on 30 March and intercepted a deep intrusion. The greater sampling along this transect explains why the feature is clearly seen in the median as well as the high end of the distribution. Another stratospheric intrusion, visible only at the 90th percentile, is observed in the tropics at 22°N between 7 and 14 km; while several flights contributed to the statistics over this region, a stratospheric intrusion was sampled on only one flight (DC-8 Flight 14). The CTM simulation captures this intrusion, but it occurs at a slightly lower altitude and a little further north. In summary, the CTM captures the basic statistical features of these intrusions along the flight tracks. It simulates the distribution of elevated ozone (55–95 ppb) as it mixes into the troposphere through the range of percentiles. This success, plus the overall excellent simulation of TRACE-P ozone, ozone sondes, TOMS ozone columns, and the global mean stratosphere-to-troposphere ozone flux [Wild et al., 2003], indicates a good simulation of the dispersion and mixing of stratospheric intrusions.
[26] The statistics for the extended domain show a layer of enhanced ozone, apparently of stratospheric origin, extending from the subtropical break in the tropopause down into the tropics at about 6 km height. At all percentiles, however, they show none of the individual features picked up by the TRACE-P flight tracks. Thus tropical ozone appears to be enhanced in the midtroposphere by stratospheric ozone mixing down from the subtropical jet. In comparing the CTM flight tracks with the extended domain, it is clear that the TRACE-P mission favored sampling of pollution sources with enhanced boundary-layer ozone, but this only emphasizes the CTM exaggeration of boundary-layer ozone when compared with observations, as discussed above. All three latitude-height sections show the descent of stratospheric air (as measured by the 100 ppb contour) by 2–3 km in height as one progresses from 10th to 90th percentile.
5. Probability Density Function in CO-O_{3} Domain
[27] The patterns of correlation between CO and O_{3} can identify mixing between different chemical regimes in the atmosphere and further provide information on the photochemical evolution of O_{3} in polluted plumes [Parrish et al., 1993]. A first approach to O_{3}-CO correlations is to examine the scatter plots for all in situ measurements. As an example, the one-minute in situ measurements from DC-8 flights 13 and 15 are compared with the CTM simulation in Figure 4. In this analysis, the 1-min observations are taken as is, with no spatial smoothing to match the CTM grid. In the figure, the dashed lines (the same in all panels of Figures 4 and 5) are a least-squares fit to all observations for low-CO and high-CO regions (see Figure 5 caption or text below). DC-8 flights 13 and 15 show occurrences of low-CO stratospheric air that are well simulated, including both magnitude and slope.
[28] Model simulations of individual flights are generally excellent, agreeing with the observation of stratospheric air (low CO, very high O_{3}) and pollution plumes (high CO, moderate O_{3}). A combined scatter plot with all the data points would not be easy to interpret, and thus we apply adaptive kernel estimation (section 2.2) to derive two-dimensional probability density functions (PDFs) for both measurements and model. The adaptive kernel method generates smooth PDFs without spurious maxima from the more than 10,000 individual points, as shown in Figure 5.
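For readers unfamiliar with adaptive kernel estimation, the following Python sketch shows one common two-stage form of the technique: a fixed-bandwidth pilot estimate sets a local bandwidth factor for each data point, so kernels widen where data are sparse. The Gaussian product kernel, the exponent alpha = 0.5, the bandwidths, and the sample points are all assumptions made for illustration; the estimator actually used here is the one defined in section 2.2 and may differ in these details.

```python
import math

def gaussian2d(dx, dy, hx, hy):
    """Product Gaussian kernel with bandwidths hx, hy; integrates to 1."""
    z = (dx / hx) ** 2 + (dy / hy) ** 2
    return math.exp(-0.5 * z) / (2.0 * math.pi * hx * hy)

def fixed_kde(points, x, y, hx, hy):
    """Fixed-bandwidth density estimate at (x, y), per unit area."""
    return sum(gaussian2d(x - px, y - py, hx, hy) for px, py in points) / len(points)

def adaptive_kde(points, hx, hy, alpha=0.5):
    """Return a density function whose kernels widen where the pilot density is low."""
    pilot = [fixed_kde(points, px, py, hx, hy) for px, py in points]
    # The geometric mean of the pilot densities sets the reference level.
    log_g = sum(math.log(p) for p in pilot) / len(pilot)
    g = math.exp(log_g)
    lam = [(p / g) ** (-alpha) for p in pilot]  # local bandwidth factors

    def density(x, y):
        total = 0.0
        for (px, py), factor in zip(points, lam):
            total += gaussian2d(x - px, y - py, hx * factor, hy * factor)
        return total / len(points)

    return density

# Hypothetical (CO, O3) pairs in ppb; the last mimics stratospheric air.
obs = [(95, 40), (110, 45), (120, 50), (180, 55), (250, 60), (90, 150)]
pdf = adaptive_kde(obs, hx=20.0, hy=10.0)   # density in ppb^-2
```

Because every kernel integrates to one, the resulting density integrates to one over the CO-O_{3} plane, which is what allows contours to be read as probability per ppb^{2}.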
[29] In calculating these PDFs, we have chosen to spatially smooth the observations to more closely match the CTM grid. The observations show distinct features on very small scales, such as 100-m-thick laminae, which cannot be resolved by the CTM grid (about 500 m vertical by 180 km horizontal). Thus we define a triangular weighting function with a half-height, half-width of 500 m in the vertical and 180 km in the horizontal, and we process the 1-min in situ observations from each flight according to their vertical and horizontal separations. For each point, all measurements that fall within a 180 km radius and within 500 m in height contribute to the value at that point.
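The smoothing step can be sketched in Python as below. This is one possible reading of the description, stated as an assumption: the weight is 1 at zero separation, falls linearly to 0.5 at the half-width, and is cut off beyond it, with the horizontal and vertical weights multiplied together. The flat x/y coordinates in kilometers and the sample values are hypothetical; real flight data would need great-circle distances.

```python
import math

HALF_WIDTH_Z_M = 500.0     # vertical half-width at half-height (from the text)
HALF_WIDTH_H_KM = 180.0    # horizontal half-width at half-height (from the text)

def triangle(distance, half_width):
    """Triangular weight: 1 at zero separation, 0.5 at half_width, 0 beyond it."""
    if distance > half_width:
        return 0.0
    return 1.0 - distance / (2.0 * half_width)

def smooth(samples):
    """samples: list of (x_km, y_km, z_m, value). Returns the smoothed values."""
    smoothed = []
    for x0, y0, z0, _ in samples:
        wsum = 0.0
        vsum = 0.0
        for x, y, z, v in samples:
            w_h = triangle(math.hypot(x - x0, y - y0), HALF_WIDTH_H_KM)
            w_z = triangle(abs(z - z0), HALF_WIDTH_Z_M)
            wsum += w_h * w_z
            vsum += w_h * w_z * v
        # wsum >= 1 because each point contributes to itself with weight 1.
        smoothed.append(vsum / wsum)
    return smoothed

# Hypothetical 1-min CO samples (ppb): two neighbors and one distant point.
samples = [(0.0, 0.0, 3000.0, 100.0),
           (90.0, 0.0, 3000.0, 200.0),
           (1000.0, 0.0, 3000.0, 500.0)]
smoothed = smooth(samples)
```

In this toy case the two nearby samples are pulled toward each other, while the distant 500 ppb sample is left unchanged because nothing else falls inside its window.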
[30] The top panel in Figure 5 shows the PDF for the observed CO-O_{3} data points during TRACE-P. The contours are logarithmic (base 10) and denote the probability per unit area in ppb^{2}. For example, the probability of observing CO between 200 and 201 ppb at the same time as O_{3} between 50 and 51 ppb is about 10^{−4}. The integral of the PDF over the entire range up to 1000 ppb in CO and O_{3} is nearly 1. The middle panel of Figure 5 shows the equivalent PDF for the CTM flight-track data, and the bottom panel shows that for the extended 4D domain. The dashed line in all three panels is the same: The small positive O_{3}/CO slope (+0.06) is a fit to the spatially filtered observations that generally describe background plus pollution events (CO > 200 ppb or O_{3} < 100 ppb); the large negative slope (−3.4) is a fit to the stratospherically influenced air (CO < 200 ppb and O_{3} > 100 ppb).
[31] Both observed and modeled flight-track PDFs show almost identical patterns of stratospheric influence, even to the bimodality due to TRACE-P sampling that is not seen in the extended-domain PDF. The probability of high-O_{3} intrusions is accurately modeled in terms of magnitude, slope, and probability. The high-CO region, unfortunately, is not well simulated. Even with the spatial filtering, the observations show a significant probability for CO > 500 ppb, where the CTM shows none. On the other hand, the high-CO regions of greater probability (i.e., orange: PDF ≥ 10^{−5} per ppb^{2}) are well modeled. There is a clear statistical bias: along the TRACE-P flight tracks this probability region extends to 400 ppb (in both observations and model), whereas in the extended CTM 4D domain it extends only to 300 ppb.
[32] The O_{3}/CO slope for the high-CO events is often used to infer the amount of O_{3} exported from regional pollution [Parrish et al., 1993]. Taking the TRACE-P statistics as a whole, rather than selecting individual events, we find that the derivation of a single slope to characterize the observations is difficult. The mean slope of +0.06 accurately characterizes the very high probability region (red, PDF ∼ 10^{−4} per ppb^{2}), but this slope is smaller for the moderate-to-lower probability regions that characterize CO > 300 ppb. For both model and measurements, the O_{3}/CO slope for this extended region is almost zero. In varying different criteria for selecting the data (e.g., considering only CO > 200 ppb, tropics versus extratropics), we find that this O_{3}/CO slope varies considerably and includes also small negative slopes. This is consistent with measurements made at Sable Island in pollutant outflow from North America, where the correlation of O_{3} and CO is very poor in spring and autumn, and the ratio is close to zero in March [Parrish et al., 1998]. One consistent picture from the observations is that the slope in the tropics (e.g., 0.08 for CO > 200 ppb) is greater than that in the extratropics (0.03 for CO > 200 ppb). Both values are much less than that derived from a simple mixing curve between the median values of O_{3} and CO in the tropics and extratropics (about +0.3). Use of the O_{3}/CO slope alone as a test of photochemistry or a measure of O_{3} production will require analysis on a case-by-case basis, and it is likely to work less well in spring than during summer when O_{3} production is more rapid.
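The dependence of the O_{3}/CO slope on the selection criteria can be explored with a simple least-squares fit applied separately to each regime. The Python sketch below uses the regime definition given in paragraph [30] (stratospheric: CO < 200 ppb and O_{3} > 100 ppb; everything else: background plus pollution). Ordinary least squares with O_{3} as the dependent variable is an assumption, the function names are hypothetical, and the data are synthetic points placed exactly on two lines.

```python
def is_stratospheric(co, o3):
    """Regime test from the text: CO < 200 ppb and O3 > 100 ppb."""
    return co < 200.0 and o3 > 100.0

def fit_line(pairs):
    """Ordinary least-squares fit o3 = slope * co + intercept."""
    n = len(pairs)
    mean_co = sum(co for co, _ in pairs) / n
    mean_o3 = sum(o3 for _, o3 in pairs) / n
    sxx = sum((co - mean_co) ** 2 for co, _ in pairs)
    sxy = sum((co - mean_co) * (o3 - mean_o3) for co, o3 in pairs)
    slope = sxy / sxx
    return slope, mean_o3 - slope * mean_co

def regime_slopes(pairs):
    """Fit the background-plus-pollution and stratospheric regimes separately."""
    strat = [p for p in pairs if is_stratospheric(*p)]
    other = [p for p in pairs if not is_stratospheric(*p)]
    return fit_line(other), fit_line(strat)

# Synthetic (CO, O3) pairs in ppb, lying exactly on two illustrative lines.
background = [(co, 0.06 * co + 45.0) for co in range(100, 401, 50)]
stratospheric = [(co, -3.4 * co + 600.0) for co in range(60, 141, 20)]
(slope_bg, icpt_bg), (slope_st, icpt_st) = regime_slopes(background + stratospheric)
```

Replacing `is_stratospheric` with another predicate (for example, a latitude band or a CO threshold) is a direct way to see how sensitive the fitted slope is to the choice of subset.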
6. Empirical Orthogonal Function Analysis of Ozone Profiles
[33] The dense vertical sampling of O_{3} by lidar allows us to characterize the vertical structures of ozone over the west Pacific. We analyze the covariance matrix constructed from the horizontal-temporal sampling of O_{3} vertical profiles up to a height of 8.3 km, as described in section 2.3. The EOFs (dimensionless) are patterns of vertical structure of the variance about the mean profile. The Principal Components (PCs, in units of ppb) are the coefficients of the EOFs (one for each EOF) derived from fitting a single profile to the mean profile plus EOFs. In this study each profile, and hence each set of PCs, is a function of latitude, longitude, and time.
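The relationship between the covariance matrix, the EOFs, and the PCs can be sketched with NumPy as follows. This assumes the textbook formulation (eigenvectors of the sample covariance of mean-removed profiles, PCs by projection) and complete profiles without missing data; the procedure actually used is the one in section 2.3, and the random profiles here are purely illustrative.

```python
import numpy as np

def eof_analysis(profiles, n_modes=6):
    """profiles: array (n_profiles, n_levels) of O3 in ppb with no gaps.

    Returns the mean profile, EOFs (n_modes, n_levels), PCs
    (n_profiles, n_modes, in ppb), and the variance fraction per mode.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean_profile = profiles.mean(axis=0)
    anomalies = profiles - mean_profile
    # Covariance between height levels, sampled over all profiles.
    cov = anomalies.T @ anomalies / (profiles.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_modes]     # largest variance first
    eofs = eigvecs[:, order].T                      # unit-length patterns
    pcs = anomalies @ eofs.T                        # coefficients, one per EOF
    variance_fraction = eigvals[order] / eigvals.sum()
    return mean_profile, eofs, pcs, variance_fraction

# Hypothetical data: 200 profiles on 8 levels.
rng = np.random.default_rng(0)
profiles = 50.0 + rng.normal(0.0, 10.0, size=(200, 8))
mean_profile, eofs, pcs, frac = eof_analysis(profiles, n_modes=6)
```

Each row of `pcs` belongs to one profile, so attaching the latitude, longitude, and time of that profile to the row gives the geographic and temporal dependence of the PCs described above.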
[34] The six leading, normalized EOFs plus the mean profile are shown in Figure 6 for the lidar observations and for three different ways of sampling the model. The EOF vertical structures, even to EOF6, are remarkably similar across all four data sets. There is a tendency for the lidar-observation EOFs (dotted lines) to look more like those from the extended 4D domain (solid lines) than like the flight-track EOFs (dot-dashed and dashed lines). Within the CTM data there is a systematic downward shift in the EOF structures for the TRACE-P flight tracks relative to the extended 4D domain but hardly any difference due to the lidar missing data (dashed versus dot-dashed). The impact of missing data is also barely visible for the mean profile. The mean profile from either of the CTM data sets reemphasizes the model error in boundary-layer O_{3}, with a vertical gradient opposite to observations as discussed in section 3. The variability structures are quite reasonable up to EOF5 or EOF6, and hence this error is likely due to a systematic, time-independent model bias that is apparently related to the overproduction of O_{3} in the boundary layer [Wild et al., 2003].
[35] The variances (in percentage) captured by each EOF for each data set are listed in Table 3. The first six EOFs comprise 95% of the total variance from the lidar observations and 99% of that from the CTM data. This difference might be attributed to the limited vertical resolution of the model. There is a systematic difference between the model and observations in the partitioning of variance among the EOFs: The model overestimates the EOF1 variance by a factor of 1.3 and underestimates that from EOFs 2–6 by factors between 1.4 and 2. EOF1 appears to represent the variance due to tropopause height, and thus the overestimation of EOF1 variance in the model is consistent with the O_{3} 90th percentile distribution shown in Figure 3, which shows that the model predicts a lower tropopause and extensive stratospheric descent northward of 27°N.
Table 3. Variance in Percentage (%) Captured by the First Six EOFs
EOF                      1      2      3     4     5     6
Lidar observations       57.41  20.38  8.89  4.29  2.31  1.42
CTM simulated lidar      75.14  13.79  4.94  3.04  1.30  0.72
CTM complete profiles    76.70  13.34  3.91  2.81  1.31  0.70
CTM 4D extended domain   80.88  11.69  3.26  1.81  0.90  0.52
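As an illustration of how variance fractions such as those in Table 3 arise, the following is a minimal sketch of an EOF decomposition by singular value decomposition. It assumes a complete (gap-free) matrix of O_{3} profiles with one profile per row; the function name `eof_analysis` and the array layout are hypothetical and are not taken from the analysis code used in this study. The two lists reproduce the first two rows of Table 3 and are used only to check the cumulative totals quoted in the text.

```python
import numpy as np

def eof_analysis(profiles, n_modes=6):
    """EOFs of a (n_samples, n_levels) array of profiles with no missing values.

    Returns the mean profile, the leading EOFs (rows, unit length),
    the principal components, and the percentage of variance per EOF.
    """
    mean_profile = profiles.mean(axis=0)
    anomalies = profiles - mean_profile          # centered data
    # Rows of vt are orthonormal vertical structures (EOFs).
    _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_pct = 100.0 * s**2 / np.sum(s**2)   # variance share of each mode
    eofs = vt[:n_modes]
    pcs = anomalies @ eofs.T                     # amplitude of each EOF per profile
    return mean_profile, eofs, pcs, variance_pct[:n_modes]

# Values copied from Table 3 (percent of variance, EOF1 to EOF6).
TABLE3_LIDAR = [57.41, 20.38, 8.89, 4.29, 2.31, 1.42]
TABLE3_CTM_LIDAR = [75.14, 13.79, 4.94, 3.04, 1.30, 0.72]

if __name__ == "__main__":
    print(round(sum(TABLE3_LIDAR), 2), round(sum(TABLE3_CTM_LIDAR), 2))
    print(round(TABLE3_CTM_LIDAR[0] / TABLE3_LIDAR[0], 2))
```

The sums of the two table rows are 94.70 and 98.93, consistent with the 95% and 99% quoted above, and the EOF1 ratio is about 1.3.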
[36] To demonstrate how these EOFs describe the O_{3} abundance, in Figure 7 we reconstruct an O_{3} latitudinal transect at 125.6E at 05Z on 21 March by adding successive EOFs to the mean profile (shown as the leftmost panel in Figure 6). At that time, DC8 flight 13 was flying toward the Yellow Sea from 25N to 32N approximately along the same longitude, and the detailed meteorological analysis can be found in the work of Mari et al. (submitted manuscript, 2004). The top panel of the figure shows the O_{3} latitudeheight distribution constructed from the mean profile plus EOF1, and successive panels show the cumulative addition of EOF2 through EOF6. The input O_{3} latitudeheight distribution (panel below EOF6) is accurately reconstructed by these six EOFs. The residual O_{3} (input minus reconstructed, lowest panel) shows that only smallscale, smallamplitude features remain. Some understanding of what the EOFs may represent can be seen in this sequence: EOF1 restores the largescale latitudinal gradient, distinguishing tropics from extratropics, especially in terms of tropopause height and largescale descent of stratospheric air; EOF2 restores the high O_{3} abundances in the extratropical boundary layer and the low O_{3} abundances in the tropical boundary layer; EOF3 captures the high O_{3} band around 5–7 km both in midlatitudes and tropics; EOF4 reemphasizes the boundarylayer gradients; and the stratospheric intrusion (about 6 km, 25N–30N) is finally outlined with EOF5 and EOF6.
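The successive reconstruction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figure 7: the function name `cumulative_reconstruction` is hypothetical, the profiles are assumed to be complete, and the EOFs are assumed to be orthonormal rows.

```python
import numpy as np

def cumulative_reconstruction(profiles, mean_profile, eofs):
    """Rebuild profiles by adding EOFs one at a time to the mean profile.

    profiles: (n_profiles, n_levels); eofs: (n_modes, n_levels), orthonormal rows.
    Returns the list of partial reconstructions (mean + EOF1, mean + EOF1..2, ...)
    and the residual (input minus the final reconstruction).
    """
    anomalies = profiles - mean_profile
    pcs = anomalies @ eofs.T                      # projection onto each EOF
    recon = np.tile(mean_profile, (profiles.shape[0], 1))
    partial = []
    for k in range(eofs.shape[0]):
        # Add the contribution of mode k to every profile.
        recon = recon + np.outer(pcs[:, k], eofs[k])
        partial.append(recon.copy())
    residual = profiles - recon
    return partial, residual
```

Because the EOFs are orthonormal, each added mode can only reduce (or leave unchanged) the norm of the residual, which is why the residual panel retains only small-amplitude structure once the leading modes are included.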
[37] The timeaveraged PC distribution gives a good indication of preferred geographical locations for a given EOF. Figure 8 shows the geographic distribution of the PCs of EOF1 through EOF5 for the CTM extended 4D domain (left panels, top to bottom respectively), the CTM lidar simulations (center), and the lidar observations (right). For all three data sets (left, center, right), we project the centered data (i.e., mean profile removed) onto one set of EOFs, those derived from the 4D CTM data (solid curves in Figure 6). For the center and right panels, the principal components should therefore be denoted pseudoPCs. Also for the center and right panels, projecting an incomplete O_{3} profile (even with mean removed) onto any EOFs produces large, unrealistic values, so we fill the missing data with the corresponding CTM data values. Thus the observed lidar data set has all missing data replaced by modeled values, and the simulated lidar data set used for the PCs has complete profiles along flight tracks. All PCs are averaged over time (3 March to 3 April 2001). For the 4D data the horizontal resolution is that of the CTM, and for flight track data (center and right panels), the oneminute data are averaged over 2° × 2° bins in longitude and latitude. Note that the units are ppb of O_{3} and have different amplitudes for each EOF but the same scale across the three data sets.
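The two processing steps described in this paragraph, gap filling followed by projection onto a fixed set of EOFs, and averaging in 2° × 2° bins, can be sketched as below. This is an illustrative sketch only: the function names `pseudo_pcs` and `bin_average` are hypothetical, missing lidar values are assumed to be encoded as NaN, and bins are labeled by their lower-left corner, a convention chosen here for convenience rather than taken from the paper.

```python
import numpy as np

def pseudo_pcs(obs, model, mean_profile, eofs):
    """Project gap-filled profiles onto a fixed set of EOFs.

    obs, model: (n_profiles, n_levels); NaN in obs marks missing lidar data.
    eofs: (n_modes, n_levels), e.g. derived from the extended-domain data.
    """
    filled = np.where(np.isnan(obs), model, obs)   # replace gaps with model values
    return (filled - mean_profile) @ eofs.T        # one pseudo-PC per EOF

def bin_average(lon, lat, values, size=2.0):
    """Average values in size x size degree bins keyed by lower-left corner."""
    ix = np.floor(np.asarray(lon) / size).astype(int)
    iy = np.floor(np.asarray(lat) / size).astype(int)
    bins = {}
    for i, j, v in zip(ix, iy, values):
        bins.setdefault((int(i), int(j)), []).append(v)
    return {(i * size, j * size): float(np.mean(v)) for (i, j), v in bins.items()}
```

Filling with model values keeps the projection well behaved, but it also pulls the observed pseudo-PCs toward the model wherever the lidar coverage is sparse, which is relevant when interpreting the level of model-measurement agreement.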
[38] As shown in the left panels in Figure 8, PC1 through PC3 have more uniform zonal distributions, while PC4 and PC5 show high values just off the Asian coast in the subtropics. This pattern is consistent with the decomposition of the single transect in Figure 7: the first three EOFs capture mostly the variance associated with the largescale O_{3} background, and the higher EOFs explain fine structures that tend to have more localized distributions. PC1 is mostly associated with the stratospheric influence below 8.3 km and exhibits a maximum around 135E and 45N, decreasing monotonically toward the tropics. PC2, whose EOF has a deep boundarylayer structure, has a maximum band around 32N and decreases both northward and southward, with an equally large negative minimum in the tropics. PC3, whose EOF has a maximum around 5–6 km, has a positive maximum amplitude in the tropics near 14N and a smaller negative minimum near 22N in the subtropics. The fact that the maximum occurs near the tropics seems to suggest that this feature is related to biomass burning. PC4 has a positive maximum distribution along the southeast coast of Asia; PC5 has a maximum located west of Taiwan and north of Hong Kong. These features, in contrast to PC1 through PC3, are probably associated with variability from local pollution plumes.
[39] The CTM flight track data (center panel) capture PC distributions broadly similar to those of the extended 4D results. Nevertheless, the values are higher than those from the 4D data, indicating a statistical bias, for example, in stratospheric intrusions north of Japan (PC1) and in pollution plumes near the coast (PC4 and PC5). Comparing the observed PCs (right panel) with those from the CTM flight tracks, the agreement is excellent for PC1 and PC2 and still quite good for PC3 and PC4 (at least in terms of general pattern), but much of the coherence is lost by PC5.
[40] In summary, this EOF/PC analysis of the TRACEP O_{3} profiles has clearly quantified the statistical biases in TRACEP sampling and identified them with specific profile structures and specific locations. In addition, there is generally good agreement between model and observations for the geographic patterns of PC1 through PC4. However, some caution is needed in using this approach as a modelmeasurement validation tool: the filling of lidar missing data with model data may have enhanced this agreement, and additional approaches to analyzing the lidar data for vertical structures are needed.
7. Conclusion
[41] We present a range of atypical statistical analyses of the TRACEP observations of CO and O_{3} to evaluate the representativeness of the TRACEP observations and to provide possible new insights on the accuracy of chemistrytransport models. Representative is used here to mean that the data along the flight tracks have the same statistical properties as a uniform sampling of an extended region over eastern Asia and the western Pacific. This evaluation uses the modeled distributions from the FRSGC/UCI CTM driven by the Oslo T63L40 ECMWF forecast meteorology (1.9° × 1.9° × 500 m) to compare flight track data with those from an extended 4D domain defined arbitrarily as 14N to 46N, 100E to 150E, up to 18 km in height, and from 3 March to 3 April 2001. We assume that the extended domain provides unbiased statistics.
[42] We first focus on the central 50% of the distribution for CO and O_{3}, since these values can be thought of as background air and generally avoid pollution plumes or stratospheric influences that appear at the extreme probabilities. For CO, outside of the boundary layer (0–1 km) and the region of dominant stratospheric influence (>10 km in midlatitudes), the 25th and 50th percentiles from the CTM along the flight tracks are basically the same as from the extended 4D domain, and, moreover, these agree with the TRACEP observations. In the boundary layer, the CTM flight tracks are systematically 10–20 ppb greater than the extended domain even though the domain includes continental emissions. This indicates that TRACEP sampling is biased toward sampling pollution plumes. Furthermore, both CTM extended domain and flight tracks are less than the observations, particularly so for the tropical region. Thus the cumulative probability functions for CO support the generally excellent CTM simulation of the observations but for a systematic underestimate of nearby emissions. Even the frequency of lowCO stratospheric influence is well matched along flight tracks but is atypical of the extended domain. For O_{3} these same probability functions clearly point out problems: excessive boundarylayer production in midlatitudes but a missing source in the tropics.
[43] The 10th, 50th, and 90th percentiles of the latitudebyheight distribution of lidar O_{3} show how the CTM reproduces the nonrepresentative clumpy nature of the observations, which is dramatically different from the smooth patterns of the extended domain even at the extreme (10th and 90th) percentiles. This modelmeasurement comparison also shows good agreement for the statistical height of the stratospheretroposphere transition (defined here as 100 ppb O_{3}), except about the jet region (30–35N), where the model shows intrusion of the 100ppb air to much lower heights.
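For readers who wish to reproduce this type of diagnostic, a minimal sketch is given below. It assumes O_{3} samples arranged as profiles on a common height grid with NaN for missing data, and it takes the transition height to be the lowest level at which a chosen percentile profile reaches 100 ppb. That operational definition, and the function names `percentile_profiles` and `transition_height`, are assumptions made for illustration; the paper specifies only the 100 ppb threshold.

```python
import numpy as np

def percentile_profiles(o3, q=(10, 50, 90)):
    """Percentiles of O3 at each height level, ignoring missing (NaN) samples.

    o3: (n_samples, n_levels). Returns an array of shape (len(q), n_levels).
    """
    return np.nanpercentile(o3, q, axis=0)

def transition_height(profile, heights, threshold=100.0):
    """Lowest height at which the profile reaches the threshold (ppb), else None."""
    above = np.nonzero(np.asarray(profile) >= threshold)[0]
    return float(heights[above[0]]) if above.size else None
```

Applying `transition_height` separately to flight-track and extended-domain percentile profiles in each latitude band would give one way to compare the two samplings.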
[44] Adaptive kernel estimation of the 2D probability density of O_{3}CO correlations shows a very good simulation of two different chemical regimes (stratospheric and polluted) that is quite different from the extended domain. It also clearly points out the model failure to predict CO > 400 ppb. For the EOF analysis of the vertical O_{3} profiles, the lidar curtain sampling along the flight tracks has the EOF structures shifted downward about 1 km as compared with the extended domain. The latitudebylongitude maps of the principal components show larger amplitudes for the CTM flight tracks as compared with the extended CTM domain indicating inadequate sampling or bias toward sampling anomalous events. In summary, for most tests we find that the TRACEP data set shows some statistical biases in sampling and cannot be simply taken as representative of the chemistry and ozone distributions over eastern Asia and the western Pacific in March 2001.
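A minimal sketch of an adaptive kernel estimate of a 2D probability density is given below for orientation. It uses a common textbook construction (a fixed-bandwidth Gaussian pilot estimate followed by sample-dependent bandwidths scaled by the pilot density to the power −1/2) and is not necessarily the estimator used in this study; the function name `adaptive_kde_2d`, the bandwidth rule, and the sensitivity parameter `alpha` are all assumptions.

```python
import numpy as np

def adaptive_kde_2d(samples, grid, alpha=0.5):
    """Adaptive Gaussian kernel density estimate in two dimensions.

    samples: (n, 2) array, e.g. columns of CO and O3; grid: (m, 2) evaluation points.
    Returns the estimated density at each grid point in the original units.
    """
    n = samples.shape[0]
    scale = samples.std(axis=0, ddof=1)      # standardize each variable
    z, zg = samples / scale, grid / scale
    h = n ** (-1.0 / 6.0)                    # pilot bandwidth, n^(-1/(d+4)) with d = 2

    def gauss(points, centers, bw):
        # Isotropic Gaussian kernels, one bandwidth per center (sample).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-0.5 * d2 / bw**2) / (2.0 * np.pi * bw**2)

    pilot = gauss(z, z, np.full(n, h)).mean(axis=1)   # fixed-bandwidth pilot estimate
    g = np.exp(np.mean(np.log(pilot)))                # geometric mean of the pilot
    lam = (pilot / g) ** (-alpha)                     # wider kernels where data are sparse
    density_z = gauss(zg, z, h * lam).mean(axis=1)
    return density_z / np.prod(scale)                 # back to original units
```

The local bandwidth factor broadens the kernels in sparsely sampled parts of the domain, such as the high-CO tail, and narrows them where samples are dense, which helps resolve distinct regimes without over-smoothing the core of the distribution.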
[45] In evaluating model error using these new statistical measures, we find that the FRSGC/UCI CTM simulation of the TRACEP flighttrack data is in most cases quite good and is even better when one takes into account the biased sampling of the extended domain by the specific flight tracks. For example, the CTM does an excellent job in simulating the stratospheric influence in the upper troposphere for the TRACEP flights, and this influence is quite different from that averaged over the larger region. In previously noted cases in which the model failed to match highCO events or produced too high O_{3} abundances in the boundary layer, these new analyses point out that the errors are most likely due to sourceregion errors (e.g., CO emissions or nearfield O_{3} production) rather than meteorological errors. In most cases the modeled flighttrack data look much more like the observations than the model averaged over the region, indicating that the specific spatial and meteorological characteristics of the observations are captured.
[46] Overall, the TRACEP sampling is not representative of the larger domain we selected. Similar results would likely apply for any useful domain size. We believe that the simplest explanation for this is a combination of the limited number of observations and the TRACEP strategy of sampling the chemical processes in pollution plumes leaving Asia and stratospheric intrusion events associated with cyclones. If one uses such campaign data to detect systematic longterm changes (e.g., between overlapping campaigns such as TRACEP and PEMWest B [Davis et al., 2003]) or to provide longterm calibration for satellite observations, then the representativeness of the different data sets needs to be evaluated.