3.1. Lower Troposphere
 Figure 2 compares an early morning ozone sounding (1100 UTC/0600 LST) on 14 July 2004 at Egbert, Ontario, with the predicted ozone profiles from the five model versions for that time and location (obtained by bilinear interpolation in the horizontal). The sharp transition in ozone concentration in the vertical from the surface through the nocturnal inversion to the residual layer above is captured by the models, although in all cases the vertical gradient is apparently overestimated. This may be due in part to the response time of the ozone sensor (see section 2.1), since this will tend to smooth the observed profile. The vertical gradient of the true ozone profile below 500 m may therefore more closely resemble that forecast by the models than it would appear from this comparison. However, all the models forecast higher ozone than is observed between 500 and 1000 m. The two CHRONOS runs also reproduce the secondary feature at 2000 m.
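 The sensor response-time argument above can be illustrated with a minimal sketch. It assumes a first-order (exponential) sensor response; the function name, the 25 s time constant, the 5 m/s ascent rate and the step-shaped profile are all hypothetical values chosen for illustration, not taken from the sondes or models discussed here.

```python
import math

def apply_sensor_lag(true_profile, dz_m, ascent_rate_m_s, tau_s):
    """Smooth a profile with a first-order (exponential) sensor response.

    true_profile: ozone values on an evenly spaced height grid, bottom first.
    dz_m: grid spacing in metres.
    ascent_rate_m_s, tau_s: ASSUMED balloon ascent rate and sensor e-folding time.
    """
    dt = dz_m / ascent_rate_m_s          # time spent crossing one grid step
    alpha = 1.0 - math.exp(-dt / tau_s)  # fraction of the remaining gap closed per step
    measured = [true_profile[0]]
    for value in true_profile[1:]:
        measured.append(measured[-1] + alpha * (value - measured[-1]))
    return measured

# Hypothetical step profile: 10 ppbv below an inversion at 200 m, 60 ppbv above.
heights_m = [i * 25 for i in range(21)]  # 0 to 500 m in 25 m steps
true_o3 = [10.0 if z < 200 else 60.0 for z in heights_m]
measured_o3 = apply_sensor_lag(true_o3, dz_m=25.0, ascent_rate_m_s=5.0, tau_s=25.0)

# Largest change between adjacent levels in each profile.
max_true_step = max(b - a for a, b in zip(true_o3, true_o3[1:]))
max_measured_step = max(b - a for a, b in zip(measured_o3, measured_o3[1:]))
```

 Under these assumptions the lagged profile approaches the upper-layer value only gradually, so its steepest level-to-level change is much smaller than that of the underlying step. A model that reproduced the true step would then appear to overestimate the gradient when compared with the sonde.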
 Figure 3 shows an evening sounding (2300 UTC/1900 LST) on 30 July at Sable Island, Nova Scotia. The boundary layer transition in the vertical is less pronounced for this case. The three AURAMS runs predict this low-level feature well, though two are biased low overall and one high overall. The two CHRONOS runs, by contrast, predict much more pronounced PBL effects than are seen in the measurements. Above 3000 m, all of the models are low relative to the sonde. Similar behavior above 3000 m is apparent in Figures 4–7, and as discussed in section 3.2 this bias becomes more pronounced at higher altitudes.
 An early afternoon sounding (1700 UTC/1300 LST) on 30 July at Yarmouth, Nova Scotia (Figure 4), shows a shallow layer of high ozone molar mixing ratio at 500 m, just above the top of the marine boundary layer. The two CHRONOS runs predict this feature fairly well, with the operational version (CHRONOS-OP) doing a somewhat better job of reproducing the narrowness of the layer, while the version with assimilation of surface ozone data (CHRONOS-SDA) correctly places the altitude of the layer. The three AURAMS runs are all quite different among themselves but all predict a broader feature and all are biased low.
 In Figure 5 the early afternoon ozone profile (2000 UTC/1400 LST) for 16 July at Huntsville, Alabama shows a deep (2 km) boundary layer of photochemically produced ozone. All five model versions predict large ozone production in this layer, and all get the PBL depth about right, but none predicts the ozone increase from the surface to 1500 m. As a result the AURAMS run results are much closer to the sonde measurement at the surface, whereas the CHRONOS run results are close to the sonde value at the top of the PBL (1500 m). All of the models underpredict ozone values above 3000 m, the AURAMS runs especially so.
 Some of the sites near the ocean (Narragansett, RHBrown) appear also to be subject to daylight titration under certain conditions (temperature inversions, clouds or fog). A dramatic example of this was observed at Narragansett on 28 July (Figure 6). Interestingly, three of the model runs (AURAMS-NEW, CHRONOS-OP and CHRONOS-SDA) appear to reproduce this, although the predicted loss is only half that observed. The assimilation of surface ozone data appears to have helped the CHRONOS profile very little in this case, probably because the titration is a transient phenomenon.
 In Figure 7, the early afternoon sounding (1900 UTC/1400 LST) for 7 July at Wallops Island, Virginia shows a very complex, multilayered ozone profile. Four of the model versions predict the ozone maximum at 700 m. One run (CHRONOS-SDA) also shows some indication of the secondary peak at 2000 m. The AURAMS-NEW profile is quite different from the others and does not predict any layering above 500 m.
 These examples demonstrate that while the models all show some skill in forecasting ozone in the boundary layer and lowermost troposphere, they all show at times large differences from the actual profile, and in general large differences from each other. This is quite surprising, because the models have major features in common: they all use the same gas phase chemistry, and all are driven by the same meteorological forecast model. This implies that the differences in predicted ozone are due to differences in horizontal resolution, integration time step, treatment of biogenic emissions and aqueous phase chemistry, all of which might be expected to be of minor importance.
 A statistical summary of overall model performance in predicting the IONS ozone profiles is given in Figure 8 for the five model versions. Calculated biases are variable, and in some cases quite modest. Differences in sonde preparation between stations may contribute a minor part of the station-to-station variation in model-sonde bias. As noted in section 2.1, such differences are small (2–3%), but they are systematic between stations. Model-sonde differences for individual profiles, however, are often large, as evidenced by the error bars (one standard deviation), which are generally in the range of 10–30 ppbv, or 25–75% of typical tropospheric ozone amounts. Over all sites, in the first 1000 m, biases are lowest for the AURAMS-NEW run, while standard deviations are lowest for the AURAMS-RT run. Average biases over the first 1000 m are −5.6, 2.8, 1.8, 5.3 and 2.6 ppbv for AURAMS-RT, AURAMS-BIO, AURAMS-NEW, CHRONOS-OP and CHRONOS-SDA, respectively, while average standard deviations are very similar, ranging between 15.5 and 18.1 ppbv. The surface ozone data assimilation appears to reduce both biases and standard deviations for CHRONOS, although for some sites actual surface biases increase (e.g., Huntsville, where the bias for CHRONOS-SDA is the largest surface-level bias of any of the models, at any site). In general, agreement in the first 1000 m is best at Egbert, Yarmouth, Pellston and Sable Island, that is, at the northernmost IONS stations (see Figure 1). One possible explanation is that the Canadian emissions used as input to CHRONOS and AURAMS were more accurate than those for the United States. Interestingly, implementation of pollutant control legislation in the United States (“NOx SIP Call”) resulted in a significant reduction in U.S. NOx emissions occurring between 2001 and 2004, after the applicable years for the two U.S. emission inventories that were used for these runs [Frost et al., 2006]. The biases in Figure 8 at U.S. sites in the lowest 1000 m are predominantly overpredictions, and this may be partly due to the reduction in actual versus forecast emissions. In addition, several of the U.S. sites (Beltsville, Houston, Narragansett) are near or downwind of large pollution sources, and so see large variability in surface ozone depending on local winds, insolation and temperature inversions, rendering forecasting more difficult. In Figure 8 the standard deviations of the model-sonde differences for these sites decline markedly from the surface to 3000 m.
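 The bias and one-standard-deviation statistics summarized above can be sketched as follows. This is an illustrative calculation only: it assumes the bias is the mean of model minus sonde (so that overpredictions are positive, consistent with the sign usage in the text) and that the spread is the sample standard deviation of those differences. The function name and the ozone values are hypothetical.

```python
from statistics import mean, stdev

def bias_and_spread(model_ppbv, sonde_ppbv):
    """Return (mean bias, sample standard deviation) of model minus sonde.

    ASSUMPTION: positive bias means the model overpredicts; the source does
    not state whether a sample or population standard deviation was used.
    """
    diffs = [m - s for m, s in zip(model_ppbv, sonde_ppbv)]
    return mean(diffs), stdev(diffs)

# Hypothetical layer-mean ozone (ppbv) for five matched profiles at one site.
sonde = [40.0, 55.0, 62.0, 48.0, 70.0]
model = [45.0, 50.0, 80.0, 60.0, 65.0]

bias, spread = bias_and_spread(model, sonde)
print(f"bias = {bias:.1f} ppbv, standard deviation = {spread:.1f} ppbv")
```

 In this made-up case the mean bias is modest while the spread is roughly twice as large, the same qualitative pattern described above: small average biases can coexist with large model-sonde differences for individual profiles.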
 The AURAMS-RT and AURAMS-BIO runs show much larger (negative) biases than the two CHRONOS runs and the AURAMS-NEW run above about 1500 m. As noted above, all of the models show exclusively negative biases above 2000 m.
 Another aspect of model performance, one that is perhaps the most important for an AQ forecast model, is how well the model predicts changes in ozone concentration from day to day. Several of the IONS sites launched sondes on a daily or near-daily schedule. Figure 9 shows time series of surface ozone from the ozonesondes at six of these sites, compared with the five model runs. Although individual differences are often significant, all the models track major changes in ozone concentration well overall. At the surface, variability in the model values is somewhat higher than in the measured values, by 12%, 32%, 38%, 27% and 17%, for AURAMS-RT, AURAMS-BIO, AURAMS-NEW, CHRONOS-OP and CHRONOS-SDA, respectively. Figure 10 is similar, comparing time series of measured ozone at 1000 m with those forecast by the five model versions for the same six sites. All of the models also track major changes in ozone concentration at 1000 m well, although individual differences are often significant. These differences are probably due in part to the models' reliance on emission inventories for ozone precursors rather than on data on actual emissions. For example, none of the model runs predicts the large increases in ozone at 2000 m seen over Houston on 19 and 20 July, which were apparently due to pollution from Alaskan and Canadian forest fires [Morris et al., 2006]. At 1000 m, variability in the model values is again somewhat higher than in the measured values, by 13%, 34%, 27%, 23% and 29%, for AURAMS-RT, AURAMS-BIO, AURAMS-NEW, CHRONOS-OP and CHRONOS-SDA, respectively.
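 One way to express the excess model variability quoted above is as a percentage difference between the standard deviations of the model and measured time series. The text does not state how variability was defined, so the use of the standard deviation here is an assumption, and the function name and daily values are hypothetical.

```python
from statistics import pstdev

def excess_variability_percent(model_series, measured_series):
    """Percent by which model variability exceeds measured variability.

    ASSUMPTION: variability is taken as the standard deviation of the daily
    time series; the source does not specify the metric.
    """
    return 100.0 * (pstdev(model_series) / pstdev(measured_series) - 1.0)

# Hypothetical daily ozone (ppbv) at one site; the model values share the
# measured mean but swing 1.3 times as far from it.
measured = [30.0, 50.0, 40.0, 60.0, 20.0]
model = [27.0, 53.0, 40.0, 66.0, 14.0]

excess = excess_variability_percent(model, measured)
print(f"model variability exceeds measured by {excess:.0f}%")
```

 A model can track the timing of day-to-day changes well and still return a positive value here, because the measure reflects only the amplitude of the swings and not their phase.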
3.2. Upper Troposphere
 In marked contrast to the skill shown in the first 2000 m, above this level all of the models show exclusively negative biases with respect to measurements, and these biases become quite severe, particularly for AURAMS, in the upper troposphere (UT). Figure 11 shows average differences at Yarmouth, Nova Scotia, between the observed and forecast ozone profiles for each model. Other IONS sites show similar differences in the middle and upper troposphere. These differences can be as much as 80–90% in the UT for AURAMS; that is, the model is only showing 10–20% of the actual ozone values at these heights (compare Figures 11 and 13). For CHRONOS the low bias is less marked, but can be nearly 50% at 8 km. Possible reasons for this behavior will be discussed in the next section.
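 The relationship between a relative low bias and the fraction of observed ozone that a model reproduces can be made explicit with a short worked example. The ozone values below are hypothetical and chosen only so that the arithmetic falls in the 80–90% range quoted above; the function names are likewise illustrative.

```python
def relative_bias_percent(model_ppbv, observed_ppbv):
    """Model minus observed, as a percentage of the observed value."""
    return 100.0 * (model_ppbv - observed_ppbv) / observed_ppbv

def fraction_of_observed_percent(model_ppbv, observed_ppbv):
    """Share of the observed value that the model reproduces, in percent."""
    return 100.0 * model_ppbv / observed_ppbv

# Hypothetical upper-tropospheric values (ppbv), not taken from any figure.
observed = 80.0
model = 12.0

rel_bias = relative_bias_percent(model, observed)         # -85.0
captured = fraction_of_observed_percent(model, observed)  # 15.0
print(f"relative bias = {rel_bias:.0f}%, model shows {captured:.0f}% of observed")
```

 The two quantities always sum to 100% in magnitude for a low-biased model, which is why a difference of 80–90% corresponds to the model showing only 10–20% of the observed ozone.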