Airborne Multi-AXis Differential Optical Absorption Spectroscopy (AMAX-DOAS) measurements of NO2 tropospheric vertical columns were performed over California for two months in summer 2010. The observations are compared to the NASA Ozone Monitoring Instrument (OMI) tropospheric vertical columns (data product v2.1) in two ways: (1) Median data were compared for the whole time period for selected boxes, and the agreement was found to be fair (R = 0.97, slope = 1.4 ± 0.1, N = 10). (2) The mean of coincident AMAX-DOAS measurements within the area of the corresponding OMI pixel was compared with the tropospheric NASA OMI NO2 assigned to that pixel. The effects of different data filters were assessed. Excellent agreement and a strong correlation (R = 0.85, slope = 1.05 ± 0.09, N = 56) were found for (2) when the data were filtered so that large pixels near the edge of the OMI orbit were excluded, the cloud radiance fraction was <50%, the OMI overpass occurred within 2 h of the AMAX-DOAS measurements, the flight altitude was >2 km, and the AMAX-DOAS instrument sampled a representative part of the footprint. The AMAX-DOAS and OMI data sets both show a reduction of NO2 tropospheric columns on weekends, by 38 ± 24% and 33 ± 11%, respectively. The assumptions in the tropospheric satellite air mass factor simulations were tested using independent measurements of surface albedo, aerosol extinction, and NO2 profiles for Los Angeles for July 2010, indicating an uncertainty of 12%.
 Nitrogen dioxide (NO2) is a regulated air pollutant, and monitoring is needed to assess the effects of reduction measures. Satellites provide the opportunity for global measurements of tropospheric vertical column densities (tropVCDs) on the basis of nadir measurements of ultraviolet (UV) and visible scattered sunlight. A number of satellite instruments have measured NO2 tropVCDs since the mid-1990s: GOME on ERS-2 [Burrows et al., 1999], SCIAMACHY on ENVISAT [Bovensmann et al., 1999], the Ozone Monitoring Instrument (OMI) on AURA [Levelt et al., 2006], and GOME-2 on MetOp-A [Munro et al., 2006]. Although satellites today have only a relatively coarse resolution of tens to hundreds of kilometers and usually provide a global picture only every 1 – 6 days, their measurements are useful for trend analyses [e.g., Richter et al., 2005; Lamsal et al., 2011], for inferring NOx emissions by inverse modeling [e.g., Miyazaki et al., 2012; Lin, 2012], as assimilation data for air quality forecasting [e.g., Wang et al., 2011; Petritoli et al., 2011], and for deducing NO2 surface concentrations [e.g., Lamsal et al., 2008; Lee et al., 2011]. However, tropVCDs from different research groups do not always agree [e.g., Lamsal et al., 2010; Bucsela et al., 2008; Boersma et al., 2008]. Differences result mainly from the assumptions in the radiative transfer calculation and from the way the stratospheric NO2 is removed from the total column density. The uncertainty of satellite retrievals for polluted environments is dominated by the uncertainty in the radiative transfer calculations. It originates from the necessary assumptions made to describe the surface albedo and the NO2 and aerosol extinction a priori profile shapes, as well as from insufficient information about clouds [Boersma et al., 2004; Richter and Burrows, 2002]. 
Therefore, regular validation under a variety of conditions and also spanning the entire area of a satellite footprint is important.
 We present an NO2 validation study using the University of Colorado Airborne Multi-Axis Differential Optical Absorption Spectroscopy (CU-AMAX-DOAS) instrument, which is optimized to obtain tropVCDs of trace gases as well as detailed profile information for trace gases and aerosol extinction when flying so-called low approaches [Baidar et al., 2013]. We recorded data in California from 19 May – 19 July 2010 flying on a NOAA Twin Otter aircraft (http://www.aoc.noaa.gov/aircraft_otter.htm) and compared the data to the NASA OMI tropospheric NO2 product. The sampling area contained pollution hotspots like the Los Angeles Metropolitan Area as well as background conditions, e.g., over the High Desert. These measurements were part of the California Research at the Nexus of Air Quality and Climate Change (CalNex, campaign overview by Ryerson et al. ) and Carbonaceous Aerosols and Radiative Effects Study (CARES, campaign overview by Zaveri et al. ) campaigns. We also deployed a set of upward and downward pointing radiometers to measure surface albedo. Hence, the key parameters for the solar radiative transfer calculation, i.e., surface albedo, aerosol extinction profiles, and trace gas profiles, were measured quasi-simultaneously with the NO2 tropVCD. Since the AMAX-DOAS instrument measures in a similar wavelength range as the UV/visible satellite spectrometers, the same absorbers can be retrieved which makes this technique applicable to the validation of other species as well, e.g., glyoxal or formaldehyde.
 Satellite instruments and the CU-AMAX-DOAS instrument measure with a similar viewing geometry, i.e., nadir or near nadir. Hence, they exhibit similar vertical sensitivities. Figure 1 compares box air mass factors (BAMFs; BAMFs are equivalent to weighting functions for optically thin absorbers, and they express the average light path enhancement with respect to the vertical thickness of a layer) for a typical scenario during CalNex/CARES at the solar zenith angle (SZA) of the OMI overpass. On average, AMAX-DOAS measurements are about 50% and 36% more sensitive than OMI to the lowest 1 km of the atmosphere for aircraft altitudes of 3 km and 5 km, respectively, but the shapes of the BAMFs are very similar. The aerosol extinction in these examples lies toward the extremes: AERONET measurements at CalTech, Pasadena indicate a mean aerosol optical depth (AOD) of 0.19 ± 0.08 and a median AOD of 0.18, with 0.14 being the 25th percentile and 0.24 being the 75th percentile, at 440 nm over the time period of 19 May – 19 July 2010. As can be seen from Figure 1, the differences in the BAMFs for the two different AODs are negligible for the nadir observations (compare top and bottom panels).
 Tropospheric NO2 satellite validation has been performed with airborne or ground-based chemiluminescence NOx analyzers, but also with airborne or ground-based remote-sensing instruments.
 When comparing ground-based concentrations with column densities measured from space, assumptions of the profile shape have to be made either by using models [Petritoli et al., 2004; Ordóñez et al., 2006; Blond et al., 2007; Lamsal et al., 2010] or mixing depth from climatologies [Boersma et al., 2009] or as done by Schaub et al. , using NO2 measurements from different altitudes in the Alps to construct a profile. The chemiluminescence NOx analyzers applied in the above studies use a heated molybdenum oxide surface to reduce NO2 to detectable NO. These analyzers are known to suffer from interferences: species such as nitric acid or peroxyacetyl nitrate also convert to NO [e.g., Steinbacher et al., 2007; Dunlea et al., 2007], and correction factors have to be applied [e.g., Ordóñez et al., 2006]. Discrepancies between the different measurements can partly be attributed to the nonrepresentativeness of point measurements [e.g., Ordóñez et al., 2006; Schaub et al., 2006; Celarier et al., 2008; Boersma et al., 2009]. Hence, ground-based in situ monitor validation is best performed for background sites [e.g., Lamsal et al., 2010]. Especially at urban hotspots, the traditional and localized ground-based validation attempt fails [e.g., Lamsal et al., 2010].
 In situ measurements of NO2 on an aircraft make profile measurements possible when the altitude of the aircraft is changed. Usually, spiral flight patterns are performed for this. The concentrations can then be integrated to yield partial column densities. However, missing layers have to be interpolated, and the layers above the maximum flight altitude or below the minimum have to be extrapolated, to obtain a tropVCD to compare with satellite products [Heland et al., 2002; Ladstätter-Weißenmayer et al., 2003; Martin et al., 2004; Bucsela et al., 2008; Boersma et al., 2008; Hains et al., 2010]. This approach is prone to errors in the bottom layer, where most of the NO2 resides and where an aircraft is not allowed to fly except during takeoff or landing at an airport.
 Ground-based multi-axis (MAX) DOAS measurements have been used for comparison with satellites since those instruments provide not only tropVCDs but also surface concentrations for comparison with NOx analyzers [e.g., Brinksma et al., 2008]. Measurements usually agree within the limits of the uncertainties. However, the quality of the agreement varies with satellite product, season, and location. MAX-DOAS tropVCDs in highly polluted areas seem to indicate that these measurements, like the ground-based in situ sensors, are more sensitive to local emissions and yield higher columns than the satellite (Figure 1) [e.g., Irie et al., 2008; Brinksma et al., 2008; Ma et al., 2013; Irie et al., 2012]. Other ground-based remote-sensing techniques that have been used for satellite validation include Lidar [Volten et al., 2009], zenith-sky DOAS measurements [Petritoli et al., 2004; Chen et al., 2009], and direct-sun DOAS [Wenig et al., 2008].
 Although airborne DOAS NO2 measurements have been performed for several years [e.g., Melamed et al., 2003; Wang et al., 2005; Merlaud et al., 2011], NO2 tropVCDs were compared to satellite measurements in only three studies, each with very limited data from a single day of measurements [Heue et al., 2005, 2008; Bruns et al., 2006; Walter et al., 2012]. Knowledge of the horizontal variability of tropVCDs can also be obtained by ground-based mobile MAX-DOAS, and those tropVCDs were compared to OMI, but again only for a few days of measurements [Wagner et al., 2010; Shaiganfar et al., 2011].
 The advantage of using the CU-AMAX-DOAS measurements recorded during summer 2010 over California for NASA OMI NO2 tropVCD validation is the vast amount of data collected over a period of two months (206 h overall), spanning an area that covers Los Angeles and its surroundings (and, to a lesser extent, Sacramento and its surroundings).
2 Instrumentation and Measurements
 The CU-AMAX-DOAS measurements used for this validation study were recorded during the CalNex and CARES campaigns, both of which took place in California in summer 2010. The CU-AMAX-DOAS instrument and albedo sensors were mounted on the NOAA Twin Otter remote-sensing aircraft, which was based at Ontario International Airport during CalNex (19 May – 15 June and 30 June – 19 July) and, for a shorter period (16 – 29 June), at McClellan Airport in Sacramento during CARES. The majority of the measurements were performed over the Greater Los Angeles Area and its relatively unpolluted surroundings. Hence, the measurements presented here cover a large range of NOx conditions from background to highly polluted.
2.1 CU-AMAX-DOAS Instrument and NO2 tropVCD Retrievals
 The instrument and its performance during CalNex and CARES are described in detail in Baidar et al. . Briefly, a telescope pylon is window-plate-mounted pointing parallel to the flight direction. A motor-mounted prism in the telescope permits scanning in forward, upward, and downward directions under discrete elevation angles with a field of view (FOV) of 0.3° × 5.9°. Independently of the movement of the aircraft, the angle is held constant with respect to horizontal by means of an angle sensor coupled to a motor adjusted with controller times of ~20 Hz and an overall 1 sigma angle accuracy of 0.35°. The collected light is transmitted by a split glass fiber bundle into two temperature-stabilized spectrometer/CCD detector systems. In this study, spectra only from one of the systems are used with a wavelength range of 330 – 470 nm and a resolution of 0.7 nm full-width half-maximum (FWHM). The spectra are analyzed with the well-known DOAS technique [e.g., Platt and Stutz, 2008]. Here, the WinDOAS software [Fayt and Van Roozendael, 2001] was used to retrieve NO2 slant column densities (SCDs) in a wavelength range from 433 to 460 nm applying two NO2 absorption cross sections at 220 K and 294 K [Vandaele et al., 2002], and further including cross sections for the oxygen dimer [Greenblatt et al., 1990], ozone [Bogumil et al., 2001], glyoxal [Volkamer et al. 2005], and water vapor [Rothman et al., 2005]. A ring cross section was calculated with the MFC software [Gomer et al., 1993] from the zenith reference spectrum and taken into account as a pseudo-absorber in the DOAS fit. A polynomial of the order of three was used and a straylight correction (see Fayt and Van Roozendael, 2001) applied. All spectra of one flight were analyzed in comparison to one common zenith reference spectrum recorded during that particular flight in a clean environment and flying at relatively high altitudes, i.e., ca. 3 – 5 km. 
The results are trace gas differential SCDs (dSCDs), i.e., the integrated absorber density along the average photon path from the sun to the telescope, and differential with regards to the amount of NO2 present in the reference spectrum.
 In the following, only nadir and zenith spectra are investigated; for other geometries, see Baidar et al. Stratospheric NO2 concentrations are relatively stable but undergo photochemical reactions resulting in a distinct diurnal pattern with higher NO2 towards higher SZAs. In addition, the dSCDs also increase towards higher SZAs due to the longer light path through the stratosphere. Therefore, for some early or late measurements, fast changes in the stratospheric NO2 occurred over the course of a flight. This change is a smooth function of time of day for a given location. However, changing the location of the aircraft also changes the local SZA. For such a flight, a polynomial of order three was fitted through all the zenith dSCDs recorded at aircraft altitudes of 1800 m above sea level (ASL) or more (the frequency of zenith measurements varied, but was up to every 30 s towards the latter half of the campaign, see also below). This polynomial was then subtracted from the nadir dSCDs, which yields an effective tropospheric SCD (tropSCD). The polynomial approach was chosen instead of a linear interpolation of the zenith dSCDs in order to better account for slight variations in the residual tropospheric NO2 that may be present above the flight altitude, as well as for changes in stratospheric NO2 due to the slight changes in the local SZA. This leads to a relatively high uncertainty of 0.5 – 2.0 × 10¹⁵ molecule cm⁻² in the stratospheric correction when the polynomial had to be applied to the measurements. For many flights, the nadir dSCDs can be directly interpreted as a tropSCD, assuming that the zenith reference spectrum, obtained at altitudes >3 km, contains only stratospheric NO2 and that the change in stratospheric NO2 is negligible during a 4 h flight. Of 41 flights, 18 indicated a changing stratospheric NO2 load during parts of the flight, and those were corrected with a polynomial.
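The stratospheric correction described above can be sketched as follows. This is only an illustration under stated assumptions: times are given in hours since the start of the flight, dSCDs are in units of 10¹⁵ molecule cm⁻², and the function names and data layout are hypothetical. The small pure-Python least-squares solver stands in for whatever fitting routine was actually used.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def tropospheric_scd(nadir, zenith, min_altitude_m=1800.0):
    """nadir: list of (time_h, dscd); zenith: list of (time_h, altitude_m, dscd).

    Fits a cubic through the zenith dSCDs recorded at or above min_altitude_m
    and subtracts it from each nadir dSCD.
    """
    high = [(t, d) for t, alt, d in zenith if alt >= min_altitude_m]
    coeffs = fit_polynomial([t for t, _ in high], [d for _, d in high])
    return [(t, d - evaluate(coeffs, t)) for t, d in nadir]
```

Using times relative to the start of the flight (rather than, e.g., seconds of day) keeps the powers of the abscissa small, which helps the conditioning of the normal equations.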
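 The original or corrected tropSCD is then converted to a tropVCD with a geometric air mass factor (geoAMF):
$$\mathrm{tropVCD} = \frac{\mathrm{tropSCD}}{\mathrm{geoAMF}}, \qquad \mathrm{geoAMF} = \frac{1}{\cos(\mathrm{SZA})} + 1$$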
 For this geoAMF, the assumption is made that the photons reach the telescope after being reflected from the Earth's surface at the nadir point. Hence, the slant column in comparison to the vertical column is weighted by the secant function of the SZA plus unity for the part of the light path from the nadir point into the telescope. The aircraft height, and hence the integration height, is implicitly included in this equation by applying the trigonometric function of a right-angled triangle. After extensive radiative transfer calculations simulating a range of possible scenarios for NO2 mixing ratios and mixing heights, optical aerosol parameters, AOD, and surface albedo, the geometric approximation was found to be adequate. It was chosen here in order to be independent of climatology or model NO2 profile data, which would be needed as a priori in radiative transfer simulated AMFs. Baidar et al. have shown that this geometric conversion yields results that agree within 10% with ground-based MAX-DOAS instruments. The error due to the use of the geoAMF depends on the SZA: the geoAMF is accurate to within 6% for SZA <20°, 10% for SZA <50°, and 20% for SZA <60° when compared to radiative transfer simulations performed with the McArtim code [Deutschmann et al., 2011] for the predominant conditions encountered during the CalNex/CARES campaigns [Baidar et al., 2013]. Most flights were conducted at SZA <60°. The uncertainty of the NO2 cross section is ca. 5% depending on the temperature [Vandaele et al., 2002]. For the nadir dSCD, the 1 sigma DOAS fitting error is ca. 3% and the limit of detection ca. 3.2 × 10¹⁵ molecule cm⁻² for 2 s integration time. In addition, the uncertainty in the geoAMF (see above) and the uncertainty of 0.5 – 2.0 × 10¹⁵ molecule cm⁻² in the stratospheric correction have to be taken into account. An overall estimate for the uncertainty of the tropVCD is 10 – 20% depending on the SZA. 
Please note that the overpass of the AURA satellite occurs at about 17° SZA over California and therefore at a time closest to the minimum uncertainty of the AMAX-DOAS measurements.
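The geometric conversion described in the text (secant of the SZA plus unity) can be written as a short sketch. The function names are hypothetical, and the example value simply evaluates the formula at the approximate overpass SZA of 17° quoted above.

```python
from math import cos, radians


def geo_amf(sza_deg):
    """Geometric air mass factor: secant of the SZA (sun to surface)
    plus unity (surface to the nadir-viewing telescope)."""
    return 1.0 / cos(radians(sza_deg)) + 1.0


def trop_vcd(trop_scd, sza_deg):
    """Convert a tropospheric slant column to a vertical column."""
    return trop_scd / geo_amf(sza_deg)


# geoAMF near the OMI overpass over California (SZA of about 17 degrees)
amf_overpass = geo_amf(17.0)  # roughly 2.05
```

For an overhead sun the factor is exactly 2, and it grows to 3 at an SZA of 60°, so the same slant column corresponds to a smaller vertical column at larger SZA.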
 The CU-AMAX-DOAS data acquisition software was improved over the course of the two months, resulting in integration times of 2 s for the latter half of the campaigns. The scanning sequence when flying at altitudes above 2 km above ground level (AGL) included the angles 90°, −2°, −5°, −10°, −20°, −90°, 0°, 2°, 5°, 10°, and 20° (positive angles upward and negative downward), with a larger number of observations on the downward looking angles. Typically, nadir spectra were recorded every 12 – 15 s. With a typical aircraft speed of 65 m s⁻¹, this corresponds to a horizontal translation of about 900 m. The FOV of the telescope gives a nadir footprint of ~20 m across and ~550 m along the track while flying at 4 km altitude for 2 s integration times.
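The footprint numbers quoted above can be reproduced approximately with simple geometry. The sketch below assumes a flat surface, that the 5.9° side of the FOV is oriented along the flight track, and that the along-track footprint is the instantaneous FOV projection plus the distance flown during one integration; these assumptions and the function name are illustrative and not taken from the instrument description.

```python
from math import radians, tan


def nadir_footprint(altitude_m, fov_across_deg, fov_along_deg,
                    speed_m_s, integration_s):
    """Return (across, along) nadir footprint size in meters."""
    across = 2.0 * altitude_m * tan(radians(fov_across_deg / 2.0))
    instantaneous_along = 2.0 * altitude_m * tan(radians(fov_along_deg / 2.0))
    # The aircraft moves during the exposure, which smears the footprint.
    along = instantaneous_along + speed_m_s * integration_s
    return across, along


across_m, along_m = nadir_footprint(4000.0, 0.3, 5.9, 65.0, 2.0)
# across_m is about 21 m and along_m about 540 m,
# close to the ~20 m and ~550 m quoted in the text.

# Distance flown between consecutive nadir spectra (every 12 - 15 s at 65 m/s)
spacing_m = (65.0 * 12.0, 65.0 * 15.0)  # 780 m to 975 m, i.e. about 900 m
```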
2.2 NASA OMI Instrument and NO2 tropVCD Retrievals
 OMI flies on the NASA Aura satellite, which was launched into a near-polar sun-synchronous orbit in 2004 [Levelt et al., 2006]. The crossing of the equator occurs at 13:45 local time. Two-dimensional CCD detectors span the wavelength range from 270 nm to 500 nm (FWHM 0.45 nm – 1.0 nm) in one dimension, and the other dimension is binned to monitor 60 different ground footprints perpendicular to the flight direction in a push broom manner. The width of the resulting swath is ca. 2600 km, and global coverage is obtained within one day. The size of the ground footprint varies across the swath from 13 × 24 km² at nadir to ~40 × 160 km² at the edge of the orbit due to the optical aberrations and asymmetric alignment. Due to the so-called row anomaly caused by partially blocked entrance optics, some of the 60 pixels have to be excluded from the further analysis.
 The NO2 tropVCD data product used here (v2.1) is the standard NASA OMI NO2 product and is based on the algorithm described in Bucsela et al. Briefly, the recorded spectra are analyzed with the DOAS method in a fitting window from 405 nm to 465 nm, applying the Vandaele et al. NO2 absorption cross section and a reference solar irradiance spectrum. The obtained SCDs are then corrected for instrumental artifacts. This step is called destriping since the effect varies across the orbital track. To separate stratospheric and tropospheric components, stratospheric AMFs are applied to the destriped SCDs to yield initial VCDs. Areas of tropospheric contamination in the stratospheric NO2 field are identified using monthly mean tropospheric NO2 columns from GMI simulations. Those regions are then masked, and the residual field of the stratospheric VCDs measured outside the masked regions is interpolated to estimate stratospheric NO2 columns for each measurement. The stratospheric SCDs are subsequently subtracted from the original SCDs, yielding the tropSCDs. The tropospheric AMFs (tropAMFs) are calculated using a precomputed scattering-weight table from TOMRAD [Davé, 1965] and monthly mean NO2 profiles from the GMI simulation. The algorithm uses OMI-based reflectivity [Kleipool et al., 2008], a cloud fraction and cloud pressure derived as described by Acarreta et al., temperature profiles from the GEOS-5 meteorological field, and the ETOPO5 topography. The tropopause height is obtained from GEOS-5 monthly tropopause pressures. The effects of aerosols on the OMI NO2 retrieval are implicitly accounted for through the use of the OMI-derived surface reflectivity, which is usually larger than the true surface reflectivity due to scattering from aerosols [Torres et al., 2007], and through the OMI cloud parameters [Boersma et al., 2011].
 The uncertainty in the SCDs is ca. 10¹⁵ molecule cm⁻², which corresponds to about 10% of the total slant column for polluted regions, and the stratospheric correction leads to an uncertainty of ca. 2 × 10¹⁴ molecule cm⁻² [Bucsela et al., 2013]. Stratospheric AMFs have an uncertainty of 1 – 2%, and the tropAMF uncertainties range from ca. 20% for low cloud fraction to 30 – 80% for high cloud fraction, but are highly dependent on the NO2 profile shape [Wenig et al., 2008; Bucsela et al., 2013]. The overall error on the tropVCD is <30% under clear-sky conditions and typical polluted conditions (>1 × 10¹⁵ molecule cm⁻²) [Boersma et al., 2009; Hains et al., 2010; Irie et al., 2012; Bucsela et al., 2013].
2.3 Albedo Sensor
 The University of Colorado albedo instrument consists of two four channel radiation sensors (Skye instruments SKR 1850) mounted to the top and the bottom of the NOAA Twin Otter research aircraft pointing straight upwards and downwards. Each of the four telescopes is equipped with a custom interference filter with ~10 nm wide transmission centered at 361 nm, 479 nm, 629 nm, and 868 nm. The sensors are fitted with a cosine correcting diffuser plate to measure irradiance from the hemispherical distribution. To ensure the upward and downward facing channel pairs at any given wavelength are directly comparable, a calibration factor for each channel was determined by simultaneous zenith measurement of solar radiation during a bright sunny day. 1 Hz data were recorded during all flights. The ratio of the normalized up-welling counts to the normalized down-welling counts is defined as the surface albedo. Atmospheric backscatter for higher flight altitudes was corrected as follows: Radiative transfer simulations show that outside the aerosol layer, up-welling and down-welling radiation is a linear function of sensor altitude. Surface albedo measurements from high altitude flights were corrected for the base altitude of 1100 m AGL which was assumed to be just outside the aerosol layer. After this correction, surface albedo measured during high altitude flights showed good agreement with low altitude flights (<1000 m, See Figure 7), which were conducted to minimize the need for atmospheric correction. During postprocessing, the 1 Hz radiation data from the two sensors were averaged for 30 s to maximize signal to noise. The overall uncertainty on the surface albedo is ±5%. The instantaneous footprint of hemispherical irradiance measurements is a circle with ~2.5 km radius, while flying at 2.5 km, which is smaller than the OMI pixel size (see section 2.2).
2.4 Flight Planning and Measurements
 The Twin Otter flight plan usually included a morning flight (ca. 10 – 14 h local Pacific Daylight Time) with emphasis on a combination of low approaches to obtain profile information and constant flight altitude at ca. 2 – 2.5 km AGL, just outside the boundary layer, followed by an afternoon flight (ca. 15 – 19 h) with a constant flight altitude of ~3.5 – 4 km AGL. Overall, we performed 51 research flights on 30 days over the two-month period, but only 41 flights took place on the same days as OMI measurements. This is due to the row anomaly of this instrument (see section 2.2). Coordination with satellite overpasses was not the primary focus of the flight planning. Rather, the frequent OMI overpasses coupled with the large swath width of the OMI instrument result in a high number of flights that are suitable for satellite validation.
3 Satellite Validation
3.1 Comparison of All Data
 In a first step, tropVCDs are grouped to calculate the median values for specific regions over California covering the whole two months, i.e., individual measurements of the two observational platforms are not directly compared, but rather the respective medians of distributions within a specific region. The regions, as outlined by boxes in Figure 2, are chosen at locations with sufficient data points of AMAX-DOAS measurements. These mainly urban areas include the Greater Los Angeles Area (boxes 1 – 5), Bakersfield (boxes 6 – 7), and Sacramento (boxes 8 – 10). OMI tropVCDs are selected when the center of a pixel is located within a box on the same day a Twin Otter flight took place over this box. Days on which either data set is not available are excluded. There is no other temporal coincidence criterion applied resulting in possible time differences between the AMAX-DOAS measurements of up to 7 h before and 5 h after the satellite overpass. Figure 3 shows the medians split by region for the two instruments. Both data sets follow a similar trend: high NO2 tropVCDs are observed over the Los Angeles Metropolitan Area (boxes 1 and 2) where many major motorways intersect. The Los Angeles International Airport (LAX) and the Port of Los Angeles are both situated in box 2. The Inland Empire also displays very high NO2 (box 3). Lowest NO2 is found west of Bakersfield (box 7). In general, the regions around Bakersfield and Sacramento are cleaner than the Greater Los Angeles Area. As expected, NO2 amounts are closely related to population and transportation. The population of the Greater Los Angeles Area is 17,877,006 (2010 US census, http://2010.census.gov/2010census/data/). In comparison, the population of Sacramento is 466,488 and Bakersfield 347,483. 
In the so-called South Coast Air Basin (SCAB), a geopolitical area defined for the purpose of air quality management and delimited by the Pacific Ocean to the west and mountain ranges in all other directions, on-road motor vehicles and other mobile sources together account for 91% of the total NOx emissions (Emission data for 2008 by California Air Resources Board: http://www.arb.ca.gov/app/emsinv/fcemssumcat2009.php).
 The satellite data are lower than the AMAX-DOAS measurements for enhanced NO2, see boxes 1 and 3. This is also reflected in the slope of 1.4 ± 0.1 for the linear regression analysis of the two data sets (Figure 4, CU-AMAX-DOAS vs. NASA OMI, Pearson correlation coefficient R = 0.97, offset −0.7 ± 0.5 × 10¹⁵ molecule cm⁻²). If the data from boxes 1 and 3 are omitted, the slope is reduced to 0.9 ± 0.1 and hence is not significantly different from 1. The Pearson correlation coefficient is then 0.96 and the offset 0.4 ± 0.3 × 10¹⁵ molecule cm⁻². Please note that the simple linear regression does not take the uncertainty of the individual data points into account, and that these offsets are much smaller than the range between the 25th and 75th percentiles of the data (see Figure 3) or the uncertainty of the stratospheric correction. The underestimation by the OMI data for enhanced NO2 most likely originates from the selection criteria for the satellite pixels. As stated above, the center of the ground footprint has to be within the box. Furthermore, pixels at the side of the OMI swath are also included, and those can be up to 160 km wide. This results in a bias towards background values since the actual area sampled by OMI and included in the average will extend beyond the boundaries of the boxes by up to 80 km. The highly polluted boxes in particular are surrounded by cleaner areas, e.g., box 3 is adjacent to the San Gabriel Mountains and the High Desert located to the north and west.
 Only for the Greater Los Angeles Area (boxes 1 – 5) were the statistics good enough to separate the data into weekend and weekday measurements. A clear decrease on weekends can be observed (see Figure 3), which is caused by a reduction of heavy-duty diesel-fueled vehicles and is consistent with previous studies for California [e.g., Marr and Harley, 2002; Chinkin et al., 2003; Pollack et al., 2012 and references therein]: the NO2 is lower by 38 ± 24% and 33 ± 11% for AMAX-DOAS and OMI, respectively. In the regression analysis (see Figure 4), the slope for the weekday data of 1.8 ± 0.3 is clearly larger than unity, whereas the weekend regression with a slope of 1.1 ± 0.2 is consistent with a slope of unity. Both offsets are not significantly different from zero. However, the two slopes are statistically different, indicating, as already mentioned above, that the overestimation of the AMAX-DOAS data in comparison to the OMI measurements is nonlinear. The reason for this is the exponential increase in the size of the satellite pixels towards the edge of the swath.
3.2 Comparison Based on Individual OMI Pixels
 The large number of data points of the CU-AMAX-DOAS instrument allows for a validation of the NASA OMI data on a pixel level. Figure 5 shows an example flight on 1 June 2010 where the AMAX-DOAS tropVCDs are plotted on top of the gridded OMI data. The individual AMAX-DOAS measurements can be combined within the area of a satellite pixel to calculate a mean AMAX-DOAS NO2 tropVCD which can then be compared to the specific OMI tropVCD. In order to identify the AMAX-DOAS data points within that area that is given by the four corner points of the OMI pixel, the so-called point-in-polygon problem was approached with a ray-casting algorithm. The flight presented in Figure 5 is from earlier in the campaign where the detector exposure times were 20 s. Nevertheless, there are up to four AMAX-DOAS measurements per OMI pixel.
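A ray-casting point-in-polygon test of the kind mentioned above can be sketched as follows. This is a generic textbook version, not the authors' code: it treats longitude and latitude as planar coordinates (an assumption that is reasonable only for polygons as small as an OMI pixel), and the function name and argument layout are hypothetical.

```python
def point_in_polygon(lon, lat, corners):
    """Return True if (lon, lat) lies inside the polygon given by corners.

    corners is a list of (lon, lat) tuples in order around the polygon,
    e.g. the four corner points of an OMI pixel. A horizontal ray is cast
    from the point towards larger longitudes, and edge crossings are counted:
    an odd number of crossings means the point is inside.
    """
    inside = False
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        # Only edges that straddle the latitude of the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical pixel corners (degrees) and two test points
pixel = [(-118.4, 33.9), (-118.1, 33.95), (-118.05, 34.1), (-118.35, 34.05)]
hit = point_in_polygon(-118.2, 34.0, pixel)
miss = point_in_polygon(-117.5, 34.0, pixel)
```

Points lying exactly on an edge or corner may be classified either way by this simple version; for averaging many airborne measurements per pixel this is unlikely to matter.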
 Several factors can affect the quality of the comparison exercise. In the following, we discuss five key parameters: (1) Since the NO2 lifetime is in the range of a few hours and also transport can take place, the time difference between the measurements is very important. (2) Although most sources of NO2 are close to the surface, our vertical profile measurements during this campaign have shown that the NO2 can be mixed in the BL up to ca. 2 km [Baidar et al., 2013] and hence the aircraft altitude determines the fraction of the NO2 tropospheric column sampled by the AMAX-DOAS. (3) As was observed above, the quality of the comparison suffers when the large pixels at the sides of the swath are included, and hence this parameter is tested here as well. (4) Obviously, the cloud radiance fraction determines how representative the measured OMI column is for the whole pixel. In general, the cloud radiance fraction during the campaign was rather low though.
 (5) Further, we define a quantity we call the normalized distance to assess the representativeness of the AMAX-DOAS measurements within a pixel with respect to the sampled area. The OMI pixels are divided into similarly sized subgrid cells with a side length of ~7.5 km each resulting in 3 × 2 to 20 × 4 dimensional subgrids. This variation is due to the increasing footprint of the OMI measurements from the middle to the side of the swath. In a first step, it is checked whether the above defined subgrid cells contain any data points. Matrices with the dimension of the subgrid are defined for each of these individual cells. The step distance between occupied subgrid cells are assigned as the matrix elements for the individual occupied cell matrix. For example, the distance between adjacent occupied cells is 1, between diagonally adjacent occupied cells 2, a knight's move distance is 3, etc. These individual elements are summed up and divided by the value of the best case scenario, i.e., every subgrid cell contains at least one data point. Consequently, the normalized distance is a number between 0 and 1 and an approximate measure for the distribution of the AMAX-DOAS data points within an OMI pixel. How to obtain the normalized distance is sketched in Figure A1 for an example of three occupied cells of a 3 × 2 subgrid. This normalized distance is less dependent on the actual number of data points which varied during the campaign due to different integration times.
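One possible reading of the normalized distance is sketched below. It assumes that the step distance between two subgrid cells is the city-block (Manhattan) distance, which reproduces the values 1, 2, and 3 given in the examples above, and that the sum runs over all pairs of occupied cells. Figure A1 is not reproduced here, so the details, as well as the function name and the (row, column) cell indexing, are assumptions.

```python
from itertools import combinations


def normalized_distance(occupied, n_rows, n_cols):
    """Sum of pairwise step distances between occupied subgrid cells,
    divided by the same sum for a fully occupied subgrid.

    occupied: iterable of (row, col) indices of cells containing data points.
    """
    def pair_sum(cells):
        return sum(abs(r1 - r2) + abs(c1 - c2)
                   for (r1, c1), (r2, c2) in combinations(cells, 2))

    all_cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    best_case = pair_sum(all_cells)
    if best_case == 0:
        return 0.0  # a 1 x 1 subgrid has no pairs
    return pair_sum(sorted(set(occupied))) / best_case


# Three occupied cells of a subgrid with 3 x 2 cells
example = normalized_distance([(0, 0), (0, 1), (1, 2)], n_rows=2, n_cols=3)
# pairwise distances 1 + 3 + 2 = 6, best case 25, so example is 0.24
```

Under this reading, a pixel with a single occupied cell has a normalized distance of 0 regardless of how many spectra fall into that cell, which is why the quantity depends only weakly on the integration time.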
 The data for the example flight of Figure 5 are summarized in Figure 6f with each spatially coincident measurement identified by a letter and number coordinate as defined in Figure 5. Figures 6a – e show the key parameters as described above: the normalized distance of the AMAX-DOAS data points (Figure 6a), the OMI cloud radiance fraction (Figure 6b), the mean time difference between the AMAX-DOAS measurements and the OMI measurement (Figure 6c), the aircraft altitude (Figure 6d), and the number of AMAX-DOAS data points (Figure 6e). During this example flight, the included data points are from the OMI pixel numbers 18 to 24 (with 30 and 31 being the center of the orbit) within the OMI swath. With a combination of selection criteria as indicated by the horizontal red lines in Figures 6a – d (normalized distance >0.01, cloud radiance fraction <50%, OMI overpass within ±2 h, flight altitude >2 km), the initial 31 data points are reduced to 13 as indicated by the blue marks below panel f. The resulting slope of the linear correlation is 0.8 ± 0.2, with a moderately strong correlation coefficient of R = 0.78 and an offset of (1.1 ± 0.9) × 10¹⁵ molecule cm⁻². Note that the uncertainties of the measurements are neglected in the regression since the uncertainty for each individual data point is unknown (see sections 2.1 and 2.2). This and all following regression results are based on the AMAX-DOAS data as ordinate and the OMI data as abscissa in the Cartesian coordinate system.
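 The regression convention used here (unweighted fit, OMI as abscissa, AMAX-DOAS as ordinate) can be illustrated with a minimal ordinary least squares sketch. It is not the authors' analysis code: the function name `ols_fit` is hypothetical, the standard errors follow the usual textbook formulas for an unweighted fit, and the sample values are invented purely to show the call.

```python
import math


def ols_fit(x, y):
    """Unweighted least squares fit y = intercept + slope * x.

    Returns slope, slope standard error, intercept, intercept standard
    error, and the Pearson correlation coefficient R.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # Residual variance with n - 2 degrees of freedom.
    resid_var = sum((yi - (intercept + slope * xi)) ** 2
                    for xi, yi in zip(x, y)) / (n - 2)
    slope_err = math.sqrt(resid_var / sxx)
    intercept_err = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return slope, slope_err, intercept, intercept_err, r


# Invented tropVCDs in units of 1e15 molecule cm^-2 (not campaign data).
omi = [2.0, 4.0, 6.0, 8.0]        # abscissa
amax_doas = [3.0, 4.5, 7.5, 9.0]  # ordinate
slope, slope_err, offset, offset_err, r = ols_fit(omi, amax_doas)
```

 With this convention, a slope above 1 means that the AMAX-DOAS columns are higher than the OMI columns.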
 In the following, this kind of regression analysis is applied to all of the available data. As before, the initial selection criterion is that the NASA OMI and the CU-AMAX-DOAS measurements were performed on the same day. This results in 1016 individual OMI measurements being initially included in the comparison and 34,058 colocated AMAX-DOAS measurements for the two months of the measurement period. For the comparison of the two data sets, the AMAX-DOAS measurements are first averaged within the subgrid cells as defined for the calculation of the normalized distance, and the resulting mean values are then again averaged. This two-step calculation of the mean is done to avoid any weighting towards heavily sampled areas of the OMI pixel. Table 1 shows the results for the linear regression analysis for the averaged AMAX-DOAS measurements compared to the OMI data and how the results change when constraining the above described key parameters.
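 The effect of the two-step averaging can be seen in a small sketch. It assumes that each AMAX-DOAS data point has already been assigned to a subgrid cell; the function name and the sample values are hypothetical and chosen only to contrast the two-step mean with a plain mean.

```python
from collections import defaultdict


def two_step_mean(samples):
    """Mean of the per-cell means.

    samples: iterable of (cell, value) pairs, where cell identifies the
    subgrid cell of an OMI pixel and value is one AMAX-DOAS tropVCD.
    """
    per_cell = defaultdict(list)
    for cell, value in samples:
        per_cell[cell].append(value)
    if not per_cell:
        raise ValueError("no samples given")
    cell_means = [sum(values) / len(values) for values in per_cell.values()]
    return sum(cell_means) / len(cell_means)


# Invented values: one heavily sampled polluted cell, one sparsely sampled clean cell.
samples = [((0, 0), 10.0)] * 4 + [((0, 1), 2.0)]
plain_mean = sum(value for _, value in samples) / len(samples)
pixel_mean = two_step_mean(samples)
```

 In this invented case the plain mean is pulled toward the heavily sampled cell, whereas the two-step mean gives both cells equal weight.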
Table 1. Regression Analysis for All Data on the Pixel Level
 In general, the correlation coefficient increases with more stringent limits and so does the slope up to values significantly higher than 1 (see Table 1), i.e., the AMAX-DOAS measurements are higher than the NASA OMI ones (by up to 39 ± 8%). The only exception to the increase of the slope with stricter coincidence criteria is when removing the large OMI pixels at the sides of the swaths from the comparison; then the slope actually decreases (although not significantly within the uncertainty range). The offset does not seem to follow a certain pattern, but is mostly positive. However, the numbers are smaller than the uncertainty on the stratospheric correction of (0.5 – 2) × 10¹⁵ molecule cm⁻² for the CU-AMAX-DOAS measurements and 0.2 × 10¹⁵ molecule cm⁻² for the NASA OMI. This somewhat surprising trend in the slope can be explained when viewed in relation to the results described in the following exercise.
 Table 2 and Figure A2 summarize the effect of successively applying the weakest selection criteria to the full data set of the initial 1016 OMI pixels. The weakest criteria were chosen to leave sufficient data points for good statistics. After removing the sides of the swath, adding any other limitation results in a strong correlation of R ≥ 0.82 and does not change the slope significantly from 1. In the end, for the remaining pixels, the agreement is excellent, and the final slope is 1.03 ± 0.09. After removing the large pixels at the sides of the swath, constraining the normalized distance decreases the standard deviation of the fit and increases the correlation coefficient, showing that the normalized distance is indeed an important factor. Limiting the cloud radiance fraction has only negligible effects since most pixels already had a very low cloud fraction to start with. Restricting the OMI overpass time to within ±2 h of the AMAX-DOAS measurements removed many of the data points, and the results are no longer very conclusive: the standard deviation of the fit as well as the uncertainty on the slope increased significantly. The subsequent removal of measurements with an aircraft altitude <2 km does not change the picture further, which suggests that most of the NO2 is below 1.8 km, the initial selection criterion.
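 The successive application of selection criteria can be sketched as a chain of filters that records how many OMI pixels survive each step. The thresholds for normalized distance, cloud radiance fraction, overpass time, and altitude are those quoted for the example flight; the pixel-number range used to drop the swath edges is only a placeholder, since the exact range used for Table 2 is not stated here. The record fields and the sample records are hypothetical.

```python
def apply_successively(records, criteria):
    """Apply named filter predicates one after another.

    Returns the surviving records and a list of (step name, remaining count).
    """
    remaining = list(records)
    history = [("initial", len(remaining))]
    for name, keep in criteria:
        remaining = [rec for rec in remaining if keep(rec)]
        history.append((name, len(remaining)))
    return remaining, history


criteria = [
    # Placeholder pixel range; the range actually used is an assumption.
    ("swath edges removed", lambda r: 11 <= r["pixel"] <= 50),
    ("normalized distance > 0.01", lambda r: r["norm_dist"] > 0.01),
    ("cloud radiance fraction < 50%", lambda r: r["cloud_rad_frac"] < 0.5),
    ("overpass within 2 h", lambda r: abs(r["dt_hours"]) <= 2.0),
    ("flight altitude > 2 km", lambda r: r["altitude_km"] > 2.0),
]

# Invented records (not campaign data), one failing at each step and one passing.
records = [
    {"pixel": 3, "norm_dist": 0.2, "cloud_rad_frac": 0.1, "dt_hours": 1.0, "altitude_km": 3.0},
    {"pixel": 20, "norm_dist": 0.0, "cloud_rad_frac": 0.1, "dt_hours": 1.0, "altitude_km": 3.0},
    {"pixel": 25, "norm_dist": 0.3, "cloud_rad_frac": 0.7, "dt_hours": 0.5, "altitude_km": 3.0},
    {"pixel": 30, "norm_dist": 0.3, "cloud_rad_frac": 0.1, "dt_hours": 3.5, "altitude_km": 3.0},
    {"pixel": 31, "norm_dist": 0.3, "cloud_rad_frac": 0.1, "dt_hours": -1.0, "altitude_km": 1.9},
    {"pixel": 22, "norm_dist": 0.4, "cloud_rad_frac": 0.05, "dt_hours": 0.2, "altitude_km": 4.0},
]

remaining, history = apply_successively(records, criteria)
```

 Repeating the regression on the surviving records after each step would reproduce the kind of stepwise comparison summarized in Table 2.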
Table 2. Regression Analysis Results for All Data on the Pixel Level for a Combination of the Best-Off-Scenarios Based on Table 1
 As mentioned above, these results seem to be in contrast to Table 1 where the individual limiting criteria mostly increased the slope, i.e., the AMAX-DOAS data seem to overestimate the NO2 in comparison to the OMI pixel. This previous regression analysis was performed including all pixels of the OMI swath. The area of the footprint of a nadir AMAX-DOAS pixel is 0.011 km² when flying at 4 km altitude and integrating for 2 s (see section 2.1 last paragraph). This is very small in comparison to the OMI footprint area of 312–6400 km². This indicates that in combination with the small FOV of the AMAX-DOAS instrument, the NOAA Twin Otter is not fast enough to representatively sample the large OMI pixels in a sufficiently short time interval. Since the flight tracks were mainly over polluted areas, the AMAX-DOAS measurements are automatically biased towards enhanced NO2. In order to verify this, even more satellite data are needed so that the regression presented in Table 1 could be amended to only include OMI data with pixel numbers of 6–55 or even 11–50. However, OMI pixels in the same swath and adjacent swaths are measured almost simultaneously. Hence, to improve on this validation technique, extended time periods of measurements, several aircraft measuring simultaneously, a larger FOV of the AMAX-DOAS instrument, faster aircraft, or a combination of the four are needed.
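 To put the footprint mismatch in numbers, the two areas quoted above can simply be divided. This is a rough arithmetic illustration only; it ignores any overlap between successive AMAX-DOAS footprints and the actual flight pattern.

```latex
\frac{312\ \mathrm{km^2}}{0.011\ \mathrm{km^2}} \approx 2.8 \times 10^{4},
\qquad
\frac{6400\ \mathrm{km^2}}{0.011\ \mathrm{km^2}} \approx 5.8 \times 10^{5}
```

 In other words, tens of thousands to several hundred thousand individual nadir footprints would be needed to tile a single OMI pixel completely.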
4 Sensitivity Studies on the Satellite AMF
 In this section, the assumptions of the satellite radiative transfer are tested with available auxiliary measurements of trace gas profiles, aerosol extinction profiles, and surface albedo. As can be seen from Figure 1, the sensitivity of the measurements towards the actual NO2 profile changes with altitude. Hence, the tropAMF depends on the a priori trace gas profile:
$$\mathrm{tropAMF} = \frac{\sum_i \mathrm{BAMF}_i \cdot \mathrm{VCD}_i^{\text{a priori}}}{\mathrm{tropVCD}^{\text{a priori}}}$$
with $\mathrm{BAMF}_i$ and $\mathrm{VCD}_i^{\text{a priori}}$ being the BAMF and the trace gas a priori partial vertical column of layer $i$, and $\mathrm{tropVCD}^{\text{a priori}}$ the integrated a priori profile. The summation is performed from the surface to the tropopause for discrete layers. If the true profile is different from the a priori profile, this can introduce significant errors in the tropVCD retrieved with this tropAMF.
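 The dependence of the tropAMF on the profile shape can be made concrete with a short sketch of the weighted sum described above. The function name and all numbers are hypothetical; the layer BAMFs and partial columns are invented to contrast a surface-weighted profile with an elevated one and are not taken from the retrievals.

```python
def trop_amf(bamf, partial_vcd):
    """Tropospheric AMF as the partial-column-weighted mean of the layer BAMFs.

    bamf: box air mass factor of each layer, surface to tropopause.
    partial_vcd: a priori partial vertical column of each layer (same units).
    """
    if len(bamf) != len(partial_vcd):
        raise ValueError("bamf and partial_vcd must have the same length")
    total_vcd = sum(partial_vcd)
    if total_vcd <= 0:
        raise ValueError("integrated a priori column must be positive")
    return sum(b * v for b, v in zip(bamf, partial_vcd)) / total_vcd


# Invented three-layer case with sensitivity increasing with altitude.
bamf = [0.5, 1.0, 1.5]
surface_weighted = [3.0, 1.0, 0.0]  # most NO2 near the surface
elevated = [1.0, 1.0, 2.0]          # more NO2 aloft

amf_surface = trop_amf(bamf, surface_weighted)
amf_elevated = trop_amf(bamf, elevated)
```

 Because the same slant column divided by a smaller tropAMF gives a larger tropVCD, an a priori profile that places too much NO2 aloft would lead to an underestimated column in this invented case.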
 As mentioned above, the satellite retrieval uncertainty is dominated by the uncertainty in the tropAMF calculation, where the most important parameters are the trace gas concentration profile and the surface albedo [Boersma et al., 2004; Richter and Burrows, 2002]. By using independent profile observations measured by the AMAX-DOAS instrument while flying low approaches over airports [see Baidar et al., 2013] as well as the surface albedo measured by the radiometers, sensitivity studies were performed to test the assumptions made in the NASA OMI retrievals. As mentioned above, the radiative transfer calculation of the OMI retrievals parameterizes the aerosol load in the atmosphere in combination with the surface albedo by applying the so-called effective albedo calculated from the measurement itself. Here, a case was chosen for an OMI nadir pixel over Los Angeles in July 2010 that the OMI retrieval algorithm classified as having relatively low cloud cover, with an effective albedo of 0.083. The corresponding NO2 profile is given in Figure 8a and was obtained from the GSFC GMI CTM. The resulting tropAMF is 1.41 at 435 nm.
 Figure 7 shows the surface albedo as measured from the Twin Otter in summer 2010 over the Greater Los Angeles Area, parts of the High Desert and over the San Bernardino Mountains. Mostly, the surface albedo is around 0.10 ± 0.02. Only over some of the mountain ranges and the Pacific Ocean, the surface albedo is lower: ca. 0.03 – 0.07. Figures 8a and 8b show the NO2 and the 477 nm aerosol extinction profiles measured during four low approaches over the Greater Los Angeles Area (location marked in Figure 2b) on 16 July 2010. Both shape and absolute values of the trace gas profiles differ significantly from the model profile.
 Here, tropAMFs are calculated for the OMI nadir viewing geometry at a SZA of 17° for a clear-sky scenario. Simulations are performed for the five different NO2 profiles from Figure 8a. The McArtim code is employed because aerosol extinction can be treated explicitly. The aerosol optical properties were chosen as 0.94 for the single scattering albedo, typical for wavelengths >400 nm, and 0.68 for the asymmetry parameter, typical for polluted environments. These are the settings used for retrieving the NO2 and aerosol profiles shown in Figure 8 with the same radiative transfer model, McArtim. The aerosol extinction profiles corresponding to the NO2 profiles at the four different locations are applied (see Figure 8b). Calculations were also performed for a Rayleigh atmosphere with a surface albedo of 0.083 (corresponding to the effective albedo) and of 0.1. Profiles for pressure, temperature, and ozone were taken from the US standard atmosphere to be consistent with what was used during the profile retrievals [Baidar et al., 2013]. However, using the pressure and temperature profiles of the GEOS-5 meteorological field (which drives the GMI model) increases the tropAMFs by ~1 – 2%. Mimicking the OMI retrievals by using the albedo settings (i.e., effective albedo of 0.083 and no aerosols) and the NO2 model profile results in a tropAMF of 1.36, which is within 4% of the actual OMI tropAMF applied to the measured slant columns. The aim of these radiative transfer calculation comparisons is not to reproduce the exact OMI tropAMF, but rather to study the differences to the standard scenario, i.e., a tropAMF of 1.36, in order to identify the main contributors and to obtain an estimate for the satellite tropAMF uncertainty.
 Table 3 summarizes the results for the different scenarios. On average, the GMI model profile results in a larger tropAMF in comparison to the measured NO2 profile tropAMF by 11–15%. An increase in the surface albedo of 20% results in an average tropAMF increase of 9% for the measured profiles. When using the GMI model profile in combination with the different measured aerosol extinction profiles, the tropAMF changes by ≲1%. The weak dependence of the satellite (but also the AMAX-DOAS) tropAMF on the aerosol extinction profile was already explained in the introduction. Using a combination of the Twin Otter measurements for surface albedo, aerosol extinction profile, and NO2 profile and giving equal weights to the four individual profile combinations yields on average a tropAMF of 1.31. This value is about 4% lower than the tropAMF of 1.36 using the OMI settings, i.e., an effective albedo of 0.083 in combination with a Rayleigh atmosphere, and the model NO2 profile. Looking at the individual measured profiles, the maximum deviation is for the coastal profile at Santa Monica with 12%. It is worth noting that this value of 12% is smaller than the differences caused by only using the different NO2 profiles, i.e., up to 15%. Applying the measured effective albedo seems to partly offset the nonideal choice of the model trace gas profile.
Table 3. Satellite tropAMF Sensitivity Studies for 17° SZA and 435 nm
 In summary, the NASA OMI tropAMF for the Los Angeles area for July 2010 is only as good as the estimate for the albedo and especially the trace gas profile. Considering the assumptions in the simulations and the range of values, a tripling of the average difference seems to be appropriate to estimate the satellite tropAMF uncertainty for the Los Angeles area for the summer to be 12%. This is much better than the estimate described in section 2.2 for OMI retrievals in general (ca. 20% for low cloud fraction, 30 – 80% for high cloud fraction [Wenig et al., 2008; Bucsela et al., 2013]), but is consistent with the results of the linear regression study in the previous section.
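 The percentages quoted in this section follow from simple ratios of the tropAMF values given in the text, as the short check below illustrates. It assumes that the "average difference" being tripled is the roughly 4% difference between the tropAMF from the combined Twin Otter measurements (1.31) and the tropAMF from the OMI settings (1.36); this reading and the variable names are not taken from the original analysis.

```python
def rel_diff_percent(value, reference):
    """Relative difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


amf_omi_applied = 1.41    # tropAMF applied in the actual OMI retrieval
amf_omi_settings = 1.36   # McArtim with OMI-like settings and model profile
amf_measured_mean = 1.31  # mean for measured albedo, aerosol, and NO2 profiles

# Simulated standard scenario versus the tropAMF actually applied by OMI.
diff_settings = rel_diff_percent(amf_omi_settings, amf_omi_applied)
# Measured-input mean versus the simulated standard scenario.
diff_measured = rel_diff_percent(amf_measured_mean, amf_omi_settings)
# Tripling the rounded average difference (assumed reading of the text).
tripled_percent = 3 * round(abs(diff_measured))
```

 Both differences round to about 4% in magnitude, and tripling the rounded value gives the 12% uncertainty estimate quoted above.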
5 Additional Error Sources
 Error sources in the OMI and the AMAX-DOAS data products which have not been discussed so far in this manuscript are the effect of the orography on the field in the RTM calculations, the temperature dependence in the NO2 absorption cross section, and the contribution to the NO2 column from above the aircraft altitude: The radiative transfer models used in this study treat the atmosphere and the surface elevation as isotropic. However, the terrain height of the SCAB is highly variable. On the other hand, areas with highest NO2 are usually not in the foothills of the surrounding mountains. Also, this effect should become significant only for large SZAs in combination with the position of the sun in the direction of a mountain range. Hence, the influence is most likely negligible, especially when investigating statistical ensembles of data. A detailed study is beyond the scope of this manuscript.
 The AMAX-DOAS analysis was performed with a combination of two NO2 cross sections at 220 K and 294 K. When using only the cross section at 294 K in the analysis, the differences in those dSCDs to the dSCDs obtained with the combination of the two absorption cross sections are negligible. However, using the 220 K cross section yields 20% smaller dSCDs. This highlights the need for treating the temperature dependence of the NO2 absorption cross section in the two retrievals in a similar way for a meaningful comparison. In the satellite algorithm, the temperature dependence of the cross section is treated explicitly by assimilating a temperature profile and calculating a correction factor for the cross section [Bucsela et al., 2013]. The average temperature in the lowest kilometer in the model profile for July over California is 295 K and therefore close to the value in the AMAX-DOAS retrievals.
 NO2 can be produced in the free troposphere from lightning, but California was mainly cloud free during the campaign. However, the GMI model profile yields a partial column of 4 × 10¹⁴ molecule cm⁻² from 5 km to the tropopause. This partial column above the aircraft is comparable to the offsets found in Table 1.
 The CU-AMAX-DOAS retrieval error is already low, but could possibly be improved for higher SZA by using optimal estimation to obtain a vertical column including observations from additional viewing angles. This will be examined in a follow-up research paper.
6 Summary and Conclusions
 In this paper we presented (1) a comparison of NASA OMI with CU-AMAX-DOAS NO2 median tropVCDs over 2 months for defined areas and a comparison on an individual OMI pixel basis over California in summer 2010, and (2) sensitivity studies on the satellite AMF. The results can be summarized as follows:
 Fair agreement was found for the temporally and spatially averaged data comparison (section 3.1). The slope between the tropVCDs of the two instruments is 1.4 ± 0.1 and the correlation coefficient 0.97. This is caused by a combination of large OMI pixels at the side of the swath and the SCAB being surrounded by relatively unpolluted areas. Therefore, large pixels should be excluded if the coincidence criterion is based on pixel center coordinates.
 The mean AMAX-DOAS tropVCDs were calculated for coincident OMI pixels. A regression analysis was performed, and the successive application of a combination of individual selection criteria to the data (i.e., pixel number in the OMI swath, normalized distance, cloud radiance fraction, satellite overpass time, and aircraft altitude) ultimately led to a strong correlation of 0.85 and a slope of 1.03 ± 0.09, showing good agreement. The main driver seems to be the removal of the large pixels from the comparison. The AMAX-DOAS footprint area of ~0.011 km² is very small in comparison to the OMI footprint area of 312–6400 km². Optimizing the normalized distance is also important. The cloud radiance fraction and the aircraft altitude have only minor impacts on the results: the former is very low over Los Angeles in summer anyway, and most of the tropospheric NO2 seems to be confined below 1800 m.
 The statistics of this validation technique can be improved by extending the time periods of measurements, coordinating several aircraft at the same time, developing an AMAX-DOAS instrument with a larger FOV, deploying a faster aircraft, or a combination of the four.
 Sensitivity studies on the satellite AMF showed that the radiative transfer is rather independent of the range of aerosol load as encountered during the campaign, but highly dependent on the surface albedo and the trace gas a priori profile shape.
 The uncertainty of the NASA OMI tropAMF is estimated to be 12% for summer over the Los Angeles area. Notably, the area probed is characterized by a rather high surface albedo (here 10% at 479 nm) and low AOD. A generalization of the satellite uncertainty over areas with a different surface albedo may not be straightforward.
 The observed weekly cycle of the NO2 is consistent with previous studies. A decrease of 38 ± 24% and 33 ± 11% for AMAX-DOAS and OMI measurements, respectively, is found for the weekend in comparison to weekdays. While the effects on ozone are not the subject of this study, we note that this decrease in NO2 leads to higher ozone during weekends in June (consistent with Pollack et al., and references therein), but lower ozone during a hot weekend case in July [Baidar et al., 2012].
 The aircraft observations reported here were only partially intended for satellite validation. Purposeful validation, flying grids coincident in time and space with the satellite overpass, and adjustments to the actual AMAX-DOAS instrument could provide a more detailed data set. All this said, it can be concluded that (1) CU-AMAX-DOAS is well suited for satellite validation when the above precautions are taken into account, and the data collected over California are a valuable data set to validate other satellite instruments and (2) the NASA OMI tropospheric NO2 product (v2.1) delivers high quality data for the Californian summer season.
 This study was supported by the California Air Resources Board contract 09-317, the National Science Foundation CAREER award ATM-847793, CU start-up funds (RV) and an ESRL-CIRES graduate fellowship (SB). We thank T. Deutschmann (University of Heidelberg, Germany) for providing McArtim, C. Fayt and M. v. Roozendael (IASB-BIRA, Belgium) for WinDOAS, S.-W. Kim, A. O. Langford, and C. J. Senff (NOAA, Boulder) for discussions, A. Schneider for the GPS visualizer (http://www.gpsvisualizer.com/), and the NOAA Twin Otter flight crew for their support during the campaigns. We acknowledge Jochen Stutz for providing AERONET data from the Caltech site.