We describe a new algorithm to retrieve SO2 from satellite-measured hyperspectral radiances. We employ the principal component analysis technique in regions with no significant SO2 to capture radiance variability caused by both physical processes (e.g., Rayleigh and Raman scattering and ozone absorption) and measurement artifacts. We use the resulting principal components and SO2 Jacobians calculated with a radiative transfer model to directly estimate SO2 vertical column density in one step. Application to the Ozone Monitoring Instrument (OMI) radiance spectra in 310.5–340 nm demonstrates that this approach can greatly reduce biases in the operational OMI product and decrease the noise by a factor of 2, providing greater sensitivity to anthropogenic emissions. The new algorithm is fast, eliminates the need for instrument-specific radiance correction schemes, and can be easily adapted to other sensors. These attributes make it a promising technique for producing long-term, consistent SO2 records for air quality and climate research.
 Sulfur dioxide (SO2) is an important pollutant gas that can have profound impacts on the Earth's environment. It is a designated criteria air pollutant in many countries, and also a precursor of sulfate aerosols that can significantly affect air quality and climate [e.g., Charlson et al., 1992]. With a relatively short atmospheric lifetime, the average surface concentration of SO2 spans several orders of magnitude between polluted and pristine regions [Chin et al., 2000]. On the other hand, from time to time, sizable transient SO2 plumes can travel into remote oceanic areas [e.g., Hsu et al., 2012]. Given this large inhomogeneity in its distribution, it is imperative to develop capabilities of measuring SO2 globally with good accuracy and precision over relatively small spatial and temporal scales.
 Satellite measurements of global SO2 pollution have undergone substantial improvements over the past 10–15 years owing to the launch of several hyperspectral UV-Visible instruments. Among them is the Ozone Monitoring Instrument (OMI), a Dutch-Finnish sensor flying on NASA's Aura spacecraft that provides daily global coverage at high spatial resolution (13 × 24 km² at nadir) [Levelt et al., 2006]. The operational OMI level-2 (L2) planetary boundary layer (PBL) SO2 data are produced using the Band Residual Difference (BRD) method that utilizes three selected wavelength pairs to maximize sensitivity to PBL pollution [Krotkov et al., 2006]. While useful for monitoring strong anthropogenic sources [e.g., Fioletov et al., 2011; Li et al., 2010], the OMI PBL SO2 product suffers from the effects of random instrument noise as well as systematic biases [e.g., Lee et al., 2009]. A background correction and multiyear pixel averaging can help to mitigate these issues but may introduce new biases and restrict the time resolution of data analyses [Streets et al., 2013]. Other methods, such as the Iterative Spectral Fitting (ISF) algorithm [Yang et al., 2009], have had some success improving the quality of OMI SO2 retrievals [e.g., He et al., 2012]. Operational implementation of the ISF algorithm, however, has proved difficult owing to the amount of computation involved in the radiative transfer calculations for many wavelengths, and the empirical corrections required to remove retrieval artifacts.
 In this study, we introduce a fundamentally different approach to retrieve SO2 from OMI-measured radiance and irradiance data. Our method is based on principal component analysis (PCA), a statistical technique often employed to reduce dimensionality while retaining the information content of a multivariate data set, by transforming it into a subspace spanned by a set of orthogonal vectors (principal components, PCs). PCA has been applied to compress data and retrieve temperature and moisture profiles from high-resolution infrared satellite instruments [e.g., Huang and Antonelli, 2001]. Guanter et al.  and Joiner et al.  used PCA-based approaches to retrieve terrestrial chlorophyll fluorescence from satellite and ground-based spectral data. As demonstrated below, our algorithm shares a similar general framework with these approaches and can significantly improve the quality of OMI SO2 retrievals as compared with the current operational PBL product.
2.1 General Framework
 To illustrate our approach, we start from the widely used differential optical absorption spectroscopy (DOAS) method for trace gas retrievals. If there are n gases with absorption cross sections σg(λ) at a given wavelength λ, the Sun-normalized Earthshine radiance at the top of the atmosphere (TOA), I(λ)/I0(λ), can be modeled with the weak absorption Beer-Lambert law [e.g., Platt and Stutz, 2008] as
\[ \ln\frac{I(\lambda)}{I_0(\lambda)} = -\sum_{g=1}^{n} \sigma_g(\lambda)\, S_g + P(\lambda) + RRS(\lambda) \qquad (1) \]
where I(λ) and I0(λ) are the Earthshine radiance and solar irradiance at TOA, respectively, Sg is the number density of gas g along the optical path (slant column density, SCD), P(λ) is a polynomial term representing broadband effects including atmospheric Rayleigh and aerosol/cloud Mie scattering and surface reflectance, and RRS(λ) is a term to account for the rotational-Raman scattering (also known as the Ring effect). Sg can be estimated through least squares fitting that minimizes the differences between the measured and modeled radiance spectra (i.e., left- and right-hand sides of equation (1)). It may then be converted to a vertical column density (Ωg or VCD) with an estimate of the air mass factor (AMF). The AMF is typically calculated at a single wavelength based on a prescribed vertical profile of gas g along with other assumptions.
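The least squares step described for equation (1) can be illustrated with a minimal sketch. It assumes NumPy, a second-order polynomial for P(λ), and no Ring term; the function names `doas_fit` and `vertical_column` are hypothetical and are not taken from any operational code.

```python
import numpy as np


def doas_fit(log_ratio, wavelengths, cross_sections, poly_order=2):
    """Estimate slant column densities from ln(I/I0) by linear least squares.

    log_ratio: ln(I/I0) at each wavelength.
    cross_sections: array of shape (n_gases, n_wavelengths).
    The Ring term of equation (1) is left out of this sketch.
    """
    cross_sections = np.asarray(cross_sections, dtype=float)
    # Centre and scale the wavelength axis so the polynomial stays well conditioned.
    span = wavelengths.max() - wavelengths.min()
    x = (wavelengths - wavelengths.mean()) / span
    poly = np.vander(x, poly_order + 1, increasing=True)
    # Gas columns enter with a minus sign (absorption), followed by P(lambda).
    design = np.column_stack([-cross_sections.T, poly])
    coeffs, *_ = np.linalg.lstsq(design, log_ratio, rcond=None)
    return coeffs[: cross_sections.shape[0]]


def vertical_column(slant_column, air_mass_factor):
    """Convert a slant column density to a vertical column density."""
    return slant_column / air_mass_factor
```

In this form the slant columns and the polynomial coefficients are fitted together, and the AMF conversion remains a separate second step, which is the step the PCA approach described later avoids.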
 Uncertainties in the DOAS fitting can arise from inaccurate modeling of the various physical processes in equation (1) as well as artifacts in the radiance measurements (e.g., stray light). For example, the rotational-Raman effect is very difficult to model accurately in the SO2-relevant spectral window since it involves the filling-in of both telluric and solar lines and is sensitive to cloud properties. The measurement artifacts often require the addition of an effective absorber term in the fitting, but modeling of them can also be quite complicated and may or may not fit the formulation in equation (1). As with the DOAS method, the BRD and ISF algorithms also rely on empirical, instrument-specific corrections to the radiance data in order to reduce retrieval noise and biases.
 Instead of attempting to model all these various factors, we propose to replace them with characteristic features derived directly from the measured Sun-normalized radiances. In this algorithm, the PCA technique is applied to the radiance data to extract a set of PCs that capture most of the measurement-to-measurement variation of the radiances (in the absence of the signal of interest). For our problem, we may use data from a region presumed free of SO2 (e.g., the equatorial Pacific). Then, the derived PCs will capture physical and measurement details other than those associated with SO2 absorption. The PCs are ordered so that the first PC explains the most variance, the second PC explains the second most, and so on. A set of nν PCs (νi) can be used along with the sensitivity of the radiances to the SO2 column (SO2 Jacobians, ∂N/∂ΩSO2) to form a forward model:
\[ N = \sum_{i=1}^{n_\nu} \omega_i\, \nu_i + \Omega_{SO_2}\, \frac{\partial N}{\partial \Omega_{SO_2}} \qquad (2) \]
where N is a measured N value spectrum (N(λ) = −100 × log10(I(λ)/I0(λ))). For polluted regions with actual SO2 signals, the forward model can be inverted through standard least squares fitting to simultaneously retrieve the VCD of SO2 (ΩSO2) and the coefficients of the PCs (ω). Note that an assumption here is that a linear combination of PCs calculated from SO2-free regions can well describe the non-SO2 affected radiances in SO2-polluted areas. In most cases this assumption should hold true given the relatively weak absorption by SO2 outside of polluted regions. The use of SO2 Jacobians for the entire fitting window also removes the step for converting SCD to VCD using an AMF.
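The one-step inversion of the forward model can be sketched as follows. This is an illustration under stated assumptions rather than the operational implementation: it uses NumPy, obtains the PCs from a singular value decomposition of uncentred spectra, and replaces the radiative transfer Jacobian with a made-up Gaussian feature. All function names are hypothetical.

```python
import numpy as np


def leading_pcs(spectra, n_pcs):
    """Return the first n_pcs principal components of the input spectra.

    spectra has shape (n_pixels, n_wavelengths) and is assumed SO2-free.
    The spectra are not mean-centred here (an assumption of this sketch).
    """
    # Rows of vt are orthonormal spectral patterns ordered by singular value.
    _, _, vt = np.linalg.svd(spectra, full_matrices=False)
    return vt[:n_pcs]


def retrieve_so2(n_value, pcs, jacobian):
    """Least squares fit of N = sum_i w_i * pc_i + omega_so2 * jacobian."""
    design = np.column_stack([pcs.T, jacobian])
    coeffs, *_ = np.linalg.lstsq(design, n_value, rcond=None)
    # The last coefficient is the SO2 column, the others are the PC weights.
    return coeffs[-1], coeffs[:-1]


def synthetic_demo(true_so2=2.0, seed=0):
    """Recover a known SO2 amount from entirely synthetic spectra."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(310.5, 340.0, 120)
    # Three invented background patterns stand in for non-SO2 variability.
    patterns = np.vstack([np.ones_like(wl), (wl - 325.0) / 15.0, np.sin(wl)])
    background = rng.normal(size=(500, 3)) @ patterns
    # Hypothetical Jacobian shape, not a radiative transfer result.
    jacobian = np.exp(-0.5 * ((wl - 311.0) / 0.8) ** 2)
    pcs = leading_pcs(background, 3)
    polluted = rng.normal(size=3) @ patterns + true_so2 * jacobian
    omega_so2, _ = retrieve_so2(polluted, pcs, jacobian)
    return omega_so2
```

Because the PC weights and the SO2 column are solved for together, no separate slant-to-vertical conversion appears anywhere in the sketch.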
2.2 Application to the OMI Instrument
 OMI level 1B (L1B) radiance and irradiance data in the spectral window of 310.5–340 nm were used in this study, together with the VCD of O3 (ΩO3) from the L2 OMTO3 product [Bhartia and Wellemeyer, 2002]. This spectral window includes the strong SO2 absorption band at 310.8 nm and minimizes potential interferences due to stray light at shorter wavelengths. Our experiments also showed that the inclusion of wavelengths > 340 nm had no discernible impacts on retrievals. To better account for the orbit-to-orbit measurement artifacts, we analyzed data from one orbit at a time. Because the 60 cross-track positions (rows) of OMI are individual detectors (and essentially different instruments), we also treated each row of each orbit separately and filtered out pixels with slant column O3 (SO3) > 1500 DU (Dobson unit, 1 DU = 2.69 × 10¹⁶ molecules/cm²); large SO3 can diminish the measurement sensitivity to SO2. SO3 was calculated from ΩO3, the solar zenith angle (θ0), and the viewing zenith angle (θ),
\[ S_{O_3} = \Omega_{O_3}\,\left[\sec\theta_0 + \sec\theta\right] \qquad (3) \]
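The N value definition and the slant column O3 screening can be illustrated with a short sketch. The geometric form of the slant column used here, ΩO3 × (sec θ0 + sec θ), is an assumption of this example, and the function names are hypothetical.

```python
import math


def n_value(radiance, irradiance):
    """N = -100 * log10(I / I0) for one wavelength."""
    return -100.0 * math.log10(radiance / irradiance)


def slant_column_o3(omega_o3_du, sza_deg, vza_deg):
    """Geometric slant column O3 in DU (assumed form, plane-parallel geometry)."""
    sec_sza = 1.0 / math.cos(math.radians(sza_deg))
    sec_vza = 1.0 / math.cos(math.radians(vza_deg))
    return omega_o3_du * (sec_sza + sec_vza)


def keep_pixel(omega_o3_du, sza_deg, vza_deg, max_slant_du=1500.0):
    """True if the pixel passes the slant column O3 screening."""
    return slant_column_o3(omega_o3_du, sza_deg, vza_deg) <= max_slant_du
```

For instance, a 300 DU ozone column viewed at nadir with a 60° solar zenith angle gives a slant column of about 900 DU and is kept, whereas 400 DU at θ0 = 75° and θ = 60° exceeds the 1500 DU limit and is screened out.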
 After data screening, about 900–1300 pixels of various cloud fractions remained in each row for the PCA. We tested a few different sets of input spectra for generating the PCs: (1) the N value spectra, (2) the N value spectra normalized against 340 nm, and (3) the N value spectra after a fitted second-order polynomial was subtracted from each spectrum. As the retrievals of SO2 were generally very similar for these different PCAs, hereafter we focus on the first method.
 Given the presence of transient SO2 plumes, one challenge is how to differentiate between SO2-free and SO2-polluted regions. We note that for the vast majority of pixels, SO2 absorption is normally not strong enough to cause significant changes in the radiances. It is thus unlikely for the PC(s) associated with or affected by SO2 absorption (νSO2) to be among the first few leading PCs, even if PCA is conducted on an entire row without first screening out polluted scenes. As long as nν is sufficiently small to exclude νSO2 from equation (2), reasonable initial estimates of SO2 (ΩSO2_ini) can be obtained. A second step PCA can then be applied to pixels with small ΩSO2_ini (in this study the threshold was set at ±1.5 standard deviations for each orbit/row) to extract a new set of PCs to update equation (2), followed by updated retrievals of SO2. This step can be repeated. We found that the changes in the retrieved SO2 generally became very small within two iterations. We conducted the second step PCA and retrievals for three segments of each row: a “tropical” region with SO3 < 100 DU + min(SO3), and two regions north and south of it. The resulting PCs for each segment more closely matched the measurements than the PCs acquired using the entire row. The use of these regionally derived PCs reduced retrieval biases.
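The screening loop described in this paragraph might look like the sketch below. It assumes NumPy, takes the ±1.5 standard deviation threshold about the mean of the initial estimates (the text does not say whether the band is centred on the mean or on zero), and leaves out the split into three latitude segments. The function names are hypothetical.

```python
import numpy as np


def leading_pcs(spectra, n_pcs):
    """First n_pcs principal components (rows) of uncentred spectra."""
    _, _, vt = np.linalg.svd(spectra, full_matrices=False)
    return vt[:n_pcs]


def fit_so2(spectra, pcs, jacobian):
    """Fit all pixels at once and return one SO2 value per pixel."""
    design = np.column_stack([pcs.T, jacobian])
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    return coeffs[-1]


def two_step_retrieval(spectra, jacobian, n_pcs, n_iter=2, k_sigma=1.5):
    """Initial retrieval on all pixels, then PCA restricted to low-SO2 pixels.

    spectra has shape (n_pixels, n_wavelengths) for one orbit and row.
    Returns the final SO2 estimates and the mask of pixels used for the PCA.
    """
    # Step 1: PCs from every pixel, polluted scenes included.
    pcs = leading_pcs(spectra, n_pcs)
    so2 = fit_so2(spectra, pcs, jacobian)
    clean = np.ones(len(so2), dtype=bool)
    for _ in range(n_iter):
        # Step 2: keep pixels within k_sigma standard deviations of the mean.
        clean = np.abs(so2 - so2.mean()) <= k_sigma * so2.std()
        pcs = leading_pcs(spectra[clean], n_pcs)
        so2 = fit_so2(spectra, pcs, jacobian)
    return so2, clean
```

The default of two passes mirrors the observation that the retrieved SO2 changes little after two iterations; a convergence check on the change in `so2` would be a natural refinement.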
 Another important consideration is how to determine nν, the number of PCs to use in equation (2). Too few PCs will lead to large biases in SO2 while too many may cause overfitting. Our test results indicated that in most cases, at least 20–30 PCs were necessary, while occasionally in the presence of relatively strong SO2 signals, no more than 8 PCs could be used. Instead of using a constant nν, we determined it for each row by checking the correlation between PCs (after the fifth) and the SO2 Jacobians. For example, if significant correlation at the 95% confidence level existed between the ith PC and SO2 Jacobians, only the preceding i-1 PCs would be included. We found this to be an effective way to prevent the inclusion of νSO2 and collinearity in equation (2). To maintain computational efficiency, an upper limit of 30 was set for nν. The differences in SO2 due to the use of a greater upper limit (e.g., 50) were found to be marginal, especially for polluted areas.
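The selection rule for the number of PCs can be sketched in plain Python. The significance test is an assumption of this example: it uses the t statistic of the Pearson correlation with a large-sample critical value of 1.96, since the text does not specify the test. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def is_significant(r, n, crit=1.96):
    """Two-sided test at roughly the 95% level (normal approximation)."""
    if abs(r) >= 1.0:
        return True
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return abs(t_stat) > crit


def choose_n_pcs(pcs, jacobian, first_checked=6, max_pcs=30):
    """Number of leading PCs to keep in the fit.

    pcs is a list of PC spectra ordered by explained variance. PCs are
    checked from the sixth onward; if the i-th PC correlates significantly
    with the Jacobian, only the preceding i - 1 PCs are kept.
    """
    limit = min(max_pcs, len(pcs))
    for i in range(first_checked, limit + 1):  # i is a 1-based PC index
        r = pearson_r(pcs[i - 1], jacobian)
        if is_significant(r, len(jacobian)):
            return i - 1
    return limit
```

With a few hundred wavelengths per spectrum the normal approximation is close to the exact Student's t threshold; for short fitting windows the exact quantile would be the safer choice.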
 The VLIDORT radiative transfer code [Spurr, 2008] was employed to calculate SO2 Jacobians. To facilitate the comparison between the new algorithm and the operational PBL product, we used the same fixed atmospheric profiles as in the operational algorithm, and also assumed the same surface albedo (0.05), surface pressure (1013.25 hPa), fixed solar zenith angle (30°), and viewing zenith angle (0°). For SO2, a climatological profile over the summertime eastern U.S. was used. For O3 and temperature, the OMTO3 standard midlatitude profiles with ΩO3 = 325 DU were used. Details can be found in Krotkov et al. . In the future, we plan to expand the look-up table for SO2 Jacobians to more realistically account for different measurement conditions. It should also be noted that while the PCA was conducted for pixels of all-sky conditions, we focus on relatively cloud-free scenes in the following sections, given that the calculated SO2 Jacobians are not suitable for cloudy conditions.
 As an example, Figures 1a–1c show the typical first few leading PCs extracted from the N value spectra of an entire row. The first PC essentially represents the mean spectrum of all the pixels. The second PC closely follows the spectral feature of the O3 cross section, suggesting that O3 absorption is a dominant contributor to the variance in the window. The third PC may be related to the surface contribution. It is difficult to assign a well-defined geophysical meaning to the fourth, the fifth, and the following PCs, but they probably reflect the rotational-Raman effect or various measurement artifacts such as the wavelength shift between radiance and irradiance spectra. The residuals from two different least squares fittings for a pixel near Hawaii are also shown in Figure 1d. While the same set of 30 PCs was used in both fittings, only one (red line) included the SO2 Jacobians. As can be seen from the figure, the inclusion of SO2 Jacobians had little effect at wavelengths > 320 nm, but substantially reduced the residuals in the strong SO2 absorption bands at 310.8 and 313 nm. The initial estimate of SO2 in the pixel was 2.21 DU, implying the influence of a nearby volcano.
 Figure 2 compares the global monthly mean SO2 for August 2006 from the PCA algorithm and the operational OMI L2 PBL SO2 product. The new algorithm largely reduces the systematic biases in the operational data, removing the step changes along 30°N and 30°S (probably related to the O3 profile shape change in the OMTO3 algorithm), the positive values over the Tibetan Plateau and the Rocky Mountains, and also the large negative values at higher latitudes. Meanwhile, the major known SO2 source regions including eastern China, the eastern U.S., Mexico City, the industrial region in South Africa, as well as various degassing volcanoes are clearly discernible in the new retrievals. The SO2 plume in the South Pacific (20°S, 170°W) was from the submarine eruption of the Home Reef volcano in Tonga that started on 07 August 2006. A close-up look at the eastern U.S. (Figure 3) further reveals the improvements made in the new algorithm. With reduced noise and biases, the large point sources in the region, such as the power plants in the Ohio River valley, Atlanta, and the mid-Atlantic coast, can be more clearly distinguished. More regional examples are provided in the supporting information. In some cases, the PCA algorithm may potentially be employed to monitor SO2 pollution at higher temporal resolutions, as shown in the daily and weekly SO2 maps also available in the supporting information.
 The mean and standard deviation of the PCA SO2 and the operational OMI PBL SO2 were calculated for the equatorial Pacific (10°S–10°N, 120°W–150°W) to compare the noise levels of the two retrievals (Table 1). For this presumably SO2-free region, the standard deviation of PCA-retrieved SO2 is ~0.5 DU, half that of the operational OMI product (~1.0 DU). The day-to-day variation of the mean PCA-retrieved SO2 over the region (between −0.03 and 0.02 DU) is also smaller than that of the operational product (between −0.14 and 0.09 DU). The improvements in the PCA retrievals are likely due to the use of more wavelengths and better characterization of orbit-to-orbit measurement artifacts (e.g., due to small changes in uncorrected detector dark currents).
Table 1. The Statistics of the PCA-Retrieved and the OMI Operational PBL SO2 Over the Equatorial Pacific (10°S–10°N, 120°W–150°W) in August 2006
 In summary, we have developed a new SO2 retrieval algorithm based on principal component analysis of satellite-measured radiance data. Preliminary application of the new algorithm to OMI suggests that it can greatly reduce systematic biases in the current operational OMI PBL SO2 data and suppress the retrieval noise by a factor of 2. Our approach takes advantage of the fact that usually only a small portion of each satellite orbit has discernible SO2 absorption signals, and data from the rest of the orbit can be used to characterize and extract other physical and measurement details. While also relying on the least squares fitting of the measured radiances, our method differs from the DOAS approach in that its forward model contains basis functions mostly derived from the data, instead of various precalculated reference spectra. This decreases the uncertainties associated with modeling and instrumental errors and speeds up the calculation. With much less computation required, the new PCA algorithm is much faster than the full spectral fit and requires only about 4–5 min to process an entire OMI orbit using a single state-of-the-art CPU.
 Another advantage of our PCA-based algorithm is that it largely eliminates the need to develop specific, empirical corrections to the radiance data for each instrument. Rather, measurement artifacts are accounted for by the PCs directly extracted from the radiance data. This reduces the potential artifacts/biases introduced by instrument-specific data correction schemes. The algorithm can be easily adapted to other satellite sensors, and this feature makes it particularly useful for building long-term, consistent SO2 data records. In fact, we have tested the algorithm on the Ozone Mapping and Profiler Suite (OMPS) nadir mapping instrument flying on the Suomi National Polar-orbiting Partnership satellite. Using the algorithm with minimal changes (the only major change being the use of instrument-specific slit functions for SO2 Jacobians), we achieved very consistent, high quality SO2 retrievals from both OMI and OMPS.
 Next, we plan to expand the calculations of SO2 Jacobians to account for different viewing geometries, surface albedo, and O3 and SO2 profiles. This is expected to further reduce retrieval noise and biases especially for oceanic regions. We will also more thoroughly evaluate the data quality, including an analysis of error propagation to estimate retrieval errors due to measurement noise. For data validation, the PCA retrievals will also be compared to existing airborne SO2 measurements over the U.S. and China, as well as other data sources. Finally, we will investigate the possibility of applying the algorithm to other trace gas species. Some trace gases (e.g., HCHO) have fairly inhomogeneous spatial distributions similar to SO2 and could be suitable for the approach.
 We acknowledge the NASA Earth Science Division for funding of OMI SO2 product development and analysis. The Dutch-Finnish-built OMI instrument is part of the NASA EOS Aura satellite payload. The OMI instrument is managed by KNMI and the Netherlands Agency for Aerospace Programs (NIVR).
 The Editor thanks Simon Carn and an anonymous reviewer for their assistance in evaluating this paper.