Phytoplankton cell size is important to biogeochemical and food web processes. The goal of this study is to estimate phytoplankton cell size distribution from satellite imagery of spectral remote sensing reflectance (Rrs(λ)). Previous studies have indicated that phytoplankton size classes have distinctive absorption spectra despite the physiological and taxonomic variability within an assemblage. For this study, the chlorophyll-specific absorption spectra for the phytoplankton size class extremes, pico- and microphytoplankton, are weighted by the percent microplankton (Sfm) and form the basis of phytoplankton size retrieval from SeaWiFS imagery. Satellite retrievals of Sfm are made through a forward optical model look-up table (LUT) that incorporates the range of absorption and scattering variability due to phytoplankton size, chlorophyll concentration ([Chl]), and dissolved and detrital matter (acdm(443)) in the global ocean, from which Rrs(λ) is calculated by the radiative transfer software Hydrolight. The Hydrolight-modeled Rrs(λ) options for a given combination of [Chl] and acdm(443) within the LUT vary only due to Sfm. For a given pixel, the LUT search space was limited by satellite imagery of [Chl] and acdm(443). Within the narrowed search space, SeaWiFS Rrs(λ) was matched with the closest LUT Rrs(λ) option and the associated Sfm was assigned. Thresholds at which changes in Rrs(λ) due to Sfm could be discerned were established in terms of [Chl] and acdm(443). In situ estimates of cell size derived from high-performance liquid chromatography (HPLC) are used in conjunction with matched daily satellite estimates of Sfm for validation and agree well. A single month is displayed as an example of the Sfm retrieval.
 Phytoplankton play an important role in pelagic food web processes and biogeochemical cycles. All aspects of life are influenced by an organism's size [Chisholm, 1992] from growth to grazing. Many biogeochemical processes are directly related to the distribution of phytoplankton size classes in a given environment or time [Longhurst, 1998], and size distribution is a major biological factor that governs the functioning of pelagic food webs [Legendre and Lefevre, 1991]. The physical environment (turbulence, light, and nutrient availability) largely controls the distribution of cell size [Margalef, 1978; Cullen et al., 2002]. Phytoplankton size plays a significant role in the flux rate and flux efficiency of carbon to the deep ocean [Boyd and Newton, 1995; Guidi et al., 2009]. Therefore, understanding the mechanisms controlling the size structure of the phytoplankton community in response to environmental forcing is essential to understanding temporal and spatial variations in food web structure, the regulation of the biological pump, and the ability of the ocean to act as a long-term sink for atmospheric carbon dioxide.
The retrieval of phytoplankton functional types from satellite imagery has recently become an active area of research. The efforts have included empirical relationships between spectral shape and amplitude of water-leaving radiance and dominant phytoplankton groups determined from high-performance liquid chromatography (HPLC) pigments [Alvain et al., 2005, 2008], empirical ranges of bio-optical properties applied to satellite-derived products [Aiken et al., 2007], development of bio-optical models to identify blooms of coccolithophores [Brown and Podesta, 1997], Trichodesmium spp. [Subramaniam et al., 2002], and diatoms [Sathyendranath et al., 2004], and the use of known ecological, geographical, optical, and physical characteristics of phytoplankton to identify particular groups [Raitsos et al., 2008]. There have also been efforts specifically focusing on retrieval of phytoplankton size regionally [Ciotti and Bricaud, 2006] and globally [Uitz et al., 2006; Hirata et al., 2008; Bracher et al., 2009; Kostadinov et al., 2009]. The Uitz et al. and Hirata et al. approaches determine empirical relationships and then apply these relationships to satellite-derived chlorophyll concentration to map phytoplankton size classes. Uitz et al. establish empirical relationships between near-surface chlorophyll, water column stratification state (mixed or stratified), and size, while the Hirata et al. empirical relationships are determined between chlorophyll, spectral shape, and size. Bracher et al. used the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), a sensor developed for trace constituent composition in the atmosphere with a 30 km pixel size, to differentiate absorption by diatoms and cyanobacteria through differential optical absorption spectroscopy. Most recently, Kostadinov et al. derived phytoplankton size distribution information from the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS) through a look-up table based on relationships between the backscattering slope and the slope of the particle size distribution. Their approach retrieves the parameters of a particle size distribution (slope and reference particle abundance) and calculates derived products of total and size-fractionated volume of particles in the pico, nano, and micro size classes.
Ciotti and Bricaud  retrieve an estimate of phytoplankton size composition, along with magnitudes and spectral shapes of colored detrital matter and phytoplankton, through inversion. They investigate the performance of their approach regionally on in situ radiometric and SeaWiFS radiance data in continental shelf waters of Brazil. Ciotti and Bricaud  leverage the relationship between spectral phytoplankton absorption and phytoplankton size established by Ciotti et al. , which described a strong covariation of the size of dominant phytoplankton in the water column and several factors controlling the spectral shape of the absorption coefficient. Although simple parameterizations have already been proposed regarding changes in the absorption spectra of phytoplankton with chlorophyll a [Bricaud et al., 1995; Sathyendranath et al., 2001], the parameterization proposed by Ciotti et al.  has an explicit ecological interpretation and no direct dependence on chlorophyll a. The approach presented in this study also utilizes the Ciotti et al.  relationship.
This study constitutes another approach to retrieving phytoplankton size from global satellite imagery. As with other satellite products such as chlorophyll concentration and primary production, many algorithms or approaches are often proposed along the way to community consensus [O'Reilly et al., 1998; Campbell et al., 2002; Carr et al., 2006; Friedrichs et al., 2009]. While the goal of retrieving phytoplankton cell size from satellite imagery is shared between this study and those preceding it, the originality lies in the technical approach. The approach described here retrieves percent microplankton (Sfm) through the implementation of an absorption-based forward optical model look-up table (LUT). The LUT incorporates the range of absorption and scattering variability due to phytoplankton size, chlorophyll concentration, and dissolved and detrital matter in the global ocean, from which Rrs(λ) is calculated by the radiative transfer software Hydrolight. When the LUT was interrogated for a given image pixel, the search space of the LUT was limited by satellite imagery of chlorophyll concentration ([Chl]) and absorption due to dissolved and detrital matter at 443 nm (acdm(443)). Within the narrowed search space, SeaWiFS Rrs(λ) was matched with the closest LUT Rrs(λ) option and the associated Sfm was assigned. Particular care was taken to document the conditions under which phytoplankton cell size was detectable in SeaWiFS Rrs(λ). Thresholds at which changes in Rrs(λ) due to Sfm could be discerned were established in terms of [Chl] and acdm(443). In situ HPLC-derived estimates of cell size are used in conjunction with matched daily satellite estimates of Sfm for validation.
2.1.1. Satellite-Derived Observations
We utilized satellite data sets derived from SeaWiFS. These included spectral normalized water-leaving radiance (nLw(λ), mW cm−2 μm−1 sr−1) and GSM01 (Garver-Siegel-Maritorena version 1) inversion products of chlorophyll a concentration ([Chlsat], mg m−3), the absorption coefficient for colored dissolved and detrital materials at 443 nm (acdmsat(443), m−1), and the particulate backscattering coefficient at 443 nm (bbpsat(443)) [Maritorena et al., 2002]. Monthly global area coverage (GAC, level 3 mapped data, 9 km resolution) imagery was obtained for phytoplankton cell size retrieval. For validation, daily local area coverage (LAC, level 3 mapped data, 1 km resolution) imagery for the U.S. east coast North Atlantic and GAC imagery for other global regions were obtained for the days on which in situ observations were made. The nLw(λ) monthly and daily mapped data were generated by the NASA Goddard Space Flight Center (GSFC) Distributed Active Archive Center (DAAC) with reprocessing 5.2 [Feldman and McClain, 2008] (http://oceancolor.gsfc.nasa.gov/). The spectral nLw data were converted to remote sensing reflectance (Rrssat(λ), sr−1) by dividing by the extraterrestrial solar irradiance (F0(λ), mW cm−2 μm−1) [Thuillier et al., 2003] (http://oceancolor.gsfc.nasa.gov/DOCS/RSR_tables.html). The GSM01 products are generated and hosted by the ocean optics group at the University of California Santa Barbara (ftp://ftp.oceancolor.ucsb.edu//pub/org/oceancolor/REASoN/) [Maritorena et al., 2002].
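The radiance-to-reflectance conversion described above is a per-band division. The following is a minimal sketch of that step; the function name, the dictionary layout, and the numerical values are illustrative assumptions and are not the tabulated SeaWiFS values.

```python
def nlw_to_rrs(nlw, f0):
    """Convert normalized water-leaving radiance to remote sensing reflectance.

    nlw: dict mapping wavelength (nm) -> nLw (mW cm^-2 um^-1 sr^-1)
    f0:  dict mapping wavelength (nm) -> F0  (mW cm^-2 um^-1)
    Returns a dict mapping wavelength (nm) -> Rrs (sr^-1).
    """
    return {wl: nlw[wl] / f0[wl] for wl in nlw}


# Placeholder inputs for illustration only (assumed, not tabulated values).
nlw_example = {443: 1.2, 555: 0.3}
f0_example = {443: 190.0, 555: 185.0}
rrs_example = nlw_to_rrs(nlw_example, f0_example)
```

Because the units of nLw and F0 differ only by the steradian term, the quotient carries units of sr−1, as expected for Rrs.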
2.1.2. In situ Observations
The influences of cell size and pigment composition on light absorption by phytoplankton are well documented [Bricaud et al., 2004]. The relative biomass proportions of pico-, nano-, and microplankton in natural populations can be estimated from the concentrations of pigments that have taxonomic significance [Vidussi et al., 2001]. Numerous publicly available in situ data sets containing a suite of HPLC pigments were obtained (Table 1). Only observations that coincided with the first 10 years of the SeaWiFS mission (September 1997 to August 2007) were used. The database encompasses stations sampled in case 1 (phytoplankton dominated; generally open ocean) and case 2 (dominated by dissolved and detrital materials in addition to phytoplankton; generally coastal) waters and spans various hydrological and trophic conditions ranging from oligotrophic to eutrophic. The geographic distribution of the stations is uneven (Figure 1). However, from a similar HPLC database with significant overlap with the observations used in this study, Uitz et al. reported that the mean, median, and shape of the frequency distribution of HPLC-measured chlorophyll concentration are similar to those derived for the whole ocean from ocean color imagery [Antoine et al., 2005], and the database can thus be considered representative of the global ocean.
 HPLC pigment analysis allows for the determination of a suite of phytoplankton pigments including chlorophyll a (sum of chlorophyll a, divinyl chlorophyll a and chlorophyllide a) and accessory pigments. Accessory pigments indicate the types of phytoplanktonic groups and can be used as biomarkers. In order to condense the information contained in the full suite of pigments, pigment indices are constructed to quantify taxonomic composition by using a minimal set of pigments [Claustre, 1994; Gieskes et al., 1988; Vidussi et al., 2001]. The seven major pigments selected as being representative of distinct phytoplankton groups include fucoxanthin (fuco), peridinin (perid), 19'-hexanoyloxyfucoxanthin (hex-fuco), 19'-butanoyloxyfucoxanthin (but-fuco), alloxanthin (allo), chlorophyll b and divinyl chlorophyll b (chl b), and zeaxanthin (zea). The relative biomass proportions of picophytoplankton (<2 μm), nanophytoplankton (2–20 μm), and microphytoplankton (20–200 μm) [Sieburth et al., 1978] were estimated from the concentrations of pigments which have taxonomic significance and can be associated with a size class [Uitz et al., 2006; Vidussi et al., 2001]. Although this method provides an approximate proportion, it has been demonstrated to provide reliable results at regional [Vidussi et al., 2001] and global scales [Uitz et al., 2006]. Pigment taxonomic significance and detailed calculations can be found in the works of Uitz et al.  and Vidussi et al. .
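As a rough illustration of how diagnostic pigments can be turned into size-class proportions, the sketch below sums the pigments assigned to each class and normalizes by the total. The pigment-to-class assignment shown is the one commonly associated with the cited pigment-index literature and should be checked against Vidussi et al. and Uitz et al.; the `weights` argument is a hypothetical hook for pigment-specific coefficients and defaults to 1 for every pigment, which is an assumption rather than the calculation used in this study.

```python
# Assumed pigment-to-size-class assignment (verify against the cited works).
MICRO = ("fuco", "perid")
NANO = ("hex_fuco", "but_fuco", "allo")
PICO = ("chl_b", "zea")


def size_fractions(pigments, weights=None):
    """Return fractional contributions of micro-, nano-, and picoplankton.

    pigments: dict of pigment name -> concentration (mg m^-3)
    weights:  optional dict of pigment name -> coefficient (default 1.0)
    """
    weights = weights or {}

    def weighted_sum(names):
        return sum(weights.get(n, 1.0) * pigments[n] for n in names)

    micro, nano, pico = weighted_sum(MICRO), weighted_sum(NANO), weighted_sum(PICO)
    total = micro + nano + pico
    if total <= 0:
        raise ValueError("diagnostic pigment sum must be positive")
    return {"micro": micro / total, "nano": nano / total, "pico": pico / total}


# Made-up concentrations for illustration only.
sample = {"fuco": 0.10, "perid": 0.00, "hex_fuco": 0.05, "but_fuco": 0.02,
          "allo": 0.01, "chl_b": 0.02, "zea": 0.05}
fractions = size_fractions(sample)
```

In this notation, the "micro" entry multiplied by 100 corresponds to SfmHPLC.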
 Because of the collection and processing of HPLC samples by numerous groups, equipment, and protocols, quality control steps were needed to ensure a coherent data set. Quality assurance was carried out similar to Uitz et al. . Data observations were removed when the set of biomarker pigments was incomplete. The detection limit of the HPLC method depends on the sensitivity of the equipment and on filtered volumes, thus samples with chlorophyll a concentration <0.001 mg m−3 were rejected. For accessory pigments, concentrations below 0.001 mg m−3 were reset to zero. This rejection had no significant impact on the amount of useful data since such low concentrations were only encountered at great depths. The sum of the concentrations of major accessory pigments are tightly correlated with total chlorophyll a and covary in a quasi-linear manner [Trees et al., 2000]. This information was used to identify and eliminate outlying data that exceeded three standard deviations of the mean covariance for the sum of the accessory pigment concentrations. Because of the remote sensing objective of this study, we chose to use observations collected within the first 5 m of the water column and eliminated any observations made in waters shallower than 30 m. After applying the quality assurance procedures 4564 data points remained (Table 1).
 The HPLC-derived percent microplankton estimates SfmHPLC were matched with daily satellite imagery to inform the development of the correction scheme (described in section 3.4.1) and to provide validation for satellite percent microplankton Sfmsat retrieval. For imagery that coincided within the day that in situ observations were made, the single pixel that contained the in situ location was extracted from GAC (9 km resolution) and U.S. east coast LAC (1 km resolution) imagery. There were 860 match-ups between the in situ HPLC observations and satellite imagery (832 and 28 data points for GAC and LAC imagery, respectively; Table 1). We randomly split the in situ HPLC pigment data set in half, using one half for the correction scheme and reserved the other portion for validation. This type of validation is subject to temporal and spatial mismatch between the imagery and validation data points. While the imagery consists of a bulk estimate within the first photic depth of a 1 or 9 km patch of water, the ship observations represent a single point in space and time.
2.2. Optical Model and Look-up Table Construction
 The optical model constructed in this study begins with a global range of optically active ecosystem state variables and computes spectrally resolved inherent optical properties (IOPs). The three optically active constituents considered include water itself, cell size-weighted phytoplankton, and the combined effect of colored dissolved minerals and detrital particles. The IOPs were used to model apparent optical properties (AOPs), specifically Rrs(λ), through the radiative transfer software, Hydrolight 4.2 [Mobley, 1994; Mobley and Sundman, 2006a, 2006b] (Figure 2).
Ciotti et al. showed that there are notable differences in the shape of the phytoplankton absorption spectra normalized to their average value for pico- (<2 μm) and microphytoplankton (>20 μm). The absorption spectrum of a given phytoplankton size class changes with intracellular chlorophyll concentration [Duysens, 1956]. For example, compared with microplankton, picoplankton have a higher chlorophyll-specific absorption and their spectra exhibit larger peak heights relative to their troughs [Bricaud et al., 1983] (Figure 3a). Ciotti et al. demonstrated that despite taxonomic and physiological variability in phytoplankton community structure, variation in the spectral shape of the chlorophyll-specific absorption coefficient could be described by changes in dominant cell size. Similar to Mouw and Yoder, the desired phytoplankton size parameter is percent microplankton (denoted Sfm to differentiate it from the Ciotti et al. parameterization) rather than percent picoplankton (Sf). The retrieval of percent microplankton allows for comparison with both current and historical (reported as either >20 μm or <20 μm due to filter size) phytoplankton size-fractionated data sets. Sfm is thus the complement of Sf (Sfm = 1 − Sf) and varies from 0%, where the phytoplankton assemblage is dominated by picoplankton, to 100%, where it is dominated by microplankton. Intermediate percentages represent possible situations between the two extremes. The chlorophyll-specific absorption of phytoplankton is modeled as a spectral mixing model [Ciotti et al., 2002],
$$a^{*}_{ph}(\lambda) = S_f \, a^{*}_{pico}(\lambda) + (1 - S_f) \, a^{*}_{micro}(\lambda) \qquad (1)$$
where the spectral chlorophyll-specific absorption due to picophytoplankton (a*pico(λ), m2 mg−1) [Ciotti and Bricaud, 2006] and microphytoplankton (a*micro(λ), m2 mg−1) [Ciotti et al., 2002] is known, percent picoplankton (Sf, varying between 0 and 100% and entered as a fraction between 0 and 1) weights the spectra between the size extremes, and Sfm = 1 − Sf. The spectral phytoplankton absorption coefficient aph(λ) (m−1) is derived by multiplying the chlorophyll concentration ([Chl], mg m−3) by the chlorophyll-specific size-weighted absorption (a*ph(λ), m2 mg−1),
$$a_{ph}(\lambda) = [\mathrm{Chl}] \, a^{*}_{ph}(\lambda) \qquad (2)$$
We used the GSM01 inversion products (acdmsat(443), bbpsat(443), and [Chlsat]) and thus the absorption and scattering functions are modeled after Maritorena et al. . The total absorption of detrital particulates and dissolved materials are considered as a single term (acdm(λ), m−1) due to their similar spectral shape and slope and the combined treatment of these parameters in the GSM01 inversion scheme,
$$a_{cdm}(\lambda) = a_{cdm}(\lambda_0) \exp[-S(\lambda - \lambda_0)] \qquad (3)$$
where the reference wavelength (λ0) is 443 nm, the slope of the exponential relationship (S) was determined by Maritorena et al. to be 0.0206 nm−1, and the GSM01 inversion retrieves acdmsat(443) (Figure 3b).
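The absorption terms described above (size-weighted phytoplankton absorption and the exponential acdm spectrum) can be sketched in a few lines. This is an illustrative sketch, not the Hydrolight configuration used in the study: the function names are hypothetical, Sfm is entered as a fraction between 0 and 1, and the basis-vector values at 443 nm are placeholders rather than the published a*pico and a*micro spectra.

```python
import math


def aph_star(sfm, a_pico, a_micro):
    """Chlorophyll-specific absorption (m^2 mg^-1) mixed between size extremes.

    sfm is the microplankton fraction (0-1), so Sf = 1 - sfm weights a_pico.
    """
    sf = 1.0 - sfm
    return sf * a_pico + (1.0 - sf) * a_micro


def aph(chl, sfm, a_pico, a_micro):
    """Phytoplankton absorption coefficient (m^-1) for chl in mg m^-3."""
    return chl * aph_star(sfm, a_pico, a_micro)


def acdm(wavelength, acdm_443, slope=0.0206, ref=443.0):
    """Absorption by dissolved and detrital matter (m^-1) at wavelength (nm)."""
    return acdm_443 * math.exp(-slope * (wavelength - ref))


# Placeholder basis values at 443 nm (assumed for illustration only).
A_PICO_443, A_MICRO_443 = 0.06, 0.02
aph_mixed = aph(0.5, 0.5, A_PICO_443, A_MICRO_443)
acdm_555 = acdm(555.0, 0.05)
```

With equal weighting (Sfm = 0.5), the specific absorption is the mean of the two basis values, and acdm decays toward longer wavelengths as the exponential form requires.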
The power exponent was not varied due to cell size. Empirical studies have demonstrated that backscattering becomes spectrally neutral (λ^0) when [Chl] nears 2 mg m−3 [Morel and Maritorena, 2001]. As discussed later, the satellite detection thresholds do not allow for Sfm differentiation when [Chlsat] nears 2 mg m−3; thus, the spectral neutrality is less of a concern within our detection thresholds. The backscattering due to particles at the reference wavelength (443 nm) was modified from the work of Behrenfeld et al., who determined a monthly mean relationship between chlorophyll concentration and bbp at 440 nm for September 1997 to January 2002. We have modified the relationship for use with 443 nm and the inclusion of the first 10 years of SeaWiFS data,
Spectral backscattering was converted to total scattering (bp(λ)) with the backscattering ratio (B, 1%) as in the work of Morel and Maritorena,
$$b_p(\lambda) = \frac{b_{bp}(\lambda)}{B} \qquad (6)$$
 A look-up table (LUT) was constructed from a full-factorial experimental design that independently varied [Chl], phytoplankton size distribution represented as percent microphytoplankton (Sfm), and acdm(443) over a global ocean expected range (described in section 3) for each parameter. For a given combination of modeled IOPs (absorption and scattering), the radiative transfer software, Hydrolight 4.2 estimated AOPs (Rrs(λ)).
3.1. LUT Search Space
To minimize the number of Hydrolight simulations needed to construct the LUT, the range of global variability in chlorophyll concentration, percent microplankton, and absorption due to colored dissolved and detrital materials was considered. Empirically determined data envelopes for the parameter ranges of SfmHPLC and acdmsat(443) were constructed with respect to [ChlHPLC] and [Chlsat], respectively (Figure 4). This approach does not attempt to correlate SfmHPLC with [ChlHPLC] or acdmsat(443) with [Chlsat]; rather, it determines reasonable bounds of global observations. These data bounds were used to define the parameter ranges for the LUT construction. When the LUT was interrogated for a given image pixel, the search space of acdmLUT(443) and [ChlLUT] was limited by GSM01 retrievals of acdmsat(443) and [Chlsat]. The Hydrolight-modeled remote sensing reflectance RrsLUT(λ) scenarios for a given combination of acdmLUT(443) and [ChlLUT] (selected from the acdmsat(443) and [Chlsat] data) vary only due to the bins of percent microplankton within the LUT, SfmLUT (LUT ranges for acdmLUT(443), [ChlLUT], and SfmLUT are described below).
[ChlHPLC] versus SfmHPLC resulted in a sigmoid shape (Figure 4a). This shape has been noted by others who have investigated a subset of this HPLC data compilation [Bricaud et al., 2004; Devred et al., 2006]. The central fit of the distribution (SfmHPLC(central) = 1 − exp(−1 × [ChlHPLC])) is similar to the fit reported by Devred et al. but modified to represent percent microplankton rather than percent picoplankton. The upper (SfmHPLC(upper) = 1 − exp(−1.5 × ([ChlHPLC] × 10))) and lower (SfmHPLC(lower) = 1 − exp(−1 × [ChlHPLC] × 0.12)) bounds of the distribution were beyond three standard deviations of the nonlinear regression and contained greater than 99% of the observations.
The bounds of the [Chlsat] versus acdmsat(443) relationship were determined using GSM01 inversion products. Monthly GSM01 [Chlsat] and GSM01 acdmsat(443) observations were averaged over the decadal SeaWiFS record. The average values plotted against each other result in a power law relationship (Figure 4b). The central fit of the data (acdmsat(443)(center) = 0.101 [Chlsat]^1.15) was determined from regression analysis. The upper (acdmsat(443)(upper) = 0.29 [Chlsat]^0.6) and lower (acdmsat(443)(lower) = 0.007 [Chlsat]^1.87) bounds of the distribution were determined in the same manner as for the [ChlHPLC] versus SfmHPLC relationship.
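The two data envelopes can be written as small functions of chlorophyll concentration. The sketch below uses the coefficients quoted above and assumes that the trailing numbers in the acdm fits are exponents on [Chlsat]; the function names are hypothetical, and Sfm is returned as a fraction (multiply by 100 for percent).

```python
import math


def sfm_envelope(chl):
    """Return (lower, central, upper) Sfm fractions for chl in mg m^-3."""
    lower = 1.0 - math.exp(-1.0 * chl * 0.12)
    central = 1.0 - math.exp(-1.0 * chl)
    upper = 1.0 - math.exp(-1.5 * (chl * 10.0))
    return lower, central, upper


def acdm_envelope(chl):
    """Return (lower, central, upper) acdm(443) in m^-1 for chl in mg m^-3."""
    lower = 0.007 * chl ** 1.87
    central = 0.101 * chl ** 1.15
    upper = 0.29 * chl ** 0.6
    return lower, central, upper


def in_envelope(value, bounds):
    """True when value lies between the lower and upper bounds (inclusive)."""
    lower, _, upper = bounds
    return lower <= value <= upper


sfm_bounds = sfm_envelope(0.5)
acdm_bounds = acdm_envelope(0.5)
```

Only parameter combinations falling inside both envelopes would need a radiative transfer run, which is how the envelopes reduce the size of the full-factorial design.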
 Within the data envelopes, parameter ranges were established with a log2 scale to enable higher resolution at lower concentrations. The concentration ranges for [ChlLUT], acdmLUT(443), and SfmLUT were 0.01–2 mg m−3 with 85 levels, 0–0.325 m−1 with 78 levels, and 0–100% at 10% bins with 11 levels (i.e., 0%, >0–10%, >10%–20%, etc.), respectively. The resulting full-factorial design LUT consisted of 44,343 independent combinations of [ChlLUT], acdmLUT(443), SfmLUT, and RrsLUT (λ).
 The search space of the LUT is guided by GSM01 [Chlsat] and acdmsat(443) imagery. For a given combination of these parameters, RrsLUT(λ) varies only due to the bins of SfmLUT, and each RrsLUT(443) was unique. The closest RrsLUT(λ) value to SeaWiFS Rrssat(λ) was selected and SfmLUT was assigned. Once the SfmLUT value was assigned for a given pixel of an image, the terminology became Sfmsat. This matching procedure was repeated for each pixel resulting in global monthly mapped Sfmsat imagery.
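The matching procedure amounts to two nearest-neighbor steps: narrow the table to the LUT node closest to the satellite [Chl] and acdm(443), then pick the Sfm bin whose modeled reflectance is closest to the observed value. The sketch below illustrates this with a toy table keyed on a single normalized 443 nm reflectance; the data structure, function names, and all numbers are assumptions made for illustration, not the study's LUT.

```python
def nearest(values, target):
    """Return the element of values closest to target."""
    return min(values, key=lambda v: abs(v - target))


def retrieve_sfm(lut, chl_sat, acdm_sat, rs_sat):
    """Return the Sfm bin (percent) whose LUT reflectance best matches rs_sat.

    lut: dict mapping (chl, acdm443) -> {sfm_percent: rs443}
    """
    # Step 1: narrow the search space using the satellite Chl and acdm(443).
    chl_near = nearest(sorted({key[0] for key in lut}), chl_sat)
    acdm_near = nearest([key[1] for key in lut if key[0] == chl_near], acdm_sat)
    options = lut[(chl_near, acdm_near)]
    # Step 2: within that node, reflectance varies only with Sfm.
    return min(options, key=lambda sfm: abs(options[sfm] - rs_sat))


# Toy table with made-up reflectance values.
toy_lut = {
    (0.5, 0.002): {0: 1.8, 50: 2.0, 100: 2.2},
    (1.0, 0.002): {0: 1.0, 50: 1.1, 100: 1.2},
}
sfm_sat = retrieve_sfm(toy_lut, chl_sat=0.55, acdm_sat=0.003, rs_sat=2.05)
```

In the toy table, reflectance increases with Sfm at fixed [Chl], in line with the behavior reported for the corrected SeaWiFS data.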
3.2. Theoretical Feasibility and Detectable Ranges
 The analyses described in section 3.1 determined the dynamic ranges of the LUT calculations. In the following section, we determine the thresholds of [Chlsat] and acdmsat(443) that set the limits at which the SeaWiFS sensor can detect the effects of Sfm changes. RrsLUT(λ) was investigated in terms of spectral shape and magnitude to determine how well changes in Sfmsat can be detected in the presence of other optically active constituents. For a given [ChlLUT], both acdmLUT(443) and SfmLUT alter the shape and magnitude of the RrsLUT(λ) spectra. Figure 5a demonstrates a spectral magnitude shift due to changing SfmLUT, while [ChlLUT] and acdmLUT(443) are held constant at 0.5 mg m−3 and 0.002 m−1, respectively. Similarly, Figure 5b, displays a spectral magnitude and shape shift due to changing acdmLUT(443), while [ChlLUT] and SfmLUT are held constant at 0.5 mg m−3 and 50%, respectively. Increasing [ChlLUT] also causes a well-described spectral shift in RrsLUT(λ) (not shown). As [ChlLUT] increases, peak RrsLUT shifts toward the red region of the spectrum due to increasing absorption of blue photons [O'Reilly et al., 1998]. For SeaWiFS bands, peak RrsLUT shifts from 443 to 490 to 510 nm. Unlike the relationship between [ChlLUT] and RrsLUT, the peak RrsLUT does not shift bands with a change in SfmLUT. Therefore, the ability to retrieve Sfmsat is based on the fact that SfmLUT shifts RrsLUT(λ) primarily in magnitude rather than in spectral shape.
Thresholds for [Chlsat] and acdmsat(443) were determined. Above the upper thresholds, the dominant constituent masks any spectral shifting due to SfmLUT; below the lower threshold, the RrsLUT(λ) options associated with varying SfmLUT become too close for differentiation. Thresholds at which changes in RrsLUT(443) due to SfmLUT could be discerned were established in terms of [ChlLUT] and acdmLUT(443). The RrsLUT(λ) spectra were normalized to 555 nm (referred to as rsLUT(λ) from this point forward) to stabilize the shape of the spectra and emphasize the magnitude shift at the 443 nm band. For a given combination of [ChlLUT] and acdmLUT(443), there was a selection of rsLUT(443) values associated with the SfmLUT options. The difference between the rsLUT(443) values associated with the largest and smallest SfmLUT options (i.e., 100% and 0%), ΔrsLUT(443), was compared to the normalized noise-equivalent remote sensing reflectance for SeaWiFS at 443 nm, NEΔrssat(443). This parameter characterizes the sensor's inherent noise in terms of radiance and is given by
where NEΔL(λ) (W m−2 sr−1 μm−1) is the noise-equivalent radiance centered at wavelength λ [Barnes et al., 1994; IOCCG, 1998], and F0(λ) is the extraterrestrial solar irradiance centered at wavelength λ [Thuillier et al., 2003]. Any ΔrsLUT(443) values greater than NEΔrssat(443) were defined as being above the sensor's limit of detection and attributable to factors other than sensor noise fluctuation. The detection thresholds will be slightly different for other ocean color sensors, such as MODIS (Moderate Resolution Imaging Spectroradiometer) or MERIS (Medium Resolution Imaging Spectrometer), due to differences in noise-equivalent radiance specification. These analyses showed that Sfmsat could be retrieved from SeaWiFS when [Chlsat] was between 0.05 and 1.75 mg m−3 and acdmsat(443) was less than 0.17 m−1 (Figure 6). Applying this sensitivity analysis to satellite decadal mean [Chlsat] and acdmsat(443) imagery, 84% of the global ocean falls within the chlorophyll thresholds and 99.7% falls within the acdmsat(443) threshold.
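The detectability test described above compares the spread in modeled reflectance across the Sfm extremes with the sensor noise level. The sketch below shows only that comparison; the noise value is passed in as a number because its computation is not reproduced here, and the function names and example values are hypothetical.

```python
def delta_rs(rs_by_sfm):
    """Spread in rs(443) between the largest and smallest Sfm options."""
    return abs(rs_by_sfm[max(rs_by_sfm)] - rs_by_sfm[min(rs_by_sfm)])


def sfm_is_detectable(rs_by_sfm, ne_delta_rs):
    """True when the Sfm-driven spread exceeds the noise-equivalent level."""
    return delta_rs(rs_by_sfm) > ne_delta_rs


# Made-up LUT nodes: one with a clear Sfm signal, one without.
clear_node = {0: 1.80, 100: 2.20}
flat_node = {0: 2.00, 100: 2.01}
ASSUMED_NOISE = 0.05  # placeholder noise level, not a SeaWiFS specification

clear_ok = sfm_is_detectable(clear_node, ASSUMED_NOISE)
flat_ok = sfm_is_detectable(flat_node, ASSUMED_NOISE)
```

Evaluating this test over every [Chl] and acdm(443) node of the LUT is one way to trace out the region where retrieval is feasible.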
3.3. Phytoplankton Size Retrieval From Satellite
3.3.1. SeaWiFS Correction
Initial investigation indicated that the LUT and SeaWiFS Rrs(λ) spectra agreed in relative magnitude and spectral shape. However, SeaWiFS rssat(λ) was slightly lower than the Hydrolight-simulated rsLUT(λ) in the blue region of the spectrum and slightly higher in the red region of the spectrum (Figure 7). On average, there was a 69 (29), 39 (24), 22 (15), 10 (8), and 72 (53) percent difference between the two spectra at 412, 443, 490, 510, and 670 nm, respectively (standard deviation indicated in parentheses). There was zero percent difference between the spectra at 555 nm due to normalization at that wavelength. The depressed blue region of the SeaWiFS rssat(λ) observations was suspected to be due in part to an imperfect atmospheric correction. Hydrolight took user-supplied concentration, absorption, and scattering properties and performed radiative transfer to calculate RrsLUT(λ) just above the surface of the ocean, and the result was thus not subject to atmospheric correction. The selection of an incorrect atmospheric model and/or the inability of the current atmospheric correction to correct for absorbing aerosols can result in a spectrally biased oversubtraction of aerosol radiance, increasing with decreasing wavelength [Bailey and Werdell, 2006]. In addition, the Hydrolight simulations assume a given spectral shape for the absorbing and scattering properties (see equations (1)–(5)), with magnitude determined from concentration. The spectral shape coefficients (S, η, a*pico, a*micro) are representative in an average, broad-scale sense; however, the actual relationships in the natural waters observed by SeaWiFS may vary, leading to discrepancies between rssat(λ) and rsLUT(λ). To account for the offset and align the LUT and SeaWiFS rs spectra, a correction scheme was needed.
 To develop the correction scheme (Figure 8), we utilized the in situ HPLC pigment data set from which SfmHPLC was estimated. Of the 4564 in situ HPLC observations, there were satellite matches for 860 of the in situ data points (832 and 28 data points for GAC and LAC imagery, respectively; Table 1). We randomly split the in situ HPLC pigment data set in half, using one half for the correction scheme described here and reserved the other portion for validation. We extracted the nearest pixel of the daily LAC and GAC SeaWiFS Rrssat(λ) data to the location of each ship-based HPLC pigment observation. This approach was subject to both spatial and temporal biases. The HPLC pigment data are ship point measurements, whereas the extracted pixels are an average over a 1 km (LAC) or 9 km (GAC) area. In an effort to reduce temporal biases as much as possible, we utilized daily imagery coincident within the day of the ship measurement.
 The Rrssat(λ) pixels that matched the locations of the in situ HPLC pigment measurements were subjected to the GSM01 inversion (code available at http://www.icess.ucsb.edu/OCisD/). The GSM01 inversion utilizes Rrssat(λ) as the input, estimating [Chlsat], acdmsat(443), and bbpsat(443) [Maritorena et al., 2002]. The compilation resulted in concurrent estimates of SfmHPLC, Rrssat(λ), [Chlsat], and acdmsat(443). This suite of parameters allowed for an in situ informed correction of SeaWiFS Rrssat(λ). The [Chlsat] and acdmsat(443) were used to narrow the LUT search space. For a given observation, we then had a selection of rsLUT (443) that varied only due to SfmLUT, an in situ estimate of SfmHPLC and SeaWiFS rssat(443). We found the SfmLUT value that was closest to the in situ SfmHPLC observation, the rsLUT(443) value associated with the selected SfmLUT value was extracted and directly compared to the rssat(443) value. The correction factor was calculated as rsLUT(443) divided by the rssat(443) for the matched Sfm as described above.
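For a single match-up, the correction factor described above can be sketched as follows: select the LUT Sfm bin closest to the in situ estimate, then divide the corresponding modeled reflectance by the satellite value. The function name and all numbers are illustrative assumptions.

```python
def correction_factor(rs_lut_by_sfm, sfm_hplc, rs_sat):
    """Return rsLUT(443) / rssat(443) for the LUT bin nearest the HPLC Sfm.

    rs_lut_by_sfm: dict mapping Sfm bin (percent) -> rsLUT(443) for the LUT
                   node already narrowed by satellite Chl and acdm(443)
    sfm_hplc:      in situ percent microplankton
    rs_sat:        satellite rs(443) for the matched pixel
    """
    sfm_match = min(rs_lut_by_sfm, key=lambda sfm: abs(sfm - sfm_hplc))
    return rs_lut_by_sfm[sfm_match] / rs_sat


# Made-up match-up: HPLC gives 42% microplankton, nearest bin is 50%.
node = {0: 1.8, 50: 2.0, 100: 2.2}
factor = correction_factor(node, sfm_hplc=42.0, rs_sat=1.6)
```

A factor above 1 indicates that the satellite reflectance at 443 nm is lower than the modeled value, which is the direction of the blue-band offset noted for SeaWiFS.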
 We sought to find a way to index the correction factor determined point-by-point against satellite-derived products. There was a clear relationship between rssat(443), [Chlsat] and the correction factor with an exponential decay of rssat(443) with increasing [Chlsat] (Figure 9a). The highest correction factors were found at the lowest rssat(443) and [Chlsat] values and decreasing significantly with an increase in either parameter. This indicated that a predictable function was achievable between satellite-derived parameters and the correction factor needed to align rssat(443) with rsLUT(443). When the point-by-point correction was applied to rssat(443), it was evident that larger phytoplankton cells had a higher rssat(443) than smaller cells for the same given [Chlsat] (Figures 9b and 5a).
 To be able to use the correction factor scheme with satellite imagery, the in situ informed point-by-point correction factor data needed to be adapted into a surface that incorporates the range of global variability of chlorophyll concentration and remote sensing reflectance. To do this, we first linearized the relationship of the GSM01 [Chlsat] to rssat(443) found in Figure 9a by log10 transforming each parameter (not shown, similar to Figure 11). The linearized data was then interpolated onto a gridded surface with the resolution of [Chlsat] and rssat(443) identical to the resolution of these parameters in the LUT and a two-dimensional average filter was applied to smooth the surface.
 In order to test the performance of the interpolated and filtered correction factor surface, the point-by-point correction factors were plotted against the surface correction factors (from the split in situ data set, n = 430; Figure 10a). The extreme outliers of the regression were identified as the studentized residuals of the observations that fell beyond the Bonferroni (α splitting) critical value (3.81, α = 0.05) [Kleinbaum et al., 1998]. These outliers were removed and linear regression was performed again on the remaining observations (n = 425; Figure 10b). Remaining outliers (n = 12) were identified as the standardized residuals beyond three standard deviations of the regression and removed (Figure 10b). This resulted in 413 high-quality observations used to construct the final correction relationship that was applied to the satellite imagery. To ensure high-quality match-ups, the outlier identification applied in Figure 10b was applied to the full in situ data set. This eliminated 23 observations from the 430 observations reserved for validation. The geographic locations of the outliers for the full data set were not surprising, falling in regions with known satellite retrieval complications under standard processing procedures and included near-coastal waters, extreme high latitudes, and areas under the Saharan dust plume [Bailey and Werdell, 2006; Moulin et al., 2001] (Figure 10c). The remaining high-quality observations retained global distribution (Figure 10d). The point-by-point match-up of linearized [Chlsat], rssat(443), and the associated correction factor (Figure 11a) was used to generate a two-dimensional interpolated and filtered surface (Figure 11b). Correction factor values were extrapolated to the extremes of the detectable [Chlsat] and rssat(443) ranges following the dispersion of the correction values. 
The lowest observed correction value was assigned to the high [Chlsat] and high rssat(443) corners, while the highest observed correction factor was assigned to the low [Chlsat] and low rssat(443) corners. Correction factor values in the extrapolated region of the surface are subject to greater uncertainty than those within the region of interpolation.
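The second stage of the outlier screening described above (standardized residuals beyond three standard deviations of the regression) can be sketched as follows. This is an illustrative outline only: it assumes an ordinary least squares fit and a residual standard deviation computed with n − 2 degrees of freedom, and it does not reproduce the first stage based on studentized residuals and the Bonferroni critical value. The function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def screen_outliers(x, y, n_sd=3.0):
    """Return a list of booleans: True where a point is retained.

    A point is flagged when its residual exceeds n_sd residual standard
    deviations (assumed here to use n - 2 degrees of freedom).
    """
    slope, intercept = fit_line(x, y)
    resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in resid) / (len(x) - 2))
    return [abs(r) <= n_sd * sd for r in resid]
```

After the flagged points are dropped, the regression would be refit on the retained observations, as was done for the final correction relationship.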
 Application of the correction surface to the satellite imagery was carried out on a pixel-by-pixel basis. For a given pixel, the [Chlsat] and rssat(443) values are known. The single correction factor that corresponds to the paired [Chlsat] and rssat(443) information is extracted from the correction surface (Figure 11b) and assigned to the given pixel. The SeaWiFS rssat(443) values are multiplied by the assigned correction factor to result in a corrected rssat(443) value for each pixel (Figure 7).
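A minimal sketch of the pixel-by-pixel application is given below. It assumes the correction surface is stored as a nested list indexed by log10-transformed [Chlsat] and rssat(443) axes, and that the grid node nearest to each pixel value is used; the nearest-node rule, the function names, and the data layout are illustrative assumptions rather than the procedure used for the imagery.

```python
import math

def nearest_index(value, axis):
    """Index of the grid node in `axis` closest to `value`."""
    return min(range(len(axis)), key=lambda k: abs(axis[k] - value))

def correct_pixel(chl, rs443, log_chl_axis, log_rs_axis, surface):
    """Multiply a pixel's rs(443) by the correction factor at its grid node.

    chl and rs443 must be positive; both axes are in log10 units to match
    the linearized correction surface (an assumed storage convention).
    """
    i = nearest_index(math.log10(chl), log_chl_axis)
    j = nearest_index(math.log10(rs443), log_rs_axis)
    return rs443 * surface[i][j]
```

For example, with axes `[-1.0, 0.0]` and `[-3.0, -2.0]`, a pixel with [Chlsat] = 0.1 mg m−3 and rssat(443) = 0.01 would take the factor stored at `surface[0][1]`.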
3.3.2. Sfmsat Retrieval
 Imagery of [Chlsat], acdmsat(443), and the corrected rssat(443) were used to generate maps of Sfmsat. Pixels that contained acdmsat(443) above the detection threshold (>0.17 m−1) and/or [Chlsat] that fell below (<0.05 mg m−3) or above (>1.75 mg m−3) the detection thresholds (see section 3.2) were masked and removed from further consideration. For the remaining good pixels, the LUT search region was narrowed based on the [Chlsat] and acdmsat(443) pixel values. For a given [Chlsat] and acdmsat(443) combination, there were at most 11 choices of SfmLUT with associated rsLUT(443) values. Within the narrowed choices, the closest match between the SeaWiFS corrected rssat(443) and the rsLUT(443) options was determined, and the SfmLUT value associated with the selected rsLUT(λ) was assigned as Sfmsat. The result was monthly global imagery of Sfmsat (Figure 12a). The pixels that were masked as low chlorophyll were reassigned to the >0–10% Sfmsat bin under the assumption that all cells were small at extremely low [Chlsat] (Figures 12b and 4a). The high [Chlsat] and high acdmsat(443) pixels were not reassigned because of the great variability in possible Sfmsat for a given [Chlsat] value (Figure 4a).
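The retrieval logic for a single pixel can be sketched as below. The thresholds are those stated in the text; everything else is an assumption for illustration, including the representation of the LUT as a dictionary keyed by ([Chl], acdm(443)) nodes, the nearest-node narrowing rule, and the function name `retrieve_sfm`. The reassignment of low-chlorophyll pixels to the lowest bin is not included.

```python
CHL_MIN, CHL_MAX = 0.05, 1.75  # [Chl] detection thresholds (mg m^-3), from the text
ACDM_MAX = 0.17                # acdm(443) detection threshold (m^-1), from the text

def retrieve_sfm(chl, acdm, rs443_corrected, lut):
    """Return the LUT percent microplankton whose rs(443) is closest to the pixel.

    lut is a hypothetical dict mapping (chl_node, acdm_node) to a list of
    (sfm_percent, rs443) options. Returns None for masked pixels.
    """
    if acdm > ACDM_MAX or chl < CHL_MIN or chl > CHL_MAX:
        return None  # outside the detectable range
    # Narrow the search: nearest [Chl] node, then nearest acdm(443) node.
    chl_node = min(sorted({k[0] for k in lut}), key=lambda c: abs(c - chl))
    acdm_node = min(
        sorted({k[1] for k in lut if k[0] == chl_node}),
        key=lambda a: abs(a - acdm),
    )
    # Within the narrowed options, pick the closest rs(443).
    options = lut[(chl_node, acdm_node)]
    sfm, _ = min(options, key=lambda opt: abs(opt[1] - rs443_corrected))
    return sfm
```

Because the options for a given node differ only in Sfm, the final selection reduces to a one-dimensional nearest-value search on rs(443).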
 The in situ observations of SfmHPLC matched with daily LAC and GAC SeaWiFS imagery were used for validation in conjunction with the Sfmsat for the same imagery. Of the 430 in situ data points that were reserved for validation, 407 passed the outlier test (see Figure 10b). As a measure of classification success, the validation points are tracked with respect to how tightly they agree with the outlier determination (Figure 10b). Eighty-four percent of the data points fell within one standard deviation (RMSE) of the outlier regression (Figure 13). The remaining 12% and 4% of the data fell within two (2 × RMSE) and three (3 × RMSE) standard deviations, respectively. The match-up dispersion is greater at lower Sfm values. This occurs because the rssat(443) options associated with SfmLUT are closer together at lower Sfm and thus are more difficult to differentiate. The r2 and RMSE for the regression between Sfmsat and SfmHPLC are 0.6 and 12.64 for all data points and 0.85 and 6.3 for data that fell within one standard deviation, respectively (Figure 13).
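The validation statistics reported above can be computed along the following lines. This is a sketch under stated assumptions: the regression is taken to be ordinary least squares of satellite on in situ values, RMSE is computed from the regression residuals with a divisor of n, and the function names are hypothetical. The exact conventions used for Figure 13 are not given in the text.

```python
import math

def regression_stats(x, y):
    """Least squares fit of y on x; returns (r2, rmse, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / n)  # assumed divisor n, not n - 2
    return r2, rmse, resid

def fraction_within(resid, rmse, k):
    """Fraction of residuals whose magnitude is at most k * rmse."""
    return sum(abs(r) <= k * rmse for r in resid) / len(resid)
```

Calling `fraction_within(resid, rmse, 1)` gives the share of match-ups within one RMSE of the regression, the quantity reported as 84% in the text.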
 We considered the performance of our method in the context of other phytoplankton functional type retrieval approaches. This approach retrieves only percent microplankton at discrete bins, while other approaches retrieve additional size classes or functional types. However, an important aspect of this study is the consideration of detectable ranges given the sensitivity of the satellite radiometer and the second order impact phytoplankton size imparts on Rrssat(λ). Each phytoplankton functional type retrieval method has a different measure of validation success. To be able to compare the performance of the Sfmsat retrieval in this study with other approaches, we calculated the identical statistical measure presented in other studies with our data (Table 2). Our validation is robust in comparison to other approaches. In all but one instance, our retrieval was closer to the in situ data. The Ciotti and Bricaud validation was significantly stronger (RMSE of 0.172) than ours (RMSE of 12.64). Their effort was regional, focusing on continental shelf waters off Brazil, while this effort is global. Considering the study of Uitz et al., who retrieved phytoplankton cell size in the global ocean, the mean of their validation measure (log10[predicted/measured]) was −0.012, while the same statistic for our validation was considerably smaller (0.0054), indicating greater validation fidelity. Kostadinov et al. validate the particle size distribution slope (r2 = 0.21) and the abundance of reference particles of 2 μm size (r2 = 0.256). Our validation of percent microplankton yielded a stronger correlation coefficient (r2 = 0.60). Hirata et al. only validate with a single Atlantic Meridional Transect (AMT-07). Of the 26 AMT-07 match-ups, 19 were correctly classified (73%). If data within one standard deviation is considered correctly classified, then 84% of the retrievals from our approach were correctly classified.
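For reference, the mean log-ratio statistic quoted above can be written as follows, assuming the conventional definition in which P_i is the predicted (satellite) value and M_i the measured (in situ) value for match-up i of N. The symbols δ, P_i, M_i, and N are introduced here for illustration only, and the cited study may define the measure slightly differently.

```latex
\delta = \frac{1}{N} \sum_{i=1}^{N} \log_{10}\!\left( \frac{P_i}{M_i} \right)
```

A value of δ near zero indicates little systematic offset between predicted and measured values on a logarithmic scale.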
 Algorithm development for the retrieval of phytoplankton functional types (size classification, taxa, or species) has recently become an active research pursuit [Sathyendranath et al., 2004; Alvain et al., 2005; Ciotti and Bricaud, 2006; Uitz et al., 2006; Aiken et al., 2007; Hirata et al., 2008; Raitsos et al., 2008; Bracher et al., 2009; Kostadinov et al., 2009]. Each of these approaches utilizes different aspects of a phytoplankton's optical and/or environmental niche to parse unique characteristics associated with a size class or taxon. This approach leverages the relationship between phytoplankton absorption and cell size. The analyses of in situ HPLC data (Figure 4a) performed here and also by Bricaud et al. demonstrated the wide variability in phytoplankton size that can be associated with a given chlorophyll concentration. We utilized satellite-estimated [Chlsat] and acdmsat(443) [Maritorena et al., 2002] only as a guide to determine where in the look-up table to search, which reduces computational time. The differentiation between percent microplankton bins is based on a magnitude shift in remotely sensed reflectance at 443 nm. Our approach, applied to SeaWiFS, works for waters where 0.05 < [Chlsat] < 1.75 mg m−3 and acdmsat(443) < 0.17 m−1. Of the global mean [Chlsat] and acdmsat(443) climatology for the first 10 years of the SeaWiFS mission, 84% of the global ocean fell within the [Chlsat] threshold and 99.7% fell within the acdmsat(443) threshold.
 The correction of SeaWiFS rssat(443) was essential to the success of this LUT retrieval approach. Although the SeaWiFS and LUT Rrs(λ) spectra agreed relatively well, without the correction, many of the SeaWiFS rssat(443) observations would have fallen below the range of LUT rsLUT(443) options that vary due to the possible Sfm bins for a given [Chlsat] and acdmsat(443) scenario. Most often the offset was in the form of SeaWiFS rssat(443) being lower than the rsLUT(443) options (Figure 7). In this case, the minimum SfmLUT bin associated with the lowest rsLUT(443) option would have been selected, and the Sfmsat assigned would be the minimum allowed within the [ChlHPLC] versus SfmHPLC data envelopes (Figure 4a) for the given [Chlsat]. The ability to match in situ data with satellite observations presented the opportunity to understand the systematic offset between the sources of Rrs(λ) and develop a correction scheme (Figures 8 and 11).
 When considering the need to correct spectra, it is important to keep in mind the potential error associated with both the satellite observations and the optical model used to construct the LUT. SeaWiFS has inherent uncertainty in its radiometric measurements and potential complications associated with the atmospheric correction. SeaWiFS nLw(λ) data maintain uncertainties close to the prelaunch target of ±5% for observations in deep clear water, but the uncertainties are significantly higher in coastal case 2 waters [Bailey and Werdell, 2006]. There are also implicit assumptions in the optical model. These include the combination of detrital particles and dissolved minerals into a single absorption coefficient with a constant slope, not allowing backscattering to vary with particle size, and assuming all particles have a backscattering efficiency of 1%. These assumptions may be violated during blooms of specific species and in complex coastal waters. These are inherent limitations set by the instrument environment, processing, and model constructs. We have characterized under what conditions our retrieval will work given SeaWiFS noise-equivalent radiance (Figures 6 and 12b).
 The validation results are encouraging especially considering the inherent limitations of matching in situ point observations with satellite pixels that are separated temporally and spatially [Bailey and Werdell, 2006]. In situ HPLC observations are based on a small volume whereas the LAC and GAC satellite pixels have a resolution of 1 and 9 km, respectively. Even though the pixel that contained the HPLC observation location was selected, possible spatial variability within these pixels can be great, particularly in continental margin waters. The satellite imagery and in situ observations were sampled on the same day but separated by several hours. The spatial and temporal disparity in the matched pairs may introduce differences in the results that are more related to the variable and dynamic nature of the field rather than the methodology implemented. The temporal and spatial mismatch must be kept in mind when evaluating this type of validation. However, given these drawbacks, our approach displays strong validation fidelity in comparison to other phytoplankton functional type retrievals (Table 2).
 In Figure 12, we have displayed only a single month (May 2006) of the SeaWiFS decadal global Sfmsat retrievals. Detailed description of the temporal and spatial trends of the monthly Sfmsat maps will be presented in another manuscript. In our example image, the spatial trends agree with what we know about cell size distribution from in situ observations and other phytoplankton cell size investigations. Larger cells occur near coastal regions, seasonal bloom regions, and areas of intense upwelling. The smallest cells are found in the center of ocean gyres [Cullen et al., 2002]. Although Sathyendranath et al. did not retrieve phytoplankton cell size directly, their probability of diatom occurrence estimates can roughly be compared to our retrievals of percent microplankton (primarily diatoms). They worked with a much smaller study area and higher resolution imagery. Nevertheless, our Sfmsat estimates coincident with their study area of the Northwest Atlantic Zone off the eastern coast of Canada are similar in magnitude, reaching Sfmsat > 75% in intense bloom locations. Hirata et al. have published a global distribution of phytoplankton size for several months in 2004. Rather than maps of percent contribution of a given size class, they map the dominant size class for a given pixel (pico-, nano-, or microplankton). By their classification, very little of the ocean is dominated by microplankton. The trends of nanoplankton through time in 2004 give an indication of monthly changing cell size. The Southern Ocean is very dynamic in January–March and November, with micro- and nanoplankton dominating. From May to September, the North Atlantic bloom is a predominant feature, moving northward as the season progresses. In July–September, equatorial upwelling is evident, reaching well across the Pacific Ocean. Many of these same features are expressed in the Sfmsat retrievals. In the Southern Ocean, there are Sfmsat patches of >40% in January–March and November. The progression of the North Atlantic bloom is highly evident with Sfmsat in the core of the bloom >40%, with patches >80%. Equatorial upwelling is fairly consistent across most months in the Sfmsat imagery with microplankton contributions generally between 20% and 30%. During July and September, Sfmsat increases to 30%–40% in places along the equatorial Pacific.
Uitz et al. and Kostadinov et al. both published a single month estimate for the fraction of microplankton. Comparing our Sfmsat retrieval to the single month of fraction of microplankton chlorophyll displayed in Uitz et al. (June 2000), we retrieved a similar spatial distribution and magnitude. For the same month that Uitz et al. displayed (not shown), we estimated 20%–30% microplankton in the equatorial Pacific, in the Southern Ocean west of South America, and south of the equator in the Indian Ocean. Many of the high-production regions such as the subpolar North Atlantic and North Pacific and upwelling regions contained greater than 60% microplankton. These observations are in agreement with the Uitz et al. results. Kostadinov et al. display percent microplankton estimates for August 2007. For this same time period, the Sfmsat retrievals estimate microplankton contributions generally greater than 40% for the North Atlantic, North Pacific, and North Indian Ocean influenced by monsoons (not shown). There are large patches within these regions that may have higher microplankton contributions, which are not retrieved by the Sfmsat approach due to the retrieval thresholds. The Pacific and Atlantic equatorial upwelling regions and the Southern Ocean west of South America and Africa display Sfmsat estimates between 20% and 40%. These spatial patterns and magnitudes are in accord with the Kostadinov et al. estimates.
 Understanding the mechanisms controlling the size structure of the phytoplankton community in response to environmental forcing is essential to understanding temporal and spatial variations in food web structure, the regulation of the biological pump [Laws et al., 2000; Tremblay et al., 1997], and the ability of the ocean to act as a long-term sink for atmospheric carbon dioxide [Eppley and Peterson, 1979; Falkowski et al., 2000]. The methodology and resulting global maps of Sfmsat have implications for assisting many oceanographic subdisciplines with investigations of primary production, biogeochemistry, and carbon cycling. Many biogeochemical ecosystem models include phytoplankton size [Doney et al., 2009; Le Quere et al., 2005], and other investigations have pointed to the importance of considering size in primary production estimates [Mouw and Yoder, 2005; Uitz et al., 2008]. Our study contributes to the expanding body of literature aimed at retrieving phytoplankton functional type information from satellite imagery by presenting an additional approach to retrieve phytoplankton size and by carefully considering under what conditions size can be differentiated from satellite-measured reflectance spectra in the presence of other optically active constituents. While other approaches retrieve more than one phytoplankton size class or functional type, it is important to consider and quantify the thresholds for application in the context of the satellite sensor's sensitivity capability.
List of optical parameters
total spectral absorption (m−1).
spectral absorption due to pure seawater (m−1).
spectral absorption due to phytoplankton (m−1).
chlorophyll-specific spectral absorption due to phytoplankton (m2 mg−1).
chlorophyll-specific spectral absorption due to microphytoplankton (m2 mg−1).
chlorophyll-specific spectral absorption due to picophytoplankton (m2 mg−1).
spectral absorption due to the combined effect of colored dissolved and detrital matter (m−1).
absorption due to dissolved and detrital matter estimated from GSM01 inversion of SeaWiFS imagery (m−1).
absorption due to dissolved and detrital matter within the LUT (m−1).
spectral total scattering (m−1).
spectral scattering due to seawater (m−1).
spectral scattering due to particles (including phytoplankton) (m−1).
spectral backscatter due to particles (including phytoplankton) (m−1).
particulate backscattering estimated from GSM01 inversion of SeaWiFS imagery (m−1).
particulate backscattering ratio (bb/b).
power coefficient of the spectral backscattering relationship.
slope of the acdm(λ) relationship (nm−1).
wavelength, reference wavelength (443 nm) (nm).
chlorophyll concentration (mg m−3).
chlorophyll concentration estimated from GSM01 inversion of SeaWiFS imagery (mg m−3).
chlorophyll concentration specified for the LUT (mg m−3).
in situ HPLC chlorophyll concentration (mg m−3).
percent picoplankton (%).
percent microplankton (%).
percent microplankton estimated from SeaWiFS imagery (%).
percent microplankton specified within the LUT (%).
percent microplankton estimated from in situ HPLC pigments (%).
SeaWiFS normalized water leaving radiance (mW cm−2 μm−1 sr−1).
 We would like to thank the SeaWiFS Project at NASA GSFC for the processing and distribution of the SeaWiFS monthly mapped nLw(λ) imagery. We would also like to thank the ocean optics group from the University of California Santa Barbara, specifically D. Siegel and S. Maritorena, for the processing and distribution of SeaWiFS global monthly mapped GSM01 inversion product imagery and also for providing the GSM01 inversion code. We would like to thank all of the investigators who collected and processed the in situ HPLC pigment observations utilized in this study and those who compiled the data and made it publicly accessible (NASA SeaBASS, U.S. JGOFS, France JGOFS, HOTS, BATS). We kindly acknowledge the assistance of K. Hyde with matching SeaWiFS daily GAC and LAC imagery to the in situ observations. B. Beckmann was instrumental in optimizing the code and computational time for the Sfmsat retrievals. Initial ideas for this project were generated at the 2004 University of Maine summer course “Ocean Optics: Radiative transfer and inversion of ocean color remote sensing,” supported by ONR, NSF, and NASA. This manuscript benefited from discussions with T. Rynearson, Y. Wang, J. O'Reilly, and M. Twardowski. The comments and suggestions of three anonymous reviewers greatly improved this paper. Funding for this project came from a Rhode Island Space Grant/Vetlesen Climate Change Fellowship, a NASA Earth and Space Science Fellowship, and a University of Rhode Island Graduate School of Oceanography Alumni Fellowship, all awarded to C. Mouw.