The Advanced Clear-Sky Processor for Oceans (ACSPO), developed at the National Environmental Satellite, Data, and Information Service, reports observed top-of-atmosphere clear-sky brightness temperatures (BT) in AVHRR channels 3B (3.7), 4 (11), and 5 (12 μm), along with the sea surface temperatures (SST) retrieved from these BTs, as a level 2 (L2) product. Additionally, ACSPO provides the corresponding BTs simulated with the Community Radiative Transfer Model (CRTM), using Reynolds daily Level 4 (L4) optimum interpolation SST (OISST) and NCEP-GFS profiles as inputs. Accuracy of simulated BTs is critical for ACSPO functionalities, including detecting clouds, retrieving physical SSTs, monitoring sensor performance, and validating CRTM. This paper tests 11 gap-free gridded L4 SSTs for their potential use as first-guess fields in ACSPO to improve the accuracy of simulated BTs. As a first step toward this objective, the study checks for consistency between various L4 products and ACSPO L2 SSTs. This SST consistency was shown earlier to be representative of, and equivalent to, the consistency between measured and simulated BTs, thus avoiding expensive CRTM calculations. The metrics employed in L4 comparisons include the global spatial L4-L2 SST biases and variances and their temporal stability. Also, the effect of L4 fields on the corresponding satellite-to-satellite consistency (calculated as L2-L2 double differences) is examined. Several L4 products, including the GHRSST Multi-Product Ensemble and the Canadian Meteorological Centre analysis (CMC-0.2°), show better consistency with ACSPO L2 SST and will be explored in future versions of ACSPO.
 Advanced Clear-Sky Processor for Oceans (ACSPO) is the National Environmental Satellite, Data, and Information Service (NESDIS) clear-sky radiance and sea surface temperature (SST) retrieval system [Liang and Ignatov, 2011]. Applied to Advanced Very High Resolution Radiometers (AVHRR), ACSPO provides the clear-sky top-of-atmosphere (TOA) observed brightness temperatures (BT) in AVHRR bands 3B (3.7), 4 (11) and 5 (12 μm), along with their modeled counterparts, calculated using the fast Community Radiative Transfer Model (CRTM) [Han et al., 2006]. Currently, CRTM is implemented in ACSPO in conjunction with Level 4 (L4) daily Reynolds Optimum Interpolation SST (OISST) [Reynolds et al., 2007] and National Centers for Environmental Prediction Global Forecast System (NCEP-GFS, available at http://nomad3.ncep.noaa.gov/pub/gfs/rotating/) upper air fields as inputs. Simulated BTs are used in ACSPO for cloud detection [Petrenko et al., 2010], physical SST retrievals [Petrenko et al., 2011], monitoring sensor performance [Liang and Ignatov, 2011], and CRTM validation [Liang et al., 2009; Liu et al., 2009]. The Model-Observation (M-O) BT biases are routinely monitored using a near-real time web-based diagnostic system: Monitoring of IR Clear-sky Radiances over Oceans for SST (MICROS; www.star.nesdis.noaa.gov/sod/sst/micros/) [Liang and Ignatov, 2011]. M-O biases are affected by errors in CRTM and in first guess SST and upper-air fields, whereas space-time mismatches between AVHRR data and CRTM simulated BTs at OISST and GFS grids, and interpolated to AVHRR pixels, may contribute to their respective noise [e.g., Dash and Ignatov, 2008].
 In this study, several L4 SST fields have been tested, to identify more consistent inputs to CRTM in ACSPO and minimize the corresponding partial error. The most straightforward way to evaluate an L4 product is to input it into CRTM and check whether the consistency between the measured and simulated TOA BTs has improved. Performing this check in all sensor bands used for SST retrievals provides a very stringent test, as improvements must take place in all of them. However, running CRTM with multiple L4 SSTs and in multiple sensor bands is computationally challenging. Initial analyses toward this objective were performed by Liang and Ignatov, who tested only one additional L4 SST product as input to CRTM in ACSPO: the UK Met Office Operational SST and Sea Ice Analysis (OSTIA) [Donlon et al., 2012]. Having TOA BTs calculated with both OSTIA and OISST, they observed that CRTM BTs with OSTIA input are more consistent with the AVHRR measured clear-sky BTs, in all three SST bands 3B, 4, and 5. They additionally compared OSTIA and OISST L4 SSTs with ACSPO L2 SST, and concluded that the contrast is largest in SSTs, and progressively smaller in TOA BTs, in proportion to the atmospheric attenuation in the corresponding bands. This study builds upon the analysis in Liang and Ignatov and extends it, by including several other L4 SSTs in the comparisons, and checking them for consistency with ACSPO L2 SST. Recently, Castro et al. and Kennedy et al. employed a similar approach, by which the relative accuracy of various in situ data was evaluated using satellite L2 data as a “transfer standard.” Here, the same methodology is used to cross-compare various L4s, using ACSPO L2 as a “transfer standard.” Note that this study is based on global comparisons only and does not attempt to evaluate the various L4s on regional scales. However, regional analyses against Argo floats were performed in Martin et al. and shown to be consistent with global analyses.
Section 2 describes the data used in this study. Section 3 explains the methodology of comparisons and defines the performance metrics. Section 4 describes results of L4 comparisons with in situ and ACSPO L2 data. The effect of various L4 SSTs on the cross-platform consistency in L2 SSTs is presented in section 5. In section 6 the ACSPO L2 data are validated against the in situ SST. A summary and future work are given in the final section.
 In this study, nine months of L4, L2, and in situ data from 13 August 2011 to 29 May 2012 were used. The three data sources are explained in this section.
 Eleven daily L4 gridded gap-free SST products have been tested: (1) two Reynolds OISST analyses (one uses AVHRR and the other additionally uses the Advanced Microwave Scanning Radiometer (AMSR-E) SST data [cf. Reynolds et al., 2007]; these products are referred to herein as OISST.v2: AVHRR and OISST.v2: AVHRR + AMSR, respectively (note that OISST.v2: AVHRR + AMSR production was discontinued on 4 October 2011, due to the failure of AMSR-E on Aqua)); (2) the NCEP real-time global SST high resolution (RTG_HR) [Gemmill et al., 2007] and low resolution (RTG_LR) [Thiébaux et al., 2003]; (3) the Naval Oceanographic Office K10 SST (NAVO K10) (http://podaac.jpl.nasa.gov/dataset/NAVO-L4HR1m-GLOB-K10_SST); (4) the NESDIS POES-GOES Blended SST [Maturi et al., 2008]; (5) the IFREMER/CERSAT ODYSSEA produced in France [Piollé and Autret, 2011]; (6) the Group for High-Resolution SST (GHRSST) Multi-Product Ensemble (GMPE) [Martin et al., 2012]; and a group of foundation SSTs (the SSTs free of diurnal warming), including (7) UK Met Office OSTIA [Donlon et al., 2012], (8) the Canadian Meteorological Centre analysis (CMC 0.2°) of Environment Canada [Brasnett, 1997, 2008], and (9) the Australian BoM Global Australian MultiSensor SST Analysis (GAMSSA) [Zhong and Beggs, 2008; Beggs et al., 2011]. The sources of these data sets are detailed in Table 1 of Dash et al. All L4 SSTs blend different satellite data with quality controlled in situ data, except NAVO K10, POES-GOES and ODYSSEA, which use only satellite data. The GMPE product uses in situ and satellite L2 data indirectly, through ensemble representation of individual L4 members.
 Note that all these L4 fields, along with a few more, are routinely validated against in situ data (TL4 − Tin situ) and cross-compared (TL4 − TL4) in the L4 SST Quality Monitor (L4-SQUAM; http://www.star.nesdis.noaa.gov/sod/sst/squam/L4/) [Dash et al., 2012]. The first four moments of the distributions are used as a measure of the proximity of the two products, and the results are visualized using global maps, histograms and Hovmöller plots of the differences.
2.2. ACSPO L2
 ACSPO global products have been produced at NESDIS operationally since May 2008 from AVHRR sensors onboard NOAA-16, −17, −18, −19, and Metop-A satellites, and archived in NOAA's Comprehensive Large Array data Stewardship System (CLASS; http://www.class.ncdc.noaa.gov). Employing the clear-sky mask described in Petrenko et al., SST retrievals are performed using the regression split window Nonlinear SST (NLSST; daytime) and triple window MultiChannel SST (MCSST; nighttime) algorithms described in Liang and Ignatov. ACSPO processes two types of AVHRR L1b data: the 4 km global area coverage (GAC) from NOAA satellites and the 1 km Full Resolution Area Coverage (FRAC) from the Metop-A satellite. In this study, only GAC data available from several NOAA platforms and Metop-A are used (the latter is produced on the ground, by subsampling the global FRAC data to look like GAC), thus allowing for consistency checks.
 Note that ACSPO has not been assimilated in any L4 product tested here, except for POES-GOES blended. Likewise, no L4 product was used in ACSPO production, except OISST.v2: AVHRR. If anything, one may expect a closer agreement between ACSPO and these two L4 products, whereas for all other L4s tested here, the L2 and L4 fields may be viewed as independent. Data analyses in section 4 check for these possible biases in the comparisons.
2.3. In Situ Data
 In situ data used in this study come from the online near-real time in situ Quality Monitor (iQuam; http://www.star.nesdis.noaa.gov/sod/sst/iquam/) developed at NESDIS (F. Xu and A. Ignatov, manuscript in preparation, 2012). The iQuam performs three major functions in near-real time: (1) quality controls in situ data available from the Global Telecommunication System (GTS) data stream, separately in four major categories: drifters, tropical moored buoys, coastal moored buoys, and ships; (2) displays in situ data on the Web interface; and (3) serves Quality Controlled (QC'ed) data to outside users. Data are available from January 1991 onward. Only QC'ed drifter data from iQuam were employed in this study.
3. Methodology and Performance Metrics
 L4 products have been evaluated against QC'ed in situ drifters and ACSPO L2 SSTs. In situ data are only available in limited geographic areas, not fully representative of the global SST. Also, their quality may be suboptimal and nonuniform [e.g., Xu and Ignatov, 2010; Castro et al., 2012]. Furthermore, most L4 products include in situ SSTs in their blending analyses and, therefore, may not be fully independent of in situ data. Thus validation against independent data, e.g., Argo floats [Martin et al., 2012] or shipborne infrared radiometers [Donlon et al., 1998; Minnett et al., 2001], is required (note that Argo floats are assimilated in CMC 0.2° analysis). In this study, ACSPO L2 SST, TL2, is additionally used to cross evaluate various L4s [cf. Castro et al., 2012; Kennedy et al., 2012]. The premise of using satellite L2 SST is that it covers the global domain more fully, and its quality is more uniform, being produced from only one satellite sensor whose stability is easier to ensure than multiple in situ sources. Note that the satellite L2 is used here not as a “validation” but rather as a “transfer” standard, which allows ranking various L4s, but cannot determine their absolute accuracy and precision. All 11 L4 SSTs were matched up with satellite L2 SSTs, using a nearest neighborhood (NN) approach and no interpolation in time, with TL2 data used as a spatial anchor.
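The nearest-neighbor matchup described above can be sketched as follows. This is a minimal illustration, not the ACSPO implementation: the function name `nearest_l4_value`, the regular 1° global grid, and the toy SST field are all assumptions made for the example.

```python
import numpy as np

def nearest_l4_value(l4_sst, grid_lat, grid_lon, pix_lat, pix_lon):
    """Pick the L4 grid cell nearest to each L2 pixel (no interpolation).

    Assumes (hypothetically) a regular global grid: grid_lat and grid_lon are
    1-D, evenly spaced, ascending cell-centre coordinates in degrees, and
    l4_sst has shape (len(grid_lat), len(grid_lon)).
    """
    pix_lat = np.asarray(pix_lat, dtype=float)
    pix_lon = np.asarray(pix_lon, dtype=float)
    dlat = grid_lat[1] - grid_lat[0]
    dlon = grid_lon[1] - grid_lon[0]
    # Index of the closest cell centre along each axis
    i = np.rint((pix_lat - grid_lat[0]) / dlat).astype(int)
    j = np.rint((pix_lon - grid_lon[0]) / dlon).astype(int)
    i = np.clip(i, 0, len(grid_lat) - 1)   # clamp at the poles
    j = np.mod(j, len(grid_lon))           # wrap around the date line
    return l4_sst[i, j]

# Toy 1-degree grid whose "SST" simply encodes the cell-centre latitude
grid_lat = np.arange(-89.5, 90.0, 1.0)
grid_lon = np.arange(-179.5, 180.0, 1.0)
l4_sst = np.repeat(grid_lat[:, None], grid_lon.size, axis=1)

# Two hypothetical L2 pixel locations (the spatial anchors)
t_l4 = nearest_l4_value(l4_sst, grid_lat, grid_lon, [10.2, -45.9], [20.7, 179.9])
```

Because the L2 pixels act as the spatial anchor, the output has one L4 value per L2 pixel, so the difference TL4 − TL2 can be formed element by element.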
 An example of a ΔTL4L2 = TL4 − TL2 map is shown in Figure 1a, using nighttime ACSPO L2 SST on 20 August 2011. The map shows a near-global coverage, with some 2.7 million matchups. The corresponding histogram of differences is shown in Figure 1b, with summary statistics superimposed (number of matchups, median bias, and robust standard deviation, RSD). The distribution is near-Gaussian, with only a few positive outliers (cf. the right tail in Figure 1b), likely due to residual cloud and/or ice contamination in the ACSPO L2 product, mostly in the southernmost latitudes. Median ΔTL4L2 is close to zero with RSD ∼ 0.33 K. Note that these statistics come from a well populated near-global sample with a near-Gaussian distribution, and suggest that the L4 SST is on average very close to the ACSPO L2 SST, and matches its spatial structure very closely. The RSD of a distribution represents a robust measure of its statistical variability, in which the effect of outliers is minimized. In this study the RSD is estimated by dividing the interquartile range (IQR = P75 − P25) of the distribution by a scaling factor of 1.348, where P75 and P25 represent its 75th and 25th percentiles, respectively [e.g., Merchant and Harris, 1999].
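The RSD definition above translates directly into a few lines of code. The sketch below uses a synthetic sample (a Gaussian core of 0.33 K plus an assumed 1% warm tail) purely to illustrate why a robust estimator is preferred; the sample and the name `robust_std` are hypothetical.

```python
import numpy as np

IQR_TO_SIGMA = 1.348  # scaling factor quoted in the text

def robust_std(diff):
    """Robust standard deviation: interquartile range divided by 1.348."""
    diff = np.asarray(diff, dtype=float)
    diff = diff[np.isfinite(diff)]          # drop missing matchups
    p25, p75 = np.percentile(diff, [25, 75])
    return (p75 - p25) / IQR_TO_SIGMA

# Synthetic differences: Gaussian core plus a small warm tail (assumed values)
rng = np.random.default_rng(0)
core = rng.normal(0.0, 0.33, 200_000)
warm_tail = rng.uniform(2.0, 6.0, 2_000)
sample = np.concatenate([core, warm_tail])

median_bias = np.median(sample)
rsd = robust_std(sample)
conventional_std = np.std(sample)
print(f"median={median_bias:.3f} K  RSD={rsd:.3f} K  STD={conventional_std:.3f} K")
```

In this toy case the RSD stays near the 0.33 K of the Gaussian core, whereas the conventional standard deviation is strongly inflated by the outliers.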
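 Both L4 and L2 SSTs are functions of latitude (λ) and longitude (φ). The L2 product is additionally stratified by Day and Night (D/N), whereas all L4s tested here do not resolve the diurnal cycle. One can therefore write

$$T_{L4}(\lambda,\varphi) = T_{L4}^{(0)}(\lambda,\varphi) + \varepsilon_{L4}(\lambda,\varphi), \qquad T_{L2}(\lambda,\varphi,D/N) = T_{L2}^{(0)}(\lambda,\varphi,D/N) + \varepsilon_{L2}(\lambda,\varphi,D/N) \qquad (1)$$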
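 Here, T(0) are the true values of SSTs (“information”) and ε are the corresponding errors (“noise”). The difference between the two SSTs, ΔTL4−L2, for a given Day/Night (D/N = D0/N0) will have contributions from the true (ΔTL4−L2(0)) and error (Δε = εL4 − εL2) terms as follows

$$\Delta T_{L4-L2}(\lambda,\varphi,D_0/N_0) = \Delta T_{L4-L2}^{(0)}(\lambda,\varphi,D_0/N_0) + \Delta\varepsilon(\lambda,\varphi,D_0/N_0) \qquad (2)$$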
 For a given combination of L2 and L4, the ΔTL4−L2(0) term may not be zero and furthermore, it may be L4-specific, depending on the L4 definition (e.g., skin, bulk, foundation, etc.). In what follows, we have opted to incorporate the ΔTL4−L2(0) term into the error term, Δε, whose median values, in addition to systematic errors in L4 and L2 fields, now include the ΔTL4−L2(0) differences. As a result, the spatial median bias superimposed in Figure 1b and representing a global average over all latitudes and longitudes, μΔε(D/N), should be used with care to measure proximity of the L2 and L4 fields. At night, the upper layer of the ocean is more uniform, and the ΔTL4−L2(0) is smallest and more uniformly distributed in space, thus leading to the corresponding standard deviations, σΔε(N), being dominated by the Δε term.
 Assuming that the errors in L4 and L2 SSTs are uncorrelated, the σΔε can be represented as

$$\sigma_{\Delta\varepsilon}^{2} = \sigma_{L4}^{2} + \sigma_{L2}^{2} \qquad (3)$$

 Here, σL4 and σL2 are root mean squared errors in L4 and L2 SSTs, respectively. Note that from σΔε, one cannot infer the absolute values of σL4, because of the σL2 pedestal; however, the relative values of “spatial noise” in various L4 SSTs can be ranked using σΔε.
 Note that both nighttime and daytime ACSPO L2 data can be used in the L4 comparisons. However, daytime data are contaminated by solar reflectance and subject to diurnal warming, thus resulting in larger σL2. Although an elevated σL2 will have no effect on the relative ranking of σΔε and σL4, the larger pedestal reduces the relative differences in σΔε and consequently the contrast between the various L4 SSTs. Therefore, in what follows, only nighttime ACSPO L2 data will be used, and the “N” notation will be omitted from all equations.
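The effect of the L2 error “pedestal” on the contrast between L4 products can be illustrated numerically. The sketch below assumes uncorrelated errors, so that the L4 and L2 error variances add; all four error magnitudes are hypothetical values chosen only for illustration and are not taken from the paper.

```python
import math

def sigma_diff(sigma_l4, sigma_l2):
    """Standard deviation of (L4 - L2), assuming uncorrelated errors."""
    return math.hypot(sigma_l4, sigma_l2)

# Hypothetical error levels, in kelvin
sigma_l4_a, sigma_l4_b = 0.20, 0.30            # two L4 products, A better than B
sigma_l2_night, sigma_l2_day = 0.25, 0.45      # L2 pedestal, night vs day

contrast_night = sigma_diff(sigma_l4_b, sigma_l2_night) - sigma_diff(sigma_l4_a, sigma_l2_night)
contrast_day = sigma_diff(sigma_l4_b, sigma_l2_day) - sigma_diff(sigma_l4_a, sigma_l2_day)

print(f"night contrast: {contrast_night:.3f} K, day contrast: {contrast_day:.3f} K")
```

In both cases product A still ranks ahead of product B, but the separation between them shrinks when the pedestal is larger, which is the motivation for using nighttime data only.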
 Relative ranking using the mean bias μΔε and standard deviation σΔε, for one particular day, tells how well the various L4s capture the spatial structure of the ACSPO L2 product on that day. To determine whether this ranking remains consistent from one day to another (i.e., the μΔε and σΔε statistics remain stable) analyses of time series are needed. Hence the following four statistics were analyzed:
 1. μ(μΔε) – average-in-time of the spatial mean (L4-L2) bias
 2. σ(μΔε) – variability-in-time of the spatial mean (L4-L2) bias
 3. μ(σΔε) – average-in-time of the variability-in-space of the (L4-L2) bias
 4. σ(σΔε) – variability-in-time of the variability-in-space of the (L4-L2) bias
 A “better” L4 product should more accurately capture the spatial variability in ACSPO L2, and have more stable (in time) L4-L2 differences; i.e., all σ(μΔε), μ(σΔε) and σ(σΔε) statistics should be closer to zero. On the other hand, the μ(μΔε) parameter should be used with caution in the current L4 comparisons, due to possible systematic biases in various L4 and L2 products, resulting from their different definitions. In what follows, this latter statistic is also reported, for completeness, with primary consideration given to σ(μΔε), μ(σΔε) and σ(σΔε), in relative L4 ranking.
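The four statistics listed above can be computed from a sequence of daily difference samples as in the following sketch. It assumes that the daily spatial statistics are the median and the RSD (IQR/1.348), and that the temporal aggregation uses an ordinary mean and sample standard deviation; the function names and the two synthetic products are hypothetical.

```python
import numpy as np

def robust_std(diff, scale=1.348):
    """Robust standard deviation from the interquartile range."""
    p25, p75 = np.percentile(diff, [25, 75])
    return (p75 - p25) / scale

def ranking_metrics(daily_diffs):
    """daily_diffs: iterable of 1-D arrays of (L4 - L2) differences, one per day."""
    mu = np.array([np.median(d) for d in daily_diffs])      # daily spatial bias
    sigma = np.array([robust_std(d) for d in daily_diffs])  # daily spatial RSD
    return {
        "mu_mu": mu.mean(),               # average-in-time of spatial bias
        "sigma_mu": mu.std(ddof=1),       # variability-in-time of spatial bias
        "mu_sigma": sigma.mean(),         # average-in-time of spatial RSD
        "sigma_sigma": sigma.std(ddof=1)  # variability-in-time of spatial RSD
    }

# Two synthetic 30-day records: one stable, one with a drifting bias and more noise
rng = np.random.default_rng(1)
stable = [rng.normal(0.02, 0.30, 5000) for _ in range(30)]
drifting = [rng.normal(0.02 + 0.05 * np.sin(day / 5), 0.40, 5000) for day in range(30)]

m_stable = ranking_metrics(stable)
m_drifting = ranking_metrics(drifting)
```

Under the ranking logic of the text, the stable record would be preferred because its σ(μΔε) and μ(σΔε) are both closer to zero.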
 L4-SQUAM shows that for various L4s, the mean differences ΔTL4in situ = TL4 − Tin situ range from −300 to +500 mK (note that 1 mK = 0.001 K), and standard deviations from 160 to 650 mK. The time series of global median (μΔε) of ΔTL4in situ for the study period are plotted in Figure 2a, with the dotted lines representing the corresponding temporal means. All the L4 SSTs tend to be biased slightly cold compared with the in situ data, with the largest biases of about −120 mK and −150 mK for NAVO K10 and ODYSSEA, respectively. As mentioned above, this bias may result from different definitions of the L4 products; for instance, ODYSSEA is designated as a “skin” product and therefore is expected to be colder than the bulk in situ data by ∼170 mK. The cause of the cold bias in the NAVO K10 product is not readily apparent. All other L4 SST fields tend to group together with mean values, μ(μΔε), from ∼0 to −50 mK and standard deviations σ(μΔε) < ∼30 mK (cf. Figure 2a). Recall that in situ SST data have been assimilated in most L4 SSTs (except NAVO K10, ODYSSEA, and POES-GOES blended), so validation of L4 against the very same in situ data is not independent. Therefore, in this study we have chosen to additionally compare these L4 SSTs with independent global L2 SST values produced by the NESDIS ACSPO system.
 Time series of global median (μΔε) of ΔTL4L2 = TL4 − TL2 are plotted in Figure 2b with Metop-A as L2 SST. For all L4 products, except ODYSSEA, the range of the μΔε is within ±70 mK. GMPE and CMC are grouped together and show the smallest standard deviations, σ(μΔε), whereas all other SSTs show larger temporal variations, with maximum variability for ODYSSEA. The mean, μ(μΔε), and standard deviation, σ(μΔε), statistics are also superimposed in the figure. Similar time series were also generated against NOAA-18 and NOAA-19 L2 SSTs but are not shown here, in the interest of space. NOAA-19 data are stable, similar to the Metop-A time series shown in Figure 2, except that the corresponding absolute values of ΔTL4L2 biases are ∼+40 mK warmer, suggesting that NOAA-19 SST is ∼40 mK colder than Metop-A SST. NOAA-18 data are unstable and are discussed in section 5.
 The time series of global spatial RSD, σΔε, of ΔTL4in situ and ΔTL4L2 are shown in Figure 3. Those L4 products which have only satellite data as inputs (NAVO K10, POES-GOES and ODYSSEA), are generally less consistent with in situ data, as manifested by their higher standard deviations. Interestingly, both low- and high-resolution RTG products, which do assimilate in situ data, are nevertheless found in the “in situ free” cluster. The reason is not fully clear, but may be due to a lower weight of in situ data in the RTG production, or larger RMS errors in physical L2 SST retrievals generated at NCEP and used as input in their L4 analysis. On the other hand, all the foundation SSTs along with the GMPE and OISST.v2: AVHRR compare best with in situ data. This may be because all foundation and Reynolds products assimilate in situ SST data, including GMPE, which however does it indirectly through its ensemble members.
 The corresponding temporal means of the spatial variances, μ(σΔε), are also estimated from the time series. Figure 3b shows that GMPE, along with all foundation SSTs and NAVO K10, have the smallest μ(σΔε). For GMPE, μ(σΔε) is ∼306 mK, followed by CMC (∼319 mK), NAVO K10 (∼327 mK) and OSTIA (∼337 mK). The temporal variability of spatial variances, σ(σΔε), is smallest for CMC and GMPE with <13 mK, followed by GAMSSA with <20 mK. OSTIA did show a comparable performance of <20 mK before 8 April 2012, when the ENVISAT with the Advanced Along-Track Scanning Radiometer (AATSR) onboard failed, resulting in an almost doubled value of σ(σΔε) ∼35 mK. All the foundation SSTs (CMC, GAMSSA and OSTIA) use AATSR data as input and therefore are also affected by the loss of AATSR. A slight increase in σ(σΔε) is observed for the GMPE, CMC, and GAMSSA, but OSTIA is affected the most. Recall that unlike the other three L4s, which use AATSR SST as one of the input data sets, OSTIA also used it as a “transfer standard,” along with the in situ SST, to bias correct the other satellite SSTs [Stark et al., 2007]. Such large sensitivity to one single input data set is alarming, and suggests that an alternative approach to heavy reliance on one satellite product should be sought.
 All comparison statistics for 11 L4 SSTs against two L2 SSTs (Metop-A and NOAA-19) and in situ SST are systematically tabulated in Table 1. Note that NOAA-18 was not included because its SST time series are unstable, due to unstable AVHRR sensor radiances.
Table 1. μ(μΔε), σ(μΔε), μ(σΔε) and σ(σΔε) of SST Difference (TL4 − TL2 and TL4 − Tin situ)a
aThe NOAA-19 platform shows an extra row providing the temporal variability of DDs (σDD) along with mean DD (μDD), with Metop-A as reference satellite and various L4 SST fields as transfer standard. The metrics for Tin situ − TL2 are also tabulated. The highlighted rows represent the metrics not used for the L4 ranking. Note that OISST.v2: AVHRR + AMSR production was discontinued on 4 October 2011, due to the failure of AMSR-E. Also, since the failure of AATSR onboard ENVISAT on 8 April 2012, the UKMET OSTIA is affected the most.
 Relationships σ(μΔε) versus μ(σΔε), and σ(σΔε) versus μ(σΔε) from Table 1 are shown in two scatterplots in Figure 4. GMPE is found at the bottom left, followed by CMC. The NAVO K10 also measures well against ACSPO L2, but has a larger than average systematic bias, and larger bias and temporal variability with respect to in situ data (cf. Figures 2a–2b).
 Martin et al. [2012] compared most of these L4 SSTs with Argo floats [Ingleby and Huddleston, 2007], and reported that globally the GMPE is more accurate than any of the contributing L4 members with an overall standard deviation of μ(σΔε) ∼400 mK, followed by CMC and OSTIA, whereas other L4 SSTs have standard deviations less than 700 mK. The Martin et al. results corroborate our results well, except for the degraded performance of the OSTIA product after the loss of AATSR in April 2012.
5. Cross-platform Biases Between L2 SSTs
 A double differencing (DD) technique employing L4 SST as a “transfer standard” was introduced elsewhere to evaluate cross-platform biases between L2 SSTs from different satellites, in the full retrieval domain [Dash et al., 2010; Liang and Ignatov, 2011]. The DD technique is supposed to largely cancel out errors in the L4 “transfer standard,” as follows

$$DD_{SAT,MA} = \left(T_{L2,SAT} - T_{L4}\right) - \left(T_{L2,MA} - T_{L4}\right) \qquad (4)$$
 Here, TL2,SAT denotes the L2 SST obtained from either NOAA-18 or NOAA-19 and TL2,MA that from Metop-A. Note that the two terms on the right-hand side of equation (4) are defined in different domains, and using an L4 SST as a “transfer standard” allows checking for cross-platform consistency in a wider near-global domain, and minimizes errors due to different sampling. Note also that no L4 product tested in this study resolves the diurnal cycle. As a result, DDs estimated using equation (4) include diurnal variability, in addition to artificial cross-platform biases.
 Liang and Ignatov showed that the DDs calculated using OSTIA L4 as a “transfer standard” were very close to those calculated using the OISST, but more stable in time. Here, this analysis is extended and the effect of 11 L4 fields on DDs is checked. Figure 5a shows a time series of DDN19,MA with various L4s used as a transfer standard, and Figure 5b shows the corresponding DDN18,MA. As before, the solid lines represent real DDs calculated with different L4 SSTs, while the dotted lines show the corresponding median fits. The noise present in the median differences of ΔTL4,L2 in Figure 2 is largely reduced in the DD time series, as expected. Both NOAA-18 and NOAA-19 SSTs are colder than Metop-A SST. This is also expected, because Metop-A local overpass time is ∼2130 UTC whereas NOAA-18 and NOAA-19 overpass 4–5 h later in the night, at ∼0200 UTC. Since NOAA-18 and −19 fly in very close orbits, the magnitude and stability of the corresponding cross-platform biases in Figures 5a and 5b are expected to be consistent; however, this is not the case. NOAA-19 is very stable in time relative to Metop-A, whereas NOAA-18 is not. This is likely attributable to the instability of the NOAA-18 AVHRR sensor, as evidenced from the MICROS DD analysis available online at http://www.star.nesdis.noaa.gov/sod/sst/micros/. Therefore NOAA-18 data were withheld from this study, pending analyses of root causes of its sensor instability, and only Metop-A and NOAA-19 data were analyzed. The NOAA-19 temporal mean DDs, μDD, and their temporal standard deviations, σDD, are tabulated in Table 1. A remarkable result is that for the full range of L4s, temporal mean DDN19,MA are typically found in a (−43 ± 15) mK range, with the ±15 mK providing an estimate of the DD uncertainty due to the L4 input. It is further observed that the σDD values vary from 16 to 23 mK. The smaller the σDD, the more trustworthy is the estimate of μDD.
The σDD values are lowest for GMPE and CMC, consistent with the ranking of the corresponding L4 SSTs obtained using the previous three metrics in section 4.
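A minimal sketch of the double-difference calculation is given below. It assumes the sign convention (satellite minus reference) implied by the negative DD values reported for NOAA-19 relative to Metop-A, and it uses the median as the spatial statistic; the function name, the synthetic fields, and the imposed −0.04 K offset are all hypothetical.

```python
import numpy as np

def double_difference(t_sat, t_l4_at_sat, t_ref, t_l4_at_ref):
    """DD = median(T_sat - T_L4) - median(T_ref - T_L4).

    Each median is taken over that satellite's own clear-sky domain, so the
    two samples may differ in size and location (assumed convention).
    """
    return np.median(t_sat - t_l4_at_sat) - np.median(t_ref - t_l4_at_ref)

rng = np.random.default_rng(2)

def synthetic_pair(n, sat_offset):
    """Hypothetical truth, an L4 with a common +0.10 K bias, and an L2 sample."""
    truth = rng.uniform(271.0, 303.0, n)
    l4 = truth + 0.10 + rng.normal(0.0, 0.20, n)
    l2 = truth + sat_offset + rng.normal(0.0, 0.30, n)
    return l2, l4

t_sat, l4_sat = synthetic_pair(50_000, sat_offset=-0.04)  # test satellite, 40 mK colder
t_ref, l4_ref = synthetic_pair(80_000, sat_offset=0.0)    # reference satellite

dd = double_difference(t_sat, l4_sat, t_ref, l4_ref)
print(f"DD = {dd * 1000:.0f} mK")
```

The common L4 bias cancels in the subtraction, so the recovered DD is close to the imposed offset between the two satellites even though they sample different pixels.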
6. Validation of ACSPO L2 SSTs Against In Situ Data
 One might question how the ACSPO L2 SSTs, used in this study as a “transfer standard,” compare with the “gold standard,” in situ SSTs. The corresponding validation statistics are summarized in the last column of Table 1. The nighttime satellite and drifter observations were collocated within ±4 h and ±20 km using the NN approach, with in situ data as the spatial anchor. The number of “ACSPO L2 vs. in situ” matchups obtained “nightly” (∼1,500) is approximately three orders of magnitude smaller than the number of “L4-L2” matchups. Metop-A and NOAA-19 SSTs are ∼67 and 24 mK colder than in situ data, respectively. At first glance, this cold bias in “satellite skin” with respect to “in situ bulk” SST is expected. Satellite radiometers are indeed sensitive to skin SST, but recall that regression SST is tuned against the very same in situ bulk, and therefore a zero global average bias is expected. A nonzero bias is thus deemed to be due to suboptimal tuning of the regression coefficients in ACSPO. Work is underway to revisit regression equations, and verify the product using similar analyses. The RMSEs, μ(σΔε), ∼300 ± 5 mK, are comparable for the two platforms.
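The collocation criteria quoted above (±4 h, ±20 km, nearest neighbor, in situ as anchor) can be sketched as follows. This is an illustrative brute-force search, not the operational matchup code; the function names, the spherical-Earth radius, and the three toy pixels are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def match_pixel(dr_lat, dr_lon, dr_hour, px_lat, px_lon, px_hour,
                max_km=20.0, max_hours=4.0):
    """Index of the nearest satellite pixel within both windows, else None."""
    px_lat = np.asarray(px_lat, dtype=float)
    px_lon = np.asarray(px_lon, dtype=float)
    px_hour = np.asarray(px_hour, dtype=float)
    dist = great_circle_km(dr_lat, dr_lon, px_lat, px_lon)
    ok = (dist <= max_km) & (np.abs(px_hour - dr_hour) <= max_hours)
    if not ok.any():
        return None
    candidates = np.flatnonzero(ok)
    return int(candidates[np.argmin(dist[candidates])])

# One drifter at the equator and three hypothetical pixels:
# ~11 km away and 1 h apart; ~6 km away but 5 h apart; ~33 km away
best = match_pixel(0.0, 0.0, 0.0,
                   px_lat=[0.0, 0.0, 0.0],
                   px_lon=[0.10, 0.05, 0.30],
                   px_hour=[1.0, 5.0, 0.0])
```

Here the closest pixel in space is rejected because it falls outside the time window, so the first pixel is selected.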
 The mean value of the double difference, μDD ∼ +42 mK, calculated using in situ SSTs as a transfer standard, should be compared with −42 mK, calculated using GMPE or CMC as a transfer standard. The in situ estimate of ∼+42 mK is deemed a more accurate measure of cross-platform consistency, because in situ SSTs take into account the diurnal change from 2130 (Metop-A overpass) to ∼0200 (NOAA-19 overpass) (although its magnitude may be suppressed compared to the skin SST representative of the ACSPO L2 product). Recall that no L4 SST product used in this study resolves the diurnal cycle. The sign of the difference between these two estimates, ∼+85 mK, is positive as expected, while the magnitude provides an estimate of global-average diurnal cooling between the two overpass times. This estimate provides a measure of uncertainty in DDs due to using diurnally flat L4s and emphasizes the need for diurnal-cycle-resolving L4s. Furthermore, it suggests how future diurnally resolving L4s can be validated, using the methodology described here.
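The arithmetic behind these estimates can be retraced from the rounded values quoted in the text. The lines below are only a consistency check on those numbers, assuming the DD sign convention of NOAA-19 minus Metop-A; small mismatches (43 versus 42 mK, 84 versus 85 mK) presumably reflect rounding of the tabulated values.

```latex
\begin{align*}
\mu_{DD}^{\text{in situ}} &\approx \left(T_{N19} - T_{\text{in situ}}\right) - \left(T_{MA} - T_{\text{in situ}}\right)
  \approx (-24) - (-67) = +43~\text{mK} \approx +42~\text{mK},\\
\mu_{DD}^{\text{in situ}} - \mu_{DD}^{L4} &\approx (+42) - (-42) = 84~\text{mK} \approx 85~\text{mK}.
\end{align*}
```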
7. Summary and Future Work
 Eleven different L4 SST fields have been tested as potential first-guess SST inputs in ACSPO, by comparing them to an independent source of ACSPO L2 SSTs and iQuam quality controlled in situ data. The two customary metrics employed are the global spatial biases and variances of the differences (TL4 − TL2) and (TL4 − Tin situ). In this study, their temporal stability was additionally analyzed, a new metric not previously considered in the validation analyses. It is generally observed that the GMPE and CMC provide an improved combination of metrics over the currently employed OISST, and thus serve as a more consistent first-guess SST field for ACSPO. The UKMO OSTIA was comparable to GMPE and CMC before 8 April 2012 (cf. similar findings reported by Martin et al., based on comparisons of various L4 SSTs with Argo floats), but strongly degraded following the loss of AATSR onboard ENVISAT.
 The relative ranking of various L4 products is additionally supported by cross-platform consistency analyses based on double differences (DD). When GMPE and CMC products are used as a “transfer standard,” the DDs appear more stable in time and therefore presumably more accurate. For the most efficient utilization of the DD potential, however, diurnally resolving L4 SST products are needed. Work has commenced in the SST community to generate such products, and we will evaluate them as they become available, for potential use in ACSPO.
 Following a request from our NOAA Coral Reef Watch colleagues, we plan to reprocess ACSPO data back to 2004. The most viable L4 SST candidate for this reprocessing, at this time, is the CMC product, which goes back to January 2002 and was recently reprocessed back to 1991 (B. Brasnett, personal communication, 2012). Recall that the operational OSTIA product goes back to only September 2006, and GMPE became available in September 2009. Our analyses suggest that extending GMPE analyses back in time would be beneficial (which, in turn, would require generating longer time series of individual ensemble members). We also plan to explore the effect of using different atmospheric profiles as input to ACSPO, by testing the high-resolution ECMWF upper air fields against GFS data, using a methodology similar to that described here.
 As a side observation of this study, the NOAA-18 sensor appears unstable in time, and the NOAA-16 sensor was found to be even more vulnerable [Liang and Ignatov, 2011]. Analyses are underway to identify the root causes, and to fix them if possible. The Metop-A and NOAA-19 sensors work well and the corresponding ACSPO SST products appear stable and accurate, although adjustments of SST regression coefficients are needed to remove negative biases of a few hundredths of a Kelvin with respect to in situ SSTs. Future work may also include exploring the remaining L4 products (which, as of this writing, have not yet been included in L4-SQUAM), and extending current analyses to regional scales.
 The authors acknowledge funding support from the GOES-R and JPSS Program Offices and from the NESDIS Ocean Remote Sensing Program. The authors also thank the various providers of L4 SST data used in these analyses, STAR SST colleagues supporting the ACSPO system and iQuam monitoring, and three reviewers of this paper for constructive feedback and helpful suggestions. The views, opinions, and findings contained in this report are those of the authors and should not be construed as an official NOAA or U.S. government position, policy, or decision.