3.1. Bias (Accuracy)
 One of the targets for the ARC project was that the SST accuracy should be 0.1 K, by which we mean that the mean error (bias) should be within ±0.1 K for all regions. The assessment of this target is based on comparison of the ARC SSTs with validation data, with ARC SST minus validation SST hereafter referred to as “discrepancy.” Compared to the previous assessment of this target [Embury et al., 2012b], the results here (1) use updated, more precise in situ locations extracted from the ICOADSv2.5 IMMA data set [Woodruff et al., 2011], reducing errors associated with identifying spatial coincidence between satellite and in situ, (2) include new matches to Argo profiling floats for which the uppermost measurements at a mean depth of approximately 4 m have been extracted from the Met Office Hadley Centre EN3 data set [Ingleby and Huddleston, 2007] from 2004 onward, and (3) cover all three ATSR sensors used in the ARC project.
 Accuracy is assessed using the median discrepancy between ARC SST0.2m values and drifting buoy measurements across the whole time series, as shown in Figure 3. Most 5° latitude-longitude cells in Figure 3 are within the target range for D2 and D3 SSTs. Since the calibration accuracy of the drifting buoys to which the ARC SSTs are matched is thought to be of order 0.2 K [O'Carroll et al., 2008], cells with fewer than ∼16 independent drifting buoys are not statistically reliable for determining a 0.1 K bias with high confidence (i.e., at the 2σ or 95% confidence level). Statistical uncertainty in the validation data accounts for some of the outlier cells that appear in common between plots.
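The threshold of about 16 buoys follows from requiring that twice the standard error of the cell mean, 2 × 0.2 K / √n, be no larger than the 0.1 K target. A minimal sketch of that arithmetic is given here; the function name and the assumption that buoy calibration errors are independent and identically distributed are ours, for illustration only.

```python
import math

def min_independent_buoys(buoy_sigma_k, target_bias_k, n_sigma=2.0):
    """Smallest n such that n_sigma * buoy_sigma_k / sqrt(n) <= target_bias_k.

    Assumes independent buoys sharing one calibration uncertainty
    (an illustrative simplification, not a statement from the source).
    """
    ratio = n_sigma * buoy_sigma_k / target_bias_k
    # Small guard so floating-point round-off does not push 16.0 up to 17.
    return math.ceil(ratio ** 2 - 1e-9)

n_required = min_independent_buoys(buoy_sigma_k=0.2, target_bias_k=0.1)
print("independent buoys needed per cell:", n_required)
```

With fewer buoys than this in a cell, the 2σ uncertainty of the buoy mean alone exceeds the bias being tested for.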
Figure 3. Median discrepancy of nighttime ARC SSTs (depth SST estimates for a depth of 20 cm) minus matched drifting buoys, averaged on 5° latitude-longitude cells. This is averaged across all sensors, so AATSR is heavily weighted because of the increase in drifting buoy numbers since the year 2000. Results are shown for dual-view retrievals using (left) two (D2) and (right) three (D3) channels.
 Global median bias statistics for D2 and D3 retrievals (again, after model-based adjustments of the skin-SST retrievals to SST0.2m or SST1m) are given relative to drifting buoys, the Global Tropical Moored Buoy Array (GTMBA) [McPhaden et al., 1998; McPhaden et al., 2009; Bourlès et al., 2008] and the shallowest measurement of Argo profiles, in Table 1. The methodology follows Embury et al. [2012b]. The shallowest measurement for most Argo profiles is between 3 and 5 m depth, so there is a 2 to 4 m difference in the nominal depths in this case. However, Argo SSTs are a useful complement to the drifting and the tropical moored buoy networks, being highly accurate and distributed across all latitudes.
Table 1. Global Median Discrepancy (K) Between ARC SSTs and Different Types of In Situ Measurements
| | D2 Daytime | D2 Nighttime | D3 Nighttime | SSTskin-SSTdepth Day, Night |
 For AATSR and ATSR-2, the median discrepancy between ARC depth SSTs and drifting buoy or GTMBA SSTs is within ±0.05 K of zero for every combination of sensor, algorithm type and day or night. For AATSR, the comparison with the somewhat deeper Argo gives results that are relatively more positive during the day, with the D2 daytime mean discrepancy being warmer than the D2 nighttime discrepancy by 0.06 K, and warmer than the D3 nighttime discrepancy by 0.05 K. This may partly reflect mean thermal stratification between the ARC SST1m and the Argo SST depth. Nonetheless, the results suggest that the 0.1 K target accuracy is met for ARC SSTs from AATSR and ATSR-2 and also that the SSTs from these two sensors are consistent with each other to well within 0.1 K. The intersatellite consistency has been achieved in ARC by exploiting the overlap period of ATSR-2 and AATSR to obtain consistency in the calibration and simulation of their brightness temperatures.
 The overlap of ATSR-2 and ATSR-1 was similarly exploited to tie ATSR-1 at the end of its life to ATSR-2 at the start of its life. However, ATSR-1 presents additional challenges. The detector temperature of ATSR-1 was not stable, which affects the calibration of at least the 12 μm detector temperature. The calibration impact of the detector temperature trend has been modeled using the best available information of the impact on the sensor calibration, but it is not clear how to tie the start of life of ATSR-1 to ATSR-2. In ARC, we elected to tie the D2 SST at the detector temperatures prevalent at the start of the ATSR-1 mission to the SSTs obtained using in addition the 3.7 μm channel, which was available for the first ∼8 months of the mission. However, this is an area where more investigation should be done.
 ATSR-1 is also problematic because of the stratospheric aerosol present from May 1991 and diminishing through to roughly the end of 1993, arising from the massive eruption of Mount Pinatubo in the Philippines. The ARC coefficients, following the techniques of Merchant et al., are designed to be “aerosol robust” [Embury and Merchant, 2012], i.e., insensitive to the presence of this mode of aerosol. Robustness depends on accurate forward modeling of the brightness temperature impact of the aerosol relative to aerosol-free sky. A residual sensitivity to the presence of aerosol is therefore possible. (This is examined in section 4.3.1.)
 There are thus two reasons why early ATSR-1 SSTs could be biased relative to later ATSR-1 SSTs: uncertainty in the effect of changing detector temperatures, and residual sensitivity to stratospheric aerosol. It appears from Table 1 that ATSR-1 SSTs over the full lifetime are negatively biased by between −0.05 and −0.1 K relative to the later sensors, depending on the in situ measurements used for the comparison.
 The time evolution of the monthly, global, median discrepancy relative to drifting buoys is shown in Figure 4 (left). For comparison, we show the equivalent plot for Argo matches from 2004 onward (derived from <1% as many matches as the drifting buoy plot, but nonetheless giving a consistent picture). During the overlap of ATSR-1 and ATSR-2 (roughly the last 6 months of 1995), the overlap analysis has brought the median discrepancy of the two sensors into alignment to within 0.05 K, which gives us an estimate of how closely the two sensors agree. The agreement between ATSR-2 and AATSR during their overlap (late 2002, early 2003) is closer. The median discrepancy relative to drifting buoys is relatively constant and generally between 0.00 and 0.05 K throughout the ATSR-2 and AATSR period. The robust standard deviation (RSD) of differences relative to drifting buoys for ATSR-2 is slightly larger than for AATSR, and again is stable, except for a few months in early 2001. This was a period when the attitude control of the ERS-2 platform was degraded, and thus the satellite geolocation and nadir-forward colocation have larger errors. This means the satellite and in situ SSTs are less precisely matched, and the errors from mismatch in location are significantly greater, adding to the RSD of discrepancy. However, there is no obvious effect on the mean discrepancy, so ARC SSTs from this period are not biased relative to before or after the event. The largest biases are associated with ATSR-1, as noted earlier, and here we see that there is an evolution of these biases in time, with a bias that is more negative than −0.1 K around the start of 1992 dissipating by mid-1993, followed by another negative excursion around the start of 1994, and a much smaller difference during late 1995. Given its timing, it is tempting to attribute the first negative excursion to residual sensitivity to the Pinatubo aerosol, which is considered further in section 4.3.1.
In mid-1994, there was a rapid rise in the ATSR-1 detector temperatures (which are actively cooled) from around 92 K to around 98 K, and thereafter the detector temperatures rose to about 105 K by the end of the main ATSR-1 mission (mid-1996). This instrumental factor seems likely to play a role in the second negative excursion in the data. However, it is difficult to be definitive because the stability of the drifting buoy ensemble is not controlled or guaranteed during this period. Compared to the 2000s, the number of drifting buoys at that time was low, and the geographical coverage was uneven. These are both factors that render the stability of the validation values here open to question. A more formal analysis of the stability of ARC SSTs is therefore presented in section 3.3.
Figure 4. (left) Time series of (bottom) median discrepancy and (top) robust standard deviation (RSD) for the ATSR mission compared to drifting buoys. Results are shown for D2 daytime (red), D2 nighttime (blue) and D3 (black) retrievals. (right) The equivalent time series for AATSR compared to Argo. RSD is calculated using the median absolute deviation from the median, scaled by a factor such that for a Gaussian distribution, the RSD equals the conventional standard deviation.
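The RSD defined in the caption above can be sketched as follows. The scale factor 1.4826 is the standard constant that makes the scaled median absolute deviation equal the standard deviation for Gaussian data; the function name and the sample values are illustrative assumptions.

```python
import statistics

# 1 / (0.75 quantile of the standard normal), approximately 1.4826
GAUSSIAN_MAD_SCALE = 1.4826

def robust_std(values):
    """Robust standard deviation: scaled median absolute deviation from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return GAUSSIAN_MAD_SCALE * mad

# Made-up discrepancies (K) containing one gross outlier.
sample = [0.02, -0.05, 0.10, -0.12, 0.04, 0.00, 2.50]
print("RSD:", robust_std(sample), "conventional SD:", statistics.stdev(sample))
```

Unlike the conventional standard deviation, the RSD is barely affected by a single gross outlier, which is why it suits match-up data that may include undetected cloud or buoy faults.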
3.2. Standard Deviation (Precision)
 We estimate the precision of the ARC data by three-way analysis, which allows the simultaneous estimation of the precision (standard deviation of the errors) of each of three observation types. Here, version 5 SSTs from the Advanced Microwave Scanning Radiometer–Earth Observing System (AMSR-E) available from mid-2002 [Wentz and Meissner, 2000; Wentz et al., 2003] were used as the third observation type with ARC SSTs and drifter SSTs. The AMSR-E data are gridded on a 0.25° latitude-longitude grid. The ARC-drifting buoy collocations were provided by the Met Office processing system used for the near real time monitoring of ATSR (details outlined by K. Lean and R. W. Saunders (Validation of the ATSR Re-processing for Climate (ARC) dataset using data from drifting buoys and a three-way error analysis, submitted to Journal of Climate, 2012)).
 O'Carroll et al. previously undertook a three-way analysis on earlier data sets of AATSR, AMSR-E and drifting buoy SSTs. They derived and discussed the applicability of an expression for the error variance (σx²) of a set of observations x:
$$\sigma_x^2 = \tfrac{1}{2}\left(V_{xy} + V_{zx}\right) - \tfrac{1}{2}V_{yz} \qquad (1)$$
where Vxy is the variance in the difference between two observation types, x and y, etc. In equation (1), the term −0.5Vyz deducts the variance contribution from y and z from the mean of the variances of x relative to y and z (which is 0.5(Vxy + Vzx)) to yield an estimate of the variance of x itself. The three data sources must be closely collocated in time and space. Here a tolerance of different observation times of 180 min is used, and the buoy location must lie within the ARC grid cell, which in turn must lie within the AMSR-E grid cell. Differences in observation time and the different nature and spatial scales of the three types of observation mean that some true geophysical variability will be folded in unknown proportions into the precision estimates.
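The three-way estimate described above (half the sum of the two difference variances involving x, minus half the difference variance of the other pair) can be illustrated with synthetic data. This is a sketch only: the function name, the error magnitudes and the synthetic "truth" are made up for illustration, and the errors are generated as mutually independent, which is the condition under which the expression recovers the individual error variances.

```python
import random
import statistics

def three_way_error_std(x, y, z):
    """Error standard deviations of three collocated observation types."""
    def diff_var(a, b):
        # Variance of the pairwise difference (V_ab in the text).
        return statistics.pvariance([ai - bi for ai, bi in zip(a, b)])

    v_xy, v_yz, v_zx = diff_var(x, y), diff_var(y, z), diff_var(z, x)
    var_x = 0.5 * (v_xy + v_zx) - 0.5 * v_yz
    var_y = 0.5 * (v_xy + v_yz) - 0.5 * v_zx
    var_z = 0.5 * (v_yz + v_zx) - 0.5 * v_xy
    # Sampling noise can make a small variance estimate slightly negative.
    return tuple(max(v, 0.0) ** 0.5 for v in (var_x, var_y, var_z))

rng = random.Random(42)
TRUE_SIGMAS = (0.15, 0.25, 0.45)  # hypothetical error SDs (K), not results
truth = [rng.uniform(270.0, 305.0) for _ in range(20000)]
x_obs = [t + rng.gauss(0.0, TRUE_SIGMAS[0]) for t in truth]
y_obs = [t + rng.gauss(0.0, TRUE_SIGMAS[1]) for t in truth]
z_obs = [t + rng.gauss(0.0, TRUE_SIGMAS[2]) for t in truth]

estimated = three_way_error_std(x_obs, y_obs, z_obs)
print("estimated error SDs (K):", estimated)
```

In real match-ups the three types sample different depths, footprints and times, so part of the true geophysical variability ends up in these estimates, as noted in the text.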
 The new three-way precision analysis was carried out for the years 2003–2009 in order to assess whether there are any trends in the precision of any of the types of observation. Since we closely reproduce the approach of O'Carroll et al., there is good comparability with the earlier results.
 The inferred precision values are shown in Table 2. The same precision is found for the ARC data in 2003 as was found for the AATSR data in the study of O'Carroll et al. Similar precision is also found for the AMSR-E SST in the two studies, while the precision of the buoy SSTs in this analysis is slightly lower. The precision estimates for the ARC SSTs have the smallest range over the seven years and do not have any obvious trend. For the AMSR-E SSTs there seems to be a deterioration in precision over time, which could be due to the instrument degrading with age and/or increasing radiofrequency interference. The buoy SST precision improves slightly in the early years and thereafter is stable. The Data Buoy Cooperation Panel (DBCP) indicates an increase in the number of drifting buoys reporting on the GTS between 2003 and 2005. The introduction of the regime with increased numbers of drifting buoys coincides closely with the improvement in precision. However, we have not found documented evidence that the “extra” buoys deployed are of different quality.
Table 2. Standard Deviation of Error for 2003–2009 for ARC D3 SST1m, AMSR-E SST and Drifting Buoy SST
| Instrument | Standard Deviation of Error for Each Year (K) |
 The analysis above is possible only for AATSR (because of the availability of AMSR-E, an example of the utility of observing SST by several independent means). It is clear from Figure 4 and from knowledge of the instruments involved that ATSR-2 SST precision is likely to be comparable to that of AATSR (except during the period of degraded geolocation accuracy), while ATSR-1 SSTs are markedly less precise.
3.3. Stability
 We assessed the temporal stability of ARC SST estimates at 1 m depth through comparison with SST measurements from GTMBA moorings in the tropical Pacific. The components of the GTMBA outside of the tropical Pacific have existed for too short a time for this application. Outside the tropics (e.g., Gulf Stream, Gulf of Mexico, UK/Western European Shelf), the operational moorings managed by the National Meteorological and Hydrological Services (NMHS; e.g., NOAA, UK Met Office) were assessed, but typically too few passed the selection criteria in each region. Thus, the stability of the ARC SST1m outside of the tropics is subject to ongoing investigation.
 Buoy measurements were extracted from the International Comprehensive Ocean-Atmosphere Data Set (ICOADSv2.5 [Woodruff et al., 2011]) and collocated with the ARC SSTs. The ICOADS measurements were quality controlled using the ICOADS trimming flags to discard any observation more than 4.5 standard deviations from the climatological median. This should exclude any gross outliers in the buoy data but maintain the extreme values associated with the El Niño Southern Oscillation (ENSO) [Wolter, 1997]. The SST1m values collocated with ICOADS moored buoy measurements are the values for the clear-sky 0.1° grid cell containing the buoy observation. The difference between the SST1m and the buoy SST was then calculated for each collocated pair of observations. The differences were deseasonalized (DSST hereafter) by subtracting the mean annual cycle in difference in each time series. Any DSST more than 3 standard deviations from the climatological monthly mean for a given buoy was discarded. This limit has been chosen to exclude any DSSTs that may be cloud contaminated or contain errors that have not been otherwise detected. Deseasonalizing was done to minimize the risk that aliasing of any annual cycle in difference would cause step changes to be falsely detected. This was not expected to be a problem for the tropics (since annual cycles are small), but was done in this study nevertheless.
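The deseasonalizing and 3 standard deviation screening for a single buoy might be sketched as below. The record layout (calendar month, difference in K), the function name and the single-pass screening are assumptions made for illustration; in particular, the mean and spread here are computed before the outlier is removed, which a production scheme might iterate.

```python
import statistics
from collections import defaultdict

def deseasonalize_and_screen(records, n_sigma=3.0):
    """records: list of (calendar_month, sst1m_minus_buoy) for one buoy.

    Subtracts the mean difference for each calendar month (the mean annual
    cycle) and drops values more than n_sigma standard deviations from
    that monthly mean. Returns a list of (calendar_month, dsst).
    """
    by_month = defaultdict(list)
    for month, diff in records:
        by_month[month].append(diff)

    kept = []
    for month, diff in records:
        group = by_month[month]
        dsst = diff - statistics.fmean(group)
        if abs(dsst) <= n_sigma * statistics.pstdev(group):
            kept.append((month, dsst))
    return kept

# Made-up differences: two calendar months with different mean offsets,
# plus one gross outlier (3.0 K) in January.
records = [(1, 0.1 + 0.02 * (-1) ** i) for i in range(20)]
records += [(1, 3.0)]
records += [(7, 0.5 + 0.02 * (-1) ** i) for i in range(20)]

kept = deseasonalize_and_screen(records)
print(len(records), "records in,", len(kept), "kept")
```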
 Each buoy used in the stability assessment had a minimum of 120 months with 5 or more DSST values over the period 1991–2009. Separate monthly mean time series of DSST were constructed for daytime and nighttime data from each buoy meeting this requirement. These were then assessed for step changes using a penalized maximal t test (PMT) [Wang et al., 2007]. Step changes were assumed to indicate spurious changes in the buoy data, unless they were clustered in time across multiple buoys, which would be more consistent with a step change in the ARC SST1m data. No such clustering in time was found; therefore, any buoy with a step change statistically identified in either time series was discarded. The individual difference time series are noisy and the sensitivity of the PMT tests is therefore low. Noise in the individual time series will also affect any trend analysis.
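To illustrate the idea behind step detection, the sketch below scans every candidate break point and returns the one that maximizes a two-sample t statistic. It is deliberately not the PMT: it omits the empirical penalty factor and the critical values of Wang et al. [2007], so it only locates the most likely break and does not test its significance. The function name, the minimum segment length and the synthetic series are illustrative assumptions.

```python
import math
import statistics

def max_t_changepoint(series, min_seg=5):
    """Return (k, t): the split series[:k] / series[k:] with the largest
    pooled-variance two-sample t statistic, and that statistic."""
    n = len(series)
    best_k, best_t = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        before, after = series[:k], series[k:]
        mean_b, mean_a = statistics.fmean(before), statistics.fmean(after)
        ss = sum((v - mean_b) ** 2 for v in before)
        ss += sum((v - mean_a) ** 2 for v in after)
        pooled_sd = math.sqrt(ss / (n - 2))
        if pooled_sd == 0.0:
            continue
        t = abs(mean_b - mean_a) / (pooled_sd * math.sqrt(1.0 / k + 1.0 / (n - k)))
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

# Synthetic monthly DSST (K): a 0.1 K step after 30 months plus a small
# deterministic wiggle standing in for noise.
series = [(-0.1 if i < 30 else 0.0) + 0.03 * math.sin(1.7 * i) for i in range(120)]
break_index, t_stat = max_t_changepoint(series)
print("most likely break before month index", break_index, "t =", t_stat)
```

An unpenalized scan of this kind is more prone to flag breaks near the ends of a series, which is the behavior the penalty in the PMT is designed to correct.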
 To increase the sensitivity of the PMT and reduce the impact of the noise in the differenced time series, the DSST values have been averaged across the retained buoys to give monthly mean composite time series for the daytime and nighttime ARC SST1m (Figure 5). When the PMT is applied to these combined time series, step changes are identified in both the daytime and nighttime series during 1993, with ARC SST1m ∼0.1 K warmer after the change. The timing is consistent with the reduction of stratospheric aerosols after the 1991 eruption of Pinatubo. The step detection technique characterizes this as a step, but in reality it appears more gradual. The excursion of bias seen against drifting buoys in 1994 in Figure 4 is not evident here.
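The compositing step can be sketched as a simple average over whichever retained buoys report in each month. Equal weighting of buoys, the dictionary layout and the buoy identifiers are assumptions for illustration; the source states only that the DSST values were averaged across the retained buoys.

```python
import statistics
from collections import defaultdict

def composite_monthly_mean(buoy_series):
    """buoy_series: {buoy_id: {(year, month): monthly mean DSST in K}}.

    Returns {(year, month): unweighted mean over buoys reporting that month},
    ordered in time.
    """
    pooled = defaultdict(list)
    for series in buoy_series.values():
        for year_month, value in series.items():
            pooled[year_month].append(value)
    return {ym: statistics.fmean(vals) for ym, vals in sorted(pooled.items())}

# Hypothetical buoys and values, for illustration only.
example = {
    "buoy_A": {(1993, 1): 0.10, (1993, 2): 0.20},
    "buoy_B": {(1993, 1): -0.10},
}
composite = composite_monthly_mean(example)
print(composite)
```

Averaging N buoys with independent noise reduces the noise in the composite by roughly √N, which is what raises the sensitivity of the step test relative to the single-buoy series.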
Figure 5. Time series of the deseasonalized composite monthly mean differences (DSST). The dashed lines indicate the identified break points and mean values for each segment.
 In order to assess the stability of the ARC SSTs, a linear trend model with AR(1) errors (which allows some correlation between any given month and the previous month) has been fitted to the two difference time series from 1994 onward. No significant trends in the differences are found and the confidence intervals are smaller than the ARC target stability of 0.005 K yr−1. The 95% confidence intervals for the trends are −0.0026 to 0.0015 K yr−1 (daytime) and −0.0018 to 0.0019 K yr−1 (nighttime). These results suggest that the ARC SSTs meet the target stability in the tropics from 1994 onward. As mentioned in section 3.1, sensitivity to the Pinatubo aerosols and sensor instability are candidate explanations for the ∼0.1 K negative shift of the early ∼2 years of the ATSR-1 SSTs. Both the PMT and trend analysis assume that the error characteristics of the monthly mean values tested remain constant over time. However, the standard deviation of errors for the period of the ATSR-1 satellite is approximately double that for the ATSR-2 and AATSR periods due to the larger errors in ATSR-1 retrievals (as previously shown in Figure 4 (top)). Nevertheless, when the PMT and trend analysis are performed using only the ATSR-2 and AATSR data, similar results are found.
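One common way to approximate a trend confidence interval under AR(1) errors is to fit the trend by ordinary least squares and then widen the interval using an effective sample size derived from the lag-1 autocorrelation of the residuals. The sketch below uses that approximation; it is a stand-in and not necessarily the fitting procedure used for the results above. The function name, the normal-approximation factor of 1.96, and the synthetic series are all assumptions.

```python
import math
import random
import statistics

def trend_with_ar1_ci(t_years, y, z=1.96):
    """OLS slope with a confidence interval inflated for AR(1) residuals.

    Returns (slope, (ci_low, ci_high), r1), where r1 is the lag-1
    autocorrelation of the residuals (negative values are treated as zero).
    """
    n = len(y)
    t_mean, y_mean = statistics.fmean(t_years), statistics.fmean(y)
    sxx = sum((t - t_mean) ** 2 for t in t_years)
    slope = sum((t - t_mean) * (v - y_mean) for t, v in zip(t_years, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [v - (intercept + slope * t) for t, v in zip(t_years, y)]

    ss = sum(e * e for e in resid)
    r1 = sum(a * b for a, b in zip(resid[:-1], resid[1:])) / ss if ss > 0 else 0.0
    r1 = max(0.0, min(r1, 0.99))
    n_eff = n * (1.0 - r1) / (1.0 + r1)  # effective number of independent months
    if n_eff <= 2.0:
        raise ValueError("series too short or too autocorrelated")
    std_err = math.sqrt(ss / (n_eff - 2.0) / sxx)
    return slope, (slope - z * std_err, slope + z * std_err), r1

# Synthetic monthly differences (K): no trend, AR(1) noise, 16 years.
rng = random.Random(7)
t_years = [i / 12.0 for i in range(192)]
noise, prev = [], 0.0
for _ in t_years:
    prev = 0.5 * prev + rng.gauss(0.0, 0.03)
    noise.append(prev)

slope, (ci_low, ci_high), r1 = trend_with_ar1_ci(t_years, noise)
print("trend (K/yr):", slope, "95% CI:", (ci_low, ci_high), "r1:", r1)
```

A stability target such as 0.005 K yr−1 can then be judged by whether the whole interval lies inside ±0.005 K yr−1, rather than by the slope estimate alone.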
 The lack of long-term stable reference sites, especially in the extratropics, has been problematic for this research, as has lack of accessible metadata on the existing in situ buoy SST measurements. It is not clear that the current in situ observing system is adequate for assessing the stability of the satellite SST record to the required accuracy outside of the tropics.