The Atmospheric Infrared Sounder (AIRS) is the first of a new generation of advanced satellite-based atmospheric sounders with the capability of obtaining high–vertical resolution profiles of temperature and water vapor. The high-accuracy retrieval goals of AIRS (e.g., 1 K RMS in 1 km layers below 100 mbar for air temperature, 10% RMS in 2 km layers below 100 mbar for water vapor concentration), combined with the large temporal and spatial variability of the atmosphere and difficulties in making accurate measurements of the atmospheric state, necessitate careful and detailed validation using well-characterized ground-based sites. As part of ongoing AIRS Science Team efforts and a collaborative effort between the NASA Earth Observing System (EOS) project and the Department of Energy Atmospheric Radiation Measurement (ARM) program, data from various ARM and other observations are used to create best estimates of the atmospheric state at the Aqua overpass times. The resulting validation data set is an ensemble of temperature and water vapor profiles created from radiosondes launched at the approximate Aqua overpass times, interpolated to the exact overpass time using time continuous ground-based profiles, adjusted to account for spatial gradients within the Advanced Microwave Sounding Unit (AMSU) footprints, and supplemented with limited cloud observations. Estimates of the spectral surface infrared emissivity and local skin temperatures are also constructed. Relying on the developed ARM infrastructure and previous and ongoing characterization studies of the ARM measurements, the data set provides a good combination of statistics and accuracy which is essential for assessment of the advanced sounder products. 
Combined with the collocated AIRS observations, the products are being used to study observed minus calculated AIRS spectra, aimed at evaluation of the AIRS forward radiative transfer model, AIRS observed radiances, and temperature and water vapor profile retrievals. This paper provides an introduction to the ARM site best estimate validation products and characterizes the accuracy of the AIRS team version 4 atmospheric temperature and water vapor retrievals using the ARM products. The AIRS retrievals over tropical ocean are found to have very good accuracy for both temperature and water vapor, with RMS errors approaching the theoretical expectation for clear sky conditions, while retrievals over a midlatitude land site have poorer performance. The results demonstrate the importance of using specialized “truth” sites for accurate assessment of the advanced sounder performance and motivate the continued refinement of the AIRS science team retrieval algorithm, particularly for retrievals over land.
 Temperature and water vapor soundings from satellites have been available for more than 35 years, beginning with the launch of two pioneering sensors on the Nimbus 3 satellite in April 1969 [e.g., Wick, 1971]. By providing twice daily atmospheric profiles on a global scale, the satellite soundings were expected to provide significant improvements in numerical weather prediction. Because of improved data assimilation methods, use of the satellite data for model initialization, and expanded computer resources, many significant improvements in forecast accuracy have indeed been realized. However, the relatively low vertical resolution provided by these broadband sensors, such as the Advanced TIROS Operational Vertical Sounder (ATOVS) on NOAA's current Polar-orbiting Operational Environmental Satellite, has limited the benefit of these satellite observations for improved weather forecasting. In many circumstances, for example, the low vertical resolution is responsible for large absolute errors, reduced horizontal variance, and strong horizontal error correlation of the satellite soundings [e.g., Phillips, 1980]. In order to provide further advances in numerical weather prediction and climate studies, the requirement for sensors and algorithms with improved vertical resolution and with an ability to handle partially cloudy scenes was recognized. Beginning in the mid-1970s, extensive retrieval algorithm development, instrument design, and experimental field studies were conducted to determine the feasibility and requirements of a future advanced sounder; the reader is referred to Smith for an overview of the evolution of the satellite sounder to achieve high vertical resolution and improve weather prediction. As a result of these efforts, specific requirements for the NOAA/NASA “Interagency Sounder” were established in the late 1980s, providing motivation for the Atmospheric Infrared Sounder (AIRS) as part of NASA's Earth Observing System (EOS).
 Launched into orbit on 4 May 2002 on the EOS Aqua platform, AIRS is the first of a new generation of satellite-based advanced infrared sounders, designed to provide data with higher vertical resolution and accuracy to numerical weather prediction centers for improved medium range weather forecasting. Aumann et al.  present a thorough overview of the science objectives, data products, retrieval algorithms, and ground data processing concepts of AIRS. Exploiting the higher vertical resolution made possible by high spectral resolution and the combined use of Advanced Microwave Sounding Unit (AMSU) microwave and infrared radiances in the presence of clouds, AIRS is expected to produce retrievals of temperature and water vapor with high accuracy under clear and partly cloudy conditions (for scenes with cloud fraction up to 80%). The AIRS retrieval performance requirements are stated in terms of RMS errors for vertically averaged layers in the troposphere. The required performance is 1 K RMS in 1 km vertical layers below 100 mbar for temperature profile retrievals and 20% (10% goal) RMS in 2 km vertical layers below 100 mbar for water vapor profile retrievals. There are no documented requirements for the yield or mean error (bias) of the retrieved profiles.
 Detailed characterization of the accuracy of the satellite products is required for many applications of the data. The high-accuracy retrieval goals of the advanced sounders, combined with the temporal and spatial variability of the atmosphere and difficulties in making accurate measurements of the atmospheric state, necessitate careful validation using well-characterized ground-based sites. Starting in the early 1990s the Department of Energy Atmospheric Radiation Measurement (ARM) program has developed several ground sites around the world to collect detailed measurements of the atmospheric state and coincident radiation with an overall goal of improving the representation of radiation and clouds in climate models [Stokes and Schwartz, 1994]. The ARM Climate Research Facility ground sites have matured and are now well suited for providing data sets that are both accurately characterized and statistically significant for performing satellite validation [Ackerman and Stokes, 2003]. As part of an ongoing AIRS Science Team effort initiated under team member Dr. H. E. Revercomb and continuing under a collaborative effort between the NASA EOS project and the ARM program, data from the heavily instrumented ARM sites are used to create “best estimates” of the atmospheric state and surface properties at the Aqua overpass times. Combined with the collocated AIRS observations, the ARM products are being used for various studies, including assessment of clear sky observed minus calculated radiance spectra, aimed at development and validation of the AIRS clear sky forward radiative transfer model. Validation of the AIRS clear sky radiative transfer algorithm using the ARM best estimate and other data is presented by Strow et al.
 The primary purpose of this paper is to provide a description of ARM site best estimate validation products (section 2) and, using the ARM products, to provide an objective and accurate characterization of the AIRS Team version 4 atmospheric temperature and water vapor retrievals (section 3). Section 4 summarizes the results of this study and plans for further development and applications of the ARM validation data.
2. ARM Site Atmospheric State and Surface Best Estimate Products
 Best estimates of vertical profiles of atmospheric pressure, air temperature, water vapor concentration, and surface parameters are produced for Aqua overpasses of the ARM sites using ARM and other data. This section includes an introduction to the ARM sites included in this effort, a description of what data are used and how the best estimate products are created, and a discussion of the accuracy and statistical significance of the products for AIRS temperature and water vapor profile retrieval validation.
 Three ARM sites, shown in Figure 1, are included in this validation effort. These include the Tropical Western Pacific (TWP) site on the small equatorial island of Nauru, the Southern Great Plains (SGP) central facility site in north central Oklahoma, and the North Slope of Alaska (NSA) site near Barrow, Alaska. These sites were chosen by ARM to represent three of Earth's primary climate regimes. Covering a range of conditions with varying degrees of retrieval difficulty, the sites are also well suited for characterizing the AIRS retrieval performance. The TWP Nauru site has been very instrumental in AIRS validation because it is an ocean site with well-known surface emissivity, high water vapor amounts, and low variability in the atmospheric temperature and water vapor. This provides a valuable data set for validation of various components of the AIRS molecular absorption and radiative transfer models and provides the most straightforward retrieval conditions. Operational since 1991, the ARM SGP central facility site is one of the most heavily instrumented ground-based sites in the world for atmospheric state and radiation measurements. Numerous detailed experiments have been conducted at the SGP site in order to characterize and improve the accuracy of ARM's temperature and water vapor profiling capabilities. The site is located within a large agricultural region and experiences wide fluctuations on many timescales in the atmospheric temperature and water vapor, surface conditions, and cloud conditions. Having to deal with these conditions, most notably the complex and unknown surface emission, satellite retrievals are more difficult at the SGP site. The NSA site provides data of clouds, atmospheric state, and radiative processes for cold and dry conditions. It is located near the coastal town of Barrow on the Arctic Ocean, with surrounding areas including open ocean, sea ice, coastal inland tundra, and fresh water ecosystems. 
Passive remote sounding of arctic atmospheres is historically very difficult because of low contrast of clouds and the variable surface, and the NSA site serves as one of the most difficult sites for AIRS retrievals.
 Each ARM site is instrumented with numerous ground-based sensors to measure wind speed and direction, water vapor and liquid water, temperature, cloud properties, aerosols, and radiation. For AIRS temperature and water vapor retrieval validation, the primary profile measurements are made with Vaisala RS-90 radiosondes [Paukkunen et al., 2001]. Depending on the particular site, radiosondes are typically launched two to four times per day at the ARM sites. These normally scheduled launches were supplemented to provide an optimal data set for AIRS validation. Three “phases” of dedicated radiosonde launches have been completed at each of the three ARM sites as of October 2004. During these periods, two radiosondes are launched for Aqua overpasses of the ARM sites. The first radiosonde is launched 45 to 90 min prior to the overpass time and the second is launched just before (∼5 min) the overpass time, providing radiosonde measurements of the upper and lower troposphere at the AIRS overpass time. At the SGP and NSA sites, this is accomplished using two radiosonde receivers, allowing two radiosondes to be tracked simultaneously. At the TWP site, a single receiver is used and the first sounding is terminated prior to initiating the second. Each dedicated radiosonde launch phase includes approximately 90 overpasses (180 radiosondes) spanning several months and including both day and nighttime overpasses. Beginning and ending dates of each phase are listed in Table 1. On a given day, the dedicated launches are performed for the overpass with the smallest local zenith viewing angle. The launches are performed for all conditions except for totally overcast or precipitating conditions (conditions also for which full infrared plus microwave AIRS retrievals are not performed) or for local zenith angles greater than 30 degrees. Additional dedicated launch periods are scheduled for later in 2005.
Table 1. Beginning and Ending Dates of Aqua Overpass Dedicated Radiosonde Launch Phases
Phase 1: 2 September 2002 to 25 May 2003; Phase 2: 9 September 2003 to 7 February 2004; Phase 3: 2 April 2004 to 10 August 2004
Phase 1: 15 September 2002 to 29 May 2003; Phase 2: 8 September 2003 to 20 March 2004; Phase 3: 2 April 2004 to 29 September 2004
Phase 1: 1 September 2002 to 27 December 2002; Phase 2: 10 September 2003 to 8 February 2004; Phase 3: 23 April 2004 to 5 September 2004
 Each Vaisala radiosonde water vapor profile is multiplied by a height-independent scale factor such that the total column precipitable water vapor (pwv) of the radiosonde profile is equal to that measured simultaneously by a ground-based two-channel microwave radiometer (MWR). This “microwave scaling” technique for the radiosonde water vapor profiles has several important advantages [Revercomb et al., 2003, and references therein], including removing significant radiosonde batch-to-batch calibration variations and diurnal biases in the radiosonde total column water vapor and providing an absolute calibration reference. The Goff-Gratch formulation of vapor pressure over liquid water [Smithsonian Institution, 1984] is used to compute mass mixing ratio profiles from the radiosonde temperature and relative humidity measurements.
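 The microwave scaling step can be illustrated with a short sketch. The following is not the processing code used to build the best estimate products; it is a minimal example assuming the commonly quoted Goff-Gratch coefficients, a trapezoidal column integral, and a scale factor applied in specific humidity so that the scaled column matches the MWR value exactly. All function names and the sample profile are hypothetical.

```python
import numpy as np

G = 9.80665      # standard gravity, m s^-2
EPS = 0.62198    # molar mass ratio, water vapor to dry air

def goff_gratch_liquid_hpa(t_kelvin):
    """Saturation vapor pressure over liquid water (hPa), Goff-Gratch form."""
    t = np.asarray(t_kelvin, dtype=float)
    ts = 373.16      # steam-point temperature, K
    ews = 1013.246   # saturation pressure at the steam point, hPa
    log10_e = (-7.90298 * (ts / t - 1.0)
               + 5.02808 * np.log10(ts / t)
               - 1.3816e-7 * (10.0 ** (11.344 * (1.0 - t / ts)) - 1.0)
               + 8.1328e-3 * (10.0 ** (-3.49149 * (ts / t - 1.0)) - 1.0)
               + np.log10(ews))
    return 10.0 ** log10_e

def mixing_ratio(p_hpa, t_kelvin, rh_percent):
    """Mass mixing ratio (kg/kg) from pressure, temperature, and RH over liquid."""
    e = 0.01 * np.asarray(rh_percent, dtype=float) * goff_gratch_liquid_hpa(t_kelvin)
    return EPS * e / (np.asarray(p_hpa, dtype=float) - e)

def precipitable_water(p_hpa, w):
    """Column water vapor (kg m^-2, numerically equal to mm)."""
    p = np.asarray(p_hpa, dtype=float)
    w = np.asarray(w, dtype=float)
    q = w / (1.0 + w)                    # specific humidity
    dp_pa = np.abs(np.diff(p)) * 100.0   # layer thickness, Pa
    return float(np.sum(0.5 * (q[1:] + q[:-1]) * dp_pa) / G)

def microwave_scale(p_hpa, w_sonde, pwv_mwr):
    """Scale a sonde profile by one height-independent factor to match the MWR column.

    Assumption: the factor multiplies specific humidity, which makes the
    scaled column equal pwv_mwr exactly under the trapezoidal integral.
    """
    w = np.asarray(w_sonde, dtype=float)
    q = w / (1.0 + w)
    factor = pwv_mwr / precipitable_water(p_hpa, w)
    q_scaled = factor * q
    return q_scaled / (1.0 - q_scaled), factor

# Hypothetical sounding: pressure (hPa), temperature (K), relative humidity (%)
p = np.array([1000.0, 850.0, 700.0, 500.0, 300.0, 200.0])
t = np.array([300.0, 292.0, 284.0, 268.0, 240.0, 220.0])
rh = np.array([80.0, 70.0, 60.0, 40.0, 30.0, 20.0])
w = mixing_ratio(p, t, rh)
w_scaled, factor = microwave_scale(p, w, pwv_mwr=30.0)
```

 Because the factor is independent of height, the vertical shape of the radiosonde profile is preserved and only its absolute calibration is tied to the MWR.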
 To account for temporal differences between the AIRS overpass and the radiosonde measurements, time continuous profiles of temperature and water vapor at the ARM site are used to interpolate the radiosonde profiles to the overpass time. At the SGP site, AERI-plus [Feltz et al., 2003] profiles of temperature and water vapor are used for this. The AERI-plus retrievals are derived from ground-based downwelling spectral radiance measurements by the Atmospheric Emitted Radiance Interferometer (AERI) and from coincident model analysis fields, providing vertical temperature and water vapor profiles every 6 to 8 min with high vertical resolution in the boundary layer, and lower–vertical resolution profiles extending to ∼2 mbar. The AERI-plus fields are obtained at the (altitude-dependent) radiosonde observation times and at the satellite overpass time, and the relative changes in the fields are used to interpolate the radiosonde profile to the overpass time. This is performed by adding AERI-plus temperature differences to the radiosonde profile and multiplying the radiosonde water vapor profile by AERI-plus water vapor amount ratios. This is done independently for each altitude bin of the AERI-plus fields. With this approach, the drift of the radiosonde position with height is not taken into consideration. The time interpolation is performed for the two radiosondes with launch times bounding the overpass time, and the two profiles are then combined in a weighted mean, with the weights determined by the time difference and by the magnitude of change in the AERI-plus fields between the observation and overpass times. Only radiosondes that reach a minimum air pressure of 200 mbar are used. AERI-plus time cross sections and radiosonde trajectories for an example nighttime SGP site overpass on 25 July 2002 are shown in Figure 2. If the AERI-plus fields are not available for a given case, fields from the Rapid Update Cycle (RUC-2 [Benjamin et al., 2004]) are used. 
In the event that neither AERI-plus nor RUC-2 fields are available, simple linear interpolation between the two bounding radiosonde profiles is used to compute the profile at the overpass time. Note that at the SGP site, continuous water vapor retrievals are available from a Raman Lidar. This sensor, however, was not fully operational during the past two years [Turner and Goldsmith, 2005] and it has therefore not been included in the production of the best estimate profiles to date.
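 As an illustration of the relative time interpolation described above, the sketch below shifts a radiosonde profile to the overpass time using additive temperature differences and multiplicative water vapor ratios from a time-continuous field, bin by bin. The combination of the two bounding radiosondes is shown with inverse time-difference weights only; this weighting is an assumption made for the example, since the actual weights also depend on the magnitude of change in the continuous fields. Function names and sample values are hypothetical.

```python
import numpy as np

def shift_to_overpass(t_sonde, q_sonde,
                      field_t_obs, field_t_ovp,
                      field_q_obs, field_q_ovp):
    """Move a sonde profile to the overpass time, independently per altitude bin.

    field_*_obs: continuous field sampled at the (altitude-dependent)
                 sonde observation times
    field_*_ovp: continuous field sampled at the overpass time
    """
    t_adj = np.asarray(t_sonde) + (np.asarray(field_t_ovp) - np.asarray(field_t_obs))
    q_adj = np.asarray(q_sonde) * (np.asarray(field_q_ovp) / np.asarray(field_q_obs))
    return t_adj, q_adj

def combine_pair(prof_a, prof_b, dt_a_min, dt_b_min, floor_min=1.0):
    """Weighted mean of two time-shifted profiles.

    Assumption: weights are inversely proportional to the absolute time
    offset from the overpass, floored at floor_min minutes.
    """
    wa = 1.0 / max(abs(dt_a_min), floor_min)
    wb = 1.0 / max(abs(dt_b_min), floor_min)
    return (wa * np.asarray(prof_a) + wb * np.asarray(prof_b)) / (wa + wb)

# Hypothetical two-bin example
t_adj, q_adj = shift_to_overpass(
    t_sonde=np.array([290.0, 280.0]), q_sonde=np.array([10.0, 5.0]),
    field_t_obs=np.array([289.0, 281.0]), field_t_ovp=np.array([290.0, 280.5]),
    field_q_obs=np.array([9.0, 5.0]), field_q_ovp=np.array([9.9, 5.0]))
merged = combine_pair(np.array([280.0]), np.array([282.0]), dt_a_min=60.0, dt_b_min=5.0)
```

 Using only relative changes in the continuous fields means that any absolute bias in those fields cancels, so the radiosonde remains the calibration reference.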
Figure 3 displays estimates of the short-term variability of the temperature and water vapor profiles at the SGP site at the Aqua overpass times, utilizing differences between the pairs of microwave scaled radiosondes launched ∼60 min apart during the three Aqua overpass dedicated launch phases. Differences for individual high–vertical resolution profiles are shown. Analogous to how the AIRS retrieval validation statistics are computed (as described in more detail in section 3), the mean differences (biases) and RMS differences for 1 km (temperature) and 2 km (water vapor) vertical layering are also shown. Note that in this process, the individual profiles are first reduced to lower vertical resolution and then the bias and RMS statistics are computed; this has the dramatic effect of suppressing many of the largest errors associated with vertically fine-scale structure present in the high–vertical resolution profiles. Throughout the troposphere for both day and night cases, the temperature RMS differences range from 0.5 to 1.0 K, and the water vapor RMS percent differences range from 10 to 25%, as shown in Figure 3. Variability at the TWP site is less, with temperature RMS differences ranging from 0.3 to 0.7 K, and water vapor RMS percent differences ranging from 7 to 15%. Considering the AIRS retrieval goals, this temporal variability is significant. This emphasizes the importance of minimizing the temporal mismatches between the ARM and AIRS measurement times when attempting to accurately characterize the AIRS retrievals, and the need for the temporal interpolation approach described previously. Note that the mean temperature difference for the lowest altitude bin for daytime overpasses is nonzero, which can be attributed to the warming typically experienced at the SGP site in the early afternoon.
 The AIRS retrievals are performed using a combination of a 3 by 3 array of AIRS footprints and one AMSU footprint and have footprint diameters ranging from 50 km at nadir to ∼150 km at the edge of scan. Spatial gradients can be significant on this scale, and the local ARM site profiles are therefore adjusted to account for large-scale spatial gradients and the location of the ARM site within the AMSU footprint. At the SGP site, this is accomplished using 4 km spatial resolution Geostationary Operational Environmental Satellite (GOES) retrievals [Ma et al., 1999]. The hourly GOES fields are linearly interpolated to the overpass time. As with the temporal interpolation, the spatial adjustment is performed in a relative sense, where relative differences in the GOES fields are computed between the GOES footprint closest to the ARM site and the mean of all GOES footprints within the AMSU footprint. The profile adjustments are performed using temperature differences and water vapor amount ratios and are done independently for each altitude bin of the GOES retrievals. Sample spatial gradients for the 25 July 2002 SGP overpass are shown in Figure 4. Analogous to Figure 3 in regard to short-term temporal variability, Figure 5 displays the differences between local SGP site profiles and those which have been adjusted to account for large-scale spatial gradients within the AMSU footprints as discussed above. As with the temporal differences, spatial gradients at the SGP site are significant in terms of the ability to validate the AIRS retrievals. Spatial variability for the tropical ocean is far less than for land sites in the middle and high latitudes. Attempts have been made to use data from the Geostationary Meteorological Satellite (GMS-5, prior to May 2003) and GOES 9 (after May 2003) to account for the spatial gradients at TWP. 
Various issues have been encountered in obtaining and using atmospheric profiles with high spatial and temporal resolution from these data, however, and spatial adjustments are not performed for the TWP site profiles. Future work will involve the use of atmospheric profile retrievals from the Moderate Resolution Imaging Spectroradiometer (MODIS) [Seemann et al., 2003] on Aqua to account for the spatial gradients within the AMSU footprints at all three sites.
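 The relative spatial adjustment can be sketched in the same way as the temporal one. The example below assumes that the imager retrievals and the site profile share the same altitude bins and that footprint membership is decided by a simple distance threshold from the AMSU footprint center; both are simplifications for illustration, and all names and values are hypothetical.

```python
import numpy as np

def spatial_adjust(t_site, q_site, goes_t, goes_q,
                   dist_to_site_km, dist_to_center_km, footprint_radius_km):
    """Adjust a local profile toward the footprint-mean state.

    goes_t, goes_q: arrays of shape (n_footprints, n_bins)
    Returns the temperature and water vapor profiles after applying the
    footprint-mean minus nearest-footprint difference (temperature) and the
    footprint-mean over nearest-footprint ratio (water vapor), per bin.
    """
    goes_t = np.asarray(goes_t, dtype=float)
    goes_q = np.asarray(goes_q, dtype=float)
    nearest = int(np.argmin(dist_to_site_km))                     # footprint closest to the site
    inside = np.asarray(dist_to_center_km) <= footprint_radius_km  # footprints within the AMSU footprint
    delta_t = goes_t[inside].mean(axis=0) - goes_t[nearest]
    ratio_q = goes_q[inside].mean(axis=0) / goes_q[nearest]
    return np.asarray(t_site) + delta_t, np.asarray(q_site) * ratio_q

# Hypothetical case: three imager footprints, two altitude bins
t_adj, q_adj = spatial_adjust(
    t_site=np.array([281.0, 269.0]), q_site=np.array([11.0, 3.0]),
    goes_t=[[280.0, 270.0], [282.0, 271.0], [284.0, 272.0]],
    goes_q=[[10.0, 2.0], [12.0, 2.0], [14.0, 2.0]],
    dist_to_site_km=np.array([1.0, 30.0, 60.0]),
    dist_to_center_km=np.array([20.0, 10.0, 25.0]),
    footprint_radius_km=25.0)
```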
 The high–vertical resolution pressure, temperature, and water vapor profiles are interpolated to a standard altitude grid and written to an output file for each Aqua overpass of each site. The profiles extend to the maximum height of the shortest radiosonde used in the processing, with a minimum height of 200 mbar. Three versions of the profiles are provided in the files, including the unmodified profiles (before time and space interpolation) for the two radiosondes with launch times bounding the overpass time, the weighted mean profile with only the time interpolations performed, and the weighted mean profile with both time interpolation and spatial adjustment performed. Additional data in the output files include cloud base height data from a Vaisala ceilometer, total column liquid water vapor from the MWR, estimates of the skin temperature and surface emissivity, and metadata regarding the logistics of the case.
 Sample ARM best estimate profiles and sample AIRS spectra for the TWP and SGP sites are shown in Figures 6 and 7. In Figure 6, note the relatively low temporal variability of the TWP site profiles and the large variability of the SGP site profiles. Also note the relative lack of high–vertical resolution structure in the TWP profiles, as opposed to the SGP profiles. Figure 7 also demonstrates the range of thermodynamic and cloud conditions that are observed at each site.
 In the following discussion regarding the accuracy of the validation profiles and in the AIRS retrieval validation analysis presented in section 3, only the TWP and SGP sites are included; characterization and validation of the more difficult polar atmospheres at the NSA site is deferred to a subsequent paper. Numerous analyses have been performed to characterize the accuracy of the ARM site temperature and water vapor measurements [e.g., Clough et al., 1994; Lesht and Liljegren, 1995; Tobin et al., 2002; Revercomb et al., 2003; Turner et al., 2003, 2004; Ferrare et al., 2004; Miloshevich et al., 2006; Whiteman et al., 2006]. Considering these various investigations in the context of AIRS retrieval validation, the uncertainty of the ARM best estimate temperature profiles is estimated to be ∼0.2 K below 100 mbar and for water vapor the uncertainty is estimated to be ∼3% below 500 mbar and ∼10% from 500 to 100 mbar. These are estimates of the absolute uncertainties in the mean biases for the TWP and SGP ensembles, due primarily to sensor calibration uncertainties, and should be considered when interpreting the mean biases of the AIRS retrievals investigated in section 3.
 Uncertainties that contribute to individual profiles and overpasses are more difficult to quantify and depend directly on the temporal and spatial variability of the case, the temporal and spatial collocation of the AIRS and ARM measurements, and the uncertainty of the ARM measurements for those conditions. When reduced to 1 km and 2 km vertical layering, the vertical resolution of the best estimate profiles contributes a negligible uncertainty. When performing the microwave scaling, the MWR data are averaged over a 20 minute period coincident with the radiosonde ascent through the lower troposphere, and short-term variability in the MWR observations is also negligible. However, despite the previously described approaches to account for temporal and spatial variability, we should expect some real variability which is not accounted for in the best estimate profiles. For example, even with the MWR scaling approach, residual radiosonde calibration differences for the radiosonde pairs can produce differences that are interpreted as real atmospheric variability. Also, because of radiosonde drift, the MWR, AERI, and radiosondes do not sample the exact same air mass. Since these profiles are used as “truth” in the AIRS retrieval validations, the computed RMS retrieval errors should therefore be considered as upper bounds of the true errors. This is particularly true for the SGP site which experiences large variability. We expect negligible impact of this effect on the computed mean biases since the residual variability is random from case to case. Very rough estimates of the residual RMS contributions due to the residual variability at the SGP site, computed as 20% of the observed short-term temporal and spatial variability (Figures 3 and 5), are 0.2 K for temperature and 7% for water vapor. Further discussion of this issue is included in section 3.
 While significant improvements in radiosonde performance and characterization have been achieved in recent years, measurements of low water vapor amounts in the upper troposphere remain challenging. In this regard it should be noted that candidate corrections to the Vaisala RS-90 upper level water vapor profiles [Miloshevich et al., 2006] are not included in this analysis, but are under consideration for future analyses. The component of these corrections which has the largest potential impact on the AIRS retrieval assessments is based on comparisons of eight nighttime radiosonde profiles with collocated measurements by a reference frost point hygrometer. For the TWP phase 1 profiles, for example, these corrections produce an overall moistening of the radiosonde profiles by approximately 15% from roughly 200 mbar to the tropopause, with significant variations in the corrections from profile to profile. Additionally, the Vaisala radiosonde water vapor profiles are known to have a diurnal bias, with daytime total column water vapor amounts 3 to 8% drier than nighttime total column amounts, as demonstrated with coincident MWR and downwelling infrared radiance observations [e.g., Turner et al., 2003; Miloshevich et al., 2006]. In terms of total column water vapor, the microwave scaling approach accounts for this diurnal bias. Analyses of clear sky observed and calculated AIRS radiances for upper level water vapor spectral channels using day and nighttime MWR scaled RS-90 profiles, however, demonstrate that the diurnal bias is height-dependent, with the daytime RS-90s drier by 5 to 8% in the upper troposphere [Strow et al., 2006]. Impact of this bias on the AIRS retrieval validation is investigated in section 3.
 As mentioned previously, estimates of the surface emissivity and of the local surface skin temperature are included in the best estimate products. Considering the size of the AMSU footprints and the spatial nonhomogeneity of the SGP and NSA sites, it is a difficult task to estimate the effective skin temperature and emissivity without retrieving the quantities from the AIRS data themselves. The skin temperature and emissivity are included, however, along with the best estimate profiles to provide initial estimates as input for top-of-atmosphere radiance calculations. The local skin temperature estimates are measured with downlooking broadband infrared radiometers at the ARM sites. These estimates are not expected to be representative of the larger AMSU footprints. For observations over ocean, the surface emissivity is homogeneous and well known [Masuda et al., 1988; Wu and Smith, 1997]. Over land, however, the spectral surface emissivity can vary largely in both time and space. Likewise, the remote sensing of land surface temperature from satellite requires a detailed knowledge of infrared land surface emissivity. Roughly speaking, a 2% error in the knowledge of the land surface emissivity near 10 microns leads to an error in the derived surface temperature of about 1 K. Since the emissivity of bare soil can vary across the infrared spectrum by 10% or more, errors in the remote sensing of surface temperature from satellites can be substantial. To address this issue for the ARM SGP domain, investigations utilizing land type surveys, unique AERI observations of the distinct surface types, and aircraft-based sounding and imager data have been used to develop a parameterization of the surface emissivity [Knuteson et al., 2003]. 
In this parameterization, shown in Figure 8, the surface emissivity representative of an AIRS footprint is computed as the linear combination of pure bare soil emissivity and pure vegetated surface emissivity, with the weighting determined from vegetation fractions determined from land type surveys. Improvement of this empirical model for the SGP site, and the development of a similar model for the NSA site, is the subject of ongoing research.
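 Two of the points above lend themselves to a small numerical sketch: the linear mixing of bare soil and vegetated emissivity, and the sensitivity of a derived surface temperature to an emissivity error near 10 microns. The sensitivity estimate below is a simplified calculation that neglects the atmosphere and the reflected downwelling radiance, which partly compensates for an emissivity error, so it somewhat overestimates the roughly 1 K figure quoted above. The emissivity values, vegetation fraction, and function names are hypothetical.

```python
import numpy as np

C1 = 1.191042972e-16   # 2 h c^2, W m^2 sr^-1
C2 = 1.438776877e-2    # h c / k, m K

def planck(wavelength_m, t_kelvin):
    """Planck radiance per unit wavelength (W m^-3 sr^-1)."""
    return C1 / (wavelength_m ** 5 * (np.exp(C2 / (wavelength_m * t_kelvin)) - 1.0))

def inverse_planck(wavelength_m, radiance):
    """Temperature (K) whose Planck radiance equals the given radiance."""
    return C2 / (wavelength_m * np.log(1.0 + C1 / (wavelength_m ** 5 * radiance)))

def mixed_emissivity(veg_fraction, eps_vegetation, eps_bare_soil):
    """Footprint emissivity as a linear combination of two pure surface types."""
    return veg_fraction * eps_vegetation + (1.0 - veg_fraction) * eps_bare_soil

def skin_temperature_error(eps_true, eps_assumed, t_skin, wavelength_m):
    """Error in derived skin temperature caused by using the wrong emissivity.

    Assumption: surface-leaving radiance is eps * B(T) only (no atmosphere,
    no reflected downwelling term).
    """
    radiance = eps_true * planck(wavelength_m, t_skin)
    t_retrieved = inverse_planck(wavelength_m, radiance / eps_assumed)
    return t_retrieved - t_skin

# Hypothetical values: 25% vegetation cover, 2% emissivity error at 10 microns
eps_footprint = mixed_emissivity(0.25, eps_vegetation=0.98, eps_bare_soil=0.90)
dt_skin = skin_temperature_error(0.98, 0.96, t_skin=300.0, wavelength_m=10e-6)
```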
 In summary, the ARM site best estimate data set is a collection of “microwave scaled” Vaisala RS-90 radiosonde profiles, launched at the approximate Aqua overpass times, interpolated to the exact overpass time using time continuous ground-based profiles, adjusted to account for spatial gradients within the AMSU footprints, and supplemented with surface temperature, surface emissivity, and some limited cloud observations. A wide range of atmospheric, cloud, and surface conditions (hot/cold, wet/dry, day/night, clear/cloudy, ocean/land/ice) are sampled at the three ARM sites. Relying on the developed ARM infrastructure and previous and ongoing characterization studies of the ARM measurements, the data set provides a good combination of statistics and accuracy which is essential for assessment of the advanced sounder temperature and water vapor retrievals.
3. Validation of AIRS Temperature and Water Vapor Retrievals
 Various validation activities to assess the accuracy of the AIRS radiances, forward model, cloud cleared radiances, and retrievals have been conducted to date. Summaries of many of these activities and additional references are provided by Fetzer et al. and Fetzer. In particular, the absolute accuracy of the AIRS spectral radiances has been validated to ∼0.2 K in selected window channels using sea surface temperatures [Aumann and Gregorich, 2006] and for most channels with weighting functions peaking below 50 mbar using aircraft-based spectral radiance observations [Tobin et al., 2006], and the AIRS clear sky radiative transfer model has been validated to 0.2 K for the majority of spectral channels used for tropospheric temperature and water vapor sounding [Strow et al., 2006]. With validated radiance observations and a well-characterized forward model, validation of the AIRS cloud cleared radiance and retrieval products is being pursued. The AIRS team has taken a tiered approach to retrieval algorithm development and validation, starting with clear sky conditions over nonpolar ocean, followed by clear and cloudy conditions over nonpolar ocean, then nonpolar land conditions, and finally polar conditions. Various validation data have been used to assess the accuracy of the retrievals, including numerical forecast model analysis fields, global radiosondes, dedicated radiosondes, lidar, and retrievals from high-altitude aircraft. Using the ARM site best estimate profiles, validation of AIRS Team version 4 temperature and water vapor retrievals for the tropical ocean TWP and midlatitude land SGP conditions is presented here. Susskind et al. and J. Susskind and M. Chahine (Accuracy of geophysical parameters derived from AIRS/AMSU as a function of fractional cloud cover, submitted to Journal of Geophysical Research, 2005, hereinafter referred to as Susskind and Chahine, submitted manuscript, 2005) provide detailed descriptions of the AIRS Team retrieval algorithm and products. Version 4 is the latest data processing package developed by the AIRS science team and was delivered to the NASA Goddard Distributed Active Archive Center in April 2005 for reprocessing of past and future AIRS data. Relevant to this study, it should be noted that elements of the AIRS fast radiative transfer algorithm used in the version 4 retrievals have been adjusted on the basis of the TWP phase 1 best estimate data set, as described by Strow et al.
 The retrieval validation analysis presented here includes only satellite overpasses for which the ARM site best estimate products are successfully produced, the matching AIRS retrievals are available, and which involved two radiosondes that were each launched within two hours of the satellite overpass time and reached at least 200 mbar. All retrievals (irrespective of their quality) within 120 km of the ARM site have been considered for the three phases of special ARM site radiosonde launches, resulting in up to ∼10 retrieval comparisons per overpass. The resulting data set includes a total of 79 overpasses and 722 retrieval comparisons for the TWP site and 151 overpasses and 1594 retrieval comparisons for the SGP site, with nearly equal sampling of day and nighttime cases. To account for a recent finding that routine processing of MWR data produces total column pwv values which are too wet by 3% [Liljegren et al., 2005], all of the best estimate water vapor amounts are reduced by 3% before the retrieval statistics are generated. This bias is due to a processing issue and affects the routine processing of ARM microwave radiometer data from April 2002 through the present study. Additionally, since full atmospheric profiles are required for top-of-atmosphere radiance calculations and analyses, the best estimate profiles are supplemented with coincident analysis fields from the European Centre for Medium-Range Weather Forecasts (ECMWF). ECMWF temperature fields are inserted above 60 mbar (or lower for radiosondes which do not reach 60 mbar) and water vapor fields are inserted above 200 mbar. Note that the 200 mbar cutoff is below the tropopause height for TWP profiles while the tropopause height can vary at SGP, as shown in Figure 6.
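The selection rules in the preceding paragraph can be summarized in a short sketch. This is only an illustration under stated assumptions: the class and function names (`Sonde`, `overpass_is_usable`, `great_circle_km`, `reduce_for_mwr_wet_bias`) are hypothetical, the distance is approximated with a spherical-Earth haversine formula, and none of this is taken from the processing code actually used for the study.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

MAX_LAUNCH_OFFSET_H = 2.0   # each sonde launched within 2 hours of the overpass
REQUIRED_TOP_MBAR = 200.0   # each sonde must reach at least 200 mbar
MAX_DISTANCE_KM = 120.0     # retrievals within 120 km of the ARM site
MWR_WET_BIAS = 0.03         # routine MWR pwv values are too wet by 3%
EARTH_RADIUS_KM = 6371.0    # assumed mean Earth radius


@dataclass
class Sonde:
    launch_offset_h: float    # launch time minus overpass time (hours)
    top_pressure_mbar: float  # lowest pressure reached by the balloon


def overpass_is_usable(sondes):
    """True if exactly two sondes both meet the time and altitude criteria."""
    return len(sondes) == 2 and all(
        abs(s.launch_offset_h) <= MAX_LAUNCH_OFFSET_H
        and s.top_pressure_mbar <= REQUIRED_TOP_MBAR
        for s in sondes
    )


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlam = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def retrieval_is_collocated(site_lat, site_lon, ret_lat, ret_lon):
    """True if the retrieval lies within the spatial window of the site."""
    return great_circle_km(site_lat, site_lon, ret_lat, ret_lon) <= MAX_DISTANCE_KM


def reduce_for_mwr_wet_bias(water_vapor_amount):
    """Reduce a best estimate water vapor amount by 3%."""
    return water_vapor_amount * (1.0 - MWR_WET_BIAS)
```

For example, an overpass with sondes launched 1.5 hours before and 0.5 hours after the overpass, both reaching 150 mbar, would pass `overpass_is_usable`, whereas one whose second sonde burst at 300 mbar would not.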
 The process of computing the AIRS retrieval performance statistics is consistent with the approach used by Susskind et al.  in assessing the retrieval performance using simulated data. For each profile, the ARM best estimate profile is converted to layer quantities (layer mean temperatures and water vapor layer amounts in units of molecules/cm2) consistent with the AIRS fixed 101 pressure level grid, and the lowest valid layer values are adjusted to account for the fractional layer above the true surface. The AIRS 100 “pseudo-level” retrieval profiles provided in the Level 2 “support” product files are also converted to layer values (for temperature) and the bottom fractional layer values above the surface are similarly adjusted. For the temperature profile comparisons, the ARM and AIRS profiles are degraded to ∼1 km vertical layer values, and mean differences (AIRS minus ARM) and RMS differences are then computed for each layer. For water vapor comparisons, the profiles are degraded to ∼2 km vertical layer values, and mean percent differences (100 (AIRS-ARM)/ARM) and RMS percent differences are computed for each layer.
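As a rough illustration of the temperature statistics described above, the following sketch degrades fine-layer means to coarser layers and then computes the mean and RMS differences per layer. It assumes, purely for simplicity, that each coarse layer is a thickness-weighted average of a fixed number of adjacent fine layers; the actual mapping from the AIRS layers to ∼1 km layers is not specified here, and the function names are hypothetical.

```python
import numpy as np


def coarsen_layer_means(fine_values, fine_thickness, group_size):
    """Thickness-weighted mean over consecutive groups of fine layers.

    Assumption: a fixed number of fine layers per coarse layer. Any
    leftover fine layers that do not fill a complete group are dropped.
    """
    values = np.asarray(fine_values, dtype=float)
    weights = np.asarray(fine_thickness, dtype=float)
    n_used = (values.size // group_size) * group_size
    values = values[:n_used].reshape(-1, group_size)
    weights = weights[:n_used].reshape(-1, group_size)
    return (values * weights).sum(axis=1) / weights.sum(axis=1)


def layer_bias_and_rms(airs, arm):
    """Per-layer mean and RMS of (AIRS minus ARM).

    Both inputs have shape (n_profiles, n_layers), already on the same
    coarse layer grid.
    """
    diff = np.asarray(airs, dtype=float) - np.asarray(arm, dtype=float)
    bias = diff.mean(axis=0)
    rms = np.sqrt((diff ** 2).mean(axis=0))
    return bias, rms
```

The statistics are taken over the ensemble axis only, so each coarse layer receives its own bias and RMS value, matching the per-layer curves discussed in this section.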
 For water vapor, it should be noted that the mean percent differences (i.e., biases) and RMS percent differences are computed using the convention used in reporting AIRS retrieval statistics [e.g., Susskind et al., 2003], where water vapor layer amounts are used to weight the observed percent differences. The weighting is done independently for each ∼2-km thick layer. For ensembles with higher water vapor variability, this process has the effect of down-weighting percent errors for cases with lower water vapor amounts. The weights are normalized for the ensemble, and the weighting therefore has lower impact on the results for ensembles with lower water vapor variability (e.g., in the tropics and in the upper troposphere).
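The effect of the weighting can be seen in a small numerical sketch. The exact form of the weights is an assumption here: each case's percent difference in a layer is weighted by its ARM layer water vapor amount, normalized over the ensemble. The function names are hypothetical and the code is not the AIRS Team implementation.

```python
import numpy as np


def percent_differences(airs, arm):
    """100 (AIRS - ARM) / ARM for arrays of shape (n_profiles, n_layers)."""
    airs = np.asarray(airs, dtype=float)
    arm = np.asarray(arm, dtype=float)
    return 100.0 * (airs - arm) / arm


def unweighted_percent_stats(airs, arm):
    """Traditional per-layer mean and RMS percent differences."""
    pct = percent_differences(airs, arm)
    return pct.mean(axis=0), np.sqrt((pct ** 2).mean(axis=0))


def weighted_percent_stats(airs, arm):
    """Per-layer statistics weighted by the ARM layer amounts.

    Assumption: weights are the ARM layer amounts, normalized so that
    they sum to one over the ensemble, independently for each layer.
    """
    arm = np.asarray(arm, dtype=float)
    pct = percent_differences(airs, arm)
    weights = arm / arm.sum(axis=0)
    bias = (weights * pct).sum(axis=0)
    rms = np.sqrt((weights * pct ** 2).sum(axis=0))
    return bias, rms
```

For a single layer with a dry case (ARM amount 1, AIRS amount 2) and a wet case (ARM and AIRS both 9), the unweighted bias is 50% while the weighted bias is 10%, which illustrates how large percent errors in dry cases are down-weighted.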
 As mentioned in section 2, the process of degrading the profiles to lower vertical resolution (e.g., 1 km for temperature) before computing the mean and RMS differences has the dramatic effect of suppressing large errors associated with fine-scale vertical structure in the ARM profiles. The satellite soundings have limited vertical resolution and cannot resolve this fine structure. The comparisons presented here therefore suppress most of this “null-space” error [Rodgers, 1990] associated with the retrievals.
 Temperature and water vapor profile validation statistics are shown for four ensembles of profiles, corresponding to four different selections of retrievals according to the version 4 retrieval quality flags. The reader is referred to Susskind and Chahine (submitted manuscript, 2005) for a detailed description of the quality flags and how their values are determined. Quality flags are provided for various retrieved quantities including the surface fields (Qual_Surf), water vapor profile (Qual_H2O), and temperature profile for three vertical regions (Qual_Temp_Profile_Top, above 200 mbar; Qual_Temp_Profile_Mid, 3 km to 200 mbar; Qual_Temp_Profile_Bot, below 3 km). Each flag can have values of 0, 1, or 2 corresponding to retrievals of highest, good, and poor (i.e., not accepted) quality, respectively. Sea surface temperatures are used in evaluating Qual_Surf and this flag is applicable only to nonfrozen ocean profiles (and is therefore not used for the SGP site selections). The criteria for the four selections of quality flags are given in Table 2. A range of selections is included to demonstrate how the accuracy and yield of the retrievals vary as a function of the assigned quality flags. The first selection (denoted QC1) includes retrievals for which all of the products (including the surface fields, water vapor profile, and temperature profile at all levels) of a given profile are flagged as highest quality. The last selection (QC4) consists of retrievals for which the selected products (temperature at various levels, water vapor) are accepted. That is, for example, if Qual_H2O = 0 or 1 then the water vapor profile is selected irrespective of the temperature and surface field quality flags. The other two somewhat arbitrary selections represent data sets of intermediate quality. 
The QC2 ensemble consists of retrievals for which all of the products for a given profile are accepted (all quality flags = 0 or 1), and the QC3 ensemble consists of retrievals for which both the water vapor and temperature profiles at a given level are of highest quality. Selections QC1, QC2, and QC4 (for water vapor) have yields (percent of selected cases out of all cases) that are independent of altitude, while selections QC3 and QC4 (for temperature) have yields that are altitude dependent.
Table 2. Retrieval Quality Flag Selections
Selection: Quality Flag Criteria
QC1: Qual_Surf = 0 (a) AND Qual_H2O = 0 AND Qual_Temp_Profile_Top = 0 AND Qual_Temp_Profile_Mid = 0 AND Qual_Temp_Profile_Bot = 0
QC2: Qual_Surf ≠ 2 (a) AND Qual_H2O ≠ 2 AND Qual_Temp_Profile_Top ≠ 2 AND Qual_Temp_Profile_Mid ≠ 2 AND Qual_Temp_Profile_Bot ≠ 2
QC3: Qual_Temp_Profile_Top = 0 AND Qual_H2O = 0, above 200 mbar; Qual_Temp_Profile_Mid = 0 AND Qual_H2O = 0, 3 km to 200 mbar; Qual_Temp_Profile_Bot = 0 AND Qual_H2O = 0, below 3 km
QC4: Qual_Temp_Profile_Top ≠ 2, temperature above 200 mbar; Qual_Temp_Profile_Mid ≠ 2, temperature 3 km to 200 mbar; Qual_Temp_Profile_Bot ≠ 2, temperature below 3 km; Qual_H2O ≠ 2, water vapor
(a) The Qual_Surf criterion is used for the TWP (ocean) site but not the SGP (land) site.
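The four selections can be expressed as simple predicates on the quality flags. The sketch below is illustrative only: the flag names are those given in the text, but the function names, the dictionary layout of the flags, and the `use_surface` switch (True for the TWP ocean site, False for the SGP land site) are assumptions made for this example.

```python
TEMP_REGIONS = ("Top", "Mid", "Bot")  # above 200 mbar, 3 km to 200 mbar, below 3 km


def _all_flags(flags, use_surface):
    """Collect the flag values that QC1 and QC2 test together."""
    names = ["Qual_H2O"] + [f"Qual_Temp_Profile_{r}" for r in TEMP_REGIONS]
    if use_surface:
        names.append("Qual_Surf")
    return [flags[name] for name in names]


def in_qc1(flags, use_surface=True):
    """All products flagged as highest quality (0)."""
    return all(value == 0 for value in _all_flags(flags, use_surface))


def in_qc2(flags, use_surface=True):
    """All products accepted (0 or 1, i.e., not 2)."""
    return all(value != 2 for value in _all_flags(flags, use_surface))


def in_qc3(flags, region):
    """Temperature in the given region and water vapor both highest quality."""
    return flags[f"Qual_Temp_Profile_{region}"] == 0 and flags["Qual_H2O"] == 0


def in_qc4_temperature(flags, region):
    """Temperature in the given region accepted, regardless of other flags."""
    return flags[f"Qual_Temp_Profile_{region}"] != 2


def in_qc4_water_vapor(flags):
    """Water vapor profile accepted, regardless of other flags."""
    return flags["Qual_H2O"] != 2


def yield_percent(selected):
    """Percent of selected cases out of all cases."""
    selected = list(selected)
    return 100.0 * sum(selected) / len(selected)
```

Because `in_qc3` and `in_qc4_temperature` take a vertical region as an argument, their yields can differ from region to region, which is the altitude dependence noted in the text.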
 Assessment of the AIRS temperature and water vapor retrievals for the TWP site is shown in Figure 9. The highest-quality retrievals (QC1) demonstrate very good performance for both temperature and water vapor. The yield of these retrievals is 10%, and inspection of the coincident AIRS spectra and retrieved cloud fractions shows that these are primarily clear sky cases. The 1 km layer temperature biases are less than 1 K at all levels, with small but clearly defined oscillating biases as a function of altitude (AIRS warmer than ARM at ∼500 and ∼150 mbar, AIRS colder than ARM at ∼350 mbar). The temperature RMS errors are ∼0.6 K or less for nearly all layers below 200 mbar. The 2 km layer water vapor biases are less than 5% below 400 mbar and approximately minus 10% (AIRS drier than ARM) from 300 to 100 mbar. Considering the estimated accuracy of the best estimate upper level water vapor amounts (∼10%), the significance of the bias reported for the upper troposphere is questionable. Application of the suggested radiosonde corrections [Miloshevich et al., 2006] would produce a negligible impact on these comparisons below ∼200 mbar but would moisten the TWP profiles by ∼15% above ∼200 mbar. Note that above 200 mbar the water vapor comparisons are with ECMWF, because of the radiosonde cutoff height. The water vapor RMS errors are very good, with values of ∼10% below 400 mbar and increasing to ∼25% at 150 mbar. As discussed in section 2, the reported retrieval statistics should be interpreted as upper bounds of the AIRS retrieval performance since the ARM validation profiles contain uncertainties that also contribute to the computed statistics. The observed RMS performance at TWP, however, is in fact very similar to the retrieval performance predicted for global clear sky cases using simulated AIRS data (see, for example, Susskind et al. [2003, Figures 5 and 7]).
This confirms that the AIRS retrieval algorithm is behaving as expected for clear sky tropical atmospheres over a well-known (ocean) surface. In terms of the best estimate profile accuracy, this result is also significant because the simulation studies include no errors in the “truth” profiles or in the radiative transfer algorithms, which suggests that the uncertainties in the ARM profiles do not contribute significantly to the reported RMS differences. For the other quality flag selections, the temperature and water vapor mean biases do not change significantly with respect to the QC1 results. Going from QC1 to QC4, the yields increase and the temperature and water vapor RMS errors degrade gradually, as shown in Figure 9. Considering all accepted temperature and water vapor products (i.e., QC4), the temperature yield is 55% (89%) below (above) 200 mbar and the water vapor yield is 88%. For these retrievals, the temperature RMS errors are ∼1 K or less below 200 mbar and the water vapor RMS errors are 20% or less below 400 mbar. Retrieval statistics for the QC1 and QC4 ensembles are listed in Tables 3 and 4.
Table 3. Summary of ∼1 km Layer Temperature Differences (AIRS-ARM) for the QC1 and QC4 Quality Flag Selections
 As mentioned previously, analyses of clear sky observed and calculated AIRS radiances suggest that the ARM best estimate water vapor profiles possess a height-dependent diurnal bias, with daytime profiles 5 to 8% drier in the upper troposphere. The bias is suspected to be due to effects of solar radiation on the Vaisala radiosonde water vapor sensor. To assess the impact of the bias on the AIRS retrieval profile validation, the TWP site profile statistics were generated separately for the day and night cases using the highest-quality (QC1) retrievals. With this approach, the AIRS retrievals are assumed not to have a diurnal bias and in this sense are used as a relative reference to assess the ARM profiles. As shown in Figure 10, coherent differences from day to night are not obvious in the retrieval comparisons. Taken literally, the comparisons in the 200 to 300 mbar region suggest that the daytime profiles have higher variability (∼5% RMS) and that the daytime ARM profiles are wetter than the nighttime profiles by 5 to 10%. Further investigation is required to reconcile this finding with the observed minus calculated spectral analyses.
 Analogous to Figure 9 for the TWP site, assessment of the SGP site retrievals is shown in Figure 11. As opposed to the TWP site results, the SGP site retrieval statistics do not vary significantly between the four different quality flag selections. The 1 km layer temperature biases are 1 K or less and display vertical oscillations similar in character and magnitude to the TWP temperature biases above 700 mbar. Temperature RMS errors are 2 K near the surface, decrease to 1 K from 500 to 400 mbar, and increase to 2 K at 200 mbar. Divakarla et al.  report similar temperature biases as a function of altitude and also discuss potential retrieval issues responsible for the biases. The 2 km layer water vapor biases are 5% or less below 400 mbar and increase to approximately minus 10% at 200 mbar, similar to the TWP result. The water vapor RMS errors are larger than those for TWP, with values of 25% below 500 mbar and increasing to 35% at 200 mbar. There are no significant changes in the results shown in Figure 11 if the time window (radiosonde launch minus overpass time) is reduced from 2 hours to 1 hour and the spatial window (distance from center of AMSU footprint to the ARM site) is reduced from 120 km to 50 km. As discussed previously, AIRS retrievals at the SGP site are more difficult (compared to the TWP site) because of several issues, including the need to account for land surface emissivity effects, much larger atmospheric state variability, and the interpretation of microwave observations of land surfaces for infrared cloud clearing.
 The SGP ensemble includes a wide range of hot/wet and cold/dry profiles, and it should be noted that the water vapor weighting technique used in computing the water vapor statistics has a large effect on the SGP statistics reported here. Computing the statistics without the water vapor weighting, the mean bias and RMS percent errors are approximately 20 and 55%, respectively, in the lower troposphere (as opposed to values of ∼5 and 25%, shown in Figure 11). This is due to fractional water vapor errors that have a strong dependence on the total column pwv, as shown in Figure 12. For low-pwv cases encountered at the SGP site, the fractional pwv retrieval errors have larger variability and larger mean values, with the mean pwv error approaching 25% (AIRS wetter than ARM) at 1 cm pwv. The ARM best estimate pwv values are determined by the MWR retrievals, which have an estimated absolute accuracy of 3% for all but very dry (<0.2 cm pwv) conditions. For the TWP site, the water vapor variability is much less and the water vapor weighting therefore has little impact on the statistics computed for the TWP ensemble. Similarly, water vapor variability in the upper troposphere is relatively low at both SGP and TWP sites, and the water vapor weighting has little impact on the statistics computed for the upper troposphere.
 Last, the AIRS retrievals are examined as a function of cloudiness. The 1 km temperature and 2 km water vapor statistics are computed for the accepted (i.e., QC4) products for a range of cloud fraction bins and displayed as a function of the AIRS retrieved cloud fraction in Figures 13 and 14 for the TWP and SGP sites, respectively. For both sites, the small-scale vertical oscillations in the temperature biases are largely independent of cloud fraction but with the positive bias at ∼500 mbar increasing slightly with cloud fraction, particularly for the TWP site. The upper level water vapor bias increases slightly with cloud fraction at the TWP site. For both the SGP and TWP sites, the temperature and water vapor RMS differences increase slightly with increasing cloud fraction.
4. Conclusions and Summary
 Validation of products from the new generation of advanced satellite sounders requires correlative data sets which are accurate, well characterized and statistically significant. Using various ARM data, including radiosondes launched at the Aqua overpass times, ensembles of “best estimate” temperature and water vapor profiles for Aqua overpasses of the ARM sites have been constructed, as described in section 2. This data set has been used to assess the accuracy of the AIRS radiative transfer algorithm and, in section 3 of this paper, assess the accuracy of the AIRS Team version 4 temperature and water vapor retrievals at the TWP and SGP ARM sites. Specific findings of the retrieval validation study include:
 1. Short-term and small-scale variability of the atmosphere is significant and should be taken into account when validating large area footprint retrievals from polar orbiting satellites. This is particularly important for assessing RMS errors of the retrievals, as opposed to the mean differences (biases), for which the random variability cancels for a large ensemble of cases. RMS errors computed using validation profiles that do not fully account for the variability should be interpreted as upper bounds of the true errors.
 2. The conventional approach used to report AIRS water vapor retrieval statistics (and used in this paper) of weighting the observed percent errors by water vapor concentration can produce significant differences from the traditional, unweighted, calculations. This is particularly true for ensembles with high water vapor variability, such as the SGP site lower troposphere.
 3. The yields of AIRS retrievals with temperature, water vapor and surface products flagged with highest quality (i.e., the QC1 ensembles) are 10 and 21% for the TWP and SGP sites, respectively. Considering all accepted products (i.e., the QC4 ensemble) at the TWP site, the middle and bottom level (below 200 mbar) temperature yield is 55%, the top level (above 200 mbar) temperature yield is 89%, and the water vapor yield is 88%. Analogous yields for the SGP site are 71, 89, and 87%, respectively.
 4. AIRS retrievals for the tropical ocean TWP site have very good performance, with RMS errors approaching the theoretical limit predicted by retrieval simulation studies performed with no errors in the truth data or radiative transfer algorithms. Retrievals for which the temperature, water vapor, and surface products are flagged as highest quality (QC1) have the best performance, and the performance degrades gradually as retrievals flagged with lower quality (and more cloudy scenes) are included. For all accepted temperature and water vapor products (QC4), 1 km layer temperature RMS errors are ∼1 K or less below 200 mbar and 2 km layer water vapor RMS errors are 20% or less below 400 mbar.
 5. AIRS retrievals for the midlatitude land SGP site have poorer RMS performance with respect to the TWP site results for both temperature and water vapor. 1 km layer temperature RMS errors range from 1 to 2 K and 2 km layer water vapor RMS errors range from 25 to 35%. This performance is largely independent of the retrieval quality flags, yield, and cloud fraction.
 6. AIRS total column precipitable water vapor (pwv) fractional errors are higher for lower-pwv conditions encountered at the SGP site, with mean fractional errors of ∼25% (AIRS wetter than ARM) at 1 cm pwv. These larger percent errors observed for lower water vapor amounts are suppressed when the water vapor weighting approach is used to compute the AIRS water vapor mean and RMS differences.
 7. For both the TWP and SGP ensembles, small-scale (∼0.5 K) vertical oscillations are present in the AIRS temperature retrievals (AIRS retrievals are too warm at ∼600 mbar, too cold at ∼300 mbar, too warm at ∼150 mbar). The magnitude of the oscillations is largely independent of retrieved cloud fraction.
 8. For both the TWP and SGP sites, water vapor biases are ∼5% or less below 400 mbar and increase to minus ∼10% (AIRS drier than ARM) at 200 mbar. The biases are largely independent of retrieved cloud fraction. The significance of the reported upper troposphere bias is questionable given the estimated absolute accuracy of the ARM profiles (∼10%) at these levels.
 9. Diurnal biases in the upper level water vapor profiles are not evident in the ARM/AIRS comparisons shown here.
 The AIRS Team version 4 retrieval results reported here for the ARM TWP site demonstrate the potential of the new generation of advanced high–spectral resolution satellite sounders. The TWP site retrievals meet or surpass the 1 K/1-km and 20%/2-km RMS performance requirements throughout the middle and lower troposphere for clear and partly cloudy conditions. The high accuracy is due in part, however, to the well-known (ocean) surface and to the tropical profiles, which have little vertical structure and low spatial and temporal variability. The results for the SGP midlatitude land site are more indicative of the AIRS version 4 sounding accuracy under more typical meteorological conditions. For the SGP site, the 1-km temperature and 2-km water vapor RMS performance requirements are not met, motivating further improvement of the AIRS science team retrieval algorithm. Retrievals at the SGP site are difficult because of several potential issues, including the need to account for land surface emissivity effects, much larger atmospheric state variability, and the interpretation of microwave observations of land surfaces for infrared cloud clearing. The AIRS Team is currently considering several different algorithm changes to address these issues. An improved method to retrieve and account for the spectral surface emissivity, for example, is currently being evaluated. Improved temperature and water vapor retrieval performance over land is expected to be provided in a subsequent AIRS Team algorithm release.
 The ARM site best estimate products described in this paper will continue to be used to provide systematic and objective evaluations of future AIRS Team algorithm changes. Further applications include assessment of retrievals performed using combined AIRS and MODIS observations [Li et al., 2005], assessment of the AIRS Team version 4 retrievals for the polar NSA site, and assessments of NOAA 16 ATOVS and Aqua MODIS temperature and water vapor retrievals. Case studies and statistical correlations of the retrieval performance versus retrieval and validation parameters are also being used to diagnose specific issues within the retrieval algorithms. Further development of the ARM best estimate products will also be pursued; specific activities include continued studies to characterize the accuracy of the ARM upper level water vapor measurements, improved representation of the SGP and NSA site surface emissivities, and the use of Aqua MODIS products to account for spatial gradients within the AMSU footprints. The data set will also be extended to include additional radiosonde launch campaigns.
 This research was supported by the EOS Science Project Office under NASA grant NNG04GG22G. We gratefully acknowledge numerous ARM infrastructure personnel, David Starr of the EOS Project, and the AIRS project at JPL for facilitating this work. We thank Chris Barnet, Bill Smith, and Dave Turner for discussions regarding various aspects of the analysis and three anonymous reviewers for their comments. ARM data were obtained from the Atmospheric Radiation Measurement (ARM) Program sponsored by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Climate Change Research Division.