Observing system simulation experiments at the National Centers for Environmental Prediction



[1] Observing system impact assessments using atmospheric simulation experiments are conducted to provide an objective quantitative evaluation of future observing systems and instruments. Such simulation experiments using a proxy true atmosphere, Nature Run, are known as observing system simulation experiments (OSSEs). Through OSSEs, future observing systems that effectively use data assimilation systems in order to improve weather forecasts can be designed. Various types of simulation experiments have been performed in the past by many scientists, but the OSSE at the National Centers for Environmental Prediction (NCEP) presented in this paper is the most extensive and complete OSSE. The agreement between data impacts from simulated data and the corresponding real data is satisfactory. The NCEP OSSE is also the first OSSE where radiance data from satellites were simulated and assimilated. Since a Doppler wind lidar (DWL) is a very costly instrument, various simulation experiments have been funded and performed. OSSEs that evaluate the data impact of DWL are demonstrated. The results show a potentially powerful impact from DWL. In spite of the many controversies regarding simulation experiments, this paper demonstrates that carefully constructed OSSEs are able to provide useful information that influences the design of future observing systems. Various factors that affect the assessment of the impact are discussed.

1. Introduction

[2] Building and maintaining observing systems with new instruments is extremely costly, particularly when satellites are involved. Any objective method that can evaluate beforehand the improvement in forecast skill due to the selection of future instruments and their configurations will be quite valuable. The forecast skill evaluations performed using simulation experiments are known as observing system simulation experiments (OSSEs). Since all the details of operational data assimilation systems (DASs) have to be reconstructed for a simulated world, the OSSE is quite labor intensive. However, its cost is a small fraction of the total cost of an actual observing system, and it is therefore a relevant aid in the design of that system. By running OSSEs, a current operational DAS can be upgraded to handle new data types and volumes. Additionally, OSSEs can speed up the development of databases, data processing (including formatting), and quality control software. Recent OSSEs show that some basic tuning strategies can be developed before the actual data become available. All of this will accelerate the operational use of new observing systems. Through OSSEs, future observing systems such as the National Polar-orbiting Operational Environmental Satellite System will be designed for optimal use in DAS to improve weather analyses and forecasts [Arnold and Dey, 1986; Lord et al., 1997; Atlas, 1997]. Without OSSEs, it would take a long time for new data to be successfully utilized in operational weather forecasts.

[3] Among the many future instruments, the Doppler wind lidar (DWL) has often been evaluated by OSSEs [Halem and Dlouhy, 1984; Arnold and Dey, 1986; Rohaly and Krishnamurti, 1993; Stoffelen et al., 2006] because it is a very costly instrument and therefore justifies the cost of OSSEs. In this paper, results are presented from DWL-OSSEs that were conducted at NCEP in collaboration with the National Environmental Satellite, Data, and Information Service (NESDIS), Simpson Weather Associates (SWA), and NASA.

[4] General guidelines for OSSEs were reviewed by Arnold and Dey [1986]. A Nature Run (NR), which serves as truth for the OSSE, was produced by using the numerical weather prediction (NWP) model at the European Centre for Medium-Range Weather Forecasts (ECMWF) [Becker et al., 1996]. The NCEP DAS and NWP models were used to assimilate data. NWP models at NCEP and ECMWF have contrasting physics parameterizations and different algorithms for dynamics, and this difference is expected to mimic model error (the difference between an NWP model and the real world) in real-world data assimilation. The satisfactory agreement between data impacts from simulated data and the corresponding real data is presented. In this paper, this type of OSSE is called a “full OSSE” to distinguish it from other types of OSSEs.

[5] Various simulation experiments have been attempted that use real data for existing instruments and only simulate future instruments. Cress and Wergen [2001] conducted observing system replacement experiments (OSREs) to evaluate the impact of spaceborne DWL over Northern Hemisphere (NH) land using existing wind observations. Marseille et al. [2008a, 2008b, 2008c] developed a method called the sensitivity observing system experiment (SOSE). In a SOSE, adjoint sensitivity structures are used to define a pseudotrue atmospheric state for the simulation of the prospective observing system. An alternative method, the analysis ensemble system (AES) [Tan et al., 2007], uses the spread in the ensemble as a proxy for the analysis and background uncertainty based on arguments of error growth [Fisher, 2003]. To test the realism of the OSRE, SOSE, and AES, both the analysis and forecast impacts need to be carefully calibrated, just as in an OSSE.

[6] Although a SOSE, OSRE, or AES allows a quick study without an NR, the SOSE requires an adjoint model to generate the new observations and the AES requires an established ensemble system. Moreover, interpretation of the results is more complicated for the SOSE, OSRE, and AES. Full OSSEs with a long NR allow for quantitative assessment of the analysis and forecast impacts. Therefore, although an initial investment is required, the full OSSE is today the most reliable strategy for assessing the impact of prospective observing systems. The error characteristics of this work show realistic behavior [Errico et al., 2007].

[7] Throughout the simulation experiments, realistic data should be processed. This OSSE is the first one where satellite level-1B radiance data were simulated and assimilated. In some OSSEs, satellite radiance data are simulated by simple interpolation from NR temperature fields, but this does not replicate the complexity of radiance data. In the OSSE described by Stoffelen et al. [2006], satellite radiance data were simulated [Becker et al., 1996] but not used in the OSSE. Without radiance data, a large impact from DWL over the Southern Hemisphere (SH) is obtained but does not represent the real-world impact. To avoid misleading results, only data impacts over the NH were presented by Stoffelen et al. [2006]. However, the impact of DWL over the NH oceans could still be overestimated without radiance data.

[8] DWL is often simulated as a vector wind, but in this OSSE it is simulated and assimilated as line-of-sight (LOS) wind. Estimation of vector wind from LOS wind is a challenging task that needs to be evaluated through OSSEs. Providing simulated vector wind is a dangerous shortcut in evaluating DWL data impact.

[9] Section 2 provides a description of the NR and the evaluation and adjustment of the NR. An overview of the NCEP DAS used for OSSEs and a procedure to set up OSSEs is given in section 3. Simulation of observed data (section 4), results from calibration experiments (section 5), and the impact of the DWL (section 6) are also shown. The results are presented and various factors that affect the results are discussed in section 7. Finally, plans and strategies for future OSSEs are discussed in section 8. In this paper, elements and experiences are illustrated, while more technical details are described by Masutani et al. [2006].

2. Evaluation and Adjustment of the NR

[10] The NR, which serves as a proxy for the true atmosphere in OSSEs, needs to be sufficiently representative of the real atmosphere and must be produced by a state-of-the-art NWP model. The observational data for existing instruments are simulated from the NR, and forecast and analysis skill for the real and simulated data are compared.

[11] A lengthy uninterrupted forecast is used as the NR for this study. The NR was provided by ECMWF, which is described by Becker et al. [1996]. The 1-month-long forecast run was made at a resolution with a triangular truncation of 213 (T213) and with 31 vertical levels starting on 5 February 1993. This resolution corresponds to a horizontal grid spacing of approximately 60 km. The version of the model used for the NR is the same as in the ECMWF 15 year reanalysis and contains Tiedtke's mass flux convection scheme [Tiedtke, 1989] and prognostic cloud scheme [Tiedtke, 1993]. The 6 hourly data, from 0600 UTC 5 February through 0000 UTC 7 March 1993, were provided by ECMWF. The forecast run starts from the analysis at 00Z on 5 February, and the sea surface temperatures (SSTs) were kept constant with the values at the initial time. The effect of this constant SST in the NR will be discussed in section 5.

[12] If the same model is used for both the NR and DAS steps, this is called an identical twin OSSE [Liu and Kalnay, 2007]. During the early years of OSSEs, identical twin OSSEs were often conducted due to the lack of variety in high-fidelity NWP models. In an identical twin OSSE, the simulated model error to be handled by the DAS will be unrealistically low, which severely compromises the OSSE data impact study [see Arnold and Dey, 1986]. If the NR model is a different version of the DAS weather model, the OSSEs are called fraternal twin OSSEs. A study employing the same NR as was used in this paper was performed by Cardinali et al. [1998] and Stoffelen et al. [2006] to evaluate the impact of the DWL that will be launched through the Earth Explorer Atmospheric Dynamics Mission (ADM-Aeolus) [Stoffelen et al., 2005]. Since the forecast model used by the DAS was a different version of the ECMWF model, it is considered to be a fraternal twin OSSE. Neither identical twin nor fraternal twin OSSEs adequately capture the growth of model error, which is unavoidable in an operational DAS, and this handicap prevents a realistic evaluation of the impact of observations on forecasts.

[13] A lengthy, uninterrupted forecast is used as the NR for this study. The idea of using analysis fields as the NR was also considered. However, analysis fields are forced by existing observations and are also affected by background error covariance and many other parameters in the DAS, such as the nonuniform observation sampling. Also, analysis fields are available only at the analysis time, while forecast data are continuous and can be saved as frequently as needed. It is important to have dynamical consistency among the predicted variables and time evolution within the NR. Lahoz et al. [2005] and Keil [2004], who assessed potential observing systems for the stratosphere, justified using an analysis because there are very few observations in the stratosphere.

[14] Although the NR has to serve as truth in an OSSE, it does not have to be the same weather system as the actual atmosphere. However, the statistical characteristics of the NR have to be similar to the real atmosphere. The 1993 ECMWF NR period was found to be relatively neutral as an El Niño–Southern Oscillation event, and the tropical intraseasonal oscillation was decaying. A comparison of cyclone activity between the NR and the ECMWF reanalysis was performed. The number of cyclones in the ECMWF analysis is about 10% higher than in the NR, which is within the natural variability. The distribution of cyclone tracks was found to be realistic.

[15] Cloud evaluation is particularly important for the data assessment presented in this paper. Clouds are important targets for a DWL, and they also interfere with the lidar beam reaching lower levels. Therefore large differences in the NR cloud amount will significantly affect the sampling of the simulated data. Realistic clouds are also necessary for generating proper cloud-track winds from geostationary platforms. Finally, the cloud distribution affects the simulation of radiance data.

[16] All over the globe, the NR high-level cloud cover (HCC) amount appears larger than the satellite-observed estimate. The amount of low-level cloud cover (LCC) over the oceans is less than observed, and the amount of LCC over snow is too high [Masutani et al., 1999]. Figure 1 shows an observed estimate for total cloud cover (TCC) based on three different sources: the USAF Real-Time Nephanalysis (RTNEPH) [Hamill et al., 1992; Henderson-Sellers, 1986], the International Satellite Cloud Climatology Project (ISCCP), and the NESDIS experimental product Clouds from the Advanced Very High Resolution Radiometer (CLAVR-phase 1); these combined sources are used for verification. Details about the CLAVR data are available from the CLAVR Web site (http://cimss.ssec.wisc.edu/clavr). In general, the NR total cloud agrees with observational estimates, except over the North and South poles. After careful investigation, we found that because of the lack of reliable observations, there is no strong evidence for an overestimation of HCC and polar cloud by the NR. However, the underestimation of low-level stratocumulus over the oceans and the overestimation of low-level cloud over snow are clearly problems, and adjustments to the clouds were necessary [Masutani et al., 1999].

Figure 1.

Total cloud cover (TCC, %) for February 1993 estimated from three different sources: (a) U.S. Air Force Real-Time Nephanalyses (RTNEPH) [Hamill et al., 1992]; (b) International Satellite Cloud Climatology Project (ISCCP), stage D2; and (c) NESDIS experimental product, “Clouds from Advanced Very High Resolution Radiometer” data (CLAVR phase 1).

[17] Low-level cloud adjustment consists of replacing these clouds with the Warren cloud climatology [Warren et al., 1988; Hahn et al., 1996] if there is rising motion over the ocean, or dividing the clouds by 1.5 if there is snow cover over land. After the adjustment, the LCC frequency distribution over the ocean increased (Figure 2).

Figure 2.

Frequency distribution (%) for ocean areas containing low-level cloud cover (LCC) in 20 categories. Each category has 5% of cloud cover. Solid curve indicates NR cloud cover without adjustment; dashed curve indicates with adjustment.
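The low-level cloud adjustment described in paragraph [17] can be sketched as a per-grid-point rule. This is only an illustrative sketch: the function name, the argument names, and the use of pressure vertical velocity (omega < 0 meaning rising motion) to detect rising motion are assumptions made here, not details given in the text.

```python
def adjust_low_cloud(lcc, climatology_lcc, is_ocean, omega_pa_s, has_snow):
    """Return an adjusted low-level cloud cover (fraction, 0..1) for one grid point.

    lcc             -- NR low-level cloud cover at the grid point
    climatology_lcc -- climatological value (e.g., from a cloud climatology)
    is_ocean        -- True for an ocean point, False for land
    omega_pa_s      -- pressure vertical velocity; negative means rising (assumed test)
    has_snow        -- True if the land point is snow covered
    """
    if is_ocean:
        # Over the ocean with rising motion: replace NR cloud with climatology.
        if omega_pa_s < 0.0:
            return climatology_lcc
        return lcc
    if has_snow:
        # Over snow-covered land: reduce the overestimated cloud by a factor of 1.5.
        return lcc / 1.5
    # All other points are left unchanged.
    return lcc
```

For example, `adjust_low_cloud(0.6, 0.3, False, 0.0, True)` reduces a snow-covered land value of 0.6 to about 0.4.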

[18] Various NRs have been used in the past, and many of them have known problems. The best way to verify the NR is to conduct the calibration described in section 5 and demonstrate a satisfactory data impact in simulated experiments. If the selection of an NR has a fundamental problem, it will fail calibration.

3. The NCEP Elements of the OSSEs

3.1. Data Assimilation System

[19] The NCEP global DAS used in this paper is based on the spectral statistical interpolation (SSI) of Parrish and Derber [1992] and Derber and Wu [1998]. It is a three-dimensional variational analysis scheme and was used for the OSSE described in this paper. Satellite level-1B radiance data are directly assimilated [McNally et al., 2000]. LOS winds from DWL are directly used. The March 1999 (OP1999) and 2004 (OP2004) versions of NCEP's operational global forecast system were used for the data impact tests presented in this paper. OP1999 includes the DAS and NWP model used in 1999 and OP2004 includes the DAS and NWP model used in 2004. A T62 spectral model was used for most of the experiments, and the effect of model resolution is discussed in section 8, where the T170 model is used for comparison.

[20] Sometimes the inclusion of new instruments requires a major revision to the DAS in order to accommodate both large amounts of data and the increased spectral resolution of the new sounding instruments. More details about the forecast model, SSI, and the upgrades are described by Global Climate and Weather Modeling Branch [2003, 2004].

3.2. Preparation for the OSSE

[21] Before starting our simulation experiments, we needed to consider the noise produced by initial conditions from the other model. If the DAS used for an OSSE is significantly different from the DAS used for the analysis from which initial conditions are taken, a spin-up period is required. The 1993 NCEP DAS used temperature retrieval, while level-1B radiance data were used in 1999. Since this significantly affects the temperature fields, the period from 1 January 1993 to 5 February 1993 (the first day of the T213 NR) was used to spin up from the 1993 to the 1999 version of the NCEP DAS.

[22] The ECMWF analysis at 0000 UTC on 5 February 1993 was used for the initial conditions to produce the NR. Therefore the real analysis from the DAS used for the OSSE at 00Z on 5 February 1993 can be used as the initial conditions for both the real and simulated DAS. The first week of the NR was not used in the data impact test because of drift from the real atmosphere to the NR model atmosphere. All data impact tests start at 0000 UTC 13 February 1993. SSTs valid at 0000 UTC 5 February were used for experiments with simulated data, and real weekly SSTs were used for experiments with real observations. The use of constant SST in NR is discussed in section 5.

4. Simulation of Observations for the Control Experiment

[23] Observations were simulated at the same locations as in the 1993 distribution so that real and simulated data impacts could be compared. Satellite level-1B TIROS Operational Vertical Sounder (TOVS) radiance data were simulated for NOAA 11 and NOAA 12, which were available in February 1993. In the calibration experiments described in section 5, data impacts from the simulated data were compared with data impacts from real observational data. The simulation of conventional data was done by the NASA Goddard Space Flight Center (NASA/GSFC) and NCEP using real observational data distributions available in February 1993. Cloud motion vectors (CMVs) were simulated at the observed data locations instead of being based on cloud from the NR. This is not satisfactory, but the number of CMV observations was not significant in 1993 and did not seriously affect the results. Random errors with a Gaussian distribution were added to all conventional observations. Representativeness errors (REs) used for the NCEP operational DAS to control weighting for each observation were used here to determine the amplitude of the random error.
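The step of adding Gaussian random errors to simulated conventional observations can be illustrated with a short sketch. It assumes that the representativeness error is used directly as the standard deviation of the added noise; the function name, argument names, and the fixed seed are hypothetical choices made here for reproducibility, not part of the NCEP procedure.

```python
import random


def add_observation_error(values, error_std, seed=0):
    """Return simulated observations with zero-mean Gaussian noise added.

    values    -- error-free values interpolated from the Nature Run
    error_std -- standard deviation of the random error (assumed here to be
                 the representativeness error assigned to this data type)
    seed      -- seed for the pseudo-random generator (hypothetical default)
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, error_std) for v in values]


# Example: perturb three NR wind values (m/s) with a 2 m/s error amplitude.
perturbed = add_observation_error([10.0, -3.5, 7.2], error_std=2.0)
```

A fixed seed makes a simulated data set reproducible, which is convenient when the same observations are reused across several denial experiments.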

[24] The TOVS level-1B radiance data from the High Resolution Infrared Radiation Sounder (HIRS) and the microwave sounding unit (MSU) on NOAA 11 and NOAA 12 were simulated by NOAA/NESDIS and assimilated as level-1B radiance data. The strategies for including error in the simulations were presented by Kleespies and Crosby [2001]. The radiative transfer model used in the simulation was RTTOV-6 [Saunders et al., 1999], which is different from the OPTRAN used in the NCEP DAS [Kleespies et al., 2004]. This difference adds a more realistic error to the radiance data.

[25] All data, including radiance and DWL, are saved in the Binary Universal Form for the Representation of meteorological data (BUFR) [World Meteorological Organization (WMO), 2002] format, which is used for NCEP operations. It is important that the data in OSSEs are saved in the same format as in operations. There is often a long time period spent in data preparation after the real data become available. Therefore it is important that an OSSE includes the preparation of databases.

5. Calibration of the OSSE

[26] Calibration of OSSEs verifies the simulated data impact by comparing it to a real data impact. To conduct an OSSE calibration, the data impact of existing instruments has to be compared to their impact in the OSSE. It may be difficult to reproduce the exact real data impact in the simulation. However, if the difference can be explained, the OSSE results can still be interpreted in terms of the real data impact. The results from calibration experiments provide guidelines for interpreting OSSE results on the data impact in the real world. Without calibration, the quantitative evaluation of data impact using OSSEs could mislead the meteorological community. In this OSSE, calibration was performed and is presented.

[27] The denial of rawinsonde observation (RAOB) winds, RAOB temperatures, and TOVS radiances in various combinations was tested. As an example, Figure 3 shows RMSE, averaged twice daily between 0000 UTC 13 February and 1200 UTC 28 February, of the differences between experiments without RAOB winds and the control for 200 hPa meridional wind (V). Figure 3 shows a general agreement on the data impact of RAOB wind between the simulated and real analyses. However, the impact of RAOB winds is slightly weaker in the simulation for the NH. The calibration was performed using the OP1999.

Figure 3.

The 200 hPa V fields. RMSE difference (m s−1) from CTL for (top) analysis and (bottom) 72 h forecast. The data impact is described as a reduction in RMSE from the NR. RMSEs are computed with 12 h sampling, time averaged between 13 and 28 February.

[28] Anomaly correlation (AC) skill for the 72 h forecast of the 500 hPa height field is verified against the analysis of the control experiments. The analysis of the control experiments (CTL) includes conventional observations and TOVS. A comparison of the AC between real and simulated experiments is presented in Figure 4. The following experiments are presented: (1) without TOVS (NTV), (2) with TOVS but without RAOB winds (NWIN), and (3) with TOVS but without RAOB temperatures (NTMP).

Figure 4.

The 500 hPa height AC time-averaged between 13 and 28 February. Seventy-two hour forecast fields are verified against the control analysis. Control runs include all conventional data and TOVS radiances. For each run RAOB winds, RAOB temperatures and TOVS radiances are withdrawn in turn (NWIN, NTMP, and NTV, respectively). Figure shows (left) Northern Hemisphere and (right) Southern Hemisphere and (top) simulation experiments and (bottom) the real system.

[29] Forecast skill is verified against experiments using all the data (CTL). In both the real and simulated experiments, NWIN shows the least skill in the NH and less skill globally, compared to NTMP. Therefore RAOB winds have more impact compared to RAOB temperatures in both the simulated and real cases (Figure 4).

[30] Simulated TOVS data should be of better quality than the real TOVS data because various systematic errors and correlated large-scale errors have not been added. Therefore it is expected that denial of the simulated TOVS would result in more skill reduction than the denial of the real TOVS. However, Figure 4 shows that the impact of real TOVS is much larger than the simulated TOVS in the SH. Variable SST was used in the assimilation with real data and constant SST was used in the simulation. The consistency in response from the two different SSTs between the simulated and real atmospheres was confirmed. These results suggest that if SST has a large variability, the impact of TOVS becomes more important. With this NR, the data impact with a slowly varying SST could be tested in the SH.

[31] Further detailed evaluation of the data impact in the simulation experiments is discussed by Errico et al. [2007]. Errico et al. [2007] also pointed out a deficiency in the spectral characteristics of the NR, which is the lack of short waves. Any data impact that depends on small scales may not be reproduced in this OSSE. This is one of the reasons the T62 model, which is much coarser than the resolution of the NR, is mainly used in this paper. Because of this problem in SST, the results are mainly presented for the NH.

6. Configuration of DWL

[32] Since the DWL is one of the most costly instruments planned, various OSSEs have been supported. Rohaly and Krishnamurti [1993] evaluated the laser atmospheric wind sounder (LAWS), and Stoffelen et al. [2006] and Tan et al. [2007] evaluated ADM.

[33] In this OSSE, instead of evaluating a specific instrument, four representative types of DWL are evaluated: (1) Hybrid_DWL: DWL with scanning, sampling is from all vertical levels; (2) non_scan_DWL: DWL without scanning, sampling is from all vertical levels and in only one direction; (3) Upper_DWL: DWL with scanning, sampling is from upper levels; and (4) Lower_DWL: DWL with scanning, sampling is from lower levels and clouds.

[34] Upper level DWL sampling represents measurements of molecular scattering; lower level sampling represents measurements of aerosol and particle returns. Through these experiments, we expect the data impact from the specific type of DWL can be estimated from the data impact of these four DWLs. Figure 5 illustrates the vertical distribution of DWL measurement. Lower_DWL has measurements from clouds as well as the atmospheric boundary layer, and the measurements reach 600 hPa at midlatitudes and 400 hPa in tropical regions. The simulations are done assuming each type of data is collected from one satellite. However, the configuration of the data set can be achieved by multiple satellites.

Figure 5.

Zonally and time-averaged numbers of DWL measurements in a 2.5° grid box with 50 hPa thicknesses in 6 h. Numbers are divided by 1000. Note that the sizes of the 2.5° boxes are smaller at higher latitudes.

[35] A representativeness error of 1 m s−1 was assigned to DWL. This is the representativeness error that gives the maximum data impact. Therefore the results presented in this paper are expected to show the maximum possible impact from DWL. DWL data are generated by averaging shots within a 200 km square to achieve the required accuracy.
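The averaging of individual lidar shots into one observation per 200 km square can be sketched as a simple binning step. The sketch assumes shots are given as planar (x, y) positions in kilometers with one LOS wind value each, and that all shots in a box share the same look direction so that their LOS values can be averaged directly; the function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def average_shots(shots, box_km=200.0):
    """Average LOS wind shots falling in the same box_km x box_km square.

    shots -- iterable of (x_km, y_km, los_wind) tuples; all shots in a box
             are assumed to share one look direction (illustrative assumption)
    Returns a dict mapping (box_i, box_j) indices to the mean LOS wind.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x_km, y_km, los in shots:
        key = (math.floor(x_km / box_km), math.floor(y_km / box_km))
        sums[key][0] += los   # running total of LOS wind in this box
        sums[key][1] += 1     # number of shots in this box
    return {key: total / count for key, (total, count) in sums.items()}


# Two shots share the first box; a third falls in the neighboring box.
averaged = average_shots([(10.0, 10.0, 4.0), (150.0, 190.0, 6.0), (250.0, 10.0, 1.0)])
```

Averaging many shots reduces the random error of the resulting observation, which is the reason given in the text for combining shots before assimilation.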

[36] Wind data from the DWLs were simulated as LOS components of wind, that is, the component along a direct line between a satellite and an observation point. Assimilation code for LOS wind was implemented into the NCEP DAS and has been tested through the OSSEs. Vector winds (U and V) are often used by OSSEs for DWL data. However, obtaining vector winds requires satellite systems that are capable of taking measurements from at least two different directions at approximately the same time and location. Since this is not possible for the non_scan_DWL with one lidar, using vector winds will compromise the reliability of the assessment. In the NCEP OSSEs, adjusted low-level cloud is used to enhance the sampling from lower levels to make it more realistic (see section 2).
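The relation between vector wind and LOS wind, and the reason two look directions are needed to recover the vector wind, can be illustrated as follows. This is a geometric sketch under stated assumptions: azimuth is measured clockwise from north, vertical air motion is neglected, and the sign convention (positive along the beam direction) is chosen here for illustration. It is not the observation operator of the NCEP DAS.

```python
import math


def los_wind(u, v, azimuth_deg, elevation_deg):
    """Project horizontal wind (u eastward, v northward) onto a line of sight.

    azimuth_deg   -- beam azimuth, clockwise from north (assumed convention)
    elevation_deg -- beam elevation above the horizontal
    Vertical velocity is neglected in this sketch.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (u * math.sin(az) + v * math.cos(az)) * math.cos(el)


def vector_from_two_los(los1, az1_deg, los2, az2_deg, elevation_deg):
    """Recover (u, v) from two LOS winds taken at different azimuths."""
    a1, a2 = math.radians(az1_deg), math.radians(az2_deg)
    cos_el = math.cos(math.radians(elevation_deg))
    det = math.sin(a1 - a2)
    if abs(det) < 1e-6:
        # Parallel beams carry the same information: u and v cannot be separated.
        raise ValueError("two distinct look directions are required")
    h1, h2 = los1 / cos_el, los2 / cos_el  # horizontal components along each beam
    u = (h1 * math.cos(a2) - h2 * math.cos(a1)) / det
    v = (math.sin(a1) * h2 - math.sin(a2) * h1) / det
    return u, v
```

With a single look direction the 2 x 2 system is singular, which is why a nonscanning lidar with one beam provides only one wind component per location.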

7. Various Factors That Affect the Data Impact

[37] In this section, the impact of DWL data is presented and various factors that affect the results are discussed. The meridional wind (V) is mainly used to assess the performance of DWL. Note that the evolution of atmospheric phenomena at shorter times and smaller spatial scales is dominated by the wind field, while for longer times and larger spatial scales the mass (temperature) field is dominant [Halem and Dlouhy, 1984; Kalnay et al., 1985; Stoffelen et al., 2005]. In the NH, excellent skill at the global scale is mostly achieved with existing data (conventional and TOVS). Therefore the impact of DWL is expected to be seen at the synoptic scales. The skill in predicting temperature (T) comes mainly from planetary scale events, while the skill in predicting V comes mainly from the synoptic scale. U and V contain the information about relative vorticity at the synoptic scale, while U and T contain information about the waveguide [Hoskins and Ambrizzi, 1993]. Therefore V depicts the information about relative vorticity. The large-scale U component can be inferred from T observations in the extratropics, while DWL wind observations mainly define the synoptic scale wave that is represented in relative vorticity and V.

[38] The data impacts are described as a reduction of RMSE from the NR or an improvement in AC (%) from the CTL. All RMSEs and ACs with respect to the NR are computed with 12 h sampling, time averaged between 13 and 28 February. Zonal wave numbers 1–20 are used for the total scale, and zonal wave numbers 10–20 are used for the synoptic scale.
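The verification measures in paragraph [38] can be sketched for values along a single latitude circle. The sketch below assumes a plain discrete Fourier transform for the zonal wave number filter and an uncentered anomaly correlation against a climatology; the function names and these specific definitions are illustrative assumptions, not the exact formulas used for the figures.

```python
import cmath
import math


def rmse(field, truth):
    """Root-mean-square error of a field against the truth (e.g., the NR)."""
    n = len(field)
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(field, truth)) / n)


def anomaly_correlation(forecast, verification, climatology):
    """Uncentered anomaly correlation (fraction; multiply by 100 for %)."""
    fa = [f - c for f, c in zip(forecast, climatology)]
    va = [v - c for v, c in zip(verification, climatology)]
    num = sum(a * b for a, b in zip(fa, va))
    den = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in va))
    return num / den


def zonal_bandpass(values, k_min, k_max):
    """Keep only zonal wave numbers k_min..k_max of equally spaced values
    around one latitude circle (direct DFT, adequate for small n)."""
    n = len(values)
    coeffs = [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n)
                  for j in range(n)) for k in range(n)]
    kept = []
    for k, c in enumerate(coeffs):
        wave_number = min(k, n - k)  # fold negative frequencies onto positive
        kept.append(c if k_min <= wave_number <= k_max else 0.0)
    return [(sum(kept[k] * cmath.exp(2j * math.pi * k * j / n)
                 for k in range(n)) / n).real for j in range(n)]
```

Under these assumptions, the synoptic scale score would be obtained by applying `zonal_bandpass(values, 10, 20)` to the forecast, verification, and climatology before computing the correlation, and the total scale score by using wave numbers 1 to 20.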

7.1. Radiance Data

[39] Since the main observations in the SH are from radiance data, the impact from DWL is mainly in the SH if radiance data are not included [Cardinali et al., 1998]. In this paper, the CTL run includes conventional and TOVS data. Figure 6 shows the reduction of RMSE from the NR due to the inclusion of TOVS radiance in CTL, and the reduction due to the inclusion of Hybrid_DWL in CTL. The main impact of TOVS radiance is spread across the SH, while the peak of the impact of DWL is located in the tropics.

Figure 6.

Difference in RMSE (m s−1) from NR for 200 hPa meridional wind. Averaged twice daily between 13 and 28 February. Improvement is due to adding Hybrid_DWL to CTL in analysis. Positive values indicate improvement. Reduction in RMSE due to (top) including TOVS data and (bottom) adding Hybrid_DWL to CTL run.

[40] In this paper, results are presented for the NH, because an assessment of data impact over the SH that relies on the 1993 volume of radiance data, which is much smaller than the current volume, and on an unrealistic fixed SST is not suitable for demonstrating OSSEs. Errico et al. [2007] also pointed out unrealistic analysis increments in the simulated data impact in the SH for this data set.

[41] Figure 7 shows the improvement in AC from adding DWL to NTV (CTL with TOVS radiances removed). Without TOVS radiance data, the improvement in AC for the wind fields is about 1–3% with the Hybrid_DWL. When TOVS data are included (Figure 8), the impact is reduced by half. A larger impact of DWL is expected at smaller scales [Stoffelen et al., 2006], and Figures 7 and 8 confirm that the impact is much larger at the synoptic scale. The improvement in AC reaches nearly 8% without radiance data and is reduced by half with radiance data (Figure 8). The large impact in the analysis could result from the large difference between observations and guess fields that is produced by a poor NWP model. This kind of improvement in the analysis cannot be maintained in the forecast, and forecast skill will rapidly decrease with time. This is very clear in the forecast performance in the tropics.

Figure 7.

Differences in time-averaged AC (%) with a NR from the NTV for (top) 200 hPa meridional wind and (bottom) 850 hPa meridional wind in the NH. Averaged twice daily between 13 and 28 February. Shown are (left) ACs computed using the total scale and (right) ACs for the synoptic scale. Positive differences mean the addition of DWL data improves the forecast. In these figures, NTV includes conventional data only, and the assimilation was performed using the 1999 DAS. Green curve, Hybrid_DWL+NTV; purple curve, upper_DWL+NTV; orange curve, lower_DWL+NTV; and blue curve, non_scan_DWL+NTV.

Figure 8.

Same as Figure 7, except all experiments include TOVS radiance data. The green dashed curves are for Hybrid_DWL with (20 times) thinned measurements to make the number of measurements similar to non_scan_DWL. Assimilation was performed using the 2004 DAS. Green solid curve, Hybrid_DWL+CTL; green dashed curve, Thinned Hybrid_DWL+CTL; purple curve, upper_DWL+CTL; orange curve, lower_DWL+CTL; and blue curve, non_scan_DWL+CTL.

7.2. Scanning

[42] Since there has been a great deal of interest in evaluating the nonscanning lidar proposed for ADM [Stoffelen et al., 2005], the first task for the NCEP OSSE was to evaluate the effect of scanning. Since the non_scan_DWL evaluated in this paper uses different sampling and assumes a different technology, conducting simulation experiments for ADM is a future project that still needs to be performed. Both Figures 7 and 8 show the significant advantage of scanning. Even Lower_DWL and Upper_DWL show a much higher impact compared to non_scan_DWL at all levels. Note that non_scan_DWL samples wind from all levels, unlike Upper_DWL and Lower_DWL.

[43] Since scanning allows the measurement of divergent wind, which cannot be estimated from mass fields, its advantage was expected. However, these results could just be due to the amount of data in the experiments, since the number of measurements from Hybrid_DWL is almost 20 times larger than that from non_scan_DWL (Figure 5). In Figure 8, AC from (20 times) thinned Hybrid_DWL measurements is also included to demonstrate that scanning is indeed important. It is interesting to observe that thinned data could be better than full data in an 850 hPa synoptic scale analysis. This is due to overweighting of the full Hybrid_DWL data at lower levels.
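The 20-fold thinning used to match the data volume of the nonscanning lidar can be illustrated with the simplest possible rule, keeping every 20th measurement in the order given. The text does not state how the thinning was actually done, so this selection rule and the function name are assumptions for illustration only.

```python
def thin_observations(observations, factor=20):
    """Keep every `factor`-th observation (regular subsampling).

    This is one simple thinning rule; the selection method used in the
    experiments is not specified in the text.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return observations[::factor]


# Example: 100 scanning-lidar measurements reduced to 5.
thinned = thin_observations(list(range(100)), factor=20)
```

Regular subsampling preserves the mix of look directions in a scanning data set, which is what allows a thinned scanning lidar to be compared with a nonscanning one at similar data volume.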

[44] Although the results clearly show the advantage of scanning, overwhelming technical difficulties in scanning have been reported. On the basis of the results of the NCEP OSSE, two alternatives have been considered: a multiple-satellite system with nonscanning lidars, or a single satellite with at least four lidar beams pointing in different directions.

[45] In general, Lower_DWL has more impact at 850 hPa and Upper_DWL at 200 hPa. However, in Figure 7 Upper_DWL becomes better than Lower_DWL even at 850 hPa after a 60 h forecast. The detailed results may vary depending on the OSSE system used for assimilation and have to be evaluated repeatedly with future OSSE systems. An OSSE system includes the NR, the simulation method, the DAS, and the NWP model. In this paper, DAS does not include the NWP model; the complete system including both the DAS and the NWP model is referred to as OP.

7.3. Data Impact and DAS

[46] There is another element that changes the impact in Figure 8 compared to Figure 7. In Figure 7, the CTL for the analysis includes only conventional data and experiments are performed using OP1999. In Figure 8, the CTL uses assimilation with conventional data and TOVS radiance, and all experiments are performed using OP2004.

[47] Figure 9 shows the impact of Hybrid_DWL compared to CTL (with TOVS data) using OP1999 and OP2004. These diagrams show that the improvement in the analysis is roughly similar for OP1999 and OP2004. Although Hybrid_DWL is an extremely powerful DWL, the gain in forecast skill from upgrading the CTL from OP1999 to OP2004 is much larger than the gain from adding Hybrid_DWL under OP1999. However, the improvement in forecast skill from including Hybrid_DWL is much more robust using OP2004, particularly at 850 hPa for synoptic scales.

Figure 9.

Same as Figure 7, except showing a comparison between the improvement from additional data (scan DWL) and the improvement from using OP2004 with respect to OP1999. Orange dashed curve, CTL for OP1999; orange solid curve, CTL for OP2004; green dashed curve, best_DWL+CTL for OP1999; and green solid curve, best_DWL+CTL for OP2004.

7.4. Data Impact and Model Resolution

[48] In section 7.3, DWL was evaluated using a T62 model. However, the results using a higher-resolution model could be different. The data impact with better models may be reduced because they can provide much better background fields, leaving less room to improve the analysis. On the other hand, a higher-resolution model should be able to utilize the data in finer detail, which may lead to more data impact.

[49] A comparison of the data impact of Hybrid_DWL between the T62 and T170 models was performed to study how much data impact depends on model resolution. T62 corresponds to a grid spacing of approximately 300 km and T170 to 110 km. OP2004 was used for these experiments.
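As a rough cross-check on these figures, one common rule of thumb takes the grid spacing of a triangular truncation TN to be half the shortest resolved zonal wavelength at the equator, πa/N, where a is the Earth's radius. The paper does not say which convention it uses, so the convention and the function name below are assumptions; the sketch only indicates that the quoted values are of the expected order.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def half_wavelength_km(truncation: int) -> float:
    """Half the shortest resolved zonal wavelength at the equator for T<truncation>.

    Assumed convention: the shortest wave has `truncation` cycles around the
    equator, so half its wavelength is (2*pi*a / truncation) / 2.
    """
    if truncation <= 0:
        raise ValueError("truncation must be a positive integer")
    return math.pi * EARTH_RADIUS_KM / truncation


for n in (62, 170):
    print(f"T{n}: about {half_wavelength_km(n):.0f} km")
```

Under this convention T62 and T170 come out somewhat above 300 km and 110 km, respectively; other conventions (for example, the spacing of the associated Gaussian grid) give smaller numbers.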

[50] The impact of increasing the model resolution to T170 is comparable to adding the Hybrid_DWL at the total atmospheric scale for T62. In the analysis fields, the data impacts of Hybrid_DWL with respect to the CTL may not be significant in the T170 model. This is because the forecast fields from the T170 model are already good, which leaves less room for improvement. The improvement in AC skill of 200 hPa meridional wind due to adding DWL is similar at T62 and T170. However, the improvement in forecasts is larger at T170 (Figure 10).

Figure 10.

Same as Figure 7, except showing comparisons between the improvement from additional data (Hybrid_DWL) and the improvement from increased model resolution. Solid red curve, CTL for T62 model; dashed red curve, CTL for T170 model; solid green curve, Hybrid_DWL+CTL and T62 model; and dashed green curve, Hybrid_DWL+CTL and T170 model.

[51] Figure 11 shows the reduction in RMSE with respect to the NR obtained by adding Hybrid_DWL, for 72 h forecasts with the T62 and T170 models. RMSE is time averaged between 13 and 28 February. Compared to the analysis improvement in Figure 6, more negative values are observed. Although the analysis impact for T170 is very similar to the analysis impact for T62, a 72 h forecast with the T170 model shows much more uniform improvement than a forecast using the T62 model.

Figure 11.

Difference in RMSE (m s−1) from NR for 200 hPa meridional wind. Averaged twice daily between 13 and 28 February. Improvement is due to adding Hybrid_DWL to CTL in a 72 h forecast. Positive values indicate improvement. (top) T62 model. (bottom) T170 model.

7.5. Role of Observational Error

[52] Designing the observational error is always a challenge in OSSEs. To test a more realistic RE, the difference between the observation and the analysis (O-A) at each observation point was computed from the real analysis and then added to the simulated data. The O-A value from the real analysis includes REs that come from subgrid-scale structures. These subgrid-scale REs are not included in the NR, since it is a model integration. O-A from the real analysis also includes large-scale correlated errors, as well as subgrid-scale random errors, about which we have little knowledge on which to base our simulation. The estimation of REs is an important aspect of OSSEs, and extensive discussions and evaluations are being conducted.
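The perturbation step described above can be sketched as follows. This is a minimal illustration, not the NCEP implementation: the function name, the argument names, and the assumption that the three inputs are already matched point by point are all hypothetical.

```python
def add_oa_error(simulated, real_obs, real_analysis_at_obs):
    """Perturb simulated observations with the real O-A residual at each point.

    simulated            -- values simulated from the NR at the observation points
    real_obs             -- real observed values at the same points (O)
    real_analysis_at_obs -- real analysis interpolated to those points (A)
    All three sequences are assumed to be aligned point by point.
    """
    if not (len(simulated) == len(real_obs) == len(real_analysis_at_obs)):
        raise ValueError("all inputs must describe the same observation points")
    return [s + (o - a) for s, o, a in zip(simulated, real_obs, real_analysis_at_obs)]


# Toy meridional wind values in m/s (made up for illustration).
simulated = [10.0, -4.0, 7.5]
real_obs = [12.5, -3.0, 6.0]
real_analysis = [12.0, -3.5, 6.5]
print(add_oa_error(simulated, real_obs, real_analysis))  # [10.5, -3.5, 7.0]
```

Because the residual is taken from the real analysis, it carries whatever correlated and random components the real O-A contains, which is the property the experiment relies on.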

[53] The impact of DWL also depends on the error in the data used in the control runs (CTL). Control runs, with and without O-A errors, were conducted along with the Hybrid_DWL. In Figure 12, results are presented for the meridional wind (V) in the upper troposphere (200 hPa) and lower troposphere (850 hPa) at all wave numbers (1–20) and at the synoptic scale (wave numbers 10–20). The systematic errors, such as those in O-A, significantly increase the forecast impact at the larger scales. However, at synoptic scales, where the impact is already significant without O-A, changes in impact due to the additional systematic error are rather small, even though O-A also adds subgrid-scale random errors. Note that TOVS data were not used for this experiment because O-A was not added to TOVS data; the CTL experiment assimilates conventional data only. OP1999 is used for these experiments.
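The scale separation used in these comparisons can be illustrated with a minimal sketch. It assumes that the wave numbers are zonal wave numbers on a single latitude circle and uses an uncentered anomaly correlation; the paper's exact AC definition, area weighting, and spectral filter are not given in this section, so `band_filter` and `anomaly_correlation` are hypothetical helpers and the test field is made up.

```python
import cmath
import math


def band_filter(values, k_min, k_max):
    """Keep only zonal wave numbers k_min..k_max of a periodic series."""
    n = len(values)
    if not (1 <= k_min <= k_max < n / 2):
        raise ValueError("need 1 <= k_min <= k_max < n/2")
    out = [0.0] * n
    for k in range(k_min, k_max + 1):
        # Discrete Fourier coefficient for wave number k.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * j / n)
                    for j, v in enumerate(values)) / n
        for j in range(n):
            # Factor 2 adds the complex-conjugate (negative wave number) term.
            out[j] += 2.0 * (coeff * cmath.exp(2j * math.pi * k * j / n)).real
    return out


def anomaly_correlation(forecast, truth, climatology):
    """Uncentered anomaly correlation between forecast and truth anomalies."""
    f = [a - c for a, c in zip(forecast, climatology)]
    t = [a - c for a, c in zip(truth, climatology)]
    num = sum(x * y for x, y in zip(f, t))
    den = math.sqrt(sum(x * x for x in f) * sum(y * y for y in t))
    return num / den


# Toy example: truth has a planetary wave (k=3) and a synoptic wave (k=12);
# the forecast misplaces only the synoptic wave.
n = 64
clim = [0.0] * n
truth = [math.cos(2 * math.pi * 3 * j / n) + 0.5 * math.sin(2 * math.pi * 12 * j / n)
         for j in range(n)]
fcst = [math.cos(2 * math.pi * 3 * j / n) + 0.5 * math.sin(2 * math.pi * 12 * j / n + 1.0)
        for j in range(n)]

ac_total = anomaly_correlation(band_filter(fcst, 1, 20), band_filter(truth, 1, 20), clim)
ac_synoptic = anomaly_correlation(band_filter(fcst, 10, 20), band_filter(truth, 10, 20), clim)
print(f"total-scale AC: {ac_total:.3f}, synoptic-scale AC: {ac_synoptic:.3f}")
```

In this toy case the synoptic-scale AC is lower than the total-scale AC because the well-forecast planetary wave dominates the total field, which is one reason the two scales are reported separately.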

Figure 12.

Same as Figure 7, except showing changes in the data impact of Hybrid_DWL due to the observational error in CTL data with (obs-anal) added to the conventional data as an additional observational error. For these experiments, the CTL assimilated only conventional data. Green solid curve, Hybrid_DWL+CTL; green dashed curve, Hybrid_DWL+CTL (with additional observational error); blue solid curve, non_scan_DWL+CTL; blue dashed curve, non_scan_DWL+CTL (with additional observational error); and black curve with value zero, CTL either with or without additional observational error.

8. Summary and Discussion

[54] It is a challenging task to evaluate the realism of impacts from OSSEs. The uncertainties in an OSSE, the differences between the NR and the real atmosphere, the process of simulating data, and the estimation of observational errors all affect the results. Evaluation metrics also affect the conclusion. OSSE data impacts are often characterized as being overestimated because of a lack of sufficient REs. However, simulated data impacts can be underestimated if the control experiments become too close to the NR. Consistency and theoretical backup help in gaining confidence in the results from OSSEs. As more information is gathered, we can perform more credible OSSEs. Sometimes interpretation of the results becomes very difficult and OSSE results cannot be used to make recommendations. NCEP's OSSEs have demonstrated that carefully conducted OSSEs are able to provide useful recommendations, such as the advantages of scanning, to influence the design of future observing systems.

[55] The Hybrid_DWL used in this paper is the most powerful DWL and may not be achievable with current technology. The improvement in AC may not be as impressive as expected. Local negative impact using a low-resolution model may be disappointing. However, the data impact definitely becomes robust with a better DAS and a higher-resolution NWP model. Further improvement in the impact from DWL is expected with flow-dependent error covariances [Sato et al., 2009] and a four-dimensional variational DAS.

[56] Sometimes the improvement in forecasts due to model improvements is much greater than the improvement due to observations. OSSEs will be able to provide guidance on where more observations are required and where the model needs to be improved. As models improve, there is less improvement in the forecast due to additional observations and reduced data impact in analyses. However, data impact in forecast fields requires advanced DAS and forecast models.

[57] OSSEs are very labor intensive. The NR has to be produced using state-of-the-art NWP models at the highest resolution. Simulating data from an NR requires large computing resources. Simulations and assimilations have to be repeated with various configurations. OSSEs also require the best knowledge in many areas of the NWP system. Expert knowledge is required for each instrument. Efficient collaborations are essential for producing timely and reliable results.

[58] Ideally, all new instruments should be tested by OSSEs before they are selected to be built. OSSEs will also be important in influencing the design of the instruments and the configuration of the observing system. While the instruments are being built, OSSEs will help to prepare the DAS for the new instruments. We have to realize that developing a DAS to assimilate a new type of data is a significant task. However, this effort has traditionally been made only after the data become available. The OSSE effort demands that this same work be completed earlier, and that will speed up the use of the data and the realization of their full potential.

[59] OSSEs will be conducted by various scientists with different interests. Advocates of specific observing systems have a strong motivation to perform OSSEs, but they may bring biases to the table. Operational centers such as NCEP can provide balance among conflicting interests and focus on actual improvement in weather predictions. However, while operational centers may be unbiased, they often have difficulty finding sufficient motivation.

[60] From the experience of the OSSEs performed during recent decades, we realize that using the same NR is essential in conducting OSSEs to deliver reliable results in a timely manner. The simulation of observations requires access to the complete model level data and considerable resources, and it is important that the simulated data from many institutes be shared among all the OSSEs. By sharing the NR and simulated data, many OSSE projects will be able to produce results that can be compared. This will enhance the credibility of the results. On the basis of this principle, a group of international partners formed the “Joint OSSEs” [Masutani et al., 2007].

[61] The experience of OSSEs at NCEP also demonstrated that they often produce unexpected results. Theoretical prediction of the data impact and theoretical backup of the OSSE results are very important. On the other hand, unanticipated OSSE results will stimulate further theoretical investigation. When all efforts come together, OSSEs will help with timely and reliable recommendations for future observing systems. At the same time, OSSEs will prepare the operational DAS to promote the prompt and effective use of the new data when they become a reality.


[62] Many staff members of NOAA/NWS/NCEP, NASA/GSFC, NOAA/NESDIS, and ECMWF contributed to this project. The forward calculation of LOS lidar data for the NCEP DAS was prepared by J. Derber. We would like to acknowledge contributions from W. Yang, Y. Song, Z. Toth, and W. Baker of NCEP; R. Atlas, G. Brin, and S. Bloom of NASA/GSFC; and P. Li, W. Wolf, J. Yoe, and M. Goldberg of NOAA/NESDIS. We are grateful for support from Anthony Hollingsworth, Roger Saunder, and the ECMWF data support section for the T213 NR. We received constructive advice from members of the OSSE review panel. Members of the Joint OSSEs team provided many valuable comments.