Simple Doppler Wind Lidar adaptive observation experiments with 3D-Var and an ensemble Kalman filter in a global primitive equations model



[1] Through simple Observing System Simulation Experiments, we compare several adaptive observation strategies designed to subsample Doppler Wind Lidar (DWL) observations along satellite tracks, and examine the effectiveness of two data assimilation schemes, 3D-Var and the Local Ensemble Transform Kalman Filter (LETKF). With respect to sampling strategies, our results show that the LETKF-based ensemble spread method is superior to the other strategies tested, namely, use of a uniform distribution, the climatological spread strategy, or use of a random distribution, and is close to the ideal result obtained assuming that the true forecast error is known. With 10% DWL observations from the ensemble spread strategy, both 3D-Var and LETKF attain about 90% of the impact that 100% DWL wind profile coverage would provide. However, when the adaptive DWL observations coverage is reduced to 2%, 3D-Var becomes less effective than the LETKF assimilation scheme.

1. Introduction

[2] Within the next few years, the first Doppler Wind Lidar (DWL) will be deployed in space by the European Space Agency (ESA). In addition, in its recent Decadal Survey Report, the National Research Council recommended a US global winds mission in the coming decade. Because the operation of DWL is strongly constrained by energy resources [Rishojgaard and Atlas, 2004], a frequently stated qualitative goal is to get about 90% of the total effectiveness from just 10% coverage with adaptive observations. Here, 10% coverage means making measurements in only 10% of the total footprints that the DWL can possibly scan in a certain interval such as six hours. Unlike the applications of adaptive dropsonde observing in field experiments (FASTEX, NORPEX) [Joly et al., 1997; Bergot, 1999; Langland et al., 1999a; Langland et al., 1999b; Pu and Kalnay, 1999; Szunyogh et al., 1999; Majumdar et al., 2002; Toth et al., 2002; Langland, 2005], which attempt to optimize the 2–3 day forecast within a specified verification region (e.g., Europe or North America), the goal in our study is to optimize the six-hour global analysis by optimally distributing the limited DWL observation resources. As pointed out by Lorenz and Emanuel [1998], if a single adaptive observation is made at the location with largest background uncertainty, the global analysis error will be most reduced. The question we address is how to represent the background dynamical uncertainty and choose adaptive observation locations accordingly.

[3] One efficient formulation of the Ensemble Kalman Filter (EnKF), a relatively new data assimilation approach that provides an estimate of the background dynamical uncertainty, is the Local Ensemble Transform Kalman Filter (LETKF) [see Hunt et al., 2007, and references therein]. The diagonal of an EnKF-computed background error covariance matrix gives the ensemble variance of each variable, and its square root is the ensemble spread for that variable. Locations with large ensemble spread are those in which dynamical instabilities of the evolving flow result in large background (forecast) errors and therefore where observations can be most useful. The different observation location selection strategies that we compare are (1) one based on the 20-member LETKF ensemble spread, (2) a uniform observation distribution, (3) one based on the climatological background uncertainty, (4) random locations, and (5) an “ideal” strategy based on assumed knowledge of the true forecast error. We compare the impacts of adaptive observations selected with these different methods by assimilating them with two different data assimilation schemes, 3D-Var and LETKF. We test both 10% and 2% adaptive observations coverage, allowing for relatively dense and sparse adaptive observation scenarios. Comparison of these two scenarios will show the sensitivity of data assimilation schemes to the amount of adaptive observations.
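
As an illustration of the ensemble spread referred to above, the following is a minimal sketch, assuming NumPy and a hypothetical array layout with the ensemble member index first. The function name `ensemble_spread` and the random stand-in data are illustrative assumptions, not the LETKF code used in this study.

```python
import numpy as np

def ensemble_spread(ensemble):
    """Return the per-grid-point spread of one variable.

    ensemble: array of shape (n_members, ...), member index first (assumed layout).
    The spread is the sample standard deviation about the ensemble mean,
    i.e. the square root of the diagonal of the sample covariance matrix.
    """
    return np.std(ensemble, axis=0, ddof=1)

# Hypothetical stand-in data: 20 members on a 48 x 96 latitude-longitude grid.
rng = np.random.default_rng(0)
members = rng.normal(size=(20, 48, 96))
spread = ensemble_spread(members)   # shape (48, 96)
```

The `ddof=1` choice gives the unbiased sample variance, which matches the usual ensemble covariance normalization by the number of members minus one.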

2. Model, Observations, and Data Assimilation Schemes

[4] In this study, we use the Simplified Parameterizations, primitivE Equation DYnamics (SPEEDY) model, developed by Molteni [2003] and adapted for data assimilation by Miyoshi [2005]. It has a simplified but complete set of physical processes, seven vertical levels, 96 longitudinal grid points, and 48 latitudinal grid points. We follow a “perfect model” Observing System Simulation Experiments (OSSEs) setup, in which the simulated “truth” (long model integration) is generated with the same atmospheric model as the one used in data assimilation. In such an experimental setup, we avoid the complications of model error, and the only source of forecast errors comes from the initial conditions, but because of the “identical twin” relationship between nature and model, the results may be overoptimistic. Observations are obtained from the “truth” with added Gaussian random perturbations. The observational error standard deviations assumed for wind components (u, v), temperature (T), specific humidity (q) and surface pressure (ps) are 1.0 m/s, 1.0 K, 0.1 g/kg, and 1.0 hPa, respectively. These error levels do not depend on the spatial locations.
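
The generation of simulated observations described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `simulate_observations`, the dictionary `OBS_ERROR_STD`, and the constant "true" wind values are hypothetical, and only the error standard deviations are taken from the text.

```python
import numpy as np

# Observation error standard deviations quoted in the text:
# u, v in m/s; T in K; q in g/kg; ps in hPa.
OBS_ERROR_STD = {"u": 1.0, "v": 1.0, "T": 1.0, "q": 0.1, "ps": 1.0}

def simulate_observations(truth, variable, rng):
    """Add zero-mean Gaussian noise with the stated standard deviation to true values."""
    truth = np.asarray(truth, dtype=float)
    return truth + rng.normal(0.0, OBS_ERROR_STD[variable], size=truth.shape)

rng = np.random.default_rng(42)
true_u = np.full(10000, 12.0)            # hypothetical true zonal wind (m/s)
obs_u = simulate_observations(true_u, "u", rng)
```

Because the error levels do not depend on location, a single standard deviation per variable is sufficient in this sketch.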

[5] To test the sensitivity of the impacts of adaptive observations to data assimilation methods, we use both 3D-Var [Parrish and Derber, 1992; Miyoshi, 2005] and LETKF [Ott et al., 2004; Hunt et al., 2007]. 3D-Var uses a constant background error covariance, which is calculated following Parrish and Derber [1992]. LETKF, like other EnKF schemes, employs a time-evolving error covariance estimated from the forecast ensemble, so that it automatically gives an estimate of the forecast uncertainty. The application of LETKF to the SPEEDY model follows Hunt et al. [2007].

3. Adaptive Strategies and the Distribution of Simulated Adaptive DWL Observations

[6] We mimic polar orbiting satellite tracks and DWL observations assuming that the satellite scans half hemisphere “orbits” in each six-hour analysis cycle. The basic observations (u, v, T, q, ps) assimilated in all our experiments are simulated rawinsondes, shown as closed circles in Figure 1 (six-hour “orbits” are shown separated by vertical dashed lines). Figure 1 also shows an example of the distribution of 10% adaptive observations (crosses) from the ensemble spread strategy (defined below) at 1200 UTC. At 0000 UTC, the satellite scans the same half hemisphere orbit as at 1200 UTC, and the other half hemisphere orbit is scanned at 0600 UTC and 1800 UTC. Thus, we assume that each grid point can be observed twice a day (this is too optimistic because we neglect the impact of clouds). Since the characteristics of the forecast uncertainties are different in different regions [Kalnay, 2003], the adaptive DWL observations are distributed into seven subregions, the equatorial region, northern and southern tropics, and northern and southern mid- and high-latitudes (separated by horizontal dashed lines in Figure 1). Each subregion is allotted a number of adaptive observations proportional to its area. At the selected adaptive DWL locations, both zonal wind and meridional wind are observed at all vertical levels, which is also over-optimistic because the lidar wind component that is actually observed is its projection on the line-of-sight direction [Stoffelen et al., 2005], and we do not account for clouds.
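
The area-proportional allotment among latitude bands can be sketched as follows. This is only an illustration under stated assumptions: the band edges and the total of 230 observations are hypothetical (the text does not list them), the function name `allocate_by_area` is ours, and rounding by largest remainder is one possible choice rather than the procedure actually used.

```python
import math

def allocate_by_area(band_edges_deg, n_total):
    """Split n_total observations among latitude bands in proportion to area.

    On a sphere, the area between two latitudes is proportional to the
    difference of the sines of those latitudes.
    """
    edges = [math.radians(e) for e in band_edges_deg]
    weights = [math.sin(hi) - math.sin(lo) for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(weights)
    exact = [n_total * w / total for w in weights]
    counts = [int(x) for x in exact]          # floor of each non-negative share
    # Hand the remaining observations to the largest fractional parts.
    leftover = n_total - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Hypothetical edges (degrees) for seven bands, south to north.
edges = [-90, -60, -30, -10, 10, 30, 60, 90]
counts = allocate_by_area(edges, 230)
```

The sine weighting is why the polar bands receive far fewer observations than their 30-degree width would suggest on a regular latitude-longitude grid.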

Figure 1.

Example of the distribution of adaptive observations (crosses) from the ensemble spread sampling strategy at 1200 UTC February 03. The closed circles represent rawinsonde observation locations. Shades represent the average ensemble spread (m/s) of zonal and meridional wind at 500 hPa at that time. Horizontal dashed lines divide the whole globe into seven latitude bands. Vertical dashed lines separate the globe into four sub-regions representing two “orbits.”

[7] In all five adaptive observation strategies we tested, we impose a horizontal separation constraint to minimize possible observation redundancy, namely that the adaptive observations have to be at least two grid points apart in both longitude and latitude directions. Hamill and Snyder [2002] account for observation redundancy by selecting the observations serially so as to minimize the analysis error variance. However, directly minimizing the analysis error variance is much more expensive than computing ensemble spread and applying the separation constraint, especially when selecting adaptive observations from a very large pool of observation locations. In the ensemble spread method, the separation constraint is applied by first ordering the average 6-hour forecast ensemble spread of wind at 500 hPa from largest to smallest in each region. Within each region, the location with largest ensemble spread is selected as the first adaptive observation location. Then, we delete the locations adjacent to the first adaptive observation location (its neighboring horizontal grid points in both the zonal and meridional directions) from the queue of potential adaptive observation locations. The second adaptive observation location is where ensemble spread in the remaining queue is largest. This process is repeated until all the adaptive observation locations are selected. If all the locations are either selected or deleted before the allotted number of adaptive observations has been picked, the remaining adaptive observations are placed at the deleted locations with largest ensemble spread. A similar separation constraint is applied in all of the other strategies. In the climatological spread method, the climatological background ensemble spread is obtained from a long LETKF analysis assimilating rawinsonde observations only, and the adaptive observations are at the locations with largest climatological ensemble spread. 
In the ideal strategy, the adaptive observations are located where the background error (i.e., the absolute difference between 6-hour forecasts of 500 hPa wind and the true 500 hPa wind field) is largest. Since this strategy requires knowing the “truth”, it cannot be implemented in practice. The adaptive observation locations from ensemble spread, random location and the ideal strategy change with time, whereas the locations are fixed for the uniform distribution and climatological ensemble spread strategies. In order to test whether the forecast ensemble spread truly represents forecast uncertainty, we use the same adaptive observation locations for both 3D-Var and LETKF in the ensemble spread and climatological ensemble spread strategies, even though they are both derived from LETKF assimilations.
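
The ensemble spread selection with the separation constraint can be sketched as a greedy procedure. This is a minimal illustration assuming NumPy, not the code used in our experiments. It assumes that "adjacent" means the eight surrounding grid points, treats one region as a rectangular block without longitude wrap-around, and uses a hypothetical function name and a small made-up spread field.

```python
import numpy as np

def select_adaptive_locations(spread, n_obs):
    """Greedily pick n_obs grid points of largest spread, excluding neighbours.

    spread: 2-D array (lat, lon) of ensemble spread within one region.
    Returns a list of (i, j) index pairs in selection order.
    """
    n_lat, n_lon = spread.shape
    # Candidate queue ordered from largest to smallest spread.
    flat_order = np.argsort(spread, axis=None)[::-1]
    queue = [tuple(int(k) for k in np.unravel_index(f, spread.shape)) for f in flat_order]
    available = np.ones(spread.shape, dtype=bool)   # neither selected nor deleted yet
    selected, deleted = [], []
    for i, j in queue:
        if len(selected) == n_obs:
            break
        if not available[i, j]:
            continue
        selected.append((i, j))
        available[i, j] = False
        # Delete neighbouring points from the queue (assumption: 8 neighbours).
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < n_lat and 0 <= nj < n_lon and available[ni, nj]:
                    available[ni, nj] = False
                    deleted.append((ni, nj))
    # Fallback: if the queue is exhausted, reuse deleted points, largest spread first.
    if len(selected) < n_obs:
        deleted.sort(key=lambda ij: spread[ij], reverse=True)
        selected.extend(deleted[: n_obs - len(selected)])
    return selected

# Made-up 3 x 4 spread field (m/s) for illustration.
spread = np.array([[0.1, 0.9, 0.2, 0.3],
                   [0.4, 0.8, 0.1, 0.7],
                   [0.2, 0.3, 0.5, 0.6]])
print(select_adaptive_locations(spread, 2))   # [(0, 1), (1, 3)]
```

In this example the second-largest value (0.8) is skipped because it neighbours the first selection. The same routine could serve the ideal strategy by passing the absolute background error instead of the spread.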

[8] We examine the effectiveness of these five adaptive observation strategies by computing the analysis Root Mean Square (RMS) errors and comparing them to the two extremes of 0% DWL coverage (i.e., rawinsondes only) and full (100%) DWL coverage. The percentage improvement for each strategy is defined as $\mathrm{PI} = \frac{\mathrm{RMS}_{0\%} - \mathrm{RMS}}{\mathrm{RMS}_{0\%} - \mathrm{RMS}_{100\%}} \times 100\%$, where $\mathrm{RMS}$ is the time mean global average RMS error of the adaptive strategy, and $\mathrm{RMS}_{100\%}$ and $\mathrm{RMS}_{0\%}$ are the time mean global average RMS errors of full DWL coverage and no DWL coverage, respectively. The time mean is calculated over the second month of the two-month analysis period.
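
Reading PI as the fraction of the error reduction between 0% and 100% coverage that a strategy achieves, the values quoted in the results can be checked with a short calculation. The function name `percentage_improvement` is illustrative; the RMS errors are the tabulated 500 hPa zonal wind values for the ensemble spread strategy.

```python
def percentage_improvement(rms, rms_0, rms_100):
    """Share of the 0%-to-100% error reduction achieved by a strategy, in percent."""
    return (rms_0 - rms) / (rms_0 - rms_100) * 100.0

# Ensemble spread strategy, RMS errors in m/s from Tables 1 and 2.
pi_3dvar_10 = percentage_improvement(0.43, rms_0=4.04, rms_100=0.30)
pi_letkf_10 = percentage_improvement(0.32, rms_0=1.18, rms_100=0.23)
pi_letkf_2 = percentage_improvement(0.45, rms_0=1.18, rms_100=0.23)
print(round(pi_3dvar_10, 1), round(pi_letkf_10, 1), round(pi_letkf_2, 1))
# 96.5 90.5 76.8
```

These numbers agree with the statements that 3D-Var attains more than 90% of the improvement with 10% coverage and that LETKF retains about 77% with 2% coverage.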

4. Results

[9] Figure 2 shows the time evolution of the global averaged zonal wind analysis RMS errors for 3D-Var (Figure 2, left) and LETKF (Figure 2, right) with 0% coverage (dashed line) and 100% coverage (solid line), as well as the five adaptive strategies using 10% coverage. The time averaged RMS error for the second month is presented in Table 1. Not surprisingly, the ideal strategy (dot dashed line) has the smallest errors, and is close to the error level obtained with 100% coverage. The LETKF-based ensemble spread strategy (solid line with open squares) is the best of the adaptive strategies that are feasible in practice, and is very close to the ideal strategy even for the 3D-Var analysis. The random location (solid line with crosses) is better than the uniform distribution strategy (solid line with closed circles). The worst results are obtained from the climatological ensemble spread distribution (solid line with open triangles) because there are no adaptive observations over vast areas (not shown). The adaptive strategies with time-changing locations (ensemble spread, random location, ideal strategy) are all better than the constant observation distributions (uniform distribution, climatological ensemble spread), a conclusion consistent with previous results [Lorenz and Emanuel, 1998; Hamill and Snyder, 2002]. Through the covariance between winds and the other variables in background error covariance, the wind observations improve the analysis of the other variables such as temperature, for which the different adaptive observation strategies have the same ranking as for the wind analysis (not shown).

Figure 2.

2-month evolution of 500 hPa globally averaged zonal wind analysis RMS errors for (left) 3D-Var and (right) LETKF from 10% adaptive observations assimilation. From top to bottom: dashed line, rawinsonde observation (0% DWL) assimilation; solid line with open triangles, climatological spread; solid line with closed circles, uniform distribution; solid line with crosses, random locations; solid line with open squares, ensemble spread adaptive strategy; dot dashed line, ideal sampling; solid line, 100% adaptive observation coverage over half hemisphere.

Table 1. 500 hPa Zonal Wind Time Average Over February of Global Mean RMS Errors and Percentage Improvement Obtained With 10% Adaptive Observations for Both 3D-Var and LETKF
Data Assimilation  Experiment       Rawinsonde, 0%  Climatology, 10%  Uniform, 10%  Random, 10%  Spread, 10%  Ideal, 10%  100%
3D-Var             RMS error (m/s)  4.04            2.36              0.92          0.74         0.43         0.36        0.30
LETKF              RMS error (m/s)  1.18            0.38              0.36          0.33         0.32         0.29        0.23

[10] A striking result is that the RMS errors of LETKF (Figure 2, right, and Table 1) show a much smaller difference among the adaptive strategies than those of 3D-Var, although their relative ranking is the same. This is because 3D-Var, with a constant background error covariance, is much more sensitive to the choice of observation location. With less optimal adaptive strategies, such as uniform distribution, the large background errors are not effectively reduced because of the lack of observations around some locations with large background error (Figure 3, left). On the other hand, with the ensemble spread strategy, the adaptive observations are near the locations with large background errors (Figure 3, right). Therefore, the assimilation of these adaptive observations is equivalent to providing 3D-Var with information on the time-changing large background errors. As a result, the analysis increments in 3D-Var have a shape more similar (but with opposite sign) to the background error (Figure 3, right) than in any other feasible method. By contrast, LETKF, whose background error covariance already includes information on the “errors of the day”, is more efficient in extracting information from the observations even if their locations are not optimal, so that all the strategies give similarly small analysis errors.

Figure 3.

3D-Var zonal wind analysis increments (contour and color bar interval 0.3 m/s, same as the color bar, solid lines for positive value, dashed lines for negative value), background error (shaded) and adaptive observation distribution (crosses) from (left) uniform distribution and (right) ensemble spread sampling strategy at 1200 UTC February 03. The closed circles are rawinsonde observation locations.

[11] It is clear from Figure 2 (left) and Table 1 that in this idealized experiment, 3D-Var attains more than 90% of the improvements between 0% and 100% coverage from just 10% adaptive observations determined with the ensemble spread strategy. The percentage improvement of the ensemble spread strategy in LETKF is somewhat smaller than for 3D-Var, and, as discussed above, all adaptive strategies are similarly successful (Table 1). This seems to contradict the conclusions based on the previous adaptive observation field experiments that adaptive observations would be more effective with more advanced data assimilation schemes, such as 4D-Var or EnKF [Langland, 2005]. However, we used relatively dense adaptive observation coverage in our experiments, with 10% observed every six hours over half the globe. To make our results more comparable with previous field experiments, we now use the same adaptive observation strategies but substantially reduce the number of observation locations to only 2% of the full coverage (Table 2). With this small number of adaptive observations, the analysis errors of the adaptive strategies in 3D-Var are much larger, and even the most effective strategies, random location and ensemble spread, reduce the errors by less than 30%. By contrast, the LETKF still obtains 77% improvements from just 2% adaptive observations. The difference in performance among the five adaptive observation strategies is much more evident, but with the same ranking as before. This result shows that with fewer adaptive observations, the data assimilation scheme plays a more important role in determining the effectiveness of adaptive observations. More advanced data assimilation schemes, such as the LETKF, make more efficient use of small amounts of observation information, which is consistent with previous field experiments [Langland, 2005]. The small number of observations is insufficient to provide the global information on the “errors of the day” needed for the improvement of 3D-Var, while in the LETKF, it is possible to estimate the evolving error structures even with few observations.

Table 2. 500 hPa Zonal Wind Time Average Over February of Global Mean RMS Errors and Percentage Improvement Obtained With 2% Adaptive Observations for Both 3D-Var and LETKF
Data Assimilation  Experiment       Rawinsonde, 0%  Climatology, 2%  Uniform, 2%  Random, 2%  Spread, 2%  Ideal, 2%  100%
3D-Var             RMS error (m/s)
LETKF              RMS error (m/s)  1.18            0.67             0.59         0.51        0.45        0.41       0.23

5. Conclusions and Discussion

[12] In this study we showed the potential of a simple ensemble spread strategy for adaptive observations in the context of minimizing the energy required by DWL laser firings. The same adaptive strategy could be used for any satellite instrument designed to “dwell” in regions of high uncertainty rather than providing uniform coverage along the orbit as conventionally done.

[13] We compared ensemble spread with several other adaptive observation strategies (uniform distribution, random distribution, climatological ensemble spread) and found that the six-hour LETKF forecast ensemble spread gives a useful estimate of background uncertainty and dynamical instabilities. With 10% adaptive DWL observations, the ensemble spread sampling strategy gives the best result in both 3D-Var and LETKF, attaining more than 90% effectiveness of the full observation coverage. 3D-Var is more sensitive to adaptive strategies than the LETKF. Since the latter includes information on the “errors of the day”, different adaptive strategies have closer performances.

[14] We found that the sensitivity of adaptive observation effectiveness to data assimilation schemes is related to the number of adaptive observations. With relatively dense adaptive wind observations, such as 10% of the maximum coverage, the relative impact of these observations is similar for 3D-Var and LETKF. With only 2% coverage, the effectiveness of the adaptive observations strongly depends on the data assimilation scheme. 3D-Var is less effective than LETKF even when using the LETKF ensemble spread locations.

[15] Although our results are indicative of the potential for adaptive observations in remote sensing, we made several simplifying assumptions: a perfect model scenario, a low resolution global model, an extreme simplification of satellite orbits and DWL observations, uncorrelated Gaussian observational errors, and neglect of the effect of clouds. As a result, the percentage improvements reported here from assimilating DWL adaptive observations may be overoptimistic. Experiments with state-of-the-art OSSE systems should be carried out to verify whether our results are valid in a more realistic setup. We believe that our main results, namely that the EnKF-based uncertainty estimation gives valuable guidance for allocating limited observation resources along the satellite track and that the effectiveness of data assimilation schemes is sensitive to the amount of adaptive observations, would remain valid even in a realistic experimental setup.


[16] We are very grateful to Wayman Baker, David Emmitt and Bob Atlas for their encouragement and suggestions and to our colleagues from the Weather and Chaos group at the University of Maryland, especially Ed Ott, Istvan Szunyogh, and Brian Hunt, for many discussions. This work was supported by a NOAA/NASA/NPOESS grant through SWA04N0024403C.