Keywords:

  • observations;
  • data assimilation;
  • ensemble forecasts;
  • observation sampling error

Abstract

  1. Top of page
  2. Abstract
  3. 1.  Introduction
  4. 2.  Statistical description of the atmosphere and the definition of ‘truth’
  5. 3.  Statistical description of error referenced to the model grid
  6. 4.  Statistical description of error referenced to the observation coordinate
  7. 5.  NWP model representation
  8. 6.  Maximum likelihood data assimilation for error referenced to the model grid coordinates
  9. 7.  Data assimilation for error referenced to the observation coordinates
  10. 8.  Implementation of advanced data-assimilation algorithms
  11. 9.  Simple example calculations
  12. 10.  Operational issues
  13. 11.  Summary and discussion
  14. Acknowledgements
  15. References

A consistent definition of ‘truth’ is presented to define the errors in a numerical weather prediction (NWP) forecast, analysis and observations resulting from the unresolved turbulent field. ‘Truth’ is defined as the convolution of the continuous atmospheric variables by the effective spatial filter of an NWP model. Direct measurements of atmospheric variables are represented as an instrument error and a convolution of the continuous atmospheric variables by the observation sampling function. This clearly separates the instrument error from the observation sampling error that describes the mismatch between the NWP model effective spatial filter and the observation sampling function. The ensemble average that defines error statistics is defined by an infinite number of atmospheric realizations with statistically similar random fluctuations in the unresolved model field. This results in large spatial variations in the observation sampling errors due to the atmospheric variations in turbulence statistics. Two approaches are discussed to describe these spatial variations: one that defines observation error referenced to each model coordinate and one that assigns observation error referenced to each observation coordinate. The observation-error statistics depend on the observation sampling function, the local spatial statistics of the turbulence field and the NWP model filter. The effects of imprecise knowledge of the shape of the model filter on observation sampling error are small for rawinsonde measurements and for observations that produce a linear average along a track. The modifications to data-assimilation algorithms (the maximum-likelihood (ML) method, minimum mean-square-error algorithms, Kalman filtering, variational data assimilation and ensemble data assimilation) to include the spatial variations in observation-error statistics are discussed. 
In addition, the generation of ensemble forecast members should be consistent with the spatial variations in total observation error. A rigorous definition of error statistics is essential for evaluating the many different types of current and future observing systems. Copyright © 2010 Royal Meteorological Society


1.  Introduction

The goal of numerical weather prediction (NWP) is to represent the future state of the continuous atmosphere using a discrete representation of the equations of motion. Early work assumed that the appropriate discrete representation or ‘truth’ denoted by xt is an average of the continuous state variables over each grid box and over the time step (Lilly, 1962; Deardorff, 1970; Cohn, 1997; Pielke, 2002). Various types of discrete grids and approximations to the fluid equations have been developed as well as subgrid parametrizations to include the contribution of the unresolved scales (Pielke, 2002; Kalnay, 2003). The concept of effective model resolution was investigated by Pielke (1991, 2001, 2002) in terms of the effects of the numerical schemes on wave solutions. Laprise (1992) discussed the effective resolution of global spectral models and effective resolution has been evaluated based on spatial spectra and spatial structure functions of model output (Skamarock, 2004; Frehlich and Sharman, 2004, 2008). A quantitative description of the spatial filtering of the continuous atmospheric field produced by the discrete model is required for a rigorous definition of error statistics. Recent evaluation of model spatial statistics reveals that NWP models have an effective spatial filter that is larger than a grid box average (Skamarock, 2004; Frehlich and Sharman, 2004, 2008). Therefore, error should be defined in terms of ‘truth’, i.e. a spatial average of the continuous state variables based on the effective model filter (Frehlich, 2006). In addition, we assume that the model numerics and all sources of spatial filtering are universal, i.e. the shape of the filter is independent of the state of the atmosphere.

Data assimilation techniques estimate the true state of the atmosphere based on various observations with different spatial and temporal sampling and different instrumental observation errors (Lorenc, 1986; Daley, 1991, 1997; Cohn, 1997; Kalnay, 2003). Rawinsonde measurements have low instrument error (Benjamin et al., 1999; Jaatinen and Elms, 2000) but a large observation sampling error, since the point observations do not represent the true spatial average of any NWP model (Frehlich, 2001). Ground-based scanning Doppler lidar data can be processed to provide a better match to the effective spatial filter of the NWP model as well as providing better information on the unresolved scales (Frehlich et al., 2006; Frehlich and Kelley, 2008), which are essential for short-term forecasts of wind power. Space-based Doppler lidar data will produce a larger spatial average than rawinsondes and therefore will have superior error statistics if the instrument error is sufficiently small (Frehlich, 2000, 2001). The improved spatial sampling of airborne Doppler lidar data provides significant impact on NWP forecasts (Weissmann and Cardinali, 2007; Koch et al., 2007) because of improved global coverage and lower observation sampling error than rawinsondes. New GPS profiling techniques have even larger sampling volumes but can provide accurate temperature and humidity observations (Kursinski et al., 1997). Radar profilers also sample a larger region of the atmosphere and therefore have a lower observation sampling error than rawinsondes. An improved estimation of the total observation-error statistics would also enhance the value of these data (Benjamin et al., 2004). New satellite-based wind measurements provide global coverage but with a larger spatial sampling volume (Velden et al., 2005). 
The situation with indirect atmospheric measurements such as space-based irradiance data is more complicated, since radiative transfer code and inversion algorithms are used to extract the atmospheric state variables and the underlying spatial average is difficult to quantify and include in the error statistics. Many data-assimilation algorithms are developed based on minimizing analysis error from all of these diverse measurement systems (Daley, 1991; Kalnay, 2003). Ensemble data assimilation and forecasting systems have become popular techniques, since they produce an estimate of the state-dependent forecast error statistics (Evensen, 1994; van Leeuwen and Evensen, 1996; Houtekamer and Mitchell, 1998, 2001; Burgers et al., 1998; Hamill and Snyder, 2000; Mitchell and Houtekamer, 2000; Anderson, 2001; Bishop et al., 2001; Mitchell et al., 2002; Zhang and Anderson, 2003; Lorenc, 2003; Kalnay, 2003; Zupanski, 2005; Ehrendorfer, 2007). However, current data assimilation systems typically assume that the observation errors are uncorrelated (Rabier, 2005) with constant variance over large regions, especially for direct observations of state variables (rawinsonde, aircraft, Doppler radar, Doppler lidar). The magnitude of the observation errors is determined from a long-term average of the spatial forecast-error statistics (Hollingsworth and Lonnberg, 1986; Lonnberg and Hollingsworth, 1986; Daley, 1992; Dee, 1995; Dee and Da Silva, 1999; Dee et al., 1999) and therefore the data assimilation is suboptimal since it does not include the large spatial and temporal variations of the observation errors (see figure 13 of Frehlich and Sharman, 2004). NWP forecast performance is determined by various error statistics and critical events such as hurricane tracks and severe storms. 
A consistent definition of analysis error, observation error and forecast error is required for optimal data assimilation that includes the spatial variations in observation errors with minimal assumptions, which therefore minimizes the analysis error. In addition, this provides the foundation for a correct interpretation of all error statistics.

A rigorous evaluation of error statistics requires three inputs: a description of the ensemble members of the process, a prescription for a subset of events and a mapping (a measure) that defines the probability of these events (Kolmogorov, 1933; Rao, 1995). These concepts have been applied to many problems such as turbulence (Lumley, 1970; Monin and Yaglom, 1975a, 1975b), engineering applications (Papoulis, 1965), statistical optics (Goodman, 1985, section 3), geostatistics (Chiles and Delfiner, 1999) and others. The careful evaluation of locally homogeneous, isotropic and stationary turbulence by Monin and Yaglom (1975b) is most closely related to the spatial and temporal variations of error statistics in NWP. Note that ‘ensemble members’ are realizations of a continuous atmosphere with statistically similar properties and not NWP ensemble forecast members.

The ensemble members for turbulent flow over a cylinder in a wind tunnel (Monin and Yaglom, 1975a, section 3.2) are defined by ‘the statistical ensemble of similar flows created by some set of fixed external conditions’. The probability of an event is the fraction of the ensembles that define the given event (Lumley, 1970, chapter 1; Goodman, 1985, section 3) which leads into the definition of the probability density function (PDF) and joint PDFs. Various statistical averages are then defined as an integral operator over the PDFs. The main difference between statistical fluid mechanics and data assimilation is the definition of ‘truth’ for the NWP model values that are required for a description of error statistics.

Frehlich (2006) extended the definition of error statistics to the NWP data assimilation and forecasting problem by defining ‘truth’ as the convolution of the continuous atmospheric state variables by the spatial filter of the NWP model at each grid coordinate. This produces a description of total observation error in terms of the instrument error and the observation sampling error (related to the ‘error of representativeness’ Lorenc, 1986; Daley, 1993; Cohn, 1997), which describes the error produced by the difference between the observation sampling pattern and the spatial average of the model that defines ‘truth’. The total observation-error statistics are state-dependent since they depend on the local turbulence statistics. Optimal data-assimilation algorithms have been produced to include the spatial variations of the total observation error based on the maximum-likelihood (ML) method and using modifications to the classical mean-square error techniques such as the Kalman filter (Frehlich, 2006). To include the spatial variations in the total observation error, two different definitions of error were proposed: one defined for each model coordinate ra and one based on the interpolation of the model coordinates to each observation coordinate rk. These two definitions of error have different ensemble members that define the joint conditional probability density functions, i.e. conditioned by the local turbulence statistics and consistent with the definition of truth. The impact of imprecise knowledge of the true shape of the effective model filter will be determined for these optimal data-assimilation algorithms (Frehlich, 2006). Simple formulations will be used to introduce these concepts. Only direct observations of atmospheric variables (rawinsonde, aircraft, Doppler lidar, radar profilers, etc.) are considered. 
Indirect observations such as satellite irradiance data can also be used with the same formulation if an inversion algorithm produces observations of the atmospheric state variables with a known spatial average. The notation of Ide et al. (1997) is used whenever possible.

2.  Statistical description of the atmosphere and the definition of ‘truth’

To simplify the presentation, we first consider the assimilation of direct observations yo of the continuous atmospheric state variables x for a fixed instant in time t and at spatial coordinate r = (r1,r2,r3) where r1, r2 and r3 denote the east, north and vertical coordinates, respectively, to produce the optimal analysis xa. Since the model time steps are typically much smaller than the time-scale of the atmospheric processes, the temporal filtering by the model numerics is ignored in this work. ‘Truth’ for the discrete representation equation image of the model state variable xj is defined as the convolution of the continuous atmospheric variable xj(r) by the spatial filter equation image of the NWP model, i.e.

  • equation image(1)

and equation image, where ds = ds1 ds2 ds3 denotes three-dimensional integration. In many cases, the vertical dimension Δr3 of the NWP model grid is much smaller than the horizontal dimensions (Δr1, Δr2) and the problem reduces to a two-dimensional analysis, i.e. ds = ds1 ds2 (see the discussion in Frehlich, 2006).
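The definition of 'truth' as a filtered version of the continuous field can be illustrated with a minimal one-dimensional sketch. It assumes a normalized top-hat (box) filter of width L applied to a field sampled on a uniform grid. The function name `model_truth`, the grid spacing and the synthetic field are hypothetical choices made for illustration only and are not taken from the text.

```python
import numpy as np

def model_truth(x, dr, L):
    """Filter a 1-D field x (grid spacing dr) with a normalized top-hat of width L.

    Returns the filtered field at the grid points where the filter fits
    entirely inside the domain (np.convolve with mode="valid").
    """
    n = max(1, int(round(L / dr)))      # number of grid points spanned by the filter
    w = np.full(n, 1.0 / n)             # discrete weights sum to 1 (normalization)
    return np.convolve(x, w, mode="valid")

# Hypothetical example: a 10 m/s mean flow plus a fine-scale 10 km wave.
dr = 1.0e3                                   # 1 km spacing (assumed)
r = np.arange(0, 600) * dr
x = 10.0 + np.sin(2 * np.pi * r / 10.0e3)
xt = model_truth(x, dr, L=150.0e3)           # the 150 km filter spans 15 full waves
```

In this sketch the 10 km wave is an unresolved scale: it averages out of the filtered field, so `xt` retains only the 10 m/s mean, while a point observation of `x` would still see the wave.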

Conditional statistics are required to describe correctly the spatial variations of the total observation error produced by the variations in the atmospheric turbulence statistics (see figure 5 of Nastrom and Gage (1985), figure 13 of Frehlich and Sharman (2004) and Figure 3). The calculation of total observation error requires the conditional spatial structure function (or equivalently the conditional spatial covariance function) of variable xi and xj for a fixed altitude (two-dimensional analysis) defined by (Monin and Yaglom, 1975b, p. 102)

  • equation image(2)

where the random perturbations are given by

  • equation image(3)

<>c denotes the conditional ensemble average and Θo(ra) denotes the local turbulence parameters evaluated at the coordinate ra (Frehlich, 2006). Similarly, the conditional spatial covariance function is defined as (Monin and Yaglom, 1975b, p. 47)

  • equation image(4)

A critical issue for spatially varying statistics is the meaning of <>c. For locally homogeneous random fields, the conditional mean values <xi(ra)>c are assumed to be independent of ra and the conditional covariance function and conditional structure function depend only on the separation vector r. Then

  • equation image(5)

is only a function of r. If the two state variables are the same (xi = xj), then (Monin and Yaglom, 1975b, p. 103)

  • equation image(6)

which is the more familiar form of the conditional structure function. However, a definition of the conditional ensemble members is required in order to include correctly the spatial variations of observation error produced by the variations in the turbulence field.
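For a single variable, the structure function is the mean-square difference of the field at two points separated by a lag. A minimal sketch of an estimator is given here, under the assumption that a spatial average along one sampled line can stand in for the conditional ensemble average (i.e. the field is treated as locally homogeneous along that line). The function name `structure_function` and the test field are hypothetical.

```python
import numpy as np

def structure_function(x, max_lag):
    """Estimate D(k) = mean([x(i+k) - x(i)]^2) for lags k = 1..max_lag (in grid points)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean((x[k:] - x[:-k]) ** 2) for k in range(1, max_lag + 1)])

# Hypothetical check: for a linear ramp x = a*i, every difference at lag k
# equals a*k, so D(k) = (a*k)^2.
ramp = 0.5 * np.arange(100)
D = structure_function(ramp, max_lag=4)
```

The ramp is only a sanity check of the estimator. For real data the lag in grid points must be converted to a physical separation using the sampling interval.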

The ensemble average structure function (the climatological average over many years of stationary statistics) is

  • equation image(7)

where <> denotes an ensemble average over all realizations of the atmosphere. The ensemble average structure function can also be a function of location (latitude, longitude and altitude), in which case the ensemble average is computed as a time average, assuming that the statistics are stationary over a long period (such as many years) and that the effects of climate change are negligible. An empirical model for the average structure function of the longitudinal velocity component DLL(r) (the east component u of the horizontal velocity as a function of the separation r in the east direction), determined from aircraft data for the 40°–50°N latitude band over the continental United States (CONUS), is (Frehlich and Sharman, 2010)

  • equation image(8)

where r is in m, a1 = 0.0037049 m^(4/3) s^(−2), a2 = 109089 m, a3 = 1.7680 and a4 = 1312407 m.

The dimensions of the NWP spatial filter equation image have been determined from comparisons of the spatial structure function of model output and aircraft data (Frehlich and Sharman, 2004, 2008), assuming a square spatial filter of dimension L, i.e.

  • equation image(9)

An example is shown in Figure 1 for the Global Forecast System (GFS) model at 250 hPa pressure altitude and a latitude band of 40°–50°N over CONUS, which provides the best match to the high-density ACARS aircraft data over the same domain (note that past results (Lindborg, 1999) are based on unknown averaging domains which have similar scaling laws). There is excellent agreement between the predictions of the effective square filter with L = 150 km and the GFS average structure function, even though the GFS grid is not exactly square. However, it is difficult to determine whether the GFS spatial filter is universal, since more data are required to produce reliable conditional structure functions. As will be shown later, it is also difficult to determine the exact shape of the model spatial filter equation image from the structure functions.

Figure 1. Average longitudinal structure function DLL for the u velocity component in the east–west direction from the GFS model (bullet), ACARS aircraft data (dotted) and the theoretical prediction for a square effective model filter with L = 150 km (line) for a pressure level of 250 hPa and latitude 40°–50°N over CONUS. The r^2 scaling at small lags is also shown.

3.  Statistical description of error referenced to the model grid

There are several possible methodologies for defining the observation-error statistics such that the spatial and temporal variations are adequately described. The two most appealing approaches define the observation errors referenced to the numerical model grid or the location of the observation (Frehlich, 2006). A rigorous definition of the error statistics and ensemble average permits a consistent foundation for error analysis and the development of optimal data-assimilation algorithms. However, simplifying approximations must be made to meet operational requirements. The most basic approximation is the assumption of locally homogeneous and stationary turbulence, i.e. the relevant atmospheric statistics change slowly in a small space–time volume around each model grid coordinate and around each observation coordinate (Monin and Yaglom, 1975b, section 21.2). This section will develop the error statistics for observation error referenced to the model grid coordinate.

The selection of the ensemble members Ω for defining the probabilities for error statistics is an abstract concept (Kolmogorov, 1933; Rao, 1995) which can be simplified by assuming infinite measurement resources and a hypothetical infinite number of similar earth systems or experiments (Monin and Yaglom, 1975a; Frehlich, 2006). This concept is consistent with recent interpretations of probability (Jaynes, 2003); however, de Finetti (2008) has questioned the concept of infinite realizations and frequentism. With infinite measurement resources, the true state of the continuous atmospheric state variable xj(r) and ‘truth’ for the discrete model representation equation image defined by Eq. (1) can be determined for each time t and therefore the truth can be known. For a given NWP model with universal spatial filter equation image for all state variables xj, the ensemble members Ω are chosen as those realizations of an infinite number of earth systems or experiments that satisfy the following properties over the space and time domain of interest:

  • the surface conditions (vegetation, sea surface, land usage, etc.) and external forcing (solar radiation, forest fires, man-made heat, etc.) are identical;

  • the discrete model variables equation image (truth) are identical;

  • the conditional turbulence statistics Θo(r) are identical.

Following the terminology of calculus, the term ‘identical’ means that the values are equal to within an infinitesimally small interval δ. The probability P of an event F is defined as the fraction of the infinite number of realizations Ω satisfying event F. For example, the conditional probability distribution function equation image, the probability that a temperature measurement T(r) from a perfect rawinsonde observation at coordinate r is less than t, is defined as the fraction of the realizations of Ω that have T(r) < t. The conditional probability density function equation image is the derivative of FT(t,r) with respect to t and defines all statistical moments of T(r) at the coordinate r. Similarly, the conditional joint probability distribution function equation image is defined as the fraction of realizations of Ω that have T(r1) < t1 and T(r2) < t2. The conditional joint probability density function equation image is the derivative of FTT(t1,t2;r1,r2) with respect to t1 and t2 and defines the correlation statistics of T(r1) and T(r2). For any two variables T and V, the conditional joint probability distribution function equation image is defined as the fraction of realizations of Ω that have T(r1) < t and V(r2) < v. Similarly, the conditional joint probability density function equation image is the derivative of FTV(t,v;r1,r2) with respect to t and v.
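The definition of a probability as the fraction of realizations satisfying an event can be sketched with a finite ensemble standing in for the infinite set Ω. Everything specific here is an assumption made for illustration: the Gaussian form, the mean of 250 K, the spread of 1.5 K and the helper name `prob_less_than`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite ensemble standing in for the infinite set Omega:
# temperature T(r) at one fixed coordinate r, one value per realization.
T = rng.normal(loc=250.0, scale=1.5, size=100_000)   # kelvin (assumed values)

def prob_less_than(samples, t):
    """Fraction of realizations with value < t (empirical distribution function)."""
    return np.mean(samples < t)

F_at_mean = prob_less_than(T, 250.0)     # close to 0.5 for a symmetric distribution
F_low = prob_less_than(T, 249.0)
F_high = prob_less_than(T, 251.0)
```

The density function corresponds to the derivative of this fraction with respect to the threshold t. With a finite ensemble that derivative can only be approximated, for example by a histogram.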

The conditional joint PDF defines the conditional expectation operator, i.e.

  • equation image(10)

which is a linear operator that depends on the local turbulence parameters Θo. The most common joint PDF is a joint Gaussian PDF equation image given by

  • equation image(11)

where (for consistency with Monin and Yaglom, 1975b, p. 117)

  • equation image(12)

is the covariance of T and V centred on the analysis coordinate ra, equation image, equation image,

  • equation image(13)

is the correlation coefficient and we have assumed the turbulence statistics are homogeneous around ra, i.e. equation image is only a function of r for each local coordinate ra.

For the joint Gaussian PDF Eq. (11), the conditional ensemble average of f(T,V) = TV is

  • equation image(14)

which is required for calculating all the conditional error statistics.
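For any two variables with finite second moments, including the joint Gaussian case, the average of the product TV equals the covariance plus the product of the mean values. The following Monte Carlo sketch checks this identity numerically. The means, standard deviations and correlation coefficient are hypothetical values, and the finite sample is only a stand-in for the conditional ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)

mu_T, mu_V = 2.0, -1.0               # assumed conditional mean values
sig_T, sig_V, rho = 1.5, 0.8, 0.6    # assumed standard deviations and correlation

cov = np.array([[sig_T**2, rho * sig_T * sig_V],
                [rho * sig_T * sig_V, sig_V**2]])
T, V = rng.multivariate_normal([mu_T, mu_V], cov, size=400_000).T

mc_TV = np.mean(T * V)                          # Monte Carlo estimate of <TV>
exact_TV = rho * sig_T * sig_V + mu_T * mu_V    # covariance + product of means
```

The two values agree to within the Monte Carlo sampling noise, which decreases as the inverse square root of the number of samples.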

To demonstrate these principles, examples of one-dimensional realizations of a continuous velocity component u(r1) are shown in Figure 2. These realizations are produced with a computer simulation algorithm (Frehlich, 1997, Appendix C) that generates a Gaussian random process with the spatial correlation given by the average structure function of Eq. (8) (see Figure 1) and therefore completely defines the conditional turbulence statistics. Model truth ut(rk) is defined as the linear average of the continuous velocity over the length L for a grid cell centred at rk. All the continuous realizations u(r1) in Figure 2 have model truth ut(rk) within δ = 0.01 m s^(−1) of the values indicated by the horizontal red lines that are potential examples of truth, e.g. based on a very accurate measurement of the velocity u(r1) along a line using a research aircraft. Note that this is only one possible realization to illustrate the variations of the turbulent field around potential values of ‘truth’. The random variations of u(r1) describe the unresolved scales or subgrid processes that govern the observation sampling error (Frehlich, 2001, 2006) (also related to the ‘error of representativeness’). For the average structure function of Figure 1, the observation sampling error for a rawinsonde measurement at the centre of the grid cell is represented by the random scatter of the realizations at the centre of each red line. Note that the coarser effective resolution of L = 150 km produces a larger observation sampling error than the case of L = 10 km.

Figure 2. One-dimensional realizations of the u velocity component with model truth indicated by red lines for the atmospheric conditions of Figure 1. For the L = 10 km case, truth is chosen as 8, 10 and 12 m s^(−1) and for the L = 150 km case truth is 15, 20 and 25 m s^(−1).

Direct observation equation image of the state variable xj (e.g. temperature, velocity) can be written as (Cohn, 1997; Frehlich, 2001)

  • equation image(15)

where equation image is the spatial sampling of the observation with centroid equation image, equation image is the random instrument error and equation image is the instrument bias following the convention of Ide et al. (1997). The bias is assumed zero for the remainder of this work, since most operational measurements of winds and temperature now have small bias.

For most observations (rawinsonde, aircraft, lidar, weather radar, sodar) the spatial sampling of the observation can be written as

  • equation image(16)

where equation image is a normalized spatial filter of the observation (equation image) and ds denotes either one-, two- or three-dimensional integration. The total instrument error is then defined by

  • equation image(17)

and the statistics of the instrument error may depend on the atmospheric conditions, especially for Doppler radar (Doviak and Zrnic, 1993) and Doppler lidar (Frehlich, 2000, 2001).
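The structure of a direct observation, i.e. a normalized spatial average of the continuous field plus a random instrument error with zero bias, can be sketched in one dimension as follows. The uniform sampling weights, the synthetic field and the function name `observe` are hypothetical. A width of zero mimics a point (rawinsonde-like) sample and a larger width mimics a linear average along a track.

```python
import numpy as np

def observe(x, dr, centre_idx, width, sigma_inst, rng):
    """Simulate y = (average of x over `width` centred at centre_idx) + instrument error.

    The sampling weights are uniform and normalized; the bias is taken as zero.
    """
    half = int(round(width / dr)) // 2
    lo, hi = max(0, centre_idx - half), min(len(x), centre_idx + half + 1)
    sampled = np.mean(x[lo:hi])                  # normalized sampling of the field
    return sampled + rng.normal(0.0, sigma_inst) # add random instrument error

rng = np.random.default_rng(2)
dr = 1.0e3                                             # 1 km spacing (assumed)
x = 10.0 + np.cumsum(rng.normal(0.0, 0.1, size=301))   # synthetic rough field (assumed)

# Instrument error switched off to isolate the effect of the sampling function.
y_point = observe(x, dr, 150, width=0.0, sigma_inst=0.0, rng=rng)
y_track = observe(x, dr, 150, width=100.0e3, sigma_inst=0.0, rng=rng)
```

Even with a perfect instrument, `y_point` and `y_track` differ for the same realization because they sample the unresolved fluctuations differently. This mismatch in sampling is what the observation sampling error describes.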

A numerically convenient definition of total observation error for data assimilation is given by (Frehlich, 2006)

  • equation image(18)

which defines observation error referenced to the nearby analysis coordinate ra and defines the spatial variations in error statistics based on locally homogeneous turbulence centred on each analysis coordinate (Frehlich, 2006). The more traditional definition of error based on interpolation of model values to the observation coordinate is considered in the next section.

Since the turbulent field is state-dependent, i.e. the turbulent statistics vary both in space and time, the total observation errors are also state-dependent and a conditional ensemble average is essential for rigorous evaluation of error statistics and for developing optimal data-assimilation algorithms (Frehlich, 2006). If the turbulent field is approximately homogeneous for all observations equation image in the nearby vicinity of the analysis coordinate ra, the elements of the conditional observation-error covariance matrix equation image are defined by

  • equation image(19)

where equation image denotes the desired measurement or truth for model variable xi at the analysis coordinate ra, yo and xt denote vectors and T denotes the vector or matrix transpose.

Equation (19) can be written as (substituting equation image and rearranging terms)

  • equation image(20)

where the observation sampling error is

  • equation image(21)

which depends on the distance to the analysis coordinate equation image. If the instrument error and observation sampling error are uncorrelated, the conditional observation-error covariance matrix equation image, where equation image is the conditional instrument error covariance, which may depend on the local turbulence parameters Θo, and equation image is the conditional observation sampling-error covariance matrix with elements

  • equation image(22)

The conditional ensemble average <>c is defined as the ensemble average based on the ensemble members and the corresponding conditional probability density function defined earlier in this section. For example, using the conditional operator Eq. (14), the term

  • equation image(23)

where

  • equation image(24)

Applying the conditional operator Eq. (14) to each of the terms of Eq. (22) produces

  • equation image(25)

where

  • equation image(26)
  • equation image(27)
  • equation image(28)

and

  • equation image(29)

is the contribution from any variations in the conditional mean value. If equation image is negligible, i.e. if the local mean value is constant, then Eq. (25) reduces to eq. (36) of Frehlich (2006).

For locally homogeneous turbulence fluctuations (Eqs (5) and (12) are only a function of r), the observation sampling-error covariance for observations in the vicinity of the analysis coordinate ra is given by

  • equation image(30)

where

  • equation image(31)
  • equation image(32)
  • equation image(33)

If the term equation image is negligible, then Eq. (30) reduces to Eq. (43) of Frehlich (2006). Eq. (30) is better suited for the troposphere and stratosphere, which have well-defined scalings for the structure functions, e.g. Eq. (8).

These results are valid for general observation sampling patterns equation image and any effective spatial filter equation image. Simple results are produced for rawinsonde observations at the coordinates equation image near the analysis coordinate ra [equation image, where δ(r) is the two-dimensional delta function and equation image is a square model spatial filter Eq. (9)]. Then

  • equation image(34)
  • equation image(35)
  • equation image(36)

where dr and ds denote two-dimensional integration and the conditional parameters Θo(ra) are defined by the local turbulence statistics centred on the analysis coordinate ra.
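A one-dimensional analogue of the rawinsonde case can be worked numerically. For a locally homogeneous field with structure function D and a point observation at the centre of a box filter of length L, the identity <(x0 − x(s))(x0 − x(s'))> = [D(s) + D(s') − D(s − s')]/2 gives the sampling-error variance as the filter average of D(s) minus one half of the double filter average of D(s − s'). The sketch below evaluates this with a midpoint rule. The power-law form D(r) = C r^(2/3), the value of C and the function name `point_sampling_variance` are assumptions for illustration. This is not the two-dimensional calculation of Eqs (34)–(36).

```python
import numpy as np

def point_sampling_variance(D, L, n=1000):
    """Variance of [x(0) - box average of x over length L centred at 0], in 1-D.

    Uses  var = mean_s D(s) - 0.5 * mean_{s,s'} D(s - s'),
    evaluated on n midpoints spanning the filter.
    """
    s = (np.arange(n) + 0.5) / n * L - L / 2          # midpoints across the filter
    term1 = np.mean(D(np.abs(s)))                     # filter average of D(s)
    term2 = 0.5 * np.mean(D(np.abs(s[:, None] - s[None, :])))  # double average
    return term1 - term2

# Assumed inertial-range-like structure function D(r) = C * r^(2/3).
C = 1.0e-3                                            # hypothetical level, m^(4/3) s^(-2)
D = lambda r: C * r ** (2.0 / 3.0)

var_10 = point_sampling_variance(D, 10.0e3)           # L = 10 km
var_150 = point_sampling_variance(D, 150.0e3)         # L = 150 km
```

For this assumed power law the variance scales as L^(2/3), so the L = 150 km filter gives a sampling-error variance about six times that of the L = 10 km filter. This is consistent with the larger scatter noted for the coarser effective resolution in Figure 2.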

4.  Statistical description of error referenced to the observation coordinate

The traditional interpretation of observation error assigns fixed observation-error statistics to each measurement. There are two common definitions of total observation error:

  • equation image(37)

where H is the linear operator that interpolates the nearby values of truth equation image to the observation coordinate equation image (Daley, 1993, Eq. (12); Dee, 1995, Eq. (19); Cohn, 1997, Eq. (2.13); Kalnay, 2003, Eq. (5.4.16)) and

  • equation image(38)

where equation image is the definition of truth Eq. (1) evaluated at the observation coordinate equation image (Lorenc, 1986; Daley, 1991, sec. 5.6). The second definition Eq. (38) clearly separates the observation error from the error in the interpolation operator H. For this definition of error, the atmospheric ensembles are defined as in the previous section with the following additional requirements:

  • the discrete model variables equation image are identical and

  • the conditional turbulence statistics equation image are identical.

The elements of the conditional observation sampling-error covariance for the case of Eq. (38) are given by Eq. (22) with modifications to include the spatial variations in the turbulence parameters equation image at each observation coordinate equation image, i.e.

  • equation image(39)

where the conditional ensemble average <>c is based on the ensemble members defined above. Substituting Eqs (15) and (1) into Eq. (39) and simplifying by assuming equation image produces

  • equation image(40)

where

  • equation image(41)
  • equation image(42)
  • equation image(43)
  • equation image(44)

and

  • equation image(45)

is the structure function that describes the turbulent field with parameters equation image and equation image. Further assumptions are required to simplify these calculations, since the spatial variations in the sampling error are referenced to the two turbulence statistics equation image and equation image. The most obvious solution is to assume locally homogeneous turbulence with average turbulence statistics, i.e.

  • equation image(46)

where equation image is the average of the appropriate turbulence statistics, e.g. the average of the conditional structure functions centred on equation image and equation image. This approximation only impacts the off-diagonal terms, i.e. i≠j.

5.  NWP model representation

There are various representations for the forecast xf(ti) of gridded state variables at time interval ti. For a perfect initial condition xt(ti−1) (Kalnay, 2003, Eq. (5.6.1))

  • equation image(47)

where Mi−1 denotes the discrete representation of the atmospheric model and η(ti−1) is the model error. The forecast error is defined by

  • equation image(48)

where xa(ti−1) is the analysis at time ti−1. These representations are the foundations for many forecast systems such as Kalman filters. However, a rigorous definition of error statistics is required to evaluate any results correctly. For example, the forecast error (innovation error) statistics depend on the observation sampling errors and therefore are a function of the local turbulence statistics, which must be included in the analysis (Frehlich, 2008).

6.  Maximum likelihood data assimilation for error referenced to the model grid coordinates

The ML method is one of the most attractive estimation algorithms since, for many applications, it achieves the theoretical best performance described by the Cramér–Rao bound (Helstrom, 1968; van Trees, 1968). Conceptually, the ML estimate xML is the value of the desired parameters xt that maximizes the joint probability density function (likelihood) of all the observations yo and the forecast (or background) xb. However, for geophysical applications the statistical foundations of the joint probability density functions must be given, i.e. the meaning of the ensemble members and probability density functions. The formulation of error statistics referenced to the model grid coordinates, as described in section 3, is applied here to the ML method to produce an optimal data assimilation that includes the spatial variations in observation-error statistics.

A consistent ML estimator is produced by the following assumptions (Frehlich, 2006).

  • The desired parameters xt or ‘truth’ are the spatial average of the continuous atmospheric variables at each analysis coordinate using the effective model filter gm (Eq. (1)).

  • The observations yo in the vicinity of each analysis coordinate ra are conditionally unbiased estimates (< yo − xt(ra) >c = 0).

  • The errors for le observations in the vicinity of each analysis coordinate ra defined by yo − xt(ra) have a conditional joint Gaussian probability density function given by

    • equation image(49)

    where Y selects the nearby observations, Z selects the state variable of the observation and equation image is the conditional observation-error covariance matrix.

  • The forecast or first guess xb(ra) is conditionally unbiased (< xb(ra) − xt(ra) > c=0).

  • The forecast error defined by xb(ra) − xt(ra) has a conditional Gaussian probability density function given by

    • equation image(50)

    where equation image is the conditional background-error covariance matrix of dimensions lx × lx and Θb are the parameters that describe the conditional background statistics.

  • The observation errors and forecast errors are statistically independent and therefore the conditional joint probability density function (likelihood function) of the observations and first guess is given by equation image.

The ML estimate xML for the analysis xa is the value of xt that maximizes the conditional log-likelihood function and is given by (Frehlich, 2006)

  • equation image(51)

which can be written as

  • equation image(52)

The conditional analysis-error covariance matrix is

  • equation image(53)

which has the same functional form as previous results (Daley, 1991; Kalnay, 2003). However, the observation-error statistics are determined by the ensembles defined in section 3 and depend on the local turbulence statistics, which are a function of space and time.

The most promising estimates of the background-error covariance equation image are produced by ensemble data assimilation systems (Evensen, 1994; van Leeuwen and Evensen, 1996; Houtekamer and Mitchell, 1998, 2001, 2005; Hamill and Snyder, 2000; Anderson, 2001; Zhang and Anderson, 2003; Lorenc, 2003; Kalnay, 2003; Ehrendorfer, 2007). An estimate of the conditional background-error covariance is given by

  • equation image(54)

where xf(ti,k) is the forecast for ensemble member k and

  • equation image(55)

is the average forecast.
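The ensemble estimate described above can be sketched as a sample covariance about the ensemble-mean forecast. The following is a minimal illustration, assuming a normalization by K − 1 for K members (the normalization in Eq. (54) is not reproduced here) and a hypothetical array layout with one member per row.

```python
import numpy as np

def ensemble_background_covariance(forecasts):
    """Sample covariance of ensemble forecasts.

    forecasts: array of shape (K, lx), one forecast state per ensemble member.
    Returns (covariance of shape (lx, lx), ensemble-mean forecast of shape (lx,)).
    """
    forecasts = np.asarray(forecasts, dtype=float)
    n_members = forecasts.shape[0]
    mean = forecasts.mean(axis=0)                 # average forecast over members
    dev = forecasts - mean                        # deviations from the average
    cov = dev.T @ dev / (n_members - 1)           # assumed K - 1 normalization
    return cov, mean

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    members = rng.normal(size=(20, 3))            # synthetic 20-member ensemble
    cov, mean = ensemble_background_covariance(members)
    print(cov.shape, mean.shape)
```

With small ensembles the off-diagonal elements of such an estimate are noisy, which is one reason localization or a priori covariance shapes are often used in practice.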

7.  Data assimilation for error referenced to the observation coordinates

The majority of data-assimilation algorithms are based on the definition of error assigned to each observation (Daley, 1991; Kalnay, 2003). Most of these algorithms assume a spatially and temporally uncorrelated observation error with an error variance that has a weak dependence on altitude and latitude. The most common algorithms are based on minimizing mean-square error and the ensemble Kalman filter framework, which provides state-dependent estimates of the background-error statistics (Daley, 1991; Kalnay, 2003). Spatial variations in the total observation-error statistics can also be included in all these formulations (Frehlich, 2006) and the optimal analysis becomes

  • equation image(56)

where K is the gain matrix given by

  • equation image(57)

H is the linear interpolation of the first guess xb to the observation coordinates, equation image is the conditional observation-error covariance matrix, equation image is the conditional forward interpolation-error covariance (the elements of eh are equation image), equation image is the conditional cross-covariance matrix between the observation error and forward interpolation error and the conditional ensemble averages <>c are based on the ensemble members defined in section 4. The conditional analysis-error covariance becomes

  • equation image(58)

where I denotes the identity matrix. Similarly, the 3D-Var, 4D-Var and extended Kalman filter formulations can be modified (Frehlich, 2006) to include the state-dependent observation-error statistics based on the definition of error given in section 4.
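The structure of the analysis update can be illustrated with the textbook minimum-variance form, xa = xb + K(yo − Hxb) with K = Pb Hᵀ(H Pb Hᵀ + R)⁻¹ and Pa = (I − KH)Pb. This sketch is a simplification: it assumes the forward interpolation-error covariance and the cross-covariance terms that enter Eq. (57) are negligible, and the function and variable names are illustrative. The state dependence enters only through the observation-error covariance R supplied by the caller.

```python
import numpy as np

def analysis_update(xb, Pb, yo, H, R):
    """Textbook analysis update with a caller-supplied observation-error covariance R."""
    xb, yo = np.asarray(xb, float), np.asarray(yo, float)
    Pb, H, R = (np.asarray(m, float) for m in (Pb, H, R))
    S = H @ Pb @ H.T + R                          # innovation covariance
    K = np.linalg.solve(S, H @ Pb).T              # gain Pb H^T S^-1 (S and Pb symmetric)
    xa = xb + K @ (yo - H @ xb)                   # analysis
    Pa = (np.eye(len(xb)) - K @ H) @ Pb           # analysis-error covariance
    return xa, Pa

if __name__ == "__main__":
    # Two grid points, each observed directly; the second observation is assumed
    # to lie in a region of strong turbulence and is given a larger error variance.
    xb = np.array([10.0, 10.0])
    Pb = np.array([[1.0, 0.5], [0.5, 1.0]])
    H = np.eye(2)
    R = np.diag([0.25, 4.0])
    xa, Pa = analysis_update(xb, Pb, np.array([12.0, 12.0]), H, R)
    print(xa, np.diag(Pa))
```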

8.  Implementation of advanced data-assimilation algorithms

All of the data-assimilation algorithms require estimates of the conditional observation-error covariance equation image: either around each analysis coordinate or around each observation coordinate. These calculations require local estimates of the turbulence parameters Θo that define the local structure functions of the model variables (see Eqs (2)–(7)). The simplest approach is to assume that the local turbulence statistics have a universal description where the shape and scaling laws of the conditional structure functions are equal to the climatology of turbulence, e.g. the average structure function in Figure 1 provides the universal shape in the given analysis domain of 40–50°N. Then the turbulence parameters Θo are the level of the local structure functions, i.e. a1 = 2ϵ^(2/3) in Eq. (8), where ϵ is the energy-dissipation rate, which can be estimated from the local structure functions of the ensemble forecast members with corrections for the effects of the model spatial filter (Frehlich and Sharman, 2004). The resulting statistics of turbulence are consistent with the statistics from commercial aircraft data (Wolff and Sharman, 2008).
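The step from a local structure-function level to ϵ can be sketched as a one-parameter fit. The example below assumes the local structure function follows D(r) = a1 r^(2/3) over the fitted lags, so that ϵ = (a1/2)^(3/2); it deliberately omits the corrections for the model spatial filter described by Frehlich and Sharman (2004), and the function name and lag values are hypothetical.

```python
import numpy as np

def estimate_eps(r, D):
    """Least-squares level a1 of D(r) = a1 r^(2/3), converted to eps = (a1 / 2)^(3/2).

    r: separations in m; D: structure-function estimates in m^2 s^-2.
    No correction for the model spatial filter is applied (simplifying assumption).
    """
    r = np.asarray(r, dtype=float)
    D = np.asarray(D, dtype=float)
    basis = r ** (2.0 / 3.0)
    a1 = np.sum(basis * D) / np.sum(basis ** 2)   # linear least squares for the level
    return (a1 / 2.0) ** 1.5

if __name__ == "__main__":
    lags = np.array([40e3, 80e3, 160e3])          # hypothetical lags
    true_eps = 2e-5
    D = 2.0 * true_eps ** (2.0 / 3.0) * lags ** (2.0 / 3.0)
    print(estimate_eps(lags, D))
```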

An example of the observation sampling error of a rawinsonde observation of one horizontal velocity component calculated from the local turbulence estimates from the RUC13 model output for assimilation into the GFS model with the L = 150 km square model filter (see Figure 2) is shown in Figure 3. The large variations in the observation sampling error reflect the large variations in the local turbulence statistics Θo (energy-dissipation rate ϵ) related to the jet stream over Texas and the gravity waves over the Northern Rockies. Similar results are produced for temperature (Frehlich and Sharman, 2004) but with a smaller magnitude compared with the 0.5 K random error of a rawinsonde observation.

Figure 3. Calculations of the observation sampling error for 14 December 2006 at 0000 UTC at an altitude of 10 km (Eqs (34)–(36) and equation image) for one horizontal velocity component of a rawinsonde measurement at the centre of a GFS grid cell based on local turbulence estimates of energy-dissipation rate ϵ from the RUC model.

9.  Simple example calculations

The fundamental issues concerning the definition of observation-error statistics can be demonstrated by using simple observation geometries. Simplified expressions are produced using the first guess xb(ra) and the nearest observation yo to the analysis point ra of the state variable x, i.e. the ML analysis

  • equation image(59)

and the analysis-error variance

  • equation image(60)

where equation image is the forecast-error variance and equation image is the conditional observation-error variance of the nearest observation. This is a convenient reference calculation for evaluating data assimilation techniques.
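For a single observation and a single state variable, the reference calculation can be written out directly. The sketch below assumes the standard scalar forms, xa = xb + [σb²/(σb² + σo²)](yo − xb) and σa² = σb²σo²/(σb² + σo²), which is what Eqs (59) and (60) are expected to reduce to; the numerical values are invented to contrast a low-turbulence and a high-turbulence observation error.

```python
def scalar_ml_analysis(xb, yo, var_b, var_o):
    """Scalar analysis and analysis-error variance from one first guess and one observation."""
    gain = var_b / (var_b + var_o)                # weight given to the observation
    xa = xb + gain * (yo - xb)
    var_a = var_b * var_o / (var_b + var_o)
    return xa, var_a

if __name__ == "__main__":
    # Same first guess and observation, two assumed observation-error variances (m^2 s^-2)
    for label, var_o in [("weak turbulence", 0.25), ("strong turbulence", 4.0)]:
        xa, var_a = scalar_ml_analysis(xb=10.0, yo=12.0, var_b=1.0, var_o=var_o)
        print(f"{label}: xa = {xa:.2f}, analysis-error variance = {var_a:.2f}")
```

The analysis moves most of the way to the observation when the conditional observation-error variance is small and stays close to the first guess when it is large, which is the behaviour a climatological error variance cannot reproduce.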

The sensitivity of the data-assimilation algorithm and various statistics to the shape of the model filter equation image is not known but can be determined by calculations using different model shapes. The two-dimensional model filter is given by equation image, where

  • equation image(61)

is a sine-taper function and a < L/2. A Gaussian function is another convenient model, i.e.

  • equation image(62)

where LG is chosen such that g0(L/2)/g0(0) = 1/2, i.e. equation image, which simplifies the comparison with the sine-taper function as shown in Figure 4.

Figure 4. Model filter functions g0(r)/g0(0) for various models: a square (line), sine-taper model Eq. (61) with a/L = 0.25 (dotted line), a/L = 0.5 (dashed line) and a Gaussian model Eq. (62) (dot–dashed line).

Examples of the resulting average model structure functions for L = 50 km are shown in Figure 5 for the atmospheric conditions defined by the ACARS data of Figure 1. Also shown is the best-fitting square model filter size Leff based on the best-fitting structure function. The different model functions g0(r) all have excellent agreement with the average structure function calculated assuming a square filter shape, and therefore it is difficult to determine the exact shape of g0(r) with structure functions.

Figure 5. Average velocity structure functions for an NWP model (bullet) with a 10 km grid with a Gaussian and sine-taper filter with a = 25 km and L = 50 km. The best-fitting structure function assuming a square filter with dimension Leff (line) and the average atmospheric model from ACARS data (Eq. (8), dotted line) are also shown.

However, the spatial spectrum of model output is more sensitive to the effects of the model filter g0(r), as shown in Figure 6. Here, the spatial spectrum of the u velocity component corresponding to the data shown in Figure 2 is compared with predictions of various model filters g0(r) assuming that the structure function (and therefore the covariance function) from the ACARS data is the true climatology of the in situ statistics. The results for the square filter with L = 150 km from the best-fitting structure function in Figure 1 fluctuate around the GFS spectra at high spatial frequencies k. The sine-taper and Gaussian models have smaller variations around the GFS spectra. These results indicate that the effective filter g0(r) is closer to the Gaussian model than the square model. However, at the highest frequencies, the contributions from small-scale model noise and numerical artefacts such as aliasing are important and it is difficult to produce an accurate estimate of the shape of g0(r). Therefore, the difference between the calculations for a known filter shape and those for the corresponding effective square filter should bound the error in the calculation of observation sampling-error statistics due to the unknown shape of g0(r).

Figure 6. Average spectrum from GFS (line) and predictions based on the atmospheric model Eq. (8) for a sine-taper model filter (Eq. (61)) with a = 60.5 km and L = 121 km, a square with L = 150 km and a Gaussian filter (Eq. (62)) with LG=64 km.

The sensitivity of the calculated observation sampling error to the model filter shape is determined for the atmospheric case of Figure 1 and various filter shapes g0(r) with L = 50 km (see Figure 4), which corresponds to a mesoscale model with horizontal grid spacing of approximately Δ = 10 km. For each filter shape, the effective square filter size Leff is determined from the best-fitting structure function (see Figure 5). The resulting square filter dimensions Leff and the calculation of various observation sampling-error statistics for velocity measurements are shown in Figure 7. The observation sampling error increases as a/L increases because larger scales of turbulence contribute to the definition of ‘truth’. However, the ratio of the sampling error based on the best-fitting square model filter to that of the various filter shapes has a maximum deviation of only 4% for the rawinsonde observation and 9% for an observation averaged along a track. Therefore, the calculation of observation-error statistics depends only weakly on the actual shape of the model filter that defines ‘truth’. Since the actual model filter shape appears to be closer to a Gaussian function or the sine-taper function with a/L = 0.5 (Figure 6), very low errors would be produced using either of these functional forms for g0(r).

Figure 7. Observation sampling error σu of the east velocity component u versus a/L for a sine-taper model filter g0 (Eq. (61)) and the Gaussian model filter Eq. (62) plotted at a/L = 0.6. The average atmospheric conditions of Figure 1 and Eq. (8) are used in Eqs (30)–(33) (equation image and equation image) for a rawinsonde observation (open circle) at the centre of a grid cell and an aircraft or lidar observation (bullet) sampled along a track of length Ltrack = 50 km. Each model filter has L = 50 km and the effective square model filter Leff (filled square) is determined from the best-fitting structure function and is used to calculate the observation sampling error σuLeff for the rawinsonde and average along the track.

10.  Operational issues

There are many operational issues that will impact efficient implementation of ensemble forecast systems based on adaptive data assimilation techniques using state-dependent observation-error covariances. These include the initialization of the analysis, the generation of NWP ensemble members and the estimation of the local turbulence parameters that define the total observation-error statistics.

The spatial variations in total observation error produced by the large variations in observation sampling error could result in an analysis field xa that deviates from the background xb in those regions that have low observation errors (see Figure 3). This is probably the most important issue for implementing optimal data assimilation techniques, and therefore careful initialization of the analysis will be required to maintain balanced flows, especially for regions that have sparse but accurate observations. An attractive solution is the use of Digital Filter Initialization (DFI) (Lynch and Huang, 1992; Huang and Lynch, 1993; Chen and Huang, 2006), which may require iterations to modify the background fields in the nearby data-void regions.

Many techniques have been proposed for the generation of the perturbed ensemble members (Toth and Kalnay, 1997; Houtekamer and Mitchell, 1998, 2001, 2005; Mitchell et al., 2002; Snyder and Zhang, 2003; Tippett et al., 2003; Zupanski et al., 2006; McLay et al., 2007; Whitaker et al., 2008). To include correctly the effects of the spatial variations in the total observation-error statistics, an attractive option is to generate perturbed observations that are consistent with the local estimates of total observation error derived from the local turbulence statistics (Frehlich, 2006). The ensembles can also be used to estimate the local turbulence statistics ϵ (Frehlich and Sharman, 2004) for the calculation of the observation-error covariance (see Eqs (25) and (30)) based on turbulence scaling laws. This is similar to recent techniques for variance reduction of the background-error covariances with small ensemble sizes (Heemink et al., 2001; Raynaud et al., 2008, 2009), since the shapes of the error covariances are determined a priori using simple empirical models, climatological shapes or reduced rank covariance decompositions, and the spatially filtered ensemble-variance estimates are used as scaling constants. In addition, remote sensing data from a scanning Doppler lidar (Frehlich et al., 2006; Frehlich and Kelley, 2008) can provide local structure-function estimates (see Eq. (30)) or equivalently local covariance estimates (see Eq. (25)) for improving short-term forecasts of wind power or for input into nowcasting algorithms. In many cases, the contribution of the terms equation image from the conditional mean values in Eqs (25) and (30) is negligible, especially for well-defined boundary layer processes with a single-scale von Kármán model (Frehlich et al., 2006; Frehlich and Kelley, 2008).

One metric for the performance of the chosen ensemble members is the agreement of the long-term average forecast-error covariance with various innovation-error statistics (Hollingsworth and Lonnberg, 1986; Lonnberg and Hollingsworth, 1986; Dee and Da Silva, 1999), which is one of the common techniques to estimate the forecast-error covariance assuming spatially uncorrelated observation errors. However, this technique must include a rigorous definition of error statistics as well as the climatology of the atmospheric turbulence statistics (e.g. the structure function of Figure 1) and the effects of the spatial filter of the forecast model (Frehlich, 2008). This is especially important for data products that have a large spatial average, such as GPS occultation data (Kuo et al., 2004; Healy and Thépaut, 2006; Chen et al., 2009), where the observation sampling error can be large because of the large mismatch between the model filter and the spatial average of the observation.

Another common technique for estimating the forecast-error statistics, proposed at the National Meteorological Center, is based on the differences of two forecasts valid at the same time (Parrish and Derber, 1992; Rabier et al., 1998). Under certain conditions, the covariance of the observation error, forecast error and analysis error can be estimated from the differences between forecasts and observations (innovations), forecasts and analysis, and observations and analysis (Desroziers et al., 2005). Recently, this approach has been used to improve the ensemble Kalman filter (Li et al., 2009) and to evaluate the sensitivity of an analysis in an ensemble Kalman filter (Liu et al., 2009). In addition, covariance inflation algorithms can also be investigated (Anderson, 2007). All of these methods average over the spatial and temporal variations in the errors.
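The diagnostics based on differences between observations, forecasts and analyses can be illustrated with a scalar Monte Carlo check. The sketch below assumes independent Gaussian errors and an analysis produced with the optimal gain, in which case the mean product of the observation-minus-analysis and observation-minus-background differences recovers the observation-error variance, and the mean product of the analysis increment and the innovation recovers the background-error variance. The sample size, seed and variances are arbitrary illustrative values, and the single pooled average mirrors the averaging over spatial and temporal variations noted above.

```python
import numpy as np

def desroziers_scalar(var_b, var_o, n=200_000, seed=0):
    """Monte Carlo estimate of (var_o, var_b) from analysis and innovation statistics."""
    rng = np.random.default_rng(seed)
    eb = rng.normal(0.0, np.sqrt(var_b), n)       # background errors
    eo = rng.normal(0.0, np.sqrt(var_o), n)       # total observation errors
    d_ob = eo - eb                                # innovation yo - xb (truth cancels)
    gain = var_b / (var_b + var_o)                # optimal scalar gain
    d_ab = gain * d_ob                            # analysis increment xa - xb
    d_oa = d_ob - d_ab                            # residual yo - xa
    return np.mean(d_oa * d_ob), np.mean(d_ab * d_ob)

if __name__ == "__main__":
    est_o, est_b = desroziers_scalar(var_b=1.0, var_o=0.5)
    print(f"estimated var_o = {est_o:.3f}, estimated var_b = {est_b:.3f}")
```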

11.  Summary and discussion

‘Truth’ xt(rk) is defined as the convolution of the continuous atmospheric state variables by the effective spatial filter of the NWP model for each discrete model coordinate rk (Frehlich, 2006). Therefore, the error in a forecast xf(rk) or initial state xa(rk) (analysis) is ef(rk) = xf(rk) − xt(rk) or ea(rk) = xa(rk) − xt(rk), respectively. It is assumed that with this definition of ‘truth’, a perfect model will produce forecasts with the smallest error statistics. Since direct observations of the state variables are affected by the unresolved scales of the NWP model (subgrid processes), the observation errors depend on the local turbulence statistics and are a function of space and time, i.e. state-dependent. Two definitions of observation error were proposed (Frehlich, 2006) to include this state-dependence: error referenced to the model grid coordinates rk, i.e. eo(rk) = yo(ri) − xt(rk), and error referenced to the observation coordinate eo(ri) = yo(ri) − xt(ri) where ri denotes the centroid of the spatial sampling of the observation.

The properties of the model spatial filter gm can be investigated in a statistical sense by a comparison of NWP model structure functions or spatial spectra that produce an effective model resolution of approximately five times the grid spacing. The structure functions do not have the sensitivity to determine the actual shape of gm; however, the spatial spectra indicate that a Gaussian function may be a better representation than a rectangle. As a first approximation, we assume the spatial filter of the NWP model is universal, i.e. the shape is constant for all atmospheric conditions and therefore equal to that determined from the average spatial statistics.

For most direct observations of state variables (rawinsonde, aircraft, Doppler radar, Doppler lidar, etc.), the total observation error consists of the instrument error ei = yo − ys and the observation sampling error es = ys − xt (Frehlich, 2001). In many cases, the instrument error depends on the detector noise, estimation algorithms and local turbulence conditions. The observation sampling error depends on the mismatch between the observation sampling function go and the effective model resolution gm as well as the statistics of the local turbulent field (the spatial structure function). Therefore, a rigorous calculation of the observation-error covariance requires a description of the ensemble members that faithfully reflects the contribution of all the atmospheric random processes and is consistent with the definition of ‘truth’. For error referenced to the model grid coordinates, these ensemble members were selected from an infinite number of earth systems with the same forcing such that the values of xt and the local turbulence statistics Θ were identical. As shown in Figure 2, the random variations of the continuous atmospheric variables over the ensemble members describe the subgrid turbulence that defines the observation sampling error. For error referenced to the observation coordinates, the ensemble members have the added requirement that the values of ‘truth’ and the local turbulence statistics at each observation coordinate should be identical. These ensemble members define all the important error statistics such as the observation-error covariance in terms of the scaling laws for the turbulent fields and the local turbulence statistics Θo (e.g. ϵ for the velocity field). There are large spatial variations in the observation sampling error as shown in Figure 3 for rawinsonde velocity measurements located at the centre of a model grid cell. Current techniques for estimating observation-error statistics (Hollingsworth and Lonnberg, 1986; Lonnberg and Hollingsworth, 1986; Daley 1992; Dee, 1995; Dee and Da Silva, 1999; Dee et al., 1999) are based on a climatological average over these variations and therefore produce sub-optimal data-assimilation algorithms.

The maximum-likelihood technique produces an optimal data-assimilation algorithm (Eq. (51)) using the definition of observation error referenced to each NWP model coordinate assuming that the observation error and the first-guess error are statistically independent and have a joint Gaussian probability density function. Similar results are produced using the Kalman filter formulation or mean-square error analysis with observation error referenced to each observation coordinate (see Eqs (52) and (56)). The spatial variations in the observation-error statistics for the diagonal elements of the covariance matrix are determined by the local turbulence parameters of each observation. The off-diagonal elements require an approximation such as using the average of the local turbulence statistics of the two observation coordinates. The effects of the linear interpolation to the observation coordinates should be small, since the NWP model filter produces a nearly linear dependence of state variables for adjacent grid points because the structure functions of the model variables have an r² dependence at small lags (see Figure 1 and Frehlich and Sharman, 2004, 2008).

The definition of ‘truth’ for error statistics depends on the NWP model filter gm(r1,r2) = g0(r1)g0(r2), which is well approximated by a square filter based on spatial structure function comparisons (see Figure 5). However, the spatial spectra indicate that the shape of the model filter g0 is closer to a Gaussian filter or a sine taper (see Figure 6). Calculations of the analysis error for rawinsonde observations and observations averaged along a line (aircraft data or space-based lidar data) using the effective-square filter function have little error (Figure 7) for all the filter shapes considered. Therefore, the actual shape of the NWP model filter has a minor effect. Assuming a Gaussian shape may provide the most accurate calculation of error statistics but more work is required to characterize the spatial filter of NWP models.

There are many operational issues that should be revisited to include the spatial variations in the total observation-error statistics correctly. The selection of the members (perturbed observations or ensemble square-root filters) of ensemble forecast systems should be investigated with state-dependent observation errors included in the analysis. The various procedures for initializing the analysis to maintain balanced fields pose an even more pressing problem, since observations that have small total observation error (see Figure 3) could produce an analysis with regions that deviate strongly from the first-guess field.
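The sensitivity of the analysis to a small total observation error can be seen in the simplest case: one state variable, one direct observation, and independent Gaussian errors. The sketch below uses the standard minimum-variance combination for that case; it is not Eq. (51), and the function name and numerical values are hypothetical.

```python
def scalar_analysis(x_b, y, var_b, var_o):
    """Minimum-variance combination of a first guess and one observation.

    x_b, var_b : first-guess value and its error variance
    y, var_o   : observation and its total error variance
    """
    weight = var_b / (var_b + var_o)   # gain applied to the innovation
    x_a = x_b + weight * (y - x_b)     # analysis value
    var_a = (1.0 - weight) * var_b     # analysis error variance
    return x_a, var_a

# As var_o shrinks, the analysis moves away from the first guess
# and toward the observation.
for var_o in (4.0, 1.0, 0.01):
    x_a, var_a = scalar_analysis(x_b=10.0, y=14.0, var_b=1.0, var_o=var_o)
    print(f"var_o={var_o:5.2f}  x_a={x_a:6.3f}  var_a={var_a:5.3f}")
```

With equal error variances the analysis lies midway between the first guess and the observation. When the observation error variance is much smaller than the first-guess variance, the analysis is drawn almost entirely to the observation, which is the situation that can challenge balance constraints.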

The value of the many different observing systems can be determined using a consistent definition of error statistics and the scaling laws for the local turbulence to describe the observation sampling errors. Rawinsonde observations have the simplest description, since the instrument error is well documented and the observation sampling error is a simple calculation (see Eqs (34)–(36)). Observations along a track (some aircraft data and Doppler lidar measurements from space) are also numerically tractable for the calculation of observation sampling error (Frehlich, 2001), but the instrument error may depend on atmospheric turbulence and shear, especially for Doppler lidar measurements from space. Doppler radar observations are more complex, since both the instrument error and the observation sampling error depend on local turbulence and shear over a three-dimensional volume (Doviak and Zrnic, 1993; Fathalla et al., 2008; Lu and Xu, 2009). More research is required to calculate Doppler-radar error statistics correctly.

Observing System Simulation Experiments (OSSEs) (Rohaly and Krishnamurti, 1993; Baker et al., 1995; Atlas, 1997; Liu and Rabier, 2003; Snyder and Zhang, 2003; Riishojgaard et al., 2004; Stoffelen et al., 2005; Tong and Xue, 2005; Marseille et al., 2008; Lu and Xu, 2009; Ma et al., 2009) have been used to evaluate the performance of various observations and data assimilation systems. However, OSSEs must correctly represent the true spatial variability of the total observation errors (instrument error plus observation sampling errors) (Marseille and Stoffelen, 2003; Chen et al., 2009; Lu and Xu, 2009) and must also include optimal data-assimilation algorithms. This requires improved estimates of the local turbulence parameters (Frehlich and Sharman, 2004) and better calculations of the total observation-error covariance.

Finally, a rigorous analysis of the spatial variations of the total observation error produced by the spatial variations in the statistics of the turbulence field requires a better understanding of the turbulent processes. Fortunately, for many atmospheric conditions (Gage, 1979; Nastrom and Gage, 1985; Lindborg, 1999; Wikle et al., 1999; Cho and Lindborg, 2001; Lindborg and Cho, 2001; Lenschow and Sun, 2007; Riley and Lindborg, 2008) there is a robust scaling of turbulence in the horizontal plane that connects the resolved scales of the NWP models to the subgrid scale turbulence statistics. More work is required to extend these results to other atmospheric conditions, such as the night-time residual layer and stable boundary layers.

References
