Keywords:

  • model errors;
  • 1D-Var;
  • flow-dependence

Abstract

  1. Top of page
  2. Abstract
  3. 1.  Introduction
  4. 2.  Modeling tools
  5. 3.  The ‘island’ localization method
  6. 4.  Forecast experiments
  7. 5.  Conclusions
  8. Acknowledgements
  9. References

A method to select local statistics for background-error covariance matrix (B) determination is presented and applied to the 1D-Var+nudging assimilation of SEVIRI radiances in the COSMO model. The system is designed as a post-processing algorithm of an ensemble forecast system based on multi-model initial and boundary conditions and perturbations of physical parametrization parameters. Ensemble spread maps are combined to identify regions (‘islands’) inside the model domain of uniform and large error. Thereafter B matrices are calculated using local statistics from the identified islands assuming the ensemble spread to be representative of the background-error dispersion. This calculation is repeated at the beginning of each assimilation cycle to ensure the time evolution of the selected regions and thus the flow-dependency of the background-error covariance matrix statistics.

The benefit of calculating B using local statistics is then quantified by comparison to background-error covariances evaluated from global domain statistics. This is done using the same ensemble members over the whole domain, i.e. without applying the localization procedure. The standard NMC approach, which uses long time series of model departures calculated at different forecast times, is also included for reference. Model departure statistics and comparison of the final analysis to independent observations from surface stations and satellite products highlight the relevance of the localization procedure, even though the noisy structure of the variance profiles produces a substantial reduction in the number of valid retrieved profiles. The final findings show that the selective identification of areas of homogeneous spread/error is a suitable approach to characterize local error growth, which can bring about appreciable improvements in the forecast scores. Copyright © 2011 Royal Meteorological Society


1.  Introduction

In variational data assimilation, analysis increments belong to the column space of the background covariance matrix, B. It is therefore crucial that B correctly represents the time evolution, the magnitude and the localization of the various components of model errors. In practical applications, however, the impossibility of accessing the true state of the atmosphere, the unfeasibly large size of B, and the limited number of independent error estimates inevitably result in a failure to model certain aspects such as anisotropy, flow-dependence and baroclinicity, with severe consequences for the quality of the analysis. For example, the classic NMC method (named after the development of the methodology at the National Meteorological Center by Parrish and Derber, 1992), which uses differences between forecasts of different lengths that verify at the same time, has been shown to underestimate (overestimate) the variance of background errors in data-sparse (dense) areas (Bouttier, 1994). In this approach, forecast lengths of +12 h and +36 h are used to reduce the risk of incorporating ‘spin-up’ effects and diurnal cycle signals into the diagnosed statistics. Nevertheless, these time ranges are significantly longer than the ones employed to generate the background fields feeding data assimilation systems. The consequence is a broadening of the spatial and vertical error correlations, as shown by Pereira and Berre (2006). To overcome these problems, Houtekamer et al. (1996) proposed an ensemble-based approach, sometimes also called the Monte Carlo method (Houtekamer and Mitchell, 1998; Evensen, 2003; Hamill, 2006), in which long-time statistics of background errors are estimated from ensembles of analyses drawn from randomly perturbed observations and physical parametrizations. This method, first implemented in Canada and at the European Centre for Medium-Range Weather Forecasts (ECMWF; Fisher, 2003), tested in the French model ARPEGE 3D-Var (Pereira and Berre, 2006) and also applied to limited-area models (LAMs; Lindskog et al., 2007), was shown to better capture error correlations arising from the synoptic scales and to identify regions of different observation quality and coverage (Pereira and Berre, 2006). However, a main drawback is the sampling noise due to the restricted number of ensemble members, which has to be alleviated by the application of optimized filtering techniques (Berre et al., 2007).

Most of the previous work has thus concentrated on how to estimate the model background error. Bannister (2008) offers a complete review of methods for practical B calculations, showing how ensemble-based methods mimic information dynamics along analysis/forecast phases and are in this way theoretically better than NMC. By contrast, almost no attention has been paid to where a model error anomaly can occur and give rise to B structures very dissimilar to the climatology. Since error structures are strongly scale-dependent, especially at the high model resolution typical of limited-area models, an intelligent selection of geographical regions for B determination is an attractive way to obtain analysis increments representative of local synoptic processes. LAM background errors arise from two different sources. Errors at scales comparable to the driving model resolution stem from misplacement and mistiming of the large-scale flow. The inherent unpredictability of small-scale turbulent flow and the up-scale error growth of unresolved processes provide an additional contribution at much smaller spatial scales. The lagged-NMC method, proposed by Š and Montmerle et al. (2006) for LAM application, is a first attempt to perform a scale-dependent selection of model error. It is a modification of the standard NMC method, meant to eliminate the error due to the large-scale model that provides the lateral boundary conditions, in order to concentrate on the background-error model. It is implemented simply by applying the same lateral boundary conditions for the shorter forecast as those used by the longer forecast, where there is overlap.

Nevertheless, the a priori suppression of a source of error appears somewhat arbitrary, and a better approach would be to characterize covariances under a local stationary process assumption. In this article we therefore propose a method to select where regions of large error (we will call them ‘islands’) are located. The geographical selection uses both the spread homogeneity of an ensemble system and the information on where most of the observation impact (i.e. larger analysis increments) is expected. The working assumption is that B matrices calculated from local statistics over the selected islands (as opposed to global-domain averages) are more representative of actual background errors in those regions. The subsequent observation assimilation, which uses those local B matrices, is then able to produce analysis increments representative of the local-scale processes, since the information is contained in the columns of the B matrix.

This basic idea is presented in an implementation in which the ensemble is provided by the COSMO-SREPS system, which is based on multi-model initial and boundary conditions and perturbations of physical schemes. The variational assimilation system is a simple 1D-Var module for SEVIRI radiance assimilation (Di Giuseppe et al., 2009), implemented into the regional-scale model COSMO (Steppeler et al., 2003). In this context, therefore, only the required vertical error correlations are calculated. To assess the relevance of the proposed local selection of covariance statistics, its retrieval performance is compared to a global-ensemble-based approach, in which statistics of background errors are estimated from the same COSMO-SREPS ensemble system without applying the localization procedure. The comparison to the standard NMC approach is also reported for reference. Eight days within the operations period (OP) of the Forecast Demonstration Project starting on 8 August 2007 are selected as a testing period (Zappa et al., 2008). This is a project of the World Weather Research Program of the World Meteorological Organization.

The paper introduces the modelling tools in section 2 and describes in section 3 the new methodology for selection of local background-error covariance statistics. An extensive discussion of the quality and statistical properties of the selected islands and of the quality of the retrieved profiles is also provided. Section 4 reports on the impact on forecast scores when the new method is applied. Finally, section 5 gives the conclusions of this study.

2.  Modeling tools

2.1.  The COSMO-SREPS ensemble system

The COSMO-SREPS ensemble forecasting system is a recent development of the COSMO consortium, designed to provide probabilistic forecasting support at the short range and to furnish auxiliary information for data assimilation purposes. To this end, COSMO-SREPS tries to statistically characterize processes possessing a low inherent predictability at the small spatial scale (O(10 km)) and at the very short time range (up to +48 h) (Marsigli et al., 2006) using multi-analysis multi-boundary conditions and applying model perturbations to each ensemble member. It consists of 16 integrations at 10 km horizontal resolution of the limited-area non-hydrostatic model COSMO (Steppeler et al., 2003). Initial and boundary conditions are provided by four runs of the same COSMO model at 25 km resolution which are driven by four different global models: the IFS model of ECMWF, the Unified Model of the UK Met Office, the GFS of the National Centers for Environmental Prediction (NCEP), USA, and the GME of Deutscher Wetterdienst (DWD), Germany. In addition, model perturbations are applied by simply varying tuning parameters inside the physical schemes. The strategy for the optimal definition of model perturbations is still under development; nevertheless during the MAP D-Phase test period the model physics perturbations applied were:

  • p1 (control run) –set-up as in the operational configuration;

  • p2 –change of the precipitating convection scheme from the operational Kain–Fritsch (Kain and Fritsch, 1990) to the Tiedtke (1989) scheme;

  • p3 –change in the value of the length-scale of thermal surface patterns in the boundary-layer scheme (from 500 m to 1000 m);

  • p4 –change in the value of the maximal turbulent length-scale in the boundary-layer scheme (from 500 m to 1000 m).

The COSMO-SREPS domain covers a central European area (Figure 5 below) with 10 km resolution and with 40 vertical levels from the ground up to around 20 hPa. The forecast range spans a period of 72 h starting at 0000 UTC. The sketch in Figure 1 provides an overview of the COSMO-SREPS configuration and of how the outputs generated by the ensemble system are processed to calculate appropriate B matrices for use in the COSMO-I7 deterministic run. Starting from 0000 UTC, forecast runs of +72 h are integrated and 3 h outputs are produced. COSMO-SREPS analyses are interpolated from any of the four global models employed. Therefore, the initial spread is due only to differences in the assimilation systems of the large-scale models and not, as in Fisher (2003), to the use of perturbed observations. Strictly speaking, therefore, the +12 h forecasts and not the analysis errors are sampled. Admittedly, COSMO-SREPS was not originally designed for data assimilation and is possibly not optimized for it; nevertheless, it represents a good daily proxy for our model error, as shown in Marsigli et al. (2006). The generated B matrices are then used as input in the +24 h analysis cycle of the COSMO-I7 deterministic run.

Figure 1. Sketch showing the COSMO-SREPS ensemble forecasting system used in this study and how the outputs provided by the system are processed to calculate appropriate B matrices for use in the COSMO-I7 deterministic run. Sixteen members from 10 km resolution COSMO model runs are generated. Four global models drive four COSMO model runs with perturbation of physical schemes. See text for details.

2.2.  The COSMO-I7 1D-Var assimilation system

The new methodology is tested in a 1D-Var+nudging framework, developed for the assimilation of SEVIRI radiances into the COSMO model. In the interest of brevity, only a short summary of the system and of the data used is given here. More details can be found in Di Giuseppe et al. (2009). Data from five of the SEVIRI (Schmetz et al., 2002) channels (6.2, 7.3, 8.7, 10.8 and 12.0 μm) are used in a 1D-Var algorithm to generate profiles of temperature and humidity through an iterative procedure, which seeks the solution xa that minimizes a cost function, J, which measures the distance between a model background and the observed radiances:

  • $J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b) + \tfrac{1}{2}[\mathbf{y}-H(\mathbf{x})]^{T}\mathbf{R}^{-1}[\mathbf{y}-H(\mathbf{x})]$ (1)

where y is the vector of observed radiances, H is the operator simulating the observed data from the model variable x, and R is the observation-error covariance matrix. In this study the control vector x contains vertical profiles of temperature and specific humidity, 2 m temperature, 2 m specific humidity and sea surface temperature (i.e. xb=(T, q, q2m, T2m, SST)). The background vector is assigned at run time during the analysis integration as explained in Di Giuseppe et al. (2009), and only vertical profiles of T and q are passed to the nudging scheme to be assimilated. Observation errors include measurement errors, representativeness errors and errors in H. Finally, B is the aforementioned background-error covariance matrix of the state xb. The superscripts –1 and T denote inverse and transpose matrices, respectively.

The forecast model used as background is identical to the one used for the COSMO-SREPS, but has a reduced domain size, centred over Italy, with a grid size of 7 km and 40 levels in the vertical, between the surface and the top of the troposphere at around 20 hPa. Three-hourly boundary conditions are provided by the ECMWF global model IFS. It is assumed that the measurement error for each of the infrared channels (8.7, 10.8 and 12.0 μm) can be set to 1.0 K, while 1.7 K is the estimated error for the water vapour (WV) channels (6.2, 7.3 μm) (Di Giuseppe et al., 2009), with no correlation between channels. The observation-error covariance matrix is then a simple diagonal matrix. Before their use, the satellite data are bias corrected using a scheme based on four air-mass-dependent predictors, following the guidelines of Harris and Kelly (2001).
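As an illustration of the two-term cost function described above, the following is a minimal sketch for a toy problem. It assumes a linear observation operator (a small matrix `H`), whereas the operator in the real system is a radiative transfer model; the state size, the numerical values and all names are hypothetical. Only the diagonal R built from channel errors of 1.0 K and 1.7 K follows the text.

```python
import numpy as np

def cost_function(x, xb, y, H, B_inv, R_inv):
    """Background term plus observation term, for a linear operator H (assumption)."""
    dx = x - xb
    dy = y - H @ x
    return 0.5 * dx @ B_inv @ dx + 0.5 * dy @ R_inv @ dy

# Toy set-up: 3 state levels, 2 channels (illustrative numbers only).
xb = np.array([280.0, 270.0, 260.0])
H = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
B = np.eye(3)
R = np.diag([1.0 ** 2, 1.7 ** 2])      # uncorrelated channel errors -> diagonal R
y = H @ xb + np.array([1.0, -0.5])     # synthetic observations

B_inv = np.linalg.inv(B)
R_inv = np.linalg.inv(R)

# For a linear H the minimiser has a closed form: xa = xb + K (y - H xb).
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
xa = xb + K @ (y - H @ xb)

J_b = cost_function(xb, xb, y, H, B_inv, R_inv)
J_a = cost_function(xa, xb, y, H, B_inv, R_inv)
```

In this linear toy case the increment `xa - xb` equals `B @ (...)`, so it lies in the column space of B, which is the property the localization method relies on.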

Each run is re-initialized at the beginning of the assimilation window to have a set of self-consistent experiments. Only cloud-free observations over sea points are used in this analysis. Successful convergence is achieved by employing a minimizer based on the routine M1QN3 by Gilbert and Lemaréchal (1989), which is designed to check the decrease of the gradient of the cost function. It implements a limited-memory quasi-Newton technique with a dynamically updated scalar or diagonal preconditioner. To facilitate the process, a preconditioning is applied by using the B matrix eigenvectors. Additional checks are performed after the iteration procedure and, for an iteration number larger than five, convergence is considered achieved only if the final cost function value has decreased by at least 50% of its initial value (Di Giuseppe et al., 2009, provide further details).
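The post-minimization check can be sketched as a small predicate. The function name, the `minimizer_converged` flag and the decision to accept short minimizations without the cost test are assumptions made for illustration; only the five-iteration threshold and the 50% reduction come from the text.

```python
def retrieval_accepted(minimizer_converged, n_iter, j_initial, j_final,
                       iter_threshold=5, min_reduction=0.5):
    """Post-minimization acceptance test for one 1D-Var retrieval (sketch)."""
    if not minimizer_converged:
        return False
    if n_iter > iter_threshold:
        # Long minimizations must have removed at least half of the initial cost.
        return j_final <= (1.0 - min_reduction) * j_initial
    return True
```

For example, a retrieval that took eight iterations and reduced J from 10.0 to 6.0 would be rejected under these assumptions, while one that reached 4.0 would be kept.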

3.  The ‘island’ localization method

3.1.  Method outline

The proposed background-error covariance matrix localization employs outputs of the COSMO-SREPS ensemble system in a novel approach. Geographic regions of homogeneous errors (‘islands’) are identified on the basis of a map of spread. This map is obtained by averaging ensemble spread fields at several levels and forecast times with appropriate weighting functions. The presented methodology will be named the Local Ensemble Method (LEM), since its novelty consists of a ‘local’ use of error statistics.

The procedure follows three steps, which are schematised in Figure 2. Firstly (STEP 1), ensemble spread fields S_i (i = T, q_v) are computed for normalised temperature, T, and specific humidity, q_v, at each model level and every 3 h during the first 12 h of integration. The ensemble spread is defined as the variance of a given variable with respect to the ensemble mean. The variable normalization is performed by subtracting from each field point the ensemble mean and dividing by the ensemble standard deviation. In the second step (STEP 2) the S_i fields are combined to provide a mean map of spread, S(lat,lon), which is assumed to be a proxy of the geographical distribution of model errors. S(lat,lon) is calculated as:

  • equation image(2)

Spread fields at different heights are averaged using a weighting function, w(z), which has a smooth dependency on the vertical coordinate, z:

  • equation image(3)

where zmax is the height of the function maximum. The choice of zmax is arbitrary and should be tailored to the specific application. However, the idea followed is to attribute a high weight to those levels where observations are located and the model error is large.

Figure 2. Sketch showing the three steps followed by the LEM procedure to localize areas of homogeneous and significant background error. STEP 1: generation of ensemble spread fields from the COSMO-SREPS system. STEP 2: construction of a two-dimensional spread index field. STEP 3: localization procedure based on a percentile-based method using the spread index field calculated in STEP 2. See text for details.

Outputs from different time steps are simply averaged, assuming that they all contribute equally to the final integrated spread index. Thus, the spread index S is:

  • equation image(4)
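The combination of spread fields into a single two-dimensional index can be sketched as follows. The structure (a weighted vertical average, an equal-weight time mean and a sum over the two variables) follows the description in the text, but the Gaussian shape and the `width` of the vertical weight are purely hypothetical stand-ins for the weighting function of Eq. (3), and all names are illustrative.

```python
import numpy as np

def vertical_weights(z, z_max=3000.0, width=1500.0):
    """Hypothetical smooth weight peaking at z_max (Gaussian shape is an assumption)."""
    w = np.exp(-0.5 * ((z - z_max) / width) ** 2)
    return w / w.sum()

def spread_index(spread, z, z_max=3000.0):
    """Combine spread fields into one 2-D map.

    spread: array (n_var, n_time, n_lev, n_lat, n_lon) of ensemble spread.
    z:      array (n_lev,) of level heights in metres.
    """
    w = vertical_weights(z, z_max)
    # Weighted vertical average -> shape (n_var, n_time, n_lat, n_lon)
    vert = np.tensordot(spread, w, axes=([2], [0]))
    # Equal-weight mean over output times, then sum over variables (T and q_v)
    return vert.mean(axis=1).sum(axis=0)

# Toy input: 2 variables, 4 output times, 5 levels, 3 x 3 grid of unit spread.
z_demo = np.linspace(0.0, 8000.0, 5)
S_demo = spread_index(np.ones((2, 4, 5, 3, 3)), z_demo)
```

With unit spread everywhere the normalised weights and the time mean leave each variable's contribution at one, so the toy index is uniform.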

As stated, S is a geographical map of spread, with a synthesis of model errors at several vertical levels and sampled over the whole integration period. This field is then used to localize regions of anomalous error size using a percentile-based method (STEP 3). Firstly, two pre-defined percentiles of S, percmax and percmin, with 0 < percmin < percmax < 100, are computed. All points with a value greater than percmax are identified as possible central points of a region of interest (called an ‘island’). The procedure searches from such a point, in each of the four cardinal directions, for the first S value lower than percmin. These four points mark the rectangular borders of the selected island. Other islands are defined in the same way, starting from any other point of the domain where the S value is greater than percmax, provided that this point has not already been included in another island; thus points cannot belong to more than one island. Finally, islands formed by fewer than nmin points are rejected.
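The percentile-based search of STEP 3 can be sketched as below. Several details are not specified in the text and are assumptions here: candidate centres are visited from the largest spread downwards, a search that reaches the domain edge stops there, and an island only claims points not already assigned. The function and variable names are hypothetical.

```python
import numpy as np

def find_islands(S, perc_max=95, perc_min=80, n_min=150):
    """Return a label map (0 = no island) and the list of island bounding boxes."""
    hi, lo = np.percentile(S, [perc_max, perc_min])
    labels = np.zeros(S.shape, dtype=int)
    boxes = []
    ny, nx = S.shape
    # Visit candidate centres from the largest spread downwards (assumed order).
    seeds = np.argwhere(S > hi)
    seeds = seeds[np.argsort(-S[seeds[:, 0], seeds[:, 1]])]
    for j, i in seeds:
        if labels[j, i]:
            continue  # already inside an island
        # Walk in each cardinal direction until S drops below lo or the edge is hit.
        n = j
        while n > 0 and S[n, i] >= lo:
            n -= 1
        s = j
        while s < ny - 1 and S[s, i] >= lo:
            s += 1
        w = i
        while w > 0 and S[j, w] >= lo:
            w -= 1
        e = i
        while e < nx - 1 and S[j, e] >= lo:
            e += 1
        free = labels[n:s + 1, w:e + 1] == 0
        if free.sum() < n_min:
            continue  # too few points for stable statistics
        boxes.append((int(n), int(s), int(w), int(e)))
        labels[n:s + 1, w:e + 1][free] = len(boxes)
    return labels, boxes

# Toy field: a single smooth maximum in the middle of a 21 x 21 grid.
yy, xx = np.mgrid[0:21, 0:21]
S_toy = np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 3.0 ** 2))
labels_toy, boxes_toy = find_islands(S_toy, n_min=50)
```

The rectangular box means an island can include points whose spread is below percmin (the corners of the box), which is a direct consequence of searching only along the four cardinal directions.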

Once the islands are identified, they are used as masks to select statistics for the background-error calculations. For each island of point dimension p, 4 × 16 member profiles of temperature and specific humidity are extracted and covariances are calculated using the members' departures from the ensemble mean. The calculation is performed separately for temperature and humidity (i.e. B is a block diagonal matrix) and no cross-covariance is considered between the two variables. At the end of the LEM process, a number of background covariance matrices equal to the number of islands is provided. These are then used as input in the assimilation cycle to generate retrieved profiles from SEVIRI.
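A minimal sketch of the covariance calculation for one island is given below. It assumes the member profiles inside the island have already been gathered into arrays of shape (members, points, levels), pools members and grid points into one sample, and uses an unbiased normalisation; these choices and all names are illustrative, not taken from the original system.

```python
import numpy as np

def local_covariance(ens):
    """Vertical covariance from ensemble departures inside one island.

    ens: array (n_members, n_points, n_levels) of profiles.
    """
    dep = ens - ens.mean(axis=0, keepdims=True)   # departures from the ensemble mean
    dep = dep.reshape(-1, ens.shape[-1])          # pool members and grid points
    return dep.T @ dep / (dep.shape[0] - 1)

def block_diagonal_B(t_ens, q_ens):
    """Assemble B with separate T and q blocks and zero cross-covariance."""
    B_t = local_covariance(t_ens)
    B_q = local_covariance(q_ens)
    zeros = np.zeros((B_t.shape[0], B_q.shape[1]))
    return np.block([[B_t, zeros], [zeros.T, B_q]])

# Toy island: 16 members, 10 grid points, 5 levels of synthetic data.
rng = np.random.default_rng(0)
B_toy = block_diagonal_B(rng.normal(size=(16, 10, 5)),
                         rng.normal(scale=1e-3, size=(16, 10, 5)))
```

One such matrix would be produced per island, and each retrieval would use the matrix of the island that contains its observation.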

3.2.  Method discussion

The outlined method is based on a series of assumptions and tuning parameters. The impact of our choices in determining the number and position of these ‘islands’ and the statistical properties of the extracted sample of model errors are discussed here by applying the methodology to a testing period.

Eight days starting on 8 August 2007 are selected. During the first three days, the synoptic situation was characterized by a blocking anticyclone over the Atlantic. Entrainment of cold and dry Arctic air into the Mediterranean basin produced vast areas of instability characterized by heavy convective precipitation over the east coast of Italy, Switzerland and southern Germany. A weather regime change occurred over the following days and a high pressure ridge formed over the Balkans, which brought intrusion of warm and dry Saharan air. The whole Mediterranean basin was then characterized by good weather and little precipitation.

In this study the LEM approach is applied to the B determination for the retrieval of temperature and humidity profiles from SEVIRI. Most of the choices made are then driven by the need to optimize this specific application as, for example, the selection of temperature and humidity as predictors of error growth. Since the need is to extract a sub-sample of model error statistics for these two variables, the localization cannot avoid using them, or variables with similar information. Additional tests performed using relative humidity and geopotential height as selectors did not show a substantial modification to the island position, number or size. Other applications, in which e.g. surface pressure is used as observations, should probably include this variable in the summation of Eq. (2). In a similar way, the choice of zmax=3 km in Eq. (3) is driven by the specific application.

In the context of satellite retrievals, it seems reasonable to identify the observation location using the weighting functions (WFs) of the five SEVIRI channels used in the assimilation system (Schmetz et al., 2002). Figure 3 shows a graphical representation of the connection between the vertical weighting function w(z) of Eq. (3) and the WFs. w(z) has its maximum where all WFs are superimposed on each other. Moreover, as shown later in Figure 7, Eq. (3) also has its maximum between 2 and 4 km height, where most of both the temperature and humidity model errors are found.

Figure 3. Graphical representation of the connection between the vertical weighting function w(z) of Eq. (3) and the SEVIRI weighting functions for the five channels used in the assimilation system. w(z) is chosen so that more weight is given to those vertical levels with larger 1D-Var analysis increments.

Finally, percmax, percmin and nmin are tunable parameters, and have been chosen on the basis of sensitivity studies, in order for the methodology to generate islands large enough to build suitable statistics for the B matrix calculation, while preserving the necessary localization. Empirical tests using different thresholds showed that values of percmax=95, percmin=80 and nmin=150 provide the best subjective results in terms of island number and localization.

Figure 4 shows the islands of homogeneous spread, as identified by the localization procedure for the eight days under test. The 500 hPa geopotential height from the ECMWF analysis and the high-resolution visible (HRV) channel from SEVIRI depict the synoptic situation at 1200 UTC. A subjective judgement of the quality of the island distribution is clearly difficult. On some days (like 8 and 9 August) characterized by strong large-scale regimes, the selection prefers areas on the ascending areas of pressure minima where larger instabilities are expected. The synoptic interpretation of the island locations on other days with weaker dynamical forcings is not so obvious. In order to show that, in these latter cases, the procedure effectively picks up regions of model uncertainties, one day (13 August 2007) is analysed in some detail. Figures 5 and 6 show the COSMO-SREPS spread fields for temperature and specific humidity at 850 hPa during the four time steps of the analysis. Most of the signal is confined over the southeastern part of the Mediterranean basin, both for temperature and humidity. Moreover, the signal is persistent over the 12 h integration period. This region is effectively detected by the localization procedure. Other smaller areas, where the signal is either noisier (over Sardinia, for example) or only present in one of the two variables, are instead correctly discarded.

Figure 4. ‘Islands’ of homogeneous spread identified by the localization procedure for the eight days under test. The 500 hPa geopotential height (contours) from ECMWF analyses and the high-resolution visible (HRV) channel from SEVIRI (grey shading) depict the synoptic situation at 1200 UTC from 8 to 15 August 2007.

Figure 5. COSMO-SREPS spread fields for 850 hPa temperature for one example day, 13 August 2007, selected at the end of the studied period, for forecast ranges (a) 3 h, (b) 6 h, (c) 9 h, and (d) 12 h.

Figure 6. As Figure 5, but for the 850 hPa specific humidity field. This figure is available in colour online at wileyonlinelibrary.com/journal/qj

The procedure, therefore, works well in selecting areas of large spread. It remains to be proven that these areas are effectively characterized by large error, i.e. that COSMO-SREPS is a reasonable tool for selecting areas of anomalous error growth. Since the true state of the atmosphere is not known, this is, strictly speaking, impossible to assess. Marsigli et al. (2006) provide details on the spread-error relationships of COSMO-SREPS, showing how the system is under-dispersive when compared to surface observations or ECMWF analysis. COSMO-SREPS should probably be optimized for data assimilation purposes in an operational application, but this is outside the focus of this work.

3.3.  Method assessment

Importantly, the quality of the localization procedure can be judged by whether the selected sub-sample of model error statistics produces better retrievals. To evaluate the possible benefit of the LEM, its performance is compared with other methods which employ global statistics, primarily a Global Ensemble Method (GEM) in which statistics of background errors are estimated from the same ensemble system without applying the localization procedure. The comparison to the NMC method is also added for reference. The NMC method uses forecast comparisons at +12 h and +36 h averaged over three months of data (June–July–August (JJA) 2007) of the COSMO-I7 model in its operational configuration (Di Giuseppe et al., 2009). The GEM approach produces a day-by-day spatially averaged ensemble matrix employing all data points in the COSMO-SREPS domain. The LEM method uses the same fields as the GEM method, but introduces the spatial localization described in the previous section. At this stage, to ensure a fair comparison among the three B matrices, only observations which fall into defined islands are used. Table I summarizes the main characteristics of the three methodologies.

Table I. Summary of the configurations and main characteristics of the background-error covariance method used in the comparison.
Exp. | Method | Time | Space | Comments
LEM | Local Ensemble Method | Daily upgrade at the beginning of each assimilation cycle | Spatial localization | Variable number of B matrices are applied depending on the observation location in the domain
GEM | Global Ensemble Method | Daily upgrade at the beginning of each assimilation cycle | Domain spatial average | One B matrix is available for all observations at the beginning of the assimilation cycle
NMC | Statistics based on model departures calculated at +36 h and +12 h forecast times | Seasonal mean (June, July, August) | Domain spatial average | One climatological B matrix is available for all observations and during the whole period

The first comparison is performed on the estimates of background errors. Figure 7 compares the temperature and specific humidity variance profiles, i.e. the diagonal of the B matrices, for one example day. The NMC and GEM provide only one mean profile, while the LEM approach produces, for the day chosen (13 August), nine islands, each of which has its own variance profile estimate. The vertical correlation length-scale for the same day is reported in Figure 8. The length-scale of the background-error correlation functions is used as an indicator of the degree of spatial smoothing. Following Daley (1993), the length-scale is calculated here as the distance at which the variable autocorrelation reaches 0.5. This point is identified by using the curvature of the correlation function near the origin and approximating the correlation function with a parabolic function (Pannekoucke et al., 2008). Thus, the smaller the length-scale is, the faster the correlation decreases with vertical distance. The GEM method generates smaller estimates of variance than the NMC as a consequence of the limited sampling size of the error statistics. This is in agreement with the general finding that ensemble-based methods have smaller variances (Fisher, 2003) than the standard climatological method. This lack of variance is not atypical and is usually put forward as a justification for inflating the variance by blending procedures (Ott et al., 2004).
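The length-scale diagnostic can be illustrated with a short sketch. Under a parabolic approximation near the origin, ρ(δz) ≈ 1 − δz²/(2L²), the correlation falls to 0.5 at δz = L, and L can be estimated from the correlation between neighbouring levels. Whether the original calculation used exactly this discrete form is an assumption; the function name and the numbers are illustrative.

```python
import numpy as np

def length_scale(rho_neighbour, dz):
    """Length-scale L from the correlation with a level at vertical distance dz.

    Inverts the parabolic approximation rho(dz) = 1 - dz**2 / (2 * L**2).
    """
    return dz / np.sqrt(2.0 * (1.0 - rho_neighbour))

# Consistency check on a synthetic case with a known length-scale of 2 km.
L_true, dz = 2.0, 0.5
rho = 1.0 - dz ** 2 / (2.0 * L_true ** 2)
L_est = length_scale(rho, dz)
```

A correlation close to one between adjacent levels gives a large L (strong vertical smoothing), while a rapidly decaying correlation gives a small L.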

Figure 7. (a) Temperature and (b) specific humidity variances for the three background-error models considered. The example day is 13 August 2007.

Figure 8. Length-scale of background-error models as a function of height for (a) temperature and (b) specific humidity. The length-scale is calculated as the distance at which the variable autocorrelation reaches 0.5. The curvature of the correlation function near the origin is approximated with a parabolic function (Pannekoucke et al., 2008).

The island approach produces variances which are still comparable to the NMC method. The localization procedure applied to the LEM approach is probably one of the causes of the increase of these variances compared to the GEM method. The islands are in fact selected on the basis of large model errors. The spiky structures visible may produce convergence problems when the B is applied to the 1D-Var retrievals. Figure 9 shows the number of points which satisfy our convergence criterion during the whole experimental period. Data gaps may occur in days of strong cloud cover, since retrieval is performed only over sea and for clear radiances. The LEM approach produces a smaller number of valid profiles than both the GEM and the NMC methods. Nevertheless, it has to be noted that most of the decrease in the number of valid retrievals should be attributed to the use of an ensemble-based method with respect to the standard NMC, since a substantial decrease is already visible when applying the GEM. Indeed, LEM and GEM produce almost five times fewer valid T and q profiles than NMC. These two approaches in their raw ensemble configuration, i.e. without the application of optimized filtering techniques (e.g. Berre et al., 2007), suffer from sampling noise due to the small number of profiles used to construct the B. It is evident that, in the light of an operational implementation, questions such as how to alleviate the retrieval convergence problem and how to assign a B to each model location (i.e. outside the islands) will need an answer. The simplest approach is probably to perform a blending between the LEM, GEM and NMC approaches, so that for each grid point j, a combination such as

  • equation image

where the weighting coefficients, each in [0;1], are tuned appropriately, could be used in the retrieval process. Nonetheless, the aim of this work is to study the specific statistical properties of selective model background-error sampling and, for this reason, aspects connected to a full operational optimization are left to future work.
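
As an illustration only, one plausible way to write such a blend is a convex combination of the three matrices at each grid point. The symbols α_j and β_j, and the constraint that the weights sum to at most one, are assumptions made for this sketch and are not taken from the authors' equation.

```latex
\mathbf{B}_j \;=\; \alpha_j\,\mathbf{B}_{\mathrm{LEM}}
            \;+\; \beta_j\,\mathbf{B}_{\mathrm{GEM}}
            \;+\; \left(1-\alpha_j-\beta_j\right)\mathbf{B}_{\mathrm{NMC}},
\qquad \alpha_j,\ \beta_j \in [0,1],\quad \alpha_j+\beta_j \le 1 .
```

Under these assumptions every blended matrix remains symmetric and positive semi-definite, because it is a non-negative combination of matrices with those properties. Setting α_j = 0 at grid points outside any island would be one way to fall back on the global statistics there.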

3.4.  Quality of the 1D-Var retrievals

The absolute quality of the retrieval procedure is difficult to quantify. Nevertheless, a consistency check is obtained by measuring how much closer the 1D-Var analysis comes to the observations than the background does. A measure is provided by the fraction F of the observed root mean square error (RMSE) variability corrected by the analysis, defined as:

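  • $F = \dfrac{\mathrm{RMSE}(BT_{\mathrm{obs}} - BT_{\mathrm{b}}) - \mathrm{RMSE}(BT_{\mathrm{obs}} - BT_{\mathrm{a}})}{\mathrm{RMSE}(BT_{\mathrm{obs}} - BT_{\mathrm{b}})}$   (5)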

where RMSE(BT_obs − BT_b) and RMSE(BT_obs − BT_a) are the root mean square errors of the background and analysis departures, respectively. If F = 1 then the 1D-Var analysis produces brightness temperatures which match the observations perfectly; if F = 0 then the effect of 1D-Var is null; if F is negative then the 1D-Var analysis degrades the background. Figure 10 shows this factor for two of the five SEVIRI channels: the WV channel at 6.2 μm, which sounds the high troposphere and the WV continuum, and one of the window channels at 10.8 μm, which is very sensitive to surface temperature.
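
The limiting cases described above can be checked with a short sketch. It assumes that Eq. (5) has the form F = 1 − RMSE(analysis departures) / RMSE(background departures), which is consistent with the three cases in the text; the function names and the brightness temperatures are made up for illustration.

```python
import math


def rmse(departures):
    """Root mean square of a sequence of departures (observation minus model), in K."""
    return math.sqrt(sum(d * d for d in departures) / len(departures))


def corrected_fraction(bt_obs, bt_background, bt_analysis):
    """Assumed form of Eq. (5): F = 1 - RMSE(obs - analysis) / RMSE(obs - background)."""
    rmse_b = rmse([o - b for o, b in zip(bt_obs, bt_background)])
    rmse_a = rmse([o - a for o, a in zip(bt_obs, bt_analysis)])
    if rmse_b == 0.0:
        raise ValueError("background already matches the observations; F is undefined")
    return 1.0 - rmse_a / rmse_b


# Hypothetical brightness temperatures (K) for a single channel.
bt_obs = [240.0, 242.0, 238.0, 241.0]
bt_b = [242.0, 240.0, 240.0, 239.0]  # every background departure has magnitude 2 K
bt_a = [241.0, 241.0, 239.0, 240.0]  # every analysis departure has magnitude 1 K

F = corrected_fraction(bt_obs, bt_b, bt_a)
print(F)
```

With these numbers the analysis halves the departure RMSE, so F is 0.5. Passing the observations themselves as the analysis gives F = 1, and passing the background as the analysis gives F = 0.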

Figure 9. Number of retrievals which produce valid temperature and humidity profiles from the convergence tests of the 1D-Var algorithm. Only observations inside islands are considered for the three experiments.

Figure 10. The variance reduction calculated from Eq. (5) for the water vapour channel (6.2 μm) and infrared channel (10.8 μm) brightness temperatures for the three methods over the whole experimental period. Only observations inside the defined islands are used for all three experiments.

The fraction F is overall positive for all three experiments, but while it shows little variability for 10.8 μm, a substantial improvement in the WV channel can be observed. There are two factors that have to be taken into account to explain these discrepancies. Firstly, the localization procedure by construction picks up those regions with large errors at levels which are mostly influenced by the WV channels. This would explain for example the larger impact on this channel of the LEM background error. Secondly, at high levels the use of mean error profiles, as in the NMC and GEM cases, can produce unrealistic WV increments which can be detrimental to the quality of the analysis. To summarize, we should expect a better analysis of the upper humidity field when using the LEM method.

Clearly this is only an internal 1D-Var coherency test; the real quality assessment of the methodology cannot avoid comparisons to independent observations of the forecast field. In the next section, the impact of the assimilation of the retrieved profiles will therefore be assessed in full 3D experiments.

4.  Forecast experiments

Three-dimensional forecast experiments were conducted for the eight days under test using the modification to the COSMO nudging scheme introduced by Di Giuseppe. The experiments consisted of a 24 h assimilation cycle followed by 24 h of free forecast. Initial and boundary conditions were provided by the ECMWF IFS model.

The 1D-Var retrievals were generated during the assimilation cycle employing Bs provided by one of the three methods, LEM, GEM and NMC. These profiles were then ingested as ‘pseudo’ radiosondings, assuming a lower weight than the real TEMP observations. The optimization of the nudging coefficients has been performed in Di Giuseppe. The horizontal spread of the resulting 1D-Var increments from these retrievals is set to around 15 km at 850 hPa and 10 km at 500 hPa. In all assimilation experiments, only profiles which fall into selected islands are used. In this way, a fair comparison is performed and differences in the forecast scores can be connected in a clear way to the number and quality of the retrievals obtained by the three methods.

Forecast scores are calculated for precipitation by comparison to an independent satellite product, for surface variables using the > 500 surface stations of the SYNOP network, and for upper-air variables against ECMWF analyses.

4.1.  Precipitation

The impact on precipitation during the first 24 h of free forecast is estimated for the three experiments against the Climate Prediction Center Morphing technique (CMORPH) satellite product, which produces global precipitation analyses at very high spatial and temporal resolution (Joyce et al., 2004). This dataset uses precipitation estimates derived exclusively from low-orbiter satellite microwave observations, whose features are transported via spatial propagation information obtained entirely from geostationary satellite infrared data. The dataset has a nominal resolution of 0.25°, therefore forecast scores have been calculated over averaged boxes of 20 km size. During the first few days of the experimental period, the north of Italy experienced convective activity which produced spots of precipitation over the eastern Alps and along the Croatian coast. Figure 11 shows the 24 h cumulative precipitation skill scores for the three assimilation experiments. The frequency bias score indicates whether the forecast system has a tendency to under-forecast (< 1) or over-forecast (> 1) events, but does not give an estimate of the accuracy of the forecast. A measure of the forecast accuracy is instead provided by the threat score, which measures the fraction of observed and/or forecast events that were correctly predicted. The frequency bias score > 1 shows that COSMO-I7 tends to overestimate precipitation, especially for high thresholds (> 20 mm). This explains the high false-alarm rate which, for intense events, can reach as much as 50% of the forecast cases. This overestimation is slightly reduced in the LEM experiment, probably as a consequence of the improved humidity analysis provided by this assimilation system.
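
The categorical scores discussed above follow from a 2×2 contingency table of forecast and observed threshold exceedances. The sketch below uses the standard definitions of frequency bias, threat score and false-alarm ratio; the function names, the 20 mm threshold applied to the sample, and the box-averaged precipitation values are hypothetical and are not taken from the experiments.

```python
def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms and correct negatives for one threshold (mm)."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def categorical_scores(hits, misses, false_alarms):
    """Standard scores; assumes at least one observed and one forecast event."""
    frequency_bias = (hits + false_alarms) / (hits + misses)   # > 1 means over-forecasting
    threat_score = hits / (hits + misses + false_alarms)       # 1 is a perfect forecast
    false_alarm_ratio = false_alarms / (hits + false_alarms)   # fraction of forecast events not observed
    return frequency_bias, threat_score, false_alarm_ratio


# Hypothetical 24 h accumulations (mm) on six verification boxes.
forecast = [25.0, 30.0, 5.0, 22.0, 0.0, 21.0]
observed = [24.0, 10.0, 3.0, 28.0, 26.0, 2.0]

hits, misses, false_alarms, correct_negatives = contingency(forecast, observed, 20.0)
fbi, ts, far = categorical_scores(hits, misses, false_alarms)
print(hits, misses, false_alarms, correct_negatives, fbi, ts, far)
```

In this made-up sample there are 2 hits, 1 miss and 2 false alarms, giving a frequency bias of 4/3 (over-forecasting), a threat score of 0.4 and a false-alarm ratio of 0.5, the same qualitative situation as the one described for high thresholds.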

Figure 11. 24 h cumulative precipitation skill scores for the three assimilation experiments.

4.2.  Surface variables

Surface (2 m) temperatures from the SYNOP network are not assimilated in the COSMO-I7 analysis cycle and are therefore used for verification. Bias and root mean square errors have been computed with respect to observations from the SYNOP network (about 500 surface stations) over the whole model domain. Figures 12 and 13 show domain mean bias and RMSE for the analyses computed at 0000 UTC and for the +12 h forecast for dry-bulb and dew-point temperatures. The biases in temperature and humidity confirm well-known problems of the COSMO model: warm and dry night-time biases and cold and wet daytime biases. The former prevents the establishment of stable stratified planetary boundary-layer conditions, while the latter inhibits daytime mixing and delays the triggering of local convection. The daily temperature cycle is therefore too weak. The causes of these errors have been clearly identified in deficiencies in the soil temperature and humidity initialization (Di Giuseppe et al., 2010) and are therefore difficult to correct in this context. In fact, the assimilation of 1D-Var profiles is confined to sea areas and analysis increments are applied to upper-air variables. Nevertheless, the three experiments do show some differences in bias, especially for moisture, which is correctly reduced during daytime and increased during night-time in the LEM and GEM methods. The NMC produces a smaller RMSE, probably due to the larger amount of data ingested.
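
For reference, the two surface scores can be computed as in the minimal sketch below. The sign convention (model minus observation, so a positive bias means the model is too warm) is an assumption of this sketch, and the station values are invented.

```python
import math


def bias_and_rmse(model, observed):
    """Mean error and root mean square error of model values against station observations."""
    errors = [m - o for m, o in zip(model, observed)]  # assumed convention: model minus observation
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Hypothetical 2 m temperatures (K) at four stations.
t2m_model = [291.0, 289.5, 293.0, 290.0]
t2m_synop = [290.0, 289.0, 291.5, 291.0]

bias, rmse = bias_and_rmse(t2m_model, t2m_synop)
print(bias, rmse)
```

Here the bias is 0.5 K while the RMSE is about 1.06 K. Errors of opposite sign cancel in the bias but not in the RMSE, which is why an experiment can improve one score while degrading the other.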

Figure 12. Bias and RMSE relative to the 2 m dry-bulb temperatures obtained for the eight days considered, at 0000 UTC (ANALYSIS) and at 1200 UTC (+12 h forecast). The lowest panel shows the total number of SYNOP stations used in the comparison.

Figure 13. As Figure 12, but for the dew-point temperature.

Therefore, even if eight days of statistical analysis cannot be considered robust enough to draw general conclusions, the overall small positive to neutral impact on the analysis/forecast is encouraging.

4.3.  Upper-air variables

Considering the small domain of COSMO and the limited number of TEMP stations, the upper-air verification is performed against ECMWF analyses. Figure 14 shows domain mean bias and RMSE for the analyses computed at 0000 UTC and for the +12 h forecast for the temperature and relative humidity fields at selected vertical levels. Errors are smaller than in Figures 12 and 13 because of the incomplete independence of the two models. The assimilation of different 1D-Var profiles in the three experiments produces a small improvement in the bias while causing a degradation of the RMSE. Since forecast scores often depend on various factors (e.g. the length and weather regime of the considered period, and the number, location, spatial resolution and weight of the assimilated observations), it is hoped that the expected operational tuning of the methodology could also be beneficial in this respect.

Figure 14. Vertical profiles of bias and root mean square error (RMSE) of COSMO versus ECMWF analysis for (a) temperature and (b) relative humidity fields averaged for the eight days considered, at 0000 UTC (ANALYSIS, thin lines) and at 1200 UTC (+12 h forecast, bold lines).

5.  Conclusions

This paper addresses the problem of characterizing the spatial components of model background-error covariance matrices for the purposes of data assimilation by proposing a method for the geographical selection of areas where error statistics are likely to be homogeneous and different from climatology. To this end, a simple idea is followed: the post-processed products of an ensemble forecasting system, COSMO-SREPS (Marsigli et al., 2006), are used to identify ‘islands’ of uniform error, as classified by means of a metric which takes into account both the homogeneity and size of the ensemble spread and the observation location. Error covariance matrices are then calculated locally over the specified islands. The starting assumption is thus that B matrices calculated from local statistics over the selected islands, as opposed to global domain-averaged ones, are more representative of actual background errors in those regions. The subsequent observation assimilation which uses those local B matrices is able to produce analysis increments representative of the local-scale processes, since the information is contained in the columns of the B matrix.
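
The contrast between local and domain-wide statistics can be sketched as follows. This is only an illustration under stated assumptions: perturbations are taken as member-minus-ensemble-mean profiles pooled over the chosen grid points, the island is given as a list of grid-point indices, and the island-selection metric itself is not reproduced. All names and numbers are hypothetical.

```python
def ensemble_perturbations(members, points):
    """Member-minus-ensemble-mean profiles at the selected grid points.

    members[m][g] is the vertical profile (list over levels) of member m at grid point g.
    """
    perturbations = []
    for g in points:
        n_levels = len(members[0][g])
        ens_mean = [sum(m[g][k] for m in members) / len(members) for k in range(n_levels)]
        for m in members:
            perturbations.append([m[g][k] - ens_mean[k] for k in range(n_levels)])
    return perturbations


def sample_covariance(profiles):
    """Levels-by-levels sample covariance of the pooled perturbation profiles."""
    n, n_levels = len(profiles), len(profiles[0])
    mean = [sum(p[k] for p in profiles) / n for k in range(n_levels)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in profiles) / (n - 1)
             for j in range(n_levels)] for i in range(n_levels)]


# Toy ensemble: 3 members, 4 grid points, 2 vertical levels (temperature in K).
members = [
    [[280.0, 270.0], [281.0, 271.0], [279.0, 269.0], [280.0, 270.0]],
    [[280.2, 270.1], [283.0, 273.5], [279.1, 269.0], [282.0, 272.0]],
    [[279.8, 269.9], [279.0, 268.5], [278.9, 269.0], [278.0, 268.0]],
]
island = [1, 3]  # hypothetical island: the grid points with large, similar spread

b_local = sample_covariance(ensemble_perturbations(members, island))
b_global = sample_covariance(ensemble_perturbations(members, range(4)))
print(b_local[0][0], b_global[0][0])
```

In this toy case the island variance at the first level is about 3.2 K², against about 1.46 K² when all four grid points are pooled, because the low-spread points dilute the domain-wide statistics. Note that the pooled perturbations are not independent samples, so the n − 1 normalization is a simplification.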

The new methodology (called the Local Ensemble Method, LEM) has been compared with commonly used background-error models such as the NMC method and a GEM similar to Houtekamer et al. (1996), Fisher (2003), and Pereira and Berre (2006), which use global or climatological statistics. The LEM has then been tested in a 1D-Var system designed for the assimilation of temperature and humidity profiles from satellite radiances into the regional COSMO model (Di Giuseppe et al., 2009). Statistics calculated over a period of eight days starting on 8 August 2007 showed that LEM produces variance profiles characterized by spiky structures when compared to the other two methods, which employ global statistics. This shows that the new approach is able to effectively reproduce background covariances characteristic of the smaller scales. As a consequence, better retrieved profiles were produced, as was demonstrated by the reduction of the background variance provided by the analysis. Nevertheless, using the LEM approach has the drawback that a smaller number of retrieved profiles are considered good enough to be assimilated. This is not found to be detrimental to the overall quality of the produced analysis and of the consequent forecasts, which were verified against independent observations. In particular, bias and RMSE of the 2 m temperature fields, computed over the whole domain, showed a small but positive impact of the LEM methodology.

The final aim of this paper was to verify the relevance of using local statistics to characterize local sources of error, and the localization has been tested in a simplified 1D context where only vertical error correlations are needed. Nevertheless, given the results, the extension of the presented idea to 3D-Var systems is certainly an option. However, this would require important technical problems to be faced such as, for example, how to handle the transition between one island and another while retaining the needed 3D structure of the B matrix. A foreseen solution could be to extend the concept of ‘island’ to a local smoothing/deformation of mean structure functions. In this case, the B matrix would be a merger between a component coming from global statistics and opportunely selected Bs from identified regions. How this can be achieved and the general feasibility of this extension is left to future investigations.

Acknowledgements

The regional model COSMO is developed within the COSMO Consortium; full details can be found on http://www.cosmo-model.org. The operational suite COSMO-I7 is managed by ARPA-SIMC through a three-party agreement, signed by the Meteorological Office of the Italian Air Force (USAM), ARPA-Piemonte and ARPA-SIMC. SEVIRI data are provided by EUMETSAT and licensed by USAM. Reinhold Hess, Christoph Schraff, Blazej Krzeminski and Marco Elementi contributed to the implementation of the 1D-Var system in the COSMO model. The INRIA (Institut National de Recherche en Informatique et Automatique) provided the M1QN3 minimization code. Thanks are due to Frédéric Chevallier for providing the core 1D-Var algorithm. This paper has also benefited from the many suggestions provided by three anonymous reviewers, to whom we are sincerely indebted. This work is part of the activities of ARPA-SIMC funded by the Italian National Department for Civil Protection (Protezione Civile Nazionale, Roma).

References

  • Bannister RN. 2008. A review of forecast-error covariance statistics in variational data assimilation. I: Characteristics and measurements of forecast-error covariances. Q. J. R. Meteorol. Soc. 134: 1951–1970.
  • Berre L, Pannekoucke O, Desroziers G, Stefanescu S, Chapnik B, Raynaud L. 2007. ‘A variational assimilation ensemble and the spatial filtering of its error covariances: Increase of sample size by local spatial averaging’. In: Proceedings of the workshop on flow-dependent aspects of data assimilation. ECMWF: Reading, UK. 11–13.
  • Bouttier F. 1994. A dynamical estimation of forecast error covariances in an assimilation system. Mon. Weather Rev. 122: 2376–2390.
  • Daley R. 1993. Atmospheric data analysis. Cambridge University Press: Cambridge, UK.
  • Di Giuseppe F, Cesari D, Bonafé G. 2010. Soil initialization strategy for use in mesoscale weather prediction systems. Mon. Weather Rev. submitted.
  • Di Giuseppe F, Elementi M, Cesari D, Paccagnella T. 2009. The potential of variational retrieval of temperature and humidity profiles from Meteosat Second Generation observations. Q. J. R. Meteorol. Soc. 135: 225–237.
  • Evensen G. 2003. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn. 53: 343–367.
  • Fisher M. 2003. ‘Background-error covariance modelling’. In: Proceedings of seminar on recent developments in data assimilation for atmosphere and ocean. ECMWF: Reading, UK. 45–63.
  • Gilbert JC, Lemaréchal C. 1989. Some numerical experiments with variable-storage quasi-Newton algorithms. Math. Programming 45: 407–435.
  • Hamill T. 2006. Ensemble-based atmospheric data assimilation. In Predictability of Weather and Climate. Palmer TN, Hagedorn R. (eds.) Cambridge University Press: Cambridge, UK. 124–156.
  • Harris BA, Kelly G. 2001. A satellite radiance-bias correction scheme for data assimilation. Q. J. R. Meteorol. Soc. 127: 1453–1468.
  • Houtekamer P, Mitchell H. 1998. Data assimilation using an Ensemble Kalman Filter technique. Mon. Weather Rev. 126: 796–811.
  • Houtekamer P, Lefaivre L, Derome J, Ritchie H, Mitchell H. 1996. A system simulation approach to ensemble prediction. Mon. Weather Rev. 124: 1225–1242.
  • Joyce R, Janowiak J, Arkin PA, Xie P. 2004. CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeorol. 5: 487–503.
  • Kain J, Fritsch J. 1990. A one-dimensional entraining/detraining plume model and its application in convective parameterization. J. Atmos. Sci. 47: 2784–2802.
  • Lindskog M, Vignes O, Gustafsson N, Landelius T, Thorsteinsson S. 2007. ‘Background errors in HIRLAM variational data assimilation’. In: Proceedings of Workshop on Flow-dependent Aspects of Data Assimilation, 11–13 June 2007. ECMWF: Reading, UK.
  • Marsigli C, Montani A, Paccagnella T. 2006. ‘The COSMO-SREPS project’. In: Newsletter of the 28th EWGLAM and 13th SRNWP meetings. SRNWP, 9–12.
  • Montmerle T, Lafore J, Berre L, Fischer C. 2006. Limited-area model error statistics over Western Africa: Comparisons with midlatitude results. Q. J. R. Meteorol. Soc. 132: 213–230.
  • Ott E, Hunt B, Szunyogh I, Zimin A, Kostelich E, Corazza M, Kalnay E, Patil D, Yorke J. 2004. A local ensemble Kalman filter for atmospheric data assimilation. Tellus A 56: 415–428.
  • Pannekoucke O, Berre L, Desroziers G. 2008. Background-error correlation length-scale estimates and their sampling statistics. Q. J. R. Meteorol. Soc. 134: 497–508.
  • Parrish DF, Derber JC. 1992. The National Meteorological Center's spectral statistical-interpolation analysis system. Mon. Weather Rev. 120: 1747–1763.
  • Pereira M, Berre L. 2006. The use of an ensemble approach to study the background error covariances in a global NWP model. Mon. Weather Rev. 134: 2466–2489.
  • Schmetz J, Pili P, Tjemkes S, Just D, Kerkmann J, Rota S, Ratier A. 2002. An introduction to Meteosat Second Generation (MSG). Bull. Amer. Meteorol. Soc. 83: 977–992.
  • Široká M, Fischer C, Cassé V, Brožková R, Geleyn J. 2003. The definition of mesoscale selective forecast-error covariances for a limited-area variational analysis. Meteorol. Atmos. Phys. 82: 227–244.
  • Steppeler J, Doms G, Schättler U, Bitzer HW, Gassmann A, Damrath U, G G. 2003. Meso-gamma scale forecasts using the non-hydrostatic model LM. Meteorol. Atmos. Phys. 82: 75–96.
  • Tiedtke M. 1989. A comprehensive mass flux scheme for cumulus parameterization in large-scale models. Mon. Weather Rev. 117: 1779–1800.
  • Zappa M, Rotach M, Arpagaus M, Dorninger M, Hegg C, Montani A, Ranzi R, Ament F, Germann U, Grossi G, Jaun S, Rossa A, Vogt S, Walser A, Wehrhan J, Wunram C. 2008. MAP D-PHASE: Real-time demonstration of hydrological ensemble prediction systems. Atmos. Sci. Lett. 9: 80–87.
  • * COSMO-SREPS is an ensemble system for the short range developed within the COnsortium for Small-scale MOdelling (COSMO; http://www.cosmo-model.org; Marsigli et al., 2006).

  • † These runs are extracted from the Spanish Meteorological Agency (AEMET) Short-Range Ensemble Prediction System (SREPS).

  • ‡ COSMO-I7 is the ARPA-SIMC implementation of the COSMO model.

  • § The weighting function provides information on which regions of the atmosphere are affecting the satellite measurements. It is determined by the absorption properties of gases, their vertical concentration, temperature and pressure.