Diagnostic and impact studies of a wavelet formulation of background-error correlations in a global model

Abstract

A wavelet formulation on the sphere is being considered for modelling heterogeneous background-error correlations for the Météo-France global numerical weather prediction (NWP) model. This approach is compared with the operational spectral formulation, which is horizontally homogeneous to a large extent. Diagnostic studies have been conducted to examine geographical variations of three-dimensional correlations over the whole globe. Results indicate that the contrast between relatively broad horizontal correlations in the Tropics and sharp ones in midlatitudes is well represented by the wavelet formulation. Heterogeneities in vertical correlations are also better captured in the wavelet approach than in the spectral one, with visible changes as functions of e.g. latitude and land/sea contrasts. In addition, wavelet-based correlation estimates are shown to be partly sensitive to the choice of the calibration period. The impact of the wavelet formulation on the forecast quality has been investigated during a three-week calibration period, and also during the following three weeks. While the impact of the wavelet formulation is globally positive during the two periods, it tends to be more spectacular during the calibration time interval, as expected. These results indicate that an on-line calibration should be considered in the future, in order to exploit fully the ability of wavelets to extract correlation heterogeneities from ensemble data. Copyright © 2011 Royal Meteorological Society

1. Introduction

Data-assimilation methods based on variational or Kalman-filter techniques are widely used to determine initial states for numerical weather prediction (NWP). This is done by combining direct and indirect observations of meteorological fields, as well as a background (a priori) state corresponding to a short-range forecast. Specifying realistic background-error covariances is particularly crucial for correctly weighting and propagating observed information (Bannister, 2008). In a typical NWP system, the state vector currently has a dimension of about $10^{8}$, leading to a background-error covariance matrix (denoted by B) containing about $10^{16}$ elements. It is thus impossible to compute such a matrix explicitly, firstly because it cannot be stored in current computers and secondly because of a lack of available information (Dee, 1995). In variational systems, a solution is to model the B matrix in the form of sparse operators applied sequentially. In particular, a spectral block-diagonal approach is convenient to represent spatial covariances and scale dependences in an economical way (Rabier et al., 1998; Derber and Bouttier, 1999). However, it relies on a horizontal homogeneity assumption, which prevents the representation of horizontal and temporal variations of covariances (Berre and Desroziers, 2010). This can be relaxed partly (Fisher, 2003) by representing local standard deviations in grid-point space and spatial correlations in wavelet space, and by using flow-dependent balance equations. Such a model is now commonly calibrated by using data provided by ensemble variational assimilation experiments (Fisher, 2003; Belo Pereira and Berre, 2006). The calibration is often performed off-line to obtain a climatological estimate of B, by using an ensemble experiment carried out over a period of a few weeks.
However, real-time ensemble variational assimilation systems are also being developed, and they have been running operationally at Météo-France since 2008 (Berre et al., 2007; Berre et al., 2009) and also at the European Centre for Medium-Range Weather Forecasts (ECMWF) since 2010 (Bonavita et al., 2011). Such real-time ensembles are currently used to calculate flow-dependent background-error standard deviations, and also to provide perturbed initial states for ensemble prediction systems. Researchers are also considering, in future, the use of such real-time ensembles to calculate flow-dependent correlations in wavelet space for instance. However, the first operational implementations such as at ECMWF are currently restricted to climatological correlation estimates in wavelet space. Moreover, these climatological wavelet-based correlations have not been much documented yet, not only in terms of diagnostics regarding associated geographical variations (in particular for vertical correlations) but also in terms of impact on the forecast quality. The goal of this article is thus to document the wavelet representation of geographical variations of three-dimensional (3D) correlations in a global NWP context, and also their impact on forecast quality.
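
As a rough illustration of the storage argument made above, the following Python sketch counts the elements of an explicit B matrix for a state dimension of about 10^8. The 8-byte (double-precision) element size is an assumption made here for illustration only; it is not a figure taken from the text.

```python
# Back-of-the-envelope size of an explicit background-error covariance matrix.
state_dim = 10**8          # approximate state-vector dimension quoted in the text
bytes_per_element = 8      # ASSUMPTION: double-precision storage

n_elements = state_dim ** 2                    # full n x n matrix
n_unique = state_dim * (state_dim + 1) // 2    # symmetric matrix: upper triangle only
petabytes = n_elements * bytes_per_element / 1e15

print(f"elements: {n_elements:.1e}")
print(f"unique (symmetric) elements: {n_unique:.1e}")
print(f"full storage: {petabytes:.0f} PB")
```

Even when only the symmetric half is kept, the element count remains of order 10^16, which is why B is modelled through operators rather than stored.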

The structure of the article is as follows. In section 2, the formulation of B as a sequence of sparse operators is presented, with either spectral- or wavelet-based correlations. In section 3, a diagnostic comparison between spectral and wavelet approaches is shown, with regard to 3D correlations and sensitivity to the choice of calibration period. Section 4 is an evaluation of the impact of the wavelet B matrix on the global model forecast quality for two different periods. Finally, conclusions are given in section 5.

2. Spectral and wavelet formulations of the B matrix

2.1. Formulation of B using sparse matrices and spectral modelling of correlations

The background-error covariance matrix B in variational systems typically includes 3D auto- and cross-covariances of vorticity, divergence, temperature, logarithm of surface pressure and specific humidity. The approach currently employed for modelling B is to decompose it as a sequence of sparse operators:

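$$\mathbf{B} = \mathbf{K}\,\boldsymbol{\Sigma}\,\mathbf{C}\,\boldsymbol{\Sigma}^{\mathrm{T}}\,\mathbf{K}^{\mathrm{T}} \tag{1}$$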

K is the so-called balance operator, which accounts for cross-covariances between the mass and wind variables. It is constructed from a combination of flow-dependent balance equations (such as the non-linear balance equation and the quasi-geostrophic omega equation; Fisher, 2003) and linear regressions between mass and wind variables (Derber and Bouttier, 1999; Berre and Desroziers, 2010). Linear regressions are calculated in a scale-dependent way, through calculation of block-diagonal cross-covariances in spectral space. Σ is a diagonal matrix of local background-error standard deviations in grid-point space and C is a 3D correlation matrix.
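
The idea of applying B as a sequence of operators, rather than storing it, can be sketched on a toy problem. The example below assumes that the factorization takes the symmetric form B = K Σ C Σ^T K^T; the matrices `K`, `sigma` and `C` are small hypothetical stand-ins and do not represent the operators of any operational system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # toy state dimension

# ASSUMPTION: toy stand-ins for the three operators.
K = np.eye(n) + np.tril(0.1 * rng.standard_normal((n, n)), k=-1)  # "balance" operator
sigma = rng.uniform(0.5, 2.0, size=n)                              # diagonal of Sigma
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
C = np.exp(-0.5 * (dist / 1.5) ** 2)                               # correlation matrix


def apply_B(x):
    """Apply B = K Sigma C Sigma^T K^T to x, one operator at a time."""
    y = K.T @ x
    y = sigma * y   # Sigma^T (diagonal, so an element-wise product)
    y = C @ y
    y = sigma * y   # Sigma
    return K @ y


# Explicit assembly is only feasible for toy sizes; it is used here as a check.
B = K @ np.diag(sigma) @ C @ np.diag(sigma).T @ K.T
x = rng.standard_normal(n)
print(np.allclose(apply_B(x), B @ x))
```

In a variational system only the action of each operator on a vector is needed, which is what `apply_B` mimics.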

In the operational implementation at Météo-France, the correlation matrix is calculated as a block-diagonal matrix in spectral space, corresponding to height- and scale-dependent 3D covariances of normalized spectral coefficients of background error. These height and scale dependences allow two non-separable features to be represented, namely the increase of horizontal length-scales with height and sharper vertical correlations for small horizontal scales (Rabier et al., 1998). On the other hand, this spectral block-diagonal approach is associated with an assumption of horizontal homogeneity (Courtier et al., 1998), which implies that 3D spatial correlation functions are implicitly calculated from a global spatial average of local correlation functions (Pannekoucke et al., 2007; Berre and Desroziers, 2010).
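
The link between a diagonal covariance in spectral space and horizontal homogeneity can be checked on a 1D periodic domain. The sketch below is only an analogue: it uses Fourier modes on a circle in place of spherical harmonics, and the Gaussian variance spectrum is an arbitrary assumption.

```python
import numpy as np

n = 64
k = np.fft.fftfreq(n, d=1.0 / n)             # integer wave numbers
variance_spectrum = np.exp(-(k / 6.0) ** 2)  # ASSUMPTION: arbitrary smooth spectrum

# Grid-point covariance implied by a diagonal covariance in spectral space:
# B = F^{-1} diag(spectrum) F, which is real and circulant.
F = np.fft.fft(np.eye(n), axis=0)            # DFT matrix
B = np.real(np.linalg.inv(F) @ np.diag(variance_spectrum) @ F)

std = np.sqrt(np.diag(B))
corr = B / np.outer(std, std)

# Homogeneity: the correlation function at point 20 is the one at point 0, shifted.
print(np.allclose(np.roll(corr[0], 20), corr[20]))
```

Every row of `corr` is a shifted copy of the same function, which is the 1D counterpart of the global spatial averaging mentioned above.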

Linear regressions, local standard deviations and 3D correlations can be calibrated from data provided by an ensemble variational experiment (Fisher, 2003; Belo Pereira and Berre, 2006). Moreover, as shown for example in figure 6 of Fisher et al. (2005) and in figure 6 of Raynaud et al. (2009), globally averaged covariances are relatively static in time and therefore are not very sensitive to the choice of calibration period (Derber and Bouttier, 1999, end of their section 2). The spectral part of the matrix B is thus usually calibrated off-line from a 5–10 member ensemble run over a past period of a few weeks. Hence, although it restricts the representation of flow-dependent covariances, this static property of the spectral formulation gives some technical flexibility in the operational implementation of this formulation, in the sense that the calibration does not need to be redone at each analysis time.

2.2. Spherical wavelet expansion of the background-error field

In order to represent heterogeneous 3D correlation functions, a wavelet block-diagonal formulation of C has been proposed by Fisher (2003), which is currently operational in the ECMWF variational system and which has been used in this study. This subsection describes the spherical wavelets used in this study, while the associated wavelet formulation of the correlation matrix is presented in the next subsection. The direct and inverse wavelet transforms of the background-error field ε are defined by convolution products (denoted by ⊗) with radial band-pass functions ψj for different scales j, corresponding to the following two equations with respect to the wavelet field $\hat{\varepsilon}_j$ of background error:

$$\hat{\varepsilon}_j = \psi_j \otimes \varepsilon \tag{2}$$
$$\varepsilon = \sum_{j=0}^{J} \psi_j \otimes \hat{\varepsilon}_j \tag{3}$$

These wavelet functions ψj are band-limited and defined in spectral space. Convolutions in Eqs (2) and (3) can be calculated in the form of products in the space of spherical harmonics (Courtier et al., 1998; Fisher, 2004). The spherical harmonic coefficients of ε can be denoted $\varepsilon_n^m$, where m and n are respectively the zonal and total wave numbers on the sphere. As shown in eqs (17), (27) and (32) of Fisher (2004), the convolution associated with Eq. (2) corresponds to the following product in spectral space, which gives the spherical harmonic coefficients $(\hat{\varepsilon}_j)_n^m$ of the wavelet field at scale j:

$$(\hat{\varepsilon}_j)_n^m = \Psi_j(m,n)\,\varepsilon_n^m \tag{4}$$

where, for a sequence $(N_j)_{j\in[0,J]}$ with $N_j < N_{j+1}$, the spectral coefficients $\Psi_j(m,n)$ are given, for $j \neq 0$, by:

equation image(5)

Here, the coefficients appearing in Eq. (5) are the spherical harmonic coefficients of $\psi_j$ for m = 0. For j = 0 the definition is the same, except that the range $N_{j-1} < n < N_j$ is replaced by $0 \le n < N_0$, for which $\Psi_j(m,n) = 1$. These coefficients are considered for the following set $(N_j)_{j\in[0,J]}$, as in Fisher (2003): $\{N_j\}$ = {0, 1, 2, 3, 5, 7, 10, 15, 21, 31, 47, 63, 95, 127, 159, 215, 255, 319, 399}. Wavelet functions can also be represented in grid-point space, as shown in Figure 1 in a 1D circular context (see also figures 9 and 10 of Fisher (2003)). Figure 1 illustrates the fact that each wavelet function has both a specific position and a specific scale. The associated wavelet coefficients of the transformed error field correspond to error values at scale j and position $x_j(i)$ on a grid whose resolution depends on j (with $i = 1, \ldots, N_x(j)$).

Figure 1.

Examples of wavelet functions on a 1D circle, for different scales j and different positions xj(i) located on the grid of scale j (from Pannekoucke et al., 2007).
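
A band-limited family of this kind can be sketched numerically from the set of N_j values listed above. The functional form used below (a square-root "hat" function that rises between N_{j-1} and N_j and falls between N_j and N_{j+1}) is an assumption made for illustration; it should not be read as the exact definition used in the paper. With this choice the squared coefficients sum to one at every total wave number, which is the condition for a direct transform followed by an inverse transform to return the original field.

```python
import numpy as np

# Band limits N_j quoted in the text.
N = [0, 1, 2, 3, 5, 7, 10, 15, 21, 31, 47, 63, 95, 127, 159, 215, 255, 319, 399]


def band_pass(j, n):
    """ASSUMED form of Psi_j(n): square-root 'hat' function peaking at n = N[j]."""
    n = np.asarray(n, dtype=float)
    psi2 = np.zeros_like(n)
    if j > 0:
        # Rising branch on N[j-1] <= n < N[j].
        rise = (n >= N[j - 1]) & (n < N[j])
        psi2[rise] = (n[rise] - N[j - 1]) / (N[j] - N[j - 1])
    if j < len(N) - 1:
        # Falling branch on N[j] <= n < N[j+1].
        fall = (n >= N[j]) & (n < N[j + 1])
        psi2[fall] = (N[j + 1] - n[fall]) / (N[j + 1] - N[j])
    else:
        # Last scale: keep the truncation wave number itself.
        psi2[n == N[j]] = 1.0
    return np.sqrt(psi2)


n = np.arange(N[-1] + 1)                      # total wave numbers 0..399
Psi = np.array([band_pass(j, n) for j in range(len(N))])

# Partition of unity: sum_j Psi_j(n)^2 = 1 for every n.
print(np.allclose((Psi ** 2).sum(axis=0), 1.0))
```

Each row of `Psi` is non-zero only over a limited band of wave numbers, and the bands widen with j, so that fine scales are sampled more coarsely in spectral space.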

2.3. Wavelet formulation of the correlation matrix

The wavelet-correlation model is calibrated from ensemble data by projecting the normalized grid-point background-error field $\varepsilon' = \boldsymbol{\Sigma}^{-1}\varepsilon$ to wavelet space (where $\boldsymbol{\Sigma}$ is the background-error standard deviation in grid-point space), followed by the calculation of the block-diagonal matrix $\mathbf{V}_W$, which contains vertical covariances of each wavelet component of $\varepsilon'$. Note that in this matrix $\mathbf{V}_W$ each diagonal block corresponds to vertical covariances for a given wavelet scale and location, whereas the covariances between different horizontal scales and between different horizontal grid points of the wavelet grids are both neglected (i.e. the corresponding off-diagonal blocks are not computed). The wavelet-based correlation model $\mathbf{C}_W$ thus corresponds to

$$\mathbf{C}_W = \mathbf{W}^{-1}\,\mathbf{V}_W\,\mathbf{W}^{-\mathrm{T}} \tag{6}$$

where $\mathbf{W}^{-1}$ is the inverse wavelet transform. As wavelet functions are localized in both spectral and grid-point spaces, wavelet coefficients contain information about both scale and position. Typically, small-scale wavelet coefficients of background error tend to have a larger amplitude (i.e. variance) in regions where correlation functions are sharper. In other words, the diagonal part of $\mathbf{V}_W$ contains information about geographical variations of local horizontal correlation functions. Similarly, the off-diagonal part of the diagonal blocks of $\mathbf{V}_W$ contains information about heterogeneities of vertical correlations. Another appealing property of wavelets is that, as shown by Pannekoucke et al. (2007), wavelet-based correlation functions $c_W(x,s)$ can be seen as local spatial averages of raw ensemble-based local correlation functions $c(x',s')$:

$$c_W(x,s) \approx \sum_{x'} \sum_{s'} c(x',s')\,\phi_{x,s}(x',s') \tag{7}$$

where x is a reference position and s is the separation distance considered (from x). c(x′,s′) is the raw correlation function for the reference position x′ and separation s′, calculated from the ensemble data, and ϕ is a weighting function, which gives more weight to values (x′,s′) that are close to (x,s). As discussed in Pannekoucke et al. (2007) and Berre and Desroziers (2010), this property implies that the wavelet formulation allows sampling noise to be reduced, through spatial filtering. On the other hand, because wavelet-based correlations correspond to local spatial averages (instead of a global spatial average in the spectral formulation) they are likely to be more sensitive to the choice of calibration period. This aspect will be studied in sections 3 and 4, by comparing wavelet-based correlations calibrated over two different periods and also by comparing the impact of the wavelet formulation on the forecast quality over two different periods.
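
The way in which a diagonal covariance in wavelet space produces geographically varying correlations can be sketched on a 1D circle. Everything in the example below is a simplifying assumption: the band limits `N`, the square-root "hat" filters, the absence of grid thinning at coarse scales and the prescribed (rather than ensemble-calibrated) wavelet variances. The correlation model is built as W^{-1} V W^{-T}, with more small-scale variance specified in the first half of the domain.

```python
import numpy as np

n = 64                                     # grid points on the circle
N = [0, 1, 2, 4, 8, 16, 32]                # ASSUMPTION: toy band limits
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # |wave number| of each FFT coefficient


def band_pass(j):
    """ASSUMED square-root 'hat' filter peaking at wave number N[j]."""
    psi2 = np.where(k == N[j], 1.0, 0.0)
    if j > 0:
        rise = (k > N[j - 1]) & (k < N[j])
        psi2[rise] = (k[rise] - N[j - 1]) / (N[j] - N[j - 1])
    if j < len(N) - 1:
        fall = (k > N[j]) & (k < N[j + 1])
        psi2[fall] = (N[j + 1] - k[fall]) / (N[j + 1] - N[j])
    return np.sqrt(psi2)


# One circulant (convolution) block per scale, stacked into the transform W.
F = np.fft.fft(np.eye(n), axis=0)
F_inv = np.linalg.inv(F)
blocks = [np.real(F_inv @ np.diag(band_pass(j)) @ F) for j in range(len(N))]
W = np.vstack(blocks)                      # shape (len(N) * n, n); here W^{-1} = W^T

# Diagonal wavelet variances, one value per scale and position.
# A wider spectrum (more small-scale variance) is specified for x < n/2.
x = np.arange(n)
width = np.where(x < n // 2, 8.0, 3.0)
V = np.concatenate([np.exp(-(N[j] / width) ** 2) for j in range(len(N))])

C = W.T @ (V[:, None] * W)                 # W^{-1} V W^{-T}
d = np.sqrt(np.diag(C))
corr = C / np.outer(d, d)                  # normalized to unit diagonal for comparison

# Correlation at a separation of 3 grid points, in the middle of each half.
print(corr[16, 19], corr[48, 51])
```

The correlation at a given separation is smaller in the half where small-scale wavelet variances are larger, i.e. the implied correlation function is sharper there, although V itself is diagonal.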

2.4. Experimental ensemble data

In order to compare spectral and wavelet approaches, these two formulations of B have been calibrated from common data provided by an ensemble four-dimensional variational analysis (4D-Var) experiment, which has been run between 15 February and 7 March 2010 (21 days, with four analysis networks for each day, leading to a total number of networks equal to Nt = 84). This ensemble is similar to the current Météo-France operational ensemble 4D-Var version (Berre et al., 2007; Berre et al., 2009). It uses explicit perturbations of observations and implicit perturbations of the background, provided by the perturbation evolution during the data-assimilation cycling (Houtekamer et al., 1996; Fisher, 2003; Belo Pereira and Berre, 2006). The ensemble has been conducted in the Arpege system (Courtier et al., 1991), with N = 6 ensemble members, 70 vertical levels and a spectral truncation equal to T399 in the non-linear forecast model part and T107 in the inner loop of the minimization. The same wide range of data types is used as in the deterministic system: surface observations, aircraft data, sea-surface observations (e.g. drifting buoys, ship reports), in situ sounding data, wind-profiler radar data, geostationary satellite winds (atmospheric motion vectors), Global Positioning System (GPS) ground-based data and radiances from polar-orbiting and geostationary satellites (e.g. AMSU-A/B, AIRS, SSMI, IASI, SEVIRI). The diagnostic study of spectral and wavelet formulations has been made for both horizontal and vertical correlations, which are compared with Cens,clim, namely raw ensemble-based correlation estimates (considered as a reference) derived from Bens,clim, the raw ensemble-based climatological estimate of B:

equation image(8)
equation image(9)

where equation image is the difference between the background state equation image of member k(= 1,...,N) at time t(= 1,...,Nt) and the ensemble mean equation image. The sensitivity to the choice of the calibration period has been studied by looking at diagnostic results obtained with ensemble data corresponding to a subsequent three-week period, namely 8–28 March 2010.
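
The computation of raw ensemble-based climatological covariances and correlations can be sketched as follows. The array `backgrounds` is synthetic random data standing in for the ensemble 4D-Var output, the state size is a toy value, and the 1/(N − 1) normalization per analysis time is an assumption that may differ from the convention used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, n_times, n_state = 6, 84, 5   # N and Nt as in the text; toy state size

# ASSUMPTION: synthetic backgrounds with shape (time, member, state).
backgrounds = rng.standard_normal((n_times, n_members, n_state))

# Perturbations: departure of each member from the ensemble mean at each time.
perts = backgrounds - backgrounds.mean(axis=1, keepdims=True)

# Climatological covariance: outer products averaged over members and times.
# ASSUMPTION: unbiased 1/(N - 1) normalization at each analysis time.
B_ens = np.einsum("tki,tkj->ij", perts, perts) / (n_times * (n_members - 1))

# Correlations: normalize by the ensemble-based standard deviations.
std = np.sqrt(np.diag(B_ens))
C_ens = B_ens / np.outer(std, std)

print(np.round(C_ens, 2))
```

With N = 6 members, the time dimension (Nt = 84) supplies most of the sample size, which is why such raw estimates are climatological rather than flow-dependent.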

3. Diagnostic study of wavelet-based 3D correlation heterogeneities

In this section, a diagnosis of correlation heterogeneities represented by the wavelet formulation, when applied to an Arpege ensemble assimilation dataset, is presented. While previous diagnostic results such as in Fisher (2003) have focused on 3D correlations for two examples of local grid points, the present study will include a diagnosis of 3D correlations over the whole globe. Moreover, while Pannekoucke et al. (2007) show maps of horizontal length-scales for surface pressure, results for horizontal length-scales of wind and vertical correlations of temperature will be presented here. Finally, the sensitivity to the choice of the calibration period is also investigated at the end of this section.

3.1. Horizontal correlation length-scales

One way to diagnose how horizontal correlations vary over the globe is to compute local correlation length-scales (Belo Pereira and Berre, 2006; Pannekoucke et al., 2008) and to plot them as a geographical field. For instance, by analogy with eq. (4) in Belo Pereira and Berre (2006), a local length-scale of wind can be defined from respective local variances of zonal wind u, meridional wind v and their spatial derivatives, namely vorticity ζ and divergence η, as follows:

equation image(10)

where $\sigma^2(\cdot)$ is the local variance for a given variable. These length-scale geographical fields can be calculated from specified spectral- and wavelet-based formulations of B by using a randomization technique (Andersson et al., 2000), and also directly from the ensemble 4D-Var experiment. As discussed in Fisher and Courtier (1995), the randomization technique allows e.g. background-error variances to be diagnosed for variables such as u and v, which are not explicitly specified in the B formulation considered (which is based on vorticity and divergence instead, as mentioned in section 2.1). In the case of the spectral formulation of B, as expected, the length-scale field is horizontally homogeneous and its values $L_{\mathrm{spec}}(u,v)$ are nearly equal to 101 km at a level near 500 hPa. Wavelet-based and ensemble-based length-scale fields are shown as global maps in Figure 2 and in terms of latitudinal profiles in Figure 3. Note that, in order to reduce sampling noise effects in the diagnosis, variance fields have been spatially filtered by using eq. (15) of Raynaud et al. (2009) with $N_{\mathrm{trunc}}$ = 42 before calculating the associated length-scale field. Figure 2(a) indicates that ensemble-based length-scales are larger in the Tropics than in the midlatitudes (in accordance with e.g. Ingleby (2001)), with the largest values located in tropical oceanic areas. Moreover, length-scale values tend to be relatively small over Europe and the Himalayas, reflecting effects of data density and orography respectively. These complex latitudinal and geographical variations in the length-scale field are well captured by the wavelet formulation of B (Figures 2(b) and 3), although the range of minimum and maximum length-scale values is less wide than in the raw ensemble data. This is a first illustration of the ability of the wavelet formulation to represent horizontal heterogeneities in horizontal correlation functions.
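
The randomization technique mentioned above can be sketched on a toy problem: a square-root operator of B is applied to unit-variance random vectors, and the sample variance of the results estimates the diagonal of B. The covariance matrix, the use of a Cholesky factor as the square root and the sample size are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40

# ASSUMPTION: toy covariance with exponential correlations and varying std.
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
std = np.linspace(1.0, 2.0, n)
B = np.outer(std, std) * np.exp(-dist / 3.0)

# Square-root operator; in a real system only its action on a vector is available.
B_half = np.linalg.cholesky(B)

# Randomization: B^{1/2} applied to unit-variance noise has covariance B.
n_samples = 5000
samples = B_half @ rng.standard_normal((n, n_samples))
var_estimate = (samples ** 2).mean(axis=1)

rel_err = np.abs(var_estimate - std ** 2) / std ** 2
print(f"largest relative error on the variances: {rel_err.max():.3f}")
```

The same random samples can be passed through any linear change of variable before the variance is computed, which is how variances of fields that are not control variables (such as u and v) can be diagnosed.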

Figure 2.

Horizontal correlation length-scales (in km) of wind near 500 hPa, diagnosed respectively from (a) raw ensemble data over the three-week calibration period and (b) the correspondingly calibrated wavelet formulation of B. The corresponding length-scales for the spectral formulation are nearly homogeneous and equal to 101 km (not shown). This figure is available in colour online at wileyonlinelibrary.com/journal/qj

Figure 3.

Horizontal correlation length-scales (in km) of wind near 500 hPa, represented as a function of latitude, diagnosed from raw ensemble data over the three-week calibration period (thin solid line), the correspondingly calibrated wavelet formulation (bold solid line) and the correspondingly calibrated spectral formulation (dashed line). Length-scale values are shown on the y-axis and the latitude is shown on the x-axis.

3.2. Horizontal correlation functions diagnosed from single-observation experiments

Another way to diagnose specified local correlation functions is to carry out single-observation assimilation experiments. This has been done by using first a wind observation located in the Tropics (0°N, 120°W) and then a wind observation in the midlatitudes (50°N, 10°E). Both observations are situated at pressure level 500 hPa, and the considered wind innovation is purely zonal in both cases. The analysis increments provided by the spectral and wavelet formulations of B are shown in Figure 4. These analysis increments have been normalized by their amplitude at the observation point, so that the plots can be considered as an approximate diagnosis of specified correlation functions under the assumption that variances are locally homogeneous. Due to this approximation, in the midlatitude example, the apparently larger values in the northeast direction than in the southwest one are likely to reflect variance heterogeneities that are not accounted for in the homogeneous normalization of analysis increments considered. Note also that local variances are similar, although not exactly identical, in the spectral and wavelet formulations, as shown in Fisher (2004) for instance.
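
The relation between a single-observation analysis increment and the specified covariances can be sketched in a toy 1D setting. The covariance matrix, observation position and error variances below are hypothetical. The increment is proportional to one column of B, so that after normalization by its value at the observation point it equals the correlation function multiplied by a ratio of standard deviations; it reduces to the correlation function only where variances are locally homogeneous.

```python
import numpy as np

n = 50
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
std = np.linspace(1.0, 1.5, n)                # ASSUMPTION: slowly varying std
corr = np.exp(-0.5 * (dist / 4.0) ** 2)       # ASSUMPTION: toy correlation model
B = np.outer(std, std) * corr

obs_index, innovation, obs_error_var = 20, 1.0, 0.5
H = np.zeros((1, n))
H[0, obs_index] = 1.0                          # observation operator: one grid point

# Analysis increment: dx = B H^T (H B H^T + R)^{-1} d
gain = B @ H.T / (H @ B @ H.T + obs_error_var)
increment = (gain * innovation).ravel()

# Normalization by the amplitude at the observation point.
normalized = increment / increment[obs_index]
expected = corr[:, obs_index] * std / std[obs_index]
print(np.allclose(normalized, expected))
```

The factor `std / std[obs_index]` is the term that is neglected when normalized increments are interpreted as correlation functions.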

Figure 4.

Analysis increments for zonal wind near 500 hPa obtained by assimilating one observation of zonal wind at 500 hPa. The results correspond to the use of, respectively, ((a), (c)) a spectral formulation of B and ((b), (d)) a wavelet formulation of B. Two observation points have been considered: ((a), (b)) one located at (50°N, 10°E) and ((c), (d)) another located at (0°N, 120°W).

The comparison between the normalized increments in the two different locations allows the representation of correlation heterogeneities to be compared between the two formulations. It can be noted that specified horizontal correlation functions are different in the two regions, not only for the wavelet formulation but also for the spectral one. This is partly due to the specification of geographically varying background-error variances of vorticity and divergence. In the midlatitudes the vorticity variance is larger than the divergence variance, which tends to make the zonal wind correlation function somewhat elongated in the zonal direction (Daley, 1985). In this midlatitude example, the combination of this zonal elongation and variance heterogeneities leads to a specific anisotropy in the normalized increment field. In the Tropics, vorticity and divergence have similar variance values, which makes the zonal wind correlation function nearly isotropic. Moreover, the wavelet-based correlation functions exhibit differences from the spectral-based correlation functions which are consistent with Figure 2: in the midlatitudes the correlation function is sharper in the wavelet formulation than in the spectral one, whereas the reverse holds in the Tropics (i.e. the tropical correlation function is broader in the wavelet formulation).

3.3. Geographical variations of vertical correlations

The wavelet approach is likely to modify not only horizontal correlations but also vertical correlations. The randomization technique has been applied to spectral and wavelet formulations, in order to diagnose specified local vertical correlations of temperature between model levels near 250 and 300 hPa for each point of the globe (Figure 5). Correlation values calculated directly from the ensemble are also presented in Figure 5(a). According to this figure, correlations tend to be negative in the Tropics and also in the western part of the USA for instance, whereas in the midlatitudes correlations tend to be positive and to have a larger amplitude (up to 0.6). These changes as a function of latitude are likely to arise from changes in the tropopause height for instance, as discussed by McNally (2000). As shown in Derber and Bouttier (1999, their figures 12 and 13), the dependence of the mass/wind balance operator on the Coriolis parameter f implies that the spectral formulation is expected to represent some latitudinal variations in the vertical correlations of temperature. This latitudinal dependence can be made explicit (Bouttier et al., 1997, p. 29) by the f-plane approximated expansion of $\mathbf{C}_{TP_s}$, the auto-covariance matrix of temperature and surface pressure (denoted by $TP_s$):

equation image(11)

where Δ is the Laplacian operator, N and P are linear regression matrices and equation image are the respective auto-covariance matrices of vorticity ζ, unbalanced divergence ηu and the unbalanced part of TPs. This expected latitudinal dependence in the spectral formulation is confirmed by Figure 5(b), but it also appears that the amplitude of tropical negative correlations is underestimated compared with Figure 5(a). This can be compared with wavelet-based vertical correlation values, which are shown in Figure 5(c). The spatial extent and amplitude of negative correlations, e.g. in the Tropics and the eastern part of the USA, appear to be better represented than in the spectral formulation.

Figure 5.

Local vertical correlations between 250 and 300 hPa for temperature. Correlations are (a) calculated directly from the ensemble, (b) diagnosed from the spectral formulation and (c) diagnosed from the wavelet formulation.

3.4. Sensitivity to the choice of calibration period

As discussed in section 2.1, globally averaged covariances are relatively static in time and therefore spectral-based correlation estimates are not very sensitive to the choice of calibration period (Derber and Bouttier, 1999). Conversely, this potential sensitivity is expected to be larger for wavelet-based correlation estimates, as they can be seen as local spatial averages of raw correlations. This aspect has been studied by calculating wavelet-based correlation estimates during a second three-week calibration period (8–28 March 2010) that is subsequent to the previous one (15 February–7 March 2010). The results for this other calibration period are illustrated in terms of wavelet-implied length-scales in Figure 6(a), which can be compared with Figure 2(b). Similar results are obtained when comparing raw length-scales between the two periods (not shown). On the one hand, large-scale geographical variations look broadly similar between the two periods, for instance with larger length-scales over tropical oceans than in midlatitudes. On the other hand, however, noticeable differences can be seen regionally, for instance regarding spatial variations between the North Atlantic and North America. In the first period (15 February–7 March), the North Atlantic is affected by relatively small length-scales connected to cyclogenic processes with strong horizontal gradients, whereas larger length-scales can be seen over Canada. This spatial contrast is much less marked during the second period (8–28 March), since Atlantic length-scales tend to be larger than in the first period in connection with reduced cyclonic activity during this second period. Actually, the spatial contrast is almost opposite in the second period, since the smallest length-scales occur over the USA instead of the North Atlantic. 
These regional changes between the two periods are also highlighted in Figure 6(b), which corresponds to the length-scale differences between the second and first time intervals over these regions. In accordance with the previous discussion, it can be noted that during the second period length-scales are increased over the North Atlantic whereas they are decreased over North America. These results indicate that wavelet-based correlations are partly sensitive to the choice of calibration period. This aspect has been studied further in the next section, by evaluating the impact of a given set of calibrated wavelet-based correlations during two different periods.

Figure 6.

(a) Horizontal correlation length-scales (in km) of wind near 500 hPa, diagnosed from the wavelet formulation of B calibrated over the second period (8–28 March 2010). (b) Length-scale differences between the second calibration period (8–28 March 2010) and the first one (15 February–7 March 2010), for the North Atlantic and North America.

4. Impact of the wavelet formulation of B

4.1. Experimental framework

The diagnostic study presented in the previous section indicates that the wavelet formulation is able to capture and represent realistic geographical variations of correlation functions to a larger extent than in the spectral formulation. In this section, the impact of the wavelet approach on the forecast quality is investigated. This is done by running the operational deterministic version of Arpege 4D-Var with two different formulations of B. The non-linear forecast model of this version uses a stretched horizontal resolution corresponding to a spectral truncation T798 and a stretching factor c equal to 2.4 (Courtier and Geleyn, 1988): the resolution is around 10 km over France (corresponding to truncation T1915 ∼ T798 × 2.4) and around 58 km at the Antipodes (corresponding to truncation T332 ∼ T798/2.4). There are 70 vertical levels, and the assimilation scheme is based on a multi-incremental 4D-Var (Veersé and Thépaut, 1998) with two minimizations performed at truncations T107 and T323. As wavelet-based correlations are expected to be more sensitive to the choice of calibration period than in the spectral approach, the impact of the calibrated wavelet-based B matrix has been studied during two different three-week periods. The first three-week impact period coincides with the calibration period, namely from 15 February–7 March 2010. This first period is used in order to evaluate whether heterogeneous wavelet-based correlation estimates are sufficiently realistic to have a positive impact on the forecast quality. The second impact period corresponds to the subsequent three weeks, namely from 8–28 March 2010. This second period is considered in order to study to what extent a given set of calibrated wavelet-based correlations can be applied to a period that is different from the calibration time interval.
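
The local truncations quoted above for the stretched grid follow directly from the nominal truncation and the stretching factor, as the short arithmetic check below shows. Only the two numbers given in the text are used; the variable names are illustrative.

```python
# Local spectral truncation implied by a stretched global grid.
truncation = 798     # nominal truncation T798
stretching = 2.4     # stretching factor c

local_max = truncation * stretching   # at the pole of stretching (over France)
local_min = truncation / stretching   # at the antipodes

print(f"highest local truncation: about T{local_max:.0f}")
print(f"lowest local truncation: about T{local_min:.1f}")
```

The ratio between the two local truncations is the square of the stretching factor, i.e. about 5.8, which matches the ratio between the quoted resolutions of around 58 km and 10 km.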

4.2. Impact on the forecast quality during the calibration period

The impact of the wavelet formulation of the B matrix on the Arpege forecast quality has been quantified by statistics such as the root-mean-square error (RMS) of the model forecasts, evaluated by using either the ECMWF analysis or radiosonde data as a reference. With respect to the spectral approach, a global reduction of the time-averaged RMS is observed. This is illustrated in Figure 7(a) for the 72 h forecast of geopotential over Europe and the Northern Atlantic. The RMS reduction is particularly visible at the jet level, for instance, where it reaches an amplitude of 2 m.
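
For reference, the RMS score used in this section can be sketched as a short function. The numbers below are invented purely to exercise the code and carry no meteorological meaning; `forecast_a` and `forecast_b` are hypothetical names for two competing forecasts verified against the same reference.

```python
import math


def rms_error(forecast, reference):
    """Root-mean-square difference between two equally long sequences."""
    if len(forecast) != len(reference):
        raise ValueError("forecast and reference must have the same length")
    squared = [(f - r) ** 2 for f, r in zip(forecast, reference)]
    return math.sqrt(sum(squared) / len(squared))


# ASSUMPTION: made-up geopotential heights (m) at a few verification points.
reference = [10350.0, 10410.0, 10290.0, 10380.0]
forecast_a = [10362.0, 10398.0, 10305.0, 10371.0]
forecast_b = [10358.0, 10402.0, 10299.0, 10375.0]

rms_a = rms_error(forecast_a, reference)
rms_b = rms_error(forecast_b, reference)
print(f"forecast A: {rms_a:.2f} m, forecast B: {rms_b:.2f} m")
```

In practice the score is accumulated over all verification points of a region and then either averaged in time or plotted day by day.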

Figure 7.

Root-mean-square error (RMS) values of 72 h forecasts of geopotential over Europe and the Northern Atlantic during the three-week calibration period, with the ECMWF analysis used as the reference. RMS scores are presented (a) as a time-averaged vertical profile over the calibration period (RMS values are shown on the x-axis) and (b) at 250 hPa for each day of the calibration period (RMS values are shown on the y-axis). The solid and dashed lines correspond to the wavelet and spectral approaches respectively.

The corresponding time evolution of the RMS at 250 hPa is shown in Figure 7(b), when using the ECMWF analysis as a reference. Similar results are obtained when using radiosondes as a reference (not shown). This figure indicates that the average positive impact is robust temporally, since the RMS is almost systematically reduced. Moreover, similar positive impacts are visible in the other parts of the globe, as illustrated in Figure 8 for the 24 h forecast of geopotential at 250 hPa in the Southern Hemisphere extratropics.

Figure 8.

Time evolution of RMS values of 24 h forecasts of geopotential at 250 hPa in the Southern Hemisphere extratropics during the three-week calibration period, with the ECMWF analysis used as the reference. The solid and dashed lines correspond to the wavelet and spectral approaches respectively.

These results support the idea that the wavelet formulation is able to extract realistic correlation heterogeneities, leading to robust positive impacts compared with the nearly homogeneous spectral formulation.

4.3. Impact on the forecast quality during the subsequent period

The impact of the wavelet formulation has also been investigated during the three weeks that followed the three-week calibration period. For some regions and forecast ranges, a robust positive impact similar to that during the calibration period is observed. This is illustrated in Figure 9, which corresponds to the time evolution of the RMS for the 24 h forecast of geopotential at 250 hPa in the southern extratropics. The RMS appears to be nearly systematically reduced, in a similar way to Figure 8. This robust positive impact indicates that the calibrated wavelet formulation has been able to represent important correlation heterogeneities which are valid for the two periods, such as the contrast between large length-scales over tropical oceans and relatively small length-scales in the circumpolar ocean of the Southern Hemisphere (Figures 2 and 6).

Figure 9.

Time evolution of RMS values of 24 h forecasts of geopotential at 250 hPa in the Southern Hemisphere extratropics during the three weeks following the calibration period, with the ECMWF analysis used as the reference. The solid and dashed lines correspond to the wavelet and spectral approaches respectively. This figure is available in colour online at wileyonlinelibrary.com/journal/qj

However, more generally, while the impact of the wavelet formulation remains globally positive during this second period, it tends to be less spectacular than during the calibration time interval. This is illustrated in Figure 10, which corresponds to (a) the time-averaged RMS vertical profile of the 72 h forecast of geopotential over Europe and the Northern Atlantic and (b) the associated time evolution at 250 hPa. For this period and region the time-averaged impact is nearly neutral, due to a succession of positive and negative cases.
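The distinction between a systematic impact and a nearly neutral average resulting from positive and negative cases can be made concrete with a simple summary of daily RMS differences. The Python sketch below is an illustrative assumption rather than a diagnostic used in this study: the function name, the returned quantities and the four-day input series are all hypothetical, and the numbers are made up purely to show the two behaviours.

```python
import numpy as np


def summarize_daily_impact(rms_test, rms_control):
    """Summarize daily RMS differences between a test and a control run."""
    rms_test = np.asarray(rms_test, dtype=float)
    rms_control = np.asarray(rms_control, dtype=float)
    diff = rms_test - rms_control  # negative: the test run has the lower error
    return {
        "mean_diff": float(diff.mean()),
        "frac_improved": float(np.mean(diff < 0)),
        "frac_degraded": float(np.mean(diff > 0)),
    }


# Made-up daily RMS values (not taken from any figure of this article).
systematic = summarize_daily_impact([9.0, 9.5, 8.0, 10.0], [10.0, 10.0, 9.0, 10.5])
mixed = summarize_daily_impact([9.0, 11.0, 8.0, 10.0], [10.0, 10.0, 9.0, 9.0])

print(systematic)  # RMS reduced on every day
print(mixed)       # gains and losses cancel in the time average
```

In the first case the RMS is lower on every day, whereas in the second the mean difference vanishes although half of the days show an improvement, which is the kind of behaviour described for the 72 h forecasts in the subsequent period.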

Figure 10.

RMS values of 72 h forecasts of geopotential over Europe and the Northern Atlantic during the subsequent three-week period (i.e. after the calibration period), with the ECMWF analysis used as the reference. RMS scores are presented (a) as a time-averaged vertical profile over this subsequent period (RMS values are shown on the x-axis) and (b) at 250 hPa for each day of this subsequent period (RMS values are shown on the y-axis). The solid and dashed lines correspond to the wavelet and spectral approaches respectively. This figure is available in colour online at wileyonlinelibrary.com/journal/qj

This is consistent with section 3.4, which indicates that some of the correlation heterogeneities that have been captured by the wavelet formulation in the North Atlantic are specifically representative of cyclogenic situations that occurred during the calibration period, and are less representative of weather situations affecting the subsequent period.

5. Conclusions

Because wavelet functions contain information about both scale and position, a wavelet formulation of the B matrix can be considered to represent 3D correlation heterogeneities. This is supported by diagnostic studies of geographical variations of horizontal and vertical correlations. The results indicate that the contrast between relatively broad horizontal correlation functions in the Tropics and sharp ones in the midlatitudes is realistically captured by the wavelet approach. Horizontal variations of vertical correlations induced by e.g. latitude and land/sea contrasts are also better represented by the wavelet formulation than by the spectral approach. Moreover, sensitivity to the choice of calibration period has been studied, through a comparison with diagnostic results obtained during a subsequent three-week period. On the one hand, large-scale geographical variations appear to be similar between the two periods, such as the contrast between large length-scales over tropical oceans and small length-scales in midlatitudes. On the other hand, regional contrasts between North America and the North Atlantic differ in the second period, with a relative increase of length-scales over the North Atlantic related to weaker cyclonic activity. This indicates that correlation heterogeneities that are captured by the wavelet formulation are partly sensitive to the choice of calibration period.

The impact of the wavelet formulation on the final forecast quality has also been investigated during two different three-week periods. The impact is remarkably positive during the calibration period, which supports the idea that wavelet-based correlation heterogeneities are realistic. The impact remains globally positive during the subsequent three-week period, but it tends to be less spectacular and less systematic than during the calibration time interval. This is consistent with the fact that some of the captured heterogeneities are relatively specific to cyclonic situations that occurred during the calibration period and are less representative of weather situations affecting the subsequent period.

This suggests that an on-line calibration of the wavelet formulation should be considered in the future, in order to exploit fully its ability to capture correlation heterogeneities from ensemble data. This could be explored with different possible strategies, depending e.g. on available computational resources. If these resources are sufficient to run a large ensemble in real time, the wavelet formulation could be calibrated on-line after each forecast step from the set of ensemble forecasts valid at the target analysis time. On the other hand, if the ensemble size is relatively small or if sampling noise effects remain relatively large, a local time average of correlations, to be updated on-line, could be considered in order to increase the sample size in an analogous way to the experiments of e.g. Xu et al. (2008) in the context of an ensemble Kalman filter. These different possible approaches will be considered in the future in order to provide flow-dependent correlation estimates. More generally, a comparison between this wavelet approach and techniques based on a Schur product (e.g. Buehner et al., 2010a,b) would also be interesting to consider, including their respective sensitivities to ensemble size.
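One possible reading of a local time average of correlations that is updated on-line is a sliding-window mean over the most recent analysis times. The Python sketch below illustrates that idea only. It is an assumption-laden toy: the class name `RunningCorrelation`, the window length and the use of a plain sample correlation matrix in model space (rather than correlations estimated in wavelet space) are hypothetical choices, and the ensemble members are random numbers. It also assumes that every state variable has non-zero ensemble spread, since the sample correlation is otherwise undefined.

```python
from collections import deque

import numpy as np


class RunningCorrelation:
    """Sliding-window time average of ensemble sample correlation matrices."""

    def __init__(self, window):
        # Only the `window` most recent correlation matrices are retained.
        self._recent = deque(maxlen=window)

    def update(self, members):
        """Add one cycle; `members` has shape (n_members, n_state)."""
        members = np.asarray(members, dtype=float)
        # Columns are state variables, rows are ensemble members.
        corr = np.corrcoef(members, rowvar=False)
        self._recent.append(corr)
        return self.average()

    def average(self):
        """Mean of the retained correlation matrices."""
        return np.mean(np.stack(self._recent), axis=0)


# Toy usage with random members: 6 cycles, 6 members, 3 state variables.
rng = np.random.default_rng(1)
estimator = RunningCorrelation(window=4)
for cycle in range(6):
    ensemble = rng.normal(size=(6, 3))
    corr_avg = estimator.update(ensemble)
```

Because the result is a mean of correlation matrices, it remains symmetric and positive semi-definite with a unit diagonal. A longer window reduces sampling noise at the cost of a weaker dependence on the current flow, which is the trade-off implied by the two strategies discussed above.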

Acknowledgements

We thank Mike Fisher for his help with the wavelet formulation, including the provision of horizontal grids for different wavelet scales.
