NWP model forecast skill optimization via closure parameter variations



We apply a recently developed method, the Ensemble Prediction and Parameter Estimation System (EPPES), to demonstrate how numerical weather prediction (NWP) model closure parameters can be optimized. As proof of concept, we tune the medium-range forecast skill of the ECMWF model HAMburg version (ECHAM5) atmospheric general circulation model using an ensemble prediction system (EPS) emulator. Initial state uncertainty is represented in the EPS emulator by applying the initial state perturbations generated at the European Centre for Medium-range Weather Forecasts (ECMWF). Model uncertainty is represented in the emulator via parameter variations at the initial time. We vary four closure parameters related to parametrizations of subgrid-scale physical processes of clouds and precipitation. With this set-up, we generate ensembles of 10-day global forecasts with the ECHAM5 model at T42L31 resolution twice a day over a period of three months. The cost function in the optimization is formulated in terms of standard forecast skill scores, verified against the ECMWF operational analyses. A summarizing conclusion of the experiments is that the EPPES method is able to find ECHAM5 model closure parameter values that correspond to smaller values of the cost function. The forecast skill score improvements verify positively in dependent and independent samples. The main reason is the reduced temperature bias in the tropical lower troposphere. Moreover, the optimization improved the top-of-atmosphere radiation flux climatology of the ECHAM5 model, as verified against the Clouds and the Earth's Radiant Energy System (CERES) radiation data over a 6-year period, while the simulated tropical cloud cover was reduced, thereby increasing a negative bias as verified against the International Satellite Cloud Climatology Project (ISCCP) data.

1. Introduction

Numerical weather prediction (NWP) models are complex technical expressions of the science of weather forecasting. They operate at a generally high level of forecast skill which implies that all relevant multi-scale interactions and dynamics–physics feedbacks are tuned into harmony. The need for model tuning arises in part from the fact that discrete numerical representation splits atmospheric processes into resolved and unresolved ones. Subgrid-scale physical processes are parametrized with numerical schemes that contain explicit closure parameters (e.g. Stensrud, 2007). Typically, expert knowledge and manual techniques are used to specify the optimal parameter values at various stages of the model development and tuning process. This is a laborious task, which needs to be repeated after any major model upgrade. Due to the high computational cost of NWP models, the tuning is limited by the affordable number of test cases. The optimal values and uncertainties of these parameters are therefore only approximately known.

Algorithmic techniques to estimate model parameters can speed up model development, and improve usefulness of simulation results as their uncertainties are better understood. A prerequisite for parameter estimation is to understand the relationship between parameter variations and model response. Parameter variations can be used in ensemble prediction systems (EPS) to represent model uncertainty in addition to stochastic parametrization schemes (e.g. Bowler et al., 2008, and references therein). The reason is that in EPS initial condition perturbations alone do not generate enough spread to the ensemble of forecasts. Thus the ensemble, which should properly sample forecast uncertainty, may appear overconfident unless uncertainties in the model formulation and boundary forcing are accounted for, as well. For instance, the impact of parameter variations related to convection and boundary-layer parametrization on tropical ensemble spread and Brier scores was positively assessed by Reynolds et al. (2011) in a global forecasting system.

Nielsen-Gammon et al. (2010) advocate studies of the sensitivities of model simulations to model parameter variations: successful parameter estimation requires that variations in the subset of parameters to be estimated produce sufficiently large, well-behaved, and unique signatures in the model output. Hacker et al. (2011) studied the model response to parameter variations in a mesoscale ensemble prediction system. They did not find any clear linear scaling between parameter variations and ensemble properties: the perturbed models were typically indistinguishable. They concluded that ensemble prediction using perturbed parameters complements more complex model-error simulation methods, but that parameter estimation may prove difficult or costly for real mesoscale NWP applications. A possible alternative avenue is to apply meta-models for parameter dependencies (Neelin et al., 2010).

Applicability of ensemble techniques in parameter estimation has usually been considered from the state augmentation viewpoint, i.e. using state filters augmented with parameters as artificial states (e.g. Aksoy et al., 2006a, 2006b). In filtering approaches, the focus is on very-short-range forecasting as the state is propagated basically from one observation time to the next. Thus, parameter estimation is conditional on model performance in very-short-range forecasts. NWP systems are known to suffer from spin-up/down problems. For instance, moist variables exhibit a tendency towards the model attractor because the model hydrological cycle is not in balance at the initial state (e.g. Trenberth and Guillemot, 1998; Betts et al., 2003). Uncertainties are often related to parameters in moist physical processes, and may be affected by this imbalance early in the forecast. In the following, we put forward our parameter estimation approach, which is not in the immediate context of data assimilation, but uses short-to-medium range forecasts generated in abundance by ensemble prediction systems.

Forecast error growth studies hint at how parameter variations evolve in complex systems. Forecast errors are due to initial state errors and model errors, but these are not easily separable because the estimation of the initial state involves a forecast model, and thus initial state errors are affected by model errors too (Leutbecher and Palmer, 2008). Growth of very-short-range forecast error is nevertheless dominated by the exponential growth of initial state errors, and the linear growth of model errors becomes important later in the forecast (Savijärvi, 1995). This tends to imply that early in the forecast range, the parameter variations do not yet have a sizable effect. On the other hand, late in the forecast, parameter variations have a stronger impact but are masked by the quadratic nonlinearity of the system, which weakens parameter identifiability (the nonlinearity is considered quadratic because the main terms of atmospheric dynamics have quadratic expressions). Thus, somewhere in between these extremes, there might be an optimal forecast range where parameter variations already affect the model output but chaoticity does not yet dominate the system behaviour and overwhelm parameter identifiability. This is supported by the finding of Zhu and Navon (1999) that a low-resolution global atmospheric general circulation model (GCM) tends to first lose the impact of the optimal initial condition while the impact of optimally identified parameter values persists beyond 72 hours. Interestingly, experiments with the European Centre for Medium-range Weather Forecasts (ECMWF) analysis and forecasting system, in which satellite data are first denied and then reintroduced, suggest that observations older than about three days have no influence on the quality of the analysis (Fisher, 2006).

A recent dual article (Järvinen et al., 2012; Laine et al., 2012; hereafter JL2012) presented an Ensemble Prediction and Parameter Estimation System (EPPES), and argued that ensemble prediction systems can be utilized to make statistical inference about the NWP model closure parameters by means of parameter perturbations. The idea of JL2012 was to impose initial-time parameter variations on an ensemble of forecasts and to infer the parameter values and their uncertainties based on how likely different ensemble members appear against observations. From the parameter-estimation point of view, the initial values can be seen as ‘nuisance parameters’ whose uncertainty should be integrated out. In EPPES, this is done by sampling over a large number of different flow types (i.e. initial states). This is further enhanced by the use of initial state perturbations.

In EPPES, the likelihood can be formulated in terms of forecast skill at some suitable forecast range covered by the ensemble, say at five days. Thus, one directly attempts to optimize the medium-range forecast skill. The appeal of EPPES is that the computational power traditionally used in operational ensemble production for assessing the forecast uncertainties could be harnessed for model tuning too. The method can be implemented into operational EPS with minimal technical changes to the code infrastructure. The EPPES algorithm itself is virtually cost-free. Moreover, it is a model-independent algorithm which can be easily transferred to new modelling systems, as long as the relationship of parameter variation and signatures in the model output are sufficiently understood. NWP model tuning is certainly challenging, but these potential benefits render further experimentation worthwhile.

Based on experimentation with a stochastic version of the Lorenz-95 model (Lorenz, 1995; Wilks, 2005), JL2012 concluded that EPPES might be a step towards algorithmic model parameter estimation. The results of JL2012 cannot, however, be directly scaled up to realistic systems since the rich dynamics of the atmospheric circulation cannot be simulated with the Lorenz-95 model. Therefore, this article takes a step towards a more realistic set-up, and demonstrates the EPPES method using a global atmospheric GCM. An ‘EPS emulator’ is developed based on the ECMWF model HAMburg version (ECHAM5: Roeckner et al., 2003). The motivation to use a climate model rather than an NWP model is that for the proof of concept in large- and multi-scale systems, basically any primitive equation system should suffice. In our case, the ECHAM5 model provided the shortest development path. Furthermore, we rely on the ECMWF operational analyses and their EPS initial state perturbations. We copy the perturbed initial conditions and verifying analyses from ECMWF, and use ECHAM5 as a state propagator to make 10-day global forecasts. This enables very convenient testing of the EPPES algorithm and allows full control of the necessary components while avoiding the need to develop the EPS infrastructure. We present the experimental set-up in section 2, the parameter estimation and validation results in section 3, before discussion and conclusions.

2. Experimental set-up

2.1. The ECHAM5 model and the subset of parameters

Version 5.4 of the ECHAM5 atmospheric general circulation model (Roeckner et al., 2003, 2006) was used. The dynamical part of ECHAM5 is formulated in spherical harmonics, while physical parametrizations are computed in grid-point space. The simulations reported here used a coarse horizontal resolution of T42, i.e. triangular truncation at wave number 42, corresponding to a grid spacing of 2.8125°. The model vertical grid had 31 layers with model top at 10 hPa. A semi-implicit time integration scheme is used for model dynamics with a time step of 20 min. Model physical parametrizations (Roeckner et al., 2006) are invoked every time step with the exception of radiation, which is computed once per two hours.

Four ECHAM5 closure parameters were considered (Table 1). These parameters are related to physical parametrizations of clouds and precipitation. The choice of these parameters is motivated by their substantial influence on the model's climate. Additionally, our research group is familiar with the impact of varying these parameters. In Järvinen et al. (2010), adaptive Markov chain Monte Carlo technique (MCMC: Haario et al., 2006) was applied to estimate posterior joint probability densities of these parameters; two of them (CMFCTOP, ENTRSCV: Fig. 4 of Järvinen et al., 2010) were well identified while two others (CAULOC, CPRCON) were found to be rather poorly identifiable with the chosen formulation of the cost function. As ECHAM5 is primarily a climate model, the impact of variations of the chosen subset of parameters on medium-range forecast skill was not well understood prior to the experiments.

Table 1. The subset of ECHAM5 closure parameters subject to parameter variations.
Parameter   Description
CAULOC      A parameter influencing the accretion of cloud droplets by precipitation (rain formation in stratiform clouds)
CMFCTOP     Relative cloud mass flux at the level above non-buoyancy (in cumulus mass flux scheme)
CPRCON      A coefficient for determining conversion from cloud water to rain (in convective clouds)
ENTRSCV     Entrainment rate for shallow convection

There are two principal approaches to parameter variations: either to keep the parameter variation fixed during the forecast, or to treat parameters as stochastic variables and model their variations during the forecast by some autoregressive process (Lin and Neelin, 2000). In EPPES, the parameter variations are specified at initial time and are kept fixed during the forecast.

2.2. The ensemble prediction system emulator

Ensemble prediction systems are composed of two main functionalities: generation of initial state perturbations, and representation of model errors. These functionalities are included into our EPS emulator as follows.

First, the initial state perturbations are taken directly from the ECMWF operational EPS data archive. We use twice-daily (0000 and 1200 UTC) control and 50 perturbed initial conditions over a period of three months (January to March 2011). The ECHAM5 model state variables are the same as in the ECMWF forecast model (temperature, vorticity, divergence, logarithm of surface pressure, and specific humidity). Also, representation of the states is the same in the two systems, and hence conveniently applicable in the ECHAM5 model to generate 10-day forecasts. For validation purposes, the operational analyses were copied also for the periods of January to March 2010, and for April 2011.

Second, the forecast model uncertainty is represented in the EPS emulator with the initial-time parameter variations. The parameter variations are generated by the EPPES sampling algorithm and serve the purpose of parameter estimation, though they are probably efficient also in sampling model uncertainties. No stochastic physics schemes were developed nor applied, and thus the ensembles generated with the EPS emulator may be somewhat underdispersive. The spread–skill relationship was not calibrated at any stage of the parameter estimation process. We note that for on-line parameter estimation with EPPES using operational EPS runs, one has to ensure that the spread–skill calibration is maintained, and that no poorly performing parameter values are used. Here, off-line parameter estimation is performed and the spread–skill calibration is not such an important issue.

The ensemble spread caused by the initial state perturbations and that caused by the parameter variations were first tested separately. Each source generates a roughly equal amount of ensemble spread, but the two are not additive: with both perturbation sources switched on in the EPS emulator, the spread was only slightly larger than with either source alone. This seems to indicate, as remarked by one of the peer-reviewers, that the initial state perturbations optimized for the ECMWF system do not carry over well to our ECHAM5-based EPS emulator, since they generate only about as much spread as the parameter variations. The EPS emulator may thus be sub-optimal, but it is nevertheless sufficiently effective for the proof of concept of NWP model forecast skill optimization. An additional remark is that the role of initial state perturbations in parameter estimation using the EPPES algorithm is to ‘integrate out’ the uncertainty related to the initial state. Estimates obtained without the initial state perturbations would probably lean more towards the particular analyses used, and would differ from those obtained when the initial state perturbations were used as well.

2.3. Implementation of the estimation algorithm

The Ensemble Prediction and Parameter Estimation System (EPPES) algorithm is described in detail in JL2012, and demonstrated with an implementation using a stochastic version of the Lorenz-95 model (Lorenz, 1995; Wilks, 2005). In EPPES, we assume that for a time window i, the optimal model parameter θi is a realization of a random variable, which follows a multivariate Gaussian distribution with a mean vector μ and a p × p covariance matrix Σ, as follows:

$$\theta_i \sim N(\mu, \Sigma)$$

The distribution parameters μ and Σ are assumed unknown but static over time. In EPPES, the problem of estimating the model parameter θ is formulated as a problem of estimating the distribution parameters (or, hyper-parameters) μ and Σ. The interpretation is that there is a mean parameter value μ that performs best on average considering all weather types, seasons etc., but due to the evident modelling errors (possibly weather regime dependent), the optimal parameter value varies according to Σ between different time windows, i.e. between different ensembles.

The EPPES algorithm proceeds by drawing parameter value proposals from a distribution that accounts for the parameter uncertainty. The quality of the parameter values is tested in ensembles of medium-range forecasts, and well-performing parameter values are qualified to feed back to the proposal distribution. We will refer to an ensemble by index i, covering one time window and including the verifying observations regardless of their observing time. An outline of the EPPES algorithm can be written as follows:

  1. Initialize the hyper-parameters $\mu_0$ and $\Sigma_0$. The distribution $N(\mu_0, \Sigma_0)$ is the initial prior for the first time window and the proposal distribution for the first sample.
  2. At each time window $i$, sample a set of proposed values for the parameters $\theta_i$ (call them $\theta_i^j$) from the multivariate Gaussian distribution, $\theta_i^j \sim N(\mu_{i-1}, \Sigma_{i-1})$, $j = 1, \ldots, n_{\mathrm{ens}}$, where $n_{\mathrm{ens}}$ is the ensemble size.
  3. Generate the ensemble of predictions using the parameters $\theta_i^j$.
  4. Evaluate the cost function $J(\theta_i^j)$ for each ensemble member and compute the importance weights $w_i^j = p(y_i \mid \theta_i^j)$, i.e. the likelihood of the observations $y_i$ given the parameters $\theta_i^j$.
  5. Using the importance weights, make a re-sampled ensemble of $\theta_i^j$ as $\tilde{\theta}_i^j$, $j = 1, \ldots, n_{\mathrm{ens}}$.
  6. Update the hyper-parameters $\mu_i$ and $\Sigma_i$ with the re-sampled parameter values, using the EPPES update formulae (for details, see JL2012).
  7. For the next time window $i + 1$, set the proposal distribution for parameter $\theta_{i+1}$ as $N(\mu_i, \Sigma_i)$ and go back to step 2.
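
The outline above can be illustrated with a minimal Python sketch. Several parts are assumptions made only for illustration and are not taken from JL2012 or from this study: the forecast model is replaced by a toy quadratic cost with a known minimum, the importance weights are taken proportional to exp(−cost/2), and step 6 uses a simple blending of the old and re-sampled moments instead of the EPPES update formulae. The names `eppes_like_step`, `run_toy` and `blend` are hypothetical.

```python
import numpy as np

def eppes_like_step(mu, sigma, cost_fn, n_ens, rng, blend=0.1):
    """One window of an EPPES-like loop (steps 2-6 of the outline)."""
    # Step 2: draw proposals from the current proposal distribution.
    theta = rng.multivariate_normal(mu, sigma, size=n_ens)
    # Steps 3-4: "forecast" and cost for every member (toy stand-in here).
    cost = np.array([cost_fn(t) for t in theta])
    # Assumption: weights proportional to exp(-cost / 2); shifting by the
    # minimum cost avoids underflow and leaves normalized weights unchanged.
    w = np.exp(-0.5 * (cost - cost.min()))
    w /= w.sum()
    # Step 5: importance re-sampling with replacement.
    resampled = theta[rng.choice(n_ens, size=n_ens, p=w)]
    # Step 6: placeholder update, NOT the JL2012 formulae. The proposal is
    # nudged towards the mean and covariance of the re-sampled members.
    mu_new = (1 - blend) * mu + blend * resampled.mean(axis=0)
    sigma_new = (1 - blend) * sigma + blend * np.cov(resampled, rowvar=False)
    return mu_new, sigma_new

def run_toy(n_steps=200, n_ens=50, seed=0):
    """Toy experiment in which the best parameters are known by construction."""
    rng = np.random.default_rng(seed)
    truth = np.array([1.0, -0.5])
    cost_fn = lambda t: np.sum((t - truth) ** 2) / 0.25
    mu, sigma = np.zeros(2), np.eye(2)   # step 1: initial hyper-parameters
    for _ in range(n_steps):             # step 7: updated values feed the next window
        mu, sigma = eppes_like_step(mu, sigma, cost_fn, n_ens, rng)
    return mu, sigma, truth

mu, sigma, truth = run_toy()
print("estimated mean:", mu, "target:", truth)
```

In this toy setting the proposal mean drifts towards the known minimum while the proposal covariance contracts, which mirrors the qualitative behaviour expected of the algorithm but says nothing about its quantitative properties.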

In the EPPES algorithm, the prior distribution mean and standard deviation for the first ensemble are specified based on expert knowledge. The initial mean values μ0 correspond to the default model parameter values. The parameter uncertainties Σ0 at the initial time are specified so that the resulting parameter variations can be used in the model without significant loss of modelling accuracy. Initially, parameters can be assumed independent and therefore the Σ0 at the initial time is a diagonal covariance matrix. Possible parameter covariances will emerge during the estimation process. As a safety measure, minimum and maximum allowed parameter values are specified to prevent unrealistic values entering the forecast model. For the ECHAM5 model at T42L31 resolution, these values are given in Table 2.

Table 2. ECHAM5 (T42L31 resolution) parameter values applied in the EPPES tests.
Parameter   Prior mean    Prior std. dev.   Bounds          Posterior mean   Posterior std. dev.
CPRCON      1.5 × 10^-4   4 × 10^-3         0–1.5 × 10^-2   6.4 × 10^-3      2.4 × 10^-3
ENTRSCV     3 × 10^-4     1 × 10^-3         0–5 × 10^-3     4.6 × 10^-4      2.5 × 10^-4

Prior mean values correspond to the default model values. Prior standard deviation (the width of the proposal distribution of the first ensemble) and bounds (minimum and maximum allowed parameter values) are subjectively specified. Posterior mean and standard deviation are the EPPES estimates after 180 estimation steps with the specified cost function.
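
As an illustration of how the prior and the bounds could be combined when drawing the first ensemble, the following Python sketch uses the CPRCON and ENTRSCV values from Table 2. The text does not state how the bounds are enforced, so redrawing out-of-range proposals (rejection sampling) is an assumption here, and the function name `sample_bounded` is hypothetical.

```python
import numpy as np

# Prior values for CPRCON and ENTRSCV, taken from Table 2.
names = ["CPRCON", "ENTRSCV"]
mu0 = np.array([1.5e-4, 3.0e-4])
std0 = np.array([4.0e-3, 1.0e-3])
lower = np.array([0.0, 0.0])
upper = np.array([1.5e-2, 5.0e-3])

def sample_bounded(mu, cov, lower, upper, n, rng):
    """Draw n proposals from N(mu, cov), redrawing any that violate the bounds.

    Assumption: rejection is only one possible way to respect the bounds.
    """
    out = np.empty((0, mu.size))
    while out.shape[0] < n:
        draw = rng.multivariate_normal(mu, cov, size=n)
        ok = np.all((draw >= lower) & (draw <= upper), axis=1)
        out = np.vstack([out, draw[ok]])
    return out[:n]

rng = np.random.default_rng(0)
sigma0 = np.diag(std0 ** 2)   # independent parameters: diagonal covariance
proposals = sample_bounded(mu0, sigma0, lower, upper, 51, rng)
print(proposals.shape, proposals.min(axis=0), proposals.max(axis=0))
```

Because the prior standard deviations are much larger than the prior means, a large share of unconstrained draws would be negative, which is why some form of bound handling is needed for these two parameters.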

We want to emphasize that varying these parameters does not capture the full uncertainty in the ECHAM5 physical parametrizations. Only the parametric uncertainties related to the selected parameters are covered. Also, the entire structural uncertainty of the model formulation is intact. Furthermore, we believe that the optimal closure parameter values are implementation-specific. Hence, they are not necessarily optimal any longer if the model resolution or structure is changed.

In JL2012, EPPES was tested using a stochastic version of the Lorenz-95 model. Because this model is an ordinary differential equation system without spatial dimensions, the likelihood was formulated as a simple sum-of-squares of the difference between forecast and verifying analysis with some noise. In the case of ECHAM5, the three-dimensional state vector needs to be compared either with observations or with a reference state. The proper formulation of the likelihood is thus a major technical difference here as compared with JL2012. Here, the cost function is a twin-criterion formulation of the squared forecast error:

$$J(\theta) = \frac{5}{2} \int \left( z_f^{72}(\theta) - z_a \right)^2 \, dA + \int \left( z_f^{240}(\theta) - z_a \right)^2 \, dA$$

where $z_f^{72}(\theta)$ ($z_f^{240}(\theta)$) is a 72-hour (240-hour) forecast of the 500 hPa geopotential height, $z_a$ is the verifying analysis, and $dA$ is the areal element of the model grid. The factor 5/2 makes the two terms approximately equal in magnitude, and thereby balances their contributions to the cost function. The parameters $\theta$ in the formula imply that the forecasts depend on the sampled parameter values. We note that the cost function is closely related to the root-mean-squared forecast error (RMSE) commonly used as a headline score in NWP. Also, this cost function is intended only for demonstration purposes rather than as the ultimate formulation. Alternative optimization criteria related to, for example, precipitation rate or cloud frequency of occurrence may lead to different parameter optima, but this has not been tested.
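
A twin-criterion cost of this kind could be evaluated on a gridded field roughly as in the Python sketch below. It rests on several assumptions that are not stated in the text: the factor 5/2 is applied to the 72-hour term, the area element is approximated by cos(latitude) weights on a regular latitude–longitude grid, the weights are normalized to sum to one, and each forecast is verified against the analysis valid at its own lead time. The names `area_weights`, `twin_cost`, `za72` and `za240` are hypothetical.

```python
import numpy as np

def area_weights(lat_deg, n_lon):
    """cos(latitude) weights on a lat-lon grid, normalized to sum to one."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, n_lon))
    return w / w.sum()

def twin_cost(z72, z240, za72, za240, weights):
    """Weighted sum of area-mean squared 500 hPa height errors at 72 h and 240 h."""
    err72 = np.sum(weights * (z72 - za72) ** 2)
    err240 = np.sum(weights * (z240 - za240) ** 2)
    # Assumption: the factor 5/2 scales the (smaller) 72-hour error term.
    return 2.5 * err72 + err240

# Synthetic example on a 64 x 128 grid (roughly the T42 grid dimensions).
lat = np.linspace(-87.0, 87.0, 64)
w = area_weights(lat, 128)
za = np.full((64, 128), 5500.0)        # idealized analysis, in metres
cost = twin_cost(za + 2.0, za, za, za, w)
print(cost)
```

With normalized weights the cost is an area-mean squared error in m², so absolute values differ from the arbitrary units used in the figures.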

3. Results

3.1. Performance of ensembles in terms of cost function values

In the EPPES runs performed here, each ensemble consists of 50 members with perturbed initial conditions, and one member using the unperturbed control analysis as initial state. All these 51 members contain parameter variations. Thus, 51 forecasts twice daily for three months result in an EPPES sequence of 9180 sample points in total. For diagnostic purposes, a forecast from the unperturbed control analysis is run also with the default parameter values in each ensemble. This control member does not affect the estimation process. Figure 1 displays the cost function values for ensemble members in the first 20 ensembles. One ensemble appears as a vertically aligned group of markers (black crosses). In the first ensemble, the cost function value of the default model (large grey cross) is about 215 (in arbitrary units) while the other members (black crosses) are distributed around the default value; their values range from about 205 to 232.

Figure 1.

The cost function values in the ensembles launched during 1–10 January 2011, i.e. in the first 20 ensembles. Each vertically aligned group of markers corresponds to one ensemble. The cost function values are plotted for each ensemble member (black crosses), the default model run (large grey cross), and the re-sampled ensemble members affecting the proposal distribution (grey crosses; slightly offset to the right).

The ensemble spread in terms of cost function values (Figure 1) varies from one ensemble to the next. For instance, the first and second ensembles have a pronounced spread while the 15th ensemble is relatively compact. Note that the spread is due to both the initial state perturbation and the parameter variations. Changes in atmospheric predictability appear such that, for instance, all members of the ensemble number 15 are at a higher cost function level than any of the members in the ensemble number 16. These varying levels highlight the fact that comparison of cost function values between consecutive ensembles has no particular relevance for statistical inference, as discussed in JL2012, but comparison of members within one ensemble can be very informative.

Figure 1 also illustrates the EPPES re-sampling procedure. The re-sampled members (grey crosses; slightly offset to the right for better visibility) correspond to a relatively high forecast skill, and appear at the lower part of the group of markers. Note that the re-sample size is equal to the original ensemble, but members with presumably little impact on the posterior are abandoned and replaced with multiple copies of members with a greater impact. Only these ensemble members affect the hyper-parameters of proposal distribution. The control member applying the default parameter values would probably have been accepted in the re-sampling in the first ensemble (the control member is within the spread of the re-sample). This is the case in ten out of 20 ensembles. In this particular realization, the default model is never the best ensemble member but it is twice the worst performing member.

3.2. Evolution of the parameter values and their covariances

Evolution of the four parameter values in 180 consecutive ensembles is shown in Figure 2. A vertical column of markers corresponds to one ensemble of proposed (grey) and re-sampled (black) parameter values. For all four parameters, the adjustment rate of the mean value μ (continuous line in Figure 2) is large during the first 20 to 40 steps, after which it remains rather low. The width of the 95% probability range of the parameter uncertainty (μ ± 2 × standard deviation; dashed lines in Figure 2), on the other hand, decreases slowly and slightly erratically during the sequence.

Figure 2.

Evolution of the parameter values in 180 consecutive ensembles. A vertical column of markers corresponds to one ensemble of proposed (grey) and re-sampled (black) parameter values. The parameter distribution mean value μ (continuous line) and μ ± 2×standard deviation (dashed lines) are also shown. For clarity, only every fourth ensemble is plotted.

Parameter mean and standard deviation values after 180 steps (i.e. the parameter posterior values) are given in Table 2. Three parameters tend to increase their mean values: CAULOC by a factor of 5, CPRCON by a factor of 40, and ENTRSCV by a factor of 1.5. CMFCTOP remains close to its default value. The standard deviation decreases by a factor of about 2, except for ENTRSCV where the decrease is by a factor of 4. Note that the initial uncertainty estimates were subjective.

The parameter pairwise covariances are presented in Figure 3 at the initial time (Figure 3(a)), and after 180 estimation steps (Figure 3(b)). Initially (Figure 3(a)), the parameters are assumed independent and the specified prior parameter uncertainties appear as ellipses centred at the default values μ0. After 180 steps, the parameter mean values μ have drifted apart from the default values, and there is a slight tilt in the ellipses, most notably between CMFCTOP and CAULOC. This tilt is an indication of parameter covariance. Note that the default value of CPRCON is outside the 95% probability range of the parameter uncertainty, and the default value of CAULOC is right on the 95% probability contour. It would be easy to impose limits to parameters in the EPPES method to respect the theoretically justified parameter ranges of which the experts are convinced.
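
For readers who wish to draw comparable uncertainty ellipses, the Python sketch below computes a 95% probability contour of a bivariate Gaussian from its mean and covariance. It uses one common convention (the chi-square quantile with two degrees of freedom); whether Figure 3 was drawn with exactly this convention is not stated, and the numerical values of `mu` and `cov` are hypothetical.

```python
import numpy as np

def ellipse_95(mu, cov, n_points=200):
    """Points on the 95% probability contour of a 2-D Gaussian N(mu, cov)."""
    # With two degrees of freedom the chi-square quantile is -2 ln(1 - p).
    r2 = -2.0 * np.log(0.05)
    vals, vecs = np.linalg.eigh(cov)            # principal axes of the ellipse
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(t), np.sin(t)])
    pts = vecs @ (np.sqrt(r2 * vals)[:, None] * circle)
    return pts.T + mu

# Hypothetical posterior for a pair of parameters with positive covariance.
mu = np.array([1.0, 2.0])
cov = np.array([[0.04, 0.015],
                [0.015, 0.01]])
pts = ellipse_95(mu, cov)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(pts.shape, corr)
```

A diagonal covariance matrix gives an axis-aligned ellipse, as in the initial state of Figure 3(a), whereas a non-zero off-diagonal element tilts the principal axes.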

Figure 3.

(a) Pairwise parameter covariances at the initial time. Default parameter values (μ0) are denoted by thin dashed lines. The ellipse represents the prior parameter uncertainty as specified initially (the 95% probability range of the parameter uncertainty Σ0). The small circles are the proposed parameter values at the first step; darkness of colour is indicative of the weights given to re-sampled parameter values. (b) As (a), but after 180 consecutive ensembles. The small circles are the proposed parameter values at step 180.

Finally, we note that in Järvinen et al. (2010), two out of four parameters were well identified (CMFCTOP, ENTRSCV) while the two others (CAULOC, CPRCON) were poorly identified. Here the posterior distributions of all four parameters are compact. This supports our conclusion that the EPPES method has identified the parameter distributions of all four parameters.

3.3. Asymptotic behaviour

In order to study the asymptotic behaviour of this implementation of the EPPES algorithm, we performed the following test. After reaching the end of the three-month period (180 steps), the sequence was restarted from the first date. Ten sweeps over the same dataset were performed, totalling 1800 steps and 91800 sample points. We want to emphasize that the parameter estimates obtained in this procedure represent a close fit to this particular training dataset. In this experiment, the cost function contained only the 10-day forecast error contribution. Thus, the final parameter values are not directly comparable with the results presented above.

Asymptotic parameter evolution follows logically from the behaviour of the first sweep (Figure 2): in sweeps two to ten (not shown), the parameter mean values evolve slowly and smoothly towards some asymptotic values. For comparison, these values are some 30% lower than the posterior values in Table 2. The 95% probability contours of the parameter uncertainty, on the other hand, reach an asymptotic level after 3–4 sweeps with no further decrease thereafter. These uncertainty values are some 40% lower than those in Table 2. ENTRSCV is an exception in this experiment: its mean value and uncertainty remained close to the default values.

The parameter pairwise covariances after 1800 steps are presented in Figure 4. Strong covariances emerge, especially between ENTRSCV and CAULOC, and ENTRSCV and CMFCTOP. Some words of caution, though, are due on the experimental set-up. First, the covariances may be very specific for this training set, and random in that sense; in another data set the covariances might be different. Second, the covariances may be too strong, because the variability in the training dataset is limited; in another dataset with better representation of the natural variability, the covariances might be weaker. Interestingly enough, Klocke et al. (2011) are aware of the correlation between ENTRSCV and CMFCTOP, and have in fact coupled the variations of these two parameters because of their opposite effects on top-of-atmosphere net radiation through their impact on low cloudiness. Here, coupling of ENTRSCV and CAULOC appears even more pronounced.

Figure 4.

As Figure 3(a), but after 1800 consecutive ensembles. The small circles are the proposed parameter values at step 1800. Note that the axis scales differ from those in Figure 3.

3.4. Validation of forecast skill

The optimized parameters of Table 2 are validated in forecast experiments, with focus on the 500 hPa geopotential height field. We use two skill scores as validation measures: root-mean-squared forecast error (RMSE), and anomaly correlation coefficient (ACC), defined as

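$$\mathrm{RMSE} = \sqrt{\frac{\int (z_f - z_a)^2 \, dA}{\int dA}}, \qquad \mathrm{ACC} = \frac{\int dz_f \, dz_a \, dA}{\sqrt{\int (dz_f)^2 \, dA \int (dz_a)^2 \, dA}}$$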

Here dzf and dza are the forecast and analysis fields, respectively, minus a long-term climatology of the ECMWF operational analyses. RMSE is a commonly used forecast error measure in NWP. It is sensitive to forecast bias but tends to favour smooth forecast fields. ACC, on the other hand, is not sensitive to forecast bias and favours correct patterns in the forecast fields. Note that the optimization criterion (cost function) is closely related to RMSE while ACC is independent of the optimization. We use these two validation measures because they are complementary: if RMSE is decreased while ACC is not significantly degraded, we can conclude that the reduction in the cost function values (RMSE) is not due to the smoothing effect, but related either to bias reduction and/or more accurate forecasts of spatial variations of the height field.
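
The complementary behaviour of the two scores can be illustrated with a short Python sketch. It assumes cos(latitude) area weights on a regular grid and the uncentred form of the ACC (the mean anomaly is not removed), which may differ from the exact definitions used for Figure 5; the synthetic fields and the function names `rmse` and `acc` are hypothetical.

```python
import numpy as np

def rmse(zf, za, w):
    """Area-weighted root-mean-squared error between forecast and analysis."""
    return np.sqrt(np.sum(w * (zf - za) ** 2) / np.sum(w))

def acc(zf, za, clim, w):
    """Area-weighted anomaly correlation (uncentred form, an assumption)."""
    dzf, dza = zf - clim, za - clim
    num = np.sum(w * dzf * dza)
    den = np.sqrt(np.sum(w * dzf ** 2) * np.sum(w * dza ** 2))
    return num / den

# Synthetic 500 hPa height fields on a 64 x 128 grid.
lat = np.deg2rad(np.linspace(-87.0, 87.0, 64))[:, None]
lon = np.deg2rad(np.arange(128) * 2.8125)[None, :]
w = np.cos(lat) * np.ones_like(lon)
clim = np.full((64, 128), 5500.0)
anom = 50.0 * np.sin(3 * lon) * np.cos(lat)
za = clim + anom
zf_damped = clim + 0.5 * anom            # correct pattern, half the amplitude

print(rmse(zf_damped, za, w), acc(zf_damped, za, clim, w))
```

The damped forecast has a non-zero RMSE but a perfect anomaly correlation, which is why the two scores are read together: an RMSE reduction accompanied by an unchanged or improved ACC is unlikely to result from smoothing alone.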

The forecast skill validation results are presented next. The RMSE differences are computed as the default minus the optimized model (Figure 5, left column), while the ACC differences are computed as the optimized minus the default model (Figure 5, right column). Thus, the larger the difference (continuous line in Figure 5), the better the optimized model performs relative to the default model. The 95% confidence intervals are also indicated (grey bars in Figure 5; the bar width is two times the standard deviation of the differences divided by the square root of the number of cases). The top row of Figure 5 presents the validation in dependent data, that is, the dataset in which the parameters were optimized. The forecast dataset consists of 180 cases at 0000 and 1200 UTC, January–March 2011. The RMSE is consistently better in the optimized model than in the default model at all forecast lead times. This is true at the 95% confidence level, except at the 10-day range. In practice, the optimized model is some three hours ahead of the default model in terms of RMSE forecast quality at the 10-day range. This result indicates that the EPPES algorithm works as intended, because the optimization criterion used here is closely related to the RMSE skill score. Based on ACC (right column), the optimized model is neutral or better than the default model, indicating a genuine model improvement.
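The confidence bars described above can be sketched as follows. The function name and the score values are hypothetical. Two further assumptions are made: the quoted bar width is treated as the half-width of the interval around the mean difference, and the sample standard deviation (ddof=1) is used. The sketch also treats the forecast cases as independent, which consecutive 12-hourly forecasts need not be.

```python
import numpy as np

def mean_difference_with_ci(score_ref, score_new):
    """Mean paired score difference and an approximate 95% half-width.

    Half-width = 2 * std(differences) / sqrt(number of cases),
    following the description of the grey bars (interpretation assumed).
    """
    d = np.asarray(score_ref, dtype=float) - np.asarray(score_new, dtype=float)
    half_width = 2.0 * d.std(ddof=1) / np.sqrt(d.size)
    return float(d.mean()), float(half_width)

# Hypothetical RMSE values (m) for four forecast cases
rmse_default = [11.0, 12.0, 13.0, 14.0]
rmse_optimized = [10.0, 10.0, 12.0, 12.0]

mean_diff, half_width = mean_difference_with_ci(rmse_default, rmse_optimized)
# The improvement counts as significant if the interval excludes zero
significant = (mean_diff - half_width) > 0.0
```

For the ACC the same function would be called with the arguments swapped (optimized first, default second), so that a positive difference again means an improvement.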

Figure 5.

The differences of the 500 hPa geopotential height forecast skill scores. Left column: RMSE (default minus optimized model), right column: ACC (optimized minus default model). Top row: dependent sample (January–March 2011), middle row: independent sample of April 2011, bottom row: independent sample of January–March 2010. Mean forecast score difference (continuous line), and the 95% confidence interval of the difference (grey bars).

Next, the validation is performed in an independent dataset of April 2011 (Figure 5, middle row), containing 60 forecast cases immediately after the training period. Presumably the flow type is somewhat similar to the last weeks of the training period, at least early in this period. Based on RMSE, the optimized model is again consistently better than the default model at all forecast lead times. However, at ranges beyond 6–7 days the result is not significant at the 95% confidence level. In terms of ACC, the optimized model is worse than the default model at 6–8 day ranges, and better at 9–10 day ranges. These differences are, however, not significant at the 95% confidence level. Thus, we consider the optimized model to be neutral in this dataset compared with the default model. The second independent dataset covers the same months (January–March) as the training period but from the previous year, and includes 180 forecast cases (Figure 5, bottom row). This forecast experiment tests the robustness of the optimized model to an interannually varying atmospheric state, while the effects of intra-annual variability (seasonal cycle) are excluded. RMSE is again consistently better in the optimized model than in the default model. This result is significant at the 95% confidence level up to the 8-day forecast range. In this dataset, the optimized model is neutral in terms of ACC up to the 6-day range and consistently worse at longer ranges, although this degradation is not significant at the 95% level.

The conclusion is that the optimized model is generally better than the default model when the validation measure is similar to the one used in the optimization (RMSE). This conclusion holds in both dependent and independent datasets, including interannual variations but excluding seasonal variations in the atmospheric states. An independent and complementary criterion (ACC) is not degraded at the 95% confidence level, thus supporting the conclusion that the RMSE improvement is genuine.

The skill scores were computed separately (not shown) for the Northern Hemisphere (NH; north of 20°N), Southern Hemisphere (SH; south of 20°S), and the tropical belt. The global RMSE skill improvement is mainly due to a reduction of the tropical temperature bias in the lower troposphere. A minor reason is the reduction of the 500 hPa geopotential height random error in the NH and SH scores. This is concluded from the fact that the RMSE decreased in the extratropics even though the bias increased slightly at the longer forecast ranges. Next we present a diagnosis of the model response to the parameter changes that explains the reduced tropical temperature bias.

3.5. Diagnosis of the reduced tropical temperature bias

Here we concentrate on the 10-day forecasts in the period January–March 2011. First, the parameter changes from the default values to the optimized ones resulted in increased convective precipitation between 30°N and 30°S (not shown), and consequently, reduced large-scale precipitation, both by about 10% on average. These changes are associated with a slightly larger cloud fraction in the lower troposphere, and a smaller fraction in a deep layer higher in the troposphere in the Tropics (Figure 6(a)), as well as a temperature change with 0.5 K warming at 700 hPa and a cooling of about 0.1 K at 850 hPa and 0.3 K at 300 hPa (Figure 6(b)). Therefore, the mean temperature below 500 hPa is increased, thus increasing the 500 hPa geopotential height in the Tropics. A qualitative cause–effect relationship is as follows: (i) cloud fraction is increased in the lower troposphere because of the increased cloud lifetime (larger value of the parameter ENTRSCV, although opposed by the larger value of CAULOC), (ii) cloud fraction is decreased higher in the troposphere due to the increased precipitation efficiency (increased value of CRPCON), (iii) the combined effect is that the convection is shallower; (iv) there is enhanced latent heating due to the shallower and more intense convection, and perhaps (v) increased latent cooling due to the evaporation of precipitation and/or enhanced long-wave radiative cooling at cloud tops around 850 hPa; and (vi) the cooling at 300 hPa may result from the vertical re-distribution of heating and/or enhanced short-wave radiative cooling due to the reduced cloud fraction.
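The link between the layer-mean warming and the 500 hPa height follows from the hypsometric relation. As a rough sketch, assuming a surface pressure of 1000 hPa, an unchanged surface height, and standard values for the dry-air gas constant R_d and gravity g (none of these numbers are taken from the experiments), the height of the 500 hPa surface above the ground and its sensitivity to the layer-mean temperature are

$$
Z_{500} - Z_{s} = \frac{R_d \,\overline{T}}{g} \ln\frac{p_s}{500\ \mathrm{hPa}},
\qquad
\frac{\partial Z_{500}}{\partial \overline{T}} = \frac{R_d}{g} \ln\frac{p_s}{500\ \mathrm{hPa}}
\approx 29.3\ \mathrm{m\,K^{-1}} \times \ln 2 \approx 20\ \mathrm{m\,K^{-1}} .
$$

Under these assumptions, a layer-mean warming of a few tenths of a kelvin below 500 hPa would correspond to a height increase of a few metres.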

Figure 6.

Pressure–latitude (hPa–°) cross-section of the cloud fraction (left panel; non-dimensional values between 0 and 1) and temperature (right panel; unit K) difference between the optimized and default models in 10-day forecasts in the period January–March 2011.

3.6. Validation of model climate

Since ECHAM5 is a climate model, it is of interest to see how the parameter optimization affects the climate simulated by the model. For the validation purpose, we use two data sources. First, the Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled dataset (Loeb et al., 2009) is used to compare net radiative fluxes (long-wave + short-wave) at the top-of-atmosphere (TOA). Second, total cloud fraction (CTOT) is compared with the International Satellite Cloud Climatology Project (ISCCP) D2 data (Rossow et al., 1996; Rossow and Dueñas, 2004). We recall that the maximum-random overlap assumption is applied in the ECHAM5 model.

Two six-year ECHAM5 model simulations (2000–2005) are prepared using the default and optimized parameters (Table 2), respectively, so as to cover the CERES/ISCCP observation period. Prescribed distributions of sea-surface temperature and sea ice are used (AMIP Project Office, 1996). The comparisons of time–latitude cross-sections of TOA variables are presented in Figure 7, where the left (middle) column is for the default (optimized) simulation minus observation, and the right column for the optimized minus default simulation. The rows one to four in Figure 7 are for the net, short-wave (SW), and long-wave (LW) TOA radiation fluxes, and CTOT, respectively. In the default model simulation, the largest net radiation flux errors (about −40 W m−2) appear at high latitudes (∼55°S and ∼60°N) during local summer (Figure 7(a)). At lower latitudes, smaller positive biases prevail. In the simulation using the optimized parameter values (Figure 7(b)), the maximum monthly mean biases are reduced by about 10 W m−2. The default model global annual mean SW flux is biased negative (−5.21 W m−2; Figure 7(d)), and the LW flux is biased positive (7.05 W m−2; Figure 7(g)). These biases partly cancel out such that the global annual-mean net radiation flux is biased positive by 1.84 W m−2 (Figure 7(a)); this indicates that the default model might have been tuned with the net radiation flux as a target criterion. In the optimized model, the corresponding biases are 1.62 W m−2 for the SW (Figure 7(e)) and −1.63 W m−2 for the LW fluxes (Figure 7(h)), respectively. Therefore, the global annual-mean net flux bias practically vanishes in the optimized model (−0.01 W m−2; Figure 7(b)). Note that the radiation fluxes were not used as an optimization criterion.
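The cancellation can be checked directly from the quoted global annual-mean biases, since the net flux bias is the sum of the SW and LW biases (all values in W m−2):

$$
\begin{aligned}
\text{default:}\quad & -5.21 + 7.05 = +1.84,\\
\text{optimized:}\quad & +1.62 + (-1.63) = -0.01 .
\end{aligned}
$$

The optimized model thus reaches a near-zero net bias with component biases of about 1.6 W m−2 in magnitude, compared with about 5 to 7 W m−2 in the default model.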

Figure 7.

Time–latitude cross-section of TOA net flux difference between the default ECHAM5 model (DEF) and CERES observations (panel (a)), the corresponding difference for the optimized ECHAM5 model (OPT; panel (b)), and the difference between these two model runs (panel (c); note the different scale of shading). (d)–(f): Same as (a)–(c) but for the (down–up) short-wave flux at the TOA. (g)–(i): Same as (a)–(c) but for the (down–up) long-wave flux at the TOA. (j)–(l): Same as (a)–(c) but for total cloud fraction (CTOT; per cent) compared with ISCCP satellite observations.

Finally, total cloud fraction is considered. In comparison to the ISCCP data, the default model features too much cloudiness at high latitudes and too little cloudiness at lower latitudes, with the largest negative biases around 30°S and 30°N (Figure 7(j)). In the optimized model, cloudiness is reduced in the latitude band 30°S–30°N (Figure 7(k)), which makes the cloud climatology worse than in the default model. As in the 10-day forecasts, the tropical cloud fraction has increased in the lower troposphere and decreased higher up. Clearly, the improved radiation flux climatology has been obtained at the expense of the representation of tropical and subtropical radiatively active clouds. It is a known feature of the ECHAM5 model that a good simulation of both clouds and TOA radiation fluxes in the Tropics is challenging.

The conclusion is that the optimized model is better than the default model in terms of net, SW, and LW radiation fluxes, despite the fact that these were not used as target criteria in the optimization. At the same time, the optimized model simulates a smaller total cloud fraction in the Tropics, which further increases the model cloud bias.

4. Discussion

We have been able to improve some aspects of the ECHAM5 model predictive skill. However, ECHAM5 is a climate model and has not been optimized for short-range weather forecasting. Thus, improving its forecast skill may be a relatively easy task, whether with EPPES or with some other optimization method, although we are not aware of any previous attempts at algorithmic forecast skill optimization. Nevertheless, the complexity of the ECHAM5 model is in practical terms equal to that of NWP models currently in operational production. All relevant multi-scale interactions and dynamics–physics feedbacks of the operational NWP models are also present in the ECHAM5 model. The parametrizations of the ECHAM5 model are very comprehensive too. In fact, they are in part even heavier than those used in NWP models, especially regarding climate-relevant physical processes near the surface. Two outstanding differences remain. First, the problem size, i.e. the number of grid points due to the spatial discretization, is clearly larger in operational NWP than in the ECHAM5 model used here. Second, operational EPS systems contain additional stochastic effects to represent model errors. Our current research efforts are thus directed towards tests using a top-end global NWP system. We want to clarify whether or not the huge problem size and the additional stochastic effects of modern ensemble prediction systems overwhelm the parameter estimation with the EPPES method.

In EPPES, we assume that the optimal model parameter θi in time window i is a realization of a random variable, which follows a multivariate Gaussian distribution θi ∼ N(μ, Σ). In model optimization, we are obviously interested in the mean parameter value μ. The distribution parameter Σ is potentially very informative too, although in this article we have not utilized it. We point out three options to utilize the distribution parameter Σ in model development: ensemble generation, detection of model deficiencies, and coupling of parameters. First, parameter variations can be used as a complementary technique to represent model uncertainty. If parameter variations are assumed independent, there is a risk of generating parameter variations that correspond to sub-optimal models, and/or outlying ensemble members. This risk can be potentially alleviated by the parameter covariance information provided by the EPPES method. Second, model deficiencies can appear as excessive parameter uncertainty and/or weak parameter identifiability. With the EPPES method, one can systematically explore the identifiability of parameters related to subgrid-scale parametrizations. Third, strong parameter covariance can be indicative of a need to couple some parameters together.
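The first option, ensemble generation, can be illustrated with a minimal sketch that contrasts parameter draws made with and without the covariance information. The two-parameter set-up and the mean, standard deviations, and correlation of 0.8 are all hypothetical values chosen for illustration; they are not the estimates obtained in this study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical mean and spread of two closure parameters (illustrative only)
mu = np.array([1.0, 2.0])
sd = np.array([0.1, 0.4])
rho = 0.8  # assumed correlation between the two parameters
cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1] ** 2]])

n = 5000
# Draws that respect the covariance structure, theta ~ N(mu, Sigma)
joint = rng.multivariate_normal(mu, cov, size=n)
# Draws that assume independent parameters (diagonal covariance)
indep = rng.multivariate_normal(mu, np.diag(sd ** 2), size=n)

corr_joint = float(np.corrcoef(joint, rowvar=False)[0, 1])
corr_indep = float(np.corrcoef(indep, rowvar=False)[0, 1])

def mahalanobis_sq(x, mean, covariance):
    """Squared Mahalanobis distance of each row of x from the mean."""
    diff = x - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covariance), diff)

# Fraction of members lying outside the 99% region of N(mu, Sigma);
# 9.21 is the 99% quantile of the chi-squared distribution with 2 dof
threshold = 9.21
frac_out_joint = float(np.mean(mahalanobis_sq(joint, mu, cov) > threshold))
frac_out_indep = float(np.mean(mahalanobis_sq(indep, mu, cov) > threshold))
```

In this toy case, the independent draws place a noticeably larger fraction of members in parameter combinations that are unlikely under the full distribution, which is the risk of outlying ensemble members mentioned above.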

Finally, we note that the EPPES algorithm is not specifically designed for model bias reduction, although the forecast skill improvement of the ECHAM5 model was largely due to reduction of systematic model error, especially in the Tropics. We want to emphasize that stochastic parametrization schemes can significantly reduce systematic model errors (e.g. Berner et al., 2012). In contrast to these approaches, the ECHAM5 model was applied in a purely deterministic form in the validation runs where the parameter values were fixed to their optimized values.

5. Conclusions

In this article, closure parameters of the ECHAM5 model at T42L31 resolution are estimated using the Ensemble Prediction and Parameter Estimation System (EPPES: Järvinen et al., 2012; Laine et al., 2012). To emulate the functionalities of an ensemble prediction system (EPS), we apply the following procedure. First, an EPS emulator is set up by copying initial state perturbations from the ECMWF operational archive. Second, parameter perturbations are imposed on the ECHAM5 model to represent model errors. Here, four closure parameters relating to clouds and precipitation are considered. With this set-up, ensembles of 10-day global forecasts are generated with the ECHAM5 model over a period of three months such that each ensemble member has a different parameter perturbation at the initial time. After an ensemble of forecasts has been generated, the EPPES algorithm re-samples the ensemble members that verify well against observations; the mean-squared error of 3- and 10-day forecasts of 500 hPa geopotential height, as verified against the ECMWF analyses, is used here as a target criterion.
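The re-sampling step can be sketched in a strongly simplified form. The following is not the EPPES algorithm of Laine et al. (2012), which also updates the distribution parameters hierarchically. It only illustrates the idea of weighting ensemble members by their verification score and re-sampling accordingly. The function name, the single scalar parameter, the Gaussian-type weight exp(−cost/2), and all numbers are assumptions.

```python
import numpy as np

def resample_by_skill(params, cost, rng):
    """Re-sample ensemble members with probability favouring low cost.

    Weights are proportional to exp(-cost / 2), an assumed Gaussian-type
    likelihood; the minimum cost is subtracted for numerical stability.
    """
    cost = np.asarray(cost, dtype=float)
    w = np.exp(-0.5 * (cost - cost.min()))
    w /= w.sum()
    idx = rng.choice(cost.size, size=cost.size, replace=True, p=w)
    return params[idx], w

rng = np.random.default_rng(1)

# Hypothetical ensemble of one closure parameter, drawn from a prior N(0, 1)
theta = rng.normal(0.0, 1.0, size=200)

# Toy cost function: forecasts verify best when the parameter is near 1.0
target = 1.0
cost = ((theta - target) / 0.5) ** 2

resampled, weights = resample_by_skill(theta, cost, rng)
weighted_mean = float(np.sum(weights * theta))
```

In this toy set-up the weighted mean of the parameter moves from the prior mean towards the value that minimizes the cost, mirroring in a much reduced form how the parameter means evolve over consecutive ensembles.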

In a three-month period, 9180 sample points are generated in total. The mean values of the four closure parameters evolve towards more optimal values, and their uncertainties reduce from the values specified a priori. In the three-month twice-daily sequence, weak parameter covariances emerge. In asymptotic tests (ten times more sample points), the pairwise parameter covariances are stronger.

For the validation of the ECHAM5 forecast skill, we use two validation measures: root-mean-squared forecast error (RMSE) and anomaly correlation coefficient (ACC) of 500 hPa geopotential height. RMSE is better or neutral in dependent and independent datasets, while ACC is mostly neutral. Thus we conclude that the reduction in the cost function values is not due to a smoothing effect, but is related to bias reduction and/or more accurate forecasts of spatial variations of the height field.

Since ECHAM5 is a climate model, we also evaluated the effects of parameter optimization on the climate simulated by the model. The default model is close to top-of-atmosphere global annual-mean net radiation flux balance, because sizable SW and LW radiation biases cancel out. The optimization improved the radiation flux climatology (net, SW and LW), as verified against the CERES radiation data over a 6-year period. At the same time, the model's negative bias in the tropical cloud cover has become worse, as compared with the ISCCP data. We conclude that optimizing medium-range forecast skill and maintaining a realistic model climate are not conflicting targets. This is encouraging since a general requirement for climate models is that they perform well both in terms of bias and variability of essential climate variables. Climate models are typically optimized in terms of biases, while good forecast skill implies that the variability is also well captured. This tends to support our approach of using ensemble prediction with a relatively short forecast range for the benefit of climate simulation.

Our current research efforts are directed towards testing the EPPES method in the context of top-end global NWP systems. We want to clarify whether or not the huge problem size and the additional stochastic effects of modern ensemble prediction systems overwhelm the parameter estimation with the EPPES method. Finally, we note that the EPPES codes used here and some examples are available on-line at http://helios.fmi.fi/~lainema/eppes/.


The authors are grateful to ECMWF for access to the operational data archives. Petri Räisänen from FMI is warmly acknowledged for discussions and comments on the manuscript. The peer reviewers provided insightful comments, which are also acknowledged. The research has been funded in part by the Academy of Finland (project numbers 127210, 132808, 133142 and 134999), the Nessling foundation, and by the European Commission's 7th Framework Programme, under Grant Agreement number 282672, EMBRACE project (www.embrace-project.eu).