Maximum likelihood estimation of error covariances in ensemble-based filters and its application to a coupled atmosphere–ocean model

Abstract

We propose a method for estimating optimal error covariances in the context of sequential assimilation, including the case where both the system equation and the observation equation are nonlinear. When the system equation is nonlinear, ensemble-based filtering methods such as the ensemble Kalman filter (EnKF) are widely used to deal directly with the nonlinearity. The present approach for covariance optimization is a maximum likelihood estimation carried out by approximating the likelihood with the ensemble mean. Specifically, the likelihood is approximated as the sample mean of the likelihood of each member of the ensemble. To evaluate the sampling error of the proposed ensemble-approximated likelihood, we construct a method for examining the statistical significance using the bootstrap method without extra ensemble computation. We apply the proposed methods to an EnKF experiment where TOPEX/POSEIDON altimetry observations are assimilated into an intermediate coupled model, which is nonlinear, and estimate the optimal parameters that specify the covariances of the system noise and observation noise. Using these optimal covariance parameters, we examine the estimates by the EnKF and the ensemble Kalman smoother (EnKS). The effect of smoothing decreases by 1/e approximately one year after the filtering step. One of the properties of the smoothed estimate is that westerly wind anomalies over the western Pacific are not reproduced around the period of an El Niño event, while those over the central Pacific are strengthened. From additional experiments, we find that (1) the westerly winds in the western Pacific are phenomena outside of the coupled model and are not necessary to model El Niño, (2) the model El Niño is maintained by the westerlies over the central Pacific, and (3) the modelled evolution process of the sea-surface temperature (SST) requires improvement to reproduce the westerly winds over the western Pacific. Copyright © 2010 Royal Meteorological Society

1. Introduction

In data assimilation, covariance matrices are introduced in order to prescribe the properties of the uncertainty of the initial state, the system noise (model error, process noise), and the observation noise (observation error). The inverse matrices of the respective covariances work as weights of the initial estimates, model dynamics, and observations. Suitable specification of the covariance matrices is essential for obtaining sensible estimates, and mis-specification of the matrices may lead to over- or underfitting of the data and/or failure of the assimilation altogether (e.g. Fukumori 2001). This paper presents a technique for optimizing covariance matrices in data assimilation.

The weight of the initial estimate is usually modelled by the inverse of a covariance matrix, known as a background-error covariance matrix in the literature on variational assimilation. System noise (process noise, model error) introduces uncertainties in the temporal evolution of the model state due to numerical truncation and/or parametrization error. Observation noise is, by definition, the difference between observations and the model state. The difference can be interpreted as a sum of instrument error and representation error (Cohn 1997). The instrument error is due to the experimental properties of the instrument, and the representation error depends on the dynamic model. One of the sources of the representation error is neglected physics in the model, such as subgrid-scale variability. A part of the subgrid-scale variability, however, would not be considered as representation error if we used an alternative model that was constructed on a finer grid scale. In addition, background error and system noise also prescribe the behaviour of the model state, and consequently affect the observation noise.

Given the fact that these noise terms depend on each other, optimal covariance matrices of the noise should be simultaneously determined based on both the model and observation properties. In the following, we review methods for estimating the optimal covariances based on dynamic models and observations, as summarized in Table I, and clarify the nature of our proposed method. Although they are categorized separately in Table I, these methods are commonly based on a specific statistic, $y_t - H_t x_{t|t-1}$, which is the difference between the observation $y_t$ and the predicted model state $H_t x_{t|t-1}$, where $H_t$ is the observation matrix. According to custom, we refer to this statistic as the innovation, though this terminology is not entirely correct since it presumes optimality of the data assimilation algorithm (e.g. Anderson and Moore 2005, section 5.3).

Table I. Methods for estimating optimal covariances for the initial state, system noise, and observation noise for data assimilation.
• Minimizing the squared innovation
• Maximum likelihood
  1. Gaussian likelihood
     ◦ Original form
     ◦ Ensemble mean and covariance of state
     ◦ Expected value of the cost function
     ◦ Adjustment to the squared innovation
  2. Non-Gaussian likelihood
     ◦ Ensemble mean of likelihood (present study)
• Bayesian estimation
• Covariance matching

1.1. Minimizing the squared innovations

One method of estimating the optimal covariances is to minimize the square of the innovation. Gaspar and Wunsch (1989) obtained the optimal magnitude of the system noise covariance by maximizing the variance explained by the model prediction, which is equivalent to minimizing the square of the innovation. Through the minimization of the squared innovation, Hoang et al. (1997) directly determined the optimal Kalman gain, which is a function of the forecast-error covariance and observation noise covariance.

1.2. Maximum likelihood

Maximization of Gaussian likelihood functions has also been adopted to estimate the covariance parameters (Dee 1995; Dee and da Silva 1999; Dee et al. 1999). Since the likelihood function can be interpreted as a distribution of the innovations, the maximum likelihood is also considered to be an innovation-based method. Maximization of the likelihood function is equivalent to minimizing the sum of two terms: the quadratic form of the innovations weighted by the inverse covariance matrix of the Gaussian innovation distribution, and the log-determinant of that covariance. This gives different estimates from the method that simply minimizes the squared innovation itself (section 1.1). Mitchell and Houtekamer (2000) extended the likelihood function in the framework of the ensemble Kalman filter (EnKF; Evensen 2003) by approximating the mean and the covariance of the model state in the Gaussian likelihood function by the ensemble mean and ensemble covariance.

The optimality condition, where the derivative of the likelihood by the covariance parameters vanishes, leads to an explicit representation of the optimal covariances (Maybeck 1982). Using a tropical Pacific Ocean model, Blanchet et al. (1997) compared the covariances obtained through the procedure of Dee (1995) and Maybeck (1982), to another covariance obtained through an intuitive approach by Myers and Tapley (1976).

The optimality condition is also consistent with the relation of the expected value of a cost function (Talagrand 1999; Chapnik et al. 2004). Hence, a method by Desroziers and Ivanov (2001), who evaluated the expectation of the cost function to tune parameters of observation noise covariance, is also categorized as a maximum likelihood method.

Adaptive Kalman filtering (Mehra 1970; Gelb 1974) estimates the optimal covariances so that the sample mean of the squared innovations equals the covariance of a Gaussian innovation distribution. This approach is also interpreted as a maximum likelihood method, because the maximum likelihood estimator (MLE) of the covariance matrix of a Gaussian distribution is a sample covariance matrix (e.g. Magnus and Neudecker 1999). While adaptive Kalman filtering estimates the optimal covariances at each time step, a time-averaged version of the method was also proposed to estimate optimal stationary covariances (Miller and Cane 1989). The relation between the squared innovation and the covariance of the Gaussian leads to alternative relations (Desroziers et al. 2005), which are also consistent with the condition of the MLE.

1.3. Bayesian estimation

In addition to the likelihood function, Purser and Parrish (2003) introduced a prior distribution to obtain smooth estimates of parameters of the background-error covariance.

1.4. Covariance matching

The ‘covariance matching’ method (Fu et al. 1993) and its extended version (Menemenlis and Chechelnitsky 2000) utilize the sample covariance of the innovations, in which the output of a free simulation run is assigned as the predicted model states, $x_{t|t-1}$. In the framework of linear and stationary systems, the methods estimate the optimal covariance matrices of system noise and observation noise by combining a relation of the innovation covariance and the Lyapunov equation for the state covariance.

1.5. Present study

The above-mentioned methods for estimating optimal covariance are constructed based on linear-Gaussian state space models, i.e. it is assumed that both the system equation and the observation equation are linear, and that both the system noise and the observation noise obey Gaussian distributions. In the present study, we propose a method for estimating optimal noise covariances in the context of sequential assimilation, even when the system and observation equations are nonlinear. Nonlinearities in the system equation are typically introduced by the advection term in the momentum equation. When the system equation is nonlinear, ensemble-based assimilation methods such as the EnKF are applied to deal directly with the nonlinearity. The present approach for covariance optimization is a maximum likelihood estimation carried out by approximating the likelihood with the ensemble (Table I). The approach is also applicable when the noise obeys a non-Gaussian distribution such as the log-normal distribution (e.g. Fletcher and Zupanski 2006), in which case it estimates the optimal parameters that describe the non-Gaussian distribution.

The present paper is organized as follows. In section 2, the basic concept of the EnKF is reviewed, and an ensemble approximation of the likelihood is derived. We then give an example of an application of the method to a coupled atmosphere–ocean model (Zebiak and Cane 1987), into which sea-surface height (SSH) observations are assimilated using the EnKF and the ensemble Kalman smoother (EnKS). The experimental set-up and results are given in sections 3 and 4, respectively. In section 5, we discuss the properties of the estimated covariance and missing physics in the coupled model; the westerly wind bursts (WWBs) in the western Pacific are not reproduced in the EnKS estimate. Conclusions are given in section 6.

2. Ensemble Kalman filter and maximum likelihood

2.1. Ensemble Kalman filter (EnKF) and smoother (EnKS)

The EnKF is designed to deal with state space models consisting of a nonlinear and non-Gaussian system model and a linear and Gaussian observation model:

$$x_t = f_t(x_{t-1}, v_t) \tag{1}$$
$$y_t = H_t x_t + w_t \tag{2}$$

where $x_t$ is a state vector at time t, $f_t$ is a function describing model dynamics including the effects of system noise $v_t$ (possibly non-Gaussian), $y_t$ is a time series, $H_t$ is a matrix, and $w_t$ is Gaussian observation noise with a mean vector of $0$ and a covariance matrix of $R_t$. In the EnKF, distributions of $x_t$, $v_t$, and $w_t$ are approximated by many realizations (an ensemble). Even when the observation model is nonlinear, i.e.

$$y_t = h_t(x_t) + w_t \tag{3}$$

where $h_t$ is a nonlinear function, the EnKF algorithm can still be utilized by introducing an alternative state space model whose observation model is linear. In the alternative state space model, either a linear approximation of $h_t$ is used, as is the case for the extended Kalman filter (EKF; e.g. Anderson and Moore 2005, section 8.2), or else the state vector $x_t$ is replaced by an augmented state vector consisting of $x_t$ and $h_t(x_t)$ (Evensen 2003, section 4.5).

With N ensemble members, the predicted distribution of $x_t$ given $y_{1:t-1}$, $p(x_t \mid y_{1:t-1})$, is approximated by an ensemble consisting of

$$x_{t|t-1}^{(n)} = f_t\big(x_{t-1|t-1}^{(n)}, v_t^{(n)}\big) \tag{4}$$

for n = 1, ···, N. Here, $x_{t-1|t-1}^{(n)}$ is a member of the ensemble that approximates the filtered state distribution at the previous time step, $p(x_{t-1} \mid y_{1:t-1})$, and $v_t^{(n)}$ is a realization of system noise $v_t$. An ensemble of the current filtered distribution $p(x_t \mid y_{1:t})$ is obtained as

$$x_{t|t}^{(n)} = x_{t|t-1}^{(n)} + \hat{K}_t \big(y_t + w_t^{(n)} - H_t x_{t|t-1}^{(n)}\big) \tag{5}$$

for n = 1, ···, N. Here, $w_t^{(n)}$ is a realization of observation noise $w_t$, and $\hat{K}_t$ is the approximated Kalman gain. The approximated Kalman gain is

$$\hat{K}_t = \hat{V}_{t|t-1} H_t^{\mathrm{T}} \big(H_t \hat{V}_{t|t-1} H_t^{\mathrm{T}} + R_t\big)^{-1} \tag{6}$$

where $\hat{V}_{t|t-1}$ is a sample covariance matrix of $x_{t|t-1}^{(n)}$:

$$\hat{V}_{t|t-1} = \frac{1}{N-1} \sum_{n=1}^{N} \big(x_{t|t-1}^{(n)} - \hat{x}_{t|t-1}\big)\big(x_{t|t-1}^{(n)} - \hat{x}_{t|t-1}\big)^{\mathrm{T}} \tag{7}$$
$$\hat{x}_{t|t-1} = \frac{1}{N} \sum_{n=1}^{N} x_{t|t-1}^{(n)} \tag{8}$$

Here, superscript T means the matrix transpose, and $\hat{\cdot}$ means an ensemble-approximated statistic. The observation noise covariance $R_t$ in (6) may be approximated as a sample covariance of $w_t^{(n)}$ to reduce the computational costs (Evensen 2003, section 4.3).
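The analysis step (5) with the approximated gain (6) can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in the experiments: the function name `enkf_analysis`, the array layout (one ensemble member per column), the 1/(N − 1) normalization of the sample covariance, and the toy dimensions are all assumptions made for the example.

```python
import numpy as np

def enkf_analysis(X_pred, y, H, R, rng):
    """Perturbed-observation EnKF analysis step (illustrative sketch).

    X_pred : (k, N) predicted ensemble, one member per column
    y      : (m,)   observation vector
    H      : (m, k) observation matrix
    R      : (m, m) observation noise covariance
    rng    : numpy.random.Generator used to draw observation noise
    """
    N = X_pred.shape[1]
    # Ensemble mean and sample covariance of the predicted state
    A = X_pred - X_pred.mean(axis=1, keepdims=True)
    V = A @ A.T / (N - 1)              # assumption: 1/(N-1) normalization
    # Approximated Kalman gain K = V H^T (H V H^T + R)^{-1}
    S = H @ V @ H.T + R
    K = np.linalg.solve(S, H @ V).T    # valid because S and V are symmetric
    # One zero-mean observation-noise realization per member
    W = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X_pred + K @ (y[:, None] + W - H @ X_pred)

# Toy usage: 3 state variables, the first 2 observed, 50 members
rng = np.random.default_rng(0)
X_pred = rng.normal(size=(3, 50))
H = np.eye(2, 3)
y = np.array([1.0, -1.0])
X_filt = enkf_analysis(X_pred, y, H, 1e-8 * np.eye(2), rng)
```

With a nearly noise-free observation, as in the toy call, the observed components of every filtered member are pulled onto the data. For a state vector as large as that of a real ocean model, one would avoid forming the full state covariance and work with the product of H and the ensemble anomalies instead.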

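The EnKS (Evensen 2003) is a smoothing algorithm for ensemble approaches, which stems from the fixed-lag smoothing algorithm. The smoothed distribution at time t − j, given $y_{1:t}$, $p(x_{t-j} \mid y_{1:t})$, is approximated by ensemble members and computed as

$$x_{t-j|t}^{(n)} = x_{t-j|t-1}^{(n)} + \hat{K}_{t-j|t} \big(y_t + w_t^{(n)} - H_t x_{t|t-1}^{(n)}\big) \tag{9}$$

for n = 1, ···, N. Index j denotes a time lag for smoothing and takes 1, ···, L where L is the fixed lag. Here, $\hat{K}_{t-j|t}$ is the smoother gain for $x_{t-j}$ using $y_t$:

$$\hat{K}_{t-j|t} = \hat{V}_{t-j,t|t-1} H_t^{\mathrm{T}} \big(H_t \hat{V}_{t|t-1} H_t^{\mathrm{T}} + R_t\big)^{-1} \tag{10}$$

where $\hat{V}_{t-j,t|t-1}$ is a sample cross-covariance matrix between $x_{t-j|t-1}^{(n)}$ and $x_{t|t-1}^{(n)}$:

$$\hat{V}_{t-j,t|t-1} = \frac{1}{N-1} \sum_{n=1}^{N} \big(x_{t-j|t-1}^{(n)} - \hat{x}_{t-j|t-1}\big)\big(x_{t|t-1}^{(n)} - \hat{x}_{t|t-1}\big)^{\mathrm{T}} \tag{11}$$
$$\hat{x}_{t-j|t-1} = \frac{1}{N} \sum_{n=1}^{N} x_{t-j|t-1}^{(n)} \tag{12}$$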
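The smoother update differs from the filter update only in that the gain is built from the cross-covariance between the lagged and the current ensembles. A minimal NumPy sketch under stated assumptions follows: the name `enks_update`, the column-per-member layout, the 1/(N − 1) normalization, and the reuse of the filter step's observation-noise realizations `W` are choices made for this illustration.

```python
import numpy as np

def enks_update(X_lag, X_pred, y, H, R, W):
    """Update the ensemble of a past state with the current observation.

    X_lag  : (k, N) ensemble for the lagged time, conditioned on earlier data
    X_pred : (k, N) predicted ensemble at the current time
    W      : (m, N) observation-noise realizations, one column per member
    """
    N = X_pred.shape[1]
    A_lag = X_lag - X_lag.mean(axis=1, keepdims=True)
    A = X_pred - X_pred.mean(axis=1, keepdims=True)
    V_cross = A_lag @ A.T / (N - 1)   # sample cross-covariance (lagged, current)
    V = A @ A.T / (N - 1)             # sample covariance of the current state
    S = H @ V @ H.T + R
    # Smoother gain V_cross H^T S^{-1}; the transpose trick relies on S being symmetric
    K_lag = np.linalg.solve(S, H @ V_cross.T).T
    return X_lag + K_lag @ (y[:, None] + W - H @ X_pred)

# Toy usage with made-up ensembles
rng = np.random.default_rng(1)
X_pred = rng.normal(size=(3, 40))
X_lag = 0.5 * X_pred + rng.normal(size=(3, 40))
H, R = np.eye(2, 3), 0.1 * np.eye(2)
y = np.array([0.3, -0.2])
W = rng.multivariate_normal(np.zeros(2), R, size=40).T
X_smooth = enks_update(X_lag, X_pred, y, H, R, W)
```

Passing the predicted ensemble itself as `X_lag` reproduces the filter update, which is a convenient consistency check.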

2.2. Likelihood and its ensemble approximation

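Given the observations $y_{1:T}$, the likelihood of the parameter $\theta$ of a state space model is obtained by a product of predictive likelihoods, as (e.g. Kitagawa and Gersch 1996)

$$L(\theta) = p(y_{1:T} \mid \theta) \tag{13}$$
$$\phantom{L(\theta)} = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, \theta) \tag{14}$$

The parameter vector $\theta$ typically consists of system noise covariances $Q_t$ (when $v_t$ obeys a Gaussian distribution: $v_t \sim N(0, Q_t)$) and observation noise covariances $R_t$, and may also include an initial state covariance (background-error covariance) and some parameters appearing in the dynamic model $f_t$, such as a diffusion coefficient.

In the ensemble-based assimilation procedure, the predictive likelihood can be approximated by the mean, over the ensemble, of the likelihood of each member. That is,

$$p(y_t \mid y_{1:t-1}, \theta) = \int p(y_t \mid x_t, \theta)\, p(x_t \mid y_{1:t-1}, \theta)\, \mathrm{d}x_t \doteq \int p(y_t \mid x_t, \theta)\, \frac{1}{N} \sum_{n=1}^{N} \delta\big(x_t - x_{t|t-1}^{(n)}\big)\, \mathrm{d}x_t = \frac{1}{N} \sum_{n=1}^{N} p\big(y_t \mid x_{t|t-1}^{(n)}, \theta\big) \tag{15}$$

where the dotted equality $\doteq$ denotes an equality with an ensemble approximation, and δ is the Dirac delta function. Substituting (15) into (14), the log-likelihood is approximated by

$$\ell(\theta) = \log L(\theta) \tag{16}$$
$$\phantom{\ell(\theta)} \doteq \sum_{t=1}^{T} \log \left[ \frac{1}{N} \sum_{n=1}^{N} p\big(y_t \mid x_{t|t-1}^{(n)}, \theta\big) \right] \tag{17}$$
$$\phantom{\ell(\theta)} = \sum_{t=1}^{T} \left[ \log \sum_{n=1}^{N} \exp \ell_t^{(n)} - \log N \right] \tag{18}$$

where we let $\ell_t^{(n)} = \log p\big(y_t \mid x_{t|t-1}^{(n)}, \theta\big)$, which can be interpreted as the predictive log-likelihood of each ensemble member. We can select optimal parameters $\hat{\theta}$ by maximizing the ensemble-approximated log-likelihood function (18).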
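Because the per-member predictive log-likelihoods can be very large negative numbers, a direct evaluation of the ensemble mean of the likelihood would underflow. The sketch below evaluates the ensemble-approximated log-likelihood with the usual log-sum-exp trick. It uses only the standard library; the names `logsumexp` and `ensemble_loglik` and the nested-list layout are assumptions for illustration.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ensemble_loglik(ell):
    """Ensemble-approximated log-likelihood.

    ell[t][n] is the predictive log-likelihood of member n at time step t.
    Each time step contributes log(mean over members of exp(ell[t][n])).
    """
    N = len(ell[0])
    return sum(logsumexp(row) - math.log(N) for row in ell)

# Toy input: 2 time steps, 2 members; exp(-1000) would underflow if taken directly
ell_demo = [[-1000.0, -1000.0], [-2000.0, -2001.0]]
loglik_demo = ensemble_loglik(ell_demo)
```

Subtracting the row maximum before exponentiating keeps the largest term equal to one, so the sum is always representable.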

The ensemble approximation (18) does not require the linear and Gaussian observation equation (2); it is instead applicable for general nonlinear and non-Gaussian state space models, which should be dealt with by nonlinear filtering methods such as the particle filter (PF; Gordon et al. 1993; Kitagawa 1993, 1996; van Leeuwen 2009). The ensemble approximation is, therefore, justified for any ensemble-based filter.

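When the observation equation is linear and observation noise is Gaussian (2), the likelihood of $x_{t|t-1}^{(n)}$ given $y_t$ becomes a Gaussian distribution:

$$p\big(y_t \mid x_{t|t-1}^{(n)}, \theta\big) = (2\pi)^{-\dim(y_t)/2}\, |R_t|^{-1/2} \exp\left[ -\frac{1}{2} \big(y_t - H_t x_{t|t-1}^{(n)}\big)^{\mathrm{T}} R_t^{-1} \big(y_t - H_t x_{t|t-1}^{(n)}\big) \right] \tag{19}$$

and the predictive log-likelihood of each member then becomes

$$\ell_t^{(n)} = -\frac{1}{2} \left[ \dim(y_t) \log 2\pi + \log |R_t| + \big(y_t - H_t x_{t|t-1}^{(n)}\big)^{\mathrm{T}} R_t^{-1} \big(y_t - H_t x_{t|t-1}^{(n)}\big) \right] \tag{20}$$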
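For the linear-Gaussian observation model, the predictive log-likelihood of every member can be evaluated in one pass with a Cholesky factor of the observation noise covariance, which yields both the quadratic form and the log-determinant. The following NumPy sketch is illustrative only; `member_loglik` and the column-per-member layout are assumed names and conventions.

```python
import numpy as np

def member_loglik(X_pred, y, H, R):
    """Gaussian predictive log-likelihood of each ensemble member.

    X_pred : (k, N) predicted ensemble, one member per column
    y      : (m,)   observation vector
    H      : (m, k) observation matrix
    R      : (m, m) positive-definite observation noise covariance
    Returns an array of length N.
    """
    m = len(y)
    D = y[:, None] - H @ X_pred            # (m, N) residual of each member
    L = np.linalg.cholesky(R)              # R = L L^T
    Z = np.linalg.solve(L, D)              # whitened residuals, Z^T Z = D^T R^{-1} D
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + np.sum(Z**2, axis=0))

# Toy usage: the member that matches the data exactly has the highest value
X_toy = np.array([[0.0, 1.0, 3.0],
                  [0.0, 0.0, 0.0]])
ell_toy = member_loglik(X_toy, np.zeros(2), np.eye(2), np.eye(2))
```

The Cholesky factorization fails if the covariance is singular, which is the same obstacle that motivates regularizing a rank-deficient sample covariance before using it in the likelihood.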

2.3. Sampling error of the ensemble-approximated likelihood function

The log-likelihood function approximated by ensemble (18) has sampling error due to the finite number of ensemble members, N. Here, we propose a method for estimating the sampling error using the bootstrap method (e.g. Efron and Tibshirani 1993).

One way to evaluate the sampling error of the approximated likelihood is carried out by conducting the EnKF computation many times using different sets of random numbers (e.g. Higuchi 1995). The approximated likelihood takes a different value depending on the random numbers used, and distribution of the resultant likelihoods indicates the accuracy of the estimated likelihood. In practice, however, this approach is not feasible, because even a single EnKF computation may have considerable computational cost, especially when a large dynamic model is used. As an alternative, we simply resample the predictive likelihood of each ensemble member obtained in the original EnKF run. From (18), the approximated log-likelihood is computed with the temporal trajectory of the predictive likelihood of each member, equation image. From the set of the N trajectories of the likelihood, we resample N trajectories with replacement and compute the log-likelihood using (18). We repeat the resampling procedure B times to obtain likelihood distribution consisting of B replications of the likelihood.
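The resampling procedure can be sketched as follows, using only the standard library. Whole member trajectories are resampled, so the same set of member indices is applied at every time step within one replication. The function names, the nested-list layout `ell[t][n]`, and the fixed seed are assumptions for illustration.

```python
import math
import random

def loglik_from_members(ell, members):
    """Ensemble-approximated log-likelihood using only the listed member indices."""
    total = 0.0
    for row in ell:
        vals = [row[n] for n in members]
        m = max(vals)
        total += m + math.log(sum(math.exp(v - m) for v in vals)) - math.log(len(vals))
    return total

def bootstrap_loglik(ell, B, seed=0):
    """Return B bootstrap replications of the log-likelihood.

    ell[t][n] is the predictive log-likelihood of member n at time step t,
    taken from a single filter run; no extra model integration is needed.
    """
    rng = random.Random(seed)
    N = len(ell[0])
    reps = []
    for _ in range(B):
        # Draw N trajectories with replacement
        members = [rng.randrange(N) for _ in range(N)]
        reps.append(loglik_from_members(ell, members))
    return reps

# Toy usage with made-up values: 3 time steps, 4 members
ell_demo = [[-10.0, -11.0, -9.5, -12.0],
            [-20.0, -19.0, -21.0, -20.5],
            [-15.0, -15.5, -14.0, -16.0]]
reps_demo = bootstrap_loglik(ell_demo, B=200)
spread_demo = (min(reps_demo), max(reps_demo))
```

Comparing the spread of the replications for two parameter settings indicates whether the difference between their likelihoods exceeds the sampling error.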

3. Experiments

In the present study, we conduct experiments for estimating covariance parameters for system noise and observation noise, using the proposed maximum likelihood method based on the framework of ensemble-based filters. We use a coupled atmosphere–ocean model of intermediate complexity, specifically the ZC model (Zebiak and Cane 1987), as an application of the proposed method. Intermediate coupled models have previously been used as testbeds for ensemble assimilation experiments such as modelling of model error (Zheng et al. 2006; Zheng et al. 2007; Zheng and Zhu 2008; Zheng et al. 2009), smoothing (Ueno et al. 2007), and examination of filter parameters (ensemble size, covariance localization, and covariance inflation; Karspeck and Anderson 2007), as well as for estimation of parameters in the models using the EKF (Kondrashov et al. 2008), although not of the covariance parameters as done in the present study.

3.1. Model and data

The ZC model couples two linear shallow-water equations: a steady-state atmospheric model and a dynamic reduced-gravity ocean model. The atmospheric component is forced by heating that depends on sea-surface temperature (SST) and surface wind convergence. The SST evolves through time according to the thermodynamic equation. The ocean component is forced by surface wind stress calculated from surface wind through the bulk formula. The ZC model includes nonlinear equations for the heating anomaly due to the surface wind convergence and for wind stress. All variables are expressed as anomalies with respect to the prescribed monthly mean climatology, and they consist of the horizontal ocean current equation image in the upper layer, the thermocline depth h, the SST T, the atmospheric wind equation image, the surface wind convergence c, heating due to SST s, and the wind stress equation image. The ocean model is regional and extends from 124°E to 80°W and from 29°N to 29°S, in which equation image and h are defined at grid points of 2° × 0.5° resolution, while T and τ are computed at coarser grid points of 5.625° × 2° resolution. The atmospheric model, on the other hand, is global, and equation image is defined at 5.625° × 2° grid points. The dimension of the state vector of the ZC model is 54 403.

We assimilate the TOPEX/POSEIDON (T/P) SSH anomaly observations by the Altimeter Ocean Pathfinder from its repeat cycle 1 to 364 (from 23 September 1992 to 11 August 2002; each cycle takes 9.915625 days). While the data are provided at 1° × 1° spatial resolution, we use those at the grid points every 8° in the zonal and 2° in the meridional directions inside the model ocean basin. We avoided using data outside the Pacific (that is, those in the Arafura Sea, Gulf of Mexico, and Caribbean Sea). The number of data points in each cycle was usually 503, but occasionally decreased to 433 due to partially missing data, and in cycle 118 the data were totally missing.

The ZC model does not solve the SSH anomaly itself. Instead, it computes the thermocline depth anomaly h, which can be converted into the SSH anomaly through the isostatic relation as $(\Delta g / g)\, h$, where g (Δg) is the (reduced) gravity and Δg/g = 0.00573.

Figures 1a and 1b show SSH anomalies along the equator observed by T/P, and those reproduced by the ZC model without data assimilation, respectively. The model time step is set to be identical to the T/P sampling interval, 9.915625 days. The spin-up period is set to 133 years. This period was selected because it produces the ZC model output (without data assimilation) that gives the maximum value of the log-likelihood (18) among trial periods from 90 to 180 years. In calculating the log-likelihood, we tentatively assume an observation noise covariance matrix using a scaling parameter α = 1 (see section 4.1). A minimum trial period of 90 years was selected because the characteristic standing oscillation is well reproduced for the period (Zebiak and Cane 1987).

Figure 1.

Temporal variations of the SSH anomalies along the Equator (a) observed by TOPEX/POSEIDON, and (b) reproduced by the ZC model.

The ZC model displays a periodic oscillation that includes an El Niño-like event in 1997–98, and a La Niña event in 1999–2000. However, the reproduced SSH variations have many aspects inconsistent with the T/P observation. Firstly, the amplitudes produced by the model are significantly smaller than the observations (note that the scale of the ZC model is three times that of the observation). In addition, the ZC model reproduces SSH anomalies that vary slowly compared to the observation. For example, the 1997–98 El Niño event ended in the middle of 1998, while the ZC model output shows a warm event that continues until 1999.

3.2. System model

We conduct an EnKF experiment by adding system noise to the thermocline depth anomaly,

equation image(21)

which can be interpreted as uncertainty in the prescribed monthly mean thermocline depth. We first generate realizations of Gaussian system noise equation image with zero mean and covariance matrix equation image: equation image. The matrix equation image is non-diagonal and set to have a horizontal correlation between model grid points equation image and equation image (in degrees) depending on the distance between them:

equation image(22)

where σh is the magnitude and Lx and Ly are zonal and meridional length-scales. Parameters σh, Lx, and Ly are to be optimized to maximize the likelihood function (18).

Since the model thermocline depth anomaly is represented by the sum of the Kelvin and the Rossby components, we can choose either of these components to add system noise. In the present experiment, we add system noise to the Rossby component because it appears on the thermocline depth alone, while the Kelvin component also affects the zonal current anomaly (Cane and Patton 1984). The Gaussian system noise realization is then slightly modified to maintain physical consistency as it should be orthogonal to the Kelvin waves. Specifically, we decompose the Gaussian noise into two components, orthogonal and non-orthogonal to the Kelvin wave, and use the former component as the system noise realization. That is, instead of using Gaussian equation image themselves, we use

equation image

as system noise, where ψ−1 is the parabolic cylinder function of order − 1.

3.3. Observation model

3.3.1. Observation matrix

As mentioned in section 3.1, the model SSH anomaly is linearly related to the thermocline depth anomaly h as equation image. Thus, the observation equation becomes linear.

Since observation matrix equation image is a equation image-by-equation image matrix, the number of rows varies from 433 to 503, depending on missing data. We prescribe a matrix equation image with the maximum number of rows, 503, and reduce equation image by eliminating the elements corresponding to the actual data points and assume it to be equation image. We assume that equation image has an element

equation image(23)
equation image(24)

where equation image and equation image are model grid points (in degrees) at which h is computed and that of the T/P observation, respectively. (23) means that the model SSH anomalies are expressed by a weighted mean of h multiplied by Δg/g. In the present experiments, we set the correlation lengths equation image and equation image to the model ocean grid size, 2° and 0.5°, respectively.

3.3.2. Observation noise

We also assume a Gaussian distribution for the observation noise with zero mean: equation image, where equation image is a covariance matrix. Similarly to the observation matrix equation image, the dimension of equation image varies from 433 × 433 to 503 × 503. We prescribe a matrix equation image of maximum dimension, 503 × 503, and reduce equation image by eliminating the elements corresponding to the data points that are empty and assume it to be equation image. This elimination procedure is equivalent to integrating out the missing observations (e.g. Magnus and Naudecker 1999).
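The elimination procedure amounts to selecting the rows of the full observation matrix, and the matching rows and columns of the full observation noise covariance, for which data exist in the current cycle. A minimal NumPy sketch follows; the name `reduce_obs_model`, the boolean mask `available`, and the toy sizes are assumptions for illustration.

```python
import numpy as np

def reduce_obs_model(H_full, R_full, available):
    """Drop the rows of H and the rows/columns of R for missing observations.

    H_full    : (m_max, k) observation matrix for the full set of data points
    R_full    : (m_max, m_max) observation noise covariance for the full set
    available : boolean array of length m_max, True where a datum exists
    """
    idx = np.flatnonzero(available)
    return H_full[idx, :], R_full[np.ix_(idx, idx)]

# Toy usage: 4 potential data points, the second one is missing
H_full = np.arange(12.0).reshape(4, 3)
R_full = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.1
available = np.array([True, False, True, True])
H_t, R_t = reduce_obs_model(H_full, R_full, available)
```

For a Gaussian distribution, the marginal covariance of a subset of variables is exactly the corresponding sub-block of the full covariance, which is why deleting rows and columns is equivalent to integrating out the missing observations.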

We assume the observation noise matrix as equation image, where equation image is a fixed covariance matrix and α is a scalar coefficient to be optimized with the likelihood function (18). We construct the fixed matrix equation image in two steps: (1) calculate a sample covariance matrix equation image, and (2) regularize equation image to obtain equation image.

In step 1, we calculate a sample covariance equation image for detrended SSH observations. Specifically, we first treat the SSH anomaly data at each point as a one-dimensional time series and smooth it using the first-order trend model (Kitagawa and Gersch 1996). The top panel of Figure 2 shows the SSH anomaly data and its smoothed trend at the Equator, 90°W between 23 September 1992 and 11 August 2002 (T/P cycle 1 to 364). The bottom panel shows the residual between the original data and the smoothed value. Applying the same procedure to the other data points, we obtain residuals for all 503 points inside the basin. Using the residuals, we calculate a sample covariance matrix, denoted as equation image.

Figure 2.

(Top) Temporal variation of the SSH anomaly TOPEX/POSEIDON observation (black) at 0°, 90°W and its smoothed trend (red). The ratio of hyperparameters, τ²/σ², is fixed to 1. (Bottom) The residual between the SSH anomaly and the smoothed value.

Figure 3 shows the sample variance (diagonal elements of equation image) and sample covariance for the element at 5°N, 136°W. According to Kaplan et al. (2004), a band structure in the variance at 2–8°N between 160°W and 120°W may represent temporal variability, while the northwestern variability maxima and the eastern Pacific variability band near 10°N represent spatial variability. Regarding the covariance (Figure 3(b)), while positive elements are identified in the meridional direction along 136°W, negative elements appear at neighbouring observation points aligned in the zonal direction (along 5°N). On average, the variance is 2.68 cm² = (1.64 cm)², and the zonal and meridional correlation scales are 5.47° and 2.75°, respectively.

Figure 3.

Sample covariance equation image calculated from the SSH residuals: (a) variance (diagonal elements), and (b) covariance for the element at 136°W, 5°N.

In step 2, we convert the sample covariance matrix equation image into a regular matrix equation image, because equation image is singular. The singularity comes from the fact that the 503 × 503 matrix equation image has a lower rank than the length of the SSH data, 364. If we assumed equation image rather than equation image, the singularity of equation image prevents us from evaluating the log-likelihood function (18) because its determinant equation image becomes zero and the inverse equation image does not exist. We convert equation image to a regularized matrix equation image using a Gaussian graphical model of 12 neighbours (Ueno and Tsuchiya 2009). The obtained covariance equation image has identical values for the diagonal elements (variances) and for the non-diagonal elements for variables inside the 12 neighbours. Figure 4(a) shows the regularized variance, equation image, which is identical to the sample variance equation image shown in Figure 3(a). The regularized covariance shown in Figure 4(b) displays reduced correlations between distant points.

Figure 4.

Estimated covariance matrix equation image with a Gaussian graphical model of 12 neighbours in the same format as Figure 3.

4. Results

Using 512 ensemble members, we run the EnKF procedure from T/P cycle 1 to cycle 364. The original form of the EnKF is used in this study although variants of the EnKF are available (e.g. Evensen 2003, section 2). Neither multiplicative covariance inflation nor covariance localization is applied. The number of ensemble members, 512, is selected because it is larger than the maximum number of data points, 503; mathematically, the EnKF is not degraded when the ensemble size is greater than the number of independent observations (Ueno et al. 2006).

4.1. Maximum likelihood estimation of the covariance parameters

Our data assimilation model has four parameters that specify the noise covariances: the system noise magnitude σh, its zonal decorrelation length Lx, its meridional decorrelation length Ly, and the coefficient α that governs the magnitude of the observation noise matrix by equation image. We optimize these parameters equation image by maximum likelihood based on the assumption that the parameters take several prescribed values. Specifically, we assume that

equation image(25)
equation image(26)
equation image(27)
equation image(28)

The set of system noise magnitude σh corresponds to a SSH (cm) of

{0.0573, 0.115, 0.287, 0.573, 1.15, 2.87, 5.73}, and those of α correspond to the mean variance of the observation noise of equation image We select the best combination of these parameters that gives the maximum value of the likelihood equation image (18).
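The correspondence between the system noise magnitude and the SSH values quoted above follows from the isostatic factor Δg/g = 0.00573. The short check below assumes candidate σh values of 0.1, 0.2, 0.5, 1, 2, 5, and 10 m; these are inferred by dividing the quoted SSH values by Δg/g and are not copied from the parameter list itself.

```python
DG_OVER_G = 0.00573  # ratio of reduced gravity to gravity, as given in the text

def thermocline_to_ssh_cm(h_m):
    """Convert a thermocline depth anomaly (m) to an SSH anomaly (cm)."""
    return h_m * 100.0 * DG_OVER_G

# Assumption: candidate sigma_h values in metres, inferred from the SSH list
sigma_h_m = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
ssh_cm = [thermocline_to_ssh_cm(s) for s in sigma_h_m]

# SSH values quoted in the text (cm), given there to three significant figures
quoted_cm = [0.0573, 0.115, 0.287, 0.573, 1.15, 2.87, 5.73]
```

For example, a thermocline noise magnitude of 2 m corresponds to 2 × 100 × 0.00573 ≈ 1.15 cm in SSH, of the same order as the 1.64 cm average standard deviation of the SSH residuals.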

We show the profile log-likelihoods (e.g. Diggle and Ribeiro 2007) for the system noise parameters and the observation noise coefficient in Figure 5. The profile log-likelihoods for equation image, equation image, and equation image are respectively defined as

equation image(29)
equation image(30)
equation image(31)
Figure 5.

Profile log-likelihood for system noise parameters (magnitude σh, decorrelation length in the zonal direction Lx, and in the meridional direction Ly) and the observation noise coefficient α. (a, d), (b, e) and (c, f) show the dependence on σh, Lx, and Ly, respectively. (a)–(c) are presented for 1 ≤ α ≤ 20, and (d)–(f) for 10 ≤ α ≤ 500.
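Since the parameters take prescribed values on a grid, a profile log-likelihood can be obtained by maximizing the gridded log-likelihood over all parameters other than the ones being displayed. The sketch below reads the panels of Figure 5 as functions of one system noise parameter and α, maximized over the remaining two; that reading, the function name `profile_loglik`, the dictionary layout, and the toy values are assumptions for illustration.

```python
def profile_loglik(loglik, keep):
    """Maximize a gridded log-likelihood over all parameters not in `keep`.

    loglik : dict mapping (sigma_h, Lx, Ly, alpha) tuples to log-likelihood values
    keep   : tuple of positions (0 to 3) of the parameters kept as arguments
    Returns a dict mapping the kept parameter values to the maximized value.
    """
    profile = {}
    for theta, value in loglik.items():
        key = tuple(theta[i] for i in keep)
        if key not in profile or value > profile[key]:
            profile[key] = value
    return profile

# Toy grid with made-up values, only to show the call pattern
toy = {
    (1.0, 4.0, 1.0, 10.0): -12.0,
    (1.0, 20.0, 5.0, 10.0): -9.0,
    (2.0, 4.0, 1.0, 10.0): -11.0,
    (2.0, 20.0, 5.0, 20.0): -7.0,
}
prof_sigma = profile_loglik(toy, keep=(0, 3))   # function of (sigma_h, alpha)
prof_alpha = profile_loglik(toy, keep=(3,))     # function of alpha only
best_theta = max(toy, key=toy.get)              # overall maximum likelihood estimate
```

Each grid point requires its own filter run, so the cost of the search grows with the product of the numbers of candidate values.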

Figure 5(a) shows the profile log-likelihood for σh for 1 ≤ α ≤ 20. When α = 1, the profile log-likelihood gradually increases with σh, reaches a peak at σh = 1 m, and rapidly decreases for σh ≥ 5 m. When we increase the value of α to 2, 5, 10, and 20, we observe that the profile log-likelihood shifts upwards and becomes flatter. In addition, two peaks are identified for α = 2 and α = 5: σh = 0.1 m and 1 m for α = 2, and σh = 0.1 m and 2 m for α = 5. The curves for α = 10 and α = 20 also have two peaks, which will be identified in Figure 5(d).

The dependence on the zonal decorrelation length is shown in Figure 5(b). The equation image curve increases with Lx and has a single peak of − 1656623.9 at Lx = 20°, and slightly decreases to − 1657810.7 at Lx = 40°. For increasing α, curves having a single peak at Lx = 8° for α = 2 and Lx = 20° for α = 5 are also observed, and the curves become flatter.

Regarding the meridional decorrelation length, the Ly dependence contains two peaks at Ly = 1° and 5° for 1 ≤ α ≤ 5 (Figure 5(c)). The latter peak corresponds to the maximum. As with the σh and Lx profiles, the equation image profiles become flatter as α becomes larger. Comparing the profiles of equation image, equation image, and equation image for a certain fixed α, we find that the profile likelihood is more sensitive to σh than to Lx and Ly.

Figures 5(d)–(f) show the profile log-likelihoods for 10 ≤ α ≤ 500. The vertical axes have been enlarged compared to the top three panels. We can clearly identify the upward shift of the curves when α increases from 10 to 20, which was seen in the top panels. When α is over 20, the profile likelihoods begin to decrease with increasing α. The profile likelihoods also become flatter for larger α values. The σh dependence in Figure 5(d) shows two peaks at σh = 0.1 m and 1 m for α = 10, and at σh = 0.2 m and 2 m for α = 20. While we identified only a single peak in the Lx dependence for small α (Figure 5(b)), two peaks appear at Lx = 4° and 20° for α = 10 and 20 and for larger α, as shown in Figure 5(e). On the other hand, the Ly curves for α = 10 and 20 have a single peak at Ly = 5° (Figure 5(f)), while there were two peaks for smaller α values (Figure 5(c)). For larger α (≥ 50), the number of peaks again becomes two, at Ly = 1° and 5°.

From Figure 5, we find that the likelihood maximizes at α = 20, σh = 2 m, Lx = 20°, and Ly = 5°.

Figure 6 shows the profile log-likelihood only for α, which is defined as

equation image(32)
Figure 6.

Profile log-likelihood function for observation noise coefficient α.

The equation image curve shows an increase for small α values, a single peak at α = 20, and a decrease for α ≥ 50.

We confirm that the properties of the profile likelihoods shown above are statistically significant. Table II shows the distribution of the bootstrap replications of the likelihood as a function of α, where the bootstrap sample size B is 512. The distributions of the log-likelihood values are very narrow, and there are no overlaps between different α values. We therefore conclude that α = 20 clearly produces a better EnKF estimate than the other α values. Table III shows the distribution of the replicated likelihoods when α = 20. Again, the distributions of the log-likelihood values are very narrow, and there are no overlaps between different parameter values. We conclude that σh = 2 m, Lx = 20°, and Ly = 5° produce the maximum likelihood value, and that there are two peaks in equation image (at σh = 0.2 m and 2 m) and equation image (at Lx = 4° and 20°).
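A bootstrap check of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of section 3 verbatim: the input `member_loglik` (per-member log-likelihoods stored during a single EnKF run) and the choice to reuse the same resampled member indices at every time step are assumptions, and the demonstration data are synthetic. The point is that each replicate reuses stored member likelihoods, so no additional ensemble integration is needed, and that the output is a five-number summary of the form reported in Tables II and III.

```python
import numpy as np

def logmeanexp(a, axis):
    """log(mean(exp(a))) along an axis, computed stably."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def bootstrap_loglik(member_loglik, B=512, seed=0):
    """Bootstrap replicates of the ensemble-approximated log-likelihood.

    member_loglik[t, i] is the log-likelihood of member i at time step t
    (hypothetical input). Members are resampled with replacement; the same
    indices are used at all time steps (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_steps, n_members = member_loglik.shape
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n_members, size=n_members)
        # Log of the ensemble-mean likelihood at each step, summed over steps.
        reps[b] = logmeanexp(member_loglik[:, idx], axis=1).sum()
    return reps

def five_number_summary(x):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    return np.percentile(x, [0, 25, 50, 75, 100])

# Synthetic demonstration data: 10 time steps, 64 members.
demo = -0.5 * np.random.default_rng(1).normal(size=(10, 64)) ** 2
reps = bootstrap_loglik(demo, B=512)
summary = five_number_summary(reps)
```

Two parameter settings would then be judged distinguishable when the ranges of their replicates do not overlap.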

Table II. Summary of the profile likelihood distribution generated with the bootstrap method, as a function of α, from 512 samples.

| α | Minimum | Lower quartile | Median | Upper quartile | Maximum |
|---|---|---|---|---|---|
| 1 | −1 659 269.2 | −1 658 614.6 | −1 658 420.2 | −1 658 276.1 | −1 659 252.3 |
| 2 | −967 336.4 | −966 977.0 | −966 891.8 | −966 802.0 | −966 551.6 |
| 5 | −601 592.0 | −601 355.2 | −601 295.1 | −601 237.7 | −601 075.4 |
| 10 | −522 902.1 | −522 772.5 | −522 712.0 | −522 656.1 | −522 556.8 |
| 20 | −513 771.1 | −513 672.7 | −513 650.8 | −513 630.8 | −513 579.4 |
| 50 | −554 755.4 | −554 736.2 | −554 727.2 | −554 719.9 | −554 704.7 |
| 100 | −604 073.8 | −604 070.6 | −604 069.6 | −604 068.7 | −604 073.3 |
| 200 | −660 064.0 | −660 061.6 | −660 060.9 | −660 060.2 | −660 058.0 |
| 500 | −739 303.2 | −739 301.6 | −739 301.2 | −739 300.6 | −739 299.1 |

Table III. Summary of the profile likelihood distribution generated with the bootstrap method. The samples are obtained for α = 20 as a function of (a) system noise magnitude σh, (b) zonal decorrelation length Lx, and (c) meridional decorrelation length Ly.

| | Minimum | Lower quartile | Median | Upper quartile | Maximum |
|---|---|---|---|---|---|
| (a) σh (m) | | | | | |
| 0.1 | −515 111.4 | −515 079.2 | −515 069.1 | −515 061.3 | −515 110.1 |
| 0.2 | −514 230.7 | −514 194.6 | −514 185.3 | −514 173.9 | −514 228.1 |
| 0.5 | −514 935.6 | −514 891.2 | −514 876.2 | −514 862.4 | −514 824.6 |
| 1 | −516 454.7 | −516 392.4 | −516 375.2 | −516 359.1 | −516 444.6 |
| 2 | −513 771.1 | −513 672.7 | −513 650.8 | −513 630.8 | −513 579.4 |
| 5 | −517 027.7 | −516 935.0 | −516 907.7 | −516 881.8 | −517 027.7 |
| 10 | −526 419.9 | −526 289.2 | −526 255.4 | −526 215.5 | −526 108.8 |
| (b) Lx (deg) | | | | | |
| 4 | −515 141.0 | −515 116.4 | −515 109.3 | −515 101.9 | −515 141.0 |
| 8 | −515 570.4 | −515 512.1 | −515 498.2 | −515 483.4 | −515 561.9 |
| 20 | −513 771.1 | −513 672.7 | −513 650.8 | −513 630.8 | −513 579.4 |
| 40 | −515 111.4 | −515 079.2 | −515 069.1 | −515 061.3 | −515 110.1 |
| (c) Ly (deg) | | | | | |
| 1 | −516 402.8 | −516 377.1 | −516 369.2 | −516 362.0 | −516 340.5 |
| 2 | −514 230.7 | −514 194.6 | −514 185.3 | −514 173.9 | −514 228.1 |
| 5 | −513 771.1 | −513 672.7 | −513 650.8 | −513 630.8 | −513 579.4 |
| 10 | −514 935.6 | −514 891.2 | −514 876.2 | −514 862.4 | −514 824.6 |

4.2. Covariance dependence of the EnKF estimates

Figure 7 shows the filtered estimates of the SSH anomalies along the Equator for selected values of σh and α. The estimates are defined as means of the ensemble members. Figure 7(g) is the result with the smallest system noise and the largest observation noise, while Figure 7(c) corresponds to that with the largest system noise and the smallest observation noise.

Figure 7.

Temporal variations of the filtered estimates of SSH anomalies along the Equator for σh∈{0.5 m,1 m,5 m} and α∈{5, 20, 500}.

Even with the smallest system noise and the largest observation noise (Figure 7(g)), the estimate displays realistic SSH variations compared to the ZC model output (Figure 1(b)), but the reproduced amplitude is still small. When we decrease the observation noise, we obtain filtered estimates that have larger amplitudes and appear to be similar to the observation (Figure 1(a)), as we expected. However, the result for the largest system noise and the smallest observation noise (Figure 7(c)) appears overestimated compared to the observations: positive SSH anomalies reproduced in the central-to-eastern Pacific after 1999 are too large. This fact is contrary to our expectation, i.e. the larger the system noise, the closer the filtered estimates to the observation. We consider that this may be brought about by the nonlinearity of the system model. Figure 7(e) is the result that is obtained using the system noise and observation noise that maximize the likelihood, and will be examined in the next section.

It is known that the standard Kalman filter for linear and Gaussian state space models produces the same estimates when the system noise and the observation noise covariance matrices equation image and equation image are both multiplied by a common scalar factor a (i.e. replaced by equation image and equation image, respectively) (e.g. Kitagawa and Gersch 1996). However, this is not necessarily the case for this experiment, in which the system model based on the ZC model is nonlinear. The filtered estimates of SSH in Figures 7(a), (e) and (i) are obtained using σh and α that have a common ratio equation image of 1 m²/5, equivalent to equation image in terms of the variance ratio of SSH anomalies. The filtered estimates appear similar, but are not identical. The log-likelihood indicates that the estimates shown in Figure 7(e) are better than those in Figures 7(a) and (i).
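The scale invariance of the linear Kalman filter can be checked numerically with a scalar toy model. The sketch below is illustrative only: the model coefficients, the noise variances, and the factor a = 7 are arbitrary choices, and the initial error variance is scaled by the same factor (an assumption needed for the equality to hold exactly from the first step).

```python
import numpy as np

def kalman_filter_means(y, F, H, Q, R, x0, P0):
    """Filtered means of a scalar linear Gaussian state space model."""
    x, P = x0, P0
    means = []
    for yt in y:
        # Prediction step.
        x = F * x
        P = F * P * F + Q
        # Filtering step: the gain depends only on the ratio of P to R.
        K = P * H / (H * P * H + R)
        x = x + K * (yt - H * x)
        P = (1.0 - K * H) * P
        means.append(x)
    return np.array(means)

rng = np.random.default_rng(0)
y = rng.normal(size=50)   # synthetic observations

base = kalman_filter_means(y, F=0.9, H=1.0, Q=0.5, R=2.0, x0=0.0, P0=1.0)

a = 7.0                   # common scalar factor applied to Q, R and P0
scaled = kalman_filter_means(y, F=0.9, H=1.0, Q=a * 0.5, R=a * 2.0,
                             x0=0.0, P0=a * 1.0)

same_estimates = np.allclose(base, scaled)
```

Because the predicted variance scales with a, the gain is unchanged and the two sequences of filtered means coincide. No such cancellation is guaranteed once the forecast step is nonlinear, which is the situation in the present experiment.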

4.3. State estimation using the optimal covariance

4.3.1. SSH estimate

Using the parameters identified to maximize the likelihood, namely, α = 20, σh = 2 m, Lx = 20°, and Ly = 5°, we obtain the filtered SSH anomalies along the Equator shown in Figure 8(a). We use the ensemble means as the estimates. The filtered estimates appear to reproduce the observed variation after 1994: the 1997–98 ENSO event as well as the annual variations are identifiable in the filtered estimates. Figure 8(b) presents the smoothed estimates. Here, we assume the lag to be L = t − 1. All states before time t will then be updated by equation image, and the resultant states smoothed by equation image are equivalent to those obtained by the fixed interval smoother (364 is the total number of observation time steps; section 3). Improvements of the smoothed over the filtered estimates are found in the negative SSH anomalies reproduced in the central and eastern equatorial Pacific in 1995–1997 and 1999–2002.

Figure 8.

Temporal variations of the (a) filtered and (b) smoothed estimates of SSH anomalies along the Equator for the optimal parameters: α = 20, σh = 2 m, Lx = 20°, and Ly = 5°. The estimates are assumed to be the ensemble means.

The T/P observation, the ZC model output, the filtered estimate, and the smoothed estimate are compared in Figure 9, which shows the time evolution of the area-averaged SSH anomalies in the Niño 1 + 2, 3, 3.4, and 4 SST regions. As seen in Figure 1(b), the amplitude of the SSH anomalies from the ZC model is very small in all regions. The negative anomalies retrieved by the EnKS in Figure 8(b) are clearly seen in Niño 3 and 3.4 in 1995 and during 1998–2000. Similarly, the filtered estimate in Niño 1 + 2 is positively biased, and this effect is corrected in the smoothed estimate. The smoothed estimates appear to be reasonable in Niño 1 + 2, while the observed strong and sharp variations in Niño 3 and 3.4 cannot be reproduced by the smoothed estimates. The smoothed estimate in Niño 4 almost coincides with the filtered estimate.

Figure 9.

Area-averaged SSH anomalies of the T/P observation (thin black), the ZC model output (bold black), the filtered estimate (blue), and the smoothed estimate (red) in areas (a) Niño 1 + 2 (0– 10°S, 90– 80°W), (b) Niño 3 (5°N– 5°S, 150– 90°W), (c) Niño 3.4 (5°N– 5°S, 170– 120°W), and (d) Niño 4 (5°N– 5°S, 160°E– 150°W).

Figure 10 shows sample correlation coefficients between T/P observation and model SSH variations. The ZC model displays correlations of ∼0.5 around the Equator, while no correlations or even negative correlations are found in the midlatitude regions (Figure 10(a)). The predicted and the filtered estimates (Figures 10(b) and (c)) have higher correlations with the observations, approximately 0.8 over almost the whole domain, but the correlations are still low (0.2) along the Equator. For the smoothed estimates (Figure 10(d)), correlations with the observations along the Equator increase, while those in off-equatorial regions decrease compared to the filtered estimate. Figure 10(e) shows the difference in the correlation coefficients between the filtered estimate and the predicted estimate. This difference is interpreted as the correlation gained by each observation. Since the difference in the correlation coefficients is small (∼0.1) compared to the correlation of the filtered estimates (Figure 10(c)), the present EnKF setting is not considered to result in over-fitting. The differences between the smoothed estimate and the filtered estimate are shown in Figure 10(f). The correlations gained by smoothing are positive along the Equator, but are negative in off-equatorial regions. That is, the filtered estimates along the Equator are updated by the smoothed estimates that are closer to the data, while those in the off-equatorial regions are moved away from the data.

Figure 10.

Sample correlation coefficients between T/P observations and model SSH variations for (a) the ZC model, (b) the predicted estimate, (c) the filtered estimate, and (d) the smoothed estimate. (e) shows differences (c) minus (b), and (f) shows differences (d) minus (c).

Figure 11 shows the mean squared error (MSE) between the T/P observations and model SSH variations. Large MSEs are found in the central-to-eastern equatorial Pacific and in the western Pacific for the ZC model (Figure 11(a)). Since the correlation in the equatorial region was not low (∼0.5, Figure 10(a)), the large MSE is due to inconsistency between the amplitudes of the observed and the model SSH variations. With the EnKF, MSEs are significantly improved in the western regions but the inconsistency in the central-to-eastern equatorial Pacific remains (Figures 11(b) and (c)). The inconsistency is lessened by the smoothed estimate as shown in Figure 11(d), but there are still large MSEs of ∼80 cm² or more in the central equatorial Pacific and in the region of Indonesia. Since correlations were not low in these regions, as shown in Figure 10(d), the present assimilation system cannot reproduce such SSH amplitudes with a fine structure in space and time. Figures 11(e) and (f) display the same tendency that was found in the correlation in Figures 10(e) and (f). Namely, the reduction in MSE from the predicted estimate to the filtered estimate is small compared to the MSE of the filtered estimate, and the smoothing procedure significantly reduces the MSE along the Equator but increases the MSE in off-equatorial regions.

Figure 11.

Mean squared error between T/P observation and model SSH variations for (a) the ZC model, (b) the predicted estimate, (c) the filtered estimate, and (d) the smoothed estimate. (e) shows the differences (b) minus (c), and (f) shows differences (c) minus (d).
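The two diagnostics used in Figures 10 and 11 can be sketched as grid-point statistics over time. This is a minimal illustration with assumed array layouts `(time, lat, lon)` and synthetic data; whether the MSE in the figures is computed on anomalies or on raw values is not stated here, so the sketch uses the fields as given. The toy case, in which the model has the right phase but only 30% of the observed amplitude, reproduces the situation described above: the correlation is high while the MSE remains large.

```python
import numpy as np

def correlation_and_mse(obs, model):
    """Sample correlation and mean squared error at each grid point.

    obs, model: arrays of shape (time, lat, lon) (assumed layout).
    """
    obs_anom = obs - obs.mean(axis=0)
    mod_anom = model - model.mean(axis=0)
    cov = (obs_anom * mod_anom).mean(axis=0)
    corr = cov / (obs_anom.std(axis=0) * mod_anom.std(axis=0))
    mse = ((obs - model) ** 2).mean(axis=0)
    return corr, mse

# Synthetic example: the model is in phase with the observations
# but has only 30% of their amplitude.
rng = np.random.default_rng(0)
obs = rng.normal(scale=10.0, size=(364, 4, 5))   # e.g. SSH anomalies in cm
model = 0.3 * obs

corr, mse = correlation_and_mse(obs, model)
```

Here `corr` is unity everywhere, whereas `mse` equals 0.49 times the mean square of the observations, so the two maps carry complementary information about phase and amplitude.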

4.3.2. SST and wind estimates

While only SSH anomaly data were assimilated, the coupled model can produce estimates of all variables included in the model. In Figure 12, we show SST anomalies from the ZC model output, the filtered estimate, and the smoothed estimate, and compare them with independent SST observations (Reynolds et al. 2002). We obtained the monthly anomalies by subtracting the mean values (from December 1981 to February 2006) from the original monthly data. The primary temporal characteristics of the smoothed estimate are similar to those of the observation, although the magnitude appears slightly small. The filtered estimates tend to be large, and the EnKS corrects them slightly, by approximately 0.5 °C.

Figure 12.

Area-averaged SST anomalies of the Reynolds et al. observation (thin black) and of the ZC model output (bold black), the filtered estimate (blue) and the smoothed estimate (red) in areas (a) Niño 1 + 2 (0– 10°S, 90– 80°W), (b) Niño 3 (5°N– 5°S, 150– 90°W), (c) Niño 3.4 (5°N– 5°S, 170– 120°W), and (d) Niño 4 (5°N– 5°S, 160°E– 150°W). Note that the smoothed estimate is obtained without SST data assimilation.

Figure 13 shows the temporal evolution of area-averaged wind anomalies for the observations (National Centers for Environmental Prediction (NCEP) Reanalysis 2 data), the ZC model output, the EnKF estimate, and the EnKS estimate. The wind anomalies were computed by subtracting the mean values (from January 1979 to December 2005) from the original data. Figures 13(a)– (c) show three indices representing equatorial zonal wind anomalies in the western, central and eastern Pacific, designated as regions TW1, TW2, and TW3, respectively. In TW1, the filtered and the smoothed estimates display wind anomalies that are slightly weakened compared to the ZC model, and reproduce almost none of the characteristic variations of the independent reanalysis data. In particular, the estimates cannot reproduce westerly anomalies in 1993, 1994, 1997, and 2002, some of which are believed to correspond to the westerly wind bursts (WWBs) related to the El Niño events (e.g. McPhaden 1999). In TW2, the estimates are better, that is, westerly anomalies in 1997 and 1998 are reproduced, but the amplitude is still half that of the observation. The wind estimates in TW3 are again better. In TW3, though the ZC model output showed very different variations from the NCEP 2 reanalysis data, the estimated variations by the EnKF and the EnKS nearly coincide with the observations.

Figure 13.

Area-averaged zonal wind anomalies (positive westerly) for the observation (thin black), for the ZC model (bold black), for the filtered estimate (blue), and for the smoothed estimate (red) in areas (a) TW1 (5°N– 5°S, 135°E– 180°), (b) TW2 (5°N– 5°S, 180– 140°W), and (c) TW3 (5°N– 5°S, 140– 120°W).

Figure 14 shows temporal variations of zonal wind anomalies of the reanalysis data and smoothed estimates obtained by the SSH assimilation. The reanalysis data show strong westerlies in the TW1 region. These westerlies appear particularly in the western half of the region (west of 160°E), but no such intensification is identified in the smoothed estimates. This produces the difference between the observed and the estimated TW1 indices shown in Figure 13(a).

Figure 14.

Temporal variations of zonal wind anomalies of the (a) independent reanalysis data (NCEP Reanalysis 2) and (b) smoothed estimates obtained by the SSH assimilation. Values are averaged over 5°N– 5°S.

4.3.3. 1997– 98 ENSO event

We examine the 1997–98 ENSO event, bearing in mind that there is a discrepancy between the wind reanalysis data and the wind estimates over the western Pacific (in the TW1 region). Figure 15 shows the observed evolution of SST and wind anomalies during the 1997–98 ENSO event. As for the wind anomalies, the SST anomalies were computed by subtracting the mean values (from January 1979 to December 2005) from the original data. Figure 16 shows the monthly mean of the smoothed estimates of the SST and wind anomalies. The scale of the wind anomaly estimates is twice that of the observation displayed in Figure 15.

Figure 15.

Observed SST anomalies (left) and wind anomalies (right) in (a, b) April, (c, d) July, and (e, f) October 1997 and (g, h) January, (i, j) April, and (k, l) July 1998. Wind anomalies larger than 1 m s⁻¹ are shown.

Figure 16.

As Figure 15, but for smoothed estimates of SST anomalies (left) and wind anomalies (right). For wind anomalies here, those larger than 0.5 m s⁻¹ are displayed, and the vector scale is twice that in Figure 15.

In April 1997, the observations show warm anomalies around the date line and strong westerly wind anomalies in the western Pacific (Figures 15(a) and (b)). However, the smoothed estimates in Figures 16(a) and (b) have a rather uniform SST field and a weak easterly wind field. This discrepancy in the wind anomalies was previously identified in the inconsistent TW1 indices shown in Figure 13.

After that, however, the smoothed estimates reproduce consistent wind fields, at least qualitatively. In October 1997, both observed and estimated wind anomalies intensified (Figures 15(f) and 16(f)): large westerly wind anomalies covered the whole equatorial central Pacific, and enhanced equatorward wind anomalies appeared in the eastern Pacific. Observed wind anomalies in April and July 1998 displayed easterlies in the western Pacific and westerlies in the central and eastern Pacific (Figures 15(j) and (l)). These wind properties were also evident in the smoothed estimate (Figures 16(j) and (l)).

4.4. Effects of lag on the EnKS

In this section, we evaluate the effects of lag on the EnKS. Following Evensen (2003), we represent the ensemble consisting of realizations of state and observation noise with matrices equation image and equation image, respectively:

equation image(33)
equation image(34)

The EnKF and the EnKS algorithms ((5) and (9)) are then written as

equation image(35)
equation image(36)
equation image(37)
equation image(38)

both of which have a common matrix equation image defined as

equation image(39)

where equation image is the unit matrix of order N, and we put

equation image(40)
equation image(41)

using equation image to denote the N × N matrix all of whose elements are unity.

From (38), the smoothed ensemble at time step t and lag j is represented as

equation image(42)
equation image(43)
equation image(44)
equation image(45)

Therefore, the predicted ensemble equation image is updated to the filtered ensemble equation image by multiplying by equation image, and to the smoothed ensemble equation image by further multiplying by equation image.
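The statement that filtering and smoothing both amount to right-multiplying an ensemble matrix by the same N × N matrix can be illustrated as follows. This is a sketch in the general form given by Evensen (2003), not a transcription of (35)–(45), which are not reproduced in the text: the names `transform_matrix`, `A_pred`, `A_past`, and `D`, the prescribed observation noise covariance `R`, and all numerical values are assumptions.

```python
import numpy as np

def transform_matrix(A, H, D, R):
    """N x N matrix X such that the filtered ensemble is A @ X.

    A: predicted ensemble (n x N), H: observation matrix (m x n),
    D: perturbed observations (m x N), R: observation noise covariance.
    """
    N = A.shape[1]
    centering = np.eye(N) - np.full((N, N), 1.0 / N)
    S = H @ A @ centering                  # observed ensemble anomalies
    C = S @ S.T + (N - 1) * R
    return np.eye(N) + S.T @ np.linalg.solve(C, D - H @ A)

rng = np.random.default_rng(0)
n, m, N = 6, 3, 20
A_pred = rng.normal(size=(n, N))           # predicted ensemble at time t
A_past = rng.normal(size=(n, N))           # ensemble for an earlier time
H = rng.normal(size=(m, n))
R = 0.5 * np.eye(m)
y = rng.normal(size=m)
D = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T

X = transform_matrix(A_pred, H, D, R)
A_filt = A_pred @ X                        # filtering step
A_smooth = A_past @ X                      # smoothing uses the same matrix
```

The product `A_pred @ X` agrees with the familiar gain form of the EnKF update, and each column of `X` sums to one, so every updated member is a weighted combination of the members before the update. Applying the matrices of successive time steps to `A_past` one after another gives the smoothed ensemble at increasing lags.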

We evaluate the changes to the ensemble at both the filtering and the smoothing steps. From (36) and (45), we can write the increments of the filtered ensemble equation image and the smoothed ensemble equation image compared to the predicted ensemble equation image and the previous smoothed ensemble equation image, respectively, as

equation image(46)
equation image(47)

where

equation image(48)
equation image(49)

The matrix equation image is interpreted as an increment scaled by the predicted ensemble equation image and quantifies the effect of filtering and smoothing for j = 0 and j = 1, ···, L, respectively.

Figure 17(a) shows the temporal variation of equation image in terms of the 2-norm of the matrix. The 2-norm of a matrix is equal to the largest singular value of the matrix (e.g. Saad 2003, p. 9). equation image increases with time until 1996, when it abruptly drops. The curve increases again, and then decreases slowly after 1999.
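As a small numerical check of the norm used here, the 2-norm returned by NumPy coincides with the largest singular value. The matrix `X` below is a random stand-in for an N × N transform matrix, and taking `X - I` as the scaled filtering increment is an assumption of this sketch (it follows from writing the filtered minus the predicted ensemble as the predicted ensemble times `X - I`).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
X = np.eye(N) + 0.1 * rng.normal(size=(N, N))   # hypothetical transform matrix

increment = X - np.eye(N)                       # assumed scaled increment
two_norm = np.linalg.norm(increment, 2)         # matrix 2-norm
largest_sv = np.linalg.svd(increment, compute_uv=False)[0]
```

The singular values are returned in descending order, so the first entry is the one that equals `two_norm`.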

Figure 17.

Temporal variations of 2-norms of (a) equation image, (b) equation image, and (c) those that are normalized to equation image.

The increments equation image are shown in Figure 17(b), for estimates at the first time step in January of each year. Every curve decreases exponentially with increasing lag j. Figure 17(c) shows the increments normalized by equation image, which is the increment at the filtering step. The normalized increments are larger than unity for a few months after filtering is performed. This means that the effect of the smoothing is no less than that of the filtering for a few months. The effect of smoothing decreases by 1/e approximately one year after the filtering step.
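An e-folding lag of this kind can be read off a normalized curve as sketched below. The decay curve is synthetic (a pure exponential, without the initial values above unity seen in Figure 17(c)), the helper name `efolding_lag` is hypothetical, and the conversion to years assumes that one lag step equals the 9.915625-day interval that appears in section 5.1.

```python
import numpy as np

STEP_DAYS = 9.915625   # assumed length of one lag step, in days

def efolding_lag(normalized):
    """First lag j >= 1 at which the normalized increment is <= 1/e."""
    values = np.asarray(normalized, dtype=float)
    below = np.nonzero(values[1:] <= np.exp(-1.0))[0]
    return int(below[0]) + 1 if below.size else None

lags = np.arange(80)
toy_curve = np.exp(-lags / 36.5)     # synthetic normalized increments

j = efolding_lag(toy_curve)
years = j * STEP_DAYS / 365.25
```

With this synthetic curve the threshold is crossed at lag 37, which corresponds to roughly one year under the assumed step length.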

5. Discussion

5.1. Estimated covariance parameters

In the present study, we have estimated three parameters that specify the system noise covariance and one parameter for the observation noise covariance. The estimated parameters for system noise imposed on the thermocline depth anomalies are σh = 2 m, Lx = 20°, and Ly = 5°. The magnitude, σh = 2 m, is smaller than the system noise at the 20 °C isotherm depth used by Bennett et al. (1998), who assumed the magnitude to be 7.5 × 10⁻⁶ m s⁻¹ × 9.915625 days = 6.4 m. That is, the ZC model in the present experiment operates within their expected accuracy.
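For reference, the unit conversion behind the 6.4 m figure can be written out explicitly, using only the two numbers quoted above and 86 400 s per day:

```latex
7.5\times10^{-6}\ \mathrm{m\,s^{-1}} \times 9.915625\ \mathrm{d} \times 86\,400\ \mathrm{s\,d^{-1}}
\approx 6.43\ \mathrm{m} \approx 6.4\ \mathrm{m}
```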

The longer decorrelation scale in the zonal direction than in the meridional direction was also found in estimations of system noise imposed on wind stress for the tropical Pacific as equation image by Miller and Cane (1989) (or those assumed by Miller et al. (1995) as equation image), but the estimated scales for the thermocline depth anomalies in the present paper are longer than those for the wind stress.

The profile likelihood has multiple peaks as shown in Figure 5. We observed two peaks in equation image at σh = 0.2 m and 2 m, in equation image at Lx = 4° and 20°, and in equation image at Ly = 1° and 5°. If we examine the original form of the log-likelihood equation image, rather than the profile likelihood shown in section 4.1, we find two peaks in the four-dimensional parameter space of equation image. The first peak is located at equation image, and the second one is at equation image. The latter peak corresponds to what we identified by examining the profile log-likelihood. We speculate that the two peaks may correspond to two characteristic length-scales in the equatorial Pacific. Specifically, the longer zonal scale of Lx = 20° may be related to the equatorial Kelvin waves, and the other peak of Lx = 4° to the equatorial Rossby waves.

The estimated observation noise parameter α = 20 corresponds to a mean variance of equation image. The mean decorrelation lengths are fixed to be 5.47° and 2.75° in the zonal and meridional directions, respectively. The optimal variance is comparable to the estimate of equation image by Fukumori (1995), who obtained this value with the covariance matching method (Fu et al. 1993) for an intermediate ocean model. For intermediate coupled models, observation noise variances have been assumed a priori to be comparable to or smaller than the optimal variance in this study: equation image by Lee et al. (2000), equation image by Ballabrera-Poy et al. (2001), equation image by Bennett et al. (1998, 2000), equation image by Ueno et al. (2007), and equation image by Sun et al. (2002) and Kondrashov et al. (2008).

Furthermore, the observation noise variance of equation image is found to be larger than the sample variance of the SSH observation; the mean of the sample variance is computed as equation image. This fact may imply that the present assimilation system has properties that are inconsistent with the data.

To examine the details, the spatial distribution of the variances of observation noise is displayed in Figure 18(a), accompanied by the sample variances of the SSH observation (Figure 18(b)). Figure 18(c) shows the difference between the two. While the estimated variances of observation noise are smaller than the sample variances of the SSH observation along the Equator and in the western equatorial Pacific (coloured in blue), the relation is opposite in off-equatorial regions at latitudes higher than 15°, especially east of 150°W in the Southern Hemisphere. The regions of small observation noise variance are considered to be regions where the present assimilation system based on the ZC model can reproduce realistic SSH variations. In the off-equatorial regions, on the other hand, the ZC model has almost no ability to reproduce SSH variations in phase with the observations (Figure 10(a)). Due to the out-of-phase variations by the dynamic model, the difference between the model and the observations becomes large in off-equatorial regions.

Figure 18.

Variance of (a) the estimated observation noise, and of (b) the SSH observation. (c) shows the difference (a)– (b) between the two variances.

5.2. Estimated meridional decorrelation length and equatorial waves

As seen in Figure 1(b), the ZC model reproduces eastward propagating anomalies along the Equator. For example, positive anomalies in the western Pacific in 1996 and 1997 propagate eastward to reach the eastern Pacific in 1997 and 1998, and negative anomalies successively propagate from the west to the east. In addition, eastward propagation of the SSH anomalies is also identified in the observations (Figure 1(a)).

However, the eastward propagation is not clearly evident in the data assimilation results (Figure 8). In the filtered estimates (Figure 8(a)), some positive SSH anomalies propagate westward from the eastern to western equatorial Pacific in 1999, 2000, and 2001, and negative anomalies in 1998 are confined to the western and central regions. Both of these phenomena seem physically inconsistent with the observations and the ZC model. In the smoothed estimates (Figure 8(b)), while westward-propagating positive anomalies are reduced, negative anomalies are still confined to the western and central Pacific.

This inconsistency may signify that the present data assimilation experiment is eliminating a property of the equatorial Kelvin wave, namely the fact that the wave propagates eastward. This may be due to the meridional length-scale of system noise, Ly, which was estimated to be 5° ≃ 550 km by the maximum likelihood, being too large to represent the equatorial Kelvin waves. These waves are known to have a meridional length-scale of the equatorial radius of deformation of 100–250 km (e.g. Gill 1982, p. 437). We next examine the effect of a smaller meridional length-scale of Ly = 1° ≃ 110 km (all other parameters being the same as those obtained by the maximum likelihood: α = 20, σh = 2 m, and Lx = 20°). Figures 19(a) and (b) show the filtered and smoothed estimates of SSH anomalies, respectively. As expected, the westward-propagating positive anomalies are greatly reduced in the filtered estimates, and eastward-propagating anomalies become dominant in 2000 and 2001. In the smoothed estimates, the negative anomalies in 1998 propagate beyond the central Pacific and reach the eastern Pacific, which is what we would expect from the ZC model and T/P observations. In Figure 19(c), we show the smoothed estimates of zonal wind anomalies obtained with Ly = 1°. Westerly winds in the central Pacific in 1997 and 1998 are found to propagate eastward to the eastern Pacific. This is also what was observed (Figure 14(a)), but was not reproduced in the data assimilation with Ly = 5° (Figure 14(b)).

Figure 19.

Temporal variation of the (a) filtered and (b) smoothed estimates of SSH anomalies along the Equator, and (c) smoothed estimates of zonal wind anomalies averaged over 5°N– 5°S. The estimates are obtained with Ly = 1° and the other parameters remain unchanged (α = 20, σh = 2 m, and Lx = 20°).

Judging from the above estimates, Ly = 1° seems preferable to Ly = 5°. However, this value was not selected by the maximum likelihood. This implies that the smaller value of Ly is better only along the Equator, whereas Ly = 5° is preferable in non-equatorial regions. Figure 20 shows SSH anomalies along 12.5°N. In the off-equatorial regions the Rossby waves become dominant, and these propagate westward. While the smoothed estimates with Ly = 5° shown in Figure 20(b) appear reasonable, those using Ly = 1° in Figure 20(c) exhibit waves with unrealistically fine structures, in which positive and negative anomalies alternate at the model grid scale of 2° in the zonal direction. The fact that Ly = 1° is suitable only for the narrow equatorial region is the reason why Ly = 5° was selected by the maximum likelihood, which accounts for both equatorial and non-equatorial regions.

Figure 20.

Temporal variations of the SSH anomalies along 12.5°N (a) observed by TOPEX/POSEIDON, (b) estimated by the EnKS with Ly = 5°, (c) estimated by the EnKS with Ly = 1°, and (d) reproduced by the ZC model.

5.3. Gaussian predictive distribution

The likelihood is a product of the predictive likelihood at each time step (14). We show that the predictive likelihood is equivalent to the ensemble-approximated likelihood that was proposed by Mitchell and Houtekamer (2000) if the predictive distribution is assumed to be a Gaussian.

In addition to the linear and Gaussian assumption of the observation model (2), if the predictive distribution is assumed to be a Gaussian equation image, that is,

equation image(50)

then the predictive likelihood can be derived analytically and is also found to be Gaussian:

equation image(51)
equation image(52)

(e.g. Kitagawa and Gersch 1996, Lemma 2, p. 86). When the mean vector equation image and the covariance matrix equation image are respectively approximated by equation image and equation image ((7) and (8)), the Gaussian predictive likelihood (52) becomes

equation image(53)

The Gaussian predictive likelihood (52) and its ensemble approximation (53) are consistent with what was proposed by Dee (1995) and Mitchell and Houtekamer (2000), respectively, both of whom estimated model error covariance parameters.
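The two approximations of the predictive likelihood at a single time step can be compared with the following sketch. It is written from the verbal descriptions only: the ensemble-approximated likelihood is taken as the sample mean of the member likelihoods under a linear Gaussian observation model, and the Gaussian version uses the ensemble mean and covariance. Equations (15), (18), (20) and (53) are not reproduced in the text, so the function names, argument layout, and test values are all assumptions.

```python
import numpy as np

def log_gauss(y, mean, cov):
    """Log density of N(mean, cov) at y; mean may be (N, m) for N members."""
    diff = np.atleast_2d(y - mean)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + maha)

def ensemble_loglik(y, ens, H, R):
    """Log of the ensemble mean of the member likelihoods N(y; H x_i, R)."""
    logp = log_gauss(y, ens.T @ H.T, R)        # one value per member
    top = logp.max()                           # log-sum-exp for stability
    return top + np.log(np.mean(np.exp(logp - top)))

def gaussian_loglik(y, ens, H, R):
    """Log of N(y; H xbar, H P H^T + R) with ensemble mean and covariance."""
    xbar = ens.mean(axis=1)
    P = np.atleast_2d(np.cov(ens))
    return log_gauss(y, H @ xbar, H @ P @ H.T + R)[0]

# One-dimensional check with a large Gaussian predicted ensemble (n x N).
rng = np.random.default_rng(0)
ens = rng.normal(size=(1, 20000))
H = np.array([[1.0]])
R = np.array([[1.0]])
y = np.array([0.5])

log_ens = ensemble_loglik(y, ens, H, R)
log_gau = gaussian_loglik(y, ens, H, R)
```

When the predicted ensemble really is Gaussian and large, as in this check, the two values agree closely. They can differ when the ensemble is non-Gaussian, which is the case of interest for a nonlinear model, and the member-wise average becomes noisy when the ensemble is small.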

The assumption of a Gaussian predictive distribution (50) may appear to be suitable for the EnKF, because the filtering procedure in the EnKF is also constructed based on the Gaussian predictive distribution. However, from a conceptual point of view, we suggest the use of the ensemble approximation of the likelihood ((18) and (20)) rather than the Gaussian predictive likelihood (53) for the following two reasons. Firstly, the assumption of the Gaussian predictive distribution is groundless when the dynamic model is nonlinear as in (1). Secondly, the evaluation of the predictive likelihood is made with the predicted ensemble, and therefore should be independent of the filtering procedure, which updates the predicted ensemble to the filtered ensemble.

To understand the two reasons more clearly, let us suppose that we conduct two assimilation experiments using the EnKF and the PF, and by chance have two identical predicted ensembles at a certain time step. This situation rarely occurs, but it does at least just before the first filtering computation when we assume a common system model in the two experiments. In any case, if the two predicted ensembles are identical, it is natural that the two resultant predictive likelihoods also take identical values regardless of the adopted filtering method (i.e. EnKF or PF), as long as ensemble-based filters are adopted. To obtain identical likelihoods, we need to perform a common procedure for likelihood computation in the two experiments. As the common procedure, we propose the ensemble approximation of the likelihood rather than the Gaussian likelihood, which is conceptually unjustified.

In the computational sense, however, the Gaussian assumption may be useful, especially when the ensemble size is small. With a small ensemble size, the estimate of the predictive likelihood (15) may become unstable. This is because the predictive likelihood of each member, equation image, tends to take values that differ significantly from those of the other members. The difference results from the fact that equation image is defined in a form that amplifies the difference of each member's state equation image with its quadratic form to obtain equation image (20) and with the exponential function. With the Gaussian assumption, on the other hand, each member's state is used to compute the ensemble mean equation image and the ensemble covariance equation image before the quadratic form and exponential function are applied (53). Therefore, the differences of the states among the members are not amplified, and we expect the Gaussian predictive likelihood to be stable even with an ensemble of small size. However, the Gaussian likelihood remains qualitatively different from the ensemble-approximated likelihood even when the ensemble is large, and it may select sub-optimal covariances.
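The contrast between the two likelihoods can be sketched numerically. The following Python sketch is illustrative only and is not the code used in this study: it assumes a linear observation operator `H` and Gaussian observation noise with covariance `R`, and the function names, the array layout (members stored as rows of `X`), and the use of the unbiased sample covariance are all assumptions made for the example. The ensemble-approximated version applies the quadratic form and the exponential to each member before averaging, whereas the Gaussian version averages first.

```python
import numpy as np

def gaussian_logpdf(resid, cov):
    """Log-density of N(0, cov) evaluated at each row of resid."""
    resid = np.atleast_2d(resid)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    sol = np.linalg.solve(cov, resid.T)        # shape (d, number of rows)
    quad = np.sum(resid.T * sol, axis=0)       # quadratic form per row
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def ensemble_loglik(y, X, H, R):
    """Log of the sample mean of the member likelihoods N(y; H x_i, R)."""
    logp = gaussian_logpdf(y - X @ H.T, R)     # one log-likelihood per member
    m = logp.max()
    # log-mean-exp, shifted by the maximum to avoid underflow
    return m + np.log(np.mean(np.exp(logp - m)))

def gaussian_loglik(y, X, H, R):
    """Log of N(y; H xbar, H V H^T + R) with ensemble mean and covariance."""
    xbar = X.mean(axis=0)
    V = np.atleast_2d(np.cov(X, rowvar=False)) # assumed: 1/(N-1) normalization
    return gaussian_logpdf(y - H @ xbar, H @ V @ H.T + R)[0]

# Hypothetical toy setting: 3 state variables, 2 observed, 2000 members
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
H = np.eye(2, 3)
R = 0.5 * np.eye(2)
y = np.array([0.3, -0.2])
print(ensemble_loglik(y, X, H, R), gaussian_loglik(y, X, H, R))
```

When the predicted ensemble is itself drawn from a Gaussian, as in this toy setting, the two values are close for a large ensemble. For a strongly non-Gaussian predicted ensemble they can differ, because only the Gaussian version discards everything except the ensemble mean and covariance.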

Here we demonstrate a qualitative difference between the ensemble-approximated likelihood and the Gaussian likelihood in the present application. Figures 21 and 22 show the profile likelihoods using the Gaussian likelihood function (53). It can be seen from Figure 21 that the α = 10 and α = 20 curves cross each other, whereas using the ensemble-approximated likelihood, no such crossing occurs for any two α values (Figure 5). Using Gaussian likelihood, the α = 10 curve is highest with system noise parameters σh = 2 m, Lx = 20°, and Ly = 2°, but it becomes lower than the α = 20 curve when a different set of parameters is selected. Figure 22 displays a single peak at α = 10, which corresponds to the maximum. Compared with ensemble-approximated likelihood (Figure 6), the Gaussian likelihood tends to be large for small values of α, that is for a small observation noise matrix. The optimal parameters for Gaussian likelihood (α = 10, σh = 2 m, Lx = 20°, and Ly = 2°) represent a smaller observation noise and meridional length-scale for system noise than those for ensemble-approximated likelihood (α = 20, σh = 2 m, Lx = 20°, and Ly = 5°; section 4.1).

Figure 21.

Profile log-likelihood as a function of system noise parameters and observation noise coefficient using Gaussian likelihood (53), for comparison with Figure 5.

Figure 22.

Profile log-likelihood as a function of observation noise coefficient using Gaussian likelihood (53), for comparison with Figure 6.

Figure 23 shows the assimilation results with the parameters estimated by the Gaussian likelihood. The filtered estimates (Figure 23(a)) appear to be positively biased compared with those obtained by the ensemble-approximated likelihood (Figure 8(a)). The positively biased estimates may be the reason why the set of parameters was not selected with the ensemble-approximated likelihood, and may be due to the small observation noise parameter. With the small meridional length-scale, westward-propagating anomalies are reproduced more evidently in the smoothed estimates as shown in Figure 23(b). With Ly = 2°, the estimates do not contain waves with unrealistically fine structures in the off-equatorial regions (Figure 23(c)), unlike the case for Ly = 1° shown in Figure 20(c).

Figure 23.

Temporal variation of the (a) filtered and (b) smoothed estimates of SSH anomalies along the Equator, and (c) smoothed estimates of SSH anomalies along 12.5°N. The estimates are obtained with parameters estimated by the Gaussian likelihood (53): α = 10, σh = 2 m, Lx = 20°, and Ly = 2°.

5.4. Dominance of the observation noise coefficient

In section 4.1, while the likelihood was found to be more sensitive to σh than to Lx and Ly for a fixed α, the likelihood was mainly controlled by α. That is, once the observation noise magnitude is determined, the other parameters for system noise have little effect on the likelihood. This tendency can be identified even when the uncertainty due to the sampling error of the ensemble is taken into account in the bootstrap experiments as seen in Tables II and III.

We consider the imbalance between the system noise parameters and the observation noise parameter in this section. One possibility is that the intervals between the prescribed values of α (28) are too large compared to the intervals between the prescribed system noise parameters. However, this does not seem to be the case when we compare the intervals of α and σh. As seen in section 4.1, the profile likelihood has its peak at α = 20 and σh = 2 m. If we normalize α to α = 20, we obtain

equation image(54)

whose elements are interpreted as variance ratios of the observation noise with respect to the peak. Similarly, since the system noise variance is proportional to equation image, the variance of the system noise normalized to that at its peak (σh = 2 m) becomes

equation image(55)

Comparing the intervals given by (54) and (55), we find that the prescribed values of α have smaller intervals than those of equation image. That is, the parameter search for the optimum value of α is not coarser than the search for the optimum equation image.

Another possibility is the nonlinearity of the system model. That is, since the system model (1) is nonlinear with regard to system noise equation image, the magnitude of the system noise, σh, may have a smaller effect than would be expected with a linear system model.

Let us evaluate the contribution of the covariance of the observation noise and that of the forecast error in the approximated Kalman gain (6). In (6), the effect of the observation noise appears as equation image, and that of the forecast error is given by equation image. In Figure 24(a), we plot

equation image(56)

as a function of α. The quantity defined in (56) can be interpreted as a spatio-temporal mean of the observation noise variance because the summand can be interpreted as a spatially averaged variance at time t, and the spatially averaged variance is then averaged over time. Figure 24(a) shows that the mean observation noise variance is linearly proportional to α, that is, all the data points are aligned on the dashed line passing through the origin and the value of the mean variance for α = 1. This is a trivial result, because equation image is assumed to be proportional to α.

Figure 24.

(a) Mean variance of observation noise as a function of observation noise coefficient α; the dashed line passes through the origin and the value of the variance at α = 1. (b) Mean variance of forecast error in the observation space as a function of squared system noise magnitude equation image for observation noise coefficients α = 1, 20, and 500. The dashed lines pass through the origin and the value of the variance at equation image.

On the other hand, while the system noise covariance equation image is proportional to equation image, the forecast-error covariance in the observation space equation image may not be. This is because the system equation (1) is nonlinear with respect to the system noise. Figure 24(b) shows the spatio-temporal mean of the forecast error variance in the observation space,

equation image(57)

as a function of the square of the system noise magnitude, equation image, for α = 1, 20, and 500. For each α, the mean variance of the forecast error increases with equation image, which is consistent with our expectations. Larger values of α can be seen to give larger forecast-error variances, which seems reasonable because state variances are reduced less in the filtering steps with large observation noise variance.
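The spatio-temporal means plotted in Figure 24 can be sketched as follows. This is a minimal Python illustration under stated assumptions: it takes the mean in (56) and (57) to be the time average of the trace of the covariance matrix divided by the number of observations at each time, which is how the text describes it, and the function name `mean_variance` and the random matrices standing in for the fixed observation noise covariance are hypothetical.

```python
import numpy as np

def mean_variance(covs):
    """Time average of the spatially averaged variance trace(C_t) / n_t."""
    return float(np.mean([np.trace(C) / C.shape[0] for C in covs]))

# Hypothetical fixed covariances, one per time step; the number of
# observations n_t is allowed to change from one time step to the next.
rng = np.random.default_rng(1)
R0 = []
for n_t in (4, 6, 5):
    A = rng.normal(size=(n_t, n_t))
    R0.append(A @ A.T + n_t * np.eye(n_t))    # symmetric positive definite

base = mean_variance(R0)                       # value at alpha = 1
for alpha in (1, 20, 500):
    scaled = mean_variance([alpha * R for R in R0])
    print(alpha, scaled, alpha * base)         # the last two columns coincide
```

Scaling every covariance by α scales the mean variance by exactly α, which is the linear relation in Figure 24(a). The same function could be applied to the forecast-error covariances in the observation space, where no such proportionality to the squared system noise magnitude is guaranteed.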

The dashed lines pass through the origin and the value of the mean variance at equation image. If the mean variance were linearly proportional to equation image, the data points would lie on the dashed lines. Instead, as seen in Figure 24(b), the data points curve below the dashed lines for each α value. Therefore, the effect of equation image on the forecast-error variance is smaller than that expected for the case of a linear dependence on equation image. Considering that the observation noise variance is linearly proportional to α, we can conclude that the approximated Kalman gain is less sensitive to equation image than to α, and consequently that equation image has a smaller effect on the likelihood than α.
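The sensitivity argument can be illustrated with a scalar analogue of the gain. This is a sketch under stated assumptions and not the gain (6) itself: $p$ stands for the forecast-error variance in the observation space, $r = \alpha r_0$ for the observation noise variance, and the power law with exponent $0 < \gamma < 1$ is a hypothetical stand-in for the sub-linear growth seen in Figure 24(b).

```latex
\begin{aligned}
K &= \frac{p}{p + r}, \qquad r = \alpha\, r_0, \qquad p = c\,(\sigma_h^2)^{\gamma}, \quad 0 < \gamma < 1, \\
\frac{\partial K}{\partial \ln \alpha} &= r\,\frac{\partial K}{\partial r} = -\frac{p\,r}{(p + r)^2}, \\
\frac{\partial K}{\partial \ln \sigma_h^2} &= \gamma\, p\,\frac{\partial K}{\partial p} = \gamma\,\frac{p\,r}{(p + r)^2}, \\
\left|\frac{\partial K}{\partial \ln \sigma_h^2}\right| &= \gamma \left|\frac{\partial K}{\partial \ln \alpha}\right| < \left|\frac{\partial K}{\partial \ln \alpha}\right| .
\end{aligned}
```

Under these assumptions, a given relative change in $\sigma_h^2$ moves the scalar gain by a factor $\gamma$ less than the same relative change in $\alpha$; with a linear dependence ($\gamma = 1$) the two sensitivities would have equal magnitude.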

5.5. Missing westerly wind bursts (WWBs)

As shown in section 4.3, the estimates of the zonal wind anomalies showed westerlies that are strongly correlated with the El Niño variations of the SSH anomalies in the central equatorial Pacific (in the TW2 region). In the western Pacific (in the TW1 region), however, we identified almost no correlations between the zonal wind estimates and SSH anomalies.

It is well known from observations that El Niño events are preceded by westerly wind bursts (WWBs) over the western Pacific. This tendency was found in the recent events (McPhaden 1999; McPhaden 2004; McPhaden 2008) and has been statistically confirmed (Vecchi and Harrison 2000; Zhang and Gottschalck 2002; Hendon et al. 2007; Seiki and Takayabu 2007). The WWBs were commonly considered external stochastic forcing for ENSO, but it was found from observations that WWBs depend on SST (Yu et al. 2003; Tziperman and Yu 2007; Seiki and Takayabu 2007; Kug et al. 2008), and an empirical WWB model as a function of SST was proposed (Gebbie and Tziperman 2009).

In response to those observations, dynamic models can reproduce ENSO events by imposing either stochastic WWBs (Neelin et al. 1998; Perigaud and Cassou 2000; Roulston and Neelin 2000; Fedorov 2002; Zavala-Garay et al. 2003; Boulanger et al. 2004; Lengaigne et al. 2004; Batstone and Hendon 2005; Zavala-Garay et al. 2005; Zavala-Garay et al. 2008) or SST-dependent WWBs (Perez et al. 2005; Eisenman et al. 2005; Gebbie et al. 2007). On the other hand, dynamic models can also reproduce WWBs (Vecchi et al. 2006; Kug et al. 2009).

Taking these observational and model studies into account, we should clarify why the westerly wind anomalies over the western Pacific were not reproduced in the smoothed estimate as shown in Figure 13(a).

5.5.1. Missing physics

We expect that the poor representation of westerly anomalies over the western Pacific may be the result of missing physics in the ZC model. That is, the westerlies cannot be generated from the coupled interaction in the model, and therefore cannot be reproduced by the SSH assimilation due to the poor covariance between the westerlies and the SSH. To clarify the poor estimates of the westerly anomalies, we carry out additional experiments to examine consistency of the ZC model and the atmospheric data.

Experiment 1. As mentioned in section 3.1, the ZC model is a coupled model, where the ocean component drives the atmosphere component, and the atmosphere component in turn drives the ocean component. In this experiment, we cut the model coupling process from the ocean to the atmosphere, and insert wind observations instead. On the basis of the ocean output, we can select one of two options: (a) If the ocean output still reproduces El Niño, both the atmospheric data and the model atmosphere-to-ocean process can be regarded as reasonable. (b) If not, the atmospheric data are incorrect and/or the model atmosphere-to-ocean process is too simplified.

We insert the atmospheric data in the zonal and the meridional components into the ZC model (equation image) at every time step. Values at the model grid points are generated using optimal interpolation (OI), and those at model time steps are obtained by linear interpolation. The top of Figure 25(a) shows the atmospheric input along the Equator, with vertical dashed lines showing the boundaries of the TW1, TW2, TW3, and 120–90°W regions. The WWBs are identified in the western half of the TW1 region in 1997. The output, representing the ocean component, is shown at the bottom of Figure 25(a). We identify El Niño-like variations in the SSH anomalies in 1997. We therefore select option (a), that is, both the atmospheric data and the model atmosphere-to-ocean process are considered reasonable.

Figure 25.

Temporal variations of zonal wind anomalies and SSH anomalies averaged over 5°N–5°S for Experiments 1–3: (a) insertion of wind anomaly data (created from NCEP Reanalysis 2), (b) insertion of SSH anomalies obtained in Experiment 1, and (c) insertion of wind anomalies obtained in Experiment 2.

Note that the result of this experiment can be expected from previous studies. When the ocean component of the ZC model (Cane and Patton 1984) is driven by observed wind data, the resulting SSH variations are known to be reasonable for El Niño events before 1984 (Cane 1984; Busalacchi and Cane 1985). Here, we have confirmed a similar effect for the current dataset of 1992–2002.

Experiment 2. Experiment 1 provides ocean output that shows El Niño-like variation. The next experiment focuses on another model coupling process: the process from the ocean to the atmosphere. We cut the model atmosphere-to-ocean process and insert the ocean variables obtained in Experiment 1. Depending on the atmospheric output, we can again select one of two options: (a) If WWB-like variations are reproduced in the atmospheric component, the model ocean-to-atmosphere process is reasonable. (b) If not, the process is unreasonable. We insert the ocean variables (ocean current equation image and thermocline depth h) obtained in Experiment 1. The bottom of Figure 25(b) shows SSH anomalies converted from h as input, which is identical to the bottom of Figure 25(a). The atmospheric output is shown at the top of Figure 25(b). No WWB-like variations are seen in the western half of the TW1 region, where easterlies dominate for the whole period. Therefore, option (b) is selected, i.e. the model ocean-to-atmosphere process is regarded as unreasonable in terms of its ability to reproduce WWBs.

5.5.2. Necessity of WWBs

Experiment 2 reproduces a situation where the WWBs are missing while the model SSH displays an El Niño-like variation. This situation is what was observed in the assimilation experiment in section 4.3. In the next experiment, we examine the necessity of WWBs to ENSO-like variation of the model SSH.

Experiment 3. We insert the atmospheric variations that have no WWBs obtained in Experiment 2 into the ZC model. This experiment is similar to Experiment 1, but the atmospheric input does not include WWBs. We have two options: (a) If El Niño-like variations are still reproduced in the ocean component, the ZC model does not require WWBs to drive El Niño. (b) If not, WWBs are required to excite El Niño events in the model. The top of Figure 25(c) shows the atmospheric input, which is the output of Experiment 2 and does not include WWBs, and the ocean output is shown in the bottom panel. Although the magnitude differs from that of the output in Experiment 1, El Niño-like variations can be identified in the SSH anomalies. From this result, we select option (a), i.e. we infer that westerly winds over the western Pacific are not required to drive El Niño events in the ZC model.

From Experiments 1–3, we find:

(1) the westerly winds in the western Pacific are phenomena outside of the ZC model, and are not required for the model El Niño events;

(2) this result is consistent with the fact that the westerly winds cannot be estimated over the western Pacific by the SSH observations.

Although the ZC model does not describe the WWBs, as can be identified in Figure 13 of the original paper (Zebiak and Cane 1987), the ZC model is known to be effective for ENSO prediction (Chen and Cane 2008). On the other hand, it was also reported that the ZC model has no dynamical use and no real predictive power (Bennett et al. 1998; Bennett et al. 2000).

5.5.3. Source of WWBs

Assimilation of SSH did not reproduce the westerly winds. Judging from the dependence of the variables in the ZC model, there is likely a deficit in at least one of the following three processes:

(1) the thermodynamic equation, which solves the SST evolution as a function of SSH,

(2) the heating, which depends on SST and surface wind convergence, and

(3) the atmospheric model that is driven by the heating.

In order to identify the model deficit, we conduct another experiment.

Experiment 4. In this experiment, we cut the evolution process of SST by the thermodynamic equation and insert SST data instead. Here again we have two options: (a) If the westerly winds are reproduced, the heating process and the atmospheric model are reasonable, but the thermodynamic equation lacks some physics necessary for the reproduction of westerly winds. (b) If the westerly winds still do not appear, the heating process and/or the atmospheric model may be defective. The left-hand panels of Figure 26 show temporal variations of the SST data that are inserted into the ZC model and the resultant zonal winds. The westerly winds are reproduced west of 150°E around the beginning of 1997. In addition, westerly winds around 170°E are identified in 1993 and 1994, as were also found in the wind data (Figure 14(a)) but were very weak in the smoothed estimates by SSH assimilation (Figure 14(b)). We therefore select option (a) and conclude that the model SST evolution needs to be improved to reproduce the westerly winds.

Figure 26.

Temporal variations in zonal wind anomalies and SST anomalies averaged over 5°N–5°S for (a) Experiment 4: insertion of SST anomaly data, and (b) those obtained as the smoothed estimates through SSH assimilation.

The right-hand panels of Figure 26 show the smoothed estimates of zonal winds and SST that were obtained through the SSH assimilation. The wind estimate is identical to Figure 14(b). We find that the SST anomalies in the ZC model are very weak west of 150°E. This tendency was already identified in the 1997–98 ENSO event (Figure 16(a)).

This experiment indicates that estimated SST anomalies are inconsistent in the western Pacific. However, we did not identify this inconsistency in the SST indices in Figure 12. This is because the region west of 150°E is outside the four Niño regions in which SST indices were calculated.

6. Conclusion

In this study, we have introduced a method of maximum likelihood for selecting optimal covariances in the framework of ensemble-based filters such as the EnKF. The likelihood is approximated as the sample mean of each member's likelihood. To evaluate the sampling error of the ensemble-approximated likelihood, we constructed a method for examining the statistical significance of the approximated likelihood by the bootstrap method without extra ensemble computation. We applied these methods to an EnKF experiment where TOPEX/POSEIDON altimetry observations are assimilated into an intermediate coupled ocean-atmosphere model by Zebiak and Cane (1987). We estimated optimal parameters that specify the covariances of system noise and observation noise. System noise is imposed on the thermocline depth anomalies and its covariance matrix is a function of three parameters: magnitude and decorrelation lengths in the zonal and the meridional directions. The observation noise covariance matrix has a single parameter that scales a fixed covariance matrix. From a candidate set of values of the parameters, we selected the optimal parameters that maximize the approximated likelihood. We found that the values of the obtained parameters were reasonable in comparison with previous studies.

Properties of the likelihood are as follows:

(1) The likelihood has multiple peaks. Specifically, the likelihood has two peaks with respect to the decorrelation length of the system noise covariance, which may correspond to the two characteristic length-scales of the Kelvin wave and the Rossby wave in the equatorial Pacific.

(2) The likelihood is more sensitive to the magnitudes of system noise and observation noise than to the decorrelation lengths of system noise.

(3) The likelihood, and consequently the state estimates, depend not only on the ratio of the magnitudes of system noise and observation noise, but also on their individual values. This is due to the nonlinearities present in the ZC model.

(4) The likelihood tends to select large observation noise, compared with the Gaussian likelihood.

Using the optimal covariance parameters, we examined the estimates by the EnKF and the EnKS. The effect of smoothing decreases by 1/e approximately one year after the filtering step. One property of the smoothed estimate is that westerly wind anomalies over the western Pacific did not appear even around the period of El Niño events. We thus carried out additional experiments, which found that:

(1) the westerly winds are phenomena outside of the ZC model and are not necessary for the model El Niño,

(2) the model El Niño can be maintained by the westerlies over the central Pacific, and

(3) in order to reproduce the westerly winds over the western Pacific, the model SST evolution process needs to be improved.

The present method of ensemble-approximated maximum likelihood is applicable to any ensemble-based filter. The method is an appropriate tool for selecting optimal error covariances in data assimilation in the presence of nonlinearities.

Acknowledgments

We would like to thank S. E. Zebiak for providing the code for the ZC model. The Altimeter Ocean Pathfinder TOPEX/POSEIDON SSH anomaly data were provided by the Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the NASA Jet Propulsion Laboratory, Pasadena, California, through their web site at http://podaac.jpl.nasa.gov/. NCEP Reanalysis 2 data were provided by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, USA, from their web site at http://www.cdc.noaa.gov/. This research was partially supported by the Japan Science Technology Agency for the CREST (Core Research for Evolutional Science and Technology) project.
