## 1. Introduction

In data assimilation, covariance matrices are introduced in order to prescribe the properties of the uncertainty of the initial state, the system noise (model error, process noise), and the observation noise (observation error). The inverse matrices of the respective covariances work as weights of the initial estimates, model dynamics, and observations. Suitable specification of the covariance matrices is essential for obtaining sensible estimates, and mis-specification of the matrices may lead to over- or underfitting of the data and/or failure of the assimilation altogether (e.g. Fukumori 2001). This paper presents a technique for optimizing covariance matrices in data assimilation.

The weight of the initial estimate is usually modelled by the inverse of a covariance matrix, known as a background-error covariance matrix in the literature on variational assimilation. System noise (process noise, model error) introduces uncertainties in the temporal evolution of the model state due to numerical truncation and/or parametrization error. Observation noise is, by definition, the difference between observations and the model state. The difference can be interpreted as a sum of instrument error and representation error (Cohn 1997). The instrument error is due to the experimental properties of the instrument, and the representation error depends on the dynamic model. One of the sources of the representation error is neglected physics in the model, such as subgrid-scale variability. A part of the subgrid-scale variability, however, would not be considered as representation error if we used an alternative model that was constructed on a finer grid scale. In addition, background error and system noise also prescribe the behaviour of the model state, and consequently affect the observation noise.

Given the fact that these noise terms depend on each other, optimal covariance matrices of the noise should be determined simultaneously based on both the model and observation properties. In the following, we review methods for estimating the optimal covariances based on dynamic models and observations, as summarized in Table I, and clarify the nature of our proposed method. Although categorized differently in Table I, these methods are all based on a common statistic, $d_t = y_t - H x_t^{f}$, the difference between the observation $y_t$ and the predicted model state $x_t^{f}$, where $H$ is the observation matrix. Following custom, we refer to this statistic as the innovation, though the terminology is not entirely correct since it presumes optimality of the data assimilation algorithm (e.g. Anderson and Moore 2005, section 5.3).

Table I. Innovation-based methods for estimating optimal covariances:

- Minimizing the squared innovation
- Maximum likelihood
  1. Gaussian likelihood
     - Original form
     - Ensemble mean and covariance of state
     - Expected value of the cost function
     - Adjustment to the squared innovation
  2. Non-Gaussian likelihood
     - Ensemble mean of likelihood (present study)
- Bayesian estimation
- Covariance matching

### 1.1. Minimizing the squared innovations

One method of estimating the optimal covariances is to minimize the square of the innovation. Gaspar and Wunsch (1989) obtained the optimal magnitude of the system noise covariance by maximizing the variance explained by the model prediction, which is equivalent to minimizing the square of the innovation. Through the minimization of the squared innovation, Hoang *et al.* (1997) directly determined the optimal Kalman gain, which is a function of the forecast-error covariance and observation noise covariance.
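As a toy illustration of this idea, the sketch below (a hypothetical scalar example, not from the cited papers) runs a scalar Kalman filter for a range of system-noise variances and selects, by grid search, the value that minimizes the mean squared innovation. Since the optimal filter minimizes the one-step-ahead prediction error of the observations, the minimizer should lie near the true noise variance, though the minimum is typically shallow:

```python
import numpy as np

def mean_squared_innovation(q, ys, a=0.9, h=1.0, r=0.5):
    """Run a scalar Kalman filter with system-noise variance q and
    return the mean squared innovation over the observations ys."""
    x, p = 0.0, 1.0                        # initial state estimate and variance
    sq = []
    for y in ys:
        x, p = a * x, a * p * a + q        # forecast step
        d = y - h * x                      # innovation d = y - h x^f
        sq.append(d**2)
        s = h * p * h + r                  # innovation variance
        k = p * h / s                      # Kalman gain
        x, p = x + k * d, (1 - k * h) * p  # analysis step
    return np.mean(sq)

# Synthetic observations from the same model with q_true = 0.3
rng = np.random.default_rng(1)
q_true, a, h, r = 0.3, 0.9, 1.0, 0.5
x_t, ys = 0.0, []
for _ in range(2000):
    x_t = a * x_t + rng.normal(0.0, np.sqrt(q_true))
    ys.append(h * x_t + rng.normal(0.0, np.sqrt(r)))

# Grid search for the q minimizing the mean squared innovation
grid = np.linspace(0.01, 1.0, 50)
q_best = grid[np.argmin([mean_squared_innovation(q, ys) for q in grid])]
```

Hoang *et al.* (1997) optimize the gain directly rather than searching over noise variances, but the objective is the same squared-innovation criterion.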

### 1.2. Maximum likelihood

Maximization of Gaussian likelihood functions has also been adopted to estimate the covariance parameters (Dee 1995; Dee and da Silva 1999; Dee *et al.* 1999). Since the likelihood function can be interpreted as a distribution of the innovations, maximum likelihood is also considered an innovation-based method. Maximizing the likelihood is equivalent to minimizing the sum of two terms: the quadratic form of the innovations weighted by the inverse covariance of the Gaussian innovation distribution, and the log-determinant of that covariance. This yields estimates that differ from those of the method that simply minimizes the squared innovation itself (section 1.1). Mitchell and Houtekamer (2000) extended the likelihood function to the framework of the ensemble Kalman filter (EnKF; Evensen 2003) by approximating the mean and the covariance of the model state in the Gaussian likelihood function by the ensemble mean and ensemble covariance.
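The balance between the two terms can be made concrete with a small sketch (scalar innovations; a hypothetical helper, not from the cited papers). The quadratic term favours a large innovation covariance, the log-determinant penalizes it, and their balance yields the maximum-likelihood estimate:

```python
import numpy as np

def innovation_neg_log_likelihood(d, S):
    """Negative log-likelihood of one innovation d under N(0, S):
    0.5 * (d^T S^{-1} d + log det S), additive constant omitted."""
    S = np.atleast_2d(S)
    d = np.atleast_1d(d)
    quad = d @ np.linalg.solve(S, d)       # quadratic term, shrinks as S grows
    _, logdet = np.linalg.slogdet(S)       # log-determinant term, grows with S
    return 0.5 * (quad + logdet)

# For scalar innovations, the MLE of the variance is the mean squared
# innovation; it minimizes the summed negative log-likelihood.
d_samples = np.array([0.5, -1.0, 0.7])
s_mle = np.mean(d_samples**2)
costs = [sum(innovation_neg_log_likelihood(d, s) for d in d_samples)
         for s in (0.5 * s_mle, s_mle, 2.0 * s_mle)]
```

Here `costs[1]`, evaluated at the mean squared innovation, is the smallest of the three, illustrating why this criterion gives different estimates from minimizing the squared innovation alone, which has no counterweight to inflating the covariance.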

The optimality condition, in which the derivative of the likelihood with respect to the covariance parameters vanishes, leads to an explicit representation of the optimal covariances (Maybeck 1982). Using a tropical Pacific Ocean model, Blanchet *et al.* (1997) compared the covariances obtained through the procedures of Dee (1995) and Maybeck (1982) with a covariance obtained through an intuitive approach by Myers and Tapley (1976).

The optimality condition is also consistent with a relation for the expected value of a cost function (Talagrand 1999; Chapnik *et al.* 2004). Hence, the method of Desroziers and Ivanov (2001), who evaluated the expectation of the cost function to tune parameters of the observation noise covariance, is also categorized as a maximum likelihood method.

Adaptive Kalman filtering (Mehra 1970; Gelb 1974) estimates the optimal covariances such that the sample mean of the squared innovations equals the covariance of a Gaussian innovation distribution. This approach can also be interpreted as a maximum likelihood method, because the maximum likelihood estimator (MLE) of the covariance matrix of a Gaussian distribution is the sample covariance matrix (e.g. Magnus and Neudecker 1999). While adaptive Kalman filtering estimates the optimal covariances at each time step, a time-averaged version of the method was also proposed to estimate optimal stationary covariances (Miller and Cane 1989). The relation between the squared innovation and the covariance of the Gaussian leads to alternative relations (Desroziers *et al.* 2005), which are also consistent with the condition of the MLE.
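A minimal sketch of this matching relation, assuming synthetic zero-mean Gaussian draws stand in for a real innovation sequence (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scalar innovations drawn from N(0, s_true); in adaptive Kalman
# filtering the sample statistics of real innovations play this role.
s_true = 2.0
d = rng.normal(0.0, np.sqrt(s_true), size=5000)

# MLE of the variance of a zero-mean Gaussian = mean squared innovation
s_hat = np.mean(d**2)

# The innovation variance decomposes as H P H^T + R; if the forecast part
# H P H^T is known, the observation-noise variance follows by subtraction
# (illustrative value, in the spirit of the Desroziers-type relations).
hpht = 1.2
r_hat = s_hat - hpht
```

With 5000 samples, `s_hat` lands close to the true value of 2.0; matching the sample and theoretical innovation statistics in this way is exactly the adaptive-filtering condition described above.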

### 1.3. Bayesian estimation

In addition to the likelihood function, Purser and Parrish (2003) introduced a prior distribution to obtain smooth estimates of parameters of the background-error covariance.

### 1.4. Covariance matching

The ‘covariance matching’ method (Fu *et al.* 1993) and its extended version (Menemenlis and Chechelnitsky 2000) utilize the sample covariance of the innovations, in which the output of a free simulation run is assigned as the predicted model state. In the framework of linear and stationary systems, these methods estimate the optimal covariance matrices of system noise and observation noise by combining a relation for the innovation covariance with the Lyapunov equation for the state covariance.
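The two ingredients can be sketched for a small stable linear system (matrices here are illustrative, not from the cited studies): the stationary state covariance solves the discrete Lyapunov equation, and the implied innovation covariance follows by projecting through the observation matrix and adding the observation noise. Covariance matching inverts this relation, solving for the system- and observation-noise covariances consistent with the sample innovation covariance:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable linear system x_{t+1} = A x_t + w, Cov(w) = Q
A = np.array([[0.8, 0.1],
              [0.0, 0.7]])
Q = np.diag([0.2, 0.1])
H = np.array([[1.0, 0.0]])    # observe the first state component
R = np.array([[0.05]])        # observation-noise covariance

# Stationary state covariance from the Lyapunov equation P = A P A^T + Q
P = solve_discrete_lyapunov(A, Q)

# Implied covariance of the free-run innovations: H P H^T + R
S = H @ P @ H.T + R
```

In the actual method, `S` is replaced by the sample covariance of innovations from a free simulation run, and the relation is solved for `Q` and `R`.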

### 1.5. Present study

The above-mentioned methods for estimating optimal covariances are constructed on linear-Gaussian state space models; i.e. it is assumed that both the system equation and the observation equation are linear, and that both the system noise and the observation noise obey Gaussian distributions. In the present study, we propose a method for estimating optimal noise covariances in the context of sequential assimilation, even when the system and observation equations are nonlinear. Nonlinearities in the system equation are typically introduced by the advection term in the momentum equation. When the system equation is nonlinear, ensemble-based assimilation methods such as the EnKF are applied to deal directly with the nonlinearity. The present approach for covariance optimization is a maximum likelihood estimation carried out by approximating the likelihood with the ensemble (Table I). The approach is also applicable when the noise obeys a non-Gaussian distribution, such as the log-normal distribution (e.g. Fletcher and Zupanski 2006), in which case it estimates the optimal parameters that describe the non-Gaussian distribution.
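The core idea, approximating the likelihood by the ensemble mean of the observation density evaluated at each member, can be sketched as follows (function names and values are hypothetical; the paper's derivation appears in section 2):

```python
import numpy as np

def ensemble_likelihood(y, ensemble, H, obs_pdf):
    """Approximate the likelihood of observation y by averaging the
    observation-noise density over the forecast ensemble:
        p(y) ~= (1/N) * sum_i p(y | H x_i).
    obs_pdf(y, Hx) may be Gaussian or non-Gaussian (e.g. log-normal)."""
    return np.mean([obs_pdf(y, H @ x) for x in ensemble])

# Example with a scalar Gaussian observation density
def gauss_pdf(y, mean, var=0.5):
    return np.exp(-0.5 * (y - mean)**2 / var) / np.sqrt(2 * np.pi * var)

H = np.array([[1.0, 0.0]])                       # observe first component
ensemble = [np.array([0.9, 0.0]), np.array([1.1, 0.0])]  # two forecast members
lik = ensemble_likelihood(np.array([1.0]), ensemble, H,
                          lambda y, hx: gauss_pdf(y[0], hx[0]))
```

Covariance parameters (here, `var`) would then be chosen to maximize this ensemble-averaged likelihood accumulated over the assimilation period; because `obs_pdf` is arbitrary, the same machinery covers non-Gaussian noise.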

The present paper is organized as follows. In section 2, the basic concept of the EnKF is reviewed, and an ensemble approximation of the likelihood is derived. We then give an example of an application of the method to a coupled atmosphere–ocean model (Zebiak and Cane 1987), into which sea-surface height (SSH) observations are assimilated using the EnKF and the ensemble Kalman smoother (EnKS). The experimental set-up and results are given in sections 3 and 4, respectively. In section 5, we discuss the properties of the estimated covariance and missing physics in the coupled model; the westerly wind bursts (WWBs) in the western Pacific are not reproduced in the EnKS estimate. Conclusions are given in section 6.