## 1. Introduction

The concept of an optimal representation of control space was introduced by Bocquet (2009), building on the standpoint developed in the seminal work of Rodgers (2000) in the context of remote sounding. The idea is to define the discretization of a large parameter space that best accounts for the observations. Such spaces are typically encountered in geophysical and environmental problems where fields of forcing parameters are uncertain, for instance in atmospheric chemistry, where the emissions are poorly known. The theory was applied to the inverse modelling of sources of atmospheric tracers. In many data assimilation experiments, such as the inversion of air-quality pollutant sources or greenhouse gas fluxes, one is interested in the reduction of uncertainty achieved by assimilating the observations. It was shown that optimal adaptive grids of control space can yield a reduction of uncertainty equivalent to that of a highly resolved regular grid, but with far fewer grid cells.

### 1.1. Selected results from Part I

In Part I of this work (Bocquet *et al.*, 2011), the optimal representation theory was refined. In particular, the multiscale aspect of the theory was made Bayesian, allowing for a consistent use of background information on the control space parameters.

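One considers the typical inverse modelling problem

$$\boldsymbol{\mu} = \mathbf{H}\boldsymbol{\sigma} + \boldsymbol{\epsilon},$$

where **μ** is the vector of observations, **H** is the Jacobian matrix of the problem (linear or linearized), **σ** is the vector of parameters, defined in control space, and **ε** is the vector of observational errors. The typical data assimilation problem related to this equation assumes some prior statistical information on the errors, which follow a Gaussian distribution $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$, and on the parameters, which follow the Gaussian background-error statistics $\boldsymbol{\sigma} \sim \mathcal{N}(\boldsymbol{\sigma}_{b}, \mathbf{B})$. For a general representation (or discretization) *ω* of control space, the observation equation reads

$$\boldsymbol{\mu} = \mathbf{H}_{\omega}\boldsymbol{\sigma}_{\omega} + \boldsymbol{\epsilon}_{\omega}.$$
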
Such a representation *ω* is an adaptive discretization made of cells of various shapes and sizes, each carrying a scalar variable, which together form a partition of the domain Ω of control space. Coarsening (**Γ**_{ω}) and prolongation (**Γ**^{*}_{ω}) operators are used to move up or down in scale between grids. The prolongation operator is derived using all the information available from the background.
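To make the roles of the two operators concrete, here is a minimal NumPy sketch on a 1D finest grid. It assumes, hypothetically, that coarsening averages the fine cells of each coarse cell, and it takes as prolongation the Bayesian choice **Γ**^{*} = **B** **Γ**^{T}(**Γ** **B** **Γ**^{T})^{-1}, with **B** the background-error covariance, i.e. the linear part of the conditional mean of the fine field given its coarse values. This is a plausible instance for illustration, not necessarily the exact operator of Part I, and the function names are ours. The sketch checks that prolongation followed by coarsening is the identity, whereas coarsening followed by prolongation is only a rank-deficient projector.

```python
import numpy as np

def coarsening_operator(cells, n_fg):
    """Hypothetical averaging operator Gamma (N x N_fg).

    `cells` lists, for each coarse cell, the indices of the fine cells it
    aggregates; together they must partition range(n_fg).
    """
    gamma = np.zeros((len(cells), n_fg))
    for i, idx in enumerate(cells):
        gamma[i, idx] = 1.0 / len(idx)
    return gamma

def bayesian_prolongation(gamma, b):
    """Assumed prolongation Gamma* = B Gamma^T (Gamma B Gamma^T)^{-1}."""
    b_coarse = gamma @ b @ gamma.T
    # B and b_coarse are symmetric, so Gamma* = (b_coarse^{-1} Gamma B)^T.
    return np.linalg.solve(b_coarse, gamma @ b).T

n_fg = 8
cells = [[0, 1, 2, 3], [4, 5], [6], [7]]           # N = 4 coarse cells
x = np.arange(n_fg)
b = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)  # exponential correlations

gamma = coarsening_operator(cells, n_fg)
prolong = bayesian_prolongation(gamma, b)

# Prolongation then coarsening: identity on the coarse representation.
print(np.allclose(gamma @ prolong, np.eye(len(cells))))  # True
# Coarsening then prolongation: idempotent, but only of rank N < N_fg.
proj = prolong @ gamma
print(np.allclose(proj @ proj, proj), np.linalg.matrix_rank(proj))
```

The averaging convention is only one option; summing instead of averaging would change **Γ** but, with the same Bayesian formula, still give **Γ****Γ**^{*} = **I**.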

Assume these grid cells are aggregations of smaller grid cells defined on a regular finest grid with *N*_{fg} grid cells. Then the prolongation from the representation *ω* (with *N* ≤ *N*_{fg} grid cells) to the finest grid, followed by a coarsening back to *ω*, should correspond to the identity operator: $\boldsymbol{\Gamma}_{\omega}\boldsymbol{\Gamma}^{*}_{\omega} = \mathbf{I}_{\omega}$. However, the reverse operation, coarsening from the finest grid to *ω* and prolongating back to the finest grid, implies a loss of information, so that the resulting (affine) operator is not the identity but

where

Aggregation errors, which account for representativeness errors, are included in this framework. We called them scale-covariant errors because they follow

New objective functions for the design of optimal representations were introduced. The first is the degrees of freedom for the signal (DFS), a well-established criterion in data assimilation, though it had not previously been used for this purpose. A data-dependent criterion was also introduced.
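For a linear Gaussian problem with observation-error covariance **R** and background-error covariance **B**, the DFS has the standard closed form Tr(**HBH**^{T}(**R** + **HBH**^{T})^{-1}), which equals the trace of **KH** with **K** the Kalman gain. A minimal NumPy sketch, written for a generic matrix **H** rather than the multiscale operators of Part I (the function name `dfs` is ours):

```python
import numpy as np

def dfs(h, b, r):
    """Degrees of freedom for the signal: Tr(H B H^T (R + H B H^T)^{-1})."""
    hbht = h @ b @ h.T
    innovation_cov = r + hbht
    # Tr(A S^{-1}) = Tr(S^{-1} A); solve instead of forming S^{-1}.
    return float(np.trace(np.linalg.solve(innovation_cov, hbht)))

# Identity Jacobian with unit background and observation errors: each of
# the 5 observations contributes 1/2, so the DFS is 2.5.
p = 5
print(dfs(np.eye(p), np.eye(p), np.eye(p)))

# Sharper observations push the DFS towards the number of observations.
print(dfs(np.eye(p), np.eye(p), 0.01 * np.eye(p)))
```

The DFS always lies between zero and the number of observations, which is what makes it a natural score for comparing representations.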

In conjunction with scale-covariant errors, the DFS criterion takes the simple scale-dependent form

We shall use this objective function for the design of representations in the rest of this article.

The adaptive grids are optimized over a dictionary of representations. A large dictionary was built from general *tilings*: each tiling is a set of rectangles (or tiles) that partition control space. Definition and implementation can be found in Part I; Figure 4 of this paper also offers an illustration. Since the *qtrees* form a subset of the tilings, by construction they provide adaptive grids that are no more efficient than optimal tilings (Figure 2 of the current paper provides an illustration). However, it was shown in Part I that the discrepancy is small. Besides, the qtrees are expected to be computationally more efficient than the general tilings, as explained below.
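As a concrete picture of the two kinds of dictionary elements, the sketch below checks whether a set of rectangles partitions a 2D finest grid, and whether every tile is an aligned dyadic square, as in a qtree. The (x0, y0, width, height) encoding and the function names are hypothetical, chosen only for illustration; the implementation of Part I is not reproduced.

```python
import numpy as np

def is_partition(tiles, nx, ny):
    """True if the rectangles (x0, y0, width, height) cover each cell of an
    nx x ny finest grid exactly once."""
    count = np.zeros((nx, ny), dtype=int)
    for x0, y0, w, h in tiles:
        if w <= 0 or h <= 0 or x0 < 0 or y0 < 0 or x0 + w > nx or y0 + h > ny:
            return False
        count[x0:x0 + w, y0:y0 + h] += 1
    return bool(np.all(count == 1))

def is_qtree_tile(tile):
    """True if the tile is a square of side 2^k aligned on multiples of 2^k."""
    x0, y0, w, h = tile
    return w == h and (w & (w - 1)) == 0 and x0 % w == 0 and y0 % w == 0

# A general tiling of a 4 x 4 grid (it contains non-square rectangles) ...
tiling = [(0, 0, 4, 2), (0, 2, 2, 2), (2, 2, 2, 1), (2, 3, 2, 1)]
# ... and a qtree, made only of aligned dyadic squares.
qtree = [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 2, 2),
         (2, 2, 1, 1), (3, 2, 1, 1), (2, 3, 1, 1), (3, 3, 1, 1)]

for name, t in (("tiling", tiling), ("qtree", qtree)):
    print(name, is_partition(t, 4, 4), all(is_qtree_tile(x) for x in t))
```

Both sets are valid partitions, but only the second is made exclusively of qtree tiles, which illustrates why the qtrees form a restricted subset of the tilings.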

### 1.2. Computational costs

The numerical optimization of the grid of control space entails the minimization of a functional that depends on Lagrange parameters. Among them, *N*_{fg} parameters enforce the one-point one-tile constraints, and a single one enforces the number of tiles of the representation. The optimizations are carried out with the L-BFGS-B quasi-Newton minimizer (Byrd *et al.*, 1995) on *N*_{fg} + 1 variables. The number of iterations of the minimization is difficult to estimate *a priori*, since it is problem-dependent and depends on choices made by the operator, such as the stopping criterion. As a general rule though, minimizing a quadratic functional to machine precision has a cost that scales cubically with the number of variables ((*N*_{fg} + 1)^{3} here). With the iterative BFGS minimization, the cost of each iteration scales like (*N*_{fg} + 1)^{2} multiplications, whereas it scales like *L*(*N*_{fg} + 1) multiplications for the limited-memory L-BFGS, where *L* is the memory length (typically 10–30 for high-dimensional applications).
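The Lagrangian functional of Part I is not reproduced here, so the sketch below substitutes a toy strictly convex quadratic in *N*_{fg} + 1 variables, simply to show how SciPy's L-BFGS-B implementation exposes the memory length *L* (its `maxcor` option) and consumes a function returning the cost and its gradient together. The problem size, the quadratic itself and the tolerances are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the functional of Part I: a strictly convex quadratic
# f(x) = 0.5 x^T diag(d) x - c^T x in N_fg + 1 variables (N_fg Lagrange
# parameters for the one-point one-tile constraints, plus one for the
# number of tiles).
n_fg = 100
n_var = n_fg + 1
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 10.0, n_var)  # diagonal of the Hessian
c = rng.normal(size=n_var)

def cost_and_gradient(x):
    """Return the cost and its gradient, as expected with jac=True."""
    return 0.5 * x @ (d * x) - c @ x, d * x - c

memory = 20  # L, the L-BFGS memory length
res = minimize(cost_and_gradient, np.zeros(n_var), jac=True,
               method="L-BFGS-B",
               options={"maxcor": memory, "ftol": 1e-12, "gtol": 1e-8})

print(res.success, res.nit)
print(np.max(np.abs(res.x - c / d)))  # distance to the exact minimizer

# Order-of-magnitude cost of one quasi-Newton update.
print("BFGS   ~", n_var ** 2, "multiplications per iteration")
print("L-BFGS ~", memory * n_var, "multiplications per iteration")
```

In a real application, `cost_and_gradient` would dominate the run time, which is the point made in the next paragraph.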

However, this estimate does not account for the evaluation of the cost function and of its gradient, which is a vector of *N*_{fg} + 1 components. Such evaluations are required by the quasi-Newton algorithm at each iteration, and for high-dimensional geophysical systems most of the computational time would be spent there. The cost function has the form of a sum over all tiles of the multiscale structure, so that its computational cost is linear in the total number of tiles. This total scales at most like 4*N*_{fg} in the general 2D tiling case, and at most like (4/3)*N*_{fg} in the 2D qtree case. This explains why the qtrees are faster to optimize on than the general tilings, even though the number of grid cells in the finest grid *N*_{fg} is the same. In the examples of Part I, the optimization over the dictionary of qtrees was at least twice as fast as the optimization over the dictionary of general tilings. The speed-up does not have to match exactly the ratio of 1/3 in the number of tiles, since the sum in the cost function is parallelised in both cases, with communication overheads. Also note that the regularisation of the functional used in Part I requires functions such as the logarithm and the exponential, which are more costly than matrix-vector multiplications.
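The two bounds can be checked by counting tiles on a 2^n × 2^n finest grid. The sketch assumes, for illustration, that the general tilings draw their tiles from all aligned dyadic rectangles of size 2^a × 2^b, and the qtrees from all aligned dyadic squares; under that assumption the counts reproduce the 4*N*_{fg} and (4/3)*N*_{fg} bounds. The helper names are hypothetical.

```python
def n_qtree_tiles(n):
    """Total number of quadtree tiles, over all levels, on a 2^n x 2^n grid."""
    return sum(4 ** k for k in range(n + 1))

def n_tiling_tiles(n):
    """Total number of aligned dyadic rectangles 2^a x 2^b on a 2^n x 2^n grid."""
    per_axis = sum(2 ** (n - a) for a in range(n + 1))  # 2^n + ... + 1
    return per_axis ** 2

for n in (3, 6, 9):
    n_fg = 4 ** n
    print(f"n={n}  N_fg={n_fg:7d}  "
          f"tilings/N_fg={n_tiling_tiles(n) / n_fg:.4f}  "
          f"qtrees/N_fg={n_qtree_tiles(n) / n_fg:.4f}")
```

In closed form the counts are (2^{n+1} − 1)^2 and (4^{n+1} − 1)/3, which approach 4*N*_{fg} and (4/3)*N*_{fg} from below as *n* grows.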

A clearly distinct problem is the computation of the Jacobian **H**. For high-dimensional problems, most of the computational effort may be devoted to **H**, which requires running geophysical numerical models. However, once **H** is computed and stored, optimizations can be performed without re-computing it, unless the models are nonlinear.

Finally, the storage requirements of the multiscale Jacobian scale like the total number of tiles of the multiscale structure. This was put forward in Bocquet (2009) as an argument in favour of the qtrees over the general tilings, which are three times more costly for a 2D domain.
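A back-of-the-envelope estimate makes the factor of three tangible. The sketch assumes dense double-precision storage, with one Jacobian column per tile at every scale, a hypothetical 128 × 128 finest grid and a hypothetical 1000 observations; tile counts again assume aligned dyadic rectangles for the tilings and aligned dyadic squares for the qtrees.

```python
def multiscale_jacobian_bytes(n_obs, n_tiles, bytes_per_entry=8):
    """Dense storage of a multiscale Jacobian: one column per tile, all scales."""
    return n_obs * n_tiles * bytes_per_entry

n = 7                                      # hypothetical 128 x 128 finest grid
n_fg = 4 ** n
qtree_tiles = (4 ** (n + 1) - 1) // 3      # all quadtree levels
tiling_tiles = (2 ** (n + 1) - 1) ** 2     # aligned dyadic rectangles (assumed)
n_obs = 1000                               # hypothetical number of observations

for name, tiles in (("qtrees", qtree_tiles), ("tilings", tiling_tiles)):
    gb = multiscale_jacobian_bytes(n_obs, tiles) / 1e9
    print(f"{name:8s} {tiles:6d} tiles  {gb:.2f} GB")
print("ratio:", tiling_tiles / qtree_tiles)
```

The ratio approaches 3 as the grid is refined, matching the 4*N*_{fg} versus (4/3)*N*_{fg} tile counts.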

### 1.3. Objectives

For an application where the observation locations and schedule are known *a priori*, the optimization on the dictionary of grids can be performed *a priori*, once and for all subsequent data assimilation analyses. However, even for moderately high-dimensional Jacobians, this optimization can be computationally challenging.

The objective of this Part II article is to introduce sub-optimal analytical solutions to the problem of constructing optimal adaptive grids. A continuum, or asymptotic, limit of the problem will first be defined. The optimization will then be performed analytically within this continuum limit, yielding a density of tiles. A discretization algorithm is then needed to build discrete representations of control space from these continuum densities.
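One simple way to turn a 1D density of tiles into a discrete representation, shown here only as an illustration, is to cut the finest grid at the quantiles of the normalized density. This is a generic equal-mass (inverse-CDF) partition, not necessarily the algorithm proposed later in this article. It assumes the density is sampled on the finest grid and is proportional to the number of tiles per unit length; `tiles_from_density` is a hypothetical name.

```python
import numpy as np

def tiles_from_density(rho, n_tiles):
    """Split a 1D finest grid into contiguous tiles of (roughly) equal mass
    under the density `rho`, so tiles are narrow where rho is large.

    Returns (start, stop) fine-cell index pairs, stop exclusive. If the
    density is very peaked, several cuts can fall on the same fine-cell
    boundary, and fewer than `n_tiles` tiles are returned.
    """
    rho = np.asarray(rho, dtype=float)
    cdf = np.cumsum(rho) / rho.sum()             # mass of cells 0..i
    targets = np.arange(1, n_tiles) / n_tiles    # equal-mass quantiles
    cuts = np.searchsorted(cdf, targets) + 1     # boundary after cell i
    edges = np.unique(np.concatenate(([0], cuts, [rho.size])))
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]

# Uniform density: tiles of equal width.
print(tiles_from_density(np.ones(12), 4))  # [(0, 3), (3, 6), (6, 9), (9, 12)]

# Density four times larger on the right half: narrower tiles there.
rho = np.array([1.0] * 6 + [4.0] * 6)
print(tiles_from_density(rho, 5))
```

Snapping the cuts to the finest grid is what makes the result only approximately equal-mass; this is the kind of discretization error the continuum approach has to tolerate.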

Note that this is a constructive approach: the overall interest of the theory must be judged by the quantitative performance of the representations it yields, as measured by an objective function such as the DFS.

### 1.4. Outline

The results of this article will be illustrated using a problem of interest for the Comprehensive Nuclear Test Ban Treaty Organisation (CTBTO) of the United Nations, albeit with simplified physics in the Jacobian. Details of the setup are given in section 2. The asymptotic analytical solutions are derived in section 3, in order of increasing complexity. The continuum limit is first derived in the one-dimensional (1D) case, because the limiting density of tiles is then expected to be asymptotically exact. The multidimensional case is then treated; it depends on the type of dictionary employed: ftrees, qtrees, or tilings. Focussing on the general tilings and qtrees, we then discuss the construction of a discrete representation of control space from these analytical densities, and propose simple algorithms. In some cases, the analytical densities may be improper (they cannot be normalized to one). This corresponds to a problem uncovered by Bocquet (2005), which is addressed here in this context. In section 4, several of these results are illustrated on the CTBTO test case. We summarize the results and conclude in section 5.