Optimal linearization trajectories for tangent linear models

Abstract

We examine differential equations where nonlinearity is a result of the advection part of the total derivative or the use of quadratic algebraic constraints between state variables (such as the ideal gas law). We show that these types of nonlinearity can be accounted for in the tangent linear model by a suitable choice of the linearization trajectory. Using this optimal linearization trajectory, we show that the tangent linear model can be used to reproduce the exact nonlinear error growth of perturbations for more than 200 days in a quasi-geostrophic model and more than (the equivalent of) 150 days in the Lorenz 96 model. We introduce an iterative method, purely based on tangent linear integrations, that converges to this optimal linearization trajectory.

The main conclusion from this article is that this iterative method can be used to account for nonlinearity in estimation problems without using the nonlinear model. We demonstrate this by performing forecast sensitivity experiments in the Lorenz 96 model and show that we are able to estimate analysis increments that improve the two-day forecast using only four backward integrations with the tangent linear model. Copyright © 2011 Royal Meteorological Society

1. Introduction

Tangent linear (TL) and, in particular, adjoint models have proved very useful in several applications in numerical weather prediction (NWP); Errico (1997, 2003) and Errico and Ehrendorfer (2007) give overviews. For example, at the European Centre for Medium-range Weather Forecasts (ECMWF) these linear models play a crucial role in the computation of initial condition perturbations used in the ensemble prediction system (Leutbecher and Palmer, 2008) and in the four-dimensional data assimilation system (4D-Var; Courtier et al., 1994). One of the major limitations to the application of linear models is that the results are useful only when the linear approximation is valid (Errico, 1997). By this we mean that the difference between two runs of the nonlinear model can be described by the associated linearized version of the nonlinear model. To achieve this, great effort is made to develop linearized models which capture as many features as possible of the full nonlinear model (Janisková et al., 1999). Despite these efforts, the use of TL and adjoint models is restricted to ‘short’ time spans. The time span for which the TL model can be considered accurate will be referred to as the TL regime.

The duration of the TL regime depends on many factors. Typically the difference between two nonlinear forecasts is compared with the linear forecast by a scalar index, and the TL assumption is said to be violated when the index reaches a threshold value. The measure used to compare forecast fields is therefore itself part of the definition of the TL regime. The size of the initial condition perturbation, the orientation of the perturbation, the background trajectory around which the TL model is linearized and the physical processes taken into account in the TL model also all play a role. Another issue which influences the usefulness of linear models is whether we are considering forecast problems, where error growth is determined by the singular value spectrum of the propagator, or estimation problems, which are typically characterized by the reciprocals of the singular values. In general, the spectrum of reciprocal singular values attains higher values (Reynolds and Palmer, 1998), and therefore the TL model remains useful for a shorter time in estimation problems. This effect is made even more pronounced by the fact that the perturbations used in backward integrations are typically larger than those used in forward mode.

In this forward mode, the TL assumption is generally believed to be valid for 2–3 days at the synoptic scale. However, Gilmour et al. (2001) argue that 1 day is perhaps a better estimate. On the cloud-resolving scale, the TL assumption probably holds for much shorter time periods on the order of 1.5 h (Hohenegger and Schär, 2007). As models reach higher resolutions, the validity of the TL assumption is therefore a major concern. We will show that for bilinear systems the usefulness of the TL model can be greatly extended by modifying the linearization trajectory and therefore one of the major limitations on using TL models can be eliminated.

In section 2, the definition of bilinear differential equations is given. In section 3, we show that for bilinear systems there is an optimal linearization trajectory such that, if the TL model is linearized around this trajectory, the perturbation growth in the TL model is equal to the nonlinear perturbation growth. Knowing that such a trajectory exists, we show in section 4 that there is an iterated map based purely on TL integrations that converges to this linearization trajectory. In section 5, we show how the iterative method can be used in forecast sensitivity experiments using the inverse of the TL model. In section 6, the experimental results using a quasi-geostrophic (QG) model (Marshall and Molteni, 1993, described in Appendix A) and the Lorenz 96 model (Lorenz, 1996, described in Appendix B) are given. In the discussion in section 7, the prospects for using the method in realistic NWP models and a method to regularize the error growth in the TL model are discussed. The conclusions are given in section 8.

To keep the notation simple, we use the convention that lower-case variables are perturbations (also referred to as increments) to upper-case variables, e.g. $\mathbf{x}$ is a perturbation to the state vector $\mathbf{X}$.

2. Bilinear differential equations

In this section, some terms are defined which will be used throughout the article. Let $\mathbf{x}$ and $\mathbf{y}$ be elements of a vector space $V$.

Definition 1: Bilinear map

A map q

$$q : V \times V \to V$$

is called bilinear if q is linear in both arguments.

Definition 2: (Anti)symmetric bilinear map

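A bilinear map s will be called symmetric if for any $\mathbf{x}$ and $\mathbf{y}$

$$s(\mathbf{x}, \mathbf{y}) = s(\mathbf{y}, \mathbf{x}). \qquad (1)$$

A bilinear map a will be called antisymmetric if

$$a(\mathbf{x}, \mathbf{y}) = -a(\mathbf{y}, \mathbf{x}). \qquad (2)$$

Note that, for any bilinear map q, there is a unique decomposition

$$q = s + a, \qquad (3)$$

where $s(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\left[q(\mathbf{x}, \mathbf{y}) + q(\mathbf{y}, \mathbf{x})\right]$ is symmetric and $a(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\left[q(\mathbf{x}, \mathbf{y}) - q(\mathbf{y}, \mathbf{x})\right]$ is antisymmetric.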
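
As a finite-dimensional illustration of decomposition (3), a bilinear map on $\mathbb{R}^n$ can be stored as a coefficient array. The sketch below is not part of the original derivation: the array `Q`, the helper `apply_bilinear` and the use of NumPy are assumptions made purely for illustration. It checks the symmetry properties (1) and (2) numerically, and also that the antisymmetric part vanishes when both arguments are equal, so only the symmetric part contributes to $q(\mathbf{x}, \mathbf{x})$.

```python
import numpy as np

def apply_bilinear(Q, x, y):
    """Evaluate q(x, y)_i = sum_{j,k} Q[i, j, k] * x[j] * y[k]."""
    return np.einsum("ijk,j,k->i", Q, x, y)

rng = np.random.default_rng(0)
n = 4
Q = rng.normal(size=(n, n, n))          # arbitrary bilinear map (hypothetical example)
S = 0.5 * (Q + Q.transpose(0, 2, 1))    # symmetric part: swap the two argument slots
A = 0.5 * (Q - Q.transpose(0, 2, 1))    # antisymmetric part

x, y = rng.normal(size=n), rng.normal(size=n)

# Property (1): s(x, y) = s(y, x)
sym_gap = np.max(np.abs(apply_bilinear(S, x, y) - apply_bilinear(S, y, x)))
# Property (2): a(x, y) = -a(y, x)
antisym_gap = np.max(np.abs(apply_bilinear(A, x, y) + apply_bilinear(A, y, x)))
# Decomposition (3): q = s + a
decomp_gap = np.max(np.abs(S + A - Q))
# Consequence: a(x, x) = 0, so q(x, x) = s(x, x)
diag_gap = np.max(np.abs(apply_bilinear(A, x, x)))

print(sym_gap, antisym_gap, decomp_gap, diag_gap)
```

All four printed gaps should be at round-off level.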

Definition 3: Bilinear differential equation

A differential equation will be called bilinear if it is of the form

$$\frac{d\mathbf{X}}{dt} = q(\mathbf{X}, \mathbf{X}) + b(\mathbf{X}) + c, \qquad (4)$$

where q is a bilinear map, b is a linear map and c is a forcing. If the vector space is finite-dimensional, this is an ordinary differential equation (ODE), while if $\mathbf{X}$ represents a (collection of) space- and time-dependent field(s), this is a partial differential equation (PDE). For PDEs, the mappings q and b and the forcing c are allowed to depend on space and time explicitly.

Definition 4: Bilinear differential algebraic equation

A differential algebraic equation (Brenan et al., 1996) will be called bilinear if it is of the form

equation image(5)

where q and e are bilinear maps, d and b are linear maps and c is a forcing.

Example 1: Barotropic vorticity equation

The barotropic vorticity equation (bve) is

equation image

where the first equation is a prognostic equation for the absolute vorticity η, the second and third equations are algebraic constraints (diagnostic equations) for the stream function ψ and the two-dimensional velocity equation image respectively (hence the zeros in front of the time derivatives), f is the Coriolis parameter, equation image is the vertical unit vector and J is an antisymmetric bilinear map defined as

equation image(6)

If we define equation image, we see that the bve is a bilinear partial differential algebraic equation (BPDAE) in the state vector equation image with e = 0 and d = diag(I,0), and the velocity is a ‘post-processed’ variable. Alternatively, the equation for ∂η/∂t can be written as equation image, in which case the state vector should be defined as equation image.

Example 2: Momentum equation

The momentum equation in a uniform rotating coordinate frame is (Pedlosky, 1987)*

equation image

where equation image is the three-dimensional velocity vector, p is pressure, ρ is density, Ω is the angular rotation vector, φ is the potential that represents conservative body forces, including gravity, and ℱ represents non-conservative (frictional) forces. The prognostic equation for equation image is a trilinear differential equation due to the term equation image caused by the advection part of the total derivative. It is however easy to transform the trilinear equation to a bilinear differential algebraic equation by augmenting the state vector with the momentum density equation image:

equation image

Alternatively, the momentum density vector field equation image can be considered as the prognostic variable

equation image

where we used the mass continuity equation.

Example 3: Equation of state

The equation of state for an ideal gas can be formulated as an algebraic constraint as

equation image(7)

Examples 1 and 2 illustrate that in fluid dynamics bilinearity is typically a result of the advection part of the total derivative. Example 3 shows that another source for bilinearity is the use of algebraic constraints between state variables such as the ideal gas law. Example 2 further illustrates that it is easy to reduce multilinear systems (Appendix D) to bilinear systems by augmenting the state vector.

Notation 1: Nonlinear integrations

Integrations with a nonlinear model starting from an initial condition equation image are denoted by

equation image(8)

Then, by definition, the exact increment trajectory for a given perturbation equation image of an initial condition equation image is given by

equation image(9)

Notation 2: Tangent linear integrations

Integrations with the TL model starting with an initial condition perturbation equation image are denoted by

equation image(10)

where equation image is known as the propagator and equation image is the trajectory around which the tangent linear model is linearized.

3. Optimal linearization trajectories

In this section, we derive the TL model for the general form of bilinear system and show how to modify the linearization trajectory to obtain an exact correspondence between the nonlinear time evolution and the corresponding TL evolution of perturbations.

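Consider the general form of a bilinear differential equation

$$\frac{d\mathbf{X}}{dt} = q(\mathbf{X}, \mathbf{X}) + b(\mathbf{X}) + c, \qquad (11)$$

where $\mathbf{X}$ is the state vector, q is a bilinear mapping, b is a linear mapping and c is a forcing, and the mappings q and b and forcing c are allowed to explicitly depend on space and time. Solutions (trajectories in the state space) of (11) are denoted as $\mathbf{X}(t)$. The time evolution of a perturbed run $\mathbf{X}(t) + \mathbf{x}(t)$ is given by

$$\frac{d(\mathbf{X} + \mathbf{x})}{dt} = q(\mathbf{X} + \mathbf{x}, \mathbf{X} + \mathbf{x}) + b(\mathbf{X} + \mathbf{x}) + c. \qquad (12)$$

Now, using the bilinearity of q, the linearity of b and (11) to eliminate $d\mathbf{X}/dt$, we obtain

$$\frac{d\mathbf{x}}{dt} = q(\mathbf{X}, \mathbf{x}) + q(\mathbf{x}, \mathbf{X}) + b(\mathbf{x}) + q(\mathbf{x}, \mathbf{x}) = \mathbf{L}(\mathbf{X})\,\mathbf{x} + q(\mathbf{x}, \mathbf{x}). \qquad (13)$$

Here we used the linearity of q in both arguments and the linearity of b to define an operator

$$\mathbf{L}(\mathbf{X})\,\mathbf{x} = q(\mathbf{X}, \mathbf{x}) + q(\mathbf{x}, \mathbf{X}) + b(\mathbf{x}). \qquad (14)$$

For finite-dimensional systems, $\mathbf{L}(\mathbf{X})$ is the Jacobian of (11) evaluated along the trajectory $\mathbf{X}(t)$. In the TL approximation, the bilinear term $q(\mathbf{x}, \mathbf{x})$ is neglected and the system

$$\frac{d\hat{\mathbf{x}}^1}{dt} = \mathbf{L}(\mathbf{X})\,\hat{\mathbf{x}}^1 \qquad (15)$$

is known as the TL model. We use a hat to indicate that this is only an approximation to the true evolution $\mathbf{x}(t)$, and the reason for adding the superscript 1 will become apparent later.

The key observation in this section is that the exact time evolution of perturbations in (13) can also be written as

$$\frac{d\mathbf{x}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\mathbf{x}\right)\mathbf{x}, \qquad (16)$$

since, by the bilinearity of q, $\mathbf{L}(\mathbf{X} + \tfrac{1}{2}\mathbf{x})\,\mathbf{x} = \mathbf{L}(\mathbf{X})\,\mathbf{x} + q(\mathbf{x}, \mathbf{x})$; i.e. we obtain the exact time evolution of perturbations if the TL model is linearized around the trajectory $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$ instead of $\mathbf{X}$. The trajectory $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$ will be referred to as the optimal linearization trajectory. The previous results can be generalized to BPDAEs. Let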

equation image(17)

then substitution of equation image using (17) and retaining only terms linear in equation image gives the TL model

equation image

where we used Definition 2 to write

equation image

It is easy to see that the neglected bilinear terms equation image and equation image are recovered if the TL model is linearized around the trajectory equation image. An important difference from the previous result is that to integrate the TL model both the trajectory equation image and the tendencies equation image are required if e≠0.

Integrations with the TL model linearized around a trajectory equation image starting from an initial condition equation image will be denoted by

equation image(18)

Note that, although the TL model is used to propagate the increment equation image, this is not a linear mapping from equation image to equation image due to the dependence of the linearization trajectory on equation image. In Appendix C, we discuss how to preserve bilinearity when higher than first-order integration schemes are used to integrate (11), and show that bilinearity is preserved if a finite-dimensional representation of the state vector equation image is obtained by truncating the coordinate vector with respect to a time-independent orthonormal basis.
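
The exactness of the optimal linearization trajectory in (16) can be checked numerically at the level of tendencies. The sketch below uses the Lorenz 96 model in its standard form, $dX_i/dt = (X_{i+1} - X_{i-2})X_{i-1} - X_i + F$, with 40 variables and $F = 8$; these settings, the random states and the function names are assumptions chosen for illustration and are not taken from the experiments in this article. It compares the exact tendency difference between a perturbed and an unperturbed state with the TL tendency linearized around $\mathbf{X}$ and around $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$.

```python
import numpy as np

F = 8.0  # forcing (assumed, conventional choice)

def l96_tendency(X):
    """Lorenz 96 tendency: (X[i+1] - X[i-2]) * X[i-1] - X[i] + F, cyclic in i."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def tl_tendency(X_lin, x):
    """Tangent linear tendency L(X_lin) x of the Lorenz 96 model."""
    return ((np.roll(x, -1) - np.roll(x, 2)) * np.roll(X_lin, 1)
            + (np.roll(X_lin, -1) - np.roll(X_lin, 2)) * np.roll(x, 1)
            - x)

rng = np.random.default_rng(0)
X = F + rng.normal(size=40)   # arbitrary state (hypothetical)
x = rng.normal(size=40)       # finite-amplitude perturbation (hypothetical)

exact = l96_tendency(X + x) - l96_tendency(X)
standard_tl = tl_tendency(X, x)            # linearized around X
optimal_tl = tl_tendency(X + 0.5 * x, x)   # linearized around X + x/2

standard_error = np.max(np.abs(standard_tl - exact))
optimal_error = np.max(np.abs(optimal_tl - exact))
print(f"standard TL error: {standard_error:.3e}")
print(f"optimal  TL error: {optimal_error:.3e}")
```

The error of the standard linearization equals the neglected term $q(\mathbf{x}, \mathbf{x})$, whereas the error for the linearization around $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$ should be at round-off level.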

4. Iterative relinearization

In section 3, we observed that, for a given initial condition perturbation, there is an optimal linearization trajectory for the TL model such that the TL predictions become exactly equal to the nonlinear predictions. In this section, we introduce an iterative method purely based on integrations with the TL model that converges to this optimal trajectory. Section 5 shows how this iterative method can be used to update the linearization trajectory in forecast sensitivity experiments without using the nonlinear model.

We have seen that, for bilinear systems,

$$\frac{d\mathbf{x}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\mathbf{x}\right)\mathbf{x} \qquad (19)$$

describes the exact time evolution of perturbations. For a given initial condition perturbation $\mathbf{x}_0$ and a trajectory $\mathbf{X}(t)$, this equation can be written as a map $\mathcal{T}$ that maps increment trajectories to increment trajectories

equation image(20)

with $\mathcal{T}(\mathbf{x}) = \mathbf{x}$, i.e. the trajectory $\mathbf{x}(t)$ is a fixed point of $\mathcal{T}$. If, for a fixed time interval [0,T], there is a constant 0 < q < 1 and a suitable metric d on the space of increments defined on the interval [0,T] such that $d(\mathcal{T}(\mathbf{x}), \mathcal{T}(\mathbf{y})) \le q\, d(\mathbf{x}, \mathbf{y})$, then $\mathcal{T}$ is known as a contraction mapping. The Banach fixed-point theorem then guarantees that the fixed point $\mathbf{x}(t)$ is unique and moreover the iterated map

$$\mathbf{x}^{k+1} = \mathcal{T}(\mathbf{x}^k) \qquad (21)$$

converges to this fixed point. This suggests that, given an estimate of the trajectory $\mathbf{x}(t)$, the TL model can be integrated in the form

$$\frac{d\hat{\mathbf{x}}^k}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\hat{\mathbf{x}}^{k-1}\right)\hat{\mathbf{x}}^k, \qquad (22)$$

where the superscripts indicate the iteration number. With $\hat{\mathbf{x}}^0 = 0$, the first iteration k = 1 is equal to a standard TL integration (as given by (15)) and gives a trajectory $\hat{\mathbf{x}}^1$. During the second iteration, we integrate the TL model with a modified trajectory $\mathbf{X} + \tfrac{1}{2}\hat{\mathbf{x}}^1$, etc. Alternatively, the iteration can be started with $\hat{\mathbf{x}}^0(t) = \mathbf{x}_0$, which has the advantage that the time derivatives in the TL model become exact at t = 0. In the experiments both methods are compared.
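
The iteration (22) can be sketched in a few lines for the Lorenz 96 model. The code below is a minimal illustration under stated assumptions and is not the configuration used in the experiments: it uses 40 variables, $F = 8$, a forward Euler scheme with 20 steps of length 0.01, a random control state and a unit-norm random perturbation, and all function names are hypothetical. Forward Euler is chosen because it is a first-order scheme, so the discrete increment obeys the discrete analogue of (19) exactly. With $\hat{\mathbf{x}}^0 = 0$, each iteration then makes at least one further time step exact, so the final-time error must reach round-off level after at most N + 1 iterations for N time steps; the printed errors show how quickly it falls in this particular setting.

```python
import numpy as np

F, N_VARS, DT, N_STEPS = 8.0, 40, 0.01, 20  # assumed settings, not from the text

def l96_tendency(X):
    """Lorenz 96 tendency in its standard form (cyclic indices)."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def tl_tendency(X_lin, x):
    """Tangent linear tendency L(X_lin) x of the Lorenz 96 model."""
    return ((np.roll(x, -1) - np.roll(x, 2)) * np.roll(X_lin, 1)
            + (np.roll(X_lin, -1) - np.roll(X_lin, 2)) * np.roll(x, 1)
            - x)

def nonlinear_run(X0):
    """Forward Euler integration of the nonlinear model; returns all states."""
    traj = [X0]
    for _ in range(N_STEPS):
        traj.append(traj[-1] + DT * l96_tendency(traj[-1]))
    return np.array(traj)

def tl_run(X_lin_traj, x0):
    """Forward Euler integration of the TL model along X_lin_traj."""
    traj = [x0]
    for n in range(N_STEPS):
        traj.append(traj[-1] + DT * tl_tendency(X_lin_traj[n], traj[-1]))
    return np.array(traj)

rng = np.random.default_rng(1)
X0 = F + rng.normal(size=N_VARS)      # control initial condition (hypothetical)
x0 = rng.normal(size=N_VARS)
x0 /= np.linalg.norm(x0)              # unit-norm initial perturbation

control = nonlinear_run(X0)
exact = nonlinear_run(X0 + x0) - control   # exact increment trajectory

x_hat = np.zeros_like(control)        # x_hat^0 = 0: iteration 1 is the standard TL run
errors = []
for k in range(1, N_STEPS + 2):
    # Relinearize around X + x_hat^{k-1} / 2 and rerun the TL model.
    x_hat = tl_run(control + 0.5 * x_hat, x0)
    errors.append(np.linalg.norm(x_hat[-1] - exact[-1]) / np.linalg.norm(exact[-1]))

for k, err in enumerate(errors[:6], start=1):
    print(f"iteration {k}: relative error at final time = {err:.3e}")
```

Replacing the initial guess `np.zeros_like(control)` by `np.tile(x0, (N_STEPS + 1, 1))` gives the alternative start $\hat{\mathbf{x}}^0(t) = \mathbf{x}_0$.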

In Appendix D, an analysis of (22) for multilinear models is given and we show that, independent of the order of the nonlinearities in the nonlinear model, at convergence (22) always gives better predictions of the time evolution of perturbations than the standard TL model (15). In particular, the bilinear terms are exactly taken into account. In section 6.2, we examine the rate of convergence for the iterated map (21) for the QG and the Lorenz 96 models.

Remark 1: Radius of convergence

Iterated maps can exhibit a finite radius of convergence even though there is a fixed point valid for all t. Therefore, even though the fixed-point trajectory $\mathbf{x}(t)$ is valid for all t, this does not imply that the iterated map (22) converges to this fixed point. As an example, consider the system $dX/dt = -2tX^2$ with X(0) = 1. The solution is given by the Witch of Agnesi $X(t) = 1/(t^2 + 1)$. The Picard iteration

$$X_{k+1}(t) = 1 - 2\int_0^t s\,X_k(s)^2\,ds$$

with $X_0(t) = 1$ converges to the Taylor series of X(t) but, because X(t) has poles at t = ±i, the Picard iteration only converges to the fixed point for |t| < 1.
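
The behaviour described in this remark can be reproduced numerically. The sketch below applies the Picard iteration for $dX/dt = -2tX^2$, $X(0) = 1$, on a time grid and compares the iterates with the exact solution $1/(t^2 + 1)$. The trapezoidal quadrature, the grid size and the function names are assumptions made for illustration. On the interval [0, 0.5] the maximum error falls rapidly with the iteration number, whereas on [0, 2], which extends beyond |t| = 1, the iterates move away from the exact solution.

```python
import numpy as np

def cumulative_trapezoid(y, t):
    """Cumulative trapezoidal integral of y(t) starting at t[0]."""
    increments = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(increments)))

def picard_iterate(t, n_iter):
    """Apply X_{k+1}(t) = 1 - 2 * int_0^t s X_k(s)^2 ds, starting from X_0 = 1."""
    X = np.ones_like(t)
    for _ in range(n_iter):
        X = 1.0 - 2.0 * cumulative_trapezoid(t * X**2, t)
    return X

def picard_error(t_end, n_iter, n_points=2001):
    """Maximum error of the n_iter-th Picard iterate on [0, t_end]."""
    t = np.linspace(0.0, t_end, n_points)
    exact = 1.0 / (1.0 + t**2)
    return float(np.max(np.abs(picard_iterate(t, n_iter) - exact)))

for n_iter in (2, 4, 6):
    print(f"k = {n_iter}: max error on [0, 0.5] = {picard_error(0.5, n_iter):.3e}, "
          f"on [0, 2] = {picard_error(2.0, n_iter):.3e}")
```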

5. Estimation using the inverse TL model

The estimation problem considered in this article is: given a forecast starting from an analysis equation image

equation image(23)

and an analysis equation image valid at time T, can we determine an analysis increment equation image such that

equation image(24)

These types of experiments are known as forecast sensitivity experiments and have been studied by Rabier et al. (1996), Pu et al. (1997a,b), and Klinker et al. (1998). If the TL assumption is valid, we expect

equation image(25)

and therefore we can obtain estimates of equation image from

equation image(26)

Besides giving estimates for equation image, the integration with the inverse TL model can be used to produce estimates of the complete trajectory equation image. The result from section 4 therefore suggests using this method iteratively:

equation image(27)

with equation image or equation image. This defines an iterated map on the space of increment trajectories

equation image(28)

where equation image. In section 6.3, the convergence rate of the iterated map (28) is investigated for the Lorenz 96 model.
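
A backward version of the iteration can be sketched in the same way. The code below is an illustration under assumptions that differ from the experiments in section 6.3: it uses the Lorenz 96 model with 40 variables and $F = 8$, a forward Euler scheme with 20 steps of length 0.01, and it inverts each Euler step exactly with a linear solve rather than using an adjoint integration scheme; all names are hypothetical. The final check verifies the fixed-point property: if the TL model is linearized around the control trajectory plus half the exact increment, the backward integration recovers the true initial increment. Whether, and how fast, the printed iteration errors shrink depends on the window length and the perturbation size.

```python
import numpy as np

F, N_VARS, DT, N_STEPS = 8.0, 40, 0.01, 20  # assumed settings, not from the text

def l96_tendency(X):
    """Lorenz 96 tendency in its standard form (cyclic indices)."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def tl_tendency(X_lin, x):
    """Tangent linear tendency L(X_lin) x of the Lorenz 96 model."""
    return ((np.roll(x, -1) - np.roll(x, 2)) * np.roll(X_lin, 1)
            + (np.roll(X_lin, -1) - np.roll(X_lin, 2)) * np.roll(x, 1)
            - x)

def nonlinear_run(X0):
    """Forward Euler integration of the nonlinear model; returns all states."""
    traj = [X0]
    for _ in range(N_STEPS):
        traj.append(traj[-1] + DT * l96_tendency(traj[-1]))
    return np.array(traj)

def inverse_tl_run(X_lin_traj, x_T):
    """Integrate the Euler TL model backwards by inverting each forward step."""
    eye = np.eye(N_VARS)
    traj = [x_T]
    for n in reversed(range(N_STEPS)):
        # Columns are the images of the unit vectors under one forward TL step.
        step = np.column_stack([e + DT * tl_tendency(X_lin_traj[n], e) for e in eye])
        traj.append(np.linalg.solve(step, traj[-1]))
    return np.array(traj[::-1])  # reorder so that index 0 corresponds to t = 0

rng = np.random.default_rng(2)
X0 = F + rng.normal(size=N_VARS)          # control initial condition (hypothetical)
x0_true = rng.normal(size=N_VARS)
x0_true /= np.linalg.norm(x0_true)        # true initial increment, unit norm

control = nonlinear_run(X0)
exact = nonlinear_run(X0 + x0_true) - control
x_T = exact[-1]                           # increment at final time, assumed known

x_hat = np.zeros_like(control)            # first iteration: standard inverse TL run
for k in range(1, 5):
    x_hat = inverse_tl_run(control + 0.5 * x_hat, x_T)
    err = np.linalg.norm(x_hat[0] - x0_true)
    print(f"iteration {k}: error of estimated initial increment = {err:.3e}")

# Fixed-point check: linearizing around X + x/2 recovers the true increment.
fixed_point = inverse_tl_run(control + 0.5 * exact, x_T)
fixed_point_error = float(np.linalg.norm(fixed_point[0] - x0_true))
print(f"fixed-point error = {fixed_point_error:.3e}")
```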

6. Applications

6.1. Indices

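To highlight different aspects of the optimal linearization trajectories and the iterative relinearization method, the exact time evolution of the perturbations $\mathbf{x}(t)$ and the corresponding TL evolution $\hat{\mathbf{x}}^k(t)$ are compared using five indices: $l_k$, $\alpha_k$, $R_k$, $d_k$ and $R_d^k$. The similarity index $l_k(t)$ is defined as

$$l_k(t) = \frac{\langle \mathbf{x}(t), \hat{\mathbf{x}}^k(t) \rangle}{\|\mathbf{x}(t)\|\,\|\hat{\mathbf{x}}^k(t)\|}, \qquad (29)$$

the angle $\alpha_k(t)$ is given by

$$\alpha_k(t) = \arccos l_k(t), \qquad (30)$$

the relative norm $R_k(t)$ is given by

$$R_k(t) = \frac{\|\hat{\mathbf{x}}^k(t)\|}{\|\mathbf{x}(t)\|}, \qquad (31)$$

the error norm $d_k(t)$ by

$$d_k(t) = \|\hat{\mathbf{x}}^k(t) - \mathbf{x}(t)\|, \qquad (32)$$

and the relative error norm $R_d^k(t)$ by

$$R_d^k(t) = \frac{d_k(t)}{\|\mathbf{x}(t)\|}. \qquad (33)$$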

For the QG model, the values of dk, lk, Rk and equation image are determined using the kinetic energy inner product. For the Lorenz 96 model, the Euclidean inner product is used. In the context of twin experiments, values of l = 0.7, corresponding to an angle α = 45°, are commonly used to indicate that the TL assumption is violated (e.g. Gilmour et al., 2001).
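
For concreteness, the five indices can be computed as in the sketch below. The definitions coded here (cosine of the angle between the two increments, ratio of TL norm to nonlinear norm, and error norm normalized by the nonlinear increment) are assumptions inferred from the surrounding text, for instance from the statement that l = 0.7 corresponds to α = 45°; the Euclidean inner product is used, as for the Lorenz 96 model, and the function name is hypothetical.

```python
import numpy as np

def comparison_indices(x, x_hat):
    """Return (l, alpha in degrees, R, d, Rd) for a nonlinear increment x
    and a TL increment x_hat, using the Euclidean inner product."""
    norm_x = np.linalg.norm(x)
    norm_x_hat = np.linalg.norm(x_hat)
    l = float(np.dot(x, x_hat) / (norm_x * norm_x_hat))       # similarity index
    alpha = float(np.degrees(np.arccos(np.clip(l, -1.0, 1.0))))  # angle
    R = float(norm_x_hat / norm_x)                             # relative norm
    d = float(np.linalg.norm(x_hat - x))                       # error norm
    Rd = d / float(norm_x)                                     # relative error norm
    return l, alpha, R, d, Rd

# Example: the TL error is perpendicular to x and as large as x itself.
x = np.array([1.0, 0.0])
x_hat = np.array([1.0, 1.0])
l, alpha, R, d, Rd = comparison_indices(x, x_hat)
print(f"l = {l:.3f}, alpha = {alpha:.1f} deg, R = {R:.3f}, d = {d:.3f}, Rd = {Rd:.3f}")
```

In this example the error vector is perpendicular to the nonlinear increment and of equal length, which gives l ≈ 0.707, α = 45° and a relative error norm of 1.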

We will say that $\hat{\mathbf{x}}^k$ is more similar to $\mathbf{x}$ than $\hat{\mathbf{x}}^{k-1}$ at time t if $\alpha_k(t) < \alpha_{k-1}(t)$ or, equivalently, if $l_k(t) > l_{k-1}(t)$. We say that $\hat{\mathbf{x}}^k$ is closer to $\mathbf{x}$ than $\hat{\mathbf{x}}^{k-1}$ at time t if $d_k(t) < d_{k-1}(t)$ or, equivalently, if $R_d^k(t) < R_d^{k-1}(t)$.

6.2. Iterative relinearization

In this section the rate of convergence of the iterated map T is examined in a quasi-geostrophic model (described in Appendix A) and the Lorenz 96 model (described in Appendix B).

6.2.1. QG model

In Figure 1(a) we show the 2-day forecast difference of the stream function at 500 hPa. The initial conditions for the control run and the perturbed run are 100 days apart and therefore we may assume that they are uncorrelated (see also Figure A.1). The size of the perturbations used in these experiments is therefore much larger than typical analysis increments. In Figure 1(b), we show the forecast of the standard TL model equation image.

Figure 1.

Streamfunction perturbation at 500 hPa after 2 days using (a) the nonlinear model equation image, (b) the standard tangent linear model with equation image, and the iterative relinearization method with equation image for iterations (c) 1, (d) 2, (e) 3 and (f) 4. The initial conditions for the perturbed run and the control run are 100 days apart. The contour interval is $1 \times 10^{-3}\,\Omega a^2$ in all panels (with a and Ω the average radius and the angular velocity of the Earth, respectively), with positive values solid and negative values dashed.

The other panels show the iterative method for four iterations with equation image. Both the standard TL integration (l1 = 0.55) and the first iteration with equation image (l1 = 0.73) differ substantially from the truth, with large differences north of 60°N. In the first iteration with equation image there is a wave pattern over the North Atlantic Ocean which is absent in the first iteration with equation image. At subsequent iterations, all positive and negative cells are gradually moved to their correct location and with the correct amplitude. At iterations 2 to 4 we have l2 = 0.90, l3 = 0.95 and l4 = 0.99 respectively, indicating that the iterative method converges quickly with the largest improvement when going from iteration 1 to 2.

Figure 2 shows the similarity index lk and the relative error norm equation image as a function of time and iteration number. The solid black line refers to the standard TL model with equation image. The coloured lines show the iterative relinearized results for four iterations with equation image. The control run and the perturbed run are 2 days apart. From the standard TL integration, we see that the duration of the TL regime is slightly larger than 1 day. Especially in the short range, it is beneficial to use equation image because the derivatives in the TL model become exact at t = 0. In Figure 2(b) this can be seen for example from the relative error norm where equation image when the standard TL model is used. Observe that the iterative method adds approximately 0.5 days to the usefulness of the TL model at each iteration.

Figure 2.

(a) Similarity index lk(t) and (b) relative error norm equation image, as a function of time and iteration number for the QG model. Average values for 20 experiments are shown. The black line is the standard TL model with equation image. The coloured lines are iterations 1 to 4 with equation image. The control run and perturbed run are 2 days apart.

6.2.2. Lorenz 96 model

Figure 3 shows the similarity index and relative error norm (average over 50 experiments) as a function of time and iteration index for the Lorenz 96 model. All experiments start with a random initial condition perturbation with equation image. Such an initial condition amplitude is approximately equal to the size of 12 h forecast differences (Figure B.1). From the first iteration using equation image (black), we see that the duration of the TL regime is slightly longer than 1.5 days (0.3 time units). Using equation image, this can be extended to 2 days. The iterative linearization method converges to the true increment at subsequent iterations. For a 2-day forecast (0.4 time units), about four iterations are required to converge to the true time evolution of the increment, with the largest improvements when going from iteration 1 to 2. For longer lead times, more iterations are needed. This is related to the fact that the TL model produces large increments beyond the duration of the TL regime (see also Figure 7). Therefore the corrections equation image used in the second iteration actually degrade the linearization trajectory towards the end of the window. As a result, the second iteration is further away from the truth at the end of the optimization window, even though it is more similar to the truth. In section 7.2, we discuss a method to regularize this behaviour without affecting the fixed point of the iterated map.

Figure 3.

As Figure 2, but for the Lorenz 96 model. Average results for 50 experiments are shown using random initial condition perturbations with equation image. The black line is the result for iteration 1 with equation image, and the coloured lines for iterations 1 to 5 with equation image.

6.3. Estimation using the inverse TL model

Here we examine the iterated map (28) from section 5. The action of equation image on a vector equation image is obtained by integrating the TL model backwards in time. In the Lorenz 96 model, the fourth-order Runge–Kutta (RK4) scheme is used to propagate the state. Theoretically the backward integration requires the use of the inverse integration scheme (which will be an implicit scheme) to ensure equation image. Here the adjoint of the RK4 scheme is used to integrate the TL model backwards in time. In the Lorenz 96 model, we find experimentally that the angle between equation image and equation image is of the order $O(10^{-3})$ degrees, and the relative norm equation image for an optimization time of 0.6 time units (3 days). So it appears that equation image is close to the identity operator. We conclude that the adjoint RK4 scheme can be used for the inverse integrations.

Figure 4 shows the result when we iteratively solve (27) using equation image. Even though the estimate from the first iteration differs substantially from the truth with l1 = 0.4, the method quickly converges and the subsequent iterations are more similar and closer to the truth. Approximately four iterations are required to obtain an almost perfect estimate. Note that, during the inverse integration, we also obtain the corrections needed for the next iterations. Therefore the computational cost is equal to four TL integrations (backwards). This cost should be compared to the alternative of solving this estimation problem in terms of a cost function minimization (e.g. 4D-Var) where a single inner-loop iteration already involves two linear integrations (one adjoint and one TL integration). For comparison, Figure 4 also includes the result when the standard TL model, i.e. equation image, is used to propagate the increment backwards in time (the black line). If l = 0.7 is used as threshold value, then the gain of using equation image in the first iteration is 0.13 time units (0.65 days). From the time evolution of the error norm (Figure 4(b)), we see that this gain is mainly a result of the fact that the time derivatives at t = 0.4 become exact in the TL model and thus equation image at t = 0.4. In particular for large perturbations, we therefore expect to benefit from using equation image.

Figure 4.

(a) Similarity index and (b) error norm equation image, as a function of time for the inverse TL model (solid) and the nonlinear model starting from equation image (dashed). Average results are shown over 50 experiments with an optimization time OT = 0.4 and random initial condition perturbations with norm equation image. The black line is the first iteration with equation image, i.e. the standard TL model.

From Figure 4, it is also clear that for long optimization windows the estimated increment at t = 0 from the first iteration becomes uncorrelated with the true increment. As a result, the nonlinear forecast starting from equation image bears low similarity to the truth (dashed lines in Figure 4). Therefore, for long windows, the nonlinear model starting from equation image cannot be used to update the linearization trajectory. In a forthcoming article, applying optimal linearization trajectories in the context of 4D-Var, we will show that also in 4D-Var it is better to update the linearization trajectory using the TL model.

If we use l < 0.7 to indicate the breakdown of the TL assumption, Figure 4 indicates that the TL assumption linearized around the control run is valid for 0.15 time units (i.e. from t = 0.4 to t = 0.25). This should be compared with the forward integration in Figure 3 where the value 0.7 is reached after 0.3 time units. The duration of the TL regime is shorter for inverse integrations. Partly this is a result of the fact that error growth in the backward integration is characterized by the reciprocal singular value spectrum and these values are larger than the singular values (Figure 6). Another reason is that typically equation image and therefore the backward integration is started with larger initial conditions. The idea of using the inverse of the TL model has been studied by Pu et al. (1997a) using a method called the quasi-inverse. They reversed the sign of the dissipation terms in the TL model as a form of regularization. As will be discussed in section 7.2 on the regularized prediction experiments, there is no need for bilinear systems to add regularization when the optimal linearization trajectory is used. Therefore the amount of regularization should depend on how close we are to the optimal linearization trajectory. If the linear term b in the nonlinear model (11) is a purely dissipative term, i.e. equation image, then the TL model can be integrated in the form

equation image(34)

The choice α = 2 amounts to reversing the sign of the dissipation terms (compare with (13)) during the first iteration. However, at subsequent iterations, at locations in space and time where the solution has converged, the unmodified TL model is used.

6.4. Identification of bilinear systems

For bilinear systems, the time evolution of the increment equation image in the TL model linearized around the trajectory equation image given by

equation image(35)

is equal to the time evolution according to the nonlinear model: equation image. Therefore a necessary condition for the model ℳ to be a bilinear system is that the error norm (or equivalently the relative error norm) is zero:

equation image(36)

However, numerical integrations will be subject to round-off error leading to non-zero values for d and Rd. To highlight different aspects, the time evolution of perturbations is examined in terms of the angle α (30) and the relative norm R (31). Note that α = 0 and R = 1 if and only if Rd = d = 0. In the following sections, we study the behaviour of Rd, α and R in the QG and Lorenz 96 model.

6.4.1. QG model

Figure 5 shows the relative error norm Rd, the relative norm R and angle α as a function of time for the QG model for 10 experiments. The control run is obtained by integrating the nonlinear model for 300 days. Continuing the integration for another 300 days yields the perturbed run. The trajectory for the second experiment starts using the final condition of the previous perturbed run and so forth. Due to the long integration times, the initial condition for the TL model is given by the difference between two uncorrelated state vectors on the model attractor and is therefore larger than typical analysis increments. For these large-amplitude perturbations, the TL approximation is valid for 1 day.

Figure 5.

(a) Relative error norm Rd, (b) relative norm R and (c) angle α for the QG model with individual experiments dashed. The solid line in (a) is the estimate Rd using (37) with $R_d(0) = 2.9 \times 10^{-14}$ and σ = 0.148. In (b) and (c), the solid lines are estimates using the assumption that the error vector is perpendicular to equation image.

The ten experiments show exponential growth of the relative norm R after day 210. Before day 210, α ≈ 0 and R ≈ 1, and we conclude that the TL model can be used for lead times shorter than 210 days. The time evolution of the relative error norm Rd (Figure 5(a)) shows no signal at day 210. Instead it merely indicates exponential growth beyond day 10 with an exponent of 0.148 day$^{-1}$ (standard deviation 0.005 in ten experiments) corresponding to an error doubling time of $\tau_d = 4.7$ days. Note that this is longer than the error doubling time based on linearization of the TL model around a control run, which gives a Lyapunov exponent of σ = 0.254 (with standard deviation 0.014 in ten experiments) and a corresponding error doubling time of 2.7 days. This is in agreement with other studies (e.g. Swanson et al., 1998), where an approximate value of 3 days is given. The increase of the error doubling time when we linearize around the average trajectory of the control and perturbed run is consistent with Hoskins et al. (2000), who determined singular vector growth using different linearization trajectories in the TL model. They found that the dominant factor for singular vector growth is the dynamic structure of the linearization trajectory and, in particular, its smoothness.

From Figure 5, we see that the time evolution of the relative error norm Rd is approximately exponential beyond day 10. This suggests that we can model the time evolution of Rd for t > 10 days by

Rd(t) = Rd(0) exp(σt)    (37)

The values of σ and Rd(0) are estimated using linear least squares on the experimental values of ln(Rd(t)). The solid line in Figure 5(a) shows the predictions of this model with the estimated values Rd(0) = 2.9×10⁻¹⁴ (standard deviation 2.2×10⁻¹⁴ in ten experiments) and σ = 0.148 (standard deviation 0.005 in ten experiments). With the additional assumption that the error vector equation image is perpendicular to equation image, the modelled time evolution of Rd can be used to predict values of the angle α and the relative norm R (solid lines in Figure 5(b, c)). We emphasize that these solid lines are not fitted to the experimental data but are purely a result of the geometric assumption that the error vector equation image is perpendicular to equation image. Experimentally we find that the angle between equation image and equation image is 89.6° with a standard deviation of 8.1°.

With the assumption that the error vector equation image is perpendicular to equation image, the condition α = 45° is equivalent to the condition Rd = 1. Setting Rd = 1 in the error growth model gives the estimate

tp = −σ⁻¹ ln Rd(0)    (38)

This estimate is plotted in Figure 5(a). The same estimate is obtained from α = 45° and equation image. Note that, in the absence of round-off error, equation image and as such there is no reason to prefer the nonlinear over the TL integration. Therefore these results also put a predictability limit on the nonlinear model due to round-off error of 212 days.
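The fit and the resulting estimates can be sketched in a few lines. The following is a minimal illustration, assuming the exponential growth model Rd(t) = Rd(0) exp(σt); the function names and the synthetic data are hypothetical and only serve to show the arithmetic, they are not the experimental values.

```python
import math


def fit_exponential_growth(times, rd_values):
    """Least-squares fit of ln(Rd) = ln(Rd0) + sigma * t; returns (Rd0, sigma)."""
    logs = [math.log(r) for r in rd_values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sigma = sxy / sxx
    return math.exp(y_mean - sigma * t_mean), sigma


def doubling_time(sigma):
    """Error doubling time ln(2) / sigma."""
    return math.log(2.0) / sigma


def tl_limit(rd0, sigma):
    """Time at which the modelled Rd reaches 1, i.e. -ln(Rd0) / sigma."""
    return -math.log(rd0) / sigma


# Synthetic, noise-free data generated from the quoted mean values (illustration only).
times = list(range(10, 201, 10))
rd = [2.9e-14 * math.exp(0.148 * t) for t in times]
rd0_fit, sigma_fit = fit_exponential_growth(times, rd)

print(doubling_time(sigma_fit))      # doubling time in days
print(tl_limit(rd0_fit, sigma_fit))  # estimated limit of the TL regime in days
```

With the quoted mean values the sketch gives a doubling time of about 4.7 days and a limit of roughly 211 days, in line with the figures discussed above.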

6.4.2. Lorenz 96 model

In the Lorenz 96 model, we obtain the estimates equation image and σ = 0.233 day⁻¹, equivalent to an error doubling time of 2.97 days. This error doubling time is longer than the estimate based on the Lyapunov exponent (2.1 days), consistent with the reduced growth of singular vectors for smooth trajectories in Hoskins et al. (2000). The figures for Rd, α and R are similar to the results for the QG model (not shown). For the Lorenz 96 model, the TL model can be used for tp = −σ⁻¹ ln Rd(0) = 152 days.

7. Discussion

7.1. Prospects for using the method in NWP

We have demonstrated the advantage of using the optimal linearization trajectories in the context of two simple bilinear models. The analysis in Appendix D shows that, independent of the order of the nonlinearities in the nonlinear model, the iteratively relinearized TL model always gives better results at convergence. An exact correspondence between the TL and the nonlinear model, however, requires the nonlinear model to be bilinear. In Example 2, it was shown that it is possible to transform multilinear systems to bilinear systems by augmenting the state vector.

There are other situations where apparent ‘infinite’-order nonlinearities can be transformed to bilinear terms. Let equation image and define Y = exp(αX); then equation image and equation image, which is a bilinear system. One difference from the reduction of multilinear systems (Appendix D) to a bilinear system in Example 2 is that in this case the newly introduced variable Y has to be a prognostic variable because the algebraic constraint equation image is not bilinear and therefore cannot be used. Similarly it can be shown that equation image (define Y = sin(X) and Z = cos(X)), equation image (define Y = ln X), and equation image (define Y = X^(α−1)) can be written as bilinear systems. Although this does not show that realistic NWP models can be formulated as bilinear systems, it illustrates that both multilinear models and models that contain ‘infinite’-order nonlinearities can be written as a bilinear system and demonstrates that the class of bilinear systems is very general. In a forthcoming article we will show that the restriction to bilinear systems can be lifted if the TL model is linearized around an ensemble of trajectories simultaneously.
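The exponential case can be checked numerically. The sketch below assumes, as an illustration only, that the original system is dX/dt = exp(αX); with Y = exp(αX) as a new prognostic variable this becomes dX/dt = Y, dY/dt = αY², which contains only linear and quadratic terms. The value α = −1, the initial condition and all function names are hypothetical choices; for this choice the exact solution is X(t) = ln(1 + t).

```python
import math

ALPHA = -1.0  # hypothetical parameter value chosen so that X(t) = ln(1 + t)


def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step for a list-valued state."""
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def original_system(state):
    """Assumed original form: dX/dt = exp(ALPHA * X)."""
    (x,) = state
    return [math.exp(ALPHA * x)]


def bilinear_system(state):
    """Augmented form with prognostic Y: dX/dt = Y, dY/dt = ALPHA * Y * Y."""
    x, y = state
    return [y, ALPHA * y * y]


def integrate(f, state, h, steps):
    for _ in range(steps):
        state = rk4_step(f, state, h)
    return state


x_original = integrate(original_system, [0.0], 0.01, 100)[0]
x_bilinear, y_bilinear = integrate(bilinear_system, [0.0, 1.0], 0.01, 100)

print(x_original, x_bilinear, math.log(2.0))
```

Both integrations follow the same trajectory up to the discretization error, and Y stays equal to exp(αX) without the algebraic constraint being imposed.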

7.2. Regularized relinearization in the Lorenz 96 model

The TL model produces large increments for long lead times (Figure 7). This degrades the linearization trajectory for the next iterations. In principle, this can be solved by increasing the dissipation in the TL model; however, in that case the solution would no longer converge to the true solution during the iterative process. Here we propose to add a term equation image to the TL model leading to

equation image(39)

So dissipation is added to the model, but at the same time the previous iteration is used as a forcing in the TL model. At convergence of the algorithm, equation image and the added term becomes zero, i.e. the added term does not modify the fixed point of the iterated map T (21). In general, α could be an operator (see also section 6.3); here we only discuss the situation where α is a scalar.

Figure 6.

(a) Average leading singular value σmax and reciprocal of the trailing singular value 1/σmin, and (b) corresponding mean values of α for the regularized prediction αmax and regularized estimation αmin values as a function of optimization time (OT) for the Lorenz model. In both plots, the dashed lines indicate the standard deviation in 50 experiments.

Figure 7.

The norm equation image as a function of time (a) without regularization and (b) with regularization using α = 8. The black line is equation image. The coloured lines are the values for equation image. The initial condition perturbation is random with norm equation image.

Using equation image, the first iteration is given by

equation image(40)

where equation image is the propagator for the TL model with α = 0. If equation image is the singular value decomposition of equation image, we obtain

equation image(41)

So the added term has no impact on the singular vectors, but it changes the singular value spectrum. Let σmax(t) denote the leading singular value of equation image. By choosing α such that

equation image(42)

we conclude that equation image for all equation image. In Figure 6 we show the leading singular value as a function of the optimization time and the value for α when α is kept constant during the optimization window, α = t⁻¹ log σmax(t).

Figure 7 shows the impact of the added term by examining the norm equation image as a function of time for α = 0 and α = 8. The iterative method still converges to the true solution, but in a more controlled manner. At the first iteration, the norm decreases monotonically as expected. At subsequent iterations, the forcing ensures that we still converge to the true solution.
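The mechanism can be illustrated with a scalar toy problem. The sketch below is not the Lorenz 96 experiment: it assumes that the added term has the form −α(x_k − x_{k−1}), where x_{k−1} is the previous iterate, that the first iterate is forced by a zero trajectory, and that the unregularized model is simply dx/dt = a·x integrated with an Euler forward scheme. All names and parameter values are hypothetical.

```python
def integrate_regularized(a, alpha, x0, h, n_steps, previous):
    """Euler integration of dx/dt = a*x - alpha*(x - previous), previous given per step."""
    x = [x0]
    for n in range(n_steps):
        tendency = a * x[n] - alpha * (x[n] - previous[n])
        x.append(x[n] + h * tendency)
    return x


a, alpha, x0, h, n_steps = 1.0, 2.0, 1.0, 0.01, 200

# Reference: the unregularized model (alpha = 0), growing like exp(a * t).
reference = integrate_regularized(a, 0.0, x0, h, n_steps, [0.0] * (n_steps + 1))

# Iteration: each pass uses the previous iterate as forcing in the added term.
iterate = [0.0] * (n_steps + 1)
history = []
for _ in range(30):
    iterate = integrate_regularized(a, alpha, x0, h, n_steps, iterate)
    history.append(iterate)

first_is_damped = all(history[0][n + 1] < history[0][n] for n in range(n_steps))
final_mismatch = max(abs(u - v) for u, v in zip(history[-1], reference))
print(first_is_damped, final_mismatch)
```

In this toy setting the first iterate decays monotonically because α exceeds the growth rate a, while later iterates approach the unregularized solution, so the fixed point is unchanged.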

In NWP models, we know that at each grid point in the integration domain the density ρ, absolute temperature T, pressure p and the specific humidity q are all positive quantities. TL integrations do not respect these types of constraints, and therefore it is possible that in the linearization trajectory equation image some of these variables are negative. We therefore suggest the use of a projection operator P that sets negative values of ρ, T, p and q to zero and integrate the TL model in the form

equation image(43)

Being solutions of the nonlinear model, the trajectories equation image and equation image do not contain negative values for ρ, T, p and q. At convergence of the iterated map, the linearization trajectory equation image is the average of equation image and equation image and therefore the linearization trajectory does not contain negative values for ρ, T, p and q, i.e. the projection operator does not modify the fixed point of the iterated map but ensures that during the iterations only ‘physically consistent’ trajectories are used.
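A projection of this kind is simple to write down. The sketch below assumes a hypothetical state layout, a dictionary of named fields with one value per grid point; the field names and the function name are illustrative and are not taken from any particular NWP code.

```python
# Fields that are positive by definition (hypothetical names for rho, T, p and q).
POSITIVE_FIELDS = ("rho", "T", "p", "q")


def project_nonnegative(state):
    """Return a copy of the state in which negative values of the
    positive-definite fields are set to zero; other fields are untouched."""
    projected = dict(state)
    for name in POSITIVE_FIELDS:
        if name in projected:
            projected[name] = [max(value, 0.0) for value in projected[name]]
    return projected


trajectory_point = {
    "rho": [1.2, -0.1, 0.9],
    "q": [0.01, -0.002, 0.0],
    "u": [-5.0, 3.0, 0.5],  # wind may be negative and is left unchanged
}
print(project_nonnegative(trajectory_point))
```

A state that already satisfies the constraints is returned unchanged, which is why applying the projection to the converged linearization trajectory has no effect.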

7.3. Identification of multilinear system

In section 6.4, we introduced a necessary condition (36) for a nonlinear model to have at most bilinear terms. Here we illustrate that this condition can be used to detect higher-order multilinearities.

Consider the Lorenz 96 model with modified dissipation:

equation image(52)

where equation image is given in Appendix B, equation image and α ≥ 0. For α = 0 we recover the Lorenz 96 model and dissipation is linear. For α = 1 the dissipation is a purely trilinear term and dependent on the total energy in the system. The factor equation image is introduced to ensure that the (unstable) steady-state solution equation image for the case α = 0 is also an (unstable) steady state for α ≠ 0. For α ≠ 0 the additional steady-state solutions are equation image. For 0 < α < 4, the last expression gives two complex conjugate steady-state solutions which cannot be reached if we start with a real-valued initial condition. The time derivative of the total energy is

equation image

For points outside the sphere with radius equation image, we therefore have Ė < 0 and we conclude that all trajectories eventually enter this ball and cannot escape afterwards.

We expect that, for non-zero values of α, we have equation image and this is indeed what we observe (Figure 8). This shows that nonbilinearity can be identified based purely on the model output and might be useful in realistic NWP models where analysing the code to determine nonbilinearity might be prohibitive.

Figure 8.

Error norm equation image as a function of time for the trilinear Lorenz model for α ∈ {0,0.2,0.4,0.6,0.8,1}. Average results for 50 experiments are shown.

8. Conclusions

The nonlinearities in fluid dynamics as a result of the advection part of the total derivative and the use of algebraic constraints such as the ideal gas law give rise to bilinear differential equations. We have shown that for bilinear systems there exists an optimal linearization trajectory for the TL model, such that the TL model predicts the exact time evolution of the perturbations. Using a quasi-geostrophic model and the Lorenz 96 model we showed that, when the optimal linearization trajectory is used, the TL model can be used for more than 200 days in a quasi-geostrophic model and more than 150 days in the Lorenz 96 model. Therefore for bilinear systems one of the major limitations to the application of linear models mentioned in the introduction can be eliminated by linearizing around the optimal linearization trajectory.

We introduced an iterative method that, based purely on TL integrations, converges to this optimal linearization trajectory. We showed that the optimal linearization trajectory is a fixed point of this iterative method and, using prediction experiments in the QG and Lorenz 96 models, we showed that the iterative method converges to the fixed point. In the discussion, we introduced a method to regularize the error growth in the TL model without affecting the fixed point of the iteration. The main conclusion from this article is that this iterative method can be used in estimation problems to account for nonlinearity without using the nonlinear model. In particular, when long windows are used in forecast sensitivity experiments, the estimated increment at t = 0 will be uncorrelated to the true increment and the nonlinear model cannot be used to update the linearization trajectory. Using forecast sensitivity experiments in the Lorenz 96 model where we iteratively use the inverse of the TL model, we showed that the iterative method can be used for long windows and converges quickly. Typically four iterations (computation cost equal to four integrations with the linear model) are needed to find the optimal corrections for a 2-day forecast. In a forthcoming article, we will show that the same ideas can be used in incremental 4D-Var.

Acknowledgements

We would like to thank Wim Verkley, Theo Opsteegh and two anonymous reviewers for carefully reading earlier versions of the manuscript.

Appendices

A. Quasi-geostrophic model

Marshall and Molteni (1993) introduced a spectral three-level quasi-geostrophic (QG) model with global domain and pressure as the vertical coordinate. The model is truncated at wave number 21 and the model levels are at 200 (level 1), 500 (level 2) and 800 hPa (level 3). The model integrates the system

equation image

where qi is the potential vorticity (PV), ψi the streamfunction, Di are linear operators that represent dissipative terms, Si are constant PV sources and J the Jacobian of a two-dimensional field. We refer to Marshall and Molteni (1993) for a complete description of the model.

Figure A.1 shows the norm |X(t) − X(t + δt)| as a function of δt averaged over 1 year for the QG model. In Bengtsson et al. (2008, their Figure 3) a similar picture is shown for the RMSE of the geopotential height at 500 hPa for the ECMWF model but based on analyses instead of forecasts. If the trend due to seasonal variability is removed in the ECMWF model, the RMSE reaches a maximum of 110.8 m and the RMSE of analyses one day apart is 61 m, i.e. at 1 day the error is already half of the value reached for large δt. The QG model, the Lorenz 96 model (Figure B.1) and the ECMWF model therefore show similar behaviour in this respect. In both the QG model and the Lorenz 96 model, the growth of the error norm saturates at δt = 10 days.

Figure A.1.

The norm equation image as a function of δt averaged over 1 year for the quasi-geostrophic model. The solid line is the average, and the dashed lines show the maximum and minimum values that occurred during the 1-year period.

Figure B.1.

The norm equation image as a function of δt averaged over 1 year on a log-log scale for the Lorenz 96 model. The solid line is the average, and the dashed lines show the maximum and minimum value of the norm that occurred during the 1-year period. The straight line is the estimated value equation image based on the sphere equation image given by (B.4). For ease of comparison with Figure A.1, the time axis is scaled such that 1 time unit is 5 days.

B. Lorenz 96 model

Lorenz (1996) introduced a simple system of the form (C.2) with equation image, equation image, all other equation image, equation image the Kronecker delta and ci = F, giving the system

dX_i/dt = (X_{i+1} − X_{i−2}) X_{i−1} − X_i + F    (B.1)

where the dimension of the state vector is N and the cyclic convention X_{i+N} = X_i is used. We will use the vector notation

equation image(B.2)

The nonlinear term conserves the total energy equation image, i.e. equation image. The linear term equation image, representing mechanical or thermal dissipation, decreases the total energy equation image, while the constant term equation image representing external forcing prevents the total energy from decaying to zero. We imagine that equation image represents some atmospheric variable around a latitude circle and Xi is the value at longitude 360i/N. In all simulations we use N = 40 and equation image. If 1 time unit in the model is identified with 5 days, the error doubling time of the model is 2.1 days (Lorenz and Emanuel, 1998).
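The properties listed above can be verified directly. The sketch below assumes the standard form of the Lorenz 96 tendency, (X_{i+1} − X_{i−2}) X_{i−1} − X_i + F, with N = 40 and a forcing F = 8 (the value quoted for Figure B.2, assumed here for all runs), and uses an RK4 step of 0.01 time units. The initial condition, a random perturbation of the uniform steady state, is a hypothetical choice made only for this illustration.

```python
import random

N = 40     # number of grid points, as in the text
F = 8.0    # forcing; assumed here (the value quoted for Figure B.2)
H = 0.01   # RK4 time step in model time units (1 time unit = 5 days)


def advection(x):
    """Quadratic term (X[i+1] - X[i-2]) * X[i-1] with cyclic indices."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] for i in range(n)]


def tendency(x, forcing=F):
    """Advection plus linear dissipation plus constant forcing."""
    return [a - xi + forcing for a, xi in zip(advection(x), x)]


def rk4_step(x, h=H):
    """One classical fourth-order Runge-Kutta step."""
    k1 = tendency(x)
    k2 = tendency([xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = tendency([xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = tendency([xi + h * k for xi, k in zip(x, k3)])
    return [xi + h * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


rng = random.Random(0)
state = [F + rng.gauss(0.0, 1.0) for _ in range(N)]

# Energy tendency contributed by the quadratic term alone: sum_i X_i * advection_i.
advective_energy_tendency = sum(xi * a for xi, a in zip(state, advection(state)))

# The uniform state X_i = F should have zero tendency (steady state).
steady_tendency = tendency([F] * N)

# Integrate for 20 time units (100 days).
for _ in range(2000):
    state = rk4_step(state)

print(advective_energy_tendency, max(abs(v) for v in steady_tendency))
```

The quadratic term contributes nothing to the energy tendency up to round-off, and the uniform state is a steady state, as stated in the text.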

Figure B.1 shows the norm equation image as a function of δt averaged over 1 year. This should be compared with Figure A.1 for the QG model. The forecast error norm saturates after day 10 in both models. The straight line in Figure B.1 is the estimated bound equation image which can be derived as follows.

The time evolution of the total energy E is given by

equation image

where we used the Cauchy–Schwarz inequality. If we define the closed ball

equation image(B.3)

then for all equation image we have dE/dt < 0. For all equation image on the boundary of ℬ, we have dE/dt ≤ 0. So all trajectories that start in the interior of ℬ at t = 0 remain in this interior for t > 0. Note that the steady-state solution equation image is on the sphere.

The time derivative of the energy can also be written as

equation image(B.4)

Therefore there is a sphere equation image with radius equation image and centre equation image on which the time derivative of the total energy is zero. Again note that the steady solution equation image is on this sphere (Figure B.2). Trajectories that start in the interior of ℬ stay in the interior for t > 0 and therefore the energy of the state is bounded as t → ∞. This is only possible if the state either asymptotically approaches equation image or keeps crossing the surface of the sphere indefinitely. In either case, this implies that the dynamics of the system takes place ‘near’ the surface of the sphere equation image. This is indeed what we observe (Figure B.3).
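The geometry can be checked numerically under explicit assumptions. The sketch below takes the total energy to be E = ½ Σ X_i² and the Lorenz 96 tendency in its standard form; completing the square then gives dE/dt = N F²/4 − |X − (F/2)·1|², i.e. a sphere with centre (F/2, …, F/2) and radius F√N/2, and a ball of radius F√N outside which the energy decreases. These expressions are reconstructed here for illustration and should be compared with (B.3) and (B.4); all names are hypothetical.

```python
import math
import random

N, F = 40, 8.0  # F = 8 is an assumption (the value quoted for Figure B.2)


def tendency(x, forcing):
    """Standard Lorenz 96 tendency with cyclic indices."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


rng = random.Random(1)
x = [rng.uniform(-10.0, 15.0) for _ in range(N)]

# Energy tendency computed directly, assuming E = 0.5 * sum(X_i ** 2).
de_direct = sum(xi * ti for xi, ti in zip(x, tendency(x, F)))

# The same quantity written relative to the assumed sphere.
centre = F / 2.0
sphere_radius = F * math.sqrt(N) / 2.0
de_sphere = sphere_radius ** 2 - sum((xi - centre) ** 2 for xi in x)

# The uniform steady state X_i = F lies on the sphere and on the boundary of the ball.
steady = [F] * N
steady_to_centre = math.sqrt(sum((xi - centre) ** 2 for xi in steady))
steady_norm = math.sqrt(sum(xi * xi for xi in steady))
ball_radius = F * math.sqrt(N)

print(de_direct, de_sphere, steady_to_centre, sphere_radius)
```

Under these assumptions the two expressions for the energy tendency agree to round-off and the steady state sits on both surfaces, consistent with the remarks in the text.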

Figure B.2.

The boundary of ball ℬ (solid) and the sphere equation image (dashed) and the steady-state solution equation image for N = 2 and F = 8. Trajectories that start in ℬ cannot cross the boundary of ℬ. This figure is not equivalent to a cross-section through the X1X2 plane of the system with N = 40 because the centre of equation image will not be contained in this cross-section. In particular, the sphere equation image will appear much smaller in such cross-sections.

Figure B.3.

Distance to the centre of the sphere equation image as a function of time. The straight line is the radius of equation image.

Assume now that the state vectors for large δt are uncorrelated and on the sphere equation image. By symmetry considerations, the expected value for the angle between two vectors associated with two random points on an (N−1)-dimensional sphere is π/2 (Borel, 1914, where it is shown that for large N the probability density function tends to a normal distribution with mean π/2 and standard deviation equation image) and therefore the expected distance between two random points on the sphere equation image is equation image. This estimate is shown in Figure B.1. Given the simplicity of the arguments that were used in the derivation, this is a remarkably good estimate of the asymptotic behaviour of the forecast error norm.
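A short Monte Carlo experiment illustrates the argument. The sketch assumes a sphere of radius F√N/2 with N = 40 and F = 8 (values assumed for illustration); two points at an angle of π/2 as seen from the centre are then a distance √2 times the radius apart. The sampling procedure and all names are hypothetical.

```python
import math
import random

N, F = 40, 8.0
RADIUS = F * math.sqrt(N) / 2.0  # assumed radius of the sphere


def random_unit_vector(n, rng):
    """Uniformly distributed direction obtained by normalizing Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


rng = random.Random(42)
angles, distances = [], []
for _ in range(2000):
    u = random_unit_vector(N, rng)
    v = random_unit_vector(N, rng)
    cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    angles.append(math.acos(cos_angle))
    # Points on the sphere are centre + RADIUS * direction, so the centre cancels.
    distances.append(RADIUS * math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))))

mean_angle = sum(angles) / len(angles)
mean_distance = sum(distances) / len(distances)
print(mean_angle, math.pi / 2.0)
print(mean_distance, math.sqrt(2.0) * RADIUS)
```

The sample mean of the angle is close to π/2 and the mean distance is close to √2 times the radius, which is the kind of estimate used for the straight line in Figure B.1.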

Before each experiment, we started from a random point on the sphere equation image and integrated for 100 days (20 time units) to allow the system to reach the attractor. All integrations were performed using an RK4 scheme with a time step of 0.01.

C. Bilinearity preserving finite-dimensional representations and time discretizations

If equation image is a complete time-independent orthonormal basis of the phase space equation image w.r.t. an inner product equation image, we can write equation image. Using the bilinearity of q and linearity of b, (11) can be written as

equation image(C.1)

where we use the convention that there is an implied summation over a repeated upper and lower index in a single term. Taking the inner product of this equation with equation image gives the time evolution of the coordinates Xi(t):

equation image(C.2)

where equation image, equation image and equation image. We see that, if the coordinate vector is truncated at a certain index N, the truncated system is bilinear (e.g. if equation image is a spherical harmonic basis). Therefore the time evolution of the coordinates Xi w.r.t. a time independent truncated orthonormal basis is given by a bilinear differential equation and the optimal linearization trajectory can be obtained by adding the coordinates.

C.1. Integration schemes

The Euler forward scheme propagates the state vector as

equation image(C.3)

where h is the time step. If

equation image

then the highest-order nonlinear term in the map from equation image to equation image is bilinear and therefore the time discretization by the integration scheme preserves the bilinearity of the underlying differential equation. This is no longer true if higher-order integration schemes are used. For these schemes, the value that is used to evaluate the right-hand side of the differential equation at intermediate time steps needs to be stored in the linearization trajectory. In the TL integration, these values from the linearization trajectory should then be used in the evaluation of the right-hand side of the TL model.
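The statement for the Euler forward scheme can be illustrated with the Lorenz 96 model. The sketch below assumes the standard form of the Lorenz 96 tendency with F = 8 and takes the linearization trajectory to be the average of the control and perturbed runs at each time step. Here that average is computed from two nonlinear runs purely for illustration, whereas the iterative method of the article obtains it without the nonlinear model. Step size, run length and all names are hypothetical.

```python
import random


def q(a, b):
    """Bilinear advection term: q(a, b)_i = (a[i+1] - a[i-2]) * b[i-1], cyclic."""
    n = len(a)
    return [(a[(i + 1) % n] - a[i - 2]) * b[i - 1] for i in range(n)]


def nonlinear_tendency(x, forcing):
    return [adv - xi + forcing for adv, xi in zip(q(x, x), x)]


def tl_tendency(trajectory, dx):
    """Tangent linear tendency q(traj, dx) + q(dx, traj) - dx."""
    return [u + v - d for u, v, d in zip(q(trajectory, dx), q(dx, trajectory), dx)]


def euler(x, tend, h):
    return [xi + h * ti for xi, ti in zip(x, tend)]


N, F, H, STEPS = 40, 8.0, 0.005, 100
rng = random.Random(3)
control = [F + rng.gauss(0.0, 1.0) for _ in range(N)]
perturbation = [rng.gauss(0.0, 1.0) for _ in range(N)]  # deliberately large
perturbed = [c + p for c, p in zip(control, perturbation)]

tl_average = list(perturbation)   # TL linearized around the average trajectory
tl_control = list(perturbation)   # standard TL linearized around the control run
for _ in range(STEPS):
    average = [0.5 * (c + p) for c, p in zip(control, perturbed)]
    tl_average = euler(tl_average, tl_tendency(average, tl_average), H)
    tl_control = euler(tl_control, tl_tendency(control, tl_control), H)
    control = euler(control, nonlinear_tendency(control, F), H)
    perturbed = euler(perturbed, nonlinear_tendency(perturbed, F), H)

difference = [p - c for p, c in zip(perturbed, control)]
mismatch_average = max(abs(d - t) for d, t in zip(difference, tl_average))
mismatch_control = max(abs(d - t) for d, t in zip(difference, tl_control))
print(mismatch_average, mismatch_control)
```

With the Euler forward scheme the TL run along the average trajectory reproduces the nonlinear difference to round-off, while the standard TL run shows a visible mismatch for a perturbation of this size.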

D. Multilinear systems

Definition 5: Multilinear map

A map equation image is called multilinear if it is linear in each argument.

Definition 6: Symmetric multilinear map

For a given multilinear map qn we define a symmetric map sn by

equation image(D.1)

where the summation is over all possible permutations of the arguments equation image.

Consider the general form of a multilinear system with at most Nth-order multilinearities:

equation image(D.2)

where q0 is the forcing term in the model. Substitution of equation image and using (D.2) gives

equation image

Using Definition 6 and Newton's binomial theorem, this can be written as

equation image(D.3)

The sum over k starts from k = 1 because the terms with only upper-case equation images are cancelled. The summation over n starts from n = 1 because the constant term is cancelled. Retaining only the terms linear in equation image (terms with k = 1) gives the TL model

equation image(D.4)

If we iteratively relinearize the TL model around the trajectory equation image, we get at convergence of the algorithm a unique increment equation image that satisfies

equation image(D.5)

Using Newton's binomial theorem, this can be written as

equation image

Shifting the summation over k by 1 gives

equation image

which can also be written as

equation image(D.6)

Table D.I shows the coefficients of the exact and the relinearized time evolution of perturbations at convergence of the algorithm.

Table D.I. Coefficients for the exact (D.3) and the relinearized (D.6) time evolution of perturbations.
n \ k   1    2    3                 4                 5
Exact time evolution
1       1
2       2    1
3       3    3    1
4       4    6    4                 1
5       5    10   10                5                 1
Relinearized time evolution
1       1
2       2    1
3       3    3    equation image
4       4    6    equation image    equation image
5       5    10   equation image    equation image    equation image

The normal TL model has non-zero values only in the first column. Therefore we see that the relinearized model takes into account all linear terms but also all quadratic terms in the perturbation equation image. For terms higher than quadratic in equation image, the relinearized model multiplies the exact coefficient with a factor k·2^(1−k). This is a number between 0 and 1, and therefore is always closer to the exact coefficient than setting the coefficient to zero, as is done in the standard TL model. We therefore conclude that the relinearization iteration will always give better approximations than the standard TL model at convergence of the algorithm.
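The coefficients can be tabulated with exact rational arithmetic. The sketch below assumes that the exact coefficients are the binomial coefficients C(n, k), as in the upper half of Table D.I, and that the relinearized coefficients follow from multiplying them by the factor k·2^(1−k) stated above; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def exact_coefficient(n, k):
    """Coefficient of the k-th order perturbation term for an n-th order nonlinearity."""
    return Fraction(comb(n, k))


def relinearized_coefficient(n, k):
    """Exact coefficient multiplied by the factor k * 2**(1 - k)."""
    return exact_coefficient(n, k) * k * Fraction(1, 2 ** (k - 1))


for n in range(1, 6):
    exact = [str(exact_coefficient(n, k)) for k in range(1, n + 1)]
    relin = [str(relinearized_coefficient(n, k)) for k in range(1, n + 1)]
    print(n, exact, relin)
```

The first two columns agree with the exact values, and from the third column onwards the relinearized coefficient lies strictly between zero and the exact one.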

* We have introduced a minus sign in the equation image term so that potential energy is increasing with increasing height.
