We examine differential equations where nonlinearity is a result of the advection part of the total derivative or the use of quadratic algebraic constraints between state variables (such as the ideal gas law). We show that these types of nonlinearity can be accounted for in the tangent linear model by a suitable choice of the linearization trajectory. Using this optimal linearization trajectory, we show that the tangent linear model can be used to reproduce the exact nonlinear error growth of perturbations for more than 200 days in a quasi-geostrophic model and more than (the equivalent of) 150 days in the Lorenz 96 model. We introduce an iterative method, purely based on tangent linear integrations, that converges to this optimal linearization trajectory.
The use of tangent linear (TL) and, in particular, adjoint models has been very useful in several applications in numerical weather prediction (NWP) (e.g. Errico, 1997, 2003; Errico and Ehrendorfer, 2007, give an overview). For example, at the European Centre for Medium-range Weather Forecasts (ECMWF) these linear models play a crucial role in the computation of initial condition perturbations used in the ensemble prediction system (Leutbecher and Palmer, 2008) and in their four-dimensional data assimilation system (4D-Var; Courtier et al., 1994). One of the major limitations to the application of linear models is that the results are useful only when the linear approximation is valid (Errico, 1997). By this we mean that the difference between two runs of the nonlinear model can be described by the associated linearized version of the nonlinear model. To achieve this, great effort is taken to develop linearized models which capture as many features as possible of the full nonlinear model (Janisková et al., 1999). Despite these efforts, the use of TL and adjoint models is restricted to ‘short’ time spans. The time span for which the TL model can be considered accurate will be referred to as the TL regime.
The duration of the TL regime depends on many factors. Typically the difference between two nonlinear forecasts is compared with the linear forecast by a scalar index, and the TL assumption is said to be violated when the index reaches a threshold value. The measure employed to compare forecast fields is therefore already part of the definition of the TL regime. The size and orientation of the initial condition perturbation, the background trajectory around which the TL model is linearized and the physical processes taken into account in the TL model also play a role. Another issue which influences the usefulness of linear models is whether we are considering forecast problems, where error growth is determined by the singular value spectrum of the propagator, or estimation problems, which are typically characterized by the reciprocals of the singular values. In general, the spectrum of reciprocal singular values attains higher values (Reynolds and Palmer, 1998) and therefore the TL model remains useful for a shorter time in estimation problems. This effect becomes even more pronounced because the typical size of perturbations used in backward integrations is larger than in forward mode.
In this forward mode, the TL assumption is generally believed to be valid for 2–3 days at the synoptic scale. However, Gilmour et al. (2001) argue that 1 day is perhaps a better estimate. On the cloud-resolving scale, the TL assumption probably holds for much shorter time periods on the order of 1.5 h (Hohenegger and Schär, 2007). As models reach higher resolutions, the validity of the TL assumption is therefore a major concern. We will show that for bilinear systems the usefulness of the TL model can be greatly extended by modifying the linearization trajectory and therefore one of the major limitations on using TL models can be eliminated.
In section 2, the definition of bilinear differential equations is given. In section 3, we show that for bilinear systems there is an optimal linearization trajectory such that, if the TL model is linearized around this trajectory, the perturbation growth in the TL model is equal to the nonlinear perturbation growth. Knowing that such a trajectory exists, we show in section 4 that there is an iterated map based purely on TL integrations that converges to this linearization trajectory. In section 5, we show how the iterative method can be used in forecast sensitivity experiments using the inverse of the TL model. In section 6, the experimental results using a quasi-geostrophic (QG) model (Marshall and Molteni, 1993, described in Appendix A) and the Lorenz 96 model (Lorenz, 1996, described in Appendix B) are given. In the discussion in section 7, the prospects for using the method in realistic NWP models and a method to regularize the error growth in the TL model are discussed. The conclusions are given in section 8.
To keep the notation simple, we use the convention that lower-case variables are perturbations (also referred to as increments) to upper-case variables, e.g. $\mathbf{x}$ is a perturbation to the state vector $\mathbf{X}$.
2. Bilinear differential equations
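In this section, some terms are defined which will be used throughout the article. Let $\mathbf{X}$ and $\mathbf{Y}$ be elements of a vector space $V$.
Definition 1: Bilinear map
A map q
$$q : V \times V \to V$$
is called bilinear if q is linear in both arguments.
Definition 2: (Anti)symmetric bilinear map
A bilinear map s will be called symmetric if for any $\mathbf{X}, \mathbf{Y} \in V$
$$s(\mathbf{X},\mathbf{Y}) = s(\mathbf{Y},\mathbf{X}).$$
A bilinear map a will be called antisymmetric if
$$a(\mathbf{X},\mathbf{Y}) = -a(\mathbf{Y},\mathbf{X}).$$
Note that, for any bilinear map q, there is a unique decomposition
$$q = s + a,$$
where $s(\mathbf{X},\mathbf{Y}) = \tfrac{1}{2}\left[q(\mathbf{X},\mathbf{Y}) + q(\mathbf{Y},\mathbf{X})\right]$ is symmetric and $a(\mathbf{X},\mathbf{Y}) = \tfrac{1}{2}\left[q(\mathbf{X},\mathbf{Y}) - q(\mathbf{Y},\mathbf{X})\right]$ is antisymmetric.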
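For a finite-dimensional space, a bilinear map can be stored as a third-order coefficient array, and the decomposition of Definition 2 then amounts to (anti)symmetrizing that array over its last two indices. The following is a minimal sketch under that assumption; the array `Q` and the helper `apply_bilinear` are hypothetical names introduced only for illustration.

```python
import numpy as np

def apply_bilinear(Q, x, y):
    """Evaluate q(x, y)_i = sum_{j,k} Q[i, j, k] * x[j] * y[k]."""
    return np.einsum("ijk,j,k->i", Q, x, y)

rng = np.random.default_rng(0)
n = 4
Q = rng.standard_normal((n, n, n))      # arbitrary bilinear map on R^n (illustrative)
S = 0.5 * (Q + Q.transpose(0, 2, 1))    # symmetric part: s(x, y) = s(y, x)
A = 0.5 * (Q - Q.transpose(0, 2, 1))    # antisymmetric part: a(x, y) = -a(y, x)

x = rng.standard_normal(n)
y = rng.standard_normal(n)

sym_ok = np.allclose(apply_bilinear(S, x, y), apply_bilinear(S, y, x))
antisym_ok = np.allclose(apply_bilinear(A, x, y), -apply_bilinear(A, y, x))
sum_ok = np.allclose(
    apply_bilinear(S, x, y) + apply_bilinear(A, x, y),
    apply_bilinear(Q, x, y),
)
print(sym_ok, antisym_ok, sum_ok)
```

Only the symmetric part contributes to q(x, x), since a(x, x) = -a(x, x) = 0.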
Definition 3: Bilinear differential equation
A differential equation will be called bilinear if it is of the form
$$\frac{d\mathbf{X}}{dt} = q(\mathbf{X},\mathbf{X}) + b(\mathbf{X}) + c,$$
where q is a bilinear map, b is a linear map and c is a forcing. If the vector space $V$ of states is finite-dimensional, this is an ordinary differential equation (ODE), while if $\mathbf{X}$ represents a (collection of) space- and time-dependent field(s), this is a partial differential equation (PDE). For PDEs, the mappings q, b and forcing c are allowed to depend on space and time explicitly.
A differential algebraic equation (Brenan et al., 1996) will be called bilinear if it is of the form
where q and e are bilinear maps, d and b are linear maps and c is a forcing.
Example 1: Barotropic vorticity equation
The barotropic vorticity equation (bve) is
where the first equation is a prognostic equation for the absolute vorticity η, the second and third equations are algebraic constraints (diagnostic equations) for the stream function ψ and the two-dimensional velocity respectively (hence the zeros in front of the time derivatives), f is the Coriolis parameter, is the vertical unit vector and J is an antisymmetric bilinear map defined as
If we define , we see that the bve is a bilinear partial differential algebraic equation (BPDAE) in the state vector with e = 0 and d = diag(I,0), and the velocity is a ‘post-processed’ variable. Alternatively, the equation for ∂η/∂t can be written as , in which case the state vector should be defined as .
Example 2: Momentum equation
The momentum equation in a uniformly rotating coordinate frame is (Pedlosky, 1987)
where is the three-dimensional velocity vector, p is pressure, ρ is density, Ω is the angular rotation vector, φ is the potential that represents conservative body forces, including gravity, and ℱ represents non-conservative (frictional) forces. The prognostic equation for is a trilinear differential equation due to the term caused by the advection part of the total derivative. It is however easy to transform the trilinear equation to a bilinear differential algebraic equation by augmenting the state vector with the momentum density :
Alternatively, the momentum density vector field can be considered as the prognostic variable
where we used the mass continuity equation.
Example 3: Equation of state
The equation of state for an ideal gas can be formulated as an algebraic constraint as
Examples 1 and 2 illustrate that in fluid dynamics bilinearity is typically a result of the advection part of the total derivative. Example 3 shows that another source for bilinearity is the use of algebraic constraints between state variables such as the ideal gas law. Example 2 further illustrates that it is easy to reduce multilinear systems (Appendix D) to bilinear systems by augmenting the state vector.
Notation 1: Nonlinear integrations
Integrations with a nonlinear model starting from an initial condition are denoted by
Then, by definition, the exact increment trajectory for a given perturbation of an initial condition is given by
Notation 2: Tangent linear integrations
Integrations with the TL model starting with an initial condition perturbation are denoted by
where is known as the propagator and is the trajectory around which the tangent linear model is linearized.
3. Optimal linearization trajectories
In this section, we derive the TL model for the general form of bilinear system and show how to modify the linearization trajectory to obtain an exact correspondence between the nonlinear time evolution and the corresponding TL evolution of perturbations.
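Consider the general form of a bilinear differential equation
$$\frac{d\mathbf{X}}{dt} = q(\mathbf{X},\mathbf{X}) + b(\mathbf{X}) + c, \qquad (11)$$
where $\mathbf{X} \in V$, q is a bilinear mapping, b is a linear mapping and c is a forcing, and the mappings q and b and forcing c are allowed to explicitly depend on space and time. Solutions (trajectories in $V$) of (11) are denoted as $\mathbf{X}(t)$. The time evolution of a perturbed run $\mathbf{X}(t) + \mathbf{x}(t)$ is given by
$$\frac{d(\mathbf{X}+\mathbf{x})}{dt} = q(\mathbf{X}+\mathbf{x},\mathbf{X}+\mathbf{x}) + b(\mathbf{X}+\mathbf{x}) + c. \qquad (12)$$
Now, using the bilinearity of q, the linearity of b and (11) to eliminate $d\mathbf{X}/dt$, we obtain
$$\frac{d\mathbf{x}}{dt} = \mathbf{L}(\mathbf{X})\,\mathbf{x} + q(\mathbf{x},\mathbf{x}). \qquad (13)$$
Here we used the linearity of q in both arguments and the linearity of b to define an operator
$$\mathbf{L}(\mathbf{X})\,\mathbf{x} = q(\mathbf{X},\mathbf{x}) + q(\mathbf{x},\mathbf{X}) + b(\mathbf{x}). \qquad (14)$$
For finite-dimensional systems, $\mathbf{L}(\mathbf{X})$ is the Jacobian of (11) evaluated along the trajectory $\mathbf{X}(t)$. In the TL approximation, the bilinear term $q(\mathbf{x},\mathbf{x})$ is neglected and the system
$$\frac{d\hat{\mathbf{x}}^{1}}{dt} = \mathbf{L}(\mathbf{X})\,\hat{\mathbf{x}}^{1} \qquad (15)$$
is known as the TL model. We use a hat to indicate that this is only an approximation to the true evolution $\mathbf{x}(t)$, and the reason for adding the superscript 1 will become apparent later.
The key observation in this section is that the exact time evolution of perturbations in (13) can also be written as
$$\frac{d\mathbf{x}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\mathbf{x}\right)\mathbf{x}, \qquad (16)$$
because $q(\mathbf{X}+\tfrac{1}{2}\mathbf{x},\mathbf{x}) + q(\mathbf{x},\mathbf{X}+\tfrac{1}{2}\mathbf{x}) = q(\mathbf{X},\mathbf{x}) + q(\mathbf{x},\mathbf{X}) + q(\mathbf{x},\mathbf{x})$, i.e. we obtain the exact time evolution of perturbations if the TL model is linearized around the trajectory $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$ instead of $\mathbf{X}$. The trajectory $\mathbf{X} + \tfrac{1}{2}\mathbf{x}$ will be referred to as the optimal linearization trajectory. The previous results can be generalized to BPDAEs. Let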
then substitution of using (17) and retaining only terms linear in gives the TL model
where we used Definition 2 to write
It is easy to see that the neglected bilinear terms and are recovered if the TL model is linearized around the trajectory . An important difference from the previous result is that to integrate the TL model both the trajectory and the tendencies are required if e≠0.
Integrations with the TL model linearized around a trajectory starting from an initial condition will be denoted by
Note that, although the TL model is used to propagate the increment , this is not a linear mapping from to due to the dependence of the linearization trajectory on . In Appendix C, we discuss how to preserve bilinearity when higher than first-order integration schemes are used to integrate (11), and show that bilinearity is preserved if a finite-dimensional representation of the state vector is obtained by truncating the coordinate vector with respect to a time-independent orthonormal basis.
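The exactness of the optimal linearization trajectory can be checked numerically at the level of tendencies. The sketch below assumes the standard Lorenz 96 equations with 40 variables and forcing F = 8 (the configuration of Appendix B may differ), and compares the difference of two nonlinear tendencies with the TL tendency linearized around the control state and around the midpoint between control and perturbed state. The function names are illustrative.

```python
import numpy as np

def l96_rhs(X, F=8.0):
    """Lorenz 96 tendency: dX_i/dt = (X_{i+1} - X_{i-2}) X_{i-1} - X_i + F."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def l96_tl(Xb, x):
    """TL operator of Lorenz 96, linearized around the state Xb, applied to x."""
    return ((np.roll(x, -1) - np.roll(x, 2)) * np.roll(Xb, 1)
            + (np.roll(Xb, -1) - np.roll(Xb, 2)) * np.roll(x, 1)
            - x)

rng = np.random.default_rng(1)
X = 8.0 + rng.standard_normal(40)   # control state (arbitrary, for illustration)
x = rng.standard_normal(40)         # deliberately large perturbation

exact = l96_rhs(X + x) - l96_rhs(X)   # exact perturbation tendency
standard_tl = l96_tl(X, x)            # linearized around the control state
optimal_tl = l96_tl(X + 0.5 * x, x)   # linearized around the midpoint

err_standard = np.linalg.norm(standard_tl - exact)
err_optimal = np.linalg.norm(optimal_tl - exact)
print(err_standard, err_optimal)
```

The midpoint linearization agrees with the nonlinear tendency difference to round-off, whereas the standard linearization misses the quadratic term in the perturbation.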
4. Iterative relinearization
In section 3, we observed that, for a given initial condition perturbation, there is an optimal linearization trajectory for the TL model such that the TL predictions become exactly equal to the nonlinear predictions. In this section, we introduce an iterative method purely based on integrations with the TL model that converges to this optimal trajectory. Section 5 shows how this iterative method can be used to update the linearization trajectory in forecast sensitivity experiments without using the nonlinear model.
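We have seen that, for bilinear systems,
$$\frac{d\mathbf{x}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\mathbf{x}\right)\mathbf{x}, \qquad \mathbf{L}(\mathbf{Y})\,\mathbf{x} = q(\mathbf{Y},\mathbf{x}) + q(\mathbf{x},\mathbf{Y}) + b(\mathbf{x}),$$
describes the exact time evolution of perturbations. For a given initial condition perturbation $\mathbf{x}(0)$ and a trajectory $\mathbf{X}$, this equation can be written as a map $\mathcal{T}$ that maps increment trajectories to increment trajectories
$$\mathcal{T}: \mathbf{y} \mapsto \hat{\mathbf{x}}, \qquad \frac{d\hat{\mathbf{x}}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\mathbf{y}\right)\hat{\mathbf{x}}, \qquad \hat{\mathbf{x}}(0) = \mathbf{x}(0),$$
with $\mathcal{T}(\mathbf{x}) = \mathbf{x}$, i.e. the trajectory $\mathbf{x}$ is a fixed point of $\mathcal{T}$. If, for a fixed time interval [0,T], there is a constant 0 < q < 1 and a suitable metric d on the space of increments defined on the interval [0,T] such that $d(\mathcal{T}(\mathbf{y}),\mathcal{T}(\mathbf{z})) \le q\,d(\mathbf{y},\mathbf{z})$, then $\mathcal{T}$ is known as a contraction mapping. The Banach fixed-point theorem then guarantees that the fixed point is unique and moreover the iterated map
$$\hat{\mathbf{x}}^{k} = \mathcal{T}\!\left(\hat{\mathbf{x}}^{k-1}\right) \qquad (21)$$
converges to this fixed point. This suggests that, given an estimate $\hat{\mathbf{x}}^{k-1}$ of the trajectory $\mathbf{x}$, the TL model can be integrated in the form
$$\frac{d\hat{\mathbf{x}}^{k}}{dt} = \mathbf{L}\!\left(\mathbf{X} + \tfrac{1}{2}\hat{\mathbf{x}}^{k-1}\right)\hat{\mathbf{x}}^{k}, \qquad (22)$$
where the superscripts indicate the iteration number. With $\hat{\mathbf{x}}^{0} = 0$, the first iteration k = 1 is equal to a standard TL integration (as given by (15)) and gives a trajectory $\hat{\mathbf{x}}^{1}$. During the second iteration, we integrate the TL model with a modified trajectory $\mathbf{X} + \tfrac{1}{2}\hat{\mathbf{x}}^{1}$, etc. Alternatively, the iteration can be started with $\hat{\mathbf{x}}^{0}(t) = \mathbf{x}(0)$, which has the advantage that the time derivatives in the TL model become exact at t = 0. In the experiments both methods are compared.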
In Appendix D, an analysis of (22) for multilinear models is given and we show that, independent of the order of the nonlinearities in the nonlinear model, at convergence (22) always gives better predictions of the time evolution of perturbations than the standard TL model (15). In particular, the bilinear terms are exactly taken into account. In section 6.2, we examine the rate of convergence for the iterated map (21) for the QG and the Lorenz 96 models.
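The iteration (22) can be sketched for the Lorenz 96 model as follows. This is an illustration under stated assumptions, not the experimental setup of the article: it uses 40 variables with F = 8, explicit Euler time stepping (chosen for simplicity, whereas the experiments use RK4), a short window of 0.1 time units and a perturbation of norm 0.2. All function names are hypothetical.

```python
import numpy as np

def l96_rhs(X, F=8.0):
    """Lorenz 96 tendency: dX_i/dt = (X_{i+1} - X_{i-2}) X_{i-1} - X_i + F."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def l96_tl(Xb, x):
    """TL operator linearized around the state Xb, applied to the increment x."""
    return ((np.roll(x, -1) - np.roll(x, 2)) * np.roll(Xb, 1)
            + (np.roll(Xb, -1) - np.roll(Xb, 2)) * np.roll(x, 1)
            - x)

def nonlinear_run(X0, nsteps, dt):
    """Explicit Euler integration of the nonlinear model; returns all states."""
    traj = np.empty((nsteps + 1, X0.size))
    traj[0] = X0
    for n in range(nsteps):
        traj[n + 1] = traj[n] + dt * l96_rhs(traj[n])
    return traj

def tl_run(lin_traj, x0, dt):
    """Explicit Euler integration of the TL model along lin_traj."""
    x = np.empty_like(lin_traj)
    x[0] = x0
    for n in range(lin_traj.shape[0] - 1):
        x[n + 1] = x[n] + dt * l96_tl(lin_traj[n], x[n])
    return x

rng = np.random.default_rng(0)
dt, nsteps = 0.001, 100                       # window of 0.1 time units

X0 = nonlinear_run(8.0 + rng.standard_normal(40), 1000, dt)[-1]  # short spin-up
x0 = rng.standard_normal(40)
x0 *= 0.2 / np.linalg.norm(x0)                # initial perturbation, norm 0.2

control = nonlinear_run(X0, nsteps, dt)
x_true = nonlinear_run(X0 + x0, nsteps, dt) - control   # exact increment

x_hat = np.zeros_like(control)                # start from x_hat^0 = 0
errors = []
for k in range(1, 6):
    # Relinearize around control + x_hat^{k-1} / 2 and rerun the TL model.
    x_hat = tl_run(control + 0.5 * x_hat, x0, dt)
    errors.append(np.linalg.norm(x_hat[-1] - x_true[-1]))

print(errors)
```

With explicit Euler stepping the discrete nonlinear model is itself quadratic in the state, so the fixed point of this discrete iteration is the exact discrete increment. The first entry of `errors` is the standard TL error; later entries show how the relinearization reduces it without any further nonlinear integrations.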
Remark 1: Radius of convergence
Iterated maps can exhibit a finite radius of convergence even though there is a fixed point valid for all t. Therefore, even though the fixed-point trajectory is valid for all t, this does not imply that the iterated map (22) converges to this fixed point. As an example, consider the system $dX/dt = -2tX^{2}$ with X(0) = 1. The solution is given by the Witch of Agnesi $X(t) = 1/(t^{2} + 1)$. The Picard iteration
$$X^{k+1}(t) = 1 - \int_{0}^{t} 2s\left[X^{k}(s)\right]^{2} ds$$
with $X^{0}(t) = 1$ converges to the Taylor series of X(t) but, because X(t) has poles at t = ±i, the Picard iteration only converges to the fixed point for |t| < 1.
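The Picard iterates of this example can be generated exactly as polynomials. The sketch below assumes the system is dX/dt = -2tX² with X(0) = 1 (the equation consistent with the stated solution) and uses NumPy's `Polynomial` class; it only illustrates the behaviour inside the unit disc, at t = 0.5.

```python
import numpy as np
from numpy.polynomial import Polynomial

def picard_step(p):
    """One Picard iteration for dX/dt = -2 t X^2 with X(0) = 1."""
    integrand = Polynomial([0.0, 2.0]) * p**2   # 2 s X_k(s)^2 as a polynomial in s
    return 1.0 - integrand.integ()              # antiderivative vanishing at s = 0

p = Polynomial([1.0])                           # X_0(t) = 1
values_at_half = []
for k in range(6):
    p = picard_step(p)
    values_at_half.append(p(0.5))

exact_at_half = 1.0 / (0.5**2 + 1.0)            # Witch of Agnesi at t = 0.5
even_coeffs = p.coef[0:13:2]                    # coefficients of t^0, t^2, ..., t^12

print(values_at_half, exact_at_half)
print(even_coeffs)
```

After k iterations the iterate reproduces the Taylor coefficients of 1/(1 + t²) up to order t^(2k), so the printed coefficients alternate between +1 and -1, and the values at t = 0.5 approach the exact solution.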
5. Estimation using the inverse TL model
The estimation problem considered in this article is: given a forecast starting from an analysis
and an analysis valid at time T, can we determine an analysis increment such that
These types of experiments are known as forecast sensitivity experiments and have been studied by Rabier et al. (1996), Pu et al. (1997a,b), and Klinker et al. (1998). If the TL assumption is valid, we expect
and therefore we can obtain estimates of from
Besides giving estimates for , the integration with the inverse TL model can be used to produce estimates of the complete trajectory . The result from section 4 therefore suggests using this method iteratively:
with or . This defines an iterated map on the space of increment trajectories
where . In section 6.3, the convergence rate of the iterated map (28) is investigated for the Lorenz 96 model.
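To highlight different aspects of the optimal linearization trajectories and the iterative relinearization method, the exact time evolution of the perturbations $\mathbf{x}(t)$ and the corresponding TL evolution $\hat{\mathbf{x}}^{k}(t)$ are compared using five indices: $l_k$, $\alpha_k$, $R_k$, $d_k$ and $Rd_k$. The similarity index $l_k(t)$ is defined as
$$l_k(t) = \frac{\langle \mathbf{x}(t), \hat{\mathbf{x}}^{k}(t) \rangle}{\|\mathbf{x}(t)\|\,\|\hat{\mathbf{x}}^{k}(t)\|}, \qquad (29)$$
the angle $\alpha_k(t)$ is given by
$$\alpha_k(t) = \arccos l_k(t), \qquad (30)$$
the relative norm $R_k(t)$ is given by
$$R_k(t) = \frac{\|\hat{\mathbf{x}}^{k}(t)\|}{\|\mathbf{x}(t)\|}, \qquad (31)$$
the error norm $d_k(t)$ by
$$d_k(t) = \|\hat{\mathbf{x}}^{k}(t) - \mathbf{x}(t)\| \qquad (32)$$
and the relative error norm $Rd_k(t)$ by
$$Rd_k(t) = \frac{d_k(t)}{\|\mathbf{x}(t)\|}. \qquad (33)$$
For the QG model, the values of $d_k$, $l_k$, $R_k$ and $Rd_k$ are determined using the kinetic energy inner product. For the Lorenz 96 model, the Euclidean inner product is used. In the context of twin experiments, values of l = 0.7, corresponding to an angle α = 45°, are commonly used to indicate that the TL assumption is violated (e.g. Gilmour et al., 2001).
We will say that $\hat{\mathbf{x}}^{k}$ is more similar to $\mathbf{x}$ than $\hat{\mathbf{x}}^{k-1}$ at time t if $\alpha_k(t) < \alpha_{k-1}(t)$ or, equivalently, if $l_k(t) > l_{k-1}(t)$. We say that $\hat{\mathbf{x}}^{k}$ is closer to $\mathbf{x}$ than $\hat{\mathbf{x}}^{k-1}$ at time t if $d_k(t) < d_{k-1}(t)$ or, equivalently, if $Rd_k(t) < Rd_{k-1}(t)$.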
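The five indices can be computed with a few lines of code. The sketch below assumes the usual definitions (cosine similarity, its angle, the ratio of norms, the norm of the difference, and that norm relative to the true increment) and defaults to the Euclidean inner product; the function name `tl_indices` and the `inner` argument are illustrative, and a kinetic energy inner product could be passed in instead.

```python
import numpy as np

def tl_indices(x_true, x_tl, inner=np.dot):
    """Return similarity l, angle alpha (degrees), relative norm R,
    error norm d and relative error norm Rd for one time level."""
    norm = lambda v: np.sqrt(inner(v, v))
    l = inner(x_true, x_tl) / (norm(x_true) * norm(x_tl))
    alpha = np.degrees(np.arccos(np.clip(l, -1.0, 1.0)))  # clip guards round-off
    R = norm(x_tl) / norm(x_true)
    d = norm(x_tl - x_true)
    Rd = d / norm(x_true)
    return {"l": l, "alpha": alpha, "R": R, "d": d, "Rd": Rd}

# TL increment whose error is perpendicular to the true increment
# and has the same length as the true increment.
idx = tl_indices(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
print(idx)
```

In this example the error vector is perpendicular to the true increment and of equal length, which gives l ≈ 0.707, α = 45°, R = √2 and Rd = 1, i.e. the commonly used threshold for the breakdown of the TL assumption.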
6.2. Iterative relinearization
In this section the rate of convergence of the iterated map T is examined in a quasi-geostrophic model (described in Appendix A) and the Lorenz 96 model (described in Appendix B).
6.2.1. QG model
In Figure 1(a) we show the 2-day forecast difference of the stream function at 500 hPa. The initial conditions for the control run and the perturbed run are 100 days apart and therefore we may assume that they are uncorrelated (see also Figure A.1). The size of the perturbations used in these experiments is therefore much larger than typical analysis increments. In Figure 1(b), we show the forecast of the standard TL model .
The other panels show the iterative method for four iterations with . Both the standard TL integration (l1 = 0.55) and the first iteration with (l1 = 0.73) differ substantially from the truth, with large differences north of 60°N. In the first iteration with there is a wave pattern over the North Atlantic Ocean which is absent in the first iteration with . At subsequent iterations, all positive and negative cells are gradually moved to their correct location and with the correct amplitude. At iterations 2 to 4 we have l2 = 0.90, l3 = 0.95 and l4 = 0.99 respectively, indicating that the iterative method converges quickly with the largest improvement when going from iteration 1 to 2.
Figure 2 shows the similarity index lk and the relative error norm as a function of time and iteration number. The solid black line refers to the standard TL model with . The coloured lines show the iterative relinearized results for four iterations with . The control run and the perturbed run are 2 days apart. From the standard TL integration, we see that the duration of the TL regime is slightly larger than 1 day. Especially in the short range, it is beneficial to use because the derivatives in the TL model become exact at t = 0. In Figure 2(b) this can be seen for example from the relative error norm where when the standard TL model is used. Observe that the iterative method adds approximately 0.5 days to the usefulness of the TL model at each iteration.
6.2.2. Lorenz 96 model
Figure 3 shows the similarity index and relative error norm (average over 50 experiments) as a function of time and iteration index for the Lorenz 96 model. All experiments start with a random initial condition perturbation with . Such an initial condition amplitude is approximately equal to the size of 12 h forecast differences (Figure B.1). From the first iteration using (black), we see that the duration of the TL regime is slightly larger than 1.5 days (0.3 time units). Using , this can be extended to 2 days. The iterative linearization method converges to the true increment at subsequent iterations. For a 2-day forecast (0.4 time units), of the order of four iterations are required to converge to the true time evolution of the increment, with the largest improvements when going from iteration 1 to 2. For longer lead times, more iterations are needed. This is related to the fact that the TL model produces large increments beyond the duration of the TL regime (see also Figure 7). Therefore the corrections used in the second iteration actually deteriorate the linearization trajectory at the end of the window. As a result, the second iteration is further away from the truth at the end of the optimization window, even though it is more similar to the truth. In section 7.2, we discuss a method to regularize this behaviour without affecting the fixed point of the iterated map.
6.3. Estimation using the inverse TL model
Here we examine the iterated map (28) from section 5. The action of on a vector is obtained by integrating the TL model backwards in time. In the Lorenz 96 model, the fourth-order Runge–Kutta (RK4) scheme is used to propagate the state. Theoretically the backward integration requires the use of the inverse integration scheme (which will be an implicit scheme) to ensure . Here the adjoint of the RK4 scheme is used to integrate the TL model backwards in time. In the Lorenz 96 model, we find experimentally that the angle between and is of the order $O(10^{-3})$ degrees, and the relative norm for an optimization time of 0.6 time units (3 days). So it appears that is close to the identity operator. We conclude that the adjoint RK4 scheme can be used for the inverse integrations.
Figure 4 shows the result when we iteratively solve (27) using . Even though the estimate from the first iteration differs substantially from the truth with $l_1 = 0.4$, the method quickly converges and the subsequent iterations are more similar and closer to the truth. Approximately four iterations are required to obtain an almost perfect estimate. Note that, during the inverse integration, we also obtain the corrections needed for the next iterations. Therefore the computational cost is equal to four TL integrations (backwards). This cost should be compared to the alternative of solving this estimation problem in terms of a cost function minimization (e.g. 4D-Var) where a single inner-loop iteration already involves two linear integrations (1 adjoint and 1 TL integration). For comparison, Figure 4 also includes the result when the standard TL model, i.e. , is used to propagate the increment backwards in time (the black line). If l = 0.7 is used as threshold value, then the gain of using in the first iteration is 0.13 time units (0.65 days). From the time evolution of the error norm (Figure 4(b)), we see that this gain is mainly a result of the fact that the time derivatives at t = 0.4 become exact in the TL model and thus at t = 0.4. In particular for large perturbations, we therefore expect to benefit from using .
From Figure 4, it is also clear that for long optimization windows the estimated increment at t = 0 from the first iteration becomes uncorrelated with the true increment. As a result, the nonlinear forecast starting from bears low similarity to the truth (dashed lines in Figure 4). Therefore, for long windows, the nonlinear model starting from cannot be used to update the linearization trajectory. In a forthcoming article, applying optimal linearization trajectories in the context of 4D-Var, we will show that also in 4D-Var it is better to update the linearization trajectory using the TL model.
If we use l < 0.7 to indicate the breakdown of the TL assumption, Figure 4 indicates that the TL assumption linearized around the control run is valid for 0.15 time units (i.e. from t = 0.4 to t = 0.25). This should be compared with the forward integration in Figure 3 where the value 0.7 is reached after 0.3 time units. The duration of the TL regime is shorter for inverse integrations. Partly this is a result of the fact that error growth in the backward integration is characterized by the reciprocal singular value spectrum and these values are larger than the singular values (Figure 6). Another reason is that typically and therefore the backward integration is started with larger initial conditions. The idea of using the inverse of the TL model has been studied by Pu et al. (1997a) using a method called the quasi-inverse. They reversed the sign of the dissipation terms in the TL model as a form of regularization. As will be discussed in section 7.2 on the regularized prediction experiments, there is no need for bilinear systems to add regularization when the optimal linearization trajectory is used. Therefore the amount of regularization should depend on how close we are to the optimal linearization trajectory. If the linear term b in the nonlinear model (11) is a purely dissipative term, i.e. , then the TL model can be integrated in the form
The choice α = 2 amounts to reversing the sign of the dissipation terms (compare with (13)) during the first iteration. However at subsequent iterations, at locations in space and time where the solution has converged, the unmodified TL is used.
6.4. Identification of bilinear systems
For bilinear systems, the time evolution of the increment in the TL model linearized around the trajectory given by
is equal to the time evolution according to the nonlinear model: . Therefore a necessary condition for the model ℳ to be a bilinear system is that the error norm (or equivalently the relative error norm) is zero:
However, numerical integrations will be subject to round-off error leading to non-zero values for d and Rd. To highlight different aspects, the time evolution of perturbations is examined in terms of the angle α (30) and the relative norm R (31). Note that α = 0 and R = 1 if and only if Rd = d = 0. In the following sections, we study the behaviour of Rd, α and R in the QG and Lorenz 96 model.
6.4.1. QG model
Figure 5 shows the relative error norm Rd, the relative norm R and angle α as a function of time for the QG model for 10 experiments. The control run is obtained by integrating the nonlinear model for 300 days. Continuing the integration for another 300 days yields the perturbed run. The trajectory for the second experiment starts using the final condition of the previous perturbed run and so forth. Due to the long integration times, the initial condition for the TL model is given by the difference between two uncorrelated state vectors on the model attractor and is therefore larger than typical analysis increments. For these large-amplitude perturbations, the TL approximation is valid for 1 day.
The ten experiments show exponential growth of the relative norm R after day 210. Before day 210, both α ≈ 0 and R ≈ 1 and we conclude that the TL model can be used for lead times shorter than 210 days. The time evolution of the relative error norm Rd (Figure 5(a)) shows no signal at day 210. Instead it merely indicates exponential growth beyond day 10 with an exponent of 0.148 day$^{-1}$ (standard deviation 0.005 in ten experiments) corresponding to an error doubling time of $\tau_d$ = 4.7 days. Note that this is longer than the error doubling time based on linearization of the TL model around a control run, which gives a Lyapunov exponent of σ = 0.254 day$^{-1}$ (with standard deviation 0.014 in ten experiments) and a corresponding error doubling time of 2.7 days. This is in agreement with other studies (e.g. Swanson et al., 1998), where an approximate value of 3 days is given. The increase of the error doubling time when we linearize around the average trajectory of the control and perturbed run is consistent with Hoskins et al. (2000), who determined singular vector growth using different linearization trajectories in the TL model. They found that the dominant factor for singular vector growth is the dynamic structure of the linearization trajectory and, in particular, its smoothness.
From Figure 5, we see that the time evolution of the relative error norm Rd is approximately exponential beyond day 10. This suggests that we can model the time evolution of Rd for t > 10 days by
$$Rd(t) = Rd(0)\,e^{\sigma t}.$$
The values of σ and Rd(0) are estimated using linear least squares on the experimental values of ln(Rd(t)). The solid line in Figure 5(a) shows the predictions of this model with the estimated values $Rd(0) = 2.9\times10^{-14}$ (standard deviation $2.2\times10^{-14}$ in ten experiments) and σ = 0.148 (standard deviation 0.005 in ten experiments). With the additional assumption that the error vector $\hat{\mathbf{x}} - \mathbf{x}$ is perpendicular to $\mathbf{x}$, the modelled time evolution of Rd can be used to predict values of the angle α and the relative norm R (solid lines in Figure 5(b, c)), since then $\tan\alpha = Rd$ and $R = (1 + Rd^{2})^{1/2}$. We emphasize that these solid lines are not fitted to the experimental data but are purely a result of the geometric assumption that the error vector is perpendicular to $\mathbf{x}$. Experimentally we find that the angle between $\hat{\mathbf{x}} - \mathbf{x}$ and $\mathbf{x}$ is 89.6° with a standard deviation of 8.1°.
With the assumption that the error vector is perpendicular to $\mathbf{x}$, the condition α = 45° is equivalent to the condition Rd = 1. Setting Rd = 1 in the error growth model gives the estimate
$$t_p = -\sigma^{-1}\ln Rd(0).$$
This estimate is plotted in Figure 5(a). The same estimate is obtained from α = 45° and $R = \sqrt{2}$. Note that, in the absence of round-off error, $\hat{\mathbf{x}} = \mathbf{x}$ and as such there is no reason to prefer the nonlinear over the TL integration. Therefore these results also put a predictability limit on the nonlinear model due to round-off error of 212 days.
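The fit and the derived time-scales can be reproduced with a short script. The sketch below assumes the exponential model Rd(t) = Rd(0) exp(σt) and applies it to synthetic data generated from the mean values quoted above; it does not use the experimental data, and the function names are illustrative.

```python
import numpy as np

def fit_error_growth(t, rd):
    """Least-squares fit of ln(Rd) = ln(Rd0) + sigma * t; returns (sigma, Rd0)."""
    sigma, log_rd0 = np.polyfit(t, np.log(rd), 1)
    return sigma, np.exp(log_rd0)

def doubling_time(sigma):
    """Error doubling time for an exponential growth rate sigma."""
    return np.log(2.0) / sigma

def tl_limit(sigma, rd0):
    """Time at which the modelled relative error norm reaches Rd = 1."""
    return -np.log(rd0) / sigma

# Synthetic data built from the quoted mean values (not the experimental data).
t = np.linspace(10.0, 200.0, 50)            # days
rd = 2.9e-14 * np.exp(0.148 * t)

sigma_fit, rd0_fit = fit_error_growth(t, rd)
tau_d = doubling_time(sigma_fit)
t_p = tl_limit(sigma_fit, rd0_fit)
print(sigma_fit, rd0_fit, tau_d, t_p)
```

With these inputs the doubling time is about 4.7 days and the limit is about 211 days, close to the values reported in the text. The same `doubling_time` function gives roughly 2.7 days for σ = 0.254 day⁻¹.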
6.4.2. Lorenz 96 model
In the Lorenz 96 model, we obtain the estimates and σ = 0.233 day$^{-1}$, equivalent to an error doubling time of 2.97 days. This error doubling time is longer than the estimate based on the Lyapunov exponent (2.1 days), consistent with the reduced growth of singular vectors for smooth trajectories in Hoskins et al. (2000). The figures for Rd, α and R are similar to the results for the QG model (not shown). For the Lorenz 96 model, the TL model can be used for $t_p = -\sigma^{-1}\ln Rd(0) = 152$ days.
7.1. Prospects for using the method in NWP
We have demonstrated the advantage of using the optimal linearization trajectories in the context of two simple bilinear models. Although the analysis in Appendix D shows that, independent of the order of the nonlinearities in the nonlinear model, the iteratively relinearized TL model always gives better results at convergence, to get an exact correspondence between the TL and the nonlinear model, the nonlinear model has to be bilinear. In Example 2, it was shown that it is possible to transform multilinear systems to bilinear systems by augmenting the state vector.
There are other situations where apparent ‘infinite’-order nonlinearities can be transformed to bilinear terms. Let and define $Y = e^{\alpha X}$ then and , which is a bilinear system. One difference from the reduction of multilinear systems (Appendix D) to a bilinear system in Example 2 is that in this case the newly introduced variable Y has to be a prognostic variable because the algebraic constraint is not bilinear and therefore cannot be used. Similarly it can be shown that (define Y = sin(X) and Z = cos(X)), (define Y = ln X), and (define $Y = X^{\alpha-1}$) can be written as bilinear systems. Although this does not show that realistic NWP models can be formulated as bilinear systems, it illustrates that both multilinear models and models that contain ‘infinite’-order nonlinearities can be written as a bilinear system and demonstrates that the class of bilinear systems is very general. In a forthcoming article we will show that the restriction to bilinear systems can be lifted if the TL model is linearized around an ensemble of trajectories simultaneously.
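As a worked illustration of the exponential case, suppose (an assumption made here for concreteness, since only the substitution $Y = e^{\alpha X}$ is fixed by the text) that the scalar model is $dX/dt = e^{\alpha X}$. Differentiating the new variable with the chain rule gives:

```latex
\begin{aligned}
Y &= e^{\alpha X}, \\
\frac{dX}{dt} &= Y, \\
\frac{dY}{dt} &= \alpha\, e^{\alpha X}\,\frac{dX}{dt} = \alpha Y^{2}.
\end{aligned}
```

Both right-hand sides are at most quadratic in the augmented state (X, Y), so under this assumption the augmented system is bilinear. The relation $Y = e^{\alpha X}$ then enters only through the initial condition $Y(0) = e^{\alpha X(0)}$, which is why Y has to be carried as a prognostic variable.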
7.2. Regularized relinearization in the Lorenz 96 model
The TL model produces large increments for long lead times (Figure 7). This will deteriorate the linearization trajectory for the next iterations. In principle, this can be solved by increasing the dissipation in the TL model, however in that case the solution would no longer converge to the true solution during the iterative process. Here we propose to add a term to the TL model leading to
So dissipation is added to the model, but at the same time the previous iteration is used as a forcing in the TL model. At convergence of the algorithm, $\hat{\mathbf{x}}^{k} = \hat{\mathbf{x}}^{k-1}$ and the added term becomes zero, i.e. the added term does not modify the fixed point of the iterated map T (21). In general, α could be an operator (see also section 6.3); here we only discuss the situation where α is a scalar.
Using , the first iteration is given by
where is the propagator for the TL model with α = 0. If is the singular value decomposition of , we obtain
So the added term has no impact on the singular vectors, but it changes the singular value spectrum. Let σmax(t) denote the leading singular value of . By choosing α such that
we conclude that for all . In Figure 6 we show the leading singular value as a function of the optimization time and the value for α when α is kept constant during the optimization window, $\alpha = t^{-1}\log\sigma_{\max}(t)$.
Figure 7 shows the impact of the added term by examining the norm as a function of time for α = 0 and α = 8. The iterative method still converges to the true solution, but in a more controlled manner. At the first iteration, the norm decreases monotonically as expected. At subsequent iterations, the forcing ensures that we still converge to the true solution.
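The effect of a constant scalar α on the first iteration can be illustrated with a small matrix example. The sketch assumes that, starting from a zero first guess, the regularized propagator equals exp(-αt) times the unregularized propagator (our reading of the passage above). The random matrix `M` is a stand-in for a propagator, not one obtained from the Lorenz 96 model.

```python
import numpy as np

rng = np.random.default_rng(2)
t = 0.4                                      # optimization time (illustrative)
M = 2.0 * rng.standard_normal((6, 6))        # stand-in for the propagator at time t

U, s, Vt = np.linalg.svd(M)                  # singular values in descending order

alpha = np.log(s[0]) / t                     # alpha = log(sigma_max) / t
M_reg = np.exp(-alpha * t) * M               # first-iteration regularized propagator
s_reg = np.linalg.svd(M_reg, compute_uv=False)

print(s)
print(s_reg)
```

Scaling by a positive scalar leaves the singular vectors unchanged and multiplies every singular value by exp(-αt), so with this choice of α the leading singular value becomes one and no perturbation grows in norm during the first iteration.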
In NWP models, we know that at each grid point in the integration domain the density ρ, absolute temperature T, pressure p and the specific humidity q are all positive quantities. TL integrations do not respect these types of constraints, and therefore it is possible that in the linearization trajectory some of these variables are negative. We therefore suggest the use of a projection operator P that sets negative values of ρ, T, p and q to zero and integrate the TL model in the form
Being solutions of the nonlinear model, the two nonlinear trajectories do not contain negative values for ρ, T, p and q. At convergence of the iterated map, the linearization trajectory is the average of these two trajectories and therefore does not contain negative values for ρ, T, p and q either, i.e. the projection operator does not modify the fixed point of the iterated map but ensures that during the iterations only ‘physically consistent’ trajectories are used.
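The effect of the projection can be illustrated with a minimal sketch. The function names `project_nonnegative` and `average`, the dictionary layout and all numerical values are hypothetical and serve only as an illustration; a real NWP state would be far larger.

```python
def project_nonnegative(state):
    """Hypothetical projection P: set negative values of each field to zero."""
    return {name: [max(value, 0.0) for value in values]
            for name, values in state.items()}

def average(state_a, state_b):
    """Pointwise average of two states that carry the same fields."""
    return {name: [0.5 * (a + b) for a, b in zip(state_a[name], state_b[name])]
            for name in state_a}

# Two nonlinear states (illustrative values): every field is non-negative.
control = {"rho": [1.2, 0.9], "T": [280.0, 250.0],
           "p": [1.0e5, 7.0e4], "q": [0.010, 0.000]}
perturbed = {"rho": [1.1, 1.0], "T": [282.0, 249.0],
             "p": [1.0e5, 7.1e4], "q": [0.012, 0.001]}

# An intermediate iterate built from a TL increment may contain negative humidity.
iterate = {"rho": [1.15, 0.95], "T": [281.0, 249.5],
           "p": [1.0e5, 7.05e4], "q": [0.011, -0.002]}

projected = project_nonnegative(iterate)       # the negative q value is reset to zero
fixed_point = average(control, perturbed)      # linearization trajectory at convergence
unchanged = project_nonnegative(fixed_point) == fixed_point
```

The intermediate iterate is modified by the projection, whereas the average of the two nonlinear states passes through it untouched.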
7.3. Identification of multilinear system
In section 6.4, we introduced a necessary condition (36) for a nonlinear model to have at most bilinear terms. Here we illustrate that this condition can be used to detect higher-order multilinearities.
Consider the Lorenz 96 model with modified dissipation:
where is given in Appendix B, and α ≥ 0. For α = 0 we recover the Lorenz 96 model and dissipation is linear. For α = 1 the dissipation is a purely trilinear term and dependent on the total energy in the system. The factor is introduced to ensure that the (unstable) steady-state solution for the case α = 0 is also a (unstable) steady state for α≠0. For α≠0 the additional steady-state solutions are . For 0 <α <4, the last expression gives two complex conjugate steady-state solutions which cannot be reached if we start with a real-valued initial condition. The time derivative of the total energy is
For points outside the sphere with radius , we therefore have Ė < 0 and we conclude that all trajectories eventually enter this ball and cannot escape afterwards.
We expect that, for non-zero values of α, we have and this is indeed what we observe (Figure 8). This shows that nonbilinearity can be identified based purely on the model output and might be useful in realistic NWP models where analysing the code to determine nonbilinearity might be prohibitive.
The nonlinearities in fluid dynamics that result from the advection part of the total derivative and from the use of algebraic constraints such as the ideal gas law give rise to bilinear differential equations. We have shown that for bilinear systems there exists an optimal linearization trajectory for the TL model, such that the TL model predicts the exact time evolution of the perturbations. We showed that, when the optimal linearization trajectory is used, the TL model reproduces the nonlinear evolution of perturbations for more than 200 days in a quasi-geostrophic model and more than (the equivalent of) 150 days in the Lorenz 96 model. Therefore, for bilinear systems, one of the major limitations to the application of linear models mentioned in the introduction can be eliminated by linearizing around the optimal linearization trajectory.
We introduced an iterative method that, based purely on TL integrations, converges to this optimal linearization trajectory. We showed that the optimal linearization trajectory is a fixed point of this iterative method and, using prediction experiments in the QG and Lorenz 96 models, we showed that the iterative method converges to the fixed point. In the discussion, we introduced a method to regularize the error growth in the TL model without affecting the fixed point of the iteration. The main conclusion from this article is that this iterative method can be used in estimation problems to account for nonlinearity without using the nonlinear model. In particular, when long windows are used in forecast sensitivity experiments, the estimated increment at t = 0 will be uncorrelated with the true increment and the nonlinear model cannot be used to update the linearization trajectory. Using forecast sensitivity experiments in the Lorenz 96 model where we iteratively use the inverse of the TL model, we showed that the iterative method can be used for long windows and converges quickly. Typically four iterations (computational cost equal to four integrations with the linear model) are needed to find the optimal corrections for a 2-day forecast. In a forthcoming article, we will show that the same ideas can be used in incremental 4D-Var.
We would like to thank Wim Verkley, Theo Opsteegh and two anonymous reviewers for carefully reading earlier versions of the manuscript.
A. Quasi-geostrophic model
Marshall and Molteni (1993) introduced a spectral three-level quasi-geostrophic (QG) model with global domain and pressure as the vertical coordinate. The model is truncated at wave number 21 and the model levels are at 200 (level 1), 500 (level 2) and 800 hPa (level 3). The model integrates the system
where qi is the potential vorticity (PV), ψi the streamfunction, Di are linear operators that represent dissipative terms, Si are constant PV sources and J the Jacobian of a two-dimensional field. We refer to Marshall and Molteni (1993) for a complete description of the model.
Figure A.1 shows the norm |X(t) − X(t − δt)| as a function of δt averaged over 1 year for the QG model. In Bengtsson et al. (2008, their Figure 3) a similar picture is shown for the RMSE of the geopotential height at 500 hPa for the ECMWF model but based on analyses instead of forecasts. If the trend due to seasonal variability is removed in the ECMWF model, the RMSE reaches a maximum of 110.8 m and the RMSE of analyses one day apart is 61 m, i.e. at 1 day the error is already half of the value reached for large δt. The QG model, the Lorenz 96 model (Figure B.1) and the ECMWF model therefore show similar behaviour in this respect. In both the QG model and the Lorenz 96 model, the growth of the error norm saturates at δt = 10 days.
B. Lorenz 96 model
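Lorenz (1996) introduced a simple system of the form (C.2) with $a^i_{i-2,\,i-1} = -1$, $a^i_{i-1,\,i+1} = 1$, all other $a^i_{jk} = 0$, $b^i_j = -\delta^i_j$ with $\delta^i_j$ the Kronecker delta, and $c^i = F$, giving the system
$$\dot{X}_i = -X_{i-2}X_{i-1} + X_{i-1}X_{i+1} - X_i + F, \qquad i = 1, \ldots, N,$$
where the dimension of the state vector is N and the cyclic convention $X_{i+N} = X_i$ is used. We will use the vector notation
$$\dot{\mathbf{X}} = \mathbf{q}(\mathbf{X}, \mathbf{X}) - \mathbf{X} + \mathbf{F}, \qquad \mathbf{F} = F\,\mathbf{1}, \quad \mathbf{1} = (1, \ldots, 1)^{\mathrm{T}}.$$
The nonlinear term $\mathbf{q}(\mathbf{X}, \mathbf{X})$ conserves the total energy $E = \tfrac{1}{2}\langle \mathbf{X}, \mathbf{X} \rangle$, i.e. $\langle \mathbf{X}, \mathbf{q}(\mathbf{X}, \mathbf{X}) \rangle = 0$. The linear term $-\mathbf{X}$, representing mechanical or thermal dissipation, decreases the total energy, while the constant term $\mathbf{F}$ representing external forcing prevents the total energy from decaying to zero. We imagine that $\mathbf{X}$ represents some atmospheric variable around a latitude circle and $X_i$ is the value at longitude 360i/N. In all simulations we use N = 40 and F = 8. If 1 time unit in the model is identified with 5 days, the error doubling time of the model is 2.1 days (Lorenz and Emanuel, 1998).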
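The energy statements above can be verified with a short sketch. It assumes the standard Lorenz 96 right-hand side, the forcing F = 8 and the energy definition E = ½ Σ Xᵢ²; the function names and the random test state are illustrative.

```python
import random

N = 40     # state dimension used in the text
F = 8.0    # assumed forcing (the standard Lorenz 96 choice)

def advection(x):
    """Quadratic term -X[i-2]*X[i-1] + X[i-1]*X[i+1], with cyclic indices."""
    n = len(x)
    return [x[i - 1] * (x[(i + 1) % n] - x[i - 2]) for i in range(n)]

def tendency(x, forcing=F):
    """Full right-hand side: advection - dissipation + forcing."""
    return [a - xi + forcing for a, xi in zip(advection(x), x)]

def energy(x):
    """Total energy E = 1/2 * sum(X_i^2)."""
    return 0.5 * sum(xi * xi for xi in x)

random.seed(0)
state = [random.gauss(0.0, 5.0) for _ in range(N)]

# Energy tendency contributed by the advection term alone: <X, q(X, X)>.
advective_energy_tendency = sum(xi * a for xi, a in zip(state, advection(state)))

# Full energy tendency <X, dX/dt> versus the closed form -|X|^2 + F * sum(X_i).
full_energy_tendency = sum(xi * t for xi, t in zip(state, tendency(state)))
closed_form = -2.0 * energy(state) + F * sum(state)

# The uniform state X_i = F is a steady state of the model.
steady_tendency = tendency([F] * N)
```

The advective contribution vanishes up to rounding, so only dissipation and forcing change the total energy.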
Figure B.1 shows the norm |X(t) − X(t − δt)| as a function of δt, averaged over 1 year. This should be compared with Figure A.1 for the QG model. The forecast error norm saturates after day 10 in both models. The straight line in Figure B.1 is the estimated bound, which can be derived as follows.
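The time evolution of the total energy E is given by
$$\frac{dE}{dt} = \langle \mathbf{X}, \dot{\mathbf{X}} \rangle = -|\mathbf{X}|^2 + F\,\langle \mathbf{1}, \mathbf{X} \rangle \le -|\mathbf{X}|^2 + \sqrt{N}\,F\,|\mathbf{X}| = -|\mathbf{X}|\,\bigl(|\mathbf{X}| - \sqrt{N}\,F\bigr),$$
where $\mathbf{1} = (1, \ldots, 1)^{\mathrm{T}}$ and we used the Cauchy–Schwarz inequality. If we define the closed ball
$$\mathcal{B} = \bigl\{ \mathbf{X} : |\mathbf{X}| \le \sqrt{N}\,F \bigr\},$$
then for all $\mathbf{X} \notin \mathcal{B}$ we have dE/dt < 0. For all $\mathbf{X}$ on the boundary of ℬ, we have dE/dt ≤ 0. So all trajectories that start in the interior of ℬ at t = 0 remain in this interior for t > 0. Note that the steady-state solution $\mathbf{X} = F\,\mathbf{1}$ is on the sphere.
The time derivative of the energy can also be written as
$$\frac{dE}{dt} = -\left| \mathbf{X} - \tfrac{1}{2}F\,\mathbf{1} \right|^2 + \tfrac{1}{4}N F^2 .$$
Therefore there is a sphere with radius $\tfrac{1}{2}\sqrt{N}\,F$ and centre $\tfrac{1}{2}F\,\mathbf{1}$ on which the time derivative of the total energy is zero. Again note that the steady solution is on this sphere (Figure B.2). Trajectories that start in the interior of ℬ stay in the interior for t > 0 and therefore the energy of the state is bounded as t → ∞. This is only possible if either the state asymptotically approaches this sphere, or by crossing the surface of the sphere indefinitely. In either case, this implies that the dynamics of the system takes place ‘near’ the surface of the sphere. This is indeed what we observe (Figure B.3).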
Assume now that the state vectors for large δt are uncorrelated and lie on this sphere. By symmetry considerations, the expected value for the angle between two vectors associated with two random points on an (N−1)-dimensional sphere is π/2 (Borel, 1914, where it is shown that for large N the probability density function tends to a normal distribution with mean π/2 and standard deviation of order $N^{-1/2}$) and therefore the expected distance between two random points on the sphere is $\sqrt{2}$ times its radius, i.e. $F\sqrt{N/2}$. This estimate is shown in Figure B.1. Given the simplicity of the arguments that were used in the derivation, this is a remarkably good estimate of the asymptotic behaviour of the forecast error norm.
Before each experiment, we started from a random point on the sphere and integrated for 100 days (20 time units) to allow the system to reach the attractor. All integrations were performed using an RK4 scheme with a time step of 0.01.
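The geometric estimate can be checked with a small Monte Carlo experiment. The sketch assumes F = 8 and a sphere of radius √N·F/2; since distances do not depend on the centre of the sphere, the points are sampled on an origin-centred sphere. The sample size, seed and function names are illustrative.

```python
import math
import random

N = 40
F = 8.0                            # assumed forcing value
radius = 0.5 * math.sqrt(N) * F    # radius of the sphere on which dE/dt = 0

def random_point_on_sphere(n, r, rng):
    """Uniformly distributed point on the origin-centred sphere of radius r."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [r * c / norm for c in v]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(1)
samples = 2000
mean_distance = sum(
    distance(random_point_on_sphere(N, radius, rng),
             random_point_on_sphere(N, radius, rng))
    for _ in range(samples)
) / samples

estimate = math.sqrt(2.0) * radius   # equals F * sqrt(N / 2)
```

For N = 40 and F = 8 the estimate is about 35.8, and the sampled mean distance lies within a few per cent of it.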
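A spin-up of this kind could be sketched as follows. The forcing F = 8, the choice of the sphere with centre (F/2)·1 and radius √N·F/2 as the starting set, the seed and the function names are assumptions of this sketch; the time step and the integration length follow the text.

```python
import math
import random

N, F, H = 40, 8.0, 0.01   # F = 8 is an assumption; H is the time step from the text

def tendency(x):
    """Lorenz 96 right-hand side with cyclic indices."""
    n = len(x)
    return [x[i - 1] * (x[(i + 1) % n] - x[i - 2]) - x[i] + F for i in range(n)]

def rk4_step(x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = tendency(x)
    k2 = tendency([a + 0.5 * h * b for a, b in zip(x, k1)])
    k3 = tendency([a + 0.5 * h * b for a, b in zip(x, k2)])
    k4 = tendency([a + h * b for a, b in zip(x, k3)])
    return [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]

def random_start(rng):
    """Random point on the sphere with centre (F/2)*1 and radius sqrt(N)*F/2."""
    v = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm = math.sqrt(sum(c * c for c in v))
    r = 0.5 * math.sqrt(N) * F
    return [0.5 * F + r * c / norm for c in v]

rng = random.Random(42)
state = random_start(rng)
max_norm = 0.0
for _ in range(2000):                       # 20 time units, i.e. 100 days
    state = rk4_step(state, H)
    max_norm = max(max_norm, math.sqrt(sum(c * c for c in state)))
```

Tracking `max_norm` gives a quick check that the trajectory stays inside the ball of radius √N·F discussed above.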
C. Bilinearity preserving finite-dimensional representations and time discretizations
If $\{e_i\}$ is a complete time-independent orthonormal basis of the phase space w.r.t. an inner product $\langle \cdot, \cdot \rangle$, we can write $X(t) = X^i(t)\,e_i$. Using the bilinearity of q and linearity of b, (11) can be written as
$$\dot{X}^i e_i = X^j X^k\, q(e_j, e_k) + X^j\, b(e_j) + c, \tag{C.1}$$
where we use the convention that there is an implied summation over a repeated upper and lower index in a single term. Taking the inner product of this equation with $e_i$ gives the time evolution of the coordinates $X^i(t)$:
$$\dot{X}^i = a^i_{jk}\, X^j X^k + b^i_j\, X^j + c^i, \tag{C.2}$$
where $a^i_{jk} = \langle e_i, q(e_j, e_k) \rangle$, $b^i_j = \langle e_i, b(e_j) \rangle$ and $c^i = \langle e_i, c \rangle$. We see that, if the coordinate vector is truncated at a certain index N, the truncated system is bilinear (e.g. if $\{e_i\}$ is a spherical harmonic basis). Therefore the time evolution of the coordinates $X^i$ w.r.t. a time-independent truncated orthonormal basis is given by a bilinear differential equation and the optimal linearization trajectory can be obtained by adding the coordinates.
C.1. Integration schemes
The Euler forward scheme propagates the state vector as
$$X(t+h) = X(t) + h\, f\bigl(X(t)\bigr),$$
where h is the time step. If
$$f(X) = q(X, X) + b(X) + c,$$
then the highest-order nonlinear term in the map from $X(t)$ to $X(t+h)$ is bilinear and therefore the time discretization by the integration scheme preserves the bilinearity of the underlying differential equation. This is no longer true if higher-order integration schemes are used. For these schemes, the value that is used to evaluate the right-hand side of the differential equation at intermediate time steps needs to be stored in the linearization trajectory. In the TL integration, these values from the linearization trajectory should then be used in the evaluation of the right-hand side of the TL model.
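The preservation of bilinearity under the Euler forward scheme can be illustrated with the Lorenz 96 model. In the sketch below, the TL model linearized around the average of the control and perturbed trajectories reproduces the difference between the two nonlinear runs up to rounding, whereas the TL model linearized around the control run does not. The forcing F = 8, the time step, the initial state, the size of the perturbation and all function names are illustrative assumptions; here the average trajectory is taken directly from the two nonlinear runs rather than found iteratively.

```python
N, F, H = 40, 8.0, 0.005   # illustrative values (F = 8 and H are assumptions)

def tendency(x):
    """Lorenz 96 right-hand side (bilinear + linear + constant terms)."""
    n = len(x)
    return [x[i - 1] * (x[(i + 1) % n] - x[i - 2]) - x[i] + F for i in range(n)]

def tl_tendency(ref, dx):
    """Tangent linear right-hand side, linearized around the state `ref`."""
    n = len(dx)
    return [
        -(ref[i - 2] * dx[i - 1] + dx[i - 2] * ref[i - 1])
        + (ref[i - 1] * dx[(i + 1) % n] + dx[i - 1] * ref[(i + 1) % n])
        - dx[i]
        for i in range(n)
    ]

def euler(x, dxdt):
    """One Euler forward step."""
    return [a + H * b for a, b in zip(x, dxdt)]

control = [F + 0.1 * ((i * 7) % 5 - 2) for i in range(N)]
perturbed = list(control)
perturbed[N // 2] += 1.0                  # a large, clearly nonlinear perturbation
dx_opt = [p - c for p, c in zip(perturbed, control)]
dx_std = list(dx_opt)

for _ in range(200):
    # Candidate optimal linearization trajectory: average of the two runs.
    average = [0.5 * (c + p) for c, p in zip(control, perturbed)]
    dx_opt = euler(dx_opt, tl_tendency(average, dx_opt))
    dx_std = euler(dx_std, tl_tendency(control, dx_std))   # standard TL model
    control = euler(control, tendency(control))
    perturbed = euler(perturbed, tendency(perturbed))

truth = [p - c for p, c in zip(perturbed, control)]
error_opt = max(abs(d - t) for d, t in zip(dx_opt, truth))
error_std = max(abs(d - t) for d, t in zip(dx_std, truth))
```

With a higher-order scheme such as RK4, the same check would require storing the intermediate stage values in the linearization trajectory, as described in the text.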
D. Multilinear systems
Definition 5: Multilinear map
A map $q_n(x_1, \ldots, x_n)$ of n arguments is called multilinear if it is linear in each argument.
Definition 6: Symmetric multilinear map
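For a given multilinear map $q_n$ we define a symmetric map $s_n$ by
$$s_n(x_1, \ldots, x_n) = \frac{1}{n!} \sum_{\pi} q_n\bigl(x_{\pi(1)}, \ldots, x_{\pi(n)}\bigr), \tag{D.1}$$
where the summation is over all possible permutations π of the arguments $x_1, \ldots, x_n$.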
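Consider the general form of a multilinear system with at most Nth-order multilinearities:
$$\dot{X} = \sum_{n=0}^{N} q_n(X, \ldots, X), \tag{D.2}$$
where $q_0$ is the forcing term in the model. Substitution of $X + x$ and using (D.2) gives
$$\dot{x} = \sum_{n=1}^{N} \bigl[ q_n(X + x, \ldots, X + x) - q_n(X, \ldots, X) \bigr].$$
Using Definition 6 and Newton's binomial theorem, this can be written as
$$\dot{x} = \sum_{n=1}^{N} \sum_{k=1}^{n} \binom{n}{k}\, s_n\bigl(X^{n-k}, x^{k}\bigr), \tag{D.3}$$
where $s_n(X^{n-k}, x^{k})$ denotes $s_n$ evaluated with n − k arguments equal to X and k arguments equal to x. The sum over k starts from k = 1 because the terms with only upper-case Xs are cancelled. The summation over n starts from n = 1 because the constant term is cancelled. Retaining only the terms linear in x (terms with k = 1) gives the TL model
$$\dot{x} = \sum_{n=1}^{N} n\, s_n\bigl(X^{n-1}, x\bigr).$$
If we iteratively relinearize the TL model around the trajectory $X + \tfrac{1}{2}x$, we get at convergence of the algorithm a unique increment that satisfies
$$\dot{x} = \sum_{n=1}^{N} n\, s_n\bigl((X + \tfrac{1}{2}x)^{n-1}, x\bigr).$$
Using Newton's binomial theorem, this can be written as
$$\dot{x} = \sum_{n=1}^{N} n \sum_{k=0}^{n-1} \binom{n-1}{k}\, 2^{-k}\, s_n\bigl(X^{n-1-k}, x^{k+1}\bigr).$$
Shifting the summation over k by 1 gives
$$\dot{x} = \sum_{n=1}^{N} \sum_{k=1}^{n} n \binom{n-1}{k-1}\, 2^{1-k}\, s_n\bigl(X^{n-k}, x^{k}\bigr),$$
which can also be written as
$$\dot{x} = \sum_{n=1}^{N} \sum_{k=1}^{n} k\, 2^{1-k} \binom{n}{k}\, s_n\bigl(X^{n-k}, x^{k}\bigr). \tag{D.6}$$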
Table D.I shows the coefficients of the exact and the relinearized time evolution of perturbations at convergence of the algorithm.
Table D.I. Coefficients for the exact (D.3) and the relinearized (D.6) time evolution of perturbations (panels: exact time evolution; relinearized time evolution).
The normal TL model has non-zero values only in the first column. Therefore we see that the relinearized model takes into account all linear terms but also all quadratic terms in the perturbation. For terms higher than quadratic in the perturbation, the relinearized model multiplies the exact coefficient by a factor $k2^{1-k}$. This is a number between 0 and 1, and the resulting coefficient is therefore always closer to the exact coefficient than the value zero used in the standard TL model. We therefore conclude that the relinearization iteration will always give better approximations than the standard TL model at convergence of the algorithm.
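The factor $k2^{1-k}$ can be tabulated exactly with rational arithmetic. The sketch assumes that the exact coefficient of the term with k perturbation arguments in an n-linear map is the binomial coefficient C(n, k); the truncation order and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def exact_coefficient(n, k):
    """Coefficient of the term with k perturbation arguments in an n-linear map."""
    return Fraction(comb(n, k))

def relinearized_coefficient(n, k):
    """Coefficient after relinearization: k * 2**(1 - k) * C(n, k)."""
    return Fraction(k, 2 ** (k - 1)) * comb(n, k)

MAX_ORDER = 5   # illustrative truncation, not taken from the source

# Ratio of relinearized to exact coefficient for each order k of the perturbation.
ratios = {
    k: relinearized_coefficient(MAX_ORDER, k) / exact_coefficient(MAX_ORDER, k)
    for k in range(1, MAX_ORDER + 1)
}
```

The ratio equals one for k = 1 and k = 2 and decreases for higher orders, in line with the statement that linear and quadratic terms are captured exactly.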
We have introduced a minus sign in the term so that potential energy is increasing with increasing height.