Breaking/saturating gravity waves (GWs) not only exert drag on the mean flow due to their momentum deposition but also affect the background thermally because of the associated energy flux divergence. We present a rigorous derivation of terms describing the thermal effects of GWs on the mean flow, based on the corresponding energy cycle for wave/mean flow interactions. The combined effect of saturating GWs is to produce both differential heating and cooling by inducing a downward wave heat flux, and an irreversible conversion of wave energy into heat. The former effect can also be represented as thermal diffusion acting on the mean potential temperature gradient. This rigorous theory for the thermal exchange between waves and the mean flow can be closed once the mechanism for GW dissipation is parameterized. To illustrate the procedure, we employ our recent nonlinear theory of GW spectra to derive expressions for the wave-induced heating rates. This yields a parameterization of the thermal effects of GWs, which is suitable for use in general circulation models and requires the source GW spectrum as the only tunable parameter. We present results of numerical calculations of wave heating terms for typical wind and temperature profiles as well as simulations with the full-scale Canadian Middle Atmosphere Model (CMAM).
 The role of vertically propagating gravity waves (GW) in maintaining the circulation of the middle atmosphere is now widely recognized. In recent years most studies have primarily been concerned with estimates of momentum deposition, or the so-called gravity wave drag (GWD) exerted by dissipating and/or breaking GW. Relatively few studies have rigorously addressed the effect of the associated divergence of wave energy fluxes on the thermal structure of the mean circulation.
 Waves may affect the thermal structure of the atmosphere through several mechanisms. One such process is associated with the divergence of the wave's vertical flux of sensible heat, which arises because damping associated with GW breaking and/or saturation alters the phase relationship between fluctuations of temperature and vertical velocity. Explicit expressions for the corresponding heating rates have been derived by Walterscheid for molecular diffusion, and approximately by Weinstock and Gardner for diffusion associated with nonlinear interactions between GW harmonics. They all found that typical mesospheric magnitudes of downward wave heat fluxes, $\overline{w'T'}$, are several K m s−1, and that the corresponding flux divergence produces strong cooling rates of up to several tens of K day−1 in the mesosphere. Although these numbers are difficult to reconcile with the much lower estimates of the monthly mean heat budget of the mesosphere [Schoeberl et al., 1983], recent measurements of wave heat fluxes and their gradients with a wind-temperature lidar [Gardner and Yang, 1998] confirm the larger magnitudes estimated for individual profiles. It is worth mentioning here that the cited papers emphasized a wave-induced cooling of the mesosphere produced by this mechanism. However, the divergence of $\overline{w'T'}$ implies that this cooling must be balanced by heating somewhere in the vertical column, because the fluxes themselves vanish at the lower and upper atmospheric boundaries, where waves are either assumed to propagate conservatively or to be absent. Measurements by Gardner and Yang revealed both heating and cooling due to this mechanism. To our knowledge, none of the other existing practical GW parameterizations used in middle atmosphere models includes differential heating due to GW.
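The column-balance argument above can be illustrated numerically. The following is a minimal sketch, not the parameterization developed in this paper: it assumes a hypothetical Gaussian profile of downward heat flux, an isothermal background density with an assumed scale height, and a heating rate given only by the density-weighted flux divergence. All numerical values are illustrative assumptions.

```python
import numpy as np

H = 7.0e3                     # assumed scale height, m
rho_s = 1.2                   # assumed reference density, kg m^-3
z = np.linspace(0.0, 120.0e3, 2401)   # height grid, m
rho0 = rho_s * np.exp(-z / H)         # background density profile

# Hypothetical downward heat flux w'T' (K m s^-1): peaks near 85 km and is
# effectively zero at the lower and upper boundaries.
flux = -2.0 * np.exp(-((z - 85.0e3) / 8.0e3) ** 2)

# Heating rate (K s^-1) from the density-weighted flux divergence
# (assumed form: -(1/rho0) d(rho0 * flux)/dz).
heating = -np.gradient(rho0 * flux, z) / rho0
heating_K_per_day = heating * 86400.0


def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


# Mass-weighted column integral of the heating rate: it vanishes when the
# flux vanishes at both boundaries, so cooling aloft is offset by heating below.
column = trapezoid(rho0 * heating, z)
print("max heating  (K/day):", heating_K_per_day.max())
print("max cooling  (K/day):", heating_K_per_day.min())
print("column integral / integral of |rho0*heating|:",
      column / trapezoid(np.abs(rho0 * heating), z))
```

With any flux profile that vanishes at both ends of the column, the printed ratio should be close to zero, while the heating rate itself changes sign with height.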
 Another mechanism through which GWs may affect the thermal structure of the mean flow is the turbulent diffusion of heat. Some aspects of the early development of the concept of induced turbulent diffusion and heat transfer have been reviewed by Ebel. The turbulence appears as stochastic wind fluctuations associated with breaking gravity waves. Early models [Hodges, 1967; Lindzen, 1981] postulated that the damping rate associated with this turbulence was precisely that needed to prevent the exponential growth of wave amplitude with height, i.e., to induce saturation of the GW harmonic. The turbulence associated with wave damping also affects the background wind (momentum deposition) and temperature (irreversible conversion of kinetic to internal energy). Thus both the drag and the heating depend strongly on the diffusion parameters. Earlier research [Strobel et al., 1985; Garcia and Solomon, 1985] implied that this wave-induced turbulence equally affects disturbances and the mean flow, and that the diffusive coefficients which appear in the respective equations for the mean and fluctuating quantities are equal. In other words, the diffusion coefficient $D_{GW}^{(u)}$, which relates diffusive fluxes of GW momentum to gradients of the GW wind field, $F_{GW} = \rho D_{GW}^{(u)} \nabla u'$, was considered to be equal to $\bar{D}^{(u)}$, the latter relating diffusive fluxes to gradients of the mean wind, $\bar{F} = \rho \bar{D}^{(u)} \nabla \bar{u}$. A similar assumption was invoked for the potential temperature diffusion coefficients, namely $D_{GW}^{(\theta)} = \bar{D}^{(\theta)}$. While this hypothesis may represent a reasonable first step, it is clearly questionable on physical grounds, since the effects of a given turbulent field on motions of disparate scales are generally different. This latter property is recognized in theories of scale-dependent diffusion [Weinstock, 1990; Gardner, 1994; Medvedev and Klaassen, 1995], in which the corresponding coefficients associated with saturating or breaking GWs are different for each harmonic in the broad wave spectrum. In line with these arguments, the assumption $\bar{D}^{(u,\theta)} = D_{GW}^{(u,\theta)}$ has little physical justification.
 A related problem in the parameterization of cooling rates due to eddy thermal conduction is the uncertainty regarding the eddy or turbulent Prandtl number, $Pr = \bar{D}^{(u)}/\bar{D}^{(\theta)}$. The assumption of Pr = 1, in conjunction with uniform Hodges-Lindzen values of $D_{GW}$ required to offset exponential wave growth and the implicit assumption that $\bar{D} \equiv D_{GW}$, has yielded excessively large values of associated diffusive cooling in the mesosphere [Schoeberl et al., 1983; Apruzese et al., 1984]. Chao and Schoeberl, Fritts and Dunkerton, and Coy and Fritts suggested that turbulence localized to the convectively unstable region of the GW may increase the turbulent Pr, leading to a reduction of eddy heat conductivity. This approach is physically justifiable for the case of a single breaking wave, and has allowed the derivation of the dependence of the eddy Prandtl number on GW parameters.
 For the case of a broad nonlinear spectrum, the derivation of the dependence of the eddy Prandtl number on GW parameters is considerably more difficult, and to our knowledge has not yet been attempted: nonlinear interactions create localized regions of instability which are spread throughout the wave field, rather than being confined to a particular wave phase. Thus for practical applications in general circulation models, the turbulent Prandtl number is usually treated as a tunable parameter. In the absence of definitive measurements or calculations of the turbulent Prandtl number for a strongly interacting wave spectrum, we shall assume its value to be unity. This assumption is consistent with the widespread instabilities expected in a broad spectrum, and we shall demonstrate that this value produces reasonable heating rates when used in conjunction with our gravity wave drag parameterization.
 The third effect of breaking gravity waves on the mean flow is the irreversible conversion of eddy kinetic energy into heat. The corresponding heating rate, $\varepsilon_{GW} \equiv (\partial \bar{T}/\partial t)_{GW}$, can be related to the vertical component of the diffusion coefficient by $c_p \varepsilon_{GW} \approx [D_{GW}^{(u)}]_{ZZ} (\partial u'/\partial z)^2$, where the capital subscript “ZZ” indicates the vertical diagonal component of the diffusion tensor [see Ebel, 1984]. This formula has been used to estimate GW-induced heating in combination with various approaches for defining $[D_{GW}^{(u)}]_{ZZ}$. For example, Schoeberl et al. and Gavrilov (e.g., the latter's equation (24b)) employed Hodges-Lindzen-style approximations for the diffusion coefficient, while Gavrilov and Yudin utilized a semiempirical turbulence closure hypothesis for this purpose. A somewhat different approach was taken by Hines, who invoked scaling arguments from the theory of homogeneous turbulence to relate ε to $[\bar{D}^{(u)}]_{ZZ}$. It should be noted that Hines does not distinguish between $[D_{GW}^{(u)}]_{ZZ}$ and $[\bar{D}^{(u)}]_{ZZ}$.
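To give a feel for the magnitudes implied by the relation between the heating rate and the vertical diffusion coefficient, the following minimal sketch evaluates it for a single sinusoidal wave. The diffusion coefficient, wave amplitude, and vertical wavelength are illustrative assumptions rather than values from this paper, and averaging the squared shear over one vertical wavelength is an assumption of the sketch.

```python
import numpy as np

cp = 1004.0          # specific heat at constant pressure, J kg^-1 K^-1
D_zz = 100.0         # assumed vertical diffusion coefficient, m^2 s^-1
u_amp = 30.0         # assumed wave wind amplitude, m s^-1
lambda_z = 10.0e3    # assumed vertical wavelength, m

m = 2.0 * np.pi / lambda_z                       # vertical wavenumber, rad m^-1
z = np.linspace(0.0, lambda_z, 2000, endpoint=False)
dudz = u_amp * m * np.cos(m * z)                 # shear of u' = u_amp * sin(m z)

# Squared shear averaged over one vertical wavelength (assumed averaging).
mean_shear_sq = np.mean(dudz ** 2)

# Heating rate from cp * eps ~ D_zz * (du'/dz)^2, converted to K per day.
eps_K_per_s = D_zz * mean_shear_sq / cp
eps_K_per_day = eps_K_per_s * 86400.0
print(f"frictional heating estimate: {eps_K_per_day:.2f} K/day")
```

Because the estimate scales linearly with the diffusion coefficient and quadratically with the shear, it is very sensitive to how the diffusion coefficient is defined, which is the point made in the text.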
 Several studies have bypassed the direct calculation of turbulent diffusion coefficients by applying linear wave considerations to estimate the dissipated kinetic energy. Such estimates are based on the linearized relation between vertical fluxes of energy, Fe, and momentum, Fm, for steady conservative waves, namely Fe = cFm [Jones, 1971]. Here c is the observed horizontal phase velocity of the wave, and the lower case subscripts e and m denote energy and momentum, respectively. Assuming that the mean vertical velocity $\bar{w} = 0$ (so there is no energy flux by mean motions), the equation for the conservation of the horizontally averaged total energy, $\bar{K}(z, t) + \bar{P}(z, t)$, can be written as
$$\frac{\partial}{\partial t}\left(K_M + K_E + P_M + P_E\right) = -\frac{\partial F_e}{\partial z},$$
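where we have further subdivided the kinetic ($\bar{K}$) and potential ($\bar{P}$) energies into mean (M) and wave/eddy (E) components. Invoking the steady wave approximation ∂(KE + PE)/∂t = 0, together with Fe = cFm, this becomes
$$\frac{\partial}{\partial t}\left(K_M + P_M\right) = -c\,\frac{\partial F_m}{\partial z} = \rho_0 c a.$$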
Here we have explicitly recognized that ρ0a = − ∂Fm/∂z represents the wave-induced horizontal drag acting on the flow. The following relation for the mean kinetic energy tendency
$$\frac{\partial K_M}{\partial t} = \rho_0 \bar{u} a$$
can be established independently from the mean momentum equation. It has been argued that the difference between these last two equations, $\rho_0 (c - \bar{u}) a$ (which is positive because $a$ and $c - \bar{u}$ have the same sign), represents ∂PM/∂t, or the rate at which wave energy is converted to heat. Arguments of this kind have been used by Lindzen, Gavrilov, and Hines. This approach is utilized to infer the diffusion coefficients either from polarization relations for GW [e.g., Gavrilov and Roble, 1994], or by employing scaling arguments from turbulence theory [e.g., Hines, 1997]. However, it must be noted that the relation between Fe and Fm for nonlinear, dissipative waves is more complex (see (38)), and that arguments based on linear wave considerations should be used with appropriate caution.
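The linear-wave estimate discussed above is easy to evaluate. The sketch below converts the energy rate per unit mass, (c − ū)a, into a temperature tendency by dividing by cp. The function name and the drag and wind values are hypothetical choices for illustration only.

```python
CP = 1004.0               # specific heat at constant pressure, J kg^-1 K^-1
SECONDS_PER_DAY = 86400.0


def linear_wave_heating(c, u_mean, drag):
    """Heating rate (K s^-1) from the linear-wave argument, (c - u_mean) * a / cp.

    c      : horizontal phase velocity, m s^-1
    u_mean : mean wind, m s^-1
    drag   : wave-induced acceleration a, m s^-2
    """
    return (c - u_mean) * drag / CP


# Illustrative eastward-propagating wave in a westward mean wind.
drag_east = 100.0 / SECONDS_PER_DAY          # assumed drag of 100 m/s/day
rate_east = linear_wave_heating(20.0, -10.0, drag_east) * SECONDS_PER_DAY

# Mirror case: westward wave in an eastward wind, with westward drag.
rate_west = linear_wave_heating(-20.0, 10.0, -drag_east) * SECONDS_PER_DAY

print(f"eastward case: {rate_east:.2f} K/day")
print(f"westward case: {rate_west:.2f} K/day")
```

Both cases give the same positive heating, reflecting the statement that a and c − ū have the same sign. As the text cautions, this estimate rests on the linear relation Fe = cFm and should not be read as the result of the nonlinear theory developed later.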
 The purpose of this paper is to consider, in a self-consistent analysis, the energy conversion between gravity waves and the mean flow, and to derive rigorous expressions describing the thermal effects of GWs on the background fields by considering the full nonlinear energy budget. We suggest a means of representing these interactions for saturating waves, which is sufficiently general to be adapted for use with many GWD parameterizations and general circulation models (GCMs). We also provide examples and results using our own GWD scheme [Medvedev and Klaassen, 1995, 2000] (hereafter MK95 and MK00, respectively), in order to demonstrate typical patterns of heating/cooling rates associated with GW dissipation. A detailed investigation of the effects of GW heating on the middle atmosphere climate is beyond the scope of this paper.
 The paper is organized as follows. In section 2 we present a budget for the total spatially averaged energy of the flow, and derive general expressions for the evolution of the GW component and the horizontally averaged background, subject only to the hydrostatic and nonrotating approximations. In section 3 we consider the conversion of energy between waves and the mean flow (i.e., the “energy cycle”). The steady-wave approximation is introduced in section 4. A suitably general closure for the heating/cooling rates associated with breaking/saturated GWs is proposed in section 5. In section 6 we apply the MK spectral GWD parameterization to the general formulae obtained in the previous section to calculate wave heating rates for typical profiles of mean wind and temperature in a column model of GWD. In section 7 we present estimates of temperature tendencies due to GWs using simulations with the Canadian Middle Atmosphere Model. Conclusions are given in section 8.
2. Basic Equations
 In this section we consider the effects of subgrid-scale gravity waves (GW) on the mean flow, by deriving a self-consistent set of equations governing the exchange of energy between the wave and background components. As is common with contemporary theories of GW/mean flow interactions, we take as our starting point the nonlinear, nonrotating, hydrostatic momentum, continuity, and thermodynamic equations in log-pressure coordinates [cf. Andrews et al., 1987, sections 3.1.1 and 3.1.2; Plumb, 1983].
In (2a)–(2e), v = (u, v) and w are the horizontal and vertical velocities, respectively, Φ is the geopotential, T is the temperature, R is the gas constant, H = RTs/g is the scale height (Ts being the constant reference temperature), g is the acceleration of gravity, and d/dt ≡ ∂/∂t + u∂/∂x + v∂/∂y + w∂/∂z is the material derivative. The vertical coordinate is z = −H ln(p/ps), where ps is the reference pressure and p is the local pressure. The forcing terms on the right-hand sides of (2a)–(2e), namely L = (LX, LY, 0) and Q, denote the dissipative momentum forces and the diabatic heating rate, respectively. Temperature T is related to the potential temperature θ by means of
$$\theta = T \exp(\kappa z/H), \qquad (2f)$$
where κ = R/cp, cp is the specific heat at constant pressure. The background density profile ρ0 is defined as ρ0 = ρs exp(−z/H), where ρs is the density at the reference height z = 0.
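The coordinate definitions above translate directly into a few helper functions. This is a minimal sketch: the function names are hypothetical, the reference temperature and pressure are assumed values, and the reference density is taken from the ideal gas law at the reference state, which is an assumption of the sketch.

```python
import numpy as np

R = 287.0            # gas constant for dry air, J kg^-1 K^-1
CP = 1004.0          # specific heat at constant pressure, J kg^-1 K^-1
G = 9.81             # acceleration of gravity, m s^-2
KAPPA = R / CP

T_S = 240.0          # assumed constant reference temperature, K
P_S = 1.0e5          # assumed reference pressure, Pa
H = R * T_S / G      # scale height, m
RHO_S = P_S / (R * T_S)   # assumed reference density (ideal gas), kg m^-3


def log_pressure_height(p):
    """Log-pressure height z = -H ln(p / p_s), in m."""
    return -H * np.log(p / P_S)


def background_density(z):
    """Background density rho0 = rho_s exp(-z / H), in kg m^-3."""
    return RHO_S * np.exp(-z / H)


def potential_temperature(T, z):
    """Potential temperature theta = T exp(kappa z / H), in K."""
    return T * np.exp(KAPPA * z / H)


p = np.array([1.0e5, 1.0e4, 1.0e3, 1.0e2, 1.0])   # pressures, Pa
z = log_pressure_height(p)
print("scale height (km):", H / 1e3)
print("log-pressure heights (km):", z / 1e3)
print("rho0 (kg m^-3):", background_density(z))
print("theta for T = 220 K:", potential_temperature(220.0, z))
```

In these coordinates the expression for θ is equivalent to the familiar T(ps/p)^κ, which the sketch can be used to confirm.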
 The substantive difference from the equations of Plumb is that frictional heating, represented by −v · L/cp, has been included in the thermodynamic equation (2e). This term represents the irreversible conversion of kinetic energy into internal energy or heat, and its exact form for the case of Newtonian viscous forces is given by Batchelor [1967, (3.4.5)]. (Incidentally, temperature changes resulting from the mechanical stirring of fluids were used by Joule [1850] to establish the concept of heat as a form of energy.) While the term Q describes diabatic energy gains or losses by the volume as a whole, dissipative forces associated with the term L merely convert kinetic energy into heat within the fluid volume. Without the addition of this frictional heating term to the right-hand part of (2e), an unbalanced sink would appear on the right side of the total energy equations (6) and (7). (See Batchelor [1967, section 3.4, p. 151] and Chandrasekhar [1981, section 7, pp. 13–16] for further discussions.) Although frictional heating is often omitted in large-scale studies, it is not negligible in the present application, and its parameterization must be linked in a self-consistent way to that of the eddy stress divergence terms which represent GW damping and mean-flow acceleration. Note that the term represented by −v · L/cp in the thermodynamic equation must be entirely associated with the irreversible conversion of kinetic energy into heat. The form of this term means that the frictional forcing L included in the momentum equation does not have a diffusive character, but rather represents a localized deformation force. There are dissipative forces in fluids which contribute to the transport of mechanical energy as well, e.g., molecular or turbulent diffusion. The absence of diffusion in L does not imply any loss of generality, since turbulent transport correlations will appear from the nonlinear terms in due course. Further discussion may be found in section 3.
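For orientation, one standard way of writing a nonrotating, hydrostatic system in log-pressure coordinates that is consistent with the symbol definitions above is sketched below. This is an assumed form and not a reproduction of the authors' equations (2a)–(2e): the ordering follows the later references to (2a)–(2e), and the normalization of Q by cp is an assumption inferred from the appearance of the frictional heating term as −v · L/cp.

```latex
\begin{align}
\frac{du}{dt} + \frac{\partial \Phi}{\partial x} &= L_X, \\
\frac{dv}{dt} + \frac{\partial \Phi}{\partial y} &= L_Y, \\
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}
  + \frac{1}{\rho_0}\frac{\partial (\rho_0 w)}{\partial z} &= 0, \\
\frac{\partial \Phi}{\partial z} &= \frac{RT}{H}, \\
\frac{dT}{dt} + \frac{\kappa T w}{H} &= \frac{Q}{c_p}
  - \frac{\mathbf{v}\cdot\mathbf{L}}{c_p}.
\end{align}
```

In this assumed form, the work done by the dissipative force in the momentum equations is returned as heat in the thermodynamic equation, so that the two contributions cancel when the kinetic and potential energy equations are added.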
 To obtain the kinetic energy (KE) equation, we multiply (2a) by ρ0u, (2b) by ρ0v, (2d) by ρ0w, add them together and use (2c). Introducing the vector notation v ≡ (u, v, w) and ∇ ≡ (∂/∂x, ∂/∂y, ∂/∂z), this operation yields
In (3) we have defined the kinetic energy per unit volume, K, in the usual way:
The specific enthalpy, cpT = cvT + RT, is the local measure of total potential energy (internal plus gravitational), P ≡ ρ0cpT, in the hydrostatic log-pressure coordinate system [Plumb, 1983]. The equation for potential energy (PE) is obtained from (2e) by multiplying it by ρ0cp with the use of (2c)
Adding (3) and (5), we obtain the equation for total energy
where (K + P + ρ0Φ)v represents the total energy flux vector. As seen from (6), the net energy change in an infinitesimal volume is equal to the sum of the work done by body forces, associated with the flux term in the left-hand part, and the internal energy change due to diabatic heating/cooling. The work done by dissipative forces and the adiabatic cooling/heating due to vertical motions represent exchanges between the kinetic and potential reservoirs, and consequently neither term appears in (6). The local energy transfers associated with the system (3), (5) and (6) are illustrated in the box diagram of Figure 1.
 Integrating (6) over x, y, and z and requiring the fluxes at the bounding surfaces of the fluid volume to be either zero or periodic (inflow matches outflow), we obtain the net energy balance over the volume
where the angle brackets denote averaging over the coordinates corresponding to the subscripts. The averaging operator introduced above can be applied to either a global spherical shell or to a vertical column based on a grid element of a model. The latter approach is the most relevant to the specific interest of this paper.
 In order to examine the energy transferred through a horizontal surface, we shall separate all field variables in (2a)–(2e) into horizontally averaged “mean” quantities,
In the context of the present paper, “mean” quantities can be viewed as being averaged over a horizontal grid element of a large-scale numerical model. Then the deviations can be associated with subgrid-scale gravity waves. We shall also assume that the net horizontal fluxes into or out of the grid elements are negligible. (The analysis however is completely general in the global context, e.g., when applied to a spherical atmospheric surface.) Expanding all variables in (2a)–(2e) in this fashion and applying the horizontal averaging operator, we obtain for the mean motions
It is seen that (8a)–(8c) describe the forcing of the mean circulation by waves through the eddy terms in the right-hand parts of the mean momentum and thermodynamic equations. In the derivation of (8a)–(8c) we have assumed the condition $\bar{w} \equiv 0$, which follows from the horizontally averaged continuity equation together with the assumption that there is no net mass flux through the lower or upper boundaries of the column.
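The separation into horizontally averaged quantities and deviations can be sketched on a small synthetic grid. The fields below are hypothetical (a linear mean shear plus one sinusoidal wave), and the helper names are illustrative; the point is only to show that the deviations average to zero while their products, the eddy fluxes, do not.

```python
import numpy as np

nx, nz = 128, 50
x = np.linspace(0.0, 2.0 * np.pi, nx, endpoint=False)   # periodic horizontal coordinate
z = np.linspace(0.0, 1.0, nz)                            # nondimensional height
X, Z = np.meshgrid(x, z, indexing="ij")

# Hypothetical fields: mean shear plus a single wave, with w shifted in phase.
u = 10.0 * Z + 2.0 * np.cos(3.0 * X + 4.0 * Z)
w = 0.5 * np.cos(3.0 * X + 4.0 * Z + 0.3)


def horizontal_mean(field):
    """Average over the horizontal (first) axis, leaving a profile in z."""
    return field.mean(axis=0)


def perturbation(field):
    """Deviation from the horizontal mean."""
    return field - horizontal_mean(field)


u_bar = horizontal_mean(u)
u_prime = perturbation(u)
w_prime = perturbation(w)

# Eddy momentum flux: the horizontal mean of the product of deviations.
flux_uw = horizontal_mean(u_prime * w_prime)

print("max |mean of u'|:", np.abs(horizontal_mean(u_prime)).max())
print("eddy flux u'w' at the lowest level:", flux_uw[0])
```

In a model context the same averaging would be taken over a horizontal grid element, with the deviations attributed to subgrid-scale gravity waves.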
 Governing equations for the eddy component can be obtained by subtracting the “mean” equations (8a)–(8c) from (2a)–(2e). For the GW case, these equations can be simplified further if we assume that the waves are propagating in a vertical plane aligned along the direction of the mean wind, i.e., $\bar{\mathbf{v}} = (\bar{u}, 0, 0)$. In this situation, (8a)–(8c) reduce to:
(Note that the overbar now represents an average over x only, and L has replaced LX.) Correspondingly, for the wave variables we obtain:
is the buoyancy frequency squared. For convenience we have introduced lower case subscripts to denote partial derivatives in (10a)–(10e). The following quadratic terms in the right-hand parts of (10a)–(10e) describe the effects of nonlinear interactions between waves
where for compactness, we have used the perturbation potential temperature, θ′, defined in accord with (2f). Note that the lower-case subscripts x and z in (10a)–(11b) indicate partial derivatives rather than vector components. If we put J′ = L′ = M′ = Q′ = 0, (10a)–(10e) represent the set of wave equations frequently employed in gravity wave drag schemes. In order to close this system of equations M′ and J′ must be specified, e.g., by means of a GW parameterization.
3. Energy Cycle
 In this section we shall consider the energy exchange between waves and the mean component of the flow. This energy cycle has been considered by many authors, and we shall follow the general route adopted by Holton  and Plumb . However, in contrast to the cited works, we shall place particular emphasis on the dissipative and nonlinear interaction terms, as well as vertical transfers. In this regard, we note that neither Holton  nor Plumb [1983, equation (3.8f)] considered nonlinear terms, nor did they include the term v · L in the thermodynamic equation. While this latter exchange term is generally ignored for large-scale flows, it is nevertheless a key component of the wave-breaking process which ultimately converts kinetic to potential energy and must be suitably coupled to the nonlinear conversion terms responsible for the cascade of energy to small scales.
 To obtain energy equations corresponding to (9a)–(10e), we must first separate the kinetic and potential energies into mean (“M”) and eddy (“E”) components. For kinetic energy, we have
where $\bar{K}$ is the horizontally averaged kinetic energy. Similarly to (12), we define a separation for the horizontally averaged potential energy, $\bar{P}$:
Since the potential energy associated with waves is always available for adiabatic conversion into kinetic energy, we define PE using the concept of available potential energy [cf. Holton, 1975; Plumb, 1983]:
 Proceeding in the usual way, we multiply (10a) by ρ0u′, (10c) by ρ0w′, add the results, and after averaging over x with the help of (10b) obtain
Note that the triple correlation terms involving M′ and L′ which appear in equation (15) were neglected in Plumb's analysis, under the assumption that perturbations were small. In section 5 we will present a parameterization or closure for these terms. We have rearranged equation (15) so that the terms involving the Reynolds stresses represent a flux divergence source of eddy kinetic energy and a transfer from KE to KM. Although this rearrangement can be viewed as somewhat arbitrary, it has the intuitive advantage that eddy terms are not associated with the mean flux of kinetic energy [Plumb, 1983]. In the present application, the mean vertical motion is assumed to vanish, so that there is no vertical flux of mean kinetic energy.
 The rate of change of eddy potential energy, PE, can be obtained by multiplying (10d) by $\rho_0 R^2 T'/(H^2 N^2)$, neglecting temporal variations in the background stratification, $\partial N^2/\partial t$ (since the timescale of waves is assumed to be much smaller than that of the mean motions), and averaging over x:
Multiplication of (9a) by ρ0 yields for the mean kinetic energy
Multiplying (9b) by ρ0cp we obtain the rate of change of $\bar{P}$, and use (13) and (16) to derive for the mean potential energy:
There are two terms appearing in (15)–(18) which depend on the speed of the frame of reference through the factor $\bar{u}$. Thus while the mean kinetic and mean potential energy tendencies are frame-dependent, their sum (the total mean energy tendency) remains invariant under Galilean transformations.
 Summation of (15)–(18) gives the equation for the net energy balance of the system
Then the left-hand term represents the rate of change of the total energy of a horizontal slab of infinitesimal vertical thickness dz. The first term in the right-hand side of (19) is associated with the net work done by stresses on the right-hand sides of (9a) and (9b), i.e., it represents the difference between the influx and outflux of energy through the upper and lower boundaries of the slab. The term $\rho_0 \bar{Q}$ defines diabatic losses/gains by the volume. The last term in (19) describes the rate of work done by the stresses in the right-hand part of (10a). It can be shown (see Appendix A) that this term also has flux form:
This means that the last term in (19), which describes nonlinear interactions between waves, contributes to the net energy flux. Being advective in nature, these nonlinear interactions cannot change the total energy of an isolated system, i.e., they are associated with forces acting on the surface of a fluid element. Furthermore, they do not directly convert mechanical wave energy into heat. During wave breaking or saturation, the energy cascade caused by nonlinear interactions transfers the energy to smaller scales, where the dissipative force L eventually converts this mechanical energy into internal energy, or heat. Thus, if we are to parameterize the conversion of kinetic energy into heat, the nonlinear flux divergence must be linked to frictional heating by means of a subgrid-scale closure, the development of which will be described in section 5.
 As we mentioned in section 2, some dissipative forces result not only in frictional heating, but contribute to the transport of mechanical energy as well. For example, if the dissipative force were where ν is a coefficient representing either eddy or molecular diffusion, then the mechanical energy equation would contain terms of the form
The second term on the RHS is a deformation which produces truly irreversible heating, as represented by our . In contrast, the first term on the RHS represents the divergence of a vertical transport of wave kinetic energy due to diffusion, has flux form, and therefore does not directly contribute to the conversion of mechanical wave energy into heat. It is always possible to add the conservative contribution of a dissipative force (e.g., the first term on the RHS) to thus reserving the notation for irreversible dissipative heating. In the text to follow, we assume that the correlation in (20) may include such conservative contributions from diffusion, in addition to the conservative contribution of nonlinear wave interactions.
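The split between a conservative (flux-form) part and an irreversible part can be made explicit for a simple case. The identity below is a sketch under stated assumptions, namely a purely vertical diffusive force acting on the wave wind, $L' = \nu\,\partial^2 u'/\partial z^2$, with constant ν and with the density factor omitted; the specific form of the force is hypothetical and chosen only for illustration.

```latex
\begin{align}
\overline{u' L'}
  = \nu\,\overline{u'\,\frac{\partial^2 u'}{\partial z^2}}
  = \underbrace{\nu\,\frac{\partial^2}{\partial z^2}
      \left(\frac{\overline{u'^2}}{2}\right)}_{\text{flux form (diffusive transport)}}
  \;-\;
  \underbrace{\nu\,\overline{\left(\frac{\partial u'}{\partial z}\right)^{2}}}_{\text{irreversible (deformation) heating}} .
\end{align}
```

The first term integrates to zero over a column with vanishing boundary fluxes and therefore only redistributes wave kinetic energy, whereas the second term is negative definite in the mechanical energy budget and reappears as heat.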
Equations (15)–(18) represent the budget of horizontally averaged energy in a slab of infinitesimal vertical thickness dz. For simplicity, this cycle has been illustrated in Figure 2 for the case in which both the mean vertical motion and the mean viscous forces can be neglected. Including these effects would provide a conversion path from KM to PM [Plumb, 1983]. The differences between the incoming and outgoing wave fluxes through the thin horizontal slab represent sources and/or sinks of eddy kinetic and mean potential energy. It is through these flux divergences, the conversion terms corresponding to dissipation (CD) and the vertical heat flux (CH), that eddies may alter the total enthalpy, $\bar{P} = P_M + P_E$, and hence the large-scale temperature structure.
 Integrating (19) over z, using (20), and assuming that the atmosphere is insulated, i.e., assuming that fluxes across the lower and upper boundaries vanish, we obtain the energy conservation equation (cf. (7))
In particular, the total energy of the volume is conserved in the absence of diabatic heating/cooling ($\bar{Q} = 0$) if w′ = 0 at z = 0 and at z → ∞ (i.e., no wave energy is being generated at infinity).
4. Steady Waves
 In this section we introduce the assumption that gravity waves are steady, i.e., their amplitudes are independent of time, and that perturbation fields have the form u′(x, z, t) = û(z) f (x − ct). We shall also assume that the timescale of the mean motion is large compared to GW periods. For an individual wave component propagating in the x–z plane, these assumptions allow us to replace ∂/∂t in (10a) and (10d) by −c∂/∂x, c being the horizontal phase velocity of a specific GW. This approximation, which is common to all gravity wave drag (GWD) parameterizations we are aware of, gives
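The replacement of ∂/∂t by −c∂/∂x for a steady wave can be checked numerically with finite differences. In this minimal sketch the amplitude profile, horizontal wavelength, phase velocity, and sampling point are all hypothetical values chosen for illustration.

```python
import numpy as np

c = 20.0                         # assumed horizontal phase velocity, m s^-1
k = 2.0 * np.pi / 100.0e3        # assumed horizontal wavenumber, rad m^-1


def u_hat(z):
    """Hypothetical time-independent amplitude profile, m s^-1."""
    return 5.0 * np.exp(z / 14.0e3)


def u_prime(x, z, t):
    """Steady wave of the form u_hat(z) * f(x - c t), with f = cos(k * .)."""
    return u_hat(z) * np.cos(k * (x - c * t))


x0, z0, t0 = 12.0e3, 30.0e3, 300.0   # sampling point
dx, dt = 10.0, 0.5                   # finite-difference steps

# Centered differences in time and in x.
du_dt = (u_prime(x0, z0, t0 + dt) - u_prime(x0, z0, t0 - dt)) / (2.0 * dt)
du_dx = (u_prime(x0 + dx, z0, t0) - u_prime(x0 - dx, z0, t0)) / (2.0 * dx)

print("du'/dt      :", du_dt)
print("-c du'/dx   :", -c * du_dx)
```

The two printed values agree to within the truncation error of the finite differences, which is what allows the time derivatives in the wave equations to be eliminated for each harmonic.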
In the above equations the terms M′ and J′ denote conservative nonlinear forces, while L′ and Q′ are respectively associated with the dissipation of wave momentum, thermal conduction and radiative damping.
 Adding (22) multiplied by u′ to (10c) multiplied by w′, averaging over x, and invoking the relation $\overline{\psi\,\partial\psi/\partial x} = 0$ (valid for any ψ(x)), we obtain
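The averaging step used here relies on the fact that the horizontal mean of a field multiplied by its own x-derivative vanishes, since the product is the derivative of ψ²/2. A minimal numerical check, assuming a periodic domain and a hypothetical two-harmonic test function, is:

```python
import numpy as np

L = 200.0e3                      # assumed length of the periodic domain, m
n = 512
x = np.linspace(0.0, L, n, endpoint=False)
k = 2.0 * np.pi / L

# Hypothetical periodic field and its analytic x-derivative.
psi = 3.0 * np.sin(k * x) + 1.5 * np.cos(4.0 * k * x + 0.7)
dpsi_dx = 3.0 * k * np.cos(k * x) - 6.0 * k * np.sin(4.0 * k * x + 0.7)

# Horizontal mean of psi * dpsi/dx, compared with a natural scale.
mean_product = np.mean(psi * dpsi_dx)
scale = np.sqrt(np.mean(psi ** 2) * np.mean(dpsi_dx ** 2))

print("mean(psi * dpsi/dx) / scale:", mean_product / scale)
```

The ratio is zero to rounding error, so terms of this kind drop out of the averaged energy equations.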
Comparison with (15) shows that this relationship forces ∂KE/∂t = 0, as expected for waves with steady amplitude at a fixed height. Then we multiply (23) by T′ and average the result over x to obtain
As (25) shows, sensible heat flux may be induced by the forcing terms in the right-hand part of the wave equations (10a)–(10e) by altering the phase shift between w′ and T′. As expected, the relation (25) between the conversions forces ∂PE/∂t = 0 in (16).
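The role of the phase shift between w′ and T′ can be illustrated with two sinusoids. The sketch below assumes that the two fields are in quadrature for a conservative wave (the standard linear result) and applies an arbitrary extra shift of 10 degrees to mimic the effect of forcing or damping; the amplitudes, the size and sign of the shift, and the function name are illustrative assumptions.

```python
import numpy as np


def heat_flux(w_amp, T_amp, phase_deg, n=720):
    """Mean of w'T' over one wave period for a given phase difference (degrees)."""
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    w = w_amp * np.cos(phi)
    T = T_amp * np.cos(phi + np.deg2rad(phase_deg))
    return np.mean(w * T)


# Quadrature (90 degrees): no net sensible heat flux.
flux_conservative = heat_flux(1.0, 10.0, 90.0)

# Phase altered by an assumed 10 degrees: a net downward (negative) flux appears.
flux_damped = heat_flux(1.0, 10.0, 100.0)

print("conservative wave flux (K m/s):", flux_conservative)
print("damped wave flux       (K m/s):", flux_damped)
```

The mean flux equals half the product of the amplitudes times the cosine of the phase difference, so even a modest departure from quadrature yields a flux of order 1 K m s−1 for these amplitudes.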
 By adding (17) to (18) and using (25), we obtain (for steady waves) an alternative to the form (19)
The above equation describes the rate of change of the total mean energy due to the vertical divergence of eddy stresses (associated with GW damping) and sensible heat flux, as well as frictional and diabatic heating. The energy transformations for steady waves can be visualized using the diagram in Figure 2. The steady character of (KE + PE) means that the influx of “eddy” energy at a given height due to divergence of wave fluxes must be compensated by the conversion of (KE + PE) into the energy of the “mean” flow, i.e., (KM + PM). Thus the wave flux divergence provides a potential source/sink for the KE and PM reservoirs. The flux divergence entering the KE reservoir is partially passed to the PM reservoir, either through the PE reservoir, or directly through the frictional heating term. Note that the steadiness of the wave amplitude in the presence of dissipation or other damping mechanisms implies the existence of permanent wave sources which maintain an equilibrium between the influx and sink of wave energy. As (26) shows, the total mean energy (KM + PM) at a given height can change even though the wave energy remains constant. Furthermore, as we shall see, the fact that ∂PE/∂t = 0 means that ∂PM/∂t is directly related to the mean temperature tendency.
The expression in square brackets (which is operated on by the vertical divergence) together with the nonlinear flux represented by [see (20)], yields the total wave energy flux, Fe. The form (27) emphasizes the fact that the flux divergences provide the source of the energy conversions (26). For linearized, conservative, adiabatic waves, , and , so that the energy and momentum fluxes are related by Fe = cFm (see section 5.1), and (27) reduces to (1).
5. A Closure for Heating/Cooling Rates
 The purpose of this section is to specify a general closure for gravity wave stresses which may be adapted to a variety of different parameterization schemes. The divergence of the Reynolds stresses appearing in (17) may be identified with the drag produced by the (as yet unspecified) gravity wave parameterization. The vertical heat flux terms are specified by (25). We shall assume that there is no heating or cooling (e.g., radiative damping) which is “external” to the system, and neglect the effects of thermal conduction on the mean temperature, so that $\bar{Q} = 0$. We shall also neglect the terms involving $\bar{L}$ in (9a) and (9b), which include the diffusive effect of dissipating gravity waves on the mean wind.
 The right-hand part of (9b) describes the thermal effect of subgrid-scale GW on the mean flow, which involves eddy stresses determined by the wave system (10a)–(10e). These wave equations contain nonlinear terms M′ and J′ which must be parameterized in order to close the system. The terms J′ and M′ in (22) and (23) represent conservative forces while the terms L′ and Q′ are dissipative. The nonlinear advective forces represented by M′ are associated with instabilities which cascade energy to smaller scales where it is dissipated into heat by the terms involving L′. It is therefore natural to close the system by parameterizing the M′ and L′ terms together, a step which is facilitated by their simultaneous appearance on the right side of (10a). This can be done conveniently within the framework of the steady wave approximation of section 4. A similar argument can be made for the simultaneous parameterization of J′ and Q′, since the former represents nonlinear heat fluxes which cascade temperature fluctuations to smaller scales where they are diffused by molecular conduction, Q′.
 The saturation or breaking of a particular wave component can always be represented by an amplitude damping rate, or the corresponding diffusion coefficient, either of which may be a function of the wave amplitude, or may incorporate nonlinear interactions with other components of the wave spectrum. A number of existing GWD parameterizations explicitly calculate these quantities [Lindzen, 1981; Weinstock, 1990; Gardner, 1994; MK95]. Another effect of wave nonlinearity is to produce a nonlinear frequency shift for each harmonic in the spectrum [Weinstock and Hyde, 1976; MK95]. (Note that Hines  has proposed a rather different view of nonlinear frequency shifting in his Doppler spread parameterization.) In order to account for both effects, we shall introduce the wave damping coefficient dR and phase velocity change Δc, so that the nonlinear interaction terms in (22) and (23) are replaced by
Although this system is written explicitly for a single harmonic, the thermal effects must of course be summed over all components of the wave spectrum. For simplicity, the component index remains implicit. Here the new variables , and represent the corresponding parameterized terms. dI = kΔc is the nonlinear frequency shift, k is the horizontal wave number, and the harmonic is assumed to have horizontal and temporal dependence ∝ exp[ik(x − ct)], so that the transformations ∂/∂x = ik and ∂/∂t = −c∂/∂x are valid. In what follows, and will be directly associated with a conservative frequency shift, while and will be associated with wave damping. The subgrid-scale closure (28)–(29) represents an extended form of Rayleigh drag and Newtonian cooling which incorporates the possibility of frequency shifts, and provides a generic representation of wave saturation and breaking. Such a closure should be valid as long as the eddies associated with saturating/breaking waves remain somewhat smaller than the GCM's vertical grid interval (typically 2–3 km), and the mixing timescale is shorter than the model timestep (typically 0.25 to 0.5 hours).
 It is easy to verify, by substituting (28) and (29) into (22) and (23), that the amplitude damping and frequency shift are equivalent to adding a complex term of the form −id = −i(dR + idI) to the wave frequency ω = ck. Since we have not specified dR and dI, (28) and (29) still allow a relatively general representation of the exact nonlinear stresses, with the single assumption that they may be represented by terms proportional to the wave fields. This representation is compatible with many existing GWD parameterizations and may be either linear or nonlinear depending on the chosen form of dR and dI. Introduction of these terms modifies the “linear” dispersion relation associated with the system (10a)–(10e), approximately given by mR2 + 1/4H2 = N2/(c − )2. At this point, it is convenient to introduce a further approximation used in many current GWD parameterizations, namely mRH ≫ 1 (incompressibility), for which the dispersion relation reduces to midfrequency form, mR2 = N2/(c − )2. The result of these combined modifications is a generalized dispersion relation of the form
where mR is the real part of the vertical wave number m = mR − imI. For further details, see the derivation of (11a) and (11b) by MK95.
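The role of the complex term −id and the midfrequency limit can be illustrated numerically. The following is a minimal sketch, not the scheme of MK95: the function names, the mean wind value, and all numerical inputs are hypothetical, and the denominator of the midfrequency relation is assumed to be the intrinsic phase speed (phase speed minus mean wind).

```python
import cmath
import math


def shifted_frequency(k, c, d_r, delta_c):
    """Return omega - i*d, where omega = c*k, d = d_R + i*d_I and d_I = k*delta_c."""
    omega = c * k
    d = complex(d_r, k * delta_c)
    return omega - 1j * d


def harmonic(x, t, k, omega):
    """Wave field proportional to exp[i(k*x - omega*t)]."""
    return cmath.exp(1j * (k * x - omega * t))


def midfrequency_m(n_buoy, c, u_mean):
    """Real vertical wave number from m_R^2 = N^2 / (c - u)^2 (assumed form)."""
    return n_buoy / abs(c - u_mean)


# Hypothetical inputs, chosen only for illustration.
k = 2.0 * math.pi / 300.0e3          # horizontal wave number, rad/m
c, delta_c, d_r = 20.0, 2.0, 1.0e-4  # m/s, m/s, 1/s
t = 3600.0                           # elapsed time, s

omega_c = shifted_frequency(k, c, d_r, delta_c)
amplitude = abs(harmonic(0.0, t, k, omega_c))  # decays as exp(-d_R * t)
phase_speed = omega_c.real / k                 # shifted to c + delta_c
m_r = midfrequency_m(0.02, c, 5.0)             # N = 0.02 1/s, mean wind 5 m/s
```

In this sketch the real part dR appears only in the decay of the amplitude, while dI = kΔc appears only in the phase speed, which is the separation into dissipative and conservative parts used in the text.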
 First we consider the implications of the assumed closure (28) for the triple correlations appearing in (15). Multiplying (28) by u′ and averaging the result over x, we obtain
where we have used the fact that . This new expression cannot be written in flux form. It now represents work done by volume forces, and is always negative. As seen from (15), this term describes the (dissipative) transformation of wave kinetic energy into heat. Equation (31) demonstrates that the dissipation introduced by the closure (28) is associated with the term containing dR which represents the dissipative force due to GW saturation/breaking. On the other hand, the portion of the parameterization (28) associated with the term containing Δc is not dissipative, and represents a parameterization of the nonlinear cascade of energy in the form of a spectrum-induced Doppler shift. It is convenient for us to identify and to represent the dissipative and conservative forces, respectively. Similarly, we can denote and . Note that dR is the same in (28) and (29), which effectively imposes a turbulent Prandtl number of unity. Identical frequency shifts Δc are also assumed for the temperature and velocity fields in order to maintain consistency with the polarization relations for steady inviscid GW.
 The next step is to derive an expression for the sensible heat flux . Substituting (29) into (25), multiplying the result by T′, averaging in the horizontal, and neglecting the triple correlation compared to the second order , we obtain
 We next need to obtain a relation between and . With the help of the replacement ∂/∂x = ik and (29) we can rewrite (23) in the form
Taking the squared modulus of (33) and utilizing the identity , where a is a particular wave field amplitude and the asterisk denotes a complex conjugate, we obtain
Substituting (34) into (32), and using (30) and the relation (which follows from the continuity equation (10b) and mRH ≫ 1) yields
As seen from (35), the sensible heat flux arises as a result of a phase shift between w′ and T′ due to the dissipative forces associated with the nonzero dR. This induced flux is always directed downward, since the right-hand part of (35) is negative. Similar expressions for , but in terms of diffusivity, have been obtained by Walterscheid , Weinstock , and Liu . Also note that (35) is now expressed in terms of the RMS horizontal velocity variance, a quantity that is readily computed for many existing GW parameterizations, and for which observational data are available.
 Substitution of (31) and (35) into (9b) yields a practical formula for the net thermal effect of GWs on the background temperature, represented in terms of the wave amplitude and damping rate:
The second term in the right-hand part of (36) is always positive. It describes irreversible heating of the mean flow by dissipating GWs. The first term in the right-hand part of (36) has flux-gradient form, and therefore describes a differential heating associated with variations of the induced wave heat flux. This term can have either sign, and integrated over the column from z = 0 to ∞ has the net value zero provided that there is no downward wave flux at z = ∞. This term represents a redistribution of internal energy in the vertical.
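The behavior of the flux-gradient term can be checked with a short numerical sketch. It assumes that this term has the form −(1/ρ0) ∂(ρ0 × wave heat flux)/∂z, which is our reading of "flux-gradient form" and not a transcription of (36). The Gaussian flux profile, the scale height, and all numbers are hypothetical.

```python
import math

H = 7000.0        # density scale height, m (assumed)
DZ = 100.0        # grid step, m
Z_TOP = 120000.0  # top of the column, m

z = [i * DZ for i in range(int(Z_TOP / DZ) + 1)]


def rho_flux(zi, z0=70000.0, width=6000.0, peak=-1.0e-4):
    """Hypothetical density-weighted heat flux: downward (negative),
    vanishing at both ends of the column. Density is normalized to 1 at z = 0."""
    return peak * math.exp(-((zi - z0) / width) ** 2)


rf = [rho_flux(zi) for zi in z]

# Heating rate at layer midpoints: -(1/rho) d(rho * flux)/dz, in K/s.
z_mid = [0.5 * (z[i] + z[i + 1]) for i in range(len(z) - 1)]
rho_mid = [math.exp(-zm / H) for zm in z_mid]
heating = [-(rf[i + 1] - rf[i]) / DZ / rho_mid[i] for i in range(len(z_mid))]

# Mass-weighted column integral of the heating rate.
column = sum(r * h * DZ for r, h in zip(rho_mid, heating))

z_heat = z_mid[heating.index(max(heating))]  # height of strongest heating
z_cool = z_mid[heating.index(min(heating))]  # height of strongest cooling
```

Under these assumptions the mass-weighted column integral vanishes to rounding error, the heating peak lies below the cooling peak, and the cooling is the stronger of the two because the factor 1/ρ0 grows with height.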
 Another convenient form of (36), which relates εGW to the acceleration of the mean flow induced by saturating gravity waves, ah, can be written with the help of the expression (derived by MK95 and given by their formulae (15) and (28))
Wave heating in a form similar to the second term on the right-hand side of (36′) has been derived by Gavrilov , and introduced rather heuristically by Hines . Note that their formulae did not include the phase shift Δc induced by nonlinear interactions between GW, nor did they obtain the first flux divergence term in (36′).
 In deriving (36′), we did not have to specify a particular form for dR and Δc. Therefore, the results represented by (36) and (36′) have a general character. The key assumption made in the course of the derivation pertains to the replacements (28) and (29). This introduces a generalized form of Rayleigh drag and Newtonian heating, which is consistent with and adaptable to many gravity wave drag parameterizations. The generalized coefficients dR and dI = kΔc contain the physics of the wave dissipation mechanism, be it linear or nonlinear. Practically speaking, the form (36′) can be applied with any GW drag scheme which computes ah and keeps track of the harmonic's phase velocity c.
 One more potentially important form of (36) can be written with the help of (35):
A similar expression for εGW, but without the factor of two in the second term, has been obtained by Schoeberl et al. [1983, p. 5251]. This discrepancy can be traced to their neglect of the term in (9b), i.e., the heating term due to kinetic energy dissipation. Finally, we note that since the sensible heat flux, , can be measured in the atmosphere [Gardner and Yang, 1998], (36″) represents a convenient form for estimating the thermal effects of GW from observations.
5.1. and Wave Energy Balance
 Our goal now is to demonstrate that the parameterization of heating/cooling rates based on the closure assumption (28) and (29) for M′ and L′ does not violate the energy conservation principle. We first return to the basic wave equations (22)–(23) of section 4 in order to formulate an alternative version of (26) and (27). Introducing the antiderivatives and such that ∂/∂x = M′ and ∂/∂x = L′, multiplying (22) by [u′( − c) + Φ′ −( + )], and averaging over x using the relation for any variable ψ, we obtain
The above equation describes the rate of change of the total mean energy due to the vertical divergence of eddy stresses (associated with GW motions) and diabatic heating. The expression in the square brackets (which is operated on by the vertical divergence) together with the flux contained within the term [recall (20)], yields the total wave energy flux, Fe. As seen from (38), if nonlinear, dissipative and diabatic effects are neglected (M′ = J′ = L′ = Q′ = 0), Fe takes the well-known “linearized”, inviscid, adiabatic form which relates Fe to the wave momentum flux Fm, i.e., .
Equation (38) states the energy balance equation in the most general form, since it contains no explicit parameterization for M′ and L′. We proceed by invoking the parameterization (28), and introducing the antiderivative defined by u′ = ∂/∂x:
Multiplying the result by w′ and averaging in the horizontal yields
The first relation in (39) is the component of perturbation kinetic energy flux due to the diffusivity associated with dR. It can be shown (see Appendix B) that this term can be neglected in (38) compared to the sensible heat flux, , induced by the same diffusivity. Thus, (38) can be written
where we have used (39), the relation , and assumed = 0 for the diabatic heating. Equation (40) relates the rate of mean energy change to wave energy flux divergence, and may be directly compared with the relation (1) for linear conservative waves, for which Δc = 0 and . One can verify that the total energy of the closed system described by (40) is conserved by integrating this equation over height with the proper boundary conditions. Note that the Doppler shift Δc appears in (40) due to the parameterization of nonlinear conservative forces, M′.
 In the previous section we derived heating rates (36)–(36″) from (9b). Now we shall demonstrate that these expressions can also be obtained from (40) as a difference between the rate of change of the total wave energy and that of the kinetic energy of the mean flow. Since ∂PE/∂t = 0 for steady waves, it follows from (13) that ∂PM/∂t = ∂/∂t = ρ0cp∂/∂t. Now we substitute ∂KM/∂t from (17) into (40), neglect all the diffusive forces except those introduced by the parameterization of nonlinear breaking waves, put = 0, use the definition of the wave drag from (9a), , and divide the result by cpρ0 to obtain
The first term in the right-hand part of (41) coincides with the first term in the right-hand part of (36″). The second term in the equation above coincides with the second term in the right-hand part of (36′), the latter being in turn equal to the corresponding term in (36″). Therefore, the heating rate given by (41) coincides with the expression derived in the previous section.
 It should be emphasized here that the heating rate εGW cannot be derived directly from the “full” energy equation (26) unless the parameterization for the nonlinear conservative and diffusive terms, M′ and L′, is provided in the relatively general form (28)–(29). (The specific form of dR and dI is not required for the derivation.) Therefore, εGW cannot be obtained directly from fully general considerations based on a linearized energy budget, e.g., (27) with M′ = 0. Instead, the “physics” of the nonlinear wave interactions must be considered to provide a closure for M′ and L′.
5.2. and Diffusive Heat Conduction
 The wave-induced downward heat flux, , can formally be represented in a flux-gradient form as a function of a mean potential temperature gradient. Using (35), the relation (as in the derivation of (36′)), the definition of the buoyancy frequency (10e), and the assumption that the mean temperature varies more slowly with height than the density, we may write
where the “thermal diffusion coefficient” ZZ(θ) is given by
Then we multiply (36′) by exp(κz/H), and rewrite the result in terms of ZZ(θ):
The first term in (44) describes a diffusion-like downgradient transport of the mean potential temperature. This process is induced by saturating/breaking GWs, and vanishes when waves propagate freely (when ah = 0). The second term in (44) represents a source of mean potential temperature due to GW dissipation, and is always positive in stably stratified atmospheres with N2 > 0 when waves are damped. (It also vanishes for freely propagating waves.) As follows from multiplying this term by cpexp(−κz/H), the rate of wave energy dissipation per unit mass, , is related to the thermodiffusion coefficient ZZ(θ) by means of . A similar expression for the momentum eddy diffusivity, ZZ(u), but with an undefined scaling factor in place of our coefficient 2, has been obtained using isotropic turbulence arguments [e.g., see Chandra, 1980, (2); McIntyre, 1989, (1)].
 It is important to recognize that the wave vertical momentum flux, , does not have flux-gradient form, and should not be represented as a function of mean wind gradient, /dz. (Wave breaking and drag may still occur when /dz = 0, in which case the flux-gradient form would imply an infinite diffusion coefficient, ZZ(u)). Therefore, the effect of GW on the mean wind given by ρ0ah in (9a) cannot be written in the form of a first-order closure with a mean momentum diffusion coefficient. Thus, strictly speaking, there is no counterpart of ZZ(θ) for momentum diffusion, nor should (43) be regarded as equivalent to a conventional momentum diffusion coefficient. However, it is useful to compare ZZ(θ) with diffusion coefficients deduced from observations and/or obtained in theoretical estimates. It is worth emphasizing here that the diffusion acting on a wave and described by the corresponding coefficient DZZ′ = mR−2dR must be distinguished from the diffusion affecting the background fields, i.e., ZZ(θ) and ZZ(u).
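The diffusion acting on an individual wave, DZZ′ = mR−2dR, is simple to evaluate once a scheme provides the damping rate and vertical wave number of a harmonic. The sketch below uses hypothetical values (a 5 km vertical wavelength and dR = 10−4 s−1) purely to indicate the order of magnitude; the function name is ours.

```python
import math


def wave_diffusion_coefficient(d_r, m_r):
    """Diffusion acting on a single harmonic: D'_ZZ = d_R / m_R^2, in m^2/s."""
    return d_r / m_r ** 2


m_r = 2.0 * math.pi / 5000.0  # vertical wave number for a 5 km wavelength, rad/m
d_r = 1.0e-4                  # amplitude damping rate, 1/s (hypothetical)

d_zz_wave = wave_diffusion_coefficient(d_r, m_r)
```

With these inputs the coefficient is a few tens of m2 s−1. For a fixed damping rate, shorter vertical wavelengths give a smaller DZZ′, which is one reason this quantity should not be confused with the background coefficients ZZ(θ) and ZZ(u).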
6. Application of the MK Parameterization: Calculations for Representative Background Profiles
 In this section we apply the MK parameterization to the general formulae obtained in the previous section to calculate heating rates εGW and the thermodiffusion coefficient ZZ(θ) for representative midlatitude wind and temperature profiles. MK95 give explicit formulae for dR and kΔc in terms of wave spectra, as well as the procedure for calculating the evolution of the latter with height. However, here we shall use (36′) (instead of (36)) and (43) to emphasize the possibility of adapting the parameterization of heating rates to other existing GWD schemes.
 An extended physical description and implementation guide for the GWD parameterization scheme based on MK95 is given in MK00. The MK GWD scheme accounts for nonlinear interactions between waves of similar and smaller scale, and has been shown to produce saturation in a broad spectrum of GW [MK95]. The scheme also incorporates wave breaking, which may be induced by critical level interactions with the mean flow. Here we briefly outline the scheme, in which the vertical evolution of each spectral component is described by the equation
where Uj is the amplitude of wind fluctuations for the jth harmonic, is related to the amplitude by the definition , the asterisk denotes complex conjugate quantities, and mj is the real part of the vertical wave number (corresponding to (30)). The nonlinear damping rate for the jth harmonic, which includes interactions with other components in the spectrum, is given by
In (46) the root mean squared (rms) velocity σj and the parameter αj can be calculated from
where j = 1 and j = M are components with the lowest and highest vertical wave numbers in the spectrum, respectively. As follows from (47), only components with higher mj contribute to σj2. As shown by MK00, the parameter α can be regarded as the square root of a spectral Richardson number: as , the damping of a particular wave component becomes large and it is obliterated. This corresponds to wave breaking or overturning, assisted by interactions with shorter-scale waves in the spectrum or, more typically, during a critical level approach. In the absence of critical layers, the MK scheme allows waves to achieve a saturated state for α ∼1.4 to 3.2, at amplitudes below the Lindzen-Hodges overturning threshold.
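The statement that only components with higher mj contribute to σj2 amounts to a reverse cumulative sum over the spectrum. The sketch below assumes that the harmonics are ordered by increasing vertical wave number and that harmonic j itself is excluded from its own σj2; the second point is an assumption, since (47) is not reproduced here. The input variances are hypothetical.

```python
def sigma_squared(variances):
    """Return sigma_j^2 for each harmonic.

    variances[j] is the velocity variance of harmonic j, with harmonics
    ordered by increasing vertical wave number. Only harmonics with a
    strictly larger wave number are summed (assumed convention).
    """
    sig2 = [0.0] * len(variances)
    running = 0.0
    for j in range(len(variances) - 1, -1, -1):
        sig2[j] = running       # contributions from shorter-scale waves only
        running += variances[j]
    return sig2


variances = [4.0, 2.0, 1.0, 0.5]  # hypothetical values, m^2/s^2
sig2 = sigma_squared(variances)
```

The longest wave in the spectrum therefore feels the variance of every other harmonic, while the shortest one feels none under this convention.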
 Given the source spectrum at some reference height, (45) together with (46) and (47) can be integrated upward for all components in the spectrum. The wave drag associated with jth component is then
The corresponding cooling/heating rate and thermodiffusion coefficient are respectively given by (36′) and (43) with (30) substituted:
The total drag, heating rate, and thermodiffusion coefficient are calculated as sums over all components in the spectrum, e.g., εGW = Σj=1M(εGW)j, etc.
 Since the troposphere is believed to be a region of strong GW generation, it is convenient to specify the source spectrum at some level above the tropopause. We employ a model launch spectrum which approximates the observed amplitude and “universal” shape of power spectral density (PSD) wind fluctuations in the form [Fritts and VanZandt, 1993]
where S is the PSD, m* is the characteristic Desaubies vertical wave number, A0 is the scaling constant, and the parameters t = 3 and s = 1 (i.e., a “Modified Desaubies” spectrum) were chosen. The input spectrum was specified at z = 10 km and approximated by a discretization of (51), namely , where Δmj is the wave number interval. According to the tests presented in MK95 and MK00, M = 15 components are sufficient to accurately represent the range from m1 = 2π/(900 m) to mM = 2π/(19 km), with nonuniform spacing as specified by Medvedev and Klaassen . The latter range corresponds to wave phase speeds c from approximately 2.8 to 60 m s−1. We also assumed an isotropic spectrum with components propagating in both directions, i.e., with phase velocities ±c. All results presented in this section were obtained using characteristic horizontal wave number k = 2π/(300 km), and a Desaubies vertical wave number of m* = 0.006 rad m−1 for the launch spectrum. This source specification is close to the one used in the sensitivity tests of Medvedev and Klaassen .
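The launch spectrum can be discretized along the following lines. This is a sketch under several stated assumptions: the spectral shape is taken as S(m) = A0 (m/m*)^s / [1 + (m/m*)^(s+t)], which is our reading of the modified Desaubies form and not a transcription of (51); the harmonics are spaced logarithmically, whereas the actual nonuniform spacing is the one specified by Medvedev and Klaassen; the variance of each harmonic is taken as S(mj)Δmj; and N = 0.02 s−1 is assumed when converting wave numbers to phase speeds.

```python
import math


def modified_desaubies(m, a0, m_star, s=1.0, t=3.0):
    """Assumed PSD shape: A0 * mu^s / (1 + mu^(s+t)), with mu = m / m_star."""
    mu = m / m_star
    return a0 * mu ** s / (1.0 + mu ** (s + t))


N_BUOY = 0.02    # buoyancy frequency, 1/s (assumed)
A0 = 100.0       # source amplitude, m^3 s^-2
M_STAR = 0.006   # characteristic vertical wave number, rad/m
M = 15           # number of harmonics

m_low = 2.0 * math.pi / 19.0e3   # longest vertical wavelength, 19 km
m_high = 2.0 * math.pi / 900.0   # shortest vertical wavelength, 900 m

# Logarithmic spacing (hypothetical), ordered by increasing wave number.
dlnm = math.log(m_high / m_low) / (M - 1)
m = [m_low * math.exp(j * dlnm) for j in range(M)]

# Variance per harmonic, S(m_j) * dm_j, with dm_j = m_j * dlnm on a log grid.
variance = [modified_desaubies(mj, A0, M_STAR) * mj * dlnm for mj in m]

# Midfrequency phase speeds for a background at rest, c = N / m.
c = [N_BUOY / mj for mj in m]
```

With the assumed N, the phase speeds span roughly 2.9 to 60 m s−1, consistent with the range quoted above. The assumed shape peaks slightly below m* and falls off as m−t at large wave numbers.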
 Our calculations for one-dimensional profiles employ a vertical gridstep Δz = 2 km, which is typical of the vertical resolution employed in most middle atmosphere general circulation models. We have chosen to present calculations for three mean wind and mean temperature profiles from the CIRA-86 model (50°N, 40°N, and 30°N in January), which are representative of winter solstice conditions. The distribution of zonal mean wind is shown in Figure 3. The net heating rates associated with saturating GWs for the corresponding background distributions are presented in Figure 4. In these calculations, the amplitude of the GW source spectrum is A0 = 100 m3 s−2. As seen in the figure, the peaks of wave heating (εGW > 0) occur lower than the peaks of wave cooling (εGW < 0), with cooling generally stronger due to density stratification. The distributions of εGW are very sensitive to the background state and parameters of the source spectrum, with magnitudes varying from several K day−1 to several tens of K day−1.
Figure 5 presents the vertical profiles of wave drag and wave heating rates calculated at 50°N for source spectra with A0 = 10, 50, and 100 m3 s−2. This range of A0 covers the range of observed variability of GW spectra near the tropopause. Figure 5a shows that the drag maxima occur at lower altitudes and their magnitude increases as the GW source amplitude is increased. Consequently, the peaks of wave heating and cooling εGW shown in Figure 5b are lower and stronger for stronger wave sources. The magnitude of typical GW-induced heating/cooling rates is ∼±15 K day−1; however, Figure 4 shows that some peaks may be considerably larger than that (by a factor of 5 or more).
 The heating rates associated with the irreversible conversion of wave mechanical energy into heat and represented by the second term in (36′) are plotted in Figure 6. The magnitude of wave heating varies from a few K day−1 to more than 20 K day−1, which is comparable to or even exceeds typical radiative heating rates at these heights. Comparison with Figure 4 shows that these magnitudes are much weaker than those of net wave heating/cooling. However, it should be recognized that the effects of diffusive heating (being of positive sign) tend to accumulate, while the vertical redistribution of heat (the first term in (36′), which may have either sign) may be canceled to some degree by the intermittency of εGW profiles.
 Profiles of the thermodiffusion coefficient ZZ(θ) calculated at 50°N, 40°N, and 30°N for January are shown in Figure 7. The magnitudes, vertical distribution, and typical range of variability of the thermal diffusion coefficient are close to those observed by Lübken [1997, Figure 4], and those estimated theoretically by Chandra [1980, Figure 4] for momentum eddy diffusivity, DZZ(u).
7. Calculations With the Canadian Middle Atmosphere Model (CMAM)
 The Canadian Middle Atmosphere Model (CMAM) is based upon the tropospheric climate GCM, with extensions to cover the stratosphere and mesosphere. The CMAM has recently been described by Beagley et al.  and has been used in conjunction with the MK GWD parameterization for simulations of the effects of GWD on the middle atmosphere climate [Medvedev et al., 1998], the tropical semiannual oscillation [Medvedev and Klaassen, 2001] and interactive ozone chemistry [de Grandpre et al., 2000]. Briefly, the CMAM is a spectral model which uses the finite element method to solve the equations of motion in the vertical. It uses a hybrid vertical coordinate η, which is terrain following near the surface and reduces to a pressure coordinate throughout the middle atmosphere. The model has 50 levels in the vertical, a 10 minute timestep, and a horizontal spectral resolution of 32 wave numbers using triangular truncation. Its upper lid is at p = 0.000637 hPa with an approximate vertical resolution of 3 km throughout the middle atmosphere. The version (CMAM-5) used in the following experiments differs from the one described by Beagley et al.  and Medvedev et al. , mostly in the tropospheric physics and nonorographic gravity wave drag (GWD) scheme. The implementation of GWD is similar to that reported by Medvedev et al. . The gravity wave source spectrum is a Desaubies spectrum with RMS wind fluctuations of 0.25 m s−1, launched at the 165 mbar pressure level. It consists of 15 harmonics in each of the four cardinal directions (N, E, S, W), and is horizontally uniform (no latitudinal or longitudinal dependencies are invoked). Other source parameters are specified exactly as in the previous section. These source specifications are in accord with what is known about the upper tropospheric gravity wave spectrum.
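Because the model levels and the figures discussed below are given in pressure, it can help to convert them to approximate heights. The sketch below uses the standard log-pressure relation z = H ln(p0/p). The scale height H = 7 km and reference pressure p0 = 1000 hPa are assumed values, not CMAM settings, so the results are indicative only (1 mbar = 1 hPa).

```python
import math


def log_pressure_height_km(p_hpa, p0_hpa=1000.0, scale_height_km=7.0):
    """Approximate log-pressure height, z = H * ln(p0 / p), in km."""
    return scale_height_km * math.log(p0_hpa / p_hpa)


lid_km = log_pressure_height_km(0.000637)  # model upper lid
launch_km = log_pressure_height_km(165.0)  # GW launch level
level_km = log_pressure_height_km(0.01)    # a typical upper mesospheric level
```

Under these assumptions the model lid lies near 100 km, the launch level between 12 and 13 km, and the 0.01 mbar level near 80 km.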
 A detailed investigation of GW heating climatology and sensitivity experiments is beyond the scope of this paper; rather we provide a few typical cases to establish the magnitudes and patterns of expected heating rates. To this end, we have used the “standard” CMAM model configuration described in the previous paragraph, which is known to produce a reasonably faithful troposphere and middle atmosphere. The model was run for 24 months of simulated time using the fully interactive parameterization of wave heating/cooling rates described in section 6. Comparisons were made with another 24-month model run in exactly the same configuration, but with the parameterization of GW heating/cooling rates switched off.
Figure 8a shows a height-latitude cross-section of mean zonal temperature for July (averaged over two model years) including our interactive parameterization of GW heating. To estimate the effect of inclusion of the GW heating/cooling scheme, we plotted in Figure 8b the temperature difference ΔT between the simulation of Figure 8a and a similar simulation without the parameterized GW heating. Figure 8b shows that the strongest thermal effect of gravity waves is in the mesosphere. They warm the summer high latitude mesosphere by ≈6K over 60°N. In the winter hemisphere, ΔT has a “dipole” structure. GWs heat the mesosphere below ≈0.1 mbar by almost 10K, and cool it by up to −8K above. In the tropical region, there is a slight heating (≈2K) of the upper portion of the mesosphere, and a slight cooling (−2K) below ≈p = 0.01 mbar. These patterns in ΔT roughly correspond to the monthly mean zonal temperature tendency due to gravity waves, εGW ≡ (∂T/∂t)GW, shown in Figure 8c. It is seen from the figure that the distribution of εGW also has a “dipole” structure in the winter hemisphere, with cooling in the upper mesosphere and heating in the lower mesosphere; breaking gravity waves produce a downward transport of heat in this case. The strongest wave-induced cooling, more than 5 K day−1, occurs at ≈p = 0.02 mbar, centered over the latitude 60°S. A region in which the wave heating is about 2 K day−1 is situated just below. The same “dipole” structure, but with somewhat weaker rates (−3 and +1 K day−1, correspondingly), is seen in the summer hemisphere over 30°N. Consistent with Figure 8b, there is heating of about 4 K day−1 in the mid to high latitudes of the upper mesosphere in summer. Figure 8c demonstrates that averaged GW heating/cooling rates are significantly lower than those produced for each individual wind and temperature profile, as shown in the previous section. This is a result of spatial and temporal intermittency of GW breaking/saturation events.
 In Figure 8d, we plotted the difference between the temperature obtained in the simulation with εGW turned off, and the empirical temperature from CIRA; this difference indicates the deficiencies of the model in reproducing the monthly mean zonal temperature. As seen from the figure, the CMAM produces a significantly colder mesosphere than CIRA. This feature was present also in earlier versions of the model [e.g., Beagley et al., 1997], with orographic-only GWD and orographic plus either the Hines or MK GWD schemes. This suggests that the overall cold mesosphere is a consequence of physics other than parameterized gravity waves. The summer polar stratosphere is warmer by up to 15K. As shown by de Grandpre et al. , this excessive warming is practically removed when interactive ozone chemistry is included in the model. The warmer polar winter stratosphere is the result of the parameterized GWD. Without it, GCMs are known to produce a colder winter polar stratosphere, known as the “cold pole problem”. The winter stratospheric temperature reflects the shape and the strength of the polar night jet, with the latter being very sensitive to the gravity wave sources in the troposphere, and consequently, to the gravity wave drag. GW sources are not very well known, and Figure 8d suggests that the simulated climate might be improved if a more complex, nonuniform source distribution were known. It is instructive to compare Figure 8d with Figure 8c. It is seen that thermal effects of GW tend to compensate for the colder summer mesosphere (above p ≈ 0.1 mbar). In the winter hemisphere, the warm spot over 60°S (Figure 8b) tends to compensate for temperatures that are somewhat colder than CIRA (Figure 8d), although the colder temperatures above cool the already colder mesosphere. There is also weak partial compensation for the excessively warm temperatures in the winter polar stratosphere (below 2 mbar), and for the colder equatorial upper mesosphere.
 The strength and location of εGW depend strongly on the distribution of GW drag. The corresponding July monthly mean zonal wind and GW momentum deposition are presented in Figure 9. As comparison of Figures 8c and 9 shows, areas with high εGW coincide with those where GWD is strongest, which usually occurs near the mean zonal wind reversals in the mesosphere. Areas where GWs induce heating generally coincide with the regions in which the absolute value of the GWD is growing with height, while wave cooling tends to occur in regions with vertically decreasing wave drag.
 Results of calculations for January are presented in Figure 10. Again, Figure 10a shows the monthly mean temperature averaged over 2 years from the run including the interactive εGW. Figure 10b presents the difference between the temperatures obtained in the runs with and without parameterized GW heating. Figure 10b shows that εGW results in warmer mesosphere temperatures over extensive regions in both the summer and winter hemispheres (up to +6K in midlatitudes of the SH). The mid to high latitude winter upper mesosphere, however, is colder (by up to −3K) in the run with GW heating. Figure 10c shows the monthly mean heating/cooling rates associated with GWs. The peak εGW (+7 K day−1) occurs over 60°S at around p = 0.01 mbar, and approximately coincides with the maximum of ΔT in Figure 10b. In the winter hemisphere, the cooling portion of the “dipole” is about 2 K day−1, but the heating portion, being less than 1 K day−1, does not show on the plot. Differences between the simulated temperature from the run without εGW and the CIRA temperature are shown in Figure 10d. As for July, the CMAM produces a colder mesosphere (by almost 20K), and a warmer winter and summer polar stratosphere. The tendency of the thermal effects of the parameterized GW to compensate for deficiencies of the model is even stronger in January. Comparison of Figure 10b and Figure 10d shows that the inclusion of εGW produces a warmer mesosphere and reduces the excessive warmth of the winter polar stratosphere. Of course, heating/cooling due to parameterized gravity waves cannot be expected to fully compensate for the deficiencies of the CMAM model physics. These deficiencies include, but are not restricted to, uncertainties with the orographic and nonorographic GWD schemes, as well as the simplified representation of broad-spectrum wave sources in the lower atmosphere.
Figure 11 presents the latitude-longitude distribution of εGW at p = 0.01 mbar averaged over two model Januaries (upper panel) and two Julys (lower panel). Equinox conditions have not been shown because the GWD is relatively weak at that time, and the associated GW heating/cooling rates generally do not exceed 1 K day−1. Nevertheless, the figure clearly shows a seasonal reversal in GW heating and cooling at mesospheric altitudes associated with the seasonal changes in the jets. The instantaneous values of εGW at the solstices are very intermittent and often reach tens of K day−1, in line with the results of the column model described in the previous section. However, as seen from Figure 11, the averaged heating/cooling rate is only several K day−1. At p = 0.01 mbar, heating due to gravity waves (up to 8 K day−1) is concentrated in higher latitudes of the summer hemisphere, while cooling is weaker (up to −3 K day−1) and spreads to lower latitudes. The stronger longitudinal structure of εGW in the winter hemisphere compared to the summer one is probably due to stationary planetary waves. Comparison with zonally averaged rates in Figures 8c and 10c shows that at higher altitudes, heating/cooling patterns change.
 Our simulations with this middle atmosphere general circulation model suggest that the overall heating/cooling rates associated with GW are of the same order as other diabatic heating mechanisms in the mesosphere: namely, radiative heating and cooling, and photochemical heating. Therefore, unlike the case of breaking planetary waves in the stratosphere, the thermal effects of saturated gravity waves in the mesosphere must be accounted for in middle atmosphere models.
8. Summary and Conclusions
 In this paper we presented a rigorous derivation of the thermal effects of breaking/saturated gravity waves on the mean flow. It is based on the equations for the exchange of energy between the mean motion and compressible two-dimensional hydrostatic GWs propagating in a nonrotating fluid. This yields a relatively general formulation which can be applied to different gravity wave drag parameterizations. We have illustrated the application of these general formulae in the specific case of the MK parameterization, for which we have assumed an incompressible form of the GW dispersion relation.
 It is shown that the combined effect of saturated GWs on the mean flow is to produce differential heating/cooling, with the cooling situated above the region of heating, and an irreversible heating associated with the dissipation of wave kinetic energy. The former effect arises because the diffusion associated with GW saturation/breaking alters the phase relationship between wave fluctuations of temperature and vertical velocity. It can be viewed as an eddy thermal diffusion of the mean potential temperature. The latter effect is the irreversible conversion of wave energy into heat caused by the turbulent diffusivity associated with stochastic fluctuations in the saturated/breaking gravity waves.
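The dipole structure described above can be illustrated with a minimal sketch. It is not the MK parameterization: the Gaussian profile of the downward heat flux, its peak height and width, the constant scale height, and all function and variable names are assumptions chosen only for illustration. The sketch computes the heating rate as the divergence of a prescribed vertical heat flux and checks that heating lies below the flux maximum, cooling lies above it, and the two cancel in the density-weighted column integral when the flux vanishes at both boundaries.

```python
import numpy as np

# Illustrative constants (assumed here, not taken from the paper)
CP = 1004.0               # specific heat of dry air, J kg^-1 K^-1
SCALE_HEIGHT = 7.0e3      # constant density scale height, m
RHO_SURFACE = 1.2         # surface air density, kg m^-3
SECONDS_PER_DAY = 86400.0


def density(z):
    """Background density for an isothermal atmosphere (assumption)."""
    return RHO_SURFACE * np.exp(-z / SCALE_HEIGHT)


def heating_rate_k_per_day(z, w_t):
    """Heating rate from the divergence of the vertical heat flux.

    z   : heights in m, increasing
    w_t : kinematic heat flux <w'T'> in K m s^-1 (negative means downward)
    """
    rho = density(z)
    energy_flux = rho * CP * w_t                        # W m^-2
    heating = -np.gradient(energy_flux, z) / (rho * CP)  # K s^-1
    return heating * SECONDS_PER_DAY


def column_integral(z, y):
    """Trapezoidal integral of y over z."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))


# Hypothetical downward flux: 2 K m s^-1 at 85 km, negligible at the boundaries
z = np.linspace(50e3, 110e3, 601)
w_t = -2.0 * np.exp(-((z - 85e3) / 5e3) ** 2)
q = heating_rate_k_per_day(z, w_t)

z_heat = z[np.argmax(q)]   # height of strongest heating
z_cool = z[np.argmin(q)]   # height of strongest cooling
net = column_integral(z, density(z) * CP * q)
gross = column_integral(z, np.abs(density(z) * CP * q))

print(f"strongest heating at {z_heat / 1e3:.1f} km, "
      f"strongest cooling at {z_cool / 1e3:.1f} km")
print(f"net / gross column heating: {abs(net) / gross:.1e}")
```

Because the heating is a pure flux divergence, any profile that vanishes at the top and bottom of the column redistributes heat without changing the column total. The irreversible heating from wave dissipation is a separate term and is not represented in this sketch.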
 Our calculations for typical individual wind and temperature profiles show that breaking/saturating GWs produce a “dipole” thermal tendency with cooling above and heating below, and corresponding rates of several tens of K day−1. These rates are in good agreement with observations of individual profiles, suggesting that our assumption of unit turbulent Prandtl number is reasonable for a broad wave spectrum. Simulations with the full-scale Canadian Middle Atmosphere GCM demonstrate that monthly averaged rates are significantly lower than the magnitudes found in individual profiles, due to the temporal and spatial intermittency of GW breaking/saturation events. The monthly mean cooling rates in the upper portion of the mesosphere can reach several K day−1 in winter, with somewhat lower heating rates just below. As these values suggest, heating/cooling associated with GW is of the same order as other diabatic heating/cooling mechanisms in the mesosphere. Therefore, its effect must be included in middle atmosphere models.
Appendix A
 For the sake of clarity and completeness, we include a derivation of (20) here. From (11a), after averaging over x, we have
where subscripts denote partial differentiation with respect to the corresponding variable. Changing the order of differentiation in the second and fourth terms, we have
Eliminating the third and the last terms using the continuity equation (10b) and recalling that , gives the required expression:
Appendix B
 The goal of this appendix is to demonstrate that the wave kinetic energy flux induced by the diffusivity associated with saturating waves, dR, can be neglected compared to the wave sensible heat flux (also induced by the diffusivity) in (26).
 Introducing $\psi$ such that $w' = \partial\psi/\partial x$, and integrating the first equation in (39) by parts, we obtain
From the continuity equation (10b) it follows that $u' = -\rho_0^{-1}(\rho_0\psi)_z$, and therefore (B1) can be continued
Replacing $\psi$ by $(ik)^{-1}w'$, making use of the definition for any complex wave amplitudes a and b, and replacing (e.g., as in the derivation of (35)) yields
Using the formulae derived in MK95 for the vertical evolution of corresponding to any given dR (see equation (24) there), we have
For linear propagating waves, dR = 0, and the flux . At the levels of wave saturation (where dR reaches maximum), , and the flux is equal to zero again. At all other heights, the flux , the other flux term in (26). To show this, we consider the ratio , estimate in (B4), and use (35) for the sensible heat flux to obtain
for short vertical scale waves considered in this paper, mRH ≫ 1.
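The continuity step used at the start of this appendix can be written out explicitly. The lines below are a sketch under two assumptions: that (10b) takes the form $\rho_0 u'_x + (\rho_0 w')_z = 0$ with $\rho_0$ a function of $z$ only, and that $\psi$ is used as a label for the function whose $x$-derivative is $w'$.

```latex
\begin{align*}
\rho_0 u'_x + (\rho_0 w')_z &= 0, \qquad w' = \psi_x, \\
u'_x &= -\rho_0^{-1}(\rho_0 \psi_x)_z
      = -\left[\rho_0^{-1}(\rho_0 \psi)_z\right]_x, \\
u' &= -\rho_0^{-1}(\rho_0 \psi)_z .
\end{align*}
```

The last line follows by integrating over $x$ and setting the integration constant to zero, which is appropriate for wave fields with zero horizontal mean. For a single harmonic proportional to $e^{ikx}$, $w' = ik\psi$, so that $\psi = (ik)^{-1}w'$.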
Acknowledgments
 This research was partially supported by group Strategic and individual Research Grants from the Natural Sciences and Engineering Research Council of Canada and by the Canadian Climate Research Network of the Meteorological Service of Canada. The authors are grateful to Stephen Beagley for assistance with the CMAM simulations.