Estimating the state of the thermospheric composition using Kalman filtering



[1] To determine the propagation parameters of high-frequency radio waves, an accurate estimate of the ionosphere is desirable. Estimating the ionosphere, especially during geomagnetic storm times, is strongly dependent on perturbations in the neutral composition. Because of this coupling between the ionosphere and neutral atmospheric chemistry, accurate knowledge of the neutral atmospheric composition is critical in estimating the ionosphere. In the research presented here, a data assimilation system is constructed to optimally estimate the neutral composition, and the necessity for implementing an optimized filtering method, like the Kalman filter, is shown. To demonstrate the data assimilation system, an artificial “truth” thermosphere is created using a physical model. This thermosphere is sampled according to an instrument and satellite simulation algorithm, creating the measurement data set. Noise is then added to the measurement data, to represent observation errors. Data are assimilated, and noise from this data is reduced using a Kalman filter in combination with a state propagation model. Results show that the error in the estimate can be greatly reduced (usually to <6%), even if the observation errors are large (15%), by using a Kalman filter. Best results are obtained by using a Kalman filter together with an accurate physical model.

1. Introduction

[2] Accurately knowing and estimating the ionospheric conditions in near real time would be of great use to communication and navigation system operators since ionospheric disturbances can adversely affect radio propagation through attenuation, fading, absorption, noise, or signal velocity change. Free electrons in the ionosphere can absorb radio signals, but these electrons typically reradiate the signal [Tascione, 1994]. As a result, the process of absorption and subsequent reradiation of radio waves by free electrons in the ionosphere is responsible for determining the group or propagation velocity. If the electron density is known, then the time required for a signal to propagate through the ionosphere is well determined, allowing navigation systems to correctly calculate the distance of the signal path based on the time for propagation. However, if the electron density changes without the user's knowledge, the navigation system miscalculates the distance of the signal path, creating navigational errors.

[3] During geomagnetic storms in particular, changes in the neutral composition in the thermosphere (95–500 km altitude), the neutral component of the upper atmosphere, can alter the recombination rate of the ionosphere (60 to beyond 1000 km altitude), which changes the electron density [Schunk and Nagy, 2000]. Accurately estimating the neutral composition can improve estimates of electron density and would therefore allow navigation system users to correct these errors in signal propagation. However, the dynamics of the thermosphere is complex and is difficult to predict, especially during geomagnetic storms [Fuller-Rowell et al., 1994; Buonsanto and Fuller-Rowell, 1997]. Thermospheric models have been used to understand the dynamics and have helped the scientific community predict likely regions of ionospheric electron depletion [Fuller-Rowell et al., 1996a]. Satellite measurements have allowed scientists to observe these changes as well as the overall dynamics in the thermosphere [Paxton et al., 1992; McCoy and Thonnard, 1997].

[4] In recent years a strong correlation between the changes in the electrically neutral atmospheric components during geomagnetic storms and the ionosphere has been realized [Buonsanto and Fuller-Rowell, 1997]. Large geomagnetic storms occur at Earth when material ejected from the sun by a coronal mass ejection hits the Earth. If the solar wind plasma has a southward magnetic field, it creates a strong coupling with the magnetosphere. Initially, plasma convection increases and auroral particle precipitation expands to lower latitudes [Schunk and Nagy, 2000]. Besides the increased heating rate from particle precipitation and Joule dissipation, the expanded convective electric field also redistributes the plasma [Sojka and Schunk, 1983; Prölss et al., 1991]. The localized heating causes a thermal expansion of the thermosphere, and the induced horizontal pressure gradients drive horizontal winds. Through continuity the diverging horizontal winds create a vertical movement of air carrying the heavier neutral species from low to high altitudes. These heavier neutral species, primarily molecular oxygen and nitrogen, displace the atomic species and capture free electrons in the ionosphere through charge transfer, ion-atom interchange, and dissociative recombination reactions. As a result, the electron density in the high latitude regions decreases in the ionosphere, affecting radio propagation [Davies, 1990].

[5] Because of global circulation, the molecular-rich atmosphere at higher altitudes advects away from the auroral regions toward lower latitudes and the equator. The convergent flow toward the equator creates a downwelling in the low-latitude regions. This downwelling decreases the number of heavy neutral species in the upper thermosphere, and the F region electron density increases [Fuller-Rowell et al., 1996a].

[6] One should keep in mind that this description of the neutral atmospheric and ionospheric dynamics is a simplification. Although the latitudinal regions are described in general terms, the upwelling and downwelling regions are greatly varying and are almost never longitudinally uniform. Using empirical models provides an improved estimate for the thermospheric composition. However, empirical models are calculated from a historical database, and as a result the more complex dynamics of the thermosphere is not specifically described. A physical model is therefore required to obtain the finer structure of the neutral composition through the solution of the dynamic equations, assuming the model drivers are known.

[7] Since the physical models are imperfect and the drivers are not well defined, an accurate description of the response requires updates from observations so that the state does not diverge from the truth. If regions of the thermosphere are not observed for long periods of time, then the data assimilation system must relax to an empirical model. Although the finer structure may no longer be discernible, an empirical model still provides a reasonable estimate in the absence of observations since it is based on historical data.

[8] In the research presented here a data assimilation system is developed as part of the Utah State University (USU) Global Assimilation of Ionospheric Measurements (GAIM) effort. The data assimilation system is strongly based on previous techniques used to estimate the troposphere and oceans [Cohn, 1982; Hamill et al., 1992; Fukumori and Malanotte-Rizzoli, 1995; Pfaendtner et al., 1995; Howe et al., 1998; Cohn et al., 1998; Fukumori et al., 1999; Houser et al., 1999; Dee and da Silva, 1999]. Such an approach reaps the benefits of satellite data, physical and empirical models, and statistical estimation. As in meteorological or oceanographic applications, the data assimilation system uses satellite data, when available, to correct the physical model output. However, the thermosphere differs from the troposphere and oceans by being more dynamic, i.e., the state can change more rapidly since it is more influenced by external forcing, like auroral, Joule, and solar heating, and the wind speeds are in the hundreds of meters per second. The data is also very sparse in comparison to meteorology, and even oceanography. Instead of having a constellation of satellites and various types of in situ and remote measurements, the thermosphere is only viewed by a small number (1–3) of satellites. The combination of a more rapidly changing system with sparse data makes the solution approach more challenging because some regions may not be observable for the length of time that the thermosphere can significantly change. These unobserved regions must be treated appropriately in the data assimilation system.

[9] If data become unavailable or if there are regions that are unobservable, then the data assimilation system relaxes back to the climatology of an empirical model so that a reasonable estimate is still possible. An extended Kalman filter [Kalman, 1960; Kalman and Bucy, 1961; Costa and Moore, 1991] is used to estimate the state using observations and both physical and empirical models. To emphasize the importance of the empirical and physical models, the data assimilation is also implemented using a “persistence” model, which simply keeps the state constant when it is not being updated by an observation. The reduction of error in the estimate from using the more advanced empirical and physical models is compared to the simplified persistence model.

[10] To demonstrate the effectiveness in improving accuracy using data assimilation methods, the results of this study are compared to a common nonrigorous technique called “nudging” [Davies and Turner, 1977]. Nudging uses observations to correct the state without using the statistical advantages found in the more advanced filters. When nudging is used, the measured grid point is given the exact value of the measurement without consideration of previous measurements, observation errors, grid point correlations, or model errors. However, as will be shown in this research, using nudging and ingesting otherwise unfiltered data into the model will yield nonoptimal results. By comparing nudging against a more mathematically rigorous filtering technique, the improvement in accuracy gained by the filtering demonstrates the necessity for proper development of a data assimilation system for global estimation.

[11] To analyze and evaluate the thermospheric data assimilation system, a simulated data set is created using a physical model. This data set is defined as the “true” thermosphere. Using typical satellite orbits and instrument mechanics, the true thermosphere is sampled to provide an observation data set. This observation data set is corrupted to simulate measurement noise. The data assimilation system then ingests this data and is evaluated, using statistical analyses, by comparing it with the original true neutral density.

2. Instrument Simulation

[12] The main sources of data are the Special Sensor Ultraviolet Limb Imager (SSULI) and the Special Sensor Ultraviolet Spectrographic Imager (SSUSI) instruments. These two instruments are developed for the Defense Meteorological Satellite Program (DMSP) Block 5D4 satellites. DMSP satellites have been collecting weather data for the U.S. military for the last two decades. The DMSP Block 5D4 satellites maintain a near-polar, Sun-synchronous orbit at an altitude of ∼830 km and carry numerous other instruments for sensing various environmental parameters.

[13] The SSUSI, developed by the Applied Physics Laboratory (APL) at Johns Hopkins University and by Computational Physics, Inc. [Paxton et al., 1992; Evans et al., 1995], is designed to measure numerous environmental parameters of the upper atmosphere, including auroral parameters. The imager uses a scanning instrument to infer neutral density profiles of the major species O, O2, and N2 on the limb and height-integrated properties on the Earth's disc. The instrument obtains optical signatures of all of the major species on the dayside. The Scanning Imaging Spectrograph (SIS) [Paxton et al., 1992] device of the SSUSI instrument builds a spectrographic image by scanning across the satellite's ground track, from limb to limb, as shown in Figure 1.

Figure 1.

The Scanning Imaging Spectrograph conducting a horizon-to-horizon limb scan.

[14] During this scan, the scan mirror sweeps the field of view by rotating between ±72.8°. The instrument receives a cross-track scan every 22 s. A scan cycle consists of a limb viewing and an Earth viewing part. The limb viewing occurs above the horizon from the maximum scan angle of ±72.8° from nadir to the horizon ±63.2° from nadir. The Earth viewing occurs between the limb scans or between ±63.2° from nadir creating a field of view ∼445 km. In a near circular orbit at an altitude of ∼830 km, the satellite moves 148 km for each 22-s scan. As a result, each image tends to overlap by ∼5 km. With a satellite at 830 km altitude the maximum height above the horizon is ∼520 km if the maximum scan angle is ±72.8°.
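The limb geometry quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (a value not given in the text) and straight-line viewing, so that a line of sight at angle θ from nadir has tangent height (R + h) sin θ − R. The function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def tangent_height_km(scan_angle_deg, sat_alt_km, radius_km=EARTH_RADIUS_KM):
    """Tangent height of a line of sight at scan_angle_deg from nadir."""
    theta = math.radians(scan_angle_deg)
    return (radius_km + sat_alt_km) * math.sin(theta) - radius_km


def horizon_angle_deg(sat_alt_km, radius_km=EARTH_RADIUS_KM):
    """Angle from nadir at which the line of sight grazes the surface."""
    return math.degrees(math.asin(radius_km / (radius_km + sat_alt_km)))


# Values taken from the text: 830 km orbit, 72.8 degree maximum scan angle.
limb_top_km = tangent_height_km(72.8, 830.0)
horizon_deg = horizon_angle_deg(830.0)
print(f"maximum tangent height: {limb_top_km:.0f} km")
print(f"surface horizon angle:  {horizon_deg:.1f} deg from nadir")
```

Under these assumptions the sketch gives roughly 508 km and 62.2°, close to but not identical with the ∼520 km and ±63.2° quoted in the text; the exact figures depend on the Earth model and on the altitude at which the horizon is defined.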

[15] While the limb viewing provides density profiles of the thermosphere species, the Earth-viewing part of the SSUSI instrument measures column abundance rather than density profile versus altitude since it is not possible to directly obtain a profile from nadir optical data [Strickland et al., 1995]. The nadir view calculates a ratio of 135.6 nm atomic oxygen and the Lyman-Birge-Hopfield (LBH) dayglow emission of molecular nitrogen. This ratio can be compared to O/N2 as

equation image

where N2ref is the number density at a reference depth, fO and fN2 are the mixing ratios of O and N2, g is a scaling factor for the given emission type, and z is the optical path of the measurement through the thermosphere.

[16] For the research presented here, optical depths are calculated from the Coupled Thermospheric-Ionospheric Model (CTIM) [Fuller-Rowell et al., 1996b], and an integrated O/N2 ratio is obtained. Each integrated O/N2 ratio has an associated vertical profile for the major species O, O2, and N2. For ease of interpolation back and forth from profiles to integrated O/N2 ratio, the variation of the species' concentration, at a given thermospheric level, as a function of integrated O/N2, has been fitted to a hyperbolic tangent function. The function is written as

equation image

where nj is the percentage of species j, at a given level in the thermosphere, ∫O/N2 is the integrated O/N2 value, and a1 and a2 are coefficients. As an example, the fitted hyperbolic tangent functions for pressure level 12, near 300 km altitude, are shown in Figures 2 and 3, for O and N2, respectively. The functional fitting allows a reduced state size that significantly decreases the number of Kalman filter operations and improves the speed of the data assimilation system with little change in the solution accuracy.

Figure 2.

Hyperbolic tangent fit of percent atomic oxygen with integrated O/N2.

Figure 3.

Hyperbolic tangent fit of percent molecular nitrogen with integrated O/N2.

[17] A hyperbolic tangent function was chosen to provide a realistic fit for all values of integrated O/N2. As the integrated O/N2 goes to 0, the percent of O and N2 go to 0 and 1, respectively. Oppositely, as the integrated O/N2 goes to infinity, the percent of O and N2 go to 1 and 0, respectively. The hyperbolic tangent function describes the distribution observed in the thermosphere. From the fits at each level, such as shown in the examples in Figures 2 and 3, one can obtain the density profile for each species based on a given value of the integrated O/N2.
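The limiting behavior described above can be illustrated with a small sketch. The functional form used here, n_O = tanh(a1 x^a2) with x the integrated O/N2, is an assumption chosen only because it reproduces the stated limits; it is not the fitted equation of this study. The coefficient values are hypothetical, and N2 is treated as the complement of O (O2 is ignored) purely for simplicity.

```python
import math


def o_fraction(integrated_ratio, a1, a2):
    """Assumed fit: fraction of O at one pressure level (0 to 1)."""
    return math.tanh(a1 * integrated_ratio ** a2)


def n2_fraction(integrated_ratio, a1, a2):
    """Two-species simplification: N2 is the complement of O."""
    return 1.0 - o_fraction(integrated_ratio, a1, a2)


def ratio_from_o_fraction(fraction, a1, a2):
    """Invert the assumed fit to recover the integrated O/N2."""
    return (math.atanh(fraction) / a1) ** (1.0 / a2)


A1, A2 = 1.2, 0.8  # hypothetical coefficients for a single level

x = 0.6                                    # hypothetical integrated O/N2
f_o = o_fraction(x, A1, A2)                # ratio -> profile value
x_back = ratio_from_o_fraction(f_o, A1, A2)  # profile value -> ratio
print(f"O fraction {f_o:.3f}, N2 fraction {n2_fraction(x, A1, A2):.3f}")
print(f"recovered integrated O/N2: {x_back:.3f}")
```

Because the assumed function is monotonic, the same two coefficients per level serve for interpolation in both directions, which is the property the reduced state relies on.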

[18] The Special Sensor Ultraviolet Limb Imager (SSULI) [McCoy and Thonnard, 1997] is an optical remote sensor developed by the Naval Research Lab. The SSULI instrument measures vertical profiles of the natural airglow radiation from atoms, molecules, and ions in the upper atmosphere. SSULI also infers ions in the upper atmosphere and ionosphere by viewing the Earth's limb at a tangent altitude of ∼48 km to 750 km. SSULI is able to infer, on the dayside, all of the main neutral atmospheric species, O, O2, and N2.

[19] The limb scanner faces in the opposite direction of the satellite flight path vector as shown in Figure 4. The field of view is 0.1° in the vertical and 2.4° in the horizontal direction. The entire scan covers 30° in the vertical by 2.4° in the horizontal direction. The scan ranges from 10 to 40° below the direction of flight of the satellite. The scanner covers 6° every second.

Figure 4.

The Special Sensor Ultraviolet Limb Imager conducting a limb scan.

[20] Unlike the SSUSI instrument, the SSULI instrument only looks toward the Earth's limb to infer the species' density profiles [McCoy and Thonnard, 1997]. SSULI obtains the vertical profiles by viewing between tangent altitudes of ∼50 and 750 km. As the SSULI instrument scans along this altitude range, the extreme and far ultraviolet airglow radiation from atoms, molecules, and ions is recorded as a function of altitude.

[21] Two Sun-synchronous satellites in 0930 and 1330 LT orbits are used to sample the truth thermosphere, and only dayside measurements are assumed to be available. Each satellite carries both a SSULI and a SSUSI instrument. A fixed error of 15% is applied as measurement noise via a random number generator. This representation of the error is ∼5–10% higher than what is predicted by the instrument designers. However, the increased error provides a “worst case” scenario to test the data assimilation system. Each instrument is assumed to provide the filter with one measurement of the major neutral species every 0.07 s. The actual instrument capabilities probably significantly exceed the definitions in this research, but the instrument accuracies and measurement rates are reduced to test the data assimilation system under the most stringent conditions.
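A minimal sketch of how a sampled truth value might be corrupted is given below. The text specifies only a fixed 15% error drawn from a random number generator; the zero-mean Gaussian distribution, the seed, and the truth value used here are assumptions made for illustration.

```python
import random


def noisy_measurement(true_value, rng, relative_error=0.15):
    """Corrupt true_value with zero-mean Gaussian noise (assumed shape).

    relative_error is the noise standard deviation as a fraction of
    the true value; 0.15 matches the 15% quoted in the text.
    """
    return true_value * (1.0 + rng.gauss(0.0, relative_error))


rng = random.Random(42)   # fixed seed so the sketch is repeatable
truth = 0.6               # hypothetical integrated O/N2 at one grid point
samples = [noisy_measurement(truth, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
rel_rms = (sum((s - truth) ** 2 for s in samples) / len(samples)) ** 0.5 / truth
print(f"sample mean {mean:.3f}, relative RMS error {100 * rel_rms:.1f}%")
```

With many samples the relative RMS error of the synthetic observations approaches the prescribed 15%, which is the level the filter must then reduce.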

[22] Last, these instrument definitions are a generalization of the more complicated process of converting the actual measurement to realistic estimates of the composition. The instruments are defined as being generally equivalent in capability to not distract from the main focus of this research, the evaluation of the data assimilation system. As instrument mechanics, resolution, and observation techniques become more of a focus in future research, the data assimilation system can be modified as instrument definitions are upgraded or changed.

3. Model Descriptions

[23] An important element to data assimilation comes from modeling the physics of the system. The model provides a physics-based, time-dependent description of the state, and thus also provides time correlation between measurements. Further, the model provides additional information about the state that may not be available from the observations, predicts future states, and guides the data assimilation system if data become unavailable.

[24] Three model types are considered: a physical model, an empirical model, and keeping the state constant, “persistence.” The physical model is ideal for propagating the state in time, which allows the filtering method to correlate the current measurement with the state at a previous time. The disadvantage of the physical model is that its accuracy degrades with time unless it is updated with measurements since there may be imperfections in the way the physics and drivers are described. When observations become unavailable, then the filtering system must turn to an empirical model that provides a reasonable level of accuracy, without current measurements, based on climatology.

[25] The third model, persistence, is the simplest. Persistence keeps the state constant, which is a reasonable assumption if the observation frequency and coverage are high enough relative to the rate at which the thermosphere changes. Although the accuracy is lower for data assimilation systems using persistence, persistence requires very little computation time in comparison to more realistic models. Persistence also represents a baseline against which the improvements from using the more advanced models can be compared.

3.1. The Physical Model

[26] The Coupled Thermospheric-Ionospheric Model (CTIM) [Fuller-Rowell and Rees, 1980, 1983; Quegan et al., 1982; Fuller-Rowell et al., 1996b] provides a complex and well-tested [Fuller-Rowell et al., 1994, 2000] physical model for the propagation of the state. CTIM is a combination of two independently developed physical models. The first part of CTIM contains a global, nonlinear, time-dependent neutral atmospheric model developed at University College London [Fuller-Rowell and Rees, 1980, 1983]. The second part contains a midlatitude and high-latitude ionospheric convection model that originated at Sheffield University [Quegan et al., 1982]. The high-latitude electric field and auroral particle precipitation are two of the inputs to the ionospheric-thermospheric coupled model, and control the amount of Joule heating. The neutral composition portion of CTIM, however, is the primary part of the algorithm used for the state propagation when assimilating the neutral species.

[27] The thermospheric portion of the code numerically solves the nonlinear equations of momentum, energy, and continuity to provide a time-dependent structure of the wind vector, temperature, and density in the neutral atmosphere. The altitude scale of the thermospheric model is pressure dependent. The range begins at 1 Pascal, near 80 km altitude, and extends to between 300 and 700 km altitude, depending on the amount of expansion from heating.

[28] The major species composition equations, which cover the three major species O, O2, and N2 and include chemistry, transport, and the mutual diffusion between the species, are solved in parallel with the dynamics and energy budget. Using a combination of the generalized diffusion equation [Chapman and Cowling, 1970] and the continuity equations, the change in mass mixing ratio of the three species is evaluated self-consistently with the wind and temperature fields. Allowance is made for mutual molecular diffusion, horizontal and vertical advection, turbulent mixing vertically and horizontally, and production and loss mechanisms.

[29] The nonlinear equations of the thermospheric portion are solved self-consistently with a high-latitude and midlatitude ionospheric convection model poleward of 23° latitude. The ionospheric portion numerically solves the nonlinear equations for ion continuity, diffusion, and temperature. Additionally, odd nitrogen species are taken into account in calculating the molecular ion concentrations as well as other diffusion parameters.

[30] The advantage of CTIM becomes more obvious during geomagnetic storm times when localized Joule heating creates regions of sudden density and composition changes in the thermosphere. The dynamical equations in CTIM reproduce these sudden local changes given the appropriate high latitude inputs, whereas statistical models based on climatology cannot. Additionally, these changes are generally too sudden for persistence to accurately represent, and thus a physical model may provide the only means for accurately estimating the neutral composition during geomagnetic storm times. However, the amount and location of the Joule heating must be specified for an optimal response. Also, during storm periods the physical relationship between grid points becomes even more important due to the rapidly changing conditions and the limited amount of coverage that a satellite can provide during the short time span of the storm. It is expected that the physical model, CTIM, will be necessary during storms since it has the capability of advecting information from observed to unobserved regions through the physical equations.

[31] One may also be faced with loss of data for extended periods of time. Delays in downloading the data from the satellite to the user will inevitably occur. Also, large areas may not be covered as a result of satellite-solar geometry, satellite and instrument malfunctions, or daily and seasonal effects. Therefore, a physical model is necessary to provide an estimate to cover any limitations in the availability of data.

[32] Finally, it may be necessary to provide an estimate of the state in the future. In this case a physical model, like CTIM, provides the most accurate means for propagating the state to a certain point in the future. However, the accuracy of this estimate will degrade with time, and relaxation of the state to climatology over a specific time period will be necessary.

3.2. Determining the Main Parameters of the Coupled Thermospheric-Ionospheric Model (CTIM)

[33] In meteorology and tropospheric modeling, including winds and pressure in the data assimilation system is a necessity since these parameters may vary greatly and are influenced by a wide array of other parameters. The thermosphere is similar in this regard. The difference arises from the more externally driven dynamics in the thermosphere, rather than the dynamics evolving from internal stochastic processes. It may be possible to accurately forecast the neutral composition transport by using only the more influential parameters of the thermospheric system, which would reduce the number of state elements and would expedite the data assimilation calculations. The main driver of the circulation is the Joule heating. This driver heats the thermosphere locally, creates horizontal pressure gradients, changes the global circulation, and produces composition changes. The connection between Joule heating and the resultant composition change follows a well-understood and well-modeled sequence. From this clear chain of events in the composition's reaction to Joule heating, it may become unnecessary to include all of the parameters in the state, assuming many of the parameters can be determined from the more influential parameters, like Joule heating, and that the other external drivers, like EUV, remain unchanged. This section determines which parameters must be included in the state and how each parameter affects the accuracy.

[34] A test is conducted using CTIM in the Kalman filter with various state definitions to determine which parameters have the greatest effect on the accuracy of the filter. The RMS error is calculated for each of the state definitions. The following six state vector definitions are implemented: (1) composition only (worst case), (2) horizontal winds and composition, (3) vertical and horizontal winds and composition, (4) vertical and horizontal winds, height of the pressure level, and composition, (5) composition and the Joule heating driver represented by Ap, and (6) all parameters, including the driver (best case).

[35] A theoretical scenario is applied where the entire globe may be viewed every hour to eliminate errors due to poor satellite coverage and to emphasize the model's effect on the final composition specification. Since the composition is only measured once per hour, the state RMS errors will be higher in this test as compared to normal operation. It is assumed that the other parameters, winds, pressure, and driver, are provided by other sources and that these parameters are, for testing purposes, known globally with a 10% error. A storm is also assumed to last for 12 hours with an Ap index of 125.
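The RMS error used to compare the state definitions can be sketched as follows. The text does not give the exact formula, so this example assumes the error is the root-mean-square of the relative difference between the estimate and the truth over all grid points, expressed in percent. The grid values are hypothetical.

```python
def rms_percent_error(estimate, truth):
    """RMS of the relative error over all grid points, in percent."""
    if len(estimate) != len(truth):
        raise ValueError("estimate and truth must have the same length")
    squared = [((e - t) / t) ** 2 for e, t in zip(estimate, truth)]
    return 100.0 * (sum(squared) / len(squared)) ** 0.5


# Hypothetical integrated O/N2 values at four grid points.
truth = [0.50, 0.60, 0.80, 1.00]
estimate = [0.55, 0.57, 0.80, 0.90]

error = rms_percent_error(estimate, truth)
print(f"RMS error: {error:.2f}%")
```

Evaluating this quantity at each hour of the 12-hour storm, once per state definition, would give curves of the kind compared in the test.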

[36] The results of the six state vector definitions are shown in Figure 5. As a guideline, the ideal case, containing all of the parameters, shows the best results. This ideal case requires measurements of the driver, winds, pressure, and composition, all of which may not be available when attempting to assimilate a real data set.

Figure 5.

A comparison of filter accuracy for six state vector definitions.

[37] The least accurate results are obtained by including only the composition in the state vector. In this case the composition is not correctly propagated since the dynamics are not considered. Throughout the 12 hours the composition-only state vector shows the largest error in comparison to the other state definitions.

[38] When winds are included in the state vector, the accuracy significantly improves. The incorporation of the vertical wind, however, does not noticeably improve the accuracy in comparison to the horizontal winds-only state vector. This similarity in the accuracy indicates that the horizontal winds, and their divergence, are sufficient to characterize the creation of the neutral composition changes and then subsequent transport around the globe.

[39] Including pressure gradients in the state vector shows additional improvement in the accuracy, but this improvement is slight. The improvement indicates that the change in pressure affects the wind velocity, but the small improvement in accuracy shows the greater importance of the horizontal wind contribution in propagating the neutral composition.

[40] Last, the composition and driver only are included in the state vector. Despite the winds and pressure being removed from the state vector, inclusion of the composition and driver in the state vector shows further improvement. This result demonstrates the dependence of the thermospheric dynamics on external drivers. Additionally, the result in Figure 5 indicates that the composition movement can be well forecasted if the driver is well known. Although the state, containing the composition and driver only, does not match the best case, including the driver in the state vector demonstrates that the winds and pressure do not have to be included to obtain reasonable results if the driver is well specified.

[41] Therefore the propagation of the state with CTIM, as used in this research, may be described as

\[ \hat{x}_{k}[\mathrm{composition}] = \mathrm{CTIM}\left( \hat{x}_{k-1}[\mathrm{composition}],\ \Delta t,\ A_p,\ \mathrm{UT},\ \mathrm{day} \right) \]

where the inputs are the current estimated state, x̂k−1 [composition], which contains the information describing the composition only, Δt is the duration of the propagation, and the drivers include the geomagnetic index history, Ap, universal time, UT, and the day of the year. The output is the state containing the new composition description, x̂k [composition], at the future time, which is normally the next observation time.

3.3. Empirical Model

[42] As mentioned earlier, the physical model, unless corrected using measurements, will eventually deviate from the truth. Because the physical model loses accuracy as its estimate extends further into the future, the state must eventually relax to historical climatology. This historical climatology may be statistically represented using an empirical model.
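One simple way to picture this relaxation is sketched below. The exponential form, the 12-hour e-folding time, and the numerical values are all assumptions made for illustration; the text states only that the state must relax to climatology over some time period.

```python
import math


def relax_to_climatology(state, climatology, dt_hours, tau_hours):
    """Move the state toward climatology over one time step.

    The exponential weighting is an assumed form, not the scheme of
    this study. tau_hours is a hypothetical e-folding time.
    """
    weight = math.exp(-dt_hours / tau_hours)
    return climatology + weight * (state - climatology)


state = 0.9         # hypothetical last filtered value at a grid point
climatology = 0.6   # hypothetical empirical-model value at that point

history = []
for _ in range(48):  # 48 hourly steps without observations
    state = relax_to_climatology(state, climatology, 1.0, 12.0)
    history.append(state)

print(f"after 12 h: {history[11]:.3f}, after 48 h: {history[-1]:.3f}")
```

In this sketch the estimate keeps most of its last observed structure over the first few hours and is essentially climatological after several e-folding times.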

[43] A drawback of the empirical model comes from it being a statistical representation of the composition distribution. Because of its statistical derivation, unusual physical conditions may not be accurately represented in the empirical model. For example, regions of sudden and local storm-induced thermal expansion may be averaged over in the empirical model and therefore not predicted.

[44] The Mass Spectrometer and Incoherent Scatter (MSIS) model, MSIS-86 [Hedin, 1987], is an empirical thermospheric model based on data from satellite and ground-based measurements. The model provides O2, O, and N2 as well as other species including N, He, H, and Ar. The model also provides the thermospheric temperature. The model is based on in situ data from seven satellites, numerous rocket probes, and five ground-based incoherent scatter stations [Hedin, 1983, 1991]. The model uses a temperature profile as a function of height for the upper thermosphere and an inverse polynomial as a function of height in the lower thermosphere. For each estimate of the height-integrated O/N2 ratio, a vertical profile in pressure coordinates is defined. The temperature, in turn, is required to distribute the profile in height. One may also integrate the composition profile to determine the height-integrated O/N2 ratio. The temperature and composition profiles are dependent on universal time, solar and magnetic activity, and annual, semiannual, seasonal, diurnal, semidiurnal, and terdiurnal effects.

4. The Extended Kalman Filter

[45] One of the most commonly used data assimilation techniques, called sequential or Kalman filtering, is based on minimum variance estimation. Kalman [1960] originally developed the filter algorithm, and later modifications were made to estimate a continuous dynamical system [Kalman and Bucy, 1961]. An extended version of the Kalman filter [Costa and Moore, 1991] allows convergence in real time without iterating.

[46] The advantage of the extended Kalman filter comes from continually updating the nominal state in real time. The extended version of the Kalman filter is often applied to nonlinear systems since the nominal state is continually updated, which helps minimize errors from the higher order terms. Unlike the traditional Kalman filter, the extended Kalman filter does not have to iterate to reach the “best” estimate of the state. As observations are taken, the state error variance-covariance matrix settles to a statistical steady state, or a “floor,” at which the average error no longer decreases. This floor is not constant since the amount of error in the state can change depending on the accuracy of each individual measurement. However, the floor is a statistical representation of the average error, which is determined by the propagation model and observation errors. Once this steady state is reached, the extended Kalman filter is approximately as accurate as the traditional Kalman filter that has been iterated to its best estimated state [Born et al., 2003].

[47] One begins the Kalman filter with a ‘best’ estimate state vector, $\hat{x}_{k-1}$, which is an n × 1 vector with the subscript indicating time, k − 1. Each element of the state vector indicates a defined condition at a given grid point; n is the number of parameters describing the neutral composition. In this research, for example, the elements of the state vector represent the vertically integrated O/N2 ratio at each grid point over the globe. The observed O/N2 is not a vertical integral, but is the total line-of-sight measurement from the instrument to the lowest measurable altitude, or the optical depth, of the thermosphere. The measurement must be converted to the vertically integrated O/N2 via pre-calculated look-up tables. The Kalman filter state has an associated n × n error variance-covariance matrix, $P_{k-1}$, where the diagonal elements indicate the state error variance for the corresponding element in the state vector, $\hat{x}_{k-1}$, and the off-diagonal elements indicate the covariance between two elements, or grid points, in the state vector. The Kalman state error variance-covariance matrix, $P_{k-1}$, is an indicator of the accuracy of the state estimate, $\hat{x}_{k-1}$, and therefore of how well the Kalman filter is performing.

[48] To propagate the estimated state, $\hat{x}_{k-1}$, and its error variance-covariance matrix, $P_{k-1}$, forward in time, a model is used. Although this model may range from a complicated set of differential equations to simply keeping the state constant, the model is always represented in the Kalman filter by the transition matrix, $\Phi_{k,k-1}$. Even when a set of differential equations describes the model dynamics, the transition matrix must be a linearized form of the model. The transition matrix propagates the state and its error variance-covariance matrix from the time of the last state estimate, k − 1, to the next measurement time, k, respectively as

$$\bar{x}_k = \Phi_{k,k-1}\,\hat{x}_{k-1} + v$$

$$\bar{P}_k = \Phi_{k,k-1}\,P_{k-1}\,\Phi_{k,k-1}^{T} + Q$$

where $\bar{x}_k$ is the propagated state vector, v is the forcing to the system, $\bar{P}_k$ is the propagated state error variance-covariance matrix, Q is an n × n variance-covariance matrix correction to errors in the linear propagation model, or transition matrix, $\Phi_{k,k-1}$, and T represents the transpose of the transition matrix.
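The time update described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this research; the function name `propagate` and all numerical values are hypothetical, and the forcing v defaults to zero when it is not supplied.

```python
import numpy as np

def propagate(x_hat, P, Phi, Q, v=None):
    """Time update: carry the state estimate and its covariance from k-1 to k."""
    x_hat = np.asarray(x_hat, dtype=float)
    if v is None:
        v = np.zeros_like(x_hat)      # no forcing unless supplied
    x_bar = Phi @ x_hat + v           # propagated state
    P_bar = Phi @ P @ Phi.T + Q       # propagated covariance plus process noise
    return x_bar, P_bar

# Illustrative two-grid-point case (all numbers hypothetical).
Phi = np.diag([0.9, 0.8])
x_bar, P_bar = propagate([1.0, 2.0], 0.04 * np.eye(2), Phi, 0.01 * np.eye(2))
```

With a diagonal transition matrix, each grid point is propagated independently, and any coupling between grid points enters only through Q.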

[49] The linearized model, Φk,k−1, may be described by a Gauss-Markov process [Liebelt, 1967] so that two models may be used simultaneously to propagate the state. This process is based on the state relaxing from either persistence or CTIM to MSIS if measurements become unavailable or if regions remain unobservable for extended periods of time. The Gauss-Markov process may be written as

$$\bar{x}_k - x_{m,k} = e^{-\Delta t/\tau}\left(\hat{x}_{k-1} - x_{m,k-1}\right)$$

where $\hat{x}_{k-1}$ is the estimated persistence or CTIM state at time k − 1 and $\bar{x}_k$ is the propagated persistence or CTIM state at time k. The persistence or CTIM state is propagated from $\hat{x}_{k-1}$ to $\bar{x}_k$ using either persistence or CTIM as described in equation (1). The nominal states, $x_{m,k}$ and $x_{m,k-1}$, on the other hand, are the MSIS states at times k and k − 1, respectively. These MSIS nominal states do not require propagation since they can be determined directly from the empirical model, MSIS. The Gauss-Markov coefficient, τ, sets the exponential rate of the relaxation from persistence or CTIM to MSIS.

[50] Letting $\delta\bar{x}_k = (\bar{x} - x_m)_k$ and $\delta\hat{x}_k = (\hat{x} - x_m)_k$, the state propagation may be written as

$$\delta\bar{x}_k = e^{-\Delta t/\tau}\,\delta\hat{x}_{k-1}$$

which has the same form as equations (2) and (3), where $\Phi_{k,k-1} = \exp[-\Delta t/\tau]$. The state vectors, $\delta\bar{x}_k$ and $\delta\hat{x}_{k-1}$, represent the Kalman state vector in the Kalman filter equations. The time value, Δt, is the time since the grid point under consideration was last measured. If Δt is small, then the propagation relies mostly on the CTIM state, but as Δt increases because the grid point has not been measured for a long period of time, the state relaxes exponentially to MSIS. The value used in this research is τ ∼ 14 hours, which indicates that the state should relax almost completely from CTIM to MSIS after ∼3 days without observations (exp[−72/14] ≈ 0.006).
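As a rough numerical illustration of this relaxation, the sketch below builds a diagonal transition matrix from the time since each grid point was last measured and applies it to the deviation from MSIS. The function names and the sample values are hypothetical; only τ = 14 hours is taken from the text.

```python
import numpy as np

TAU_HOURS = 14.0  # relaxation time quoted in the text

def gauss_markov_phi(dt_hours, tau_hours=TAU_HOURS):
    """Diagonal transition matrix; dt_hours is the time since each grid point was last measured."""
    dt = np.asarray(dt_hours, dtype=float)
    return np.diag(np.exp(-dt / tau_hours))

def relax_to_msis(x_hat_prev, x_msis_prev, x_msis_now, dt_hours, tau_hours=TAU_HOURS):
    """Propagate the deviation from MSIS, then add the current MSIS state back."""
    Phi = gauss_markov_phi(dt_hours, tau_hours)
    dx_bar = Phi @ (np.asarray(x_hat_prev, dtype=float) - np.asarray(x_msis_prev, dtype=float))
    return np.asarray(x_msis_now, dtype=float) + dx_bar

# Two grid points with the same estimate: one just measured, one unobserved for 72 hours.
x_bar = relax_to_msis([1.0, 1.0], [0.8, 0.8], [0.8, 0.8], dt_hours=[0.0, 72.0])
weight_after_3_days = float(np.exp(-72.0 / TAU_HOURS))
```

The recently measured point keeps its estimate, while the point left unobserved for three days retains less than 1% of its deviation from MSIS.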

[51] Errors in the Gauss-Markov process and models add errors to the propagation of the state error variance-covariance matrix, so the propagated state error variance-covariance matrix, $\bar{P}_k$, calculated in equation (3), must be adjusted by the amount of expected propagation uncertainty represented in the propagation error variance-covariance matrix, Q. The diagonal elements of Q represent the rate at which the error in the propagation grows with time. The off-diagonal elements in the transition matrix, $\Phi_{k,k-1}$, are set equal to 0, which is an adequate assumption since the off-diagonal elements in the correct transition matrix are close to zero. Errors from assuming that the off-diagonal elements in the transition matrix are zero must be accounted for in the off-diagonal elements of Q so that a time correlation between grid points exists. In general, the values for the off-diagonal elements are defined based on the influence of one grid point on another. The covariances (off-diagonal elements) for each grid point are determined from this correlation and the value of the variances (diagonal elements) in Q. Adjacent grid points would have the strongest time correlation and would provide the largest values in the off-diagonal elements of Q, and two grid points on opposite sides of the globe would be given a value of 0.

[52] Estimating propagation uncertainty is not straightforward. Before the Kalman filter is applied, Q must be estimated based on one's knowledge of the system and physics and largely by trial and error. In this research, Q is adjusted until the estimate errors are minimized. One may also augment Q during geomagnetic storms as it becomes more difficult to estimate the neutral composition during the more dynamic storm periods.

[53] The Q matrix has a specific form [Liebelt, 1967], which is written as

equation image

The values for κ and τ, based on the expected propagation error, must be determined before the data assimilation process is implemented. Typically, κ is roughly equal to the variance in the propagation error. Here, however, a range of values for κ and τ is used to compare the root mean square (RMS) error between the estimated state and the truth set. A map of the RMS state error over the ranges of κ and τ is shown in Figure 6.

Figure 6.

Determination of the best κ and τ based on minimum state RMS for a dawn-dusk orbiting satellite at solstice using a Gauss-Markov process.

[54] Figure 6 shows that a value of κ that is too small increases the error substantially more than a value that is too large, with the optimal value at 0.01, where the units are the integrated O/N2 squared. The choice of τ, however, affects the accuracy even more strongly. The value of τ sets the time dependence of the process error variance-covariance matrix, where Δt is the time between measurements. A minimum for τ is well defined at ∼14 hours, indicating that the amount of error contributed to the state estimate from the process errors grows at this exponential rate as predicted by the Gauss-Markov process.

[55] The off-diagonal terms of Q are estimated based on the expected correlation between grid points. A ‘correlation length’ [Houser, 1996] is defined as 0.4 km/s, or 240 km/10 min, where 10 min is the update time of the Kalman filter. The correlation length defines the time it takes for a grid point to affect another grid point for a given distance between the two grid points. This value, 240 km/10 min, consistently provides the lowest RMS and closely matches the typical wind speeds in the thermosphere.

[56] Once the estimated state vector and its error variance-covariance matrix are propagated to the next observation time k, one may then obtain the innovation vector, yk, calculated from

$$y_k = Y_k - G_k$$

where Yk is the actual observation from the instrument and Gk is the expected observation based on the model prediction. The propagated state and current measurement vectors are related to each other by the observation matrix, Hk, and this relationship may be written as

equation image

where ɛ is the unknown observation error to be determined. The weighting, or Kalman gain, $K_k$, is determined from the propagated state error variance-covariance matrix, $\bar{P}_k$, the observation error variance-covariance matrix, $R_k$, and the mapping matrix, $H_k$. The Kalman gain is calculated as

$$K_k = \bar{P}_k H_k^{T}\left(H_k \bar{P}_k H_k^{T} + R_k\right)^{-1}$$

The m × m observation error variance-covariance matrix, Rk, where m is the number of observations at the current time, must be determined before the data assimilation system is applied. Rk can often be determined from the instrument performance specifications supplied by the instrument manufacturer or operator. Besides the instrument errors, Rk also typically contains representation error, e.g., components in the real data that can be attributed to sub-grid scale phenomena the model cannot and does not represent. Since the truth data set is simulated, these errors are zero, but if this system were applied to real data, then one would have to consider these errors and include them in Rk. The calculated Kalman gain then is used to map the innovation vector correction to the model propagated state as

$$\hat{x}_k = \bar{x}_k + K_k\,y_k$$

where $\hat{x}_k$ is the new state estimate at time k. The new state error variance-covariance matrix is also corrected at this time as

$$P_k = \left(I - K_k H_k\right)\bar{P}_k\left(I - K_k H_k\right)^{T} + K_k R_k K_k^{T}$$

which is the Joseph formulation [Joseph, 1964], which ensures that Pk is always positive-definite. The extended Kalman filter equations are then repeated, using the corrected state estimate and its corrected error variance-covariance matrix, to obtain a state and state error variance-covariance matrix for the next observation time.
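The measurement update described above (gain, state correction, and Joseph-form covariance correction) can be sketched as follows. This is an illustrative sketch, not the code used in this research: the function name is hypothetical, the numbers are invented, and the expected observation is assumed here to be linear in the state, G = H x̄, purely so that the example is self-contained.

```python
import numpy as np

def measurement_update(x_bar, P_bar, y, H, R):
    """Kalman gain, state correction, and Joseph-form covariance update."""
    S = H @ P_bar @ H.T + R                       # innovation covariance
    K = P_bar @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_hat = x_bar + K @ y                         # corrected state
    I_KH = np.eye(P_bar.shape[0]) - K @ H
    P = I_KH @ P_bar @ I_KH.T + K @ R @ K.T       # Joseph form keeps P symmetric
    return x_hat, P, K

# Two grid points, only the first is observed (hypothetical values).
x_bar = np.array([1.0, 2.0])
P_bar = 0.04 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.04]])
Y = np.array([1.2])
y = Y - H @ x_bar                                 # innovation, assuming G = H x_bar
x_hat, P, K = measurement_update(x_bar, P_bar, y, H, R)
```

Because the prior variance equals the observation variance in this toy case, the gain for the observed point is 0.5 and its variance is halved. The unobserved point is unchanged because the off-diagonal covariances are zero.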

[57] The results in this research will also refer to the simplest approximation to the Kalman filtering approach, called ‘nudging’ [Davies and Turner, 1977]. Nudging is not really a filtering technique since it ingests raw measurements into the model without attempting to statistically remove the random observation errors. In the traditional extended Kalman filter, nudging is equivalent to keeping the Kalman gain equal to one so that each new measurement is taken as the truth and the propagated state is ignored. This research presents the nudging method because of its common usage and to demonstrate how the estimate error could be reduced if a more mathematically rigorous filtering technique is used.
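The difference between nudging and filtering can be seen in a scalar toy problem. The sketch below assumes a constant truth, persistence as the propagation model, Gaussian observation noise with a 15% standard deviation, and a small hypothetical process noise Q. None of these values are taken from the simulations in this research.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 1.0
sigma = 0.15 * truth          # 15% observation noise
R = sigma**2
Q = 1e-6                      # hypothetical process noise variance
n_steps = 500

x_kf, P = 0.8, 0.04           # hypothetical initial state and variance
x_nudge = 0.8
err_kf, err_nudge = [], []

for _ in range(n_steps):
    Y = truth + sigma * rng.standard_normal()
    # Kalman filter with persistence: state unchanged, variance grows by Q.
    P_bar = P + Q
    K = P_bar / (P_bar + R)
    x_kf = x_kf + K * (Y - x_kf)
    P = (1.0 - K) ** 2 * P_bar + K**2 * R     # scalar Joseph form
    # Nudging: gain fixed at one, so the estimate becomes the raw measurement.
    x_nudge = x_nudge + 1.0 * (Y - x_nudge)
    err_kf.append(x_kf - truth)
    err_nudge.append(x_nudge - truth)

half = n_steps // 2           # score only after the filter has settled
rms_kf = float(np.sqrt(np.mean(np.square(err_kf[half:]))))
rms_nudge = float(np.sqrt(np.mean(np.square(err_nudge[half:]))))
```

In this setting the nudged estimate carries the full observation noise at every step, whereas the filtered estimate averages it down. The contrast is larger than in the thermospheric case because the toy truth is static.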

5. Evaluation Methods

[58] To test and optimize the data assimilation system, the filter results must be compared to the truth data set. In this research, the physical model, CTIM, is used to generate this truth data set. The various analyses of the data assimilation system will be scored on how well each can reproduce the truth data set. The scoring is based on the standard root mean square error and on the pattern correlation for a 91 × 20 lat-long grid, 2° latitudinal and 18° longitudinal spacing.

5.1. Standard Root Mean Square Error

[59] The data assimilation methods, presented in this research, are scored by calculating the root mean square (RMS) error. The RMS error between the truth file and the actual estimated state is calculated as

$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2}$$

where $\hat{x}_i$ is the ith element of the state estimate, $\hat{x}$, $x_i$ is the corresponding ith element of the true state obtained directly from the truth thermosphere, and n is the total number of state elements, where each state element is area-weighted. Calculating RMS error provides a globally averaged estimate of the data assimilation system's accuracy.
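One possible way to compute an area-weighted RMS error on the latitude-longitude grid is sketched below. The text does not specify the weighting, so the cosine-of-latitude weights used here are an assumption, as are the function name and the sample fields.

```python
import numpy as np

def area_weighted_rms(estimate, truth, lat_deg):
    """RMS error on an (n_lat, n_lon) grid, weighting each cell by cos(latitude)."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Assumed weighting: cell area on a regular lat-long grid scales with cos(latitude).
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(truth)
    sq_err = (estimate - truth) ** 2
    return float(np.sqrt(np.sum(w * sq_err) / np.sum(w)))

lat = np.linspace(-90.0, 90.0, 91)       # 2 degree spacing, 91 points
truth_field = np.ones((91, 20))          # 20 longitudes at 18 degree spacing
estimate_field = truth_field + 0.05      # uniform offset (hypothetical)
rms = area_weighted_rms(estimate_field, truth_field, lat)
```

For a uniform offset the weights cancel and the RMS equals the offset. The weights matter when errors are concentrated at high latitudes, where grid cells cover less area.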

5.2. Evaluating Pattern Correlation

[60] The RMS error calculation is an adequate tool for estimating the overall error in the estimate, but the RMS error provides no insight as to how well the global pattern of the estimated neutral distribution matches the true distribution. If the patterns have a low amount of global variability, then the RMS error may appear small even though the patterns do not appear to be similar. However, how well two global patterns match can be quantified through the calculation of a pattern correlation coefficient. This pattern correlation coefficient, ρk, at time k, may be calculated as

$$\rho_k = \frac{\mathrm{cov}(\hat{x}_k, x_k)}{\sqrt{\mathrm{var}(\hat{x}_k)\,\mathrm{var}(x_k)}}$$

where $\mathrm{cov}(\hat{x}_k, x_k)$ is the covariance between the estimated and true patterns, $\mathrm{var}(\hat{x}_k)$ is the variance of the estimate, and $\mathrm{var}(x_k)$ is the variance of the true state [Montgomery and Runger, 1994] at time k.

[61] If the state estimate from the data assimilation system provides an accurate representation of the true structure, then a correlation coefficient calculation will result in a value close to but not greater than 1. Having a correlation coefficient of exactly 1 represents a perfect correlation. If the estimated structure does not match the truth, then a negative correlation or a correlation coefficient near 0 will result. Ideally, one would wish to obtain a pattern correlation coefficient that is as close to 1 as possible.
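A pattern correlation coefficient of this kind can be computed as in the sketch below, which treats every grid point equally (no area weighting is applied, which is a simplifying assumption). The function name and the sample patterns are hypothetical.

```python
import numpy as np

def pattern_correlation(estimate, truth):
    """Correlation coefficient between two gridded patterns, flattened to vectors."""
    e = np.ravel(estimate).astype(float)
    t = np.ravel(truth).astype(float)
    e_anom = e - e.mean()
    t_anom = t - t.mean()
    cov = np.mean(e_anom * t_anom)
    return float(cov / np.sqrt(np.mean(e_anom**2) * np.mean(t_anom**2)))

truth_pattern = np.array([0.4, 0.6, 0.8, 1.0])
rho_biased = pattern_correlation(truth_pattern + 0.1, truth_pattern)   # same shape, offset
rho_inverted = pattern_correlation(-truth_pattern, truth_pattern)      # opposite shape
```

The offset pattern has a nonzero RMS error but a correlation of 1, while the inverted pattern gives −1. This illustrates why the two scores are reported together.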

6. Illustrative Examples

[62] CTIM, driven by a Foster-type electric field [Foster et al., 1986], is used to create the truth data set, at vernal equinox, to test the system. The scenarios simulate a 48-hour period of low geomagnetic activity with an energy input of ∼10 gigawatts (GW), which is equivalent to an Ap index of ∼7. After the 48-hour quiet period, a 12-hour geomagnetic storm with a power input of ∼260 GW follows, equivalent to an Ap index of ∼300. After this 12-hour storm period, another 12-hour quiet time period, with an Ap of 7, follows. The total simulation period covers 72 hours. The Kalman filter uses the first 24 hours of quiet to initialize and reach a steady state. After the steady state is reached, the results in the quiet period from 24 to 48 hours demonstrate the quiet time accuracy of the data assimilation system under undisturbed conditions. At 48 hours, the geomagnetic storm commences. During the 12-hour storm period, the filter must then react to a rapidly changing state. The final 12 hours of quiet time demonstrate the recovery characteristics of the filter, during which the composition is still changing as it recovers from the storm to quiet conditions.

[63] The seasonal simulation uses two satellites at 0930 and 1330 LT orbits with each satellite carrying both SSULI and SSUSI instruments. Both instruments take one measurement every 0.07 s. The measurement error of the integrated O/N2 is assumed to have a standard deviation of roughly 15% for both instruments, which provides a worst-case representation of the error to test the robustness of the data assimilation system. All variations to the data assimilation system are initialized using MSIS.
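To make the noise model concrete, the sketch below perturbs sampled truth values with a 15% standard deviation and builds a matching observation error variance-covariance matrix. It assumes Gaussian errors that are uncorrelated between observations and proportional to the true value. The text states only the 15% standard deviation, so these choices, the function name, and the sample values are illustrative.

```python
import numpy as np

def simulate_observations(truth_values, rel_sigma=0.15, seed=0):
    """Add relative Gaussian noise to sampled truth values and return (Y, R)."""
    rng = np.random.default_rng(seed)
    truth_values = np.asarray(truth_values, dtype=float)
    sigma = rel_sigma * truth_values                 # 15% of the true value
    Y = truth_values + sigma * rng.standard_normal(truth_values.shape)
    R = np.diag(sigma**2)                            # uncorrelated errors assumed
    return Y, R

sampled_truth = np.array([0.5, 0.8, 1.0])            # hypothetical O/N2 samples
Y, R = simulate_observations(sampled_truth)
```

The diagonal of R then feeds directly into the Kalman gain, so larger true ratios are assigned proportionally larger observation variances.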

[64] Table 1 outlines the methods used to estimate the truth thermosphere. MSIS only uses the MSIS model with no state propagation or filtering; i.e., it is the direct output from the MSIS model without corrections. Nudging using persistence allows observations to update the state without a filtering technique; the Gauss-Markov process, relaxing from persistence to MSIS, is used as the transition matrix to propagate the state. Nudging using CTIM, again, allows observations to update the state without filtering, and the Gauss-Markov process, relaxing from CTIM to MSIS, is used as the transition matrix. The Kalman filter using persistence allows the filter to correct the state, and the Gauss-Markov process relaxes from persistence to MSIS. Last, the Kalman filter using CTIM corrects the state using the filter, and the Gauss-Markov process relaxes from CTIM to MSIS.

Table 1. Methods Used^a

| Scenario | Transition Matrix | Filtering? | Models Used |
|---|---|---|---|
| MSIS only | none | no | MSIS |
| Nudging using Persistence | Gauss-Markov | no | persistence relaxing to MSIS |
| Kalman filter using Persistence | Gauss-Markov | yes | persistence relaxing to MSIS |
| Kalman filter using CTIM | Gauss-Markov | yes | CTIM relaxing to MSIS |

^a MSIS, Mass Spectrometer and Incoherent Scatter model; CTIM, Coupled Thermospheric-Ionospheric model.

6.1. Comparing RMS Errors

[65] The RMS errors, as calculated in equation (5), are shown for the equinox case in Figure 7. This figure shows the RMS error for MSIS only, nudging with persistence, and the extended Kalman filter with the persistence and CTIM models. Figure 7 shows that the data assimilation system and the Kalman filter need ∼24 hours to reach a steady state where the RMS error does not decrease further. The steady state, from 24 to 48 hours, shows a ‘floor’ representing the lowest RMS error the data assimilation system obtains for quiet conditions. The geomagnetic storm begins at 48 hours and lasts for 12 hours, and during this time, the RMS error, for all three model types, increases. The RMS error for the Kalman filter using CTIM increases less during the storm and maintains a low error during the recovery phase since the physics are better represented. During the storm, the state is changing rapidly in comparison to the observation rate. Although the persistence state propagation is sufficient during quiet times, the physical model, CTIM, proves necessary to capture the storm effects, indicating that a physical model is necessary to estimate the state in unobservable regions.

Figure 7.

A comparison of the RMS error in the integrated O/N2 for the four cases in Table 1.

[66] Figure 7 also indicates the necessity of using a statistical filter in the data assimilation system. For a given propagation model, whether persistence or CTIM, the Kalman filter yields a lower RMS error than nudging, demonstrating that the filter better represents the truth thermosphere. Using the Kalman filter to remove observation errors provides about a 2–7% improvement after the steady state is reached.

[67] The MSIS-only result gives a comparatively higher RMS error overall, but this result is not indicative of the capability of MSIS. Since the truth thermosphere is created by CTIM, it is expected that CTIM would be more capable in reproducing the simulated thermosphere. Instead, MSIS is shown here as a baseline for the largest possible difference between the MSIS and CTIM models. Further, although the MSIS-only model shows a comparatively large RMS, the RMS error is still low overall and provides an accurate state estimate to which the Gauss-Markov process can relax if observations become unavailable. Also, the Kalman filter using persistence and CTIM may begin with an accurate initial state vector with a low RMS error value because the filter is initialized with the MSIS model. Through the Gauss-Markov process, the MSIS model acts as an error ‘cap’, constraining the state estimate so that the state will not continue to diverge without measurements.

6.2. Comparing Correlation Patterns

[68] Figure 8 shows the global pattern correlation coefficient, as calculated in equation (6), between the estimated state from the assimilation system and the truth thermospheric composition. Figure 8 also indicates that the data assimilation system requires the first 24 hours of quiet to initialize and reach a steady state. During the first 24 hours, the pattern correlation coefficients increase toward 1, demonstrating how the data assimilation system's accuracy improves as measurements are taken. Between 24 and 48 hours, the assimilation system reaches a steady state, as the conditions remain quiet. With the onset of the geomagnetic storm at 48 hours, the correlation coefficient decreases the least in the Kalman and nudging cases using CTIM as the propagator, demonstrating the value of an accurate physical model. The correlation coefficient also shows better results during the storm recovery phases when CTIM is used as the propagator. Last, the Kalman filter shows an advantage over nudging when trying to reproduce the correct pattern for the composition; the Kalman filter pattern coefficients are overall closer to 1 than those of the nudging examples.

Figure 8.

A comparison of the pattern correlation in the integrated O/N2 for the four cases in Table 1.

[69] To provide a better view of the estimate results, Figures 9 and 10 are provided. Figures 9 and 10 show a global view of the integrated O/N2 at the storm's end, at 60 hours. The top panels in both Figures 9 and 10 show the truth thermospheric composition. The middle panels in Figures 9 and 10 show the data assimilation estimates for the Kalman filter using persistence and CTIM, respectively. The bottom panels provide the absolute error between the truth thermosphere and the filtered estimate. In the bottom panels, blue signifies the lowest error, and red signifies the highest absolute error between the top two panels. The middle panel of Figure 9 shows the result from the data assimilation system when using persistence as the state propagation model with the Kalman filter. The region between 0° and 200° longitude is not recently measured, and a region of error is shown in the bottom panel of Figure 9 by the large yellow and red area between ∼50° latitude and 200° longitude. The region of error occurs because the persistence model has not accurately propagated the composition through the 12-hour storm. In fact, a clear discontinuity, at ∼200° longitude, between the recently measured and unmeasured regions can be seen in the erratic shape of the integrated O/N2 in the middle panel of Figure 9. Finally, the details of the truth thermosphere in the top panel are not well represented in Figure 9 (middle) by the Kalman filter using the persistence model, which shows only a generalized shape of the thermosphere.

Figure 9.

A global view of the integrated O/N2 using persistence as the state propagation model.

Figure 10.

A global view of the integrated O/N2 using the physical model, CTIM, as the state propagation model.

[70] Figure 10 shows the results of the Kalman filter using the physical model, CTIM, as the propagator. Figure 10 (middle) shows that the data assimilation estimate from the Kalman filter using CTIM more closely resembles the truth composition in the top panel. The error, shown by the yellow and red area present in Figure 9 (bottom), is reduced in Figure 10 (bottom), demonstrating a decrease in the Kalman filter error using CTIM. This decrease in error indicates that the physical model more accurately propagates the state during the geomagnetic storm. Further, the result for the Kalman filter using CTIM, Figure 10 (middle), shows more detail than the result from the Kalman filter using persistence, Figure 9 (middle). The case with CTIM is somewhat idealized since the same model is used to generate the truth file. It does, however, illustrate the value of an accurate physical model.

7. Conclusions

[71] A data assimilation system has been applied to a simulated truth thermosphere to estimate the neutral atmospheric composition. This thermosphere is realistically sampled based on the SSULI and SSUSI instruments carried on each of two DMSP Sun-synchronous satellites. The results of the test demonstrate the advantages and necessity of implementing data assimilation, like the Kalman filter, when estimating the neutral composition. The Kalman filter system is compared to the nonfilter system using nudging, and the Kalman filter results show significant improvement. The system also demonstrates that it can continue to provide an accurate state estimate, even during geomagnetic storms and limited observability, through the use of a physical and an empirical model. The physical model provides information about how the state is to be propagated, particularly when the dynamics change rapidly, as during the storm periods, and when observations become unavailable as a result of instrument and satellite constraints. The empirical model is implemented to ensure that the state does not stray from the truth if measurements become unavailable or if the time period between measurements becomes too long for the state to be accurately propagated. Through a combination of these model types and the Kalman filter method, results show a significant improvement in the thermospheric composition estimate.