## 1. Introduction

[2] The purpose of this paper is to describe the application of a data assimilation scheme that combines measurements and model results on the basis of their respective uncertainties. In the absence of detailed knowledge of the high-latitude input, simulated measurements of the chemical composition in the thermosphere are used to guide the model results toward a solution that is consistent with the input without actually measuring it. In this section we describe the state of modeling and the need for data assimilation for the thermosphere-ionosphere system.

[3] Important progress has been made in modeling the effects of geomagnetic storms in the thermosphere and ionosphere using global models with large grids (2°–5° latitude, 5°–18° longitude). Numerical simulations of generic [e.g., *Fuller-Rowell et al.*, 1994, 1996] and specific [e.g., *Codrescu et al.*, 1997; *Fuller-Rowell et al.*, 2000] storms using global circulation models (GCMs) have provided a better understanding of the dynamics of the upper atmosphere and have permitted the identification of the processes responsible for global ionospheric storm effects at high latitudes and midlatitudes. Neutral composition changes driven by high-latitude energy inputs have been identified as one of the main mechanisms responsible for global storm effects [*Rishbeth et al.*, 1987; *Prölss*, 1987; *Burns et al.*, 1991]. However, one should keep in mind that the models do not include the physics of processes with horizontal spatial scales smaller than 400 km.

[4] Most GCM simulations use statistical patterns of electric field and particle precipitation to calculate the high-latitude energy input into the upper atmosphere. Storm effects are very sensitive to the temporal and spatial distribution of the energy input into the high-latitude region. However, statistical patterns do not reproduce even the global-scale characteristics of specific storms. As a result, GCMs can predict typical storm effects (generic storms) rather well but have difficulties modeling specific storm periods because the precise temporal and spatial energy inputs cannot be specified from statistical patterns even for the large grid spacing (2° latitude and 18° longitude) used in this study.

[5] The high-latitude energy input and its spatial distribution have been recognized as the single largest source of uncertainty in a GCM simulation of specific storm conditions in the upper atmosphere [*Codrescu et al.*, 1997]. The need for better inputs is being addressed by the community, and efforts are being made to produce forcing patterns using the SuperDARN radar network and techniques like the assimilative mapping of ionospheric electrodynamics [*Richmond and Kamide*, 1988]. However, because entirely removing the forcing uncertainty through direct measurements of electric fields and conductivities is impractical at this time, the use of data assimilation techniques together with measurements of the variable of interest, the O/N_{2} ratio in this case, is an attractive alternative.

[6] There are many ways to implement “data assimilation” using techniques that range from a simple replacement of a model result by a “raw” measurement at one location (nudging) to sequential statistical methods like the extended Kalman filter (KF). The computational burden generally increases with the sophistication of the data assimilation scheme and quickly becomes prohibitive in the case of GCMs for the thermosphere-ionosphere system (see *Minter* [2002] for full details).
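The two ends of this range can be contrasted for a single scalar quantity. The following is a minimal sketch, not the scheme used in this paper: the function names are hypothetical, and the weighted update is the textbook scalar Kalman update, in which the model value and the measurement are weighted by their error variances.

```python
def nudge(model_value, measurement):
    """Simplest scheme: replace the model value by the raw measurement."""
    return measurement


def weighted_update(model_value, model_var, measurement, meas_var):
    """Scalar Kalman-type update weighted by the two error variances."""
    gain = model_var / (model_var + meas_var)      # 0 = trust model, 1 = trust data
    analysis = model_value + gain * (measurement - model_value)
    analysis_var = (1.0 - gain) * model_var        # uncertainty after the update
    return analysis, analysis_var


# Equal uncertainties: the analysis lies halfway between model and measurement.
halfway, halfway_var = weighted_update(1.0, 1.0, 2.0, 1.0)

# A measurement with zero error variance reproduces simple replacement.
exact, _ = weighted_update(1.0, 1.0, 2.0, 0.0)
```

In this sketch, simple replacement appears as the limiting case of the weighted update when the measurement error is taken to be zero.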

[7] The thermosphere-ionosphere system is strongly forced, meaning that the initial condition can become irrelevant in a matter of hours or even tens of minutes if the external forcing changes rapidly. From a modeling point of view, during storms, if accurate inputs are available over tens of minutes, a simulation can be started from any state (including all zeros) and still give a better answer than one started from the perfect initial state with no knowledge of the input. The strong forcing of the system creates new challenges and makes the direct application of data assimilation methods developed for tropospheric weather and ocean circulation difficult.
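The loss of memory of the initial condition can be illustrated with a toy relaxation equation, dx/dt = (F(t) − x)/τ. This is only a sketch under stated assumptions: the single scalar variable, the 10 min time constant, the step change in forcing, and all function names are hypothetical and are not taken from CTIM.

```python
def relax(x0, forcing, tau=600.0, dt=60.0, n_steps=120):
    """Integrate dx/dt = (forcing(t) - x) / tau with forward Euler steps."""
    x = x0
    for k in range(n_steps):
        x += dt * (forcing(k * dt) - x) / tau
    return x


def storm_forcing(t):
    """Assumed 'true' forcing: a step increase 30 min into the run."""
    return 1.0 if t < 1800.0 else 3.0


def quiet_forcing(t):
    """Forcing that misses the storm entirely."""
    return 1.0


truth = relax(1.0, storm_forcing)          # correct start, correct forcing
wrong_start = relax(0.0, storm_forcing)    # all-zero start, correct forcing
wrong_forcing = relax(1.0, quiet_forcing)  # perfect start, wrong forcing

error_from_start = abs(wrong_start - truth)
error_from_forcing = abs(wrong_forcing - truth)
```

After the 2 h of simulated time used here, the error caused by the wrong initial state has decayed to a negligible value, while the error caused by the wrong forcing remains comparable to the size of the forcing change.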

[8] The full implementation of the extended KF is not practical for large dimensional problems, and the approximate treatment of the state error representation may lead to unbounded error growth [*Evensen*, 1994]. The difficulties have been resolved in the oceanographic community by the introduction of Monte Carlo–based methods to forecast error statistics using the ensemble Kalman filter (enKF) [*Evensen*, 1994; *Houtekamer and Mitchell*, 1998]. More recently, ensemble adjustment Kalman filters (eaKF), which combine data assimilation and ensemble predictions for atmospheric and ocean science, have been shown to perform significantly better [*Anderson*, 2001, 2003]. However, the enKF and eaKF techniques described in the literature need to be adapted for the case of a strongly forced system before they can be applied to the thermosphere-ionosphere system [*Fuller-Rowell et al.*, 2004; *Minter et al.*, 2004].

[9] In this paper we present preliminary results on the use of an ensemble-type Kalman filter (entKF) technique on a strongly forced nonlinear system, using software [*Minter*, 2002] developed as part of the Utah State University Global Assimilation of Ionospheric Measurements effort [*Schunk et al.*, 2004]. The system is modeled by a GCM, the coupled thermosphere ionosphere model (CTIM) described in section 2. The CTIM-propagated neutral composition of the thermosphere is constrained using simulated measurements from a more sophisticated model, the Coupled Thermosphere Ionosphere Plasmasphere Electrodynamics (CTIPE) model, to which 10% random variability was added. The use of simulated measurements is justified by the need to demonstrate the concept in the absence of true measurements with sufficient spatial and temporal coverage at this time. However, measurements from the Special Sensor Ultraviolet Limb Imager and Special Sensor Ultraviolet Spectrographic Imager on the DMSP satellite series will be available in the near future.
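One way to generate simulated measurements of this kind is sketched below. The text above specifies only that 10% random variability was added; the multiplicative Gaussian form, the function name, and the sample values are assumptions made here for illustration.

```python
import random


def simulate_measurements(model_values, fraction=0.10, seed=None):
    """Perturb each model value by zero-mean Gaussian noise.

    The noise standard deviation is `fraction` times the value itself
    (an assumed form; the distribution is not specified in the text).
    """
    rng = random.Random(seed)
    return [v * (1.0 + fraction * rng.gauss(0.0, 1.0)) for v in model_values]


# Hypothetical height-integrated O/N2 values standing in for CTIPE output.
ctipe_like = [0.4, 0.6, 0.8, 1.0]
measurements = simulate_measurements(ctipe_like, seed=0)
```

Fixing the seed makes the synthetic data set reproducible, which is convenient when comparing filter configurations against the same "measurements".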

[10] The parameter (state) optimized by the filter is the global distribution of the height-integrated O/N_{2} ratio. The Kalman state has 1820 elements corresponding to the spatial cells of height-integrated O/N_{2} ratio. The height-integrated O/N_{2} ratio is calculated as in the study by *Strickland et al.* [1995]. The global three-dimensional distribution of O, O_{2}, and N_{2} is calculated from height-integrated O/N_{2} on the basis of a set of tables derived by Fuller-Rowell using the Mass Spectrometer Incoherent Scatter (MSIS) model and CTIM results. The tables are available on request.
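The size of the Kalman state is consistent with the grid spacing of 2° in latitude and 18° in longitude quoted above, as the short calculation below shows. The assumption that the latitude grid includes both poles, and the flat indexing convention, are inferences made here and are not stated in the text.

```python
LAT_STEP_DEG = 2    # latitude spacing quoted for this study
LON_STEP_DEG = 18   # longitude spacing quoted for this study

n_lat = 180 // LAT_STEP_DEG + 1   # -90 to +90 inclusive (assumed): 91 rows
n_lon = 360 // LON_STEP_DEG       # 0 to 342 in steps of 18: 20 columns
n_state = n_lat * n_lon           # one O/N2 value per horizontal cell


def state_index(i_lat, i_lon):
    """Hypothetical row-major mapping from a grid cell to a state element."""
    return i_lat * n_lon + i_lon
```

With these assumptions, 91 latitudes times 20 longitudes gives the 1820 state elements.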

[11] The entKF runs a number of versions of the CTIM model, forced at different levels of geomagnetic activity. Statistics derived from the results are used to specify the uncertainties associated with the model prediction. One version of the model, forced at the most probable level, is used to propagate the Kalman state. The propagated state and “measurements” are then combined, taking into account their associated uncertainties using the Kalman filter equations described in section 3.
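The steps in this paragraph can be sketched for a three-cell state as follows. This is a simplified illustration, not the filter described in section 3: the toy model stands in for CTIM, the forcing levels and measurement values are hypothetical, and the error covariance is reduced to a per-cell variance (no correlations between cells).

```python
from statistics import variance


def toy_model(state, forcing):
    """Hypothetical stand-in for one model step: relax toward the forcing."""
    return [x + 0.5 * (forcing - x) for x in state]


def ensemble_variance(members):
    """Per-cell sample variance across the ensemble members."""
    return [variance(cell_values) for cell_values in zip(*members)]


def combine(forecast, forecast_var, obs, obs_var):
    """Per-cell Kalman-type update weighted by the two variances."""
    analysis = []
    for xf, pf, y, r in zip(forecast, forecast_var, obs, obs_var):
        gain = pf / (pf + r)
        analysis.append(xf + gain * (y - xf))
    return analysis


state = [1.0, 1.0, 1.0]
forcings = [0.5, 1.0, 1.5]               # low, most probable, high activity
members = [toy_model(state, f) for f in forcings]

forecast = members[1]                     # member forced at the most probable level
forecast_var = ensemble_variance(members) # model uncertainty from ensemble spread

obs = [1.2, 0.9, 1.1]                     # hypothetical "measurements"
obs_var = [(0.10 * y) ** 2 for y in obs]  # 10% measurement uncertainty (assumed)

analysis = combine(forecast, forecast_var, obs, obs_var)
```

Each element of the result lies between the propagated state and the measurement, closer to whichever has the smaller variance.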