Quantification of Representation Error in the Neutral Winds and Ion Drifts Using Data Assimilation

In this work we quantify the representation error of the algorithm Estimating Model Parameters from Ionospheric Reverse Engineering (EMPIRE), which estimates global neutral winds and ion drifts given time‐varying plasma densities. SAMI3 (Sami3 is A Model of the Ionosphere) serves as the background climate model and pseudo‐measurements for the EMPIRE observation system. This configuration allows the data assimilation inputs to be self‐consistent between each other and with the validation data. The estimated neutral winds and ion drifts are compared to the Horizontal Wind Model (HWM14) and SAMI3 “truth.” For both the quiet period on 25 August 2018 and subs7equent storm on 26 August, the EMPIRE estimation of ion drifts is better at low‐to‐mid geomagnetic latitudes with mean error up to 20 m/s. For the high latitudes (poleward of ±60° magnetic), the mean errors exceed 50 m/s with variances up to 200 m/s, and the relative errors are higher than the “truth.” At latitudes of ±87°, the large errors are attributed to a boundary effect. However, the neutral wind mean errors peak at 20 m/s at mid‐latitudes (40°–60° magnetic), with larger uncertainties, then converge to 0 approaching higher latitudes. By conducting this study, we define a method for obtaining the representation error covariance for future use of EMPIRE with SAMI3 as background.


Introduction
Space weather forecasting is extremely desirable for alerting about geomagnetic storms, since forecasts can give operators of technological systems affected by space weather time to react to minimize the impact of these events.Data assimilation algorithms combine regional measurements with global background models to improve estimation and forecasting of a state, and can be used for space weather analysis.Data assimilation methods have been developed to study the ionosphere.Different studies (Bust et al., 2007;Elvidge & Angling, 2019;Mandrake et al., 2005;Matsuo & Araujo-Pradere, 2011;Qiao et al., 2021;Scherliess et al., 2006;Ssessanga et al., 2021;Wu et al., 2015) have estimated the ionospheric density by ingesting GNSS (Global Navigation Satellite Systems) global measurements of TEC (Total Electron Content).Reid et al. (2023) introduced the application of particle filter in Assimilative Canadian High Arctic Ionospheric Model (A-CHAIM), which provides high-latitude ionosphere from ingesting slant TEC data from ionosondes and GNSS receivers, and vertical TEC from JASON-3 altimeter.Pignalberi et al. (2019) also ingest global TEC measurements to update the background model International Reference Ionosphere (IRI).Galkin et al. (2012) developed IRI Real-Time Assimilative Mapping (IRTAM), which also updates IRI in real time but by ingesting Global Ionospheric Radio Observatory (GIRO) data.
Efforts to estimate the plasma drivers in the coupled ionosphere-thermosphere (IT) system, for example, neutral winds and ion drifts, have been used before too.He et al. (2021) updates the model TIEGCM by ingesting thermospheric temperatures.Additional examples of data assimilation algorithms are Assimilative Mapping Ionospheric Electrodynamics (AMIE) algorithm (A.Richmond, 1992), Assimilative Mapping of Geospace Observations (AMGeo) (AMGeO, 2019), and the Data Assimilation Research Testbed (DART) (Anderson et al., 2009).Hsu and Pedatella (2021) ingest ICON measurements of the neutral winds to update the outputs of the system WACCMX + DART.Laskar et al. (2021) also updates this model by ingesting GOLD thermospheric temperatures.The previous algorithms ingest measurements of the plasma or neutral parameters themselves to improve the model estimations.These measurements in some cases are less plentiful than, for example, global TEC measurements.There have been studies of data assimilation algorithms that estimate the plasma drivers by ingesting global TEC measurements, such as GAIM-FP (Full Physics) model (Scherliess et al., 2006(Scherliess et al., , 2009)).Lomidze and Scherliess (2015) have developed the Thermospheric Wind Assimilation Model (TWAM), that updates the winds from GAIM-FP at low and mid-latitudes.
There are different sources of error in data assimilation systems in general, such as the assumption of unbiased errors, the inaccuracy of the background climate models used, or the errors of the observations.Understanding, modeling, and quantifying the different uncertainties associated with data assimilation algorithms is necessary for the optimal use of the method (Yang et al., 2023).By specifying the observation and model error statistics correctly, the states obtained from data assimilation algorithms can be further improved to capture atmospheric variabilities (Waller et al., 2016).
Errors of methods focused on studying the IT have been analyzed for specific assimilation methods.Scherliess et al. (2006) have studied the model error of the GAIM-FP model.Durazo et al. (2021) and Koshin et al. (2019) study the model systematic bias of the TIEGCM algorithm and of a four-dimensional local ensemble transform Kalman filter (4D-LETKF) respectively.Calibration of the background climate models was analyzed by Mehta and Linares (2018).A method to estimate and account for the model error was also developed by Mitchell and Houtekamer (2000).
Different studies to assess how well data assimilation algorithms work have been conducted.Vlasenko et al. (2014) analyzes the impacts of model, background and observation error produced by a data assimilation algorithm using a variational error estimation method (Parmuzin et al., 2006).Gillijns and De Moor (2007) and the references within analyzed a method to determine the systematic model bias in ensemble Kalman filters.
One of the main challenges in data assimilation is to characterize the errors from discrepancies between observations and the physical model used in the data assimilation algorithm to describe them (Laloyaux & Bonavita, 2020).This error includes the representation error, which describes the error of using a discrete physical model that is not able to represent all the scales (Lorenc, 1981) and all the physical processes and dynamics of a system (Janjić et al., 2018).Janjić et al. (2018) describes three main sources for the representation error: errors due to different scales represented in the observations and the model, due to the observation operator used to convert measurements into state space, and due to quality control or pre-processing of observations.Specifically, Janjić et al. (2018) defines pre-processing error as distinct from measurement errors by being due to obtaining derived quantities from the instrument measurement, such as retrieval of atmospheric variables from satellite radiance.
The goal of this study is to define a method to, and then quantify, the representation error of an IT data assimilation algorithm that estimates electric potentials and neutral winds globally from density rates.We focus on the algorithm EMPIRE (Estimating Model Parameters from Ionospheric Reverse Engineering) (Datta-Barua et al., 2009), which estimates the ionospheric convective drivers of neutral wind and electric potential by ingesting global electron density.Input densities have typically been derived from another data assimilation algorithm: IDA4D (Ionospheric Data Assimilation Four-Dimensional (Bust et al., 2007)), which ingests primarily GNSS TEC measurements.EMPIRE separates the estimation of the drivers from the estimation of plasma density, allowing EMPIRE to have a more robustly conditioned observation operator.In previous work (Datta-Barua et al., 2013, 2014;Miladinovich et al., 2016Miladinovich et al., , 2020)), the drivers have been estimated and studied during different geomagnetic events, but they have not focused directly on an assessment of the error of the estimations.López Rubio and Datta-Barua (2022) transitioned from using power series-based neutral wind basis functions to vector spherical harmonics.In that work, the representation errors were not considered, and the model error covariances were pre-processed and fixed as constants.The goals of the analysis in this work are to incorporate representation error in the EMPIRE filter and to give context to future conclusions made from interpreting the outputs.
The governing equation in EMPIRE is the ion continuity equation of ionized atomic oxygen (O + ), as this ionized species primarily dominates the F layer ionosphere in which plasma density peaks.The EMPIRE observation equation is a simplification of the behavior that excludes chemistry, as well as momentum and energy physics of the plasma.This reduction of the physics to the ion continuity equation and finite state basis functions will each introduce a representation error in the algorithm.The objective of this work is to quantify the representation error and then analyze how well EMPIRE, by reducing the observation operator to the ion continuity equation for one species, is able to predict the global ionospheric drivers.
To achieve the objective we use a self-consistent source that supplies both the background model and simulated measurements that EMPIRE assimilates, as well as the validation "truth" with which to compare the resulting estimates.A perfect representation of the ionosphere to use as this source does not exist, but we will use a background model as if it were, and it will be considered as our "truth."The self-consistent physics model that we use is the SAMI3 (Sami3 is Also a Model of the Ionosphere) algorithm (J.Huba et al., 2008).SAMI3 models the ionosphere by solving the continuity and momentum equations of seven different ion species (H + , He + , N + , O + , NO + , N + 2 and O + 2 ) and the energy equations of three of them (H + , He + and O + ).Comparing the results to the chosen self-consistent model, we see how well the EMPIRE simplification of considering only the ion continuity equation captures the dynamics.This comparison and characterization of the representation error will help us understand the strengths and limitations of the data algorithm.The methodology can then be applied in future runs to concurrently estimate a more accurate error covariance.
In Section 2 we show an overview of the EMPIRE algorithm, including the observation equation and Kalman filter setup.Section 3 discusses representation error estimation and algorithm performance analysis for this study.Section 4 describes the synthetic data used to run the EMPIRE algorithm as well as the configuration parameters for EMPIRE.Analysis of the EMPIRE results is shown in Section 5. Section 6 presents the main conclusions.

EMPIRE
In this section, a review of the EMPIRE algorithm developed in previous work is given.EMPIRE reduces ionospheric dynamics to the ion continuity equation of only one element O + by simplifying from the ionospheric flux tube model for electron density (Kirchengast, 1996b) to be due to one species.It ingests O + density rate ∂N ∂t data, assuming charge neutrality with electron density N, and estimates the electric potential field V giving rise to ion drifts ⃗ v ⊥ , and neutral winds ⃗ u that contribute to plasma motion along the field line ⃗ v ‖ .
The source terms of the ions are represented with s prod and s loss (production and loss rate, respectively) and the transport of the ions is represented with the divergence terms.This equation is valid at different altitudes, but we focus on the F layer above 200 km.The subscript "0" describes a quantity that is assumed to be known based on a background model.Terms labeled with a variable "a" have units of number density rate.
The ion velocity is decomposed parallel ‖ and perpendicular ⊥ to the magnetic field directions.The ⃗ E × ⃗ B ion drift contribution to the electron density rate change is assumed to be the dominant contribution in the perpendicular direction, represented with the subscript "⊥," which is a common assumption at F region altitudes.The term is split into a background model and a correction to be estimated a ⊥ = a ⊥,0 + δa ⊥ , where the δ notation indicates the transport correction term to be analyzed through EMPIRE, but need not be small.
Since ⃗ E × ⃗ B drift occurs in the field-perpendicular directions, ion motion parallel to the field is modeled as fieldparallel transport due to neutral wind, as well as transport due to gravity and diffusion: a ‖ = a u,0 + δa u + a grav,0 + a dfsn,0 , where δa u is the EMPIRE neutral wind correction term.The last two effects as well as a prod,0 , a loss,0 are calculated using the model described in Kirchengast (1996b), simplified in Datta-Barua et al. (2009).In recent uses of background models in EMPIRE (López Rubio & Datta-Barua, 2022;Miladinovich et al., 2020), the ion and electron temperatures were provided by the International Reference Ionosphere (IRI), and neutral densities and temperatures are calculated by NRL-MSISE00 (Naval Research Laboratory Mass Spectrometer, Incoherent Scatter, ground-to-Exosphere) model (Picone et al., 2002).The background models for neutral wind and ion drifts are HWM14 (Horizontal Wind Model) (Drob et al., 2015) and Weimer (D. R. Weimer, 2001) respectively.
The plasma density N usually comes from what is considered in the system to be "measurements" (but which may themselves be output from a separate data assimilation method).The electron density rate ∂N ∂t is finite differenced in time from these measurements.The system in Equation 1 can be rearranged into a linear system by expanding the unknown drivers-neutral winds δ ⃗ u and potential field δV-as a multiplication of basis functions and a set of coefficients arrayed in state δx, at the k-th time epoch.Stacking all the grid points' continuity equations we obtain: where a lowercase variable in bold type indicates an array, and uppercase bold type indicates a matrix.At each kth time epoch, column array z k is the difference between electron density rate (dN/dt) k and the sum of all the background terms (a prod,0 + a loss,0 + a dfsn,0 + a grav,0 + a u,0 + a ⊥,0 ) k .The mapping matrix from the state to density rate is represented with H, and δx k is the state of the system formed by the unknown coefficients of the driver corrections (i.e., winds and electric potential) expansions.The details of the mapping matrix H and system state δx k are summarized in Appendix A.
Then EMPIRE calculates the coefficients of the state by solving the overdetermined linear system defined in Equation 2 with a Kalman filter (Miladinovich et al., 2016).The state vector δx k|k 1 is updated using the ingested observations z k : where δx k|k is the post-measurement state estimation with state covariance P k|k , and δx k|k 1 is the premeasurement state estimation with state covariance P k|k−1 .A time update is first applied to the state, assuming a Gauss-Markov state transition: The forecast state at epoch k + 1 is δx k+1 , and δx k is the true state vector at time epoch k.The process noise η k is assumed to be normally distributed with zero mean and covariance Q k .The matrix Φ k is defined as: where I is the identity matrix, τ is a time constant that quantifies how rapidly measurements are "forgotten," and dt is the time step used in the system.The time update state and covariance estimations are given by: The post-measurement state estimation at the k-th time epoch δx k|k is updated by the state transition matrix Φ k .P k+1|k is the forecast covariance matrix of the state at time k + 1, P k|k is the measurement update covariance matrix of the state at time k, and P 0,k|k is the background covariance matrix at time k.The process noise covariance Space Weather 10.1029/2023SW003609 HU ET AL.
Q k is defined so that the time-updated covariance P k+1|k reverts to the background model covariance P 0,k|k in the absence of measurements over time.For the derivation of P 0,k|k , please refer to Appendix A in the publication by López Rubio and Datta-Barua (2022).
The observation error ϵ has historically been defined due to the measurement vector z in Equation 2, which takes into account the difference between ∂N/∂t and a 0 as the measurement vector.Error ϵ is hypothesized to be zero mean and normally distributed with the error covariance R, composed by the R a 0 of those background model terms and the covariance R N of the electron density rate ∂N/∂t.The covariance matrix of the background transport terms, R a 0 , was usually assumed to be a diagonal matrix with the square error of the a 0 term at each grid point.The covariance matrix R N was obtained by propagating the error of the N electron density observations as defined in Miladinovich et al. (2016).
Previously ϵ was purely attributed to provided measurements (i.e., ∂N/∂t and a 0 ).The error source due to linear operator and unresolved state in EMPIRE, termed as "representation error" (Janjić et al., 2018) is missing.Additionally, the climate error information related to time is considered constant, as the values of P 0,k|k for each driver were pre-calculated (López Rubio & Datta-Barua, 2022).In this research, we aim to simultaneously update P 0,k|k to reflect changes over time as EMPIRE completes each iteration k.To improve the error modeling process, the representation error will be formally defined in the following section.

Error Discussion
Observation error is an essential component that needs to be well-defined in the data assimilation algorithm.Other than the measurement error that might be due to the surrounding environment and instruments, Janjić et al. (2018) also noted that because of the representation error, which is associated with unresolved scales and the observation operator, there should be another error term in the modeling of observation error covariance R. Figure 1 conceptually explains the relationship between the real physical system H(x) ("Real world"), linear approximations H x ("Model world") and corresponding measurements y in the observation space.The error between true and modeled variables is the representation error ϵ rep with error covariance R rep .The measurement error between the truth and measured quantity y is denoted as ϵ msmt with error covariance R msmt .The sum of these two error sources is the observation error ϵ in data assimilation with covariance R, as in Equations 9 and 10.  2011).The real world is described by true state x transformed by the relationship H, modeled variables are the estimated state vector x mapped by the linearized transformation matrix H, and y is the measurement vector.The difference between the true and modeled world are represented in error ϵ rep (black).The difference between the true world and the measurement is measurement error ϵ msmt (green).The sum of these errors is the observation error ϵ (orange).
In previous EMPIRE implementation, we neglected the representation error, so ϵ rep and R rep were zero.In the present study, we consider both the representation error ϵ msmt and the measurement error ϵ rep in the EMPIRE framework.These errors are assumed to follow a normal distribution and have error covariances R msmt and R rep , respectively.
The representation error ϵ rep distributed with R rep in the present study will be analyzed numerically.Let δx be the true state, which may be infinitely long.The unresolved scales arise from truncating δx to be discrete and finite δx.
Let H be the "true" relationship to the true state, which may be a non-linear function or have unknown physics.
The matrix H is a linearization of the known relationships in H that by definition does not include unknown physics and may even leave some known physics unmodeled.In EMPIRE, the linearization of physics model and the assumption of a single ion species of O + in EMPIRE algorithm setup should also be attributed to the observation error ϵ.Specifically, the Kirchengast (1996b) model is used to specify the production, loss, gravity and diffusion effects (a loss,0 , a prod,0 , ⃗ v grav,0 and ⃗ v dfsn,0 ) in EMPIRE.The representation error ϵ rep that we want to characterize for the data assimilation algorithm EMPIRE comes from the discrepancy between the true observation H(δx) and an estimate Hδx (see Figure 1) from the ion continuity equation: In general use, we cannot know H(δx) exactly.However, in this work, we create a simulated truth to approximate the representation error due to the observation operator and unresolved states, and we treat the pseudo-observation z as the "truth" H(δx).Let us assume the error ϵ rep is normally distributed with representation error covariance R rep : where the estimate of δx is calculated with the Moore-Penrose inverse denoted with the super-scripted dagger.
Since the electron density rates are assumed to be exact from a self-consistent source, the electron density rate error covariance matrix is considered to be zero.
Independently from the setup of observation error covariance R, the P 0,k|k (López Rubio & Datta-Barua, 2022) was constructed with pre-processed (i.e., before the EMPIRE Kalman filtering) background model variance thereby losing any time-dependent variation.In this study, the model covariance P 0,k|k for this study is reformulated in Appendix B, concurrently updating the model information to the EMPIRE system.
In the analysis of how EMPIRE performs, after numerically quantifying ϵ rep , R rep and P 0,k|k as inputs to the EMPIRE Kalman filter, the true values are compared with the ion drifts and neutral wind estimated from δx k|k .
For an individual component in field-perpendicular zonal and meridional, and field-aligned directions v i = {v ⊥,z , v ⊥,m , u ‖ }, our "truth" is known and the residual in the velocity space ϵ v i can be characterized by comparing the estimates vi to the original values that are self-consistent with the data that were ingested: where ϵ v i is the analysis error of each individual component v i .For ExB drifts (v i = {v ⊥,z ,v ⊥,m ) , which are projected from ion drifts in geomagnetic radial (v ⊥,r ), co-latitudinal (v ⊥,θ ) and longitudinal (v ⊥,ϕ ) directions, and vi is the sum of EMPIRE correction δv i and model calculation v i,0 .For neutral wind v i = {u ‖ }, the analysis error ϵ vi is the EMPIRE correction alone because HWM14 neutral wind serves as the background model inputs and the validation truth,and is projected onto the field-parallel direction.

Method
The goal of this study is to quantify the representation error in density rate space as Equation 12, in which H is the mapping function from coefficients to time-differenced density dN dt , and z is the gap between measurements and modeled electron density rates.Another goal is to analyze the performance in velocity space ϵ v i in Equation 15 in response to different geomagnetic activity levels, for a quiet day and a storm day.To do this we use a selfconsistent source to provide the measurements N, and background a 0 .The self-consistent source used to estimate the representation error ϵ rep of EMPIRE in this study is SAMI3.The magnetic field is provided by the IGRF (International Geomagnetic Reference Field) model (J.D. Huba et al., 2008) and the electric field is solved with a potential solver derived from current conservation (Huba et al., 2010).The ion drift is calculated self-consistently with the solved perpendicular electric field in the low to mid-latitude region of the ionosphere.At high latitudes, SAMI3 uses the D. Weimer (2005) model.The neutral winds are obtained from HWM14 (Horizontal Wind Model) (Drob et al., 2015), including DWM (Disturbance Wind Model) when the Kp index reaches a certain threshold.The neutral properties are characterized by the NRL-MSISE00 model (Picone et al., 2002).The continuity and momentum equations of seven different ion species (H + , He + , N + , O + , NO + , N 2 and O + 2 ) and the energy equations of three of them (H + , He + and O + ) are used to represent the ionosphere.In summary, SAMI3 is driven by Weimer, HWM14 and NRLMSISE00 just as EMPIRE's background is.
A case that is primarily a quiet day is selected for this study, 25 August 2018, and the subsequent day is selected as a storm day. Figure 2   SAMI3 values of the electron density N at altitude range from 100 to 3,000 km will be used as the pseudomeasurements in the data assimilation algorithm.The electron density N from SAMI3 is then finite differenced to obtain the electron density rate ∂N/∂t, that EMPIRE ingests.Also, in EMPIRE we use the same background models that SAMI3 uses, to make all the sources that EMPIRE ingests consistent with SAMI3.The IGRF-11 model (Finlay et al., 2010) is used to describe the magnetic field ⃗ B 0 .The neutral wind is modeled by the updated Horizontal Wind Model (HWM) (Drob et al., 2015).The neutral properties will be given by the NRL-MSISE00 model as in SAMI3.Also, the electron and ion temperature are provided by the IRI model, instead of by the SAMI3 output values, as they were not available for the chosen cases.The temperatures are inputs to the calculation of collision frequency, diffusion coefficient and loss term in the ion continuity Equation 1 of the Kirchengast model (Datta-Barua et al., 2009;Miladinovich et al., 2016).The impact of ion temperature variations is negligible to calculations of the loss term (Datta-Barua et al., 2009;Kirchengast, 1996a).It is shown in Appendix C that the diffusion coefficient and collision frequency are not very sensitive to a change in these temperatures, so using IRI instead of SAMI3 will not change the EMPIRE estimation.
A block diagram of how SAMI3 is fed in and subsequently compared to EMPIRE is shown in Figure 3.We run EMPIRE with the newly derived observation covariance R as defined in Equation 10, with R rep formulated by treating the difference between SAMI3 electron density rates and background modeled values as z in Equation 13.The R msmt is only due to the error source of background models R a0 .Large uncertainties are expected for model outputs during storms, thus R a0 in Equation 14 is assumed to be 70% of the sum of background modeled values a 0 .
Note that R a0 is assumed to be a diagonal matrix in this study.We acknowledge that ionospheric states are highly correlated, such as strongly divergence-free flow at high latitudes, spatially correlated neutral densities and temperature, and ionization and recombination errors.In a realistic setting, R a0 should be constructed with off- diagonal terms indicating the spatial and temporal correlations between states.One reason for our simplification is that the background models only yield global values at each time epoch, without providing information on the standard deviation and correlation.Another reason to assume that R a0 is a diagonal matrix in this work is to reduce the computational cost of matrix inversion.However, the setup of diagonal error matrices in a Kalman gain calculation can lead to biased weight on the innovation vector (z k Hδx k|k 1 ) in Equation 3, and loss of information in state correlation, which yields less optimal estimates of posterior state δx k|k .
After adjusting the error covariance, the correction state δx is estimated by the Kalman filter, and EMPIRE estimated ion drifts are compared with the ion drifts from the sum of SAMI3 and Weimer for validation purposes, since Weimer supplies the high-latitude electric potentials and SAMI3 is coupled to electrodynamics in the lowto-mid latitudes.The estimated neutral wind correction F u δx u (López Rubio & Datta-Barua, 2022) is compared with u produced by HWM14 model since SAMI3 is driven by HWM14.
The difference ϵ vi between the EMPIRE-estimated drivers and the SAMI3 values will be computed using Equation 15to assess mean and standard deviation of the representation error in velocity space of EMPIRE.
A global analysis is performed using a grid of magnetic co-latitude θ between 3 and 177°with a resolution of 6°, magnetic longitude ϕ between −180 and 180°with a step of 6°and altitude h between 200 and 500 km with a step of 50 km.The grid has 12,810 points at each time step.The analysis period spans 48 hr starting at 0 UT on 25 August 2018 at 15-min increments.
Once the filter is configured as described, the state is estimated with EMPIRE at each time step using Equations 3 and 7. Then the estimated ion drifts and neutral winds are obtained at every grid point for the desired period of time.
To assess the effects of representation error on the ion drifts, we compare our results to the SAMI3 drift output following Equation 15 in the perpendicular-to-the-magnetic-field zonal and meridional component directions.
The zonal mean of the difference between the background (Weimer) plus EMPIRE estimate and SAMI3 values of the two components, perpendicular meridional εv ⊥m and zonal εv ⊥z to the magnetic field line direction, will be shown over time and magnetic latitude at an altitude of 300 km.
where ϵ v i is the representation error in velocity space defined as each element of in Equation 15, the magnetic longitude step is 6°and the number of magnetic longitudes used in the EMPIRE grid is n ϕ = 61.
For the neutral wind estimation, the parallel-to-the-field neutral wind is decomposed into geographic east and north.In this work the geographic east transport due to neutral wind is assumed 0 (i.e., not corrected) because in general field lines are oriented largely in the meridional plane.To study the neutral wind representation error we will compare our results to the HWM14 model, as it is the neutral wind input that SAMI3 uses, also following Equation 15.The geographic meridional direction is analyzed using the zonal mean difference between the estimation and the values from HWM14 that drove SAMI3, of the component, which will be projected onto the field-parallel direction εu‖ as EMPIRE only observes neutral wind parallel to the magnetic field lines.They will also be calculated at the altitude of 300 km.The zonally averaged difference for all of the terms will be calculated at each time and magnetic colatitude using Equation 17.
The mean and standard deviation of the difference between the estimation of each of the components of the drivers and the model SAMI3 will also be calculated to assess the representation error in the EMPIRE velocity space, which will be easier to interpret than ϵ rep in density rate space.They will be calculated over magnetic latitude by taking the zonal mean μ ϵ v i (θ) and the standard deviation σ ϵ v i (θ) over longitude and time:
where n times is the number of time steps.During quiet time n times = 72, t 0 = 0 UT on 25 Aug, and t f = 18 UT on 25 Aug.For the storm period, the error statistics are calculated similarily, with t 0 = 18 UT August 25th, t f = 24 UT August 26th, giving n times = 120.

Results
To characterize the representation error as projected into velocity space ϵ v i , we study the difference between EMPIRE and SAMI3 outputs over time over the whole global region, and we also study the dependence of the results on latitude.
Figure 4 shows the zonally averaged ion drift difference εv i of Equation 17 over time in the (a) perpendicular zonal direction and in the (b) perpendicular meridional direction between EMPIRE results and SAMI3 "true" values.
The perpendicular zonal direction coincides with the magnetic φ direction.The parallel ion velocity is not shown, as the ion drift is only seen in the field-perpendicular field direction.We consider the quiet time interval as lasting until 18 UT.
In the perpendicular zonal direction, Figure 4a, the error is small at low-to-mid latitudes up to around ±60°m agnetic.The magnitude of εvi (θ; t) increases dramatically poleward 60°, which indicates that EMPIRE does not provide as accurate zonal ion drift information in the high latitude bands.The estimation in the perpendicular meridional direction, shown in Figure 4b, is close to the model SAMI3 values, as the error is close to 0 m/s in almost all regions except at boundary latitude of ±87°.We can see a higher disagreement saturating at the boundary latitudes of ±87°over time.Figure 5 plots the relative errors μ ϵvi (θ) on a log 10 scale at each latitude band in the (a) perpendicular zonal and (b) perpendicular meridional directions.The blue and red lines represent quiet time and storm time error statistics, respectively.The dashed line at 0 indicates errors that are 100% of the reference truth.As hinted at in Figure 4, the relative errors at the boundary latitudes of ±87°are clearly off by orders of magnitude.This is likely a boundary effect of mapping latitudes to L-shell in the spherical harmonic fitting.The values at the 87-degree latitude will be removed in the subsequent plots to maintain a legible plotting scale.
The time-averaged mean error μ ϵvi (θ) with standard deviation σ ϵvi (θ) as error bars are shown in two directions, Figure 5c field-perpendicular zonal and Figure 5d field-perpendicular meridional.Figure 5c plots zonal drift errors at two scales: for the latitude region within ±60°, the mean errors calculated for quiet time and storm time are bounded within the magnitude range of 20 m/s.For the high latitude region beyond ±60°, the mean zonal errors exceed 50 m/s.Also, the storm time variance is larger than the quiet time.These trends also hold in the perpendicular meridional direction (5d) but absolute speeds are smaller overall.While the absolute errors appear larger in Figures 5c and 5d at the northern high latitudes than the southern, Figures 5a and 5b show that the relative errors are comparable in both hemispheres.
From Figure 5 it is evident that EMPIRE corrections in the region where the magnetic latitude exceeds ±60°d eviate more from SAMI3 outputs for both quiet and storm periods.For a closer examination of high latitudes, we show in Figure 6    In both Figures 5 and 6, significant errors at high latitudes are observed in the ExB drifts and electric potential when compared to the EMPIRE corrections.A number of potential factors could influence these discrepancies in high-latitude estimations: The expansion used to estimate the potential field, described in Miladinovich et al. (2020), models the electric potential as constant along a dipole magnetic field line.This hypothesis becomes inaccurate around geomagnetic latitudes of 60-70°, when L-shell reaches values of 10 (Miladinovich et al., 2020).Also, the magnetic field lines at high latitudes are "open," not represented in the dipole model.The increasing error in the EMPIRE estimations with respect to the model SAMI3 around these magnetic latitudes may be due to this hypothesis not being true in these locations.Another consideration is the grid mismatch between EMPIRE and SAMI3 at high latitudes.SAMI3 runs on the non-orthogonal field-aligned magnetic apex coordinate extending to 87 degrees of magnetic latitude (Laundal & Richmond, 2017;A. D. Richmond, 1995), and the outputs are transformed onto geographic coordinates with 2.5-degree resolution.Then the outputs are interpolated and ingested by EMPIRE in a uniform horizontal resolution of 6°in geomagnetic coordinates.This disparity in resolutions may also contribute to larger errors observed near the magnetic poles.Another contributor might be due to the fact that D. R. Weimer (2001) is the background electric potential model in EMPIRE but D. Weimer (2005) drives SAMI3, but this is likely to be small compared to the other factors.
EMPIRE estimates the neutral wind contribution to the electron density change simultaneously with the ion drifts.
Figure 7a shows the zonally averaged error of the field-parallel meridional neutral wind in comparison with the model HWM14 values, εu from Equation 17, over time.Figure 7b shows the mean error μ ϵu‖ and standard deviation σ ϵu‖ of EMPIRE corrected field-parallel neutral wind over latitudes over quiet time (blue) and storm time (red).From Figure 7a, EMPIRE overestimates the neutral wind in the northern hemisphere, and yields opposite results for the southern hemisphere.A hemispheric bias in the quiet time is seen in Figure 7b, as the time-averaged mean error peaks are equal but opposite at ±40°.The mean error line is shaped like a sinusoidal wave, and converges to 0 approaching the magnetic poles during quiet time.The storm time-averaged neutral wind mean error yields different behaviors in the southern hemisphere compared to the quiet time.We find this is because the measurement vector z fed into EMPIRE during storm time deviates from quiet time.Ion continuity is a scalar equation at each point, which means that total transport is not inherently directionally decomposed.The decomposition and assumptions on motion due to the ExB drifts and neutral wind are made in our formulation of EMPIRE.Near the magnetic poles, the horizontal wind is largely perpendicular to the magnetic field-lines, thus no component from EMPIRE corrections to horizontal wind modeling is projected onto the field-parallel direction.However, in the low-to-mid latitude region, the EMPIRE calculations of neutral wind are highly impacted by the vector projection, and coordinate transformations might be contributing to the error, as EMPIRE only corrects the parallel transport in the geographic north direction.It is also possible that, because the correlation information is neglected in favor of diagonal covariance matrices in the interest of algorithmic simplicity, the Kalman filter yields biased results.
In this investigation, we analyzed the representation error for a quiet day followed by a storm day.As the Moore-Penrose inverse operator in representation error covariance calculation Equation 13is the least squares fitting process, various representation error values may be expected under different conditions.These variations can be due to factors such as solar activity levels and seasonal effects, introducing noise and changing the patterns in z that are then used in the fitting process.For example, the sudden change in zonally averaged differences at 00 UT August 26th in Figures 4 and 7a also indicates notable daily change in estimations from day-to-day.This is due to the inputs to measurements and background modeled values from one day to the next.SAMI3 is driven by daily F10.7 indices, and the HWM is driven by daily F10.7 and Ap indices.Thus, the day-to-day, especially quiet-tostorm, change produces large input gaps, which leads to a sudden change in EMPIRE estimates.A comprehensive analysis of multiple periods under diverse conditions is beyond the scope of this work, but would be necessary to reveal the relationship between representation error and the other possible influencing factors.
Other limits on extrapolating conclusions from this study can be attributed to factors such as the model assumptions and simplification of physical processes.As a data assimilation algorithm based on the Kalman filter, EMPIRE is designed to analyze ionospheric drivers separately from electron density data associated with a single ion species of atomic oxygen.We also neglected spatial correlation information by diagonalizing the error covariance matrices.Furthermore, since EMPIRE corrects the geographic meridional wind, errors can arise from the Kalman filter setup representing the ion continuity equation.This is because the electron density rate is a scalar quantity and does not differentiate between the ExB effect in magnetic coordinates and the neutral wind effects in geographic coordinates.
Despite these challenges, we find the functionality of EMPIRE in calculating ion drifts and neutral winds from O + rate information quite promising at low-to-mid latitudes.Other assimilative methods, such as the ensemble Kalman filter or particle filter, may resolve the issue of specifying error covariance matrices in calculations but could come at the cost of reduced computational efficiency.Additionally, the state vector's length [90 × 1] for neutral wind and ion drift is not overly large for the Kalman filter to handle.Moreover, we believe that the Kalman filter's performance can be further enhanced in the future by: (a) re-scaling the error information at high latitudes compared to low-to-mid latitudes, (b) incorporating spatial correlation information between various states, and (c) revising the mapping matrices to explicitly project the transport onto field-perpendicular and field-parallel directions.These enhancements and considerations have the potential to mitigate some of the limitations found in our current approach.

Conclusions
The goal of this study was to analyze the uncertainties of the EMPIRE data assimilation method: how precise the model is in estimating the ionosphere with just the ion continuity equation from O + , compared to a more complete ionospheric model, SAMI3.We first quantified the representation error in EMPIRE estimation of ionospheric drivers (neutral winds and ion drifts) during quiet and storm time, formulated in Section 3 and Appendix B. Then EMPIRE ingested electron densities provided from SAMI3, which is the self-consistent model used for this analysis as well as the observation source and then the truth comparison.
We investigated the ionosphere behavior during 25-26 August 2018, so that all the inputs in EMPIRE were selfconsistent with SAMI3.Latitude-dependent error statistics in response to geomagnetic activity level were shown.The ion drift estimations from were compared to SAMI3 ion drifts, as the estimated results were expected to be close to the modeled values.We showed that EMPIRE is able to estimate the ion drifts with smaller
errors at low and mid geomagnetic latitudes.In the perpendicular-zonal direction, storm time ion drift errors are found to be limited to a magnitude of 50 m/s, while during quiet times, they remain below 25 m/s in the low-tomid latitude region.In the perpendicular-meridional direction, both storm and quiet time ion drift errors are confined within a range of ±10 m/s.As anticipated, it is noted that the variance of storm time errors is greater than that of quiet times.The simplification to use the ion continuity equation only is good enough at the low-to-mid latitude region, as the main component of the representation error is due to the fitting done to model the potential field with basis functions.The field-parallel neutral wind estimation have lower errors at high and low latitudes, peaking at mid-latitudes.At the location of ±40°N, the EMPIRE neutral wind mean error reaches the maximum magnitudes and opposite signs during quiet times.In the southern hemisphere, the time-averaged neutral wind error in the field-parallel direction during a storm exceeds that observed during quiet times.
where δ ⃗ u is expanded in r, θ, φ) , that are the unit vectors of a magnetic spherical coordinate system.Function Y m l has been defined in Equation A4 which includes the harmonic term Φ m l that contains the unknown coefficients δx u , and the r component is negligible.

Appendix B: Background Model Covariance Setup in the Kalman Filter
In EMPIRE, we estimate the ionospheric drivers of electric potential and neutral wind.The formation of counterpart background model covariance P 0,k|k appearing in Equation 8 describes the uncertainty of background state vector for these two drivers.Here we derive the background model covariance P 0,k|k for electric potential; the formation for neutral wind is analogous.
The Weimer model yields ExB drifts in magnetic radial, latitudinal, and longitudinal directions as v ⊥0,r , v ⊥0,θ and v ⊥0,ϕ at n grid points respectively.The background electric potential state vector xV,0 for degree l = 6 is estimated in Equation B1, which is the Weimer-modeled ExB velocity v ⊥0 linearly transformed by the Moore-Penrose inverse operator of ExB velocity mapping matrix F † ⊥ .The corresponding mapping matrices are F ⊥r , F ⊥θ and F ⊥ϕ . xV,0 Matrix dimensions are given under each array.The corresponding background state covariance P xV is: where P v ⊥0 is the background velocity error covariance matrix.
Assuming the background model values for each nth grid point is linearly independent, P v ⊥0 is a diagonal variance matrix.The background ExB velocity error variances in r, θ, ϕ direction are σ 2 v ⊥0,r ,σ 2 v ⊥0,θ ,σ 2 v ⊥0,ϕ correspondingly, and each diagonal matrix has dimension of [n × n]: and 0 is a square matrix of zeros of dimension [n × n].
For each ith direction i = {"r", "θ″, "ϕ"}, the background velocity error variance matrix diag σ 2 v ⊥ 0,i ) is composed of fitting error and the background climate model error: The fitting error of the mapping matrix is defined as the standard deviation of the difference between the estimated mapping value F ⊥i xV,0 and the background values v ⊥0,i over the global map: Since the correction to ExB velocity is estimated, the background model errors are computed at each iteration of the process update, and they are assumed to be 70% of modeled values: Plugging Equations B5 and B6 into Equation B4, Equation B3 is formed and substituted into Equation B2.
Similarly, the derivation of P xu for the neutral wind corrections δx u is defined by replacing the mapping matrix F ⊥ in Equations B1-B6 with the mapping matrix F u (López Rubio & Datta-Barua, 2022) that maps the neutral wind corrections onto the geographic east and north directions.
By the linear transformation in Equation B2, the electric potential background state covariance P xV and neutral wind background state covariance P xu are each obtained as a square matrix in P 0,k|k at the k-th time epoch, and 0 are the rectangular matrices that fill the gaps. (B7) Note that the background model covariance P 0,k|k is assumed to be a block diagonal matrix at each time epoch, and the correlation between neutral wind and ion drifts are neglected in the simplified setup.However in a more realistic setting, the off-diagonal terms in P 0,k|k should be non-zero to improve the driver estimations.

Appendix C: EMPIRE Sensitivity to the Change of Ion and Electron Temperature in the Ion Contiuity Equation
A sensitivity analysis was conducted by examining the dependence of collision frequency on ion temperature, and of diffusion coefficient on both ion and electron temperatures.
Assuming the only ion species is O + , the field-parallel speed v ‖ is simplified as in Equation C1 (Datta-Barua et al., 2009), based on Kirchengast model (Kirchengast, 1996b), with drivers of neutral wind u ‖ , gravitational effect g ‖ ν O + and the diffusion term along the field-line.
where g ‖ is gravity, and ν O + is the O + O collision frequency.The diffusion coefficient D is formulated by the combination of Boltzmann's constant k B , ion and electron temperatures (T i + T e ), O + mass m O + , ion-neutral collision adjustment term Δ in (assumed 0 in EMPIRE).The gradient of electron density N along the magnetic field line is denoted as ∇ ‖ N.
The collision frequency ν O + depends on ion temperatures as: where C 1 , C 2 , C 3 , C 4 , C 5 are constant coefficients (see Table 1 of Datta-Barua et al. ( 2009)), T n is the neutral temperature, and the neutral density of oxygen, and for oxygen and nitrogen molecules are denoted by N O , N O 2 and N N 2 correspondingly.
The partial derivative of collision frequency with respect to ion temperature is: The partial derivative of diffusion coefficient with respect to ion temperature is: The partial derivative of diffusion coefficient with respect to electron temperature is presented at Eq. C5.
To show the percent change due to temperature variations we let ΔT i = ΔT e = 100 Kand approximate with a first order Taylor expansion as: The fractional changes Δν O + / ν O + and ΔD/D for each temperature input to the EMPIRE horizontal grid at 300 km are plotted in Figures C1a and C1b, respectively.The 5% of the variations in response to temperature changes of 100 K indicate that the ion continuity equation is not significantly affected by varying ion and electron temperature inputs.Hence, the substitution of International Reference Ionosphere (IRI) temperature values, instead of SAMI3, which would have been self-consistent with the truth reference but were not saved, is not expected to have a substantial effect on EMPIRE estimations.

Data Availability Statement
The authors thank the Data Analysis Center for Geomagnetism and Space Magnetism of Kyoto University for maintaining the World Data Center for Geomagnetism in Kyoto.The Ap index and provisional Dst index data shown in Figure 2 can be downloaded at their website (World Data Center for Geomagnetism et al., 2018).The proton density, solar wind speed and magnetic field in GSM coordinates are available at ACE Science

Figure 1 .
Figure 1.Conceptual relationship among the real world, model world, and the corresponding observations.Photo images are retrieved from OSXDaily (2011).The real world is described by true state x transformed by the relationship H, modeled variables are the estimated state vector x mapped by the linearized transformation matrix H, and y is the measurement vector.The difference between the true and modeled world are represented in error ϵ rep(black).The difference between the true world and the measurement is measurement error ϵ msmt (green).The sum of these errors is the observation error ϵ (orange).
plots (a) the Ap and Dst indices, (b) the solar wind density and speed, and (c) Interplanetary Magnetic Field (IMF) By and Bz components for these dates.Given the classification by Palacios et al. (2018), a quiet-minor event corresponds to an Ap index between 0 and 20 nT while indices from 30 to 50 nT corresponds to a moderate storm.The Ap index reaches values higher than 20 nT between 17 UT and 18 UT, and afterward the index starts to increase, indicating that the geomagnetic storm starts to develop.Figure2also indicates that 18 UT

Figure 2 .
Figure 2. Solar wind and geomagnetic conditions 25-26 August 2018.(a) Ap and Dst indices in units of nT, represented by red and blue lines, respectively.(b) Proton density in units of cm 3 and solar wind speed in km/s, shown by red and blue lines, respectively.(c) Interplanetary magnetic field y and z components in Geocentric Solar Magnetospheric (GSM) coordinate, in units of nT.The vertical green line at 18 UT August 25th separates the quiet period from the storm period being investigated.The horizontal dashed line indicates 0 nT in (a) and (c).

Figure 3 .
Figure 3. Block diagram of EMPIRE representation error quantification relative to SAMI3 as the self-consistent truth source.The electron density rate ∂N/∂t is obtained from the electron density N from the SAMI3 model, and the time-differencing error is ϵ ∂N/∂t , counted as one component of measurement error.Another measurement error ϵ a 0 is due to the background modeled term a 0 , which is modeled by Weimer, HWM14, IRI and MSIS.The gap between ∂N/∂t and a 0 is treated as the measurement vector.The representation error ϵ rep is analyzed as an error source in observation error ϵ.Errors are color-coded to correspond to their roles in Figure1.

Figure 4 .
Figure 4. Zonally averaged differences between SAMI3 and EMPIRE correction of ion drifts in the (a) field-perpendicular zonal and (b) field-perpendicular meridional directions.The red vertical line separates the quiet and storm periods: 00 UT to 18 UT August 25th is the quiet period, and 18 UT August 25th to 00 UT August 27th is the storm period.

Figure 5 .
Figure 5.The time averaged ion drift relative errors on a log base-10 scale versus geomagnetic latitude are presented in (a) field-perpendicular zonal direction and (b) field-perpendicular meridional direction.The horizontal dashed line in (a) and (b) indicate that the relative error is 100%.The time-averaged error μ ϵvi (θ) and standard deviation σ ϵvi (θ) between SAMI3 and EMPIRE corrected ion drifts versus magnetic latitude are plotted in the field-perpendicular (c) zonal and (d) meridional directions.The horizontal dashed line indicates 0 velocity components.The figure (c) inset plot zooms in on the mean error at the low-to-mid latitudes.For all of the subplots, the vertical black line is the 0°latitude line.Quiet and storm time error statistics are plotted with blue and red lines, respectively.

Figure 6 .
Figure 6.The northern hemisphere electric potential contour maps are plotted at 8 UT,12 UT,16 UT on August 25th (left) and 26th (right) in geographic coordinates.The EMPIRE-corrected outputs (i.e., EMPIRE + WEIMER) are plotted with solid color contours, and SAMI3 electric potentials driven by WEIMER are plotted with dashed color contours.The gray circle indicates our EMPIRE plotting limit of 81°magnetic.The dusk and dawn sides are denoted on the left and right of circle correspondingly.The colorbar indicates kilo-Volts.

Figure 7 .
Figure 7. (a) Zonally averaged difference of neutral wind in the field-parallel direction from EMPIRE corrections ϵ u‖ (θ,t) across analysis time.The red vertical line separates the quiet and storm periods.(b) Time-averaged EMPIRE field-parallel neutral wind mean errors μ ϵu‖ and standard deviations σ ϵu‖ as a function of magnetic latitude.

Figure C1 .
Figure C1.Change of collision frequency ν O + and diffusion coefficient D with respect to ion temperature T i and electron temperature T e .(a) The variation of collision frequency Δν O + over ν O + [%] with respect to T i .(b) The variation of diffusion coefficient ΔD over D [%] with respect to T i + T e .Each dot in the subplots represents a data pair for the EMPIRE horizontal map at altitude of 300 km.The x-axes is the temperature in the unit of Kelvin, and y-axes is the variation percentage.
Positive values, in yellow, indicate EMPIRE + Weimer model is smaller than SAMI3+Weimer modeled values.Negative values in blue indicate overestimation by EMPIRE compared to the SAMI3 values.