Challenges in Specifying and Predicting Space Weather

Physics‐based Data Assimilation (DA) has been shown to be a powerful technique for specifying and predicting space weather. However, it is also known that different data assimilation models simulating the same geophysical event can display different space weather features even if the same data are assimilated. In this study, we used our Multimodel Ensemble Prediction System (MEPS) of DA models to elucidate the similarities and differences in the individual DA model reconstructions of the mid‐low latitude ionosphere when the same data are assimilated. Ensemble model averages were also obtained. For this ensemble modeling study, we selected the quiet/storm period of 16 and 17 March 2013 (equinox, solar medium). Five data assimilation models and one physics‐based model were used to produce an ensemble mean output for Total Electron Content (TEC), ionospheric peak density (NmF2), and ionospheric peak height (hmF2) for latitudes less than 60° and all longitudes. The data assimilated included ground‐based Global Positioning Satellite TEC and topside plasma densities near 800 km altitude derived from the COSMIC (Constellation Observing System for Meteorology, Ionosphere, and Climate) satellites. Both a simple average and a weighted average of the models were used in the ensemble averaging in order to determine if there was an improvement of the ensemble averages over the individual models.


Introduction
Space weather can have adverse effects on a wide range of human systems and operations, and the ionosphere and upper atmosphere play an important role in space weather disturbances. Space weather can adversely affect high frequency (HF) communications, over-the horizon (OTH) radars, electric power grids, deep-water pipelines, satellite charging, surveying and navigation systems, surveillance, and the Federal Aviation Administration (FAA) Wide Area Augmentation System (WAAS). During the last decade, significant progress has been made with regard to understanding, specifying, and predicting space weather, but many challenges still remain. With regard to space weather modeling, several approaches are being pursued, including those involving empirical, physics-based, data-driven, and data assimilation models, and all of the approaches have been useful to some degree. However, if there are sufficient measurements and multiple data types, the data assimilation modeling approach is expected to be the most reliable for specifying and predicting space weather. Unfortunately, as was found by the meteorologists and oceanographers, different data assimilation models used to describe the same weather condition can yield different results even if the same data are assimilated. The differences can arise for numerous reasons, including the use of "different" background weather models, spatial/temporal resolutions, assimilation methods, and data error analyses. Therefore, ensemble modeling moved to the forefront in both meteorology and oceanography.
The Multimodel Ensemble Prediction System (MEPS) was created by a collaborative effort of Utah State University and the Jet Propulsion Laboratory/Caltech/University of Southern California. The primary MEPS goal is to develop an ensemble modeling approach that will provide a framework for improving space weather specifications and forecasting as well as for studying important science issues. Currently, MEPS contains seven data assimilation models that cover the ionosphere (D, E, F regions and topside), thermosphere, plasmasphere, polar wind, and global electrodynamics. The individual MEPS models have been used in numerous studies during the last two decades (see some relevant references in section 2).
With regard to ensemble modeling, our initial effort involved the four GAIM models listed in section 2. For the same space weather event, the four models were run separately the way they normally are run, with different data types and amounts (Schunk et al., 2014a(Schunk et al., , 2014b. Overall, the GAIM reconstructions were ©2020. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. similar, but there were also significant differences. Nevertheless, this first step established the state of the art. In a subsequent study (Schunk et al., 2016), the four GAIM models were run for a storm period with exactly the same data. We also reran the GAIM models successively with the addition of new data types in order to see how the different data types affected the GAIM reconstructions. In general, the models exhibited more similarities and less differences when the same data were used in the GAIM models.
In this study, MEPS simulations were conducted for the quiet/storm period of 16 and 17 March 2013 (equinox and solar medium). The first day was quiet (Kp~2), but a persistent storm generated by a Coronal Mass Ejection occurred on the second day (Kp~6). Six MEPS models were used and the outputs from the six models were compared to each other and to an ensemble average of the six results. The MEPS models are listed in section 2 along with some relevant publications. The data sources assimilated by the MEPS models are described in section 3. Section 4 shows the reconstructions from the individual data assimilation models and some ensemble average results. The final section (section 5) provides a summary of this MEPS study.
Some of the differences between these models pertain to the background models employed, the data assimilation method applied, the form of the equations solved, and the source terms included. These differences entail more than half a century of work, and therefore are beyond the scope of this paper.
With respect to ionospheric drivers (e.g., thermospheric densities and winds, low latitude electric field) it needs to be noted that these can be handled with physics-based models, empirical models, or data assimilation models. For the current study, GAIM-GM, GAIM-BL, and IFM employed empirical models whereas GAIM-FP, GAIM-4DVAR and Mid-Low Electro-DA estimated the driving forces as part of the data assimilation process.

Data Sources
The model simulations were conducted for the 16 and 17 March 2013 quiet/storm period and the data assimilated by the DA models included slant Total Electron Content (TEC) from 530 ground-based receivers and topside plasma densities near 800 km altitude derived from the COSMIC satellite GPS observations, which were calculated by differencing subsequent COSMIC slant TEC observations when the slant path crossed zero degree elevation angles. Bottom-side electron density profiles from 80 ionosondes/digisondes were not assimilated for the current study and used for validation purposes (Figure 1). The GPS receiver biases for a number of stations, along with satellite biases, were obtained from the University of Bern, in Bern Switzerland. The remainder of the GPS station biases was provided by the Kalman Filter in the GAIM-GM model. For the COSMIC slant TEC data used to derive the topside densities, the absolute values provided by the University Corporation for Atmospheric Research (UCAR) were used.

Ensemble Model Averaging
Ensemble model averaging has played a significant role in meteorology and short-term climate prediction, and the goal of the MEPS program is to provide ensemble ionospheric modeling. The ensemble averaging can be conducted with data assimilation models, physics-based models, or a combination of both, and we are considering several schemes. For this ensemble modeling work, we selected the storm period of March 2013. In the example to be shown, five data assimilation models (GAIM-GM, GAIM-FP, GAIM-BL, GAIM-4DVAR, Low-Mid Electro DA) and one physics-based model (IFM) were used to produce an ensemble mean output for TEC, N m F 2 , and h m F 2 for latitudes less than 60°and all longitudes. Note that the data assimilation models do not obtain an exact fit to the individual measurements, but instead achieve a simultaneous overall agreement with the various measurements. Also, in this study, the data assimilation models were run just once with no "fine tuning" to correct for possible uncertainties.
A simple average of the models was adapted in the initial ensemble averaging to see if there was an improvement of the ensemble over the individual models. The simple MEPS ensemble average consists of summing the individual model variables and then dividing by the number of models in the sum, (1/N) * ∑V i , where N is the number of models and the V i are the chosen model variables. The variables in this case were the TEC (10 16 electrons/m 2 ), N m F 2 (electrons/cm 3 ), and h m F 2 (km). Even the simple average in a real-time environment is difficult, due to the necessity to put all of the varying model resolutions onto a standard grid in order to accomplish the averaging.
Two cases were considered, one with GPS TEC data only and one with GPS TEC data and topside plasma densities near 800 km altitude. Figure 2 shows snapshots of vertical TEC from the six ensemble models for the storm day (17 March) at 2100 UT for the case when both GPS TEC and topside density data were assimilated. When compared to vertical TEC measurements, the individual models for the run with GPS data only display a mean difference from the data from +7.68 TEC units to −1.73 TEC units, with the ensemble average having a mean of 0.60 TEC units. When the topside density data was added, the mean difference from the data ranged from +7.68 TEC units to −1.72 TEC units, while the ensemble average dropped to 0.13 TEC units.
The simple average described above was applied by taking the output from each of the models in the study, putting the output on a consistent grid, and taking the average of all of the models. This average is then referred to as the "ensemble average." Figure 3 shows the "ensemble average" mean and standard deviation for vertical TEC for the GPS data only run (left panels), and the GPS and topside density run (right panels), with the top panels for the mean and the bottom panels for the standard deviation. The mean of the "ensemble averages" in each case showed a value closest to zero when compared to the individual data assimilation models for TEC and N m F 2 , and second closest to zero for h m F 2 . As can be seen in Figure 3 in the bottom panels, the standard deviation shows the models to vary most in the morning sector toward the poleward edge of the anomalies, and in the evening sector near the Appleton anomaly crests. Figure 4 shows the vertical TEC from the GPS and topside density data run; the ensemble average mean is in the top right panel, the vertical TEC measurements are in the bottom right panel, and the comparison between the measurements and model is in the panel at the left. The two lines in the panel to the left are the best fit line (green) and the 45°line (black). The fit between the data and the model shown by the green line has an R 2 value of 91.9%, while the fit between the data and the model, shown by the black line has an R 2 value of 85.8%. The ensemble mean of the models shows well defined Appleton anomalies that are in good   agreement with the measurements in the lower right panel, along with the fits, shows good agreement with the data and theory. Similar comparisons were also obtained for N m F 2 and h m F 2 . Figure 5 shows the N m F 2 ensemble mean in the top right panel, the N m F 2 measurements from selected ionosondes (Figure 1) in the bottom right panel, and the comparison between the data and ensemble mean in the panel at the left. Figure 6 shows the corresponding comparisons for h m F 2 . Note that with just GPS and topside density data, it is difficult to capture h m F 2 because a given N m F 2 could occur at a range of heights. Assimilating wind, electric field, or density measurements in the F region would help alleviate this problem.
We also considered a weighted average of the models, where the individual models were weighted with a single value. The models were fit to unused GPS slant TEC data and a residual sum of squares values was determined. Since the lowest sum of squares values are the best fit, the weight for each model was determined as the inverse of the sum of squares residuals. Then, the average of the models was determined as Avg = (1/N) * ∑(w i V i )/∑w i , where N is the number of models, w i are the individual model weights, and the V i are the chosen model variables, which were TEC, N m F 2 , and h m F 2 . Figure 7 shows the weighted mean of the ensemble of the six models (five DA and one physics-based), which can be compared to the simple mean in Figure 4. The scatter plot in the weighted mean figure shows a better trend and smaller scatter through the whole range of TEC values over the globe. The comparison between the simple and weighted averages shows that the weighted average gives a slight improvement to the fit as determined by the R 2 for the 45°line, with the values for the simple average being 85.8%, and the weighted average being 86.3%.

Summary
Five MEPS data assimilation models and one physics-based model were used to simulate the mid-low latitude ionosphere for the quiet/storm period of 16 and 17 March 2013 (equinox, solar medium). The DA

Space Weather
SCHUNK ET AL. models assimilated the same data, which included ground-based GPS TEC and COSMIC-derived topside densities near 800 km altitude. Although all of the DA models reproduced the general ionospheric behavior during the quiet/storm period, some DA models produced more favorable comparisons with the data at certain times and places. Therefore, ensemble model averages were also obtained using the five data assimilation models and one physics-based model. Both a simple average and a weighted average of the models were obtained in order to determine if there was an improvement of the ensemble averages over the individual models. The sum of the difference between the data and the model results, as discussed above, shows that the average of the models has the smallest average difference from the data. This indicates that the average of the models has the smallest bias, while the R 2 values show that there is a slight improvement when taking a weighted average versus a simple average. The model runs with both GPS TEC and topside density data produced slightly better reconstructions than when only GPS TEC data were assimilated. When compared to measurements, a simple average of the ensemble of models was definitely better than the individual models for TEC and N m F 2 , but not for h m F 2 . With just GPS TEC and topside density data, all of the models had difficulty in capturing h m F 2 , because a given N m F 2 could occur at different heights. This difference in heights was due to the lack of physics in the models, that is, a constant topside scale height, the limitations of the data assimilation method, or large variances between the background model and the data. Assimilating additional measurements, such as F-region density measurements, equatorial electric fields, and meridional neutral winds would help capture h m F 2 , but the physics of the models needs to be addressed. We also considered a weighted average of the models and found that the weighted mean of the ensemble of the six models (five DA and one physics-based) was slightly better than the simple mean.
The results show that weighting models according to model performance improves the ensemble average. However, more ensemble averaging studies need to be conducted for a range of space weather events, and with additional data amounts and types, to determine the best weights, and weighting method, that give the best now-cast and forecast.
The MEPS system we have constructed will be delivered to the Community Coordinated Modeling Center at the NASA Goddard Space Flight Center so that other data assimilation models can be included in the ensemble modeling system (e.g., Bust et al., 2004;Bust & Crowley, 2007;Codrescu et al., 2004;Datta-Barua et al., 2013;Decker & McNamara, 2007;Howe et al., 1998;Minter et al., 2004)