Potential of Regional Ionosphere Prediction Using a Long Short‐Term Memory Deep‐Learning Algorithm Specialized for Geomagnetic Storm Period

In our previous study (Moon et al., 2020, https://doi.org/10.3938/jkps.77.1265), we developed a long short-term memory (LSTM) deep-learning model for geomagnetically quiet days (LSTM-quiet) to perform effective long-term predictions for the regional ionosphere. However, that model could not effectively predict geomagnetic storm days. In this study, we developed an LSTM model suitable for geomagnetic storms by using a new training data set and redesigning the input parameters and hyper-parameters. We collected 131 days of geomagnetic storm cases from January 1, 2009 to December 31, 2019, identified from the Japan Meteorological Agency's Kakioka Magnetic Observatory, and obtained the interplanetary magnetic field Bz, Dst, Kp, and AE indices corresponding to each storm date from the OMNI database. These indices and the F2 parameters (foF2 and hmF2) of the Jeju ionosonde (33.43°N, 126.30°E) were used as input parameters for the LSTM model. To test and verify the predictive performance and usability of the LSTM storm model developed in this manner, we created and diagnosed 0.5, 1, 2, 3, 6, 12, and 24-h predictive LSTM models. The 24-h LSTM storm model developed in this study achieved predictive performance during the three test geomagnetic storms about 32% (10%), 34% (17%), and 37% (5%) better in the root mean square error of foF2 (hmF2) than the LSTM quiet model (Moon et al., 2020, https://doi.org/10.3938/jkps.77.1265), SAMI2, and IRI-2016 models, respectively. We propose that the short-term predictions of less than 3 h are sufficiently competitive with other traditional ionospheric models. Thus, this study suggests that our model can be used for short-term prediction and monitoring of the regional mid-latitude ionosphere.

Early artificial neural network (ANN) models predicted foF2 after 1, 2, 3, 4, and 25 h, but only for geomagnetically quiet days (McKinnell & Poole, 2000; Poole & Poole, 2002). Later, Oyeyemi et al. (2005) extended the ANN approach using data from 59 global ionosondes from 1964 to 1986 with more input parameters, such as the day number (day of the year), hour, sunspot number, Ap index, meridian angle, magnetic dip angle, magnetic declination angle, and solar zenith angle. Their model improved prediction performance by 15%-16%, but performance for geomagnetic storms was not presented. Yue et al. (2006) developed an ANN model capable of long-term prediction of foF2 by training it with data from 19 ionosondes located in the Asia/Pacific region; however, the model could predict foF2 accurately only during geomagnetically quiet days. More recently, Wichaipanich et al. (2017) presented a neural network (NN) model for foF2 utilizing data from three ionosonde stations near the magnetic equator in Southeast Asia. Their model showed better predictive performance in terms of the RMSE of foF2 at Chiang Mai, Chumphon, and Kototabang than the IRI model, but their results were also analyzed only for geomagnetically quiet days. All of these neural network models have focused on predicting the foF2 parameter only under geomagnetically quiet conditions.
There have also been studies developing NN models that include geomagnetic storm cases. Wintoft and Cander (2000) developed an NN model for foF2 prediction using Slough ionosonde (51.5°N, 0.6°W) data and the AE index as an input parameter; the model produced hourly predictions from 1 to 25 h ahead. Nakamura et al. (2007) developed a predictive ANN model for foF2 using Kokubunji ionosonde (35.71°N, 139.49°E) data in Japan and the local K index as an input parameter. Athieno et al. (2017) developed a similar ANN model using 21 years of data from Resolute (74.7°N, 265.1°E), located in the polar cap region. The model was limited to hourly prediction, and its input parameters included the day number, hour, polar cap index, Ap index, and F10.7 index. Fan et al. (2019) used an Elman Neural Network (ENN) algorithm, which differs from a standard ANN, and tested short-term foF2 prediction during a geomagnetic storm. The ENN algorithm is more advantageous for time-series analysis because it stores the past state, which may compensate for the weak points of the ANN. These models attempted to predict foF2, including geomagnetic storm cases, but hmF2 was not addressed.
Very few studies have dealt with both foF2 and hmF2 in the development of NN models. Sai Gowtam and Tulasi Ram (2017) developed an ANN-based model that can predict both foF2 and hmF2 using GPS radio-occultation (RO) observations from the Formosa Satellite Mission 3/Constellation Observing System for Meteorology, Ionosphere, and Climate satellites. The model was trained with neutral wind information from the Horizontal Wind Model-14, in addition to the usual input parameters, F10.7 and the Ap index. However, the results of their study were also limited to geomagnetically quiet days. Tulasi Ram et al. (2018) improved the ANN-based model by jointly using data from the Challenging Minisatellite Payload, the Gravity Recovery and Climate Experiment RO, and the global ionosonde network. Because these ANN-based models do not consider past data over a specific window before the present, their prediction ability may be inadequate for phenomena affected by states further back than that window.
To overcome this disadvantage of the ANN algorithm, a technique is needed that memorizes past data and reflects them in the prediction, such as the long short-term memory (LSTM) algorithm. Hu and Zhang (2018) used a bi-LSTM technique, which memorizes data characteristics in both forward and backward directions, to develop a model for 1-h hmF2 prediction. Our previous work (Moon et al., 2020) predicted both the foF2 and hmF2 parameters using an LSTM model. Kim et al. (2020) assimilated the F2 parameters predicted by Moon et al. (2020) into a physics-based model to predict the midlatitude ionosphere for up to 24 h. Although the LSTM-based model showed reasonably good predictive performance on geomagnetically quiet days, its predictions were poor for geomagnetic storm days. The reason may be that the training data for the LSTM model were obtained mostly on geomagnetically quiet days, so the model was biased toward quiet conditions.
Therefore, in this study, we attempt to overcome the problems of the previous LSTM model during geomagnetic storms (Moon et al., 2020). For this purpose, we collected 69 geomagnetic storm events from January 1, 2009 to December 31, 2019, considering the period in which Jeju ionosonde (33.43°N, 126.30°E) data were available. In addition, we remodeled separate LSTM models specialized for 0.5, 1, 2, 3, 6, 12, and 24-h prediction, considering that the performance of the LSTM model varies depending on the prediction target.
In this study, we present the results of short-term prediction of foF2 and hmF2 during geomagnetic storms and discuss the limitations of predicting rapidly changing ionospheric states. Section 2 provides a detailed description of the LSTM algorithm and the data used for the geomagnetic storm cases. Section 3 presents the LSTM model results, and Section 4 discusses the limitations and possibilities of predicting the rapidly changing ionosphere during storm times. Finally, Section 5 summarizes and concludes our study.

Long Short-Term Memory
Our model has a similar design to the LSTM model developed previously (Moon et al., 2020). Figures 1a and 1b show diagrams of the new LSTM model developed in this study. Figure 1a shows the detailed calculation flow within one LSTM cell. The LSTM cell consists of three steps, and each part has its own characteristics (e.g., Hochreiter & Schmidhuber, 1997). The first step is the "forget gate layer," which controls what is kept from the previously calculated output (h_{t-1}) and the current input data (x_t). Here, the sigmoid function (σ) receives the previous output and the present input and assigns a weight between 0 and 1. A value of σ close to 0 means that the result of this function does not affect future results, and conversely, a value close to 1 means the opposite. The output of this sigmoid layer, f_t = σ(W_f·[h_{t-1}, x_t] + b_f) (Equation 1), is multiplied by the previous LSTM cell state (C_{t-1}) to determine which values to discard.
The second step is the "input gate layer," which determines what new information will be stored in the cell state. In this step, a sigmoid layer decides which values to update, i_t = σ(W_i·[h_{t-1}, x_t] + b_i) (Equation 2), and a nonlinear tanh layer generates a new candidate value, C̃_t = tanh(W_C·[h_{t-1}, x_t] + b_C) (Equation 3), that can be added to the cell state C_{t-1}, where W is the weight and b is the bias. As shown in Equation 4, the cell state is then updated as C_t = f_t × C_{t-1} + i_t × C̃_t. In the final step, the "output gate layer" decides what to output; that is, the sigmoid function determines the output value, o_t = σ(W_o·[h_{t-1}, x_t] + b_o) (Equation 5), from the previous state (h_{t-1}) and the input value (x_t). The updated cell state (C_t) is normalized to a value between −1 and 1 through the tanh function and then multiplied by the output value (o_t) from the sigmoid function, giving h_t = o_t × tanh(C_t) (Equation 6). Through this overall process in one LSTM cell, we obtain the result (h_t) at the current time t. This single cell serves as one hidden layer, and the optimal number was found in this study by varying it from 2 to 50 in steps of 2.
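As an illustration of Equations 1-6, one LSTM cell update can be sketched in pure Python. This is a minimal scalar version for clarity only; the weights, biases, and input below are arbitrary toy values, not parameters of the trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update (scalar toy version of Equations 1-6).

    W and b hold the forget/input/candidate/output gate parameters;
    each gate sees the previous output h_prev and the current input x_t.
    """
    f_t = sigmoid(W["f"][0] * h_prev + W["f"][1] * x_t + b["f"])        # Eq. 1: forget gate
    i_t = sigmoid(W["i"][0] * h_prev + W["i"][1] * x_t + b["i"])        # Eq. 2: input gate
    c_tilde = math.tanh(W["c"][0] * h_prev + W["c"][1] * x_t + b["c"])  # Eq. 3: candidate value
    c_t = f_t * c_prev + i_t * c_tilde                                  # Eq. 4: cell-state update
    o_t = sigmoid(W["o"][0] * h_prev + W["o"][1] * x_t + b["o"])        # Eq. 5: output gate
    h_t = o_t * math.tanh(c_t)                                          # Eq. 6: cell output
    return h_t, c_t

# Toy parameters: every weight 0.5, every bias 0.0
W = {g: (0.5, 0.5) for g in "fico"}
b = {g: 0.0 for g in "fico"}
h, c = lstm_cell_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
```

In practice the gates operate on vectors and the weights are matrices, but the data flow is exactly the one traced in Figure 1a.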

Training, Validation, and Test Data Sets
Previous studies (Kim et al., 2020; Moon et al., 2020) performed good predictions during geomagnetically quiet periods but not during geomagnetic storms. We pointed out that most of the long-term training data covered geomagnetically quiet days. In fact, during the 5-year training period used in Moon et al. (2020), there were only 57 cases and 105 days of geomagnetic storms. For the new LSTM, we collected measured data under geomagnetic storm conditions over a longer period of time and excluded data for quiet days. We searched for geomagnetic storm events on the Japan Meteorological Agency/Kakioka Magnetic Observatory web page (kakioka-jma.go.jp/en/) during the period when Jeju ionosonde data were available (from January 1, 2010 to August 2017, a total of 7.7 years). We found a total of 71 events (138 days) with available ionosonde data during this period, as shown in Table 1. We divided these events into training and validation sets at a ratio of 9:1 to check the performance of the model. In addition, we used the Bz, AE, Dst, and Kp indices as the geomagnetic indices corresponding to the events, as listed in Table 2. Since the ionosonde data are observed every 15 min, all indices were interpolated to a 15-min cadence.
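The 15-min resampling and the 9:1 event split described above can be sketched as follows. The values are hypothetical stand-ins; the actual pipeline reads the OMNI indices and the Table 1 event list.

```python
def interpolate_15min(hourly):
    """Linearly interpolate hourly index samples onto a 15-min grid."""
    out = []
    for a, b in zip(hourly, hourly[1:]):
        out.extend(a + (b - a) * k / 4.0 for k in range(4))
    out.append(hourly[-1])
    return out

def split_events(events, ratio=0.9):
    """Split the storm-event list into training and validation sets (9:1)."""
    n_train = int(len(events) * ratio)
    return events[:n_train], events[n_train:]

dst_hourly = [-5, -30, -80, -60]           # toy Dst values (nT), one per hour
dst_15min = interpolate_15min(dst_hourly)  # 13 points covering the 3-h span

events = list(range(1, 72))                # 71 storm events, as in Table 1
train, valid = split_events(events)
```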
The test period (first prediction target) in this study is the 3 days from September 6 to September 8, 2017, the geomagnetic storm case used in previous studies (Kim et al., 2020; Moon et al., 2020). Figure 2 shows the space environment changes during the geomagnetic storm of test event #1 (see Figure 2 in Kim et al., 2020). Two more test events were analyzed to evaluate the performance of the LSTM model in this study. For test events #1-3, we compared the model results with the Jeju ionosonde observations. Because the Jeju ionosonde has not been operating since October 2018, additional test events could not be analyzed. Details of these events are given in Table 1. The Jeju ionosonde data observed during the three test periods were downloaded from the Korean Space Weather Center web page (https://spaceweather.rra.go.kr/observation/service/iono), and the geomagnetic indices corresponding to each event period were obtained from the OMNI web page (https://omniweb.gsfc.nasa.gov/ow.html). For the ionosonde data, we used the EDP and the SAO files marked with the highest confidence level index.
KIM ET AL.

Optimal Hyper-Parameter Options
When making predictions using the LSTM algorithm, it is important to control the hyper-parameters of the model; in other words, we had to find the optimal model configuration by adjusting them. Among the many hyper-parameters, the number of hidden layers (the LSTM cells described above) and the range of historical data used for prediction are the most important. In our LSTM algorithm, the range of historical data is adjusted with a hyper-parameter called lookback. Since there is a data point every 15 min, a lookback of 1 means the past 15 min of data are reflected in the current state; a lookback of 4 means data from the past 1 h are memorized.
Another hyper-parameter, the batch size, sets the data interval for updating the weights, and it can also be adjusted in 15-min steps. In this study, because we aim to predict up to a maximum of 1 day (96 data points), the lookback and batch size were set and validated at 6 h (24 data points), 12 h (48 data points), and 24 h (96 data points). Since we have to validate the prediction performance at 0.5, 1, 2, 3, 6, 12, and 24 h, we set the lookahead (prediction target) to 2, 4, 8, 12, 24, 48, and 96 data points. We then calculated the root mean square error (RMSE) for the validation data sets for each combination of hyper-parameters.
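The hyper-parameter search described above amounts to a small grid search over lookback, batch size, and hidden-layer count for each lookahead. A sketch follows; `validation_rmse` is a deterministic dummy stand-in, since the actual training and scoring run in MATLAB's Deep Learning Toolbox.

```python
from itertools import product

# Candidate values, in 15-min data points (6 h = 24, 12 h = 48, 24 h = 96).
LOOKBACKS = [24, 48, 96]
BATCH_SIZES = [24, 48, 96]
HIDDEN_LAYERS = range(2, 51, 2)
LOOKAHEADS = {0.5: 2, 1: 4, 2: 8, 3: 12, 6: 24, 12: 48, 24: 96}  # hours -> points

def validation_rmse(lookback, batch, hidden, lookahead):
    """Stand-in for training one LSTM and scoring it on the validation set."""
    # Dummy deterministic score so the search is runnable; the real function
    # would train the network and return its validation RMSE.
    return abs(lookback - 24) + abs(batch - 24) + abs(hidden - 30) + lookahead * 0.01

def best_combination(lookahead):
    """Pick the hyper-parameter combination with the lowest validation RMSE."""
    return min(
        product(LOOKBACKS, BATCH_SIZES, HIDDEN_LAYERS),
        key=lambda c: validation_rmse(*c, lookahead),
    )

best = best_combination(LOOKAHEADS[3])  # best options for the 3-h target
```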
The combination with the lowest RMSE was adopted as the design of the final LSTM model. We performed all of these steps separately for the foF2 and hmF2 models. Figures 3 and 4 show the validation RMSE values of the foF2 and hmF2 models, respectively, for each prediction target and hyper-parameter combination. The best combination in each plot is marked with a black arrow, and the best options are summarized in Tables 3 and 4. Some interesting aspects emerge from the tables. For both foF2 and hmF2, the best options split according to the prediction target. For foF2, the LSTM structure with a lookback of 24, a batch size of 24, and 30 hidden layers performed best for prediction targets of 3 h or less, whereas a lookback of 48, a batch size of 24, and 32 hidden layers was the best combination for targets of 6 h or more. For hmF2, the structure with a lookback of 96, a batch size of 24, and 44 hidden layers performed best for 3 h or less, and a lookback of 96, a batch size of 48, and 46 hidden layers was best for 6 h or more. In Tables 3 and 4, the optimal combinations for the short-term forecasting targets are indicated by gray boxes and those for the long-term targets by gold boxes to improve readability. We trained the LSTM models using these selected hyper-parameters and evaluated their performance during the testing periods.

10.1029/2021SW002741
Note. The gray (gold) background boxes indicate the best options for short-term (long-term) prediction. RMSE, root mean square error.

Evaluations of the Training and Validation Data Sets
When developing and training a deep-learning model, training data and validation data must be evaluated.
In other words, we need to check how well our models fit the training data and how well they generalize to the validation data. In particular, due to the characteristics of deep-learning models, it is necessary to detect the right point to end training, which is directly related to the overfitting problem. Thus, we evaluated the training and validation data. MATLAB provides real-time analysis of progress while training the LSTM model through a program option called "training-progress," which we used to monitor the training process. The overfitting problem can be mitigated by placing a "dropout layer" after the "LSTM layer" when evaluating the validation data of MATLAB's LSTM model, and we used a dropout layer for this purpose. Moreover, when evaluating the validation data, we specified the MATLAB option "ValidationPatience," which controls the number of times the validation loss is allowed to be greater than or equal to the previous smallest loss before training stops. We designed this option to avoid overfitting. Based on the evaluation results in Figure 5, we were convinced that the LSTM models developed in this study were sufficiently usable, and we then analyzed the test events.
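The "ValidationPatience" behavior can be mimicked outside MATLAB with a simple early-stopping counter. A sketch follows; the loss values are made up purely to show the stopping logic.

```python
def train_with_patience(val_losses, patience=5):
    """Stop when the validation loss has failed to improve `patience` times.

    Returns the epoch index at which training stops and the best loss seen,
    mirroring the semantics of MATLAB's ValidationPatience option.
    """
    best = float("inf")
    strikes = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            strikes = 0          # improvement resets the counter
        else:
            strikes += 1
            if strikes >= patience:
                return epoch, best  # stop training here
    return len(val_losses) - 1, best

# Validation loss improves, then plateaus: training stops before overfitting.
losses = [1.0, 0.8, 0.6, 0.61, 0.62, 0.60, 0.63, 0.64, 0.65, 0.66]
stop_epoch, best_loss = train_with_patience(losses, patience=3)
```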

Results and Discussions
The LSTM models trained on geomagnetic storm events in this study are first compared with the LSTM quiet model (Moon et al., 2020) for the three test events shown in Tables 1 and 2. Then, we analyze the performance of the LSTM storm models in terms of short- and long-term predictions of foF2 and hmF2. Moreover, we discuss the possibilities and limitations of the LSTM storm models for forecasting services in the regional ionosphere.

Comparison of LSTM-Storm and LSTM-Quiet Models
First, we compared the 24-h prediction results of the LSTM storm model with those of the LSTM quiet model (Moon et al., 2020). Since the 24-h prediction results need to be compared, the LSTM models with the hyper-parameters corresponding to 24 h in Tables 3 and 4 were used. As shown in Figure 6a, the LSTM quiet model does not reproduce the increased foF2 during the storm day because it was mainly trained with foF2 data of quiet periods. On the contrary, the LSTM storm model, trained with only storm period data, better predicts the increased foF2 during the storm day but overestimates foF2 during quiet days. In Figure 6b, the observed hmF2 rose sharply on day 251, but none of the models predicts this sharp elevation of the F2 layer. The prediction of hmF2, especially during stormy nighttime, remains a challenge for both data-based and physics-based models.
We wondered why the storm-only LSTM model developed in this study predicted the positive storm well for foF2 but not for hmF2. We speculated that the answer would be found in the training data and examined how the deep-learning model results were biased by them. We therefore looked closer into the training data set of 61 storm events listed in Table 1 and statistically characterized whether each ionospheric storm developed into a positive or negative storm in foF2. In addition, we checked whether the hmF2 values were elevated or lowered during each event period.
To analyze each storm pattern, we characterized the 61 ionospheric storm patterns by setting a reference equal to the average value over a total of 10 days (±5 days) around each event date. Naturally, the 10-day period includes geomagnetically quiet conditions. Such reference values of ionospheric parameters have been utilized in various studies to distinguish the effects of a geomagnetic storm on the ionosphere (e.g., Szuszczewicz et al., 1998). The longer the window used for the mean value, the more suitable it is as a quiet reference state. These studies also defined a positive ionospheric storm when the storm period value was higher than the reference value, and a negative storm in the opposite case. In the same way, we recorded a positive (negative) storm when the observed values were higher (lower) than the reference line during each training event. We calculated the time spans above and below the reference line for each event and then used them to determine the ratio of positive to negative storm durations, called the P/N ratio. In other words, the area above and below the reference line was calculated for a single event and quantified to determine which storm type dominated. If the P/N ratio of a storm event is larger (smaller) than 1, the positive (negative) storm is dominant. For hmF2, we defined the rising (falling) of hmF2 to be positive (negative).
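The P/N ratio defined above can be computed as follows. The foF2 series here is a toy example; the real calculation uses the 15-min observations against the ±5-day mean as the reference.

```python
def pn_ratio(values, reference):
    """Ratio of time spent above the reference (positive storm) to below it.

    A ratio > 1 means the positive phase dominated; < 1, the negative phase.
    """
    pos = sum(1 for v in values if v > reference)
    neg = sum(1 for v in values if v < reference)
    return pos / neg if neg else float("inf")

# Toy storm-period foF2 values (MHz) against a quiet-time reference of 6 MHz.
foF2 = [6.5, 7.2, 8.1, 7.8, 6.9, 5.8, 5.5, 6.3, 7.0, 7.4]
ratio = pn_ratio(foF2, reference=6.0)  # positive storm dominates
```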
Figure 7 presents how the positive/negative spans are computed for the foF2 and hmF2 data of training storm event #1. The positive spans are shown in red and the negative spans in blue. In addition, the geomagnetic Kp index is bar-plotted at the bottom of Figure 7 to indicate the timing of the geomagnetic storm; bars with Kp of 4 or more are shown in gold. As shown by the dominant red region, the foF2 values are mostly higher than the reference (dotted line) during the storm period of training event #1. For hmF2, the positive and negative spans are more or less evenly distributed. For both foF2 and hmF2, the P/N ratios are greater than 1, so the positive storm dominated in training event #1. The other 60 training events all showed different ionospheric storm patterns, and we plotted these results in the Supporting Information (Figures S1-S8).
We expect that the distribution of the P/N ratios over the 61 storm events affects the deep-learning model developed in this study. Figure 8a summarizes the P/N ratios of the 61 storm events, and the histograms in Figure 8b show what types of ionospheric storms influence the LSTM storm model. As can be seen from the histogram, the foF2 training data set has about twice as many positive storms as negative storms, whereas the hmF2 data set is split about evenly between positive and negative storms. We can infer that the LSTM storm model for foF2 is more specialized to positive storms, which may explain why it predicts the foF2 positive storm well for test event #1, as in Figure 6a. However, the LSTM storm model appears not to be well trained for positive hmF2 storms, which would explain its failure to predict the elevated hmF2 in Figure 6b.
As we deduced above, biased training data could be one cause, but there could be others as well. One is the fluctuation characteristics of hmF2: when a geomagnetic storm occurs, hmF2 does not develop as clearly into a positive or negative storm as foF2 does, as reported in various studies (Adebiyi et al., 2014; Feng et al., 2021; Gao et al., 2020). In other words, the fluctuations of hmF2 may not be captured by simply training on the space environment factors and the F2 parameters. Above all, as described by Mikhailov and Marin (2001), the contraction and expansion of the thermosphere affect the altitude of hmF2, resulting in fluctuations of hmF2 in the midlatitude ionosphere. The thermospheric meridional winds and circulation also play a large role. Therefore, we propose using the thermospheric density and wind field as input parameters to predict the fluctuations of hmF2 in a deep-learning model. However, this will require much effort because it is not easy to obtain observations of these quantities during geomagnetic storms at a specific location; it remains a major challenge for the future.

Figure 9 compares the model predictions for test events #2 (a-c) and #3 (d-f) in the same way as Figure 6. As shown in Figure 9c, test event #2 includes a geomagnetic storm on day 111 of 2018, but there was little change in the ionosphere at the Jeju location. On this day, foF2 was slightly higher than on the previous day (day 110) at nighttime but slightly lower in the daytime. The hmF2 also did not change on the day of the geomagnetic storm. Accordingly, both the LSTM storm and quiet models predict values reasonably close to the observations. In the absence of a positive storm such as that in test event #1, the performance of the two models is practically the same.
To quantitatively evaluate the performance of our model for the storm days in the three test events, we calculated three indices: the correlation coefficient (CC), RMSE, and mean absolute percentage error (MAPE).

CC = Σ(O_i − O̅)(P_i − P̅) / sqrt(Σ(O_i − O̅)² Σ(P_i − P̅)²)

RMSE = sqrt((1/N) Σ(P_i − O_i)²)

MAPE = (100/N) Σ |P_i − O_i| / O_i

where O_i and P_i are the observed and predicted values, O̅ and P̅ are their means, and N is the number of data points.
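The three skill scores can be computed with their standard definitions as follows. The arrays below are toy values, not the Jeju observations.

```python
import math

def skill_scores(obs, pred):
    """Correlation coefficient, RMSE, and MAPE between observation and prediction."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    cc = cov / math.sqrt(var_o * var_p)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    mape = 100.0 / n * sum(abs(p - o) / o for o, p in zip(obs, pred))
    return cc, rmse, mape

obs = [5.0, 6.0, 8.0, 7.0]    # e.g., observed foF2 (MHz)
pred = [5.5, 6.0, 7.5, 7.0]   # e.g., model-predicted foF2 (MHz)
cc, rmse, mape = skill_scores(obs, pred)
```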
Table 5 summarizes the skill scores of all the models compared in Figures 6 (test #1) and 9 (tests #2 and #3). As shown in Table 5, the LSTM storm model mostly gave the best predictions among all the models for the three test events at the Jeju location. Quantitatively, our prediction model shows better predictive performance for foF2 (hmF2) by about 32%, 34%, and 37% (10%, 17%, and 5%) compared to the LSTM quiet, SAMI2, and IRI-2016 models, respectively.
Although more test events are needed for a statistically meaningful evaluation, we could not find events related to geomagnetic storms in the Jeju ionosonde observations since 2018. In particular, the LSTM storm model needs to be evaluated further for negative storm events, which are lacking in the current study.

Table 5. The Performance Skill Scores of the Three Test Events at the Jeju Location

Note. The units of foF2 and hmF2 are MHz and km, respectively. The gray boxes mark the best model. CC, correlation coefficient; LSTM, long short-term memory; MAPE, mean absolute percentage error; RMSE, root mean square error.

For this reason, it is necessary to utilize ionosonde data at different latitudes and longitudes to evaluate more storm events. If we add data from other locations to the training set, we must also add latitude and longitude information to the input parameters. However, if we trained on all of the global ionosonde data, the predictive performance might be further degraded due to the mixture of locally different ionospheric fluctuations. It is also not easy to collect training data because, even for the same geomagnetic storm, ionospheric storms can appear differently depending on geographic location, season, and local time.
In fact, Shim et al. (2018) reported that positive and negative storms appeared differently in the eastern and western United States during the same geomagnetic storm event. So, at the present stage, we cannot conclude that it is right to train on all global data at once. Nevertheless, if we used ionosonde data from Japan or China together, we expect it would be more helpful for predicting the Northeast Asia region. We suggest that such research should be continued in the future.
Moreover, we may overcome this problem by constructing various sets of training data. For example, we may categorize the events by storm type and occurrence time and then evaluate them. Another option is to extend the input parameters to the thermospheric density and wind field. However, there are too few geomagnetic storm examples to evaluate the former approach, and the latter is limited because there are few observations of thermospheric density and wind fields; these approaches are expected to become possible as more data accumulate. Furthermore, to use the LSTM storm model in a forecasting service, the forecaster needs to know when to switch from the LSTM quiet model to the storm model in a timely manner. This is a difficult issue because it requires predicting a geomagnetic storm. One practical solution is to use predicted Kp indices from global space weather services, where all the space weather data, including satellite data for the solar wind, are collected and analyzed for prediction. For example, if a Kp index greater than 4 is predicted over the next day, the regional ionospheric forecaster adopts the LSTM storm model rather than the LSTM quiet model. We expect that the LSTM storm model developed in this study can be useful in this way, and our research group plans to use these ideas for forecasting and monitoring work in the future.
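The model-switching rule sketched above is simple to encode. The Kp threshold of 4 comes from the text; the model names are placeholders for the two trained networks.

```python
def choose_model(predicted_kp_next_day, threshold=4):
    """Select the LSTM variant from the forecast Kp values for the next day.

    If any predicted Kp exceeds the threshold, a geomagnetic storm is
    expected and the storm-trained model is used; otherwise the quiet model.
    """
    if max(predicted_kp_next_day) > threshold:
        return "LSTM-storm"
    return "LSTM-quiet"

# Eight 3-hourly Kp forecasts covering the next 24 h.
quiet_day = [1, 2, 2, 1, 3, 2, 1, 1]
storm_day = [2, 3, 4, 6, 7, 5, 4, 3]
```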

Performances of Short-Term and Long-Term Predictions
In this section, we evaluate the performance of the LSTM storm model for 0.5, 1, 2, 3, 6, 12, and 24-h predictions. During the test event #1 period, the models predict foF2 and hmF2 at the target lead times; for example, the 1-h prediction model computes the value 1 h ahead of the current time using the observed data up to the present, that is, it makes a prediction every hour. Figure 10 shows histograms of the predictive ability of each model along with the performance skill scores: the correlation coefficient is colored gold, the slope of the 1-to-1 correspondence red, and the RMSE blue. We additionally used the slope obtained from the 1-to-1 correspondence to show intuitively the relationship between the observed and predicted values. It is evident from Figure 10 that the RMSEs of both F2 parameters increase with the prediction time, except for the 0.5-h prediction, while the correlation coefficient and 1:1 slope decrease as the prediction time increases, consistent with our expectations. What is notable is that the 1-h prediction model performs better than the 30-min prediction model. The best RMSEs of the 1-h prediction are 0.3 MHz and 17 km for foF2 and hmF2, respectively, which are encouraging results for the practical application of the model.
For practical application, we regard the LSTM models with prediction targets of up to 3 h as short-term prediction models and those longer than 3 h as long-term prediction models. As shown in Figure 10, the RMSE values of the short-term prediction models for foF2 are well below 1 MHz, while those of the long-term models exceed 1 MHz. The performance of the short-term models is comparable to that of other neural network models (Athieno et al., 2017; Fan et al., 2019; McKinnell & Poole, 2000), although direct RMSE comparisons are difficult because those models target different locations, dates, local times, and space environment conditions. For hmF2, the short-term prediction models did not improve performance as markedly. Nevertheless, based on the results presented in Figure 10, we argue that prediction models of less than 3 h are sufficiently competitive.

Conclusions and Summary
In this study, we have developed a new LSTM model for predicting foF2 and hmF2 to overcome the weaknesses of the LSTM quiet model developed by Moon et al. (2020). We collected 61 geomagnetic storm events from January 1, 2010 to August 2017 (about 7.7 years) for the training data set, together with the space environment indices related to geomagnetic storms. Optimal hyper-parameters were searched for the LSTM models for each prediction target time. Three test events were selected to evaluate the performance of the geomagnetic storm-specific LSTM model (LSTM storm model), and the correlation coefficient, RMSE, and MAPE scores were calculated and diagnosed for each model (LSTM storm, LSTM quiet, SAMI2, and IRI). We also discussed the performance and predictability of the short-term (up to 3 h) and long-term (longer than 3 h, up to 24 h) prediction models.
Our results are summarized as follows:

Figure 10. The performance skill scores for (a) foF2 and (b) hmF2 of each long short-term memory prediction model. The gold, red, and blue histograms represent the correlation coefficient, 1:1 slope, and root mean square error (RMSE) values, respectively.
1. For the geomagnetic storms (test events #1-3), the LSTM storm model for foF2 (hmF2) showed better performance in terms of RMSE by 32%, 34%, and 37% (10%, 17%, and 5%) than the LSTM quiet, SAMI2, and IRI-2016 models, respectively.
2. Based on the test event analysis, we propose that it is important to train on observational data near the specific region when developing a regional ionospheric prediction model using deep-learning technology, because the state of the ionosphere during a geomagnetic storm varies depending on location, date, local time, and thermospheric conditions.
3. Although performance degrades significantly with increasing prediction time, short-term predictions (up to 3 h) show an RMSE of less than ~1 MHz for foF2 and 25 km for hmF2. Therefore, we propose that the short-term prediction models are sufficiently competitive.
This study used deep-learning techniques to predict ionospheric storms during geomagnetic storm periods, overcoming the limitations of previous learning techniques. Most significantly, this is the first attempt to develop a deep-learning model by collecting only geomagnetic storm cases. Our study is also meaningful in that it presents several possibilities for each prediction target model. Finally, we are confident that the results of this study will help find ways to respond to the risk factors of ionospheric storms in the context of space weather.
Data Availability Statement

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2019K2A9A1A0610292012). The information on the geomagnetic storm events was obtained from the Japan Meteorological Agency/Kakioka Magnetic Observatory web page (kakioka-jma.go.jp/en/). The SSLab team (Su-In Moon, Se-Heon Jeong, and YongHa Kim) at Chungnam National University helped design the LSTM deep-learning algorithms. The authors would also like to thank the Naval Research Laboratory (NRL) for providing the SAMI2 model (https://www.nrl.navy.mil/ppd/branches/6790/sami2) and the IRI model developers for releasing the IRI-2016 model (http://irimodel.org/) as open-source code. For the deep-learning algorithm, we used the Deep Learning Toolbox in MATLAB.