A New Method for Predicting Non‐Recurrent Geomagnetic Storms

Predicting non‐recurrent geomagnetic storms caused by Coronal Mass Ejection (CME) is important and necessary for space weather forecasting. Previous studies have shown that it is feasible to predict non‐recurrent geomagnetic storms by reconstructing precursors from cosmic ray intensity (CRI) in the frequency domain. However, the difficulty lies in predicting the minor storms, the moderate storms, and the storms accompanying ground level enhancement (GLE). This study proposes a new method that includes the spectral whitening method and CEEMDAN‐CWT for predicting non‐recurrent geomagnetic storms. The method was applied to 229 CME‐driven events, including 166 events with Kp ≥ 5 and 63 events with Kp < 5, during the solar cycles 23 and 24. This study analyzed 166 geomagnetic storm events and found that 129 of them were accurately predicted, resulting in a recall rate of 77.7%. Additionally, the study found that as the Kp index increased, the amplitude of the precursor detected by this method also increased, while the time interval between the onset of CME and the maximum amplitude of the precursor decreased.

, which is limited by the time scale (approximately 1 hr) of the dynamic process of the magnetosphere during storms (Wu & Lundstedt, 1997). When the same data and model were used to predict Kp index, Bala and Reiff (2012) found that the root-mean-square error of their prediction results increased from 0.65 in 3 hr to 0.85 in 6 hr. Zhelavskaya et al. (2019) used the method of machine learning to predict Kp index, and they also found the same phenomenon that the root mean square error of the result also increased with the advance of the forecast time. In contrast, it is difficult for the technology systems affected by storms to respond effectively within a few hours. However, Improving the lead time for predicting geomagnetic storms has significant implications for enhancing the decision-making capabilities of infrastructure operators (Haines et al., 2021). For example, the prediction ability about 1 day during storm is effective to the orbit prediction of low-earth orbit satellites (Y. Zhang et al., 2018). Therefore, the longer the leading forecast time, the better for the technical system affected by storms. Several studies have been able to accurately predict geomagnetic storms several days in advance. However, these studies only take into account recurrent magnetic storms caused by coronal holes and corotating interaction regions or combine all geomagnetic storms. CME-driven non-recurring geomagnetic storms are not considered mainly or are even completely excluded from these studies (Bernoux et al., 2022;Haines et al., 2021;J. Wang et al., 2023).
Many previous studies have shown that the occurrence of non-recurrent geomagnetic storms could be predicted 10-25 hr in advance by using precursor signals carried in cosmic ray intensity (CRI) data (Astapov et al., 2017;Dorman, 1999Dorman, , 2005Papailiou et al., 2012;Ye et al., 2022). Though the variation of CRI in the time domain before storms, the occurrence could be predicted more than 6-12 hr in advance, even 20 hr in advance (Dorman, 2005;Lingri et al., 2019;Munakata et al., 2000;Papailiou et al., 2012). Similarly, by analyzing the variation of CRI in the frequency domain, the precursor of the geomagnetic storm can be extracted to predict the occurrence of geomagnetic storms with lead times of about 20 hr (Adhikari et al., 2019;Kudela et al., 2001;Ye et al., 2022;Zhu et al., 2015). Ye et al. (2022) used the time-frequency analysis CEEMDAN-CWT (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-Continuous Wavelet Transform) method to investigate 65 non-recurrent geomagnetic storms (Kp ≥ 7) that occurred during the 23rd and 24th solar activity cycle, and pointed out that geomagnetic storm precursor signals reconstructed from CRI by using this method could predict the occurrence of non-recurrent geomagnetic storms 22.5 hr in advance. Despite its effectiveness, the complexity of CRI limits its ability to remove the influence of ground level enhancement (GLE) events and predict moderate and small storms (Kp ≤ 6) (Ye et al., 2022).
CRI exhibits a complex structure that includes both non-recurrent signals, such as GLE, FD, and CME-driven fluctuations (Belov & Gushchina, 2018;Dorman, 1999) and periodic signals, such as 22-year cycle, 11-year cycle, annual variation, seasonal variation, diurnal variation, and so on (Chowdhury et al., 2016;Jeong & Oh, 2022;Krymsky et al., 2005;Shen & Qin, 2017;Tezari et al., 2016). The periodic signals in CRI are due to the fact that the cosmic ray is modulated by the heliosphere magnetic field, solar wind, and Earth magnetosphere (Jeong & Oh, 2022;Kravtsova & Sdobnov, 2016;Mandrikova et al., 2018), while most of the non-recurrent signals in CRI are due to the occasional activity of the sun, such as CME, flare. According to Ye et al. (2022), there are precursor signals associated with CME-driven geomagnetic storms could be identified in the frequency domain of CRI using the time-frequency analysis method CEEMDAN-CWT. However, the weaker precursor signals associated with CME-driven geomagnetic storms will be obscured by periodic changes, making this method only suitable for stronger storms (Kp ≥ 7). To predict weaker non-recurrent geomagnetic storms (Kp < 7), it is necessary to remove the periodic signals in CRIs in order to identify the weaker precursor signals associated with CME-driven storms.
In this study, by using the spectral whitening method (SWM) and CEEMDAN-CWT method to analyze the long-term cosmic ray data of Oulu station, the 229 geoeffective CME events selected from 1998 to 2019 were analyzed, and the performance of this prediction method is discussed. Section 2 presents the data and method used in this study. The results are introduced in detail in Section 3. Finally, a summary and conclusion will be given in Section 4.

Data Selection
To fully verify the applicability of this method to non-recurrent geomagnetic storms, we selected all CME-driven events occurring in solar cycles 23 and 24 from Richardson and Cane's catalog of Near-Earth Interplanetary Coronal Mass Ejections catalog. This catalog provides important ICME parameters, including ICME start and Writing -original draft: Cong Wang, Qian Ye end time, solar wind velocity, interplanetary magnetic field strength, and relevant CME information (Richardson & Cane, 2011). We focus on data between start of interplanetary shock and end of ICME, as storms are likely to occur in the periods.
The reference Kp index is obtained from the National Ocean and Atmospheric Administration Space Weather Prediction Center (NOAA-SWPC). Following the screening, 229 CME-driven events were selected (Figure 1), among which 166 events were geomagnetic storms (Kp ≥ 5), and 63 events were not. The distribution of maximum Kp index in all events and the monthly smoothed sunspot number in the same interval are plotted in Figure 1. The occurrence frequency of CME-driven events is strongly modulated by the cycle, a nearly periodic 11-year change in the Sun's activity measured in terms of variations in the number of observed sunspots on the Sun's surface. For instance, only one CME-driven event recorded in 2008, the minimum year, and without geo-effectiveness. Because the solar cycle reflects magnetic activity, various magnetically driven solar phenomena follow the solar cycle, including coronal mass ejections.
Cosmic ray data from the Oulu Station (Location: 60.05°N, 25.47°E; Cutoff rigidity: 0.8 GV), adopted for this study, can be dated back to 1964, with stable quality and long continuous time (Mishev et al., 2020). Considering the computational speed and time resolution, we used cosmic ray data from 1998 to 2019 for 22 years in this study, with a time resolution of 30 min.

Spectral Whitening Method
J. S. Wang et al. (2014) proposed a filtering method-SWM, which is a pure mathematical filtering method independent of physics. It can effectively remove periodic components in data so that the concerned non-recurrent components can be found easily. Previous studies have demonstrated the successful use of SWM to extract geomagnetic storm effects from ionospheric data (Z. Chen et al., 2014;. Additionally, SWM has been used to investigate the relationship between the residual 27-day quasiperiodicity and ionospheric Q disturbances (Z. , as well as the effects of the tropical on ionospheric total electron content (Zhao et al., 2018). Hence, SWM could be used to process the CRI to remove the periodic signals of CRI. For CRI data, which could be regarded as a set of time series data g(t), the algorithm of the SWM is where g s (t m ) is the cosmic ray data processed by the SWM; P env (ξ) is the upper envelope of the power spectrum of g(t); the power spectrum of g(t) is obtained by Fourier transform; P 0 is the value that occurs most often in the data set of P env (ξ). To reduce the noise enhancement, the three-point running average was applied on * ( ) for smoothing, and the final data g s (t m ) was obtained. Equation 1 reveals that the spectral weighting method involves dividing the Fourier transform of the observed data series by the real envelope function P env (ξ). This approach effectively suppresses the periodic component while emphasizing the non-periodic component as a major contributor (J. S. Wang et al., 2014). Finally, the inverse Fourier transform converts the whitened spectrum into time series data. The data processed by SWM are provided by C. Wang and Ye (2023b). Based on Figure 2, it can be inferred that the periodic signal of CRI has been removed while the non-recurrent signals of CRI have been preserved.

CEEMDAN-CWT
To obtain precursors aroused by non-recurrent geomagnetic storms, Ye et al. (2022) propose an ensemble method CEEMDAN-CWT. This algorithm can predict CMEs' geo-effectiveness through the time-frequency domain analysis technique. Procedures of CEEMDAN-CWT is: where IMF n is nth intrinsic mode function, and Res(t) represents the residual function; x(t) is reconstruction signal in the time domain, and given by the addition of certain IMF n ; ψ*(t) is the mother wavelet function, which is continuous in both the time domain and the frequency domain, and s is the scale parameter. At first, through CEEMDAN, CRI is decomposed to several intrinsic mode functions and the residual (Res). Then, IMF4 and IMF5 were chosen to reconstruct a new signal as they fluctuate more significantly when CMEs propagate to the Earth. Furthermore, by adopting CWT to analyze the new signal x(t), we obtain the spectrum of it. Figure 3a is an example of the spectrum.
Many factors modulate CRI, and through CEEMDAN-CWT, the variations in the reconstruction spectrum caused by CME mainly distribute between 4 and 24 hr. Thus, to see fluctuations more clearly over time, we used the proxy function precursor(t) to trace the impact of CMEs on CRI, and precursor(t) is calculated as: As shown in Figure 3b, the blue line is the precursor signal after quantization, and the red line is the diff of the precursor. The code of CEEMDAN-CWT is provided by C. Wang and Ye (2023a).

Precursor Signal Reconstruction
The precursor signals of CME-driven geomagnetic storms could be reconstructed from the frequency domain signals of CRI. However, the precursor signals, influenced by GLE events (defined as an event registered when there are near-time coincident and statistically significant enhancements of the count rates of at least two differently located neutron monitors including at least one neutron monitor near sea level and a corresponding enhancement in the proton flux measured by a space-borne instrument(s) (Poluianov et al., 2017) and periodic signals, can't be reconstructed successfully when GLE events occur or when the geomagnetic index Kp < 7 occurs 10.1029/2023SW003522 6 of 12 (Ye et al., 2022). Through the method described in Section 2 above, 229 events were analyzed. It was found that this method could effectively reconstruct the precursor of CME-driven events, including the events with GLE and events with geomagnetic index Kp < 7 (Figure 4). Figure 4a illustrates the geomagnetic storm caused by the CME outbreak on 4 November 2001. According to LASCO's observation, the CME occurred at 16:35 on November 4, followed by a GLE event was observed at Oulu station. The CME reached the Earth at 01:52 on November 6 and immediately triggered a severe storm (Kp = 8) lasting 6 hr. Comparing Figures 4a1 and 4a2, the reconstructed precursor rose obviously and rapidly before the severe storm outbreak and was unaffected by the GLE event. Therefore, it can be seen that the method mentioned in this paper could exclude the influence of GLE events well to improve the accuracy of geomagnetic storm prediction.
As shown in Figure 4b, the storm lasted 9 hr on July 19, caused by the CME outbreak on 17 July 2016. The precursor's magnitude and diff were gradually increased before the storm's onset and could predict the storm in advance, although the magnitude and diff were lower than that of the 4 November 2001 event ( Figure 4a). Furthermore, this shows that the method proposed in this paper effectively predicts weak geomagnetic storms (Kp ≤ 6).

Superposed Epoch Analysis
To investigate whether the precursor's amplitude varies according to the level of the corresponding storm, we choose the superposed epoch analysis technique to statistically analyze the precursors with different Kp indexes. We choose the data from CME burst to the ICME end as a sample, and add time span of all samples together, then obtain T mean through dividing the total time span via event amounts. The obtained T mean was 107 hr in this research. According to T mean , we interpolate all data to the same length. To visualize the target signal's fluctuation before and after events, we added data from 1 day to the start and end of the target signal. After processing, we averaged data with different Kps and signed its local maximum as the figure (Figure 5). Due to the influence of GLE, there were 4 events (one event with Kp = 7 and 3 events with Kp = 8) with an obvious precursor. Still, the amplitude of the precursor signals increased much more than other events, so they were excluded from the superposed epoch analysis.
As shown in Figure 5, the 225 events were divided into six groups based on the Kp index of storms. It was evident that with the increase of the geomagnetic activity index, the amplitude of the precursor increased, while the time interval between the onset of CME and the maximum amplitude of the precursor decreased. This is because the structure and speed of CME are closely related to its geoeffectiveness Rahman et al., 2014). Notably, the result indicates that the precursor extracted from CRI could not only indicate the onset of storms but also predict the strength and onset time. Figure 5 shows that the amplitude of the precursor 24 hr before the CMEs outbreak was approximately 55. This suggests that the periodic signals of CRI processed by the SWM have been removed, leaving only the non-recurrent signals (such as the variation signals affected by CME). An exception is the group of events with Kp = 9, where the average amplitude of the precursor 24 hr before the CMEs outbreak reached 100 in this group. This is because most of the events (Kp = 9) are preceded by several geomagnetic storms and accompanied by GLE events. Therefore, the average amplitude of the precursor 24 hr before the CME outbreak is typically larger than in other groups, but this does not affect the growth of the precursor, and does not affect the ability to predict storms.

Analysis of Geomagnetic Storm Forecast Results
The accuracy rate and recall rate of the forecast results of the model is generally used to judge the quality of a forecast model. Based on the prediction results, the proportion of the events predicted correctly as positive examples are called the Precision rate (P for short). In the sample of actual positive events, the proportion of events correctly predicted is called the Recall rate (R for short). However, due to such a phenomenon that has a high P and a low R or the other way around, it is difficult to evaluate the quality of the model by using the P or R alone. Thus, it is necessary to introduce an index F1 that takes into account both P and R, and the expression is: Figure 6 shows the precision rate, recall rate, and F1 values of the forecast model proposed in this paper under various threshold conditions. The threshold was set based on a base value equal to the precursor's average amplitude of the precursor prior to the onset of CME without storm Kp ≥ 5 lasting 24 hr. Once the precursor's amplitude increases to 1.1 times or more than the base value, the threshold is considered to have been reached.
Based on the findings depicted in Figure 6, increasing the threshold leads to a gradual increase in precision rate but a rapid decrease in recall rate after reaching the maximum at 1.2 times the base value. This suggests that while the prediction precision of this model increases with higher thresholds, it also misses more positive events, resulting in more false-negative events. According to the F1 value, the optimal threshold of this model should be set as 1.2 times the base value when both the precision rate and the recall rate are taken into account.

The Threshold Is Set to 1.2 Times the Base Value
Table 1 records the forecast results when the model threshold is 1.2 times the base value. Among 166 positive geomagnetic storm events, 129 were successfully predicted, with a recall rate of 77.7%. This result is similar to the efficiency of the CME-driven storm prediction model supported by Telloni et al. (2019). We regarded the time between the precursor amplitude reaching the threshold to the arrival of a shock to the earth as the leading time of this model. According to the threshold value of 1.2 times the base value, the overall average leading time can reach 50.4 hr, which is better than the result supported by Ye et al. (2022). Moreover, with the decrease of geomagnetic storm strength, the leading time will gradually increase because the speed of CME that can produce strong geomagnetic storms is usually faster, and the  10.1029/2023SW003522 9 of 12 time of shock generated by them to reach the Earth is shorter .
Further analysis of events according to different Kp indexes (Figure 7) shows that the number of missed events decreases with the increase in Kp value. The precision rate of events under different Kp indexes is around 80%. There are 30 events affected by GLE, of which only eight events failed to predict successfully. It is worth pointing out that the nine events (Kp ≥ 7) where predictions failed are not a result of the reconstructed precursor failing to reach the threshold. Rather, the predictions failed due to the impact of GLE events or the close CME. The precursor amplitude was already far above the threshold when the CME outbreak of the corresponding event, leading us to classify these nine events as negative. Figure 8 displays the distribution of leading time and quiet mean values among the 129 storms that were successfully predicted. From the left panel, we know the leading time is mostly between 20 and 80 hr. Meanwhile, the right panel presents the quiet mean value, which is calculated based on CRI during the quiet period before the storm. This value is closely related to CRI, which has significant inter-annual and seasonal variations, but due to application of SWM, unlike CRI, this value for different events concentrates in a narrow range of 25-75. Besides, the threshold value is also determined by this value in our work, and if the threshold is not sensitive to periodic changes, the threshold setting will be more generalized.

Summary
Predicting CME-driven non-recurrent geomagnetic storms is essential in space weather forecasting. Although it has been proven feasible to reconstruct precursors from CRI in the frequency domain to predict non-recurrent geomagnetic storms (Ye et al., 2022), this method is limited to events with Kp ≥ 7 and without GLE, due to the influence of periodic signals and GLE events in the CRI. This study introduces the SWM to eliminate the influence of periodic signals and GLE, and investigates all 229 CME-driven events (including 63 events without geomagnetic storms and 166 events with different strength geomagnetic storms) during the 23rd and 24th solar activity cycles. The result shows that the storms could be predicted 50.4 hr in advance on average using the prediction model proposed in this paper, and 129 of the 166 storm events were successfully predicted. The main results of this investigation are as follows.
1. The SWM can efficiently eliminate the impact of periodic signals and GLE, enabling the model to predict the storms with Kp = 5 and 6 and the storms accompanying GLE. 2. With the strength of geomagnetic storm increase, the amplitudes of precursor signals gradually increase, the number of missed events decreases, and the mean leading time under different Kp index decrease. This findings indicates that the precursor extracted from CRI have the potential to predict the onset and strength of geomagnetic storms, as well as the onset time of storms. 3. When the threshold is set to 1.2 times the base value, the predicted results of the model are optimal, and the average forecast lead time can reach 50.4 hr. And in other studies, prediction models using deep learning algorithms usually reach optimal performance with a lead time about 6 hr (Tasistro-Hart et al., 2021).
In the current work, we studied all the CME-driven events in two solar activity cycles to verify the precision and reliability of the prediction model. In future investigations, we plan to extend the cosmic ray data to multiple stations, expand the sample size of the prediction model, and further improve the precision and recall rate of the model for predicting CME-driven non-recurrent geomagnetic storms.

Data Availability Statement
The cosmic ray data (http://cosmicrays.oulu.fi/) as detected by Neutron Monitor Network have been acquired from Database of the University of Oulu, Finland. The list of CME-driven events is available at Data Set S1. The data processed by SWM (C. Wang & Ye, 2023b) are available at https://doi.org/10.5281/zenodo.8093239. The code used for analysis (C. Wang & Ye, 2023a) is available at https://doi.org/10.5281/zenodo.8093257.