F10.7 Daily Forecast Using LSTM Combined With VMD Method

The F10.7 solar radiation flux is a well‐known parameter that is closely linked to solar activity, serving as a key index for measuring the level of solar activity. In this study, the Variational Mode Decomposition (VMD) and Long Short‐term Memory (LSTM) network are combined to construct a VMD‐LSTM model for predicting F10.7 values. The F10.7 sequence is decomposed into several intrinsic mode functions (IMF) by VMD, then the LSTM neural network is utilized to forecast each IMF. All IMF prediction results are aggregated to obtain the final F10.7 value. The data sets from 1957 to 2008 are used for training and the data sets from 2009 to 2019 are used for testing. The results show that the VMD‐LSTM model achieves an annual average root mean square error of only 4.47 sfu and an annual average correlation coefficient (R) of 0.99 during solar cycle 24, which is significantly better than the accuracy of the LSTM model (W. Zhang et al., 2022, https://doi.org/10.3390/universe8010030), the AR model (Du, 2020, https://doi.org/10.1007/s11207-020-01689-x), and the BP model (Xiao et al., 2017, https://doi.org/10.11728/cjss2017.01.001). The VMD‐LSTM model exhibits strong predictive capability for the F10.7 index during solar cycle 24.

used a multilayer feedforward neural network to predict the F10.7 index one day in advance using 11 days of historical values as the model input, achieving a correlation coefficient of approximately 0.93. C. Huang et al. (2009) applied Support Vector Regression to predict the F10.7 index from 2002 to 2006 with a mean absolute percentage error ranging from 2.71% to 5.56%. Warren et al. (2017) introduced a linear prediction model of F10.7. Wang et al. (2018) established a linear multi-step prediction model that combined correlation analysis with different predictors to enhance the multi-step prediction performance of the F10.7 index. These machine learning models could address some issues under specific circumstances, but most of them were trained at a shallow level and could not extract the hidden information within the data, especially the intricate temporal cyclic changes.
However, deep learning methods are capable of learning the periodic variation of temporal data by undergoing deep and extensive training to fully optimize the loss function (Kingma & Ba, 2014). Xiao et al. (2017) employed a Back Propagation Neural Network to predict the F10.7 index, which exhibited better short-term prediction accuracy than other models. Luo et al. (2022) combined Convolutional Neural Networks with Long Short-Term Memory (LSTM) to predict the value of F10.7 for the next 27 days, spanning from 2003 to 2014. Gao et al. (2022) incorporated the sunspot number into an LSTM to build a short-term prediction method for F10.7 over the next 7 days based on a 54-day solar radiation flux index, with a root mean square error (RMSE) 11% lower than that of the U.S. Space Weather Prediction Center (SWPC). W. Zhang et al. (2022) used the LSTM model for short-term prediction of the F10.7 index and verified that the LSTM model outperformed ordinary neural network models.
At the same time, the VMD algorithm has been widely applied in many works. Niu et al. (2018) established the VMD-ARIMA-HGWO-SVR model to improve the stability and accuracy of container throughput prediction; error analysis and model comparison showed that VMD was more effective than other decomposition methods such as CEEMD and WD. Abdoos (2016) combined VMD with Extreme Learning Machines for short-term wind prediction, with advantages in prediction accuracy and computational time compared to previously reported methods. Y. Zhang et al. (2019) proposed a mixture model based on VMD-Wavelet Transformation (VMD-WT) and a Principal Component Analysis-Back Propagation Screening-Radial Basis Function (PCA-BP-RBF) neural network for short-term wind speed prediction. The experimental results showed that VMD-WT could better solve the problems of mode aliasing and endpoint effects, making the periodic characteristics of the intrinsic mode functions (IMFs) more obvious and improving prediction performance.
At present, prediction models of the F10.7 index already achieve good predictive performance. However, previous models rarely consider the nonlinearity of the prediction sequence itself, and the prediction accuracy still has room for improvement. To improve the prediction accuracy of the F10.7 index, the Variational Mode Decomposition (VMD) algorithm is introduced to decompose the F10.7 index before prediction and reconstruction. The VMD algorithm effectively addresses the periodicity and strong nonlinearity of time series data, while the LSTM neural network exhibits good prediction ability for time series with low complexity and low nonlinearity. Therefore, the VMD method and the LSTM model are combined to construct a VMD-LSTM prediction model for the F10.7 index, in an attempt to further improve its prediction accuracy.

Data
The daily values of the F10.7 index are obtained from measurements by radio observatories in Canada and can be downloaded from the National Oceanic and Atmospheric Administration's website. For this study, we utilized F10.7 index data spanning 1957 to 2019. Specifically, the data from 1957 to 2008 served as the training set, while the data from 2009 to 2019 constituted the testing set. These training and testing sets are depicted by the black and blue lines in Figure 1, respectively.

LSTM
Since the development of deep learning methods, a relatively systematic and complete LSTM framework has evolved over several generations and has been widely used in many fields. The core of the LSTM model is the cell state and the gate structure. The cell state acts like a pathway that continuously transmits relevant information during sequence processing, so information from earlier time steps can reach cells at later time steps, effectively overcoming the limitations of short-term memory. The main function of the gate structure is to add and remove information. Combining these two mechanisms, the LSTM model can determine which information to remember and which to forget during sequence processing (Hochreiter & Schmidhuber, 1997). The detailed algorithm is outlined below and illustrated in Figure 2.
The specific implementation steps of LSTM are as follows (Wei et al., 2018):
1. The data h_{t−1} from the recurrent kernel at the previous time step and the input feature X_t at time t are concatenated and passed through the sigmoid activation function of the neural network layer; the resulting value f_t is a number between 0 and 1 representing the proportion of the cell state C_{t−1} to retain. A value of 0 means all of C_{t−1} is forgotten, and 1 means all of C_{t−1} is retained.
2. Adding new information to the cell state takes two steps. The first is the input gate layer, which determines whether to update the data. The second is a tanh neural network layer that generates a candidate vector C̃_t to be added to the cell state. The result vectors from the two steps are multiplied element-wise to determine the update to the cell state.
3. The forget gate and the input gate realize the forgetting and remembering of long- and short-term information:

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t (4)

4. Finally, the output is determined from the cell state, the input X_t at time t, and the recurrent kernel h_{t−1} at time t − 1. The probability vector from the output gate layer and the candidate output vector are multiplied element-wise to obtain the final output h_t.
where σ is the sigmoid activation function, tanh is the hyperbolic tangent function, ⊙ denotes element-wise multiplication, and i_t is the input gate activation. W_i and W_c are the weights of the input gate and the candidate cell state, respectively, and b_i and b_c are the corresponding biases; W_f and b_f are the weight and bias of the forget gate; W_o and b_o are the weight and bias of the output gate, respectively.
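The gate equations above can be illustrated with a minimal forward pass of a single LSTM cell. This is a toy sketch with scalar states and made-up weights (the model used in this work is a framework implementation with 50-unit layers); it only demonstrates how the forget, input, and output gates combine to update the cell state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, w):
    """One forward step of a scalar LSTM cell.

    w maps each gate to an (input weight, recurrent weight, bias) triple;
    all values here are illustrative, not trained parameters.
    """
    # Forget gate: fraction of the old cell state to keep
    f_t = sigmoid(w["f"][0] * x_t + w["f"][1] * h_prev + w["f"][2])
    # Input gate: how much of the candidate to write
    i_t = sigmoid(w["i"][0] * x_t + w["i"][1] * h_prev + w["i"][2])
    # Candidate cell state (tanh layer)
    c_tilde = math.tanh(w["c"][0] * x_t + w["c"][1] * h_prev + w["c"][2])
    # Cell state update: C_t = f_t * C_{t-1} + i_t * C~_t  (Eq. 4)
    c_t = f_t * c_prev + i_t * c_tilde
    # Output gate and hidden state: h_t = o_t * tanh(C_t)
    o_t = sigmoid(w["o"][0] * x_t + w["o"][1] * h_prev + w["o"][2])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Made-up weights, just to run the cell over a short sequence.
weights = {"f": (0.5, 0.1, 0.0), "i": (0.6, -0.2, 0.1),
           "c": (1.0, 0.3, 0.0), "o": (0.4, 0.2, -0.1)}
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.9]:  # e.g. three normalized F10.7 values
    h, c = lstm_cell_step(x, h, c, weights)
```

Because the gates are sigmoids and the output passes through tanh, the hidden state h_t is always bounded in (−1, 1), which is why the inputs are normalized before training (see the Min-Max step below in the VMD-LSTM section).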
VMD

Unlike the recursive decomposition of the Empirical Mode Decomposition (EMD) algorithm (N. Huang et al., 1998), the VMD algorithm casts signal decomposition as a variational problem. Through multiple adaptive Wiener filter banks, the decomposed components can be adaptively segmented. This is very effective in overcoming mode mixing in the decomposition, that is, the same component appearing in different frequency bands or different components appearing in the same frequency band. The specific implementation steps of VMD are as follows. First, the analytic signal obtained by the Hilbert transform is analyzed and its unilateral spectrum is calculated; it is then shifted to the corresponding baseband by multiplication with the center-frequency modulation term. The norm of the gradient of the demodulated signal is computed to estimate the bandwidth of each mode, giving the constrained variational problem. To transform the constrained variational problem into an unconstrained one, a Lagrange multiplier and a quadratic penalty factor α are added. In the presence of Gaussian noise, α ensures the accuracy of signal reconstruction while the multiplier maintains the strictness of the constraint, so that the optimal solution of the constrained variational problem can be obtained. Here u_k denotes each modal function, ω_k is the center frequency of each mode (k = 1, 2, …, N), λ is the Lagrangian multiplier, δ is the Dirac distribution, and f(t) is the signal to be decomposed.
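The constrained variational problem and its augmented Lagrangian referenced above can be written out explicitly; the following is the standard VMD formulation of Dragomiretskiy and Zosso (2014), restated here in the notation of this section:

```latex
\min_{\{u_k\},\{\omega_k\}}
  \sum_{k=1}^{N}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{N} u_k(t) = f(t)

L\bigl(\{u_k\},\{\omega_k\},\lambda\bigr)
  = \alpha \sum_{k=1}^{N}
    \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
    e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{N} u_k(t) \right\|_2^2
  + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{N} u_k(t) \right\rangle
```

The first expression penalizes the bandwidth of each demodulated mode subject to exact reconstruction; the second relaxes the constraint via the penalty factor α and the multiplier λ(t).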
Then, the alternating direction method of multipliers is used to solve the above problem, and the components and their center frequencies are continuously updated to obtain the optimal solution. In general, VMD is a non-recursive, variational decomposition method that decomposes noisy signals well. The overall framework of VMD is a variational problem: assuming that each mode has a finite bandwidth around a distinct center frequency, each mode and its corresponding center frequency are continuously updated by the alternating direction method with the multiplication operator; after the noisy signal is decomposed, each variational mode component and its center frequency are obtained.

Genetic Algorithm
In the context of signal processing with VMD, the decomposition results are significantly influenced by the number of decomposition modes (K) and the penalty parameter (α). If K is too large, spurious components may be generated; if it is too small, relevant components may be lost. The value of α affects the bandwidth of the decomposed Intrinsic Mode Functions (IMFs): a higher α yields narrower bandwidths, while a lower α yields broader bandwidths. Thus, the optimization of the parameters K and α is crucial.
To determine the optimal parameters K and α in VMD, the Genetic Algorithm (GA) was selected, owing to its universality, search efficiency, and strong global optimization ability compared with other optimization algorithms such as ant colony and particle swarm optimization (Chen et al., 2023). GA is a nonlinear global optimization algorithm grounded in natural selection and genetic principles. Optimizing the VMD parameters with GA involves six main steps: encoding, population initialization, fitness evaluation, selection, crossover, and mutation. The fitness evaluation step is particularly critical in guiding the optimization, necessitating a suitable fitness function to assess each individual's proximity to the optimum.
In this study, the GA was employed to optimize the mode number K and penalty factor α for VMD. The fitness function chosen was the envelope entropy, which reflects the sparse characteristics of the original signal: a higher envelope entropy indicates more noise and less feature information in the IMF, and vice versa. The envelope entropy (Ep) of a signal x(i) (i = 1, 2, …, N) can be represented by Formula 5:

Ep = −∑_{i=1}^{N} ε(i) ln ε(i), with ε(i) = a(i) / ∑_{j=1}^{N} a(j) (5)

where a(i) is the envelope signal obtained by Hilbert demodulation of the k modal components decomposed by VMD, ε(i) is the probability distribution sequence obtained by normalizing a(i), and N is the number of sampling points; the entropy of the probability distribution sequence ε(i) is the envelope entropy Ep.
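As a minimal sketch, the envelope entropy of Formula 5 can be computed as follows. The Hilbert demodulation step is assumed to have been done already (e.g. with scipy.signal.hilbert); the function below takes the resulting envelope a(i) directly, and the example envelopes are made up for illustration.

```python
import math

def envelope_entropy(envelope):
    """Envelope entropy Ep of an envelope sequence a(i), i = 1..N.

    eps(i) = a(i) / sum_j a(j) is the normalized probability sequence,
    and Ep = -sum_i eps(i) * ln(eps(i)).
    """
    total = sum(envelope)
    eps = [a / total for a in envelope]
    return -sum(e * math.log(e) for e in eps if e > 0)

# A flat envelope carries no sparse features: entropy is maximal (ln N).
flat = envelope_entropy([1.0, 1.0, 1.0, 1.0])
# A peaky envelope is sparser (more feature information): entropy is lower.
peaky = envelope_entropy([10.0, 0.1, 0.1, 0.1])
```

This matches the text's interpretation: the flatter (noisier) the envelope, the higher Ep, so the GA minimizes Ep to favor IMFs with clear feature content.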
In the context of GA-optimized VMD, the parameter settings were defined as follows: the range for K was [2, 10], the range for α was [0, 5000], the number of iterations was 50, the population size was 30, the crossover probability was 0.9, and the mutation probability was 0.1. We first optimized the VMD hyperparameters using the training data from 1957 to 2008, then applied the best hyperparameters to decompose both the training data (1957-2008) and the testing data (2009-2019); this ensures that the model is constructed solely from information within the training set. The envelope entropy corresponding to each individual was calculated, and iterative updating with an appropriate convergence criterion was performed until the termination conditions were met, yielding the optimal VMD parameters.
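The GA loop described above can be sketched as follows. Running the true fitness (VMD decomposition plus envelope entropy on the F10.7 series) is out of scope here, so a made-up surrogate fitness with a known minimum stands in for it; the population size, parameter ranges, and crossover/mutation probabilities follow the settings listed in the text.

```python
import random

random.seed(0)

K_RANGE = (2, 10)            # search range for mode number K
ALPHA_RANGE = (0.0, 5000.0)  # search range for penalty factor alpha
POP_SIZE, N_GEN = 30, 50
P_CROSS, P_MUT = 0.9, 0.1

def fitness(ind):
    """Surrogate for envelope entropy, minimal at K=3, alpha=2626.
    In the real pipeline this would run VMD with (K, alpha) and return
    the minimum envelope entropy over the resulting IMFs."""
    k, alpha = ind
    return (k - 3) ** 2 + ((alpha - 2626.0) / 1000.0) ** 2

def random_individual():
    return [random.randint(*K_RANGE), random.uniform(*ALPHA_RANGE)]

def tournament(pop):
    a, b = random.sample(pop, 2)
    return a if fitness(a) < fitness(b) else b

pop = [random_individual() for _ in range(POP_SIZE)]
for _ in range(N_GEN):
    new_pop = []
    while len(new_pop) < POP_SIZE:
        p1, p2 = tournament(pop), tournament(pop)   # selection
        child = list(p1)
        if random.random() < P_CROSS:               # crossover: swap genes
            child = [p1[0], p2[1]] if random.random() < 0.5 else [p2[0], p1[1]]
        if random.random() < P_MUT:                 # mutation: perturb alpha
            child[1] = min(max(child[1] + random.gauss(0, 200),
                               ALPHA_RANGE[0]), ALPHA_RANGE[1])
        if random.random() < P_MUT:                 # mutation: re-draw K
            child[0] = random.randint(*K_RANGE)
        new_pop.append(child)
    pop = new_pop

best = min(pop, key=fitness)
```

With selection pressure over 50 generations the population concentrates near the surrogate's minimum, mirroring the fitness curve in Figure 4 that flattens by iteration 25.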

VMD-LSTM
As an enhanced recurrent neural network, the LSTM network can effectively address not only the long-distance dependence problem that challenges traditional RNNs, but also the exploding and vanishing gradient problems commonly encountered in neural networks (Graves, 2012; Tan et al., 2018), making it very effective for processing sequence data. The VMD algorithm, for its part, can reduce the nonlinearity and complexity of a sequence through decomposition. Therefore, we combine the LSTM neural network and the VMD algorithm to predict the F10.7 index one day in advance; this combined method is called the VMD-LSTM model. Figure 3 illustrates the primary process of the combined forecast.
1. We applied the VMD algorithm separately to decompose the training data from 1957 to 2008 and the testing data from 2009 to 2019, resulting in three components and a residual.
2. Each decomposed component was trained separately using a stacked LSTM model with two hidden layers and 50 neurons per layer. After training the LSTM models, we obtain the prediction results for each component.
3. Finally, by summing the one-day-ahead predictions of all components, we obtain the predicted value of F10.7 one day ahead.
The VMD-LSTM is a rolling prediction model with a time step of 7 days, meaning that the F10.7 value 1 day ahead is predicted from the values of the previous 7 days. The data were processed by Min-Max normalization, which scales the data into a specified range, typically between 0 and 1:

x_normalized = (x − x_min) / (x_max − x_min)

Here, x is the F10.7 data, x_normalized is the normalized value, x_min is the minimum value in the data set, and x_max is the maximum value in the data set. The prediction model consists of a two-layer LSTM network. It uses the Adam optimizer with a batch size of 32, a learning rate of 0.001, and 100 epochs.
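The preprocessing just described (Min-Max scaling followed by 7-day windowing) can be sketched as below; the toy series stands in for the F10.7 data, and the window and horizon sizes follow the text.

```python
def min_max_normalize(series):
    """Scale a series into [0, 1]: x_norm = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def make_windows(series, n_steps=7):
    """Build (input, target) pairs: 7 past values -> the next value."""
    xs, ys = [], []
    for i in range(len(series) - n_steps):
        xs.append(series[i:i + n_steps])
        ys.append(series[i + n_steps])
    return xs, ys

# Toy stand-in for a stretch of daily F10.7 values (sfu).
f107 = [70.1, 71.3, 69.8, 72.5, 75.0, 74.2, 73.1, 76.4, 78.9, 77.0]
norm = min_max_normalize(f107)
X, y = make_windows(norm, n_steps=7)
```

In the actual pipeline this windowing is applied to each VMD component separately; the normalization constants are taken from the training set so that no testing-set information leaks into training.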
In this work, two evaluation metrics were selected to measure the performance of the model, namely the correlation coefficient (R) and the RMSE.
where N is the number of samples, X_i are the predicted values, Y_i are the observed values, and X̄ and Ȳ are the means of X_i and Y_i, respectively. R represents the correlation between observed and predicted values; the closer it is to 1, the better the predictions match the observations. RMSE reflects the deviation between predicted and observed values; the smaller it is, the better the prediction model.
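The two metrics can be written down directly; this is a plain-Python sketch with toy values standing in for real predictions.

```python
import math

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    n = len(pred)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)

def pearson_r(pred, obs):
    """Pearson correlation coefficient R."""
    n = len(pred)
    mx = sum(pred) / n
    my = sum(obs) / n
    cov = sum((p - mx) * (o - my) for p, o in zip(pred, obs))
    sx = math.sqrt(sum((p - mx) ** 2 for p in pred))
    sy = math.sqrt(sum((o - my) ** 2 for o in obs))
    return cov / (sx * sy)

obs = [70.0, 72.0, 75.0, 74.0]   # toy observed F10.7 values (sfu)
pred = [71.0, 71.5, 74.0, 74.5]  # toy predictions
```

A perfect prediction gives RMSE = 0 and R = 1, which is the direction of improvement reported for the VMD-LSTM model throughout the results sections.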

Analysis of Decomposition Results
The GA-VMD method first uses the GA to find the optimal parameter combination for the signal. The change in fitness over iterations during optimization is shown in Figure 4; the best fitness is reached at the 25th iteration, and the optimal parameter combination (K, α) = (3, 2626) was found through the algorithm's search process. After determining the optimal parameters, the signal is decomposed by VMD accordingly to achieve the best decomposition effect. Figure 5 shows the IMF components after VMD decomposition, arranged from low frequency to high frequency.
For the sequences obtained after VMD, we use sample entropy to evaluate complexity and total harmonic distortion (THD) to assess nonlinearity; the extent of distortion depends on the level of nonlinearity (N. Huang et al., 1998). Sample entropy, introduced by Richman and Moorman (2000), is a comprehensive metric for analyzing non-stationary time series. Its core principle is to quantify the likelihood of generating new subsequences within the time series: complexity is gauged by the probability of creating novel patterns in the signal, and a higher probability of generating new patterns indicates a more complex sequence. Harmonic distortion refers to extra harmonic components, beyond those present in the input, that appear in the output signal due to nonlinear elements when a signal is amplified; this distortion arises from the system's lack of complete linearity. We quantify it as the percentage of the root mean square of the added harmonic components relative to the RMS of the original signal.
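A minimal sample entropy implementation, following Richman and Moorman's definition (template length m, tolerance r, Chebyshev distance), is sketched below. It is written for clarity rather than speed, and the default tolerance of 0.2 times the standard deviation is a common convention, not a value stated in this paper.

```python
import math
import random

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A/B), where B counts template pairs of length m
    within tolerance r (Chebyshev distance) and A counts pairs of length
    m+1. Self-matches are excluded by only counting pairs with i < j."""
    n = len(x)
    if r is None:
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def count_pairs(length):
        templates = [x[i:i + length] for i in range(n - length + 1)]
        c = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    c += 1
        return c

    b = count_pairs(m)
    a = count_pairs(m + 1)
    return float("inf") if a == 0 or b == 0 else -math.log(a / b)

random.seed(1)
regular = [math.sin(0.3 * t) for t in range(60)]    # predictable series
noisy = [random.uniform(-1, 1) for _ in range(60)]  # irregular series
```

A regular series keeps repeating its patterns, so new templates rarely appear and the entropy is low; an irregular series produces new patterns constantly, giving a high entropy, which is the behavior Table 1 relies on when comparing the F10.7 signal to its IMFs.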
Table 1 gives the sample entropy and THD obtained for the F10.7 data set. The sample entropy of the F10.7 signal (2.33) is higher than that of its IMF components, indicating that the original signal is relatively more complex. The sample entropy increases sequentially for IMF1, IMF2, and IMF3, with that of RES being 1.50. Also, IMF3 has the lowest THD (1.09), followed by IMF2 (2.87), RES (6.97), and IMF1 (28.34), with the original F10.7 signal exhibiting the highest THD (28.37). A higher THD value indicates a more pronounced influence of nonlinear elements within the signal. Overall, the F10.7 signal displays a higher sample entropy, indicating higher complexity and a potential presence of novel patterns; the high THD, in turn, suggests a notable influence of nonlinear elements, particularly in the F10.7 signal and the IMF1 component.

Prediction Results
In this study, the performance of the LSTM and VMD-LSTM models in predicting the F10.7 index during low and high solar activity years is evaluated; the years 2009 and 2014 are selected for analysis. The results show that the VMD-LSTM model outperforms the LSTM model in terms of both RMSE and R. For 2009, the RMSE and R of the LSTM model are 1.13 sfu and 0.93, while those of the VMD-LSTM model are 0.89 sfu and 0.95. Similarly, for 2014, the RMSE and R of the LSTM model are 9.89 sfu and 0.94, while those of the VMD-LSTM model are 8.20 sfu and 0.95. Figures 6 and 7 show that the prediction deviation of the VMD-LSTM model is lower than that of the LSTM model and that its predictions are relatively consistent with the observed values. However, as solar activity intensifies, the deviation between the predicted values of both models and the observations increases; in most cases, the deviation of the VMD-LSTM model remains the lower of the two. In summary, the predicted values of the VMD-LSTM model are closer to the observations than those of the LSTM model. The LSTM model is typically trained in an end-to-end manner, mapping all the information of the time series into a single vector. This vector integrates information from various dimensions, making the model prone to overfitting and more sensitive to noise in the sequence. After VMD decomposition, the F10.7 sequence is broken down into multiple sub-sequences.
Compared to the original sequence, each sub-sequence contains less information from different dimensions, thereby reducing the nonlinearity and complexity of the sequence. Simultaneously, this decomposition effectively avoids common issues of the EMD method such as endpoint effects and mode aliasing. Hence, the VMD-LSTM model can achieve better prediction results. Of particular interest is the model's performance in predicting the F10.7 index during periods of high solar activity.
Here, one case of the F10.7 index, during the period from 24 June to 2 August 2014, is chosen for analysis. Figure 8 shows a comparison of the two models (LSTM, VMD-LSTM) and the observed results during this period. As can be seen in Figure 8, the prediction values of the VMD-LSTM model are much closer to the observed F10.7 index for most points. This indicates that VMD can effectively improve the prediction ability and that the VMD-LSTM model can accurately predict subsequent values of the F10.7 index based on the information in the training set samples.

Compared Results
To better evaluate the prediction performance of the VMD-LSTM model, we compare its results with those of the LSTM model (W. Zhang et al., 2022), the BP model (Xiao et al., 2017), and the AR model (Du, 2020). The BP model uses the output error to estimate the error of the layer directly preceding the output layer, then uses this error to estimate the error of the layer before it; repeating this process yields error estimates for all layers. The AR model utilizes the dependency between the historical time series of the predicted target and its values at different periods to establish a regression equation for prediction. The LSTM model solves the long-term dependency problem by introducing memory units and three gating mechanisms: the memory unit stores sequence information, and the three gates (forget gate, input gate, and output gate) control the reading, writing, and retention of the memory unit. These three models are commonly used for processing time series; in contrast to the two neural network models, the AR model is a statistical approach. As shown in Table 3, the VMD-LSTM model demonstrates lower RMSE values than the LSTM, AR, and BP models in most years, indicating its strong performance in predicting the F10.7 index. However, for certain years, the performance of the VMD-LSTM model may slightly lag behind other models, possibly due to specific characteristics of the data; further research and analysis are needed to investigate this.
Furthermore, we applied the same approach to predict the F10.7 index for a 7-day horizon; the results are summarized in Table 4.

Summary
In this study, we propose a VMD-LSTM model for the short-term prediction of F10.7, which combines the VMD algorithm with the LSTM network. First, the VMD algorithm is applied to decompose the F10.7 data into subcomponents; then the LSTM network is used to make predictions for each subcomponent. Finally, the predicted values of the subcomponents are summed to obtain the predicted value of the F10.7 index. The performance of the VMD-LSTM model is compared with that of the LSTM, AR, and BP models. The results demonstrate that the VMD-LSTM model outperforms these models in terms of prediction accuracy. Specifically, the RMSE and R of VMD-LSTM are better than those of LSTM in the same year during Solar Cycle 24: over the entire solar cycle, the RMSE and R of LSTM are 5.71 sfu and 0.98, while those of VMD-LSTM are 4.47 sfu and 0.99, with the RMSE decreasing by 21.72%. The high accuracy of the VMD-LSTM is attributed to the effectiveness of the VMD algorithm in reducing the nonlinearity and complexity of the F10.7 data series, which helps the LSTM network capture the internal patterns of series variation. A limitation is that our model currently cannot achieve real-time prediction, because future, unknown data cannot be decomposed in advance. Therefore, a rolling forecast method is necessary: predict one data point at a time, add it to the known historical data, use the updated history to predict the next point, and repeat.
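The rolling forecast procedure outlined above can be sketched as follows. The one-step predictor here is a deliberately trivial stand-in (the mean of the last 7 values); in the actual pipeline it would be the trained VMD-LSTM step, including re-decomposition of the updated history.

```python
def predict_one_step(history, n_steps=7):
    """Trivial stand-in for the trained model's one-step prediction:
    returns the mean of the last n_steps values. The real system would
    decompose `history` with VMD and sum the LSTM outputs per component."""
    window = history[-n_steps:]
    return sum(window) / len(window)

def rolling_forecast(history, horizon):
    """Predict `horizon` future values one at a time, feeding each
    prediction back into the history before predicting the next."""
    hist = list(history)
    out = []
    for _ in range(horizon):
        nxt = predict_one_step(hist)
        out.append(nxt)
        hist.append(nxt)  # the new "history" includes the prediction
    return out

past = [70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0]  # toy F10.7 history (sfu)
future = rolling_forecast(past, horizon=3)
```

Because each predicted point is appended to the history, decomposition and prediction can be rerun at every step, which is how the approach sidesteps the inability to decompose truly unknown future data.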
It is also necessary to continuously update the parameters in the prediction model to gradually adapt to new inputs. Meanwhile, we can refer to Stevenson's work (Stevenson et al., 2022) and use the VMD-LSTM model for medium-term forecasting, which is our next research direction.

Figure 1 .
Figure 1. The daily average value of the F10.7 index from 1957 to 2019.

Figure 2 .
Figure 2. Structure diagram of long short-term memory neural network.

Figure 3 .
Figure 3. Prediction process of combined Variational Mode Decomposition algorithm-Long Short Term Memory neural network model for F 10.7 index.

Figure 4 .
Figure 4. Genetic algorithm for Variational Mode Decomposition parameter optimization results.
The VMD-LSTM model exhibits a lower RMSE than the LSTM model, with an average RMSE of 4.47 sfu for VMD-LSTM versus 5.71 sfu for LSTM during solar cycle 24, a 21.72% reduction. Furthermore, the VMD-LSTM model shows a superior correlation coefficient, 0.99 over the entire solar cycle, demonstrating the effectiveness of the VMD-LSTM model in predicting the F10.7 index.

Figure 6 .
Figure 6. The comparison of the Variational Mode Decomposition algorithm-Long Short Term Memory neural network prediction values with the observations in 2009 and 2014.

Figure 7 .
Figure 7. The differences obtained by subtracting the predicted values of the Variational Mode Decomposition algorithm-Long Short Term Memory neural network (VMD-LSTM) and of the LSTM from the observed values in 2009 and 2014.
The table clearly indicates that the 7-day predictions generated by the VMD-LSTM model outperform those of the SWPC and of a recent multi-input LSTM model (M-LSTM) developed by Gao et al. (2022). In Gao et al., the LSTM method is utilized for the 7-day prediction of F10.7 based on their linear relationship. For the years 2009-2019, the VMD-LSTM model consistently demonstrated substantially lower RMSE than the M-LSTM and SWPC models, affirming its ability to provide highly accurate 7-day forecasts of the F10.7 index. Notably, in 2014 the RMSE of the VMD-LSTM model closely aligned with that of the M-LSTM model, yet the VMD-LSTM model retained a slight edge, underscoring its competitive advantage in the 7-day prediction. In summary, the VMD-LSTM model shows superior performance in 7-day prediction of the F10.7 index compared to other models, which holds significant implications for the accurate prediction of solar activity and space weather forecasting.

Table 2
presents the prediction results of the F10.7 index one day ahead using both the LSTM model and the VMD-LSTM model.

Figure 5. Decomposition results of the Variational Mode Decomposition method for the F10.7 index.

Table 1
Sample Entropy and Total Harmonic Distortion Obtained for the F10.7 Data Set

Table 2
Prediction Results of Long Short-Term Memory and Variational Mode Decomposition Algorithm-Long Short Term Memory Neural Network for the F10.7 Index in Solar Cycle 24

Figure 8. The comparison of the Variational Mode Decomposition algorithm-Long Short Term Memory Neural Network model prediction with the observations from 24 June to 2 August 2014.

Table 3
Comparison of Root Mean Square Error Between the Variational Mode Decomposition Algorithm-Long Short Term Memory Neural Network and Other Models (1-Day Prediction)

Table 4
Comparison of Root Mean Square Error Between the Variational Mode Decomposition Algorithm-Long Short Term Memory Neural Network and Other Models (7-Day Prediction)