A hybrid deep learning approach by integrating extreme gradient boosting‐long short‐term memory with generalized autoregressive conditional heteroscedasticity family models for natural gas load volatility prediction

Natural gas load forecasting provides decision‐making support for natural gas dispatch and management, pipeline network construction, pricing, and sustainable energy development. To explain the uncertainty and volatility in natural gas load forecasting, this study predicts the natural gas load volatility. As the natural gas load volatility has the time‐series features, along with long‐term memory, volatility aggregation, asymmetry, and nonnormality, this study proposes a natural gas load volatility prediction model by combining generalized autoregressive conditional heteroscedasticity (GARCH) family models, XGBoost algorithm, and long short‐term memory (LSTM) network. The model first takes the GARCH family models parameters of sliding estimation and meteorological factors as the influencing factors of volatility, and then it screens these influencing factors through the extreme gradient boosting (XGBoost) algorithm. Finally, the selected important features are input into the LSTM network to predict the volatility, and the 90% confidence interval of the volatility is calculated. Compared with a variety of single and combined models, the model proposed in this study has an average reduction of 45.404% in the evaluation index of mean squared error. The experimental results show that the model proposed in this study has a good performance and accuracy in predicting the volatility of natural gas load.


| INTRODUCTION
With the implementation of the "carbon emission peak" (carbon emission peak refers to an inflection point of greenhouse gas emissions) and "carbon neutrality" (carbon neutrality refers to achieving net-zero carbon dioxide emissions) policy, natural gas, as a clean energy source, has become the spotlight in the energy field. 1 In recent years, the global natural gas industry has developed rapidly in the background of the energy transition process. Forecasts show that global demand for natural gas is expected to continue to grow over the next 30 years and it is expected to peak in the mid-2030s and mid-2020s. 2 Due to the COVID-19 pandemic in 2020, global energy development has been continuously and negatively affected, especially coal and oil, but the demand for natural gas is still increasing rapidly, and it is expected to reach 4.6 × 1012 cubic meters by 2030. The demand mainly comes from China, India, the Middle East, and Southeast Asian countries. 3 A reasonable estimate of the natural gas load can enable the natural gas industry to develop rapidly and orderly. In the long run (in the upcoming few years), natural gas producers and operators hope to obtain sufficient benefits from the production and sale of it; the natural gas management department hopes to formulate reasonable policies to help certain regions solve energy problems. In a relatively short period of time (in a few hours or a few days), natural gas operators hope to control the peak value on the basis of the predicted load value, so as to ensure the production and life of the people; The energy management department expects to obtain the predicted load data in advance, so as to reasonably dispatch related energy (such as electricity and coal) to maintain the balance of energy supply and demand in the region. It is clear that different functional departments have different requirements for natural gas load forecasting, and the orderly operation of society is inseparable from reliable natural gas load forecasting. In previous studies, it is a more traditional method to predict the load of natural gas at a certain time based on the historical load of natural gas. 4 This is called point prediction, that is, predicting a certain single value at a certain point in time. However, in actual production and life, the load forecast of natural gas cannot maintain the ideal model, and the actual load of natural gas may deviate greatly from the predicted value. This difference may make it difficult for natural gas producers and operators to make optimal decisions. In addition, the predicted value of natural gas load cannot fully reflect the change of load, so it is difficult to reasonably measure the supply and demand of natural gas. Given the uncertainty of natural gas load changes, interval prediction of the natural gas load is more scientific and reasonable, and more flexible than point prediction. 5 In natural gas load forecasting, volatility is a key influencing factor of uncertainty in natural gas load forecasting. Volatility complements point prediction and makes interval prediction feasible by determining the range of variation in load forecasting.
The natural gas load volatility effectively takes into account the concentration and dispersion of natural gas load, the specific value of natural gas load and the probability of occurrence of various load values, and the maximum and minimum natural gas load. By predicting the natural gas load volatility, the standard deviation of the natural gas load can be reasonably estimated, and the maximum and minimum load values in a certain period of time can be further calculated by means of probability statistics. This kind of interval prediction results can better provide data-based decision support for natural gas producers and operators. Therefore, this study focuses on the prediction of natural gas load volatility. Based on the following two points: (1) The natural gas load volatility is difficult to predict and the prediction accuracy of the volatility is low; (2) The factors affecting the natural gas load fluctuation are nonlinear and time-varying. This study proposes a new hybrid model based on parameters of the GARCH family and meteorological features (GFMF)-extreme gradient boosting (XGBoost)-long short-term memory (LSTM) for natural gas load volatility prediction. The model first sliding estimates the parameters of generalized autoregressive conditional heteroscedasticity (GARCH), threshold autoregressive conditional heteroskedasticity (TARCH), exponential generalized autoregressive conditional heteroscedasticity (EGARCH), and power autoregressive conditional heteroscedasticity (PARCH) models in the GARCH family of models, second the GARCH family of model parameters as well as meteorological factors are filtered by XGBoost for features, and the features with larger importance values are selected and input into the LSTM, thus predicting the natural gas load volatility. And it is further compared with other single and combined models such as autoregressive integrated moving average model (AR-IMA), multiple linear regression (MLR), support vector regression (SVR), multi-layer perceptron (MLP), convolutional neural network (CNN), and so on. In this model, the parameters of the GARCH family models can effectively explain the volatility of the load, XGBoost can select features reasonably to improve the model's prediction accuracy, and the LSTM network has strong learning and adaptive capabilities. The experimental results show that the model proposed in this study has a good predictive ability, and the predictive accuracy of the model is significantly improved.
The contributions of this study to existing research are as follows: (1) The calculation method of natural gas load volatility is more explicit, which distinguishes itself from the previous concept of volatility in the field of financial assets. In addition, a specific statistical description and analysis of natural gas load volatility are carried out, which is different from the previous volatility analysis of prices and stocks. (2) A hybrid deep learning model for natural gas load volatility forecasting is established based on the GARCH family and XGBoost-LSTM, which can effectively capture the nonlinearity and diversity of natural gas load volatility and its influencing factors.
(3) In the previous literature, there is still a lack of research on natural gas load volatility prediction. To provide certain empirical evidence, we conducted an indepth evaluation of different GARCH family models, a combination of models of GARCH family, and deep learning algorithms. The results show that the appropriate GARCH family parameters can effectively improve prediction accuracy. The GFMF-XGBoost-LSTM model performs best.
The organizational structure of this study is as follows. The second section briefly summarizes the current research status of natural gas load volatility. The third section analyzes the data used in this study and explains the construction of the model. The fourth section mainly describes the experiment from the division and normalization of the data set, the experimental results, and analysis. The fifth section summarizes the research results and limitations of this study.

| RESEARCH STATUS
Volatility forecasts are widely used in the financial asset industry. To model and predict volatility, Engle proposed an autoregressive conditional heteroscedasticity model (ARCH) to describe the clustering and persistence of financial market volatility. 6 But it is difficult to reflect the characteristics of long-term memory and nonlinearity of actual data. In view of the shortcomings of the ARCH model, Bollerslev proposed a generalized autoregressive conditional heteroscedasticity model (GARCH). Although the GARCH model effectively improves the accuracy of volatility prediction, it still has certain limitations. For example, the GARCH model cannot describe the asymmetry of sequence fluctuations. 7 For this reason, many scholars have conducted extensive research on ARCH and GARCH models, and gradually formed GARCH family models. Related research shows that the GARCH family models have the advantages of strong explanatory and good forecasting effect in the prediction of financial asset volatility. Cifter A uses Markov transform generalized autoregressive conditional heteroscedasticity model (MS-GARCH) to predict electricity prices. The research results show that the MS-GARCH model has a more accurate forecasting effect than the standard GARCH model. 8 Cai and others established the HSMA-realized HAR GARCH model (Heteroskedastic single measure adaptive Realized HAR GARCH model), which effectively improves the model's goodness of fit, volatility and VaR prediction accuracy compared to the Realized HAR GARCH model. 9 Some of the above research results are mainly researched and improved on the basis of GARCH family models. However, with the development of economic and social informatization, people find that traditional financial time series methods are difficult to deal with large amounts of data and multiple features. As a result, intelligent prediction methods combined with machine learning or deep learning have developed rapidly in the field of volatility prediction. The network structure of deep learning is a deep neural network. These neural networks can tolerate erroneous data and find out nonlinear relationships among data. Compared with the econometric model, it does not have too many parameter predictions, so it can quickly and accurately learn the law from the data. 10 Kim et al. used a new hybrid LSTM model to predict stock price fluctuations. This model combines the LSTM model with various generalized autoregressive heteroscedasticity (GARCH) models to effectively improve the prediction performance. 11 Hu et al. combined the GARCH model with LSTM and ANN to propose a new hybrid method for predicting copper price fluctuations. Experimental results show that this method significantly improves the prediction performance of the model. 12 Su used eightparameter volatility forecasting models and eight combined volatility forecasting models to discuss whether neural network methods, leverage effects, and nonnormal return distribution settings can improve the volatility forecasting effect. 13 Vidal et al. added LSTM to CNN to predict the volatility of gold. The research results show that compared with GARCH and LSTM models, the hybrid model has a greater improvement. 14 In the field of electricity forecasting, load forecasting has also been carried out by researchers with the help of relevant models in the GARCH family. Ghasemi-Marzbali A proposes a method for forecasting short-term electricity prices and loads that combines wavelet transform, least square support vector machines (LSSVM), generalized autoregressive conditional heteroscedasticity (GARCH), and the virus colony search algorithm (VCS). 15 Zolfaghari et al. proposed a hybrid forecasting method combining wavelet neural network (WN), ARIMA-GARCH family models for electricity forecasting. 16 Bikcora et al. proposed a model for forecasting the probability distribution of daily electricity demand based on the ARMA-GARCH, CAViaR, and CARE models. 17 These models and methods provide some reference for the prediction of natural gas load volatility.
The fluctuation of the natural gas load is similar to that of stocks and prices, but there are also inconsistencies. On the one hand, the change of natural gas load is also closely related to weather and many other factors. The forecast of load volatility is easily affected by many nonlinear factors. Therefore, the standard GARCH family models are difficult to reasonably predict the volatility of natural gas load. On the other hand, too many influencing factors will affect the effect of prediction, and appropriate influencing factors can effectively improve the prediction accuracy of the model. Kang et al. combined dynamic modeling (EMD), complex network (CN), and XGBoost, and improved the accuracy of urban traffic travel time prediction through feature screening. 18 Zhang Y et al. used the XGBoost model to screen out key features to predict the concentration of PM2.5 pollutants in Beijing in the next 24 h. 19 In addition, natural gas loads are susceptible to the influence of pipeline materials, and different materials may cause certain fluctuations in natural gas loads when external influences vary. 20,21 Bajcar et al. studied the effects of traffic-induced vibrations on natural gas pipelines. Dynamic stresses from traffic vibrations affect the loading of natural gas pipeline materials and may have some impact on natural gas loads and pose some safety threat to the surrounding environment. 22 Liang et al. investigated the mechanical behavior of common polyethylene natural gas pipelines when ground is overload. Their study shows that the point of maximum stress shifts from the top to the middle of the pipeline as the ground load increases. 23 It can be easily found that the stresses to which the transport pipeline is subjected can have an effect on the natural gas load and influence the fluctuation of the load. At present, there are few studies related to natural gas load volatility prediction, but some similar studies have also been done, such as volatility analysis and prediction in the wind power industry. For example, Shen et al. examined the volatility prediction ability of different GARCH family wind power models. The results show that the use of the Markov state switching GARCH model to predict wind power volatility is significantly better than the traditional GARCH model. 5 Agrawal and others first established an autoregressive model of high-frequency data on wind farms and calculated the volatility measurement of 5-min data through this model. 24 Chen made systematic research on wind power forecasts based on the volatility model. 25 The mature application of wind power industry volatility forecasting provides an effective reference for natural gas load volatility forecasting.
From the above literature, we can find: (1) The parameters of the GARCH family of models have a strong explanatory power for volatility analysis and volatility modeling.
(2) Compared with the single GARCH family model, the performance of the hybrid model composed of the GARCH family and the neural network was improved.
(3) Deep neural networks such as LSTM have the ability to learn complex nonlinear relationships from data sets. (4) The prediction of natural gas load volatility has strong practical significance, but there is a lack of relevant research. (5) Appropriate input features improve the prediction accuracy of the model, while overfull input features will have a negative impact on the prediction. XGBoost can effectively screen important features.

| Data source and preprocessing
In previous studies, the researchers directly forecast the natural gas load, ignoring the fluctuation of the natural gas load, which makes it difficult to reasonably estimate natural gas load intervals. In the study of this paper, the data are taken from the real load values of a metering branch under a natural gas station in Xi'an City. The natural gas station, located in Weiyang District, Xi'an, is an important node of the entire natural gas pipeline network in Xi'an. Weiyang District is one of the central urban areas of Xi'an, with a dense population and developed industry. Therefore, the natural gas station here is complicated, with one intake pipeline and three output pipelines. This station is one of the relay stations for natural gas supply in Xi'an, so the load data of a metering branch under this station is selected in this study. This metering branch provides mainly urban gas and is therefore representative and universal in terms of urban gas forecasting. The subject of this study is natural gas hourly load volatility, and the load values of a metering branch are used as the sample for the study, with a sample interval of approximately 20 s (366,920 sample points in total) from May 24 to August 17, 2021. This is shown in Figure 1, as can be seen herein, the real load data variation of natural gas has a certain cyclical variation pattern, but at the same time the variation fluctuates sharply. The transition between two peaks and trough load is not very gentle, but almost abruptly increased or decreased. From this load data, it is possible to calculate the gas hourly load fluctuation rate. In addition, the parameters of the GARCH family model were considered in addition to the construction of the feature data set. It was also considered that the load influencing factors of natural gas include meteorological factors such as temperature, weather, season, and wind direction, 26-28 so we added these meteorological factors to the feature set of natural gas load volatility for the experiment. Therefore, the feature data set includes the main parameters of the GARCH family of models, weather, temperature, wind direction, season, and so on.
There are still some abnormal data in the unprocessed original data set. This is because errors in the collection and recording of these high-frequency data will result in missing or abnormal data. The use of unprocessed data may affect the accuracy of prediction, so the original data needs to be filtered and processed. This study uses the outlier detection method and the horizontal processing method to correct the abnormal data. The corrected data becomes relatively smooth and closer to the true value of the load, as shown in Figure 2 (special reference: the amount of data is huge, so only load values for 100 consecutive time points are shown).

| Definition of natural gas load volatility
The word "volatility" comes from the financial asset industry. In a narrow sense, volatility refers to the standard deviation of continuous compound interest returns within a certain period of time. In a broad sense, volatility refers to the degree of unpredictable changes in variables over time. 29 The volatility in finance is a measure of the uncertainty of the return on assets and reflects the risk level of the return on assets. The natural gas load volatility reasonably reflects the changing law of natural gas load. In probability statistics, the average F I G U R E 1 Natural gas load data F I G U R E 2 Abnormal data processing value and standard deviation can reflect the most basic characteristics of variables, in which the average value can reflect the degree of concentration of the variable, and the standard deviation can reflect the degree of dispersion of the variable. 30 The ratio of the standard deviation to the average value indicates the relative size of the dispersion of the variable. This ratio is called the coefficient of variation, which can well reflect the law of variation of the variable. Therefore, this study defines the natural gas load volatility based on the coefficient of variation, that is, the f (natural gas load volatility) is the ratio of s (standard deviation of the load) to x (the average value of the load), where the standard deviation and the average value are calculated based on the number of samples(n). The formula is as follows: The calculation of the volatility of natural gas hourly load is to divide the standard deviation of the load by the average value of the load in a unit hour. Thereinto, the average hourly load is the arithmetic average within the hour, as shown in Figure 3. The standard deviation of the hourly load is the standard deviation based on a given sample within the hour, as shown in Figure 4.

| Descriptive statistics
After data processing, it is necessary to further analyze the data characteristics of natural gas hourly load volatility, which can reflect the basic law of natural gas load volatility. The hourly load volatility from 00:00 on May 24, 2021, to 23:00 on August 17, 2021, is plotted as a time series diagram, as shown in Figure 5. From 7:00 to 9:00 and from 22:00 to 0:00 every day, the volatility is large and changes strongly, and the volatility is small and changes stably in other time periods.
As shown in Figure 6 and Table 1, the smaller the standard deviation, the weaker the dispersion of the sequence. The standard deviation of the sequence is 0.069631, indicating that the dispersion of the sequence is weaker. The skewness of the sequence is 3.944191, which is greater than 0, indicating that the sample has a right-skewed distribution. The kurtosis of the sequence is 20.64961, which is greater than the peak value 3 of the normal distribution, indicating that the sequence has a tall and thin distribution with the characteristics of sharp peaks and thick tails. The JB statistic is 32141.22, and the p-value is 0, so the sequence does not obey the normal distribution. In addition, through the distribution characteristics and changing laws of the above chart, we can see that the sequence distribution may be asymmetry, and there may be a certain phenomenon of volatility aggregation, which needs to be further verified. In summary, the hourly load volatility has the characteristics of right skewness, peak and thick tail, non-normality, and asymmetry.

| Stationarity test
In the analysis of time series data, the stationarity test is indispensable, which can avoid false regression problems. The commonly used stationarity test method (ADF test) is used here, in which the SIC criterion is selected as the test standard, and the ADF test results are shown in Table 2. Compared with the absolute value distribution of 1%, 5%, and 10%, the absolute value of the T-value is greater than the above three levels. It can be concluded that the original hypothesis is rejected at the above three levels, and there is no unit root in the sequence. And the test p-value of the load rate series is 0, indicating that the original hypothesis is completely rejected. Therefore, it is a stationary series.

| Autocorrelation test
, if there is a correlation among the various period values of the random error term, the covariance u u t s t s n cov( , ) 0( , , = 1, 2, …, ) t s   , then there is autocorrelation between the random error terms. As shown in Table 3, the AC (autocorrelation coefficient) value and PAC (partial autocorrelation coefficient) value of this sequence are not significantly different from zero value when the significance level equals 5%. The p-values of Q statistics are all less than 5%. It can be seen that the null hypothesis is rejected at the 5% significance level, where the serial autocorrelation is considered. It shows that the hourly load volatility of natural gas has a long memory.

| ARCH effect test
If the variance of the random error term u t in the model , then the random error term u t has heteroscedasticity. The ARCH process is σ a a σ a σ a σ υ Where p is the order of the ARCH process and υ t is the random error term.
After the stationarity test and autocorrelation test are performed on the sequence, the residual sequence obtained by the model estimation also needs to be an ARCH test. 31 If the residual sequence passes the ARCH effect test, indicating that the residual sequence has an ARCH (autoregressive conditional heteroskedasticity) effect, and GARCH can be used to model and analyze the natural gas load volatility. On the contrary, it means that the sequence does not meet the GARCH modeling conditions.
(1) Graphical test of ARCH effect test: Natural logarithm processing is performed on the sample sequence sp { } t , that is, the sequence sp {ln( )} t is estimated as the dependent variable, so the basic form of the estimation is: Estimated by the ordinary least squares, the results are as follows: By drawing the residual sequence of the average value equation regression, it can be judged whether there is an ARCH effect according to the volatility cluster phenomenon presented in the time series diagram, as shown in Figure 7. Through the residual time series diagram, we can observe the clustering phenomenon of fluctuations: it is smaller in a period of time (e.g., the 625th hour to the 800th hour), and larger in other periods of time (e.g., the 1200th hour to the 1300th hour), which shows that the error term may have conditional heteroscedasticity. (2) ARCH LM test: The test of ARCH LM mainly applies F-statistic and chi-square statistics to judge whether there is an ARCH effect or not. 32 As shown in Table 4, the probability p-values of the F-statistic and the chi-square statistic are both less than 0.05, rejecting the null hypothesis, and the sequence has an ARCH effect. Therefore, the GARCH model can better fit the fluctuation sequence.

| Construction of hybrid forecasting model
These complex statistical characteristics mean that the natural gas load volatility is difficult to predict, and it is difficult for a single statistical method or machine learning algorithm to reasonably explain the characteristics of natural gas volatility. Through the above analysis, we know that the GARCH family models can explain the characteristics of natural gas load fluctuations well. Deep learning algorithms such as LSTM have the advantages of stronger learning ability, adaptability, portability, and so on, which not only conform to the long-term memory characteristics of natural gas load volatility but also can quickly and accurately predict natural gas volatility. Therefore, the combination of

| Analysis of predictability of load volatility
The change of natural gas load does not follow some strict rules and it is often affected by a variety of factors. Therefore, the change rule of the natural gas load is uncertain and nonlinear. From the characteristics of volatility and the load characteristics of natural gas, the characteristics of natural gas load fluctuations can be summarized: (1) Time series: Time series refers to the trend of one or some random variables with time. The natural gas load is taken as a random variable and the variance of this variable changes with time. The fluctuation of natural gas load conforms to the change form of the time series, they are a long-term trend, seasonal change, cyclical change, and irregular change.
(2) Long-term memory: The return on financial assets is similar to long-term memory, and the volatility of natural gas load also has long-term memory. According to the results of the autocorrelation test, it can be found that the volatility of natural gas load has a strong autocorrelation, indicating that historical information will affect the future volatility for a long time, and there is a correlation between the previous volatility and latter volatility.
(3) Volatility aggregation Volatility aggregation means a state that is from one unit of time to the next. Generally speaking, large fluctuations are often followed by large fluctuations, and small fluctuations are followed by smaller fluctuations. According to the above graph test of the ARCH effect, it can be found that there is fluctuation aggregation in the volatility of natural gas load.
(4) Asymmetry The volatility of time series mostly has an asymmetry, that is, the impact of positive information and negative information has a different degree of influence on the fluctuation of future load. In the financial industry, facing the impact of information of the same intensity, negative information usually has a greater impact on asset price fluctuations than positive information. Through the above descriptive statistics of natural gas load volatility, it can be seen initially that there is also an asymmetry in natural gas load fluctuation. In the following study, asymmetry modeling will be used to further reflect the asymmetry of natural gas load fluctuation.
(5) Non-normality When the kurtosis value of the sequence is greater than 3, it indicates that the sequence has the characteristic of sharp peaks and thick tails, which do not conform to the normal distribution. According to the above descriptive statistics, it can be judged that the natural gas load volatility conforms to a non-normal distribution.

| GARCH family models and parameter estimation
(1) GARCH model: When the ARCH model describes the dynamic change process of sequence volatility, many parameters need to be estimated, which leads to a more complicated and changeable estimation of the ARCH model. 33 The GARCH model has one more conditional variance term (σ2) than the ARCH model, so the complexity of parameter estimation is reduced, the model is more long-memory, and the time-lag structure is more reasonable and flexible. The GARCH model used in general empirical testing is GARCH (1, 1), so this study mainly applies the GARCH (1, 1) model. Equation (6) is the average value equation and Equation (7) is the variance equation.
Thereinto, in the average value equation, is the vector of explanatory variables, and γ γ γ γ = ( , , …, )′ k 1 2 is the vector of coefficients. In the variance equation, ω is a constant term, u t−1 2 is an ARCH term, and σ t−1 2 is a GARCH term. Estimate the average value equation and variance equation through the GARCH (1, 1) model, and the results are as follows: (2) TARCH model: The TARCH model is an asymmetric GARCH model, which refers to the use of dummy variables to set a threshold to distinguish the influence of positive and negative shocks on the law of fluctuation. 34 The commonly used TARCH model is TARCH (1, 1), which counts a dummy variable on the basis of GARCH (1, 1), and meets the following conditions: Then the variance equation of the TARCH model is shown in Equation (11).
Thereinto, d t−1 is a dummy variable. When u < 0 , indicating that the asymmetry effect occurs, and bad news will bring α γ ( + ) times the impact, otherwise d = 0 t−1 , indicating that the asymmetry effect does not appear, and the good news brings α times the impact. When γ 0  , it indicates that there is a leverage effect.
When γ > 0, it shows that the main function of the asymmetric effect is to increase the fluctuation. When γ < 0, it shows that the asymmetric effect is to reduce the fluctuation. The average value equation and variance equation are estimated by the TARCH (1, 1) model, and the results are as follows: (3) EGARCH model: The EGARCH model is also an asymmetric model, 35 but the difference is that the variance equation analyzes not σ t 2 but σ ln t 2 . When the estimated value of β σ ln t−1 2 in the variance equation is a positive number, it indicates that the sequence has a persistent phenomenon of fluctuations (cluster phenomenon). The variance equation of the EGARCH (1, 1) model is as follows: Thereinto, σ ln t 2 is the logarithm of the conditional variance, indicating that the leverage effect is exponential, so the predicted value of the conditional variance must be non-negative. When γ 0  , it indicates that there (4) PARCH model: The PARCH model is also an asymmetric model, 36 and its variance equation analyzes not σ t 2 but σ t h (h represents the power value). The variance equation of the PARCH (1, 1) model is as follows: Thereinto, h > 0, when i r (5) Parameter sliding estimation: The GARCH family models parameters used in this study include the main parameters of the GARCH model, TARCH model, EGARCH model, and PARCH model. As the parameters of the feature input, a fixed value is not used, but a sliding estimation method. 37 That is, the model parameters at time t are estimated from the volatility data from time t-21 to time t. The model parameters at time t + 1 are estimated from the volatility data from time t-20 to time t + 1, followed by sliding estimation. The sliding estimation process is shown in Figure 8, and some specific moments estimation parameters are shown in Table 5.

| LSTM
To solve the problem of gradient explosion and gradient disappearance, LSTM introduces a gate mechanism on the basis of RNN (recurrent neural network). 38 LSTM can effectively solve the problem of long-term dependence and reasonably control the increase and deletion of information in neurons. 39 The LSTM network structure is shown in Figure 9.
In the LSTM unit structure, cell (c t ) is a memory unit that is in charge of the transmission of information. The input gate (i t ) is in charge of the increase of information, the forget gate ( f t ) is in charge of the deletion of information, and the output gate (o t ) is in charge of the output of information. The complete training process of LSTM is that at each time t, the three gates receive the input vector x t at time t and the information of the hidden state h t−1 of the LSTM at time t − 1 and the memory unit c t . Then perform logical operations on the received information, and the logical activation function σ determines whether to activate it. Then the processing result of the input gate and the processing result of the forget gate is integrated to produce a new memory unit c t , and finally the final output result h t is obtained through the nonlinear operation of the output gate. The calculation equation of each process is as follows. 40 Thereinto, σ is generally a nonlinear activation function. W xi , W xf , W xo , W xc are the weight matrices of the nodes connected to the input vector at each layer. W hi , W hf , W ho , W hc are the weight matrices of each layer connected to the previous short-term state h t−1 . b i , b f , b o , b c are the nodes of each layer the bias term.

| XGBoost
XGBoost implements a general decision tree algorithm. Compared with the gradient lifting algorithm, the performance of the decision tree algorithm can be improved by 10 times. 41 All tree models of XGBoost are CART models, and the tree integration model is shown in Equation (25). 42 Thereinto, y î represents the predicted value of the ith sample. K represents the number of trees, F is the set of regression trees, fk represents k sub-models in F , and xi represents the input sample data. The objective function of XGBoost consists of two parts, a loss function and a regular term, as shown in Equations (26) and (27): Thereinto, L φ t ( ) represents the objective function of the t-th iteration, y i t ( − 1) represents the predicted value of the t-1 iteration.
fk Ω( ) represents the regular term of the model of the t-th iteration, fk Ω( ) plays a role in reducing overfitting. γ and λ are penalty coefficients. T is the number of leaf nodes, and ω is the score of leaf nodes. Perform a second-order Taylor approximation expansion of Equation (26), as shown in Equation (28) Thereinto, gi represents the first derivative of sample xi. hi represents the second derivative of sample xi. ωj represents the output value of the j-th leaf node, and Ij F I G U R E 8 Sliding estimation process of model parameters represents the sample subset of the value of the j-th leaf node.
It can be seen from Equation (28) that the objective function is a convex function. Taking the derivative of ωj and making the derivative function equal to zero value, the ωj that makes the objective function reach the minimum can be obtained, namely: The smaller the value of the objective function, the better the model. In the training process, the greedy algorithm is used to divide the subtree, enumerate the feasible division points, and continuously calculate the node loss to select the leaf nodes with large gains. The gain calculation is shown in Equation (31).

| Establishment of hybrid model
Through analyzing previous research, we can find that single GARCH family models often perform poorly when dealing with multiple influencing factors and nonlinear sequences. A single LSTM model has a poor interpretation of volatility, therefore the prediction accuracy of volatility will not be too high. The combination of GARCH family models and LSTM models can effectively solve the shortcomings in volatility forecasting. However, it is difficult to achieve better prediction results simply by combining GARCH family models with LSTM networks, because too many feature inputs often have a negative impact on model predictions. So XGBoost algorithm is applied to the study because XGBoost is equivalent to the "catalyst," which can effectively improve the prediction performance of the model. In this study, the sliding estimated GARCH family of model parameters and meteorological factors are used as influential features of the volatility. Then use the XGBoost algorithm to perform feature screening on these influencing factors, and input the screened important features into the LSTM network for prediction. Finally, the effectiveness of this model is verified by comparing it with a variety of single models and combined models. We call this model the GFMF-XGBoost-LSTM model and the model architecture is shown in Figure 10. The GARCH family models parameters can better explain the fluctuation characteristics of the load series. Thereinto, GARCH can better describe the long-term memory and volatility aggregation. TARCH, EGARCH, and PARCH can better characterize the asymmetry and non-normality of volatility. The sliding estimation parameters in these models are equivalent to extracting pre-training information for the deep neural network, effectively reducing model changes. XGBoost sorts the importance of features and can accurately screen out important features. Using these important features as input features can reduce the interference of irrelevant features and effectively improve the prediction accuracy of the model. The LSTM network has the ability to learn complex and nonlinear relationships from data sets, and can effectively process nonlinear, long-memory time series data.

| Model evaluation indicators
To measure the accuracy of the prediction model, this study uses mean square error (MSE) and root mean square error (RMSE), 43 and the index equations are shown in Equations (32)- (33). These two evaluation indicators are used because it is considered that in the neural network, the MSE can effectively converge even if a fixed learning rate is used, and the gradient of the MAE update is always the same, which is not conducive to the learning of the model. In view of this, the experiment mainly refers to the values of MSE and RMSE indicators.
Thereinto, y i  x is the predicted value, y i is the true value, and n is the number of test samples. RMSE is based on MSE, which is more intuitive in order of magnitude. The ranges of MSE and RMSE are both [0, +∞). Generally, the larger the value of MSE and RMSE, the greater the error and the lower the prediction accuracy of the model.

| Data set division and normalization
The data set needs to be divided before it can be input into the model for training, otherwise, the training results may be overfitted due to training on all the data. The experiment divides the data set into the training set, validation set, and test set. Thereinto, the training set data is 60%, the validation set data is 20%, and the test set data is 20%. The training set is used for model fitting data samples. The validation set is used to adjust the hyperparameters of the model and to initially evaluate the model's ability. After the optimal parameters of the model are found, the data is then tested on the test set so as to evaluate the generalization ability of the model. As the load volatility and the GARCH family models parameters have different meanings and dimensions, they will have an impact when they are input to the forecasting model. Therefore, this type of data needs to be normalized. The normalized data can reduce the training time of the model, accelerate the convergence speed of the model, and further improve the prediction accuracy of the model. The data normalization calculation equation is shown in Equation (34): Thereinto, x norm is the normalized value, x is the original data, x min is the minimum value in the original data, x max is the maximum value in the original data, and the normalized data size is constrained to be between [0, 1].

| The prediction results of A single model
The experiment inputs data into GARCH family models (GARCH, TARCH, EGARCH, and PARCH), commonly used deep learning models (LSTM, MLP, and CNN), and some machine learning models (ARIMA, MLR, and SVR) for training and prediction. The purpose is to compare the forecasting effects of different models on load volatility. As shown in Figure 11, it is the fitting curve of the predicted value and the true value of every single model. It can be seen that the fitting effect of the LSTM and MLP models is better. As shown in Table 6, it is the value of the evaluation index of every single model. Thereinto, the value of MSE (0.0044710) and RMSE (0.0668658) of LSTM is the smallest. Therefore, the overall performance of the LSTM model is better. However, these GARCH family models and machine learning models did not show good prediction performance. This may be due to the non-normality, asymmetry, and heteroscedasticity of natural gas load volatility, which makes it difficult to predict natural gas load volatility.

| The prediction results of the combined model
Through the above prediction, it is found that the LSTM model has a better fitting effect and higher prediction accuracy than the prediction results of a single GARCH family model and machine learning model. Furthermore, to improve the prediction accuracy of volatility, we rely on (1) GARCH family models can describe the characteristics of load fluctuations; (2) LSTM has a better prediction effect at the inflection point and surge point of volatility; (3) LSTM The neural network has a good explanation for the data. Through this experiment, we use the parameters of each model in the GARCH family as input features and input them into deep learning models such as LSTM. The prediction results are shown in Figure 12A-D. The evaluation results of the model are shown in Table 7.
Through the above prediction results and evaluation results, it can be found that compared with LSTM. The MSE and RMSE of GARCH-LSTM decreased by 6.48% F I G U R E 10 Model architecture and 3.29% relatively. The MSE and RMSE of TARCH-LSTM decreased by 0.75% and 0.38% relatively. The MSE and RMSE of PARCH-LSTM decreased by 0.48% and 0.24% relatively. Therefore, in general, the prediction accuracy of GARCH-LSTM, TARCH-LSTM, and PARCH-LSTM models has improved. The prediction accuracy of other models has not improved significantly, and the prediction accuracy of some combined models has decreased significantly. It shows that the description effects of different GARCH family models parameters are different.
To test whether the GARCH family of parameters, as well as meteorological factors, have a positive impact on the forecasts or not, here is the process we used. We take a total of 15 sliding estimation parameters from the GARCH model, TARCH model, EGARCH model, and PARCH model as the input features of the LSTM network. Second, the meteorological factors are taken as the input features of the LSTM network, and their values are mapped to the [0, 1] interval and transformed into quantitative data. The quantitative results of influencing factors are shown in Table 8. Finally, the 15 parameters of the GARCH family model and the meteorological factors were simultaneously used as input features for the LSTM. The prediction results are shown in Figure 13 and the evaluation results are shown in Table 9. From the prediction results and evaluation results, when GARCH family parameters are used as input features, it is clear that GARCH family-LSTM has higher prediction accuracy than GARCH family-MLP and GARCH family-CNN. When meteorological features are used as input features, MF-LSTM has higher prediction accuracy than MF-MLP and MF-CNN. When both GARCH family parameters and meteorological features are used as input features, GARCH family-MF-LSTM has higher prediction accuracy than GARCH family-MF-MLP and GARCH family-MF-CNN. However, comparing the prediction accuracy of these nine models with the above models, it was found that too many input features would instead reduce the prediction accuracy. This indicates that too many input features have a negative impact on prediction. All parameters or meteorological features of the GARCH family are used as input features, and the prediction accuracy is poor compared to that of a single GARCH family model, although the fit is better.  fitting effect and prediction accuracy are not improved. It can be concluded that too many parameters are used as input features, and the prediction accuracy is not significantly improved. What's more, it will lead to a decrease in the prediction effect of some models. Therefore, only appropriate parameters as input features can effectively improve the prediction accuracy.

| Prediction results based on the GFMF-XGBoost-LSTM model
To select appropriate parameters as input features, the XGBoost method is used here to filter features and compare them with other feature selection methods. First, perform XGBoost modeling on the training set containing all the features, use five-fold cross-validation to find the optimal parameters, and sort the features according to F score. Then filter the sorted feature set, use the F score value to evaluate whether the feature can be retained, and remove the feature with the lowest F score in turn; use the AUC (area under the curve) value of the verification set under the new feature subset to determine the prediction of the remaining feature Whether the result is better. In the process of feature selection, we need to weigh the number of features and the effect of improving the model. However, some features have a limited improvement effect on the model. Therefore, the features that have a greater impact on prediction are selected as much as possible in the experiment. We may wish to select features by setting a threshold h (the specific value of h is set according to the experimental results). If the AUC value of the validation set is increased and greater than h, the features that have just been deleted are retained. If the AUC value of the validation set is increased and greater than h, we keep the features that were just deleted. If the AUC value of the validation set is increased and less than h or decreased, we remove the deleted features. This algorithm can not only screen out the features that have a greater impact on the target variable but also reduce the redundancy between features. 44 The 22 parameters were filtered by XGboost and the top 10 parameters in terms of feature importance were finally selected as input features, as shown in Figure 14. In all trees, the number of times a feature is used to split a node is Weight, and the total gain of a feature at each node split is Total_gain.
The average gain value is calculated as follows: We sort the features according to their importance, the first tenth features are finally taken as input features, they are f0 (volatility), f3 (hours), f8 (EGARCH_ω), f11 (EGARCH_β), f10 (EGARCH_γ), f12 (GARCH_ω), f13 (EGARCH_α), f16 (PARCH_α), f21 (TARCH_α), f4 (Highest temperature). The optimal parameters of the LSTM network are finally determined through multiple training and verification of the training set. The main parameter settings are shown in Table 10. The prediction results are shown in Figure 15 and the evaluation results are shown in Table 11.
Note: The GFMF-XGBoost-LSTM model is that the GARCH family model parameters and meteorological factors are filtered by the XGBoost algorithm and fed into the LSTM network for prediction. The GFMF-RF-LSTM model is that the GARCH family model parameters and meteorological factors are filtered by the RF algorithm and fed into the LSTM network for prediction. The GFMF-GBDT-LSTM model is that the GARCH family model parameters and meteorological factors are filtered by the GBDT algorithm and fed into the LSTM network for prediction, and so on. According to the above prediction results and evaluation results, it can be found that appropriate input characteristics play a positive role in the prediction of the model. GFMF-XGBoost-LSTM model has higher prediction accuracy, less error with the real value, and a better fitting effect. And, to verify that XGBoost has better results in feature selection, the prediction results after XGBoost filtered features are compared with those after the random forest and GBDT filtering, and it can be found that the prediction results after XGBoost filtered features are more accurate, among which GFMF-XGBoost-LSTM has the highest prediction accuracy.
To confirm that the proposed model has obvious advantages over other comparison models, we compare the MSEs of the GFMF-XGBoost-LSTM model and several models with higher prediction accuracy in Tables 6, 7 Table 12. In short, the GFMF-XGBoost-LSTM model has an average reduction of 45.404% in MSE, which shows that the GFMF-XGBoost-LSTM model has higher prediction accuracy. In natural gas load forecasting, if we ignore the fluctuation and heteroskedasticity of the load, the standard error and confidence interval of the mean gas load will be too narrow where the prediction of natural gas load volatility can effectively solve this problem. The prediction of natural gas load volatility can not only make the interval prediction of natural gas load feasible but also determine the probability of future load mean in a certain interval. In view of the large fluctuation, weak regularity, and strong uncertainty of load volatility, in the next research step, we also calculate the confidence interval of hourly natural gas load volatility, so that the estimated volatility is within the predictable range as far as possible, so as to prepare for the interval prediction of natural gas load in the next research step. Figure 16 shows the fitting curve of the predicted value, volatility within the 90% confidence interval and real volatility of the GFMF-XGBoost-LSTM model at the time, in which the average value of volatility in the figure is 0.0200 and the 90% confidence interval of the average value is (0.0157, 0.0244). It should be noted that in the process of calculating the confidence interval, due to the huge volatility, the lower limit of the 90% confidence interval has a negative value, and we treat the negative value as a zero value. This is because when the natural gas load has extremely small fluctuations (i.e., the volatility tends to zero), it will not seriously affect the safety of natural gas production and operation. But it is worth noting that people should pay enough attention when the natural gas load fluctuates greatly (i.e., the volatility is obvious). The confidence interval of volatility can continue to carry out the interval prediction research of natural gas load, and it also provides more reliable decision support for natural gas production and operation.

| Result analysis
The prediction results of a single LSTM model are better than a GARCH family model, MLP, CNN, ARIMA, MLR, and SVR. The MSE of the LSTM model is 14.939% lower than the GARCH model, 14.955% lower than the TARCH model, 14.937% lower than the EGARCH model, 14.963% lower than the PARCH model, 13.570% lower than the MLP model, 15.796% lower than CNN model, 7.592% lower than ARIMA model, 6.848% lower than MLR model and 30.852% lower than SVR model. As shown in Table 6.
The parameters of each model in the GARCH family are input into the depth neural network as information features, and the prediction results are quite different. Thereinto, the parameters of GARCH family models have a positive effect on the prediction accuracy of LSTM, while the prediction accuracy of MLP and CNN models has been reduced. Compared with the MSE of the LSTM model, the GARCH-LSTM model decreased by 6.931%, the TARCH-LSTM model decreased by 0.759%, EGARCH-LSTM model increased by 17.245% and PARCH-LSTM model decreased by 0.485%. It can be seen that the GARCH-LSTM model has the best prediction performance among these combined models, while the EGARCH-LSTM model has poor performance, as shown in Table 7.
Taking all the parameters of GARCH family models