Improved Markov-chain-based ultra-short-term PV forecasting method for enhancing power system resilience

The awareness capability of output power for renewable resources is essential for enhancing the resilience of power systems. Photovoltaic (PV) forecasting is an essential technology for increasing the operation efficiency and controllable resources of power systems after extreme natural events. Conventional Markov chain (MC) methods often ignore the time characteristics and the actual distribution of the PV output power sequence when making PV forecasts. This article proposes improved MC methods based on equal quantity and clustering-based division. The methods consider the interval distributions of the PV output power time series and use an hour as the time interval. As a consequence, the predicted power at the next moment can be closer to the expectation of the output power distributions. The method is combined with a similar day algorithm to calculate the forecast result. Case studies were conducted with one-year operation data from a 25-MW PV station. The results indicate that the proposed methods can effectively improve the accuracy of prediction results compared with traditional methods.


INTRODUCTION
In recent years, extreme natural events have brought great technical challenges to power system operations. Enhancing the resilience of power systems can increase the ability of power systems to adapt to these challenges. The awareness and number of controllable resources are two key points for enhancing the resilience of power systems. Renewable energy resources provide an increasing amount of energy for power systems and play an important role in power system operations. The cost of developing and utilising renewable energy resources has been greatly reduced, the scale is continuously increasing, and the economic efficiency has been significantly improved. With the development of power electronics and energy storage technologies, renewable energy resources can be regarded as a partly controllable resource. In this way, renewable energy resources, including solar and wind energy, can be dispatched to facilitate the restoration process after extreme natural events so that the resilience of the system can be enhanced.
However, there are still technical obstacles to the large-scale integration of renewable energy power generation. The output of renewable energy has strong randomness and fluctuations. As far as photovoltaics (PVs) are concerned, time, weather, and clouds cause fluctuations in output power. In addition to the following load changes, conventional power supplies must consider renewable energy fluctuations. With the gradual expansion of the scale of renewable energy use, high requirements are placed on the regulated power supply, and a large number of flexible peak-shaving energy storage and pumped storage power stations are required if the power of the new energy cannot be accurately predicted. Hence, developing more effective PV power generation forecasting technology can increase the awareness capability of the power system and significantly improve the flexibility and resilience of power systems. Moreover, the forecasting technology of PV systems can build a stronger power grid that adapts to the widespread integration of renewable energy resources and promote the implementation of national energy transition policies [1].
The proportion of renewable energy in the power grid has increased due to the innovation of development technology and the reduction of cost. For a grid with large-scale access to renewable energy, extreme weather can have a significant impact on the grid's power generation capacity. The four basic stages in which a resilient power system copes with extreme events are as follows: preparation in advance, resistance and absorption, response and adaptation, and rapid recovery.
The power network simulation model is built by predicting the power generation output and grid load. In [2], the simulation results are used to analyse the power failure probability and its impact on the energy market, and a risk-based analysis framework is built to quantify the vulnerability of power grid operation. The load recovery after the risk is accelerated through topology control mitigation solutions.
The appropriate reserve capacity for grid operation is determined through forecasting to protect against the risk of high-impact and low-probability events. The innovation of power generation and load forecasting methods plays an extremely important role in improving the resilience of the power grid under the condition of high renewable energy source (RES) penetration [3].
Currently, a simple way to implement a PV forecast system in practical engineering projects is to build a simple neural network model by collecting data such as numerical weather forecast results, real-time weather station data, real-time output power data, and inverter unit status. The average system accuracy for such a typical forecast system is approximately 85% [4]. With the increase in RES permeability in the power grid, this accuracy will not meet the actual demand. To improve the prediction accuracy, many related studies have been carried out for the prediction of PV power generation.
In a previous study [5], a physical reference model of solar transmittance that does not consider the carbon dioxide transmittance was built to predict the solar irradiance data of four stations in India. Solar radiation models were proposed, improved, and applied based on empirical formulas in various places [6][7][8][9][10]. They have good engineering and experimental applicability and are of great significance to regions lacking solar radiation observations. However, the above methods of establishing physical models often put high demands on the quantity and quality of actual parameters.
In recent years, machine-learning-related methods have emerged and have been widely used in the field of PV prediction. Research includes artificial neural networks (ANNs), k-nearest neighbours (KNN), deep neural networks, support vector machines, and the combination of these methods [11][12][13][14][15][16][17][18][19]. These methods avoid the process of building complex physical models, and the prediction has high accuracy in the case of abundant data.
However, in reality, many areas do not have a large amount of accurate weather information to support the training of the above methods. In the case used in Section 4, the time resolution of the meteorological observation data in this area is only 1 h, and the time resolution of the PV output data can reach 5 min. The prediction effect of neural network construction with meteorological characteristics is not ideal. In this case, the time-series method has relative advantages. Because its model is simple and easy to construct, it is necessary to study this type of method.
In previous work [20], multiple state transition matrices were generated according to seasonal, weather, and daily characteristics, and the fluctuation characteristics and the amount of fluctuation were considered to establish a Markov chain model. In other research [21], the Grey-Markov (GM) chain was adopted, using two GM (1,1) models to predict monotonic data, and a Markov chain model was established for the relative error of the prediction. In other work [22], KNN and AdaBoost were used to improve the clustering algorithm to divide samples, and each weather submodel established a Markov chain model with the attenuation coefficient as the input and solar irradiance as the output; then, a PV engineering formula was used to calculate the output power value.
The difference between the above several Markov methods lies in the definition of the model input and output, which can be applied to different problems or combined with other algorithms. To improve the prediction effect, the Markov chain method itself can be improved in terms of interval division.
However, the power of PV power generation has a strong correlation with weather conditions. In other words, the power curves of PV power generation are quite similar under similar weather conditions. Therefore, the sample clustering method is important for PV forecasting. The method of self-organising mapping has been used [23,24] to cluster the weather conditions into different types (cloudy, sunny, rainy etc.) according to the solar radiation, relative humidity, temperature, and other meteorological conditions. In other work [25][26][27], the k-means clustering algorithm was used to divide the entire sample based on the meteorological parameters calculated by the correlation. In addition to the commonly used parameters of temperature and irradiance, in another study [28], solar time and geomagnetic declination were considered, and the output of similar dates was mixed with simple mathematical processing and model mixing as the prediction results, which also yielded good results. Other researchers simply distinguished the weather according to the average value of irradiance for all data, and then they established an ANN model [29].
Based on the combination of the similar day algorithm and the traditional Markov chain, two major categories for the division of intervals are proposed: overall sample division and hourly sub-model division. Each of them can adopt equal value, equal data volume, and clustering methods. An actual case of a PV power station in Shandong Province illustrates that the improved method reasonably and effectively improves the accuracy of the forecast.
The major contributions of this paper are listed as follows.
1. A data processing model based on the similar day algorithm is established. The traditional weather classification divides the historical weather into three or four types (sunny, cloudy, rainy etc.) by a clustering method. The method proposed in this work can reflect the similarity of weather characteristics and their changes over the time periods of the day.
2. Several improved methods based on the traditional Markov method are proposed. Several clustering methods are used in the state interval division step to improve the effective utilisation of data and to reflect the data distribution characteristics.
3. On the basis of the above methods, several small sub-models are built in order to accurately reflect the power distribution and variation law in different time periods. The prediction results of the case show that (a) the advantages of the improvement are demonstrated by comparison with the conventional Markov method; and (b) compared with long short-term memory (LSTM), one of the most commonly used methods for time-series prediction, the improved method is effective under the condition of limited feature data and insufficient historical samples.
The rest of this article is organised as follows. Section 2 introduces the basic principles of the similar day algorithms and improved Markov chain. On this basis, the complete forecasting process is introduced in Section 3. In Section 4, actual cases and results are used to verify the effectiveness of the forecast. Finally, conclusions are presented in Section 5.

Similar day algorithm
The similar day algorithm clusters the samples, which can effectively improve the prediction accuracy for specific weather conditions. The correlation analysis between tags and features should be completed first. Features that are unrelated to the output power or only weakly correlated with it should be screened out to reduce the dimensionality of the system features and the system complexity. Clustering of samples can help reduce the occurrence of overfitting and improve the calculation speed of the prediction system.
The Pearson correlation coefficient is a common measure of the degree of linear correlation between the trends of two variable series. The principle of this method is simple, and it is easy to calculate. The main steps are as follows. By analysing the data sequence, the target variable $X_0$ to be predicted is obtained, and all the feature value sequences $X_1$ that may affect the prediction result are determined. The covariance and the standard deviations of the two sequences are found, and the Pearson correlation coefficient is calculated as
$$r = \frac{\operatorname{cov}(X_0, X_1)}{\sigma_{X_0}\,\sigma_{X_1}}.$$
According to the absolute value of the calculated coefficient, the strength of the correlation is read from the correspondence between the Pearson coefficient and the correlation shown in Table 1. The features to be retained are selected and used to determine the similar dates. After all sample data have been normalised, the standardised Euclidean distance between the forecast day and the sample day at the corresponding time point of each day is calculated. The sample set is given as $X = \{X_1, X_2, \ldots, X_n\}$, and the forecast day feature is $X_0$. Then,
$$d(X_0, X_i) = \sqrt{\sum_{k} \omega_k \left(\frac{x_{0k} - x_{ik}}{s_k}\right)^{2}},$$
where $S$ is the standard deviation of the sample set $X$ (with component $s_k$ for feature $k$), and $\omega$ is the weight vector of each feature. The distances are ranked from small to large; the sample days with the smallest distances have the weather conditions most similar to the day to be tested.
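The two calculations described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function names are hypothetical, and placing the weight inside the sum of the standardised Euclidean distance is an assumption about the exact form of the formula.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def weighted_std_distance(x0, xi, s, w):
    """Weighted standardised Euclidean distance between two feature vectors.

    s holds the per-feature standard deviations, w the per-feature weights
    (assumed here to multiply each squared, standardised difference).
    """
    return math.sqrt(sum(wk * ((a - b) / sk) ** 2
                         for a, b, sk, wk in zip(x0, xi, s, w)))

def rank_similar_days(forecast, samples, s, w):
    """Return sample indices ordered from most to least similar."""
    d = [weighted_std_distance(forecast, x, s, w) for x in samples]
    return sorted(range(len(samples)), key=lambda i: d[i])

# Hypothetical two-feature example: three historical days, one forecast day.
order = rank_similar_days([0.0, 0.0],
                          [[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]],
                          s=[1.0, 1.0], w=[1.0, 1.0])
```

Taking the first entries of `order` gives the candidate similar days; how many to keep is a separate tuning choice.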

Markov chain
The Markov model was proposed by the Russian mathematician Markov in 1906. It is an important method for dealing with discrete random processes. When the state of the random process at a future moment is related only to its current state and not to its states at previous moments, this property is called "no aftereffect". Long-term research and experimentation have shown that the Markov model performs well in short-term prediction. It has been widely used in communication, meteorology, geography, economics, and many other fields.
Assuming a random process $\{X_t, t \in T\}$, the set of possible values of the random variables $X_t$ is called the "state space", $S = \{S_1, S_2, \ldots, S_n\}$, where $n$ is the total number of states into which the state space is divided, and the Markov process satisfies the following formula:
$$P\{X_t = S_j \mid X_{t-1} = S_i, X_{t-2} = S_{i_{t-2}}, \ldots, X_0 = S_{i_0}\} = P\{X_t = S_j \mid X_{t-1} = S_i\} = P_{ij}(t-1).$$
This indicates the probability of the system transitioning from state $i$ at time $t-1$ to state $j$. Moreover,
$$P_{ij} \ge 0, \qquad \sum_{j=1}^{n} P_{ij} = 1.$$
When $P_{ij}$ is irrespective of the time $t$, the sequence is called a "homogeneous Markov chain". From the state transition probability, the state transition matrix is obtained as
$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{bmatrix}.$$
It is supposed that the probability that the system is in state $S_i$ at time $t$ is $p_i(t)$, and the probabilities should meet
$$\sum_{i=1}^{n} p_i(t) = 1.$$
The probability when $t = 0$ is called the initial probability, and the corresponding initial probability vector is
$$p(0) = [p_1(0), p_2(0), \ldots, p_n(0)].$$
The probability vector at time $t$ is
$$p(t) = p(t-1)\,P = p(0)\,P^{t}.$$
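As a minimal sketch of how a transition matrix is typically estimated from data, the snippet below counts observed transitions in a sequence of state indices and normalises each row, then propagates a row probability vector one step. The function names and the toy sequence are hypothetical, and leaving unvisited rows as zeros is an assumption rather than something the text prescribes.

```python
def transition_matrix(states, n):
    """Estimate the transition frequency and probability matrices.

    states is a sequence of state indices in range(n); P[i][j] is the
    observed share of transitions leaving state i that arrive in state j.
    """
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        total = sum(row)
        # Rows with no observed departures are left as all zeros.
        P.append([c / total if total else 0.0 for c in row])
    return counts, P

def step(p, P):
    """Propagate a row probability vector one step: p(t+1) = p(t) P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

counts, P = transition_matrix([0, 1, 1, 0, 1], n=2)
p_next = step([0.0, 1.0], P)
```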

Improved Markov chain
In Markov chain modelling methods, the state interval division process mainly adopts the traditional equivalent method [30][31][32]. This state interval division method depends on experience and experimentation. Such a classification of samples may leave some intervals with very few data points or provide too few statistics for periods in which the power rises rapidly, so it may not reflect the characteristics of the time-series data.
Therefore, the clustering method is proposed for the division of Markov state space, and it is proved effective through case studies in Section 4.
In the following, three methods of dividing state intervals are introduced.

Equivalent division
Equivalent division is the most common and basic division method. According to the maximum and minimum values of historical data $x_{\max}$ and $x_{\min}$, the interval width is
$$\Delta = \frac{x_{\max} - x_{\min} + \varepsilon}{n},$$
where $\varepsilon$ is a small reserved amount for the convenience of statistics and calculations. For a PV power plant with a rated capacity of $p_N$, the above formula is written as
$$\Delta = \frac{p_N + \varepsilon}{n}.$$
Then, the $n$ state intervals are divided into
$$S_i = [(i-1)\Delta,\; i\Delta), \qquad i = 1, 2, \ldots, n.$$
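A short sketch of equal-width division follows. It assumes half-open intervals and clamps values that fall on or beyond the last edge into the top state; the helper names and the 25-MW, five-state example are illustrative only.

```python
import bisect

def equal_width_edges(x_min, x_max, n, eps=1e-6):
    """Return the n + 1 edges of n equal-width state intervals."""
    width = (x_max - x_min + eps) / n
    return [x_min + i * width for i in range(n + 1)]

def state_of(x, edges):
    """Index of the half-open interval [edges[i], edges[i+1]) containing x."""
    n = len(edges) - 1
    k = bisect.bisect_right(edges, x) - 1
    # Clamp values on or outside the outer edges into the first/last state.
    return min(max(k, 0), n - 1)

# Example: a plant with an assumed 25-MW rating split into five states.
edges = equal_width_edges(0.0, 25.0, 5, eps=0.0)
```

With these edges, `state_of(12.3, edges)` falls in the third interval (index 2).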

Equal quantity division
In order to accurately reflect the power distribution and variation law, it is necessary to ensure that there is sufficient data in each divided interval. Otherwise, accidental factors and measurement errors may increase the impact on the results. At the same time, the results cannot be calculated if the matrix is sparse. Therefore, all historical data are sorted by power value.
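The samples after sorting are as follows:
$$x_1 \le x_2 \le \cdots \le x_N,$$
where $N$ is the total number of samples. Then, the $n$ state intervals are divided so that each contains approximately $N/n$ consecutive sorted samples:
$$S_i = \left[x_{\lfloor (i-1)N/n \rfloor + 1},\; x_{\lfloor iN/n \rfloor}\right], \qquad i = 1, 2, \ldots, n.$$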
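The idea of equal quantity division can be illustrated as follows. This is a sketch under the assumption that the interior edges are taken directly from the sorted samples at evenly spaced ranks; the function names are hypothetical, and heavily repeated values (for example, many identical low-power readings) would make the interval sizes unequal.

```python
import bisect

def equal_quantity_edges(samples, n):
    """Edges chosen so that each interval holds about len(samples)/n points."""
    xs = sorted(samples)
    total = len(xs)
    # Interior edges are the sorted values at ranks i * total / n.
    inner = [xs[(i * total) // n] for i in range(1, n)]
    return [xs[0]] + inner + [xs[-1]]

def state_of(x, edges):
    """State index of x: the number of interior edges not exceeding x."""
    return bisect.bisect_right(edges[1:-1], x)

# Toy data: twelve distinct values split into three states.
edges = equal_quantity_edges(range(1, 13), 3)
counts = [0, 0, 0]
for x in range(1, 13):
    counts[state_of(x, edges)] += 1
```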

One-dimensional k-means clustering division
Another division method is based on the one-dimensional k-means clustering method, which finds the optimal cluster centres for the entire sample set.
Step 1: The number of cluster centres $n$, which is the number of divided state intervals, is set.
Step 2: Initial values of the $n$ cluster centres $c_1, \ldots, c_n$ are selected.
Step 3: The Euclidean distance between each point in the sample and each cluster centre is calculated. The smallest of these distances is taken, and the point is assigned to the group of the corresponding cluster centre. Thus, all samples are grouped by cluster centre into $G_1, \ldots, G_n$. The amount of sample data in each group is $N_i$, $i = 1, 2, \ldots, n$. The sum of all minimum distances is defined as the loss function.
Step 4: The cluster centres are recalculated as
$$c_i = \frac{1}{N_i} \sum_{x \in G_i} x, \qquad i = 1, 2, \ldots, n.$$
Step 5: The maximum number of iterations and the convergence threshold of the objective loss function are set, and the above process is repeated until the maximum number of iterations is reached or the difference in the objective loss function between two successive iterations is less than the given value; that is, the process is stopped when the loss function tends to converge. The final cluster centres are recorded, and the grouping of the state space is completed.
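The iteration described in these steps can be sketched as below. The initialisation (centres spread evenly over the sorted samples) is an assumption made for the example, since the text does not say how the initial centres are chosen; the function name and the toy data are also illustrative.

```python
def kmeans_1d(samples, n, max_iter=100, tol=1e-9):
    """One-dimensional k-means; returns the cluster centres and the groups."""
    xs = sorted(samples)
    # Assumed initialisation: centres spread evenly over the sorted samples.
    centres = [xs[(2 * i + 1) * len(xs) // (2 * n)] for i in range(n)]
    prev_loss = float("inf")
    for _ in range(max_iter):
        groups = [[] for _ in range(n)]
        loss = 0.0
        for x in xs:
            # Assign each point to its nearest centre and accumulate the loss.
            k = min(range(n), key=lambda i: abs(x - centres[i]))
            groups[k].append(x)
            loss += abs(x - centres[k])
        # Recompute each centre as the mean of its group (keep it if empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return centres, groups

centres, groups = kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5, 20.0, 21.0, 19.0], 3)
```

One simple way to turn the centres into state intervals is to place each boundary midway between adjacent centres, which is equivalent to nearest-centre assignment in one dimension.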
By redefining the state space of the Markov chain every hour, the predicted process can further reflect the time characteristics of PV power. In this way, at different time points, the PV power changes are clearer and more specific, reducing the existence of state intervals with sparse data.
Each of the above three methods can be applied either to the overall dataset or separately to the dataset of each hour.
"Divided according to the hourly sample" means that several sub-models are used for prediction; each is responsible for outputting the power prediction for a particular time period. For example, $M_8$ denotes the state transition matrix used for predicting the PV power transition between 8:00 and 9:00 a.m. The matrix is obtained from statistics on all the data whose time label is between 8:00 and 9:00 a.m.
In this article, they are referred to as follows: overall equivalent division (OEV), overall equal quantity division (OEQ), overall clustering division (OC), hourly equivalent division (HEV), hourly equal quantity division (HEQ), and hourly clustering division (HC). Among them, the OEV method is the conventional Markov prediction method. The advantages of the improvements are shown mainly by comparing the remaining methods with this conventional method.
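The hourly sub-model idea can be sketched as one transition matrix per hour of day. The record layout `(hour, state_now, state_next)` is a hypothetical convenience, and the sketch assumes the states have already been assigned with the state space of the corresponding hour.

```python
from collections import defaultdict

def hourly_transition_matrices(records, n):
    """Build one transition probability matrix per hour.

    records is an iterable of (hour, state_now, state_next) tuples, where
    the states are indices in range(n) for that hour's state space.
    """
    counts = defaultdict(lambda: [[0] * n for _ in range(n)])
    for hour, a, b in records:
        counts[hour][a][b] += 1
    models = {}
    for hour, c in counts.items():
        # Normalise each row; rows without observations stay at zero.
        models[hour] = [[v / sum(row) if sum(row) else 0.0 for v in row]
                        for row in c]
    return models

models = hourly_transition_matrices(
    [(8, 0, 1), (8, 0, 0), (8, 1, 1), (9, 1, 0)], n=2)
```

Here `models[8]` plays the role of the matrix for the 8:00 to 9:00 a.m. period.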

MAIN PROCESS
The conventional method uses historical samples to build a single Markov model for prediction. To reflect the characteristics of PV power under different time and weather characteristics, for improvement on this basis, each hour is modelled separately. For all samples of the same hour in the historical data of similar days, the state space is divided according to the proposed methods, and the state transition matrix is calculated at each time.
According to the weather conditions on the day to be predicted and the actual output at that moment, the state probability vector at the next moment is predicted, and the predicted power value is calculated. The specific process, shown in Figure 1, is as follows.
Step 1: According to the weather forecast data of the day to be tested and the principle of the similar day algorithm, the date serial number of the historical sample closest to the day to be tested is found. According to all the data of these dates, a set of statistical power sequences is established by hour.
Step 2: According to the power history data collection, several methods proposed in the improved Markov chain are used to divide the state intervals and determine the expected value of each interval.
Step 3: According to the divided interval, the state transition frequency matrix and the state transition probability matrix are calculated.
Step 4: This step is the Markov property test.
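Passing the Markov property test is the premise of using the Markov chain to solve the prediction problem. The test can be completed by using the $\chi^2$ test in the following equations:
$$P_{\cdot j} = \frac{\sum_{i=1}^{q} m_{ij}}{\sum_{i=1}^{q}\sum_{j=1}^{q} m_{ij}}, \qquad \chi^2 = 2\sum_{i=1}^{q}\sum_{j=1}^{q} m_{ij}\left|\ln\frac{P_{ij}}{P_{\cdot j}}\right|,$$
where $q$ is the number of states, $m_{ij}$ is the frequency corresponding to the transition from state $i$ to state $j$ in the time-series history sample, $P_{ij}$ is the corresponding transition probability, and $P_{\cdot j}$ is the marginal probability of state $j$. The statistic obeys the $\chi^2$ distribution with $(q-1)^2$ degrees of freedom. If, at the confidence level $\alpha$, it satisfies
$$\chi^2 \ge \chi^2_{\alpha}\big((q-1)^2\big), \tag{20}$$
then the sequence satisfies the Markov property test and can be processed by the Markov chain.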
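A sketch of this test is given below, assuming the commonly used statistic that compares each transition probability with the marginal probability of the destination state. The critical value is passed in by the caller (for instance from a χ² table); the function names are hypothetical, and skipping zero-frequency cells is an implementation choice.

```python
import math

def markov_chi2(counts):
    """Chi-square statistic for the Markov property of a frequency matrix.

    counts[i][j] is the number of observed transitions from state i to j.
    """
    q = len(counts)
    total = sum(sum(row) for row in counts)
    # Marginal probability of arriving in each state j.
    col = [sum(counts[i][j] for i in range(q)) / total for j in range(q)]
    stat = 0.0
    for i in range(q):
        row_total = sum(counts[i])
        for j in range(q):
            m = counts[i][j]
            if m == 0:
                continue  # zero-frequency cells contribute nothing
            p_ij = m / row_total
            stat += m * abs(math.log(p_ij / col[j]))
    return 2.0 * stat

def passes_markov_test(counts, critical_value):
    """True if the statistic reaches the supplied critical value."""
    return markov_chi2(counts) >= critical_value

# 3.841 is the 5% critical value of the chi-square distribution with
# one degree of freedom, i.e. (q - 1)^2 for q = 2 states.
persistent = passes_markov_test([[9, 1], [1, 9]], 3.841)
memoryless = passes_markov_test([[5, 5], [5, 5]], 3.841)
```

In the toy example, the strongly persistent sequence passes the test, while the sequence whose next state is independent of the current one does not.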
Step 5: The actual PV output at each moment is assigned to the corresponding state $i$, and the state probability vector at that moment then meets
$$p_i(t) = 1, \qquad p_k(t) = 0 \quad (k \ne i).$$
It is multiplied by the state transition matrix to find the state probability vector at the next moment. According to the expected value of each interval, the final prediction value of the next moment is obtained.
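Step 5 amounts to taking one row of the transition matrix and computing an expectation over the interval values. The sketch below makes this explicit; the matrix, the interval expectations (in MW), and the function name are made-up illustration values, not results from the case study.

```python
def predict_next_power(current_state, P, expectations):
    """One-step Markov forecast: expected power at the next moment."""
    n = len(P)
    # One-hot state probability vector for the observed current state.
    p_now = [1.0 if k == current_state else 0.0 for k in range(n)]
    # p(t+1) = p(t) P
    p_next = [sum(p_now[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Weight the expected value of each interval by its probability.
    return sum(pj * ej for pj, ej in zip(p_next, expectations))

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
forecast = predict_next_power(1, P, expectations=[2.0, 10.0, 20.0])
```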
Several indicators are used to evaluate the forecast results, including the root-mean-square error (RMSE) and the mean absolute percentage error (MAPE). These indicators are computed from the actual power, the predicted power, the average predicted power of all samples, and the total capacity $C_i$.
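For orientation, the snippet below implements the textbook forms of RMSE and MAPE. These are assumptions: the exact normalisation used in the paper (for example, division by the capacity) is not recoverable from the text, so the numbers produced by this sketch need not match Table 4. Skipping samples with zero actual power in MAPE is likewise an illustrative choice.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error (textbook definition, same unit as the data)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error in percent.

    Samples with zero actual power are skipped to avoid division by zero.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mape = mape([10.0, 20.0], [9.0, 22.0])
```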

CASE STUDIES
The case studies are based on a dataset of a 25-MW PV station along the coast of Shandong collected in 2019. The time resolution of the selected dataset was 5 min, and the data were combined with weather data [33] downloaded from a weather website for the entire year with a resolution of 1 h. In this work, the time resolution is 5 min, and the daily time range is from 8 a.m. to 5 p.m. (9 h per day). The data of 11 months were selected as historical samples to calculate the transition matrix for the Markov chain methods. The 28 days in December for which the power data could be obtained were used for verification, giving a prediction period of 9 × 28 = 252 h. Various weather conditions of sunny,

Data processing
The first step of data processing is to delete unnecessary data in the historical dataset. In practical projects, the PV station has no output after sunset and before the sun rises. Such zero-output power can be deleted from the historical dataset to save space and reduce processing time for the proposed forecasting methods.
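A minimal sketch of this cleaning step is shown below. It assumes records are `(timestamp, power)` pairs and that a fixed daily window approximates the daylight hours; the 8:00 to 17:00 window mirrors the time range used in this work, while the function name and record layout are hypothetical.

```python
from datetime import datetime

def drop_night_zeros(records, start_hour=8, end_hour=17):
    """Remove zero-output records that fall outside the daily window.

    Zero readings inside the window are kept, since they may reflect real
    daytime conditions rather than the absence of sunlight.
    """
    kept = []
    for t, p in records:
        in_window = start_hour <= t.hour < end_hour
        if p == 0 and not in_window:
            continue
        kept.append((t, p))
    return kept

records = [
    (datetime(2019, 6, 1, 3, 0), 0.0),    # night, zero output: removed
    (datetime(2019, 6, 1, 12, 0), 0.0),   # daytime zero: kept
    (datetime(2019, 6, 1, 12, 5), 18.2),  # daytime output: kept
    (datetime(2019, 6, 1, 19, 0), 0.3),   # outside window, non-zero: kept
]
cleaned = drop_night_zeros(records)
```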

Similar day algorithm
The similar day algorithm clusters the samples, which can effectively improve the prediction accuracy for specific weather conditions. Weather is generally divided by its characteristics into sunny, cloudy, and rainy days, so the characteristics of cloud cover, solar radiation, weather, and rainfall conditions have a great impact on the similarity of weather. The Pearson coefficients between the 24 features of the dataset and the output power are listed in Table 2, and the scatter diagram of the output power against short-wave radiation is shown in Figure 2. Consistent with the calculated results, short-wave radiation is an important factor for forecasting the PV power and must be considered when analysing meteorological conditions. The same method was used to determine several other features with higher correlation. Finally, relative humidity, total cloud coverage, sunshine duration, and solar short-wave radiation were selected as the characteristics for similar day division. After all sample data have been normalised, the distance between the forecast day and the sample day at the corresponding time point of each day is calculated, and the dates are arranged in order of distance from smallest to largest. The calculated Pearson coefficient values are taken as the weights of the similarity calculation, $\omega = [0.502, 0.482, 0.553, 0.772]$.

Markov chain calculation
The following describes the overall forecasting calculation process, taking December 1 as an example, with the OEV method. As shown in Table 3, when the number of similar days is small, the time series lacks the Markov property, and it is not suitable to use the Markov chain method. In the overall division methods (OEV, OEQ, and OC), the power remains at a low level during the earliest and latest time periods of the day, which leads to almost all samples belonging to the state $S_1$, so the calculated value of $\chi^2$ is lower than the standard. The forecast for this period cannot be processed in the Markov chain and can … The hourly division method can effectively improve the Markov property of the time series and reduce the number of times that the $\chi^2$ test is not satisfied. This also shows that the improved division method is more reasonable.
After the Markov property test is passed, the PV power at the target time can be predicted. The specific steps are shown in Figure 1.

Forecast result analysis
In the case of selecting similar days as 2, 5, 10, 15, 20, and 30 days, the predicted power value is compared with the actual value. Figure 3 shows that, for clustering weather conditions, the number of days with similar weather conditions cannot be too small. If it is, the sample lacks universality and is greatly affected by special circumstances.
When the number of days is greater than 15, the accuracy of the forecast is not significantly improved, but the time consumption increases proportionally. Therefore, after comprehensive consideration, 15 similar days are selected as the forecast condition. Figure 4 shows the influence of the number of state intervals on the prediction results. Under the condition of ensuring the Markov properties, the more detailed the division of the historical data samples, the higher the prediction accuracy often is. For the clustering method shown in the figure, the number of states changes very little at more than 10, and the increase in the number of state spaces also means that the predicted time consumption increases proportionally. Therefore, in this study, the number of states of 5 and 12 were taken as examples for prediction.
Some of the predicted results are shown in Figures 4 and 5. Figure 5 shows a magnified view of the time period in Figure 4 for a clearer comparison. When the results of the 5-state OEV are compared with the 12-state OEV, and the 12-state OEV with the 12-state HC, it is obvious that increasing the number of divided states and using the clustering method divided by hours can effectively make the predicted curve closer to the true value. The comparison can be made more accurately by calculating the overall evaluation index of the forecast result.
Combining the results in Figures 3-6 and Table 4 reveals the following.
1. The prediction effect of the improved Markov method was clearly better than that of the conventional Markov method. A more detailed division of the state interval helps to improve the accuracy of the prediction. For the same method, the error of the prediction results with 12 state intervals is only 60% of that with five state intervals. Comparing the three overall interval division methods with their corresponding hourly division methods shows that, for the same interval division, the error of the hourly division is only 70-80% of that of the overall division, and the MAPE can be reduced by approximately 1%. The improved method can better distinguish the data at each moment, obtain the Markov transition matrix more reasonably and effectively, and greatly improve the accuracy of the prediction results.
2. LSTM, which has been developed on the basis of recurrent neural networks, is a common method for dealing with time-series problems. It determines the forgetting, retention, and output of historical information through the forget gate, the input gate, and the output gate and has achieved great success in many areas. Therefore, a four-layer LSTM network model with 20 neurons in each layer is built for training. To avoid overfitting, the dropout parameter is set as 0.2. The time span of data input is 30 min. The time step to forecast is 5 min. The prediction error is shown in Table 4. The LSTM method may be more suitable for long-term prediction than for one-step ultra-short-term prediction. In addition, the temporal resolution of the meteorological data is not sufficient to build a multi-feature series model, which limits the improvement of the prediction effect. Therefore, the improved Markov method proposed in this work may be more suitable under limited data conditions.

CONCLUSION
An improved method based on the traditional Markov-chain-based ultra-short-term PV power prediction method was proposed to increase the prediction accuracy. A non-uniform division method was proposed to determine the state intervals and was combined with a similar day algorithm to form the proposed improved Markov chain method. The equal data volume method and the one-dimensional k-means clustering method, each applied by hour, are used to determine the state intervals. Compared with the traditional physical modelling method with the data of practical projects, the proposed method is simpler and easier to implement. Moreover, the proposed method is more robust to weather factors. As verified on the same dataset, processing the data of each hour separately is better than processing all the data together. The HEQ and HC methods can improve the accuracy of prediction while ensuring operational efficiency.
Under normal weather conditions, this improvement in accuracy is conducive to the advance planning of the power system and reduces the requirements on the ramping capability of the system. In extreme natural conditions, it helps to anticipate risks in advance, prepare backup capacity, and recover quickly after extreme natural events. The proposed methods may be suitable for application in practical projects to increase the awareness ability and resilience of power systems.