Information entropy theory for steam turbine system monitoring study

A steam turbine is one of the key components of a power plant. As a strongly coupled system, a steam turbine may carry correlation information among its operation parameters. In this article, a correlation extraction method based on mutual information theory is proposed for performance monitoring. First, five functions between two variables, covering linear and nonlinear relationships, are tested to validate the proposed method. In the first industrial case study, sensor fault detection on the governing stage pressure is performed by examining abnormal fluctuations in the mutual information entropy values calculated from operation data. The results show that the proposed method can effectively detect sensor errors and locate the first erroneous sample. Second, uncertainty analysis of the heat rate is carried out to rank the importance of the related parameters: mutual information entropy difference values are calculated to quantify the sensitivity of the heat rate index to each operation parameter. The resulting order agrees with those given by the Guide to the Expression of Uncertainty in Measurement (GUM) and Monte Carlo (MC) methods. In conclusion, mutual information entropy can serve as a new information-theoretic tool for turbine monitoring.

Various data mining approaches have been studied for fault detection in steam turbine systems. Principal component analysis (PCA), 5 partial least squares (PLS) 6 and other statistical analysis methods have been proposed for linear systems. However, the parameters monitored in a steam turbine system are usually in complex, coupled relationships, so these methods cannot meet the requirements of steam turbine operation research. Kernel PCA (KPCA) 7 and methods based on neural networks or support vector machines 8 have since been reviewed. These methods can extract the main feature information of a process and substantially suppress measurement errors in nonlinear systems. 9 However, their models take operation data directly as input, rather than the information contained in the data samples. This article therefore explores a method based on information theory for sensor fault detection.
For uncertainty analysis of operation parameters in steam turbine systems, some researchers have focused on quantifying and ranking the influence of each parameter. Currently, there are two main methods for evaluating measurement uncertainty. One is the GUM (Guide to the Expression of Uncertainty in Measurement), developed by the International Organization for Standardization (ISO), 10,11 which distinguishes standard uncertainty from expanded uncertainty. The other is the MC (Monte Carlo) method, a numerical method proposed for the propagation of distributions. 12 It lies between analytical and experimental methods: it generates pseudo-random numbers with given probability distributions and considers only the statistical characteristics of the variables. In this article, difference values calculated from the information in data samples are explored as a new approach to uncertainty analysis and compared with the GUM and MC results.
In this article, mutual information theory is applied to fault detection and uncertainty analysis. First, five functions between two variables, covering linear and nonlinear relationships, are tested to verify the validity of the proposed method. In the first industrial case study, sensor fault detection on the governing stage pressure is performed by examining abnormal fluctuations in the mutual information entropy values calculated from operation data, and the method is used to detect sensor errors and locate the first erroneous sample. In the second case study, uncertainty analysis of the heat rate index is carried out to rank the importance of the related parameters: mutual information entropy difference values are calculated to quantify the sensitivity of the heat rate to each operation parameter, and the resulting order is compared with those calculated by the GUM and MC methods for verification. Finally, since power plants generate large volumes of streaming data, the characteristics of mutual information entropy in dynamic data monitoring remain to be studied in future research.

METHOD
The measurement and definition of information are among the most basic questions of information theory. Shannon introduced the concept of entropy into information theory in his 1948 paper "A Mathematical Theory of Communication", which built on probability theory and mathematical statistics.

Entropy theory
The information carried by an individual value x_i of a data set, called self-information, can be expressed by Equation (1). Information entropy is defined as the mathematical expectation of the self-information, as in Equation (2). 13 Here I(x_i) represents the quantity of information brought by x_i, and H(X) is the information entropy of data set X, which measures both the uncertainty of X and the quantity of information it contains. H(X) can therefore characterize the general features or the degree of uncertainty of a system, and serve as the basis for corresponding detection and analysis models. Information entropy can also be extended to describe the uncertainty of multivariate random variables: on the basis of the conditional probability distributions of random variables, conditional entropy can be defined.
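For reference, a sketch of the standard Shannon forms that Equations (1) and (2) presumably take, for a data set X whose values x_i occur with probability p(x_i). The logarithm base (2 for bits, e for nats) is an assumption, since the text does not state it:

```latex
I(x_i) = -\log p(x_i) \qquad (1)

H(X) = E\left[ I(x_i) \right] = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i) \qquad (2)
```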

Mutual information entropy
Given two data sets X = {x_1, x_2, …, x_i, …, x_n} and Y = {y_1, y_2, …, y_j, …, y_m}, the information contained in the random variable pair (X, Y) is called the joint entropy, H(X, Y), expressed by Equation (3). For given marginal distributions, H(X, Y) reaches its maximum, H(X) + H(Y), when X and Y are independent of each other. 14 When X and Y are correlated, the information in Y that is related to X affects the measurement of the information in X. This influence can be quantified by conditional entropy. 15 Given X = x_i, the conditional entropy of Y can be defined by Equations (4) and (5).
Knowing X (or Y) reduces the uncertainty of Y (or X). This reduction in uncertainty is called the average mutual information, I(X; Y), given in Equation (6).
Mutual information entropy is a concept used in information theory to measure the amount of information that one variable contains about another, and it describes the degree of order (or uncertainty) of a system.
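A sketch of the standard forms that Equations (3) to (6) presumably take, under the same assumptions (probabilities p estimated from the data, unspecified logarithm base). The last line is an assumed normalization that the text does not state; it is bounded by 1 because I(X; Y) ≤ min(H(X), H(Y)):

```latex
H(X,Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\,\log p(x_i, y_j) \qquad (3)

H(Y \mid X = x_i) = -\sum_{j=1}^{m} p(y_j \mid x_i)\,\log p(y_j \mid x_i) \qquad (4)

H(Y \mid X) = \sum_{i=1}^{n} p(x_i)\, H(Y \mid X = x_i) \qquad (5)

I(X;Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y) \qquad (6)

\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}} \le 1
```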

Correlation test
By the definition of mutual information entropy, the stronger the correlation between two data sets, the larger the mutual information entropy value. The normalized value reaches its maximum of 1 when the two data sets are identical. When the mutual information entropy between two sequences exceeds 0.5, the two sequences can be regarded as correlated. Data set X is a random sequence of 400 numbers in the range 0 to 1. Each Y is in a linear or nonlinear relationship with X, and the mutual information entropy of each Y with the input X is calculated.
In the first situation, Y is linearly correlated with X, with different coefficients. The second situation shows nonlinear correlations between Y and X. In the last situation, Y is another random sequence, as given in Equation (7). Mutual information entropy values in each situation are calculated at different scales, as shown in Figure 1. The scale is used to estimate the probability distributions: it is the number of intervals into which each sequence is divided.
FIGURE 1 Mutual information entropy values of Y and X

TABLE 1 Correlation coefficients of the test functions
For the linear correlations, Y = X + 1 and Y = 5X + 1, the mutual information entropy stays at the constant value 1, indicating that Y and X are strongly correlated regardless of the linear coefficient. The result of the second situation shows that the larger the exponent of the power function, the weaker the correlation between Y and X. In the third situation, the result reveals that the two sequences are almost independent, without any correlation. Scales ranging from 2 to 20 cause the mutual information entropy values to fluctuate. As the scale increases, the entropy values tend toward the limiting value (= 1), with some fluctuation as they grow. In the first steps of increasing the scale, the entropy value increases until the first inflection point.
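The correlation test can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the article's implementation: each sequence is divided into `scale` equal-width intervals, probabilities are plug-in frequency estimates, and mutual information is normalized by √(H(X)H(Y)) so that identical partitions give 1. The nonlinear test functions X^2 and X^5 are hypothetical stand-ins, since Equation (7) is not shown, and the helper names are illustrative.

```python
import numpy as np


def discretize(x, scale):
    """Split the range of x into `scale` equal-width intervals; return labels 0..scale-1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:  # a constant sequence occupies a single interval
        return np.zeros(x.size, dtype=int)
    inner_edges = np.linspace(lo, hi, scale + 1)[1:-1]
    return np.digitize(x, inner_edges)


def entropy(labels):
    """Plug-in Shannon entropy (nats) of a sequence of integer labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def normalized_mi(x, y, scale):
    """I(X;Y) / sqrt(H(X) H(Y)) estimated from `scale`-interval histograms."""
    a, b = discretize(x, scale), discretize(y, scale)
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    h_ab = entropy(a * scale + b)  # one label per (interval of x, interval of y) cell
    return (h_a + h_b - h_ab) / np.sqrt(h_a * h_b)


rng = np.random.default_rng(0)
x = rng.random(400)  # 400 random numbers in [0, 1)
test_functions = {
    "X + 1": x + 1,
    "5X + 1": 5 * x + 1,
    "X^2": x ** 2,  # hypothetical power functions standing in for Equation (7)
    "X^5": x ** 5,
    "random": rng.random(400),
}

# Normalized MI for every test function at scales 2..20
results = {
    scale: {name: normalized_mi(x, y, scale) for name, y in test_functions.items()}
    for scale in range(2, 21)
}
for name in test_functions:
    print(name, round(results[5][name], 3))
```

Because the equal-width intervals are placed over each sequence's own range, any increasing linear map of X (such as X + 1 or 5X + 1) receives the same interval labels as X, which is why the linear cases score 1 at every scale. Printing `results[s]` for other scales shows how the values shift as the scale changes.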
The conventional correlation coefficients are then calculated for comparison with the mutual information entropy values, using Equation (8):
ρ_{X,Y} = Cov(X, Y) / √(D(X) D(Y))   (8)
where ρ_{X,Y} stands for the correlation coefficient, Cov(X, Y) is the covariance between X and Y, and D(X) and D(Y) are the variances of X and Y, respectively. The ρ_{X,Y} values are calculated and listed in Table 1.
In this table, the ρ_{X,Y} values agree with the variation of the mutual information entropy values. The correlation coefficients for X + 1 and 5X + 1 are larger than the others, and the one for the random sequence is the smallest. In addition, the variation of the mutual information entropy values also reflects the relationship between the scale and the entropy values.
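Equation (8) can be checked numerically with a short NumPy sketch. The function name is hypothetical, and population (1/n) moments are assumed for both Cov(X, Y) and D(·); the normalization factors cancel, so the result equals `numpy.corrcoef`:

```python
import numpy as np


def correlation_coefficient(x, y):
    """rho_{X,Y} = Cov(X, Y) / sqrt(D(X) D(Y)), using population (1/n) moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # Cov(X, Y)
    return cov / np.sqrt(x.var() * y.var())          # np.var defaults to 1/n


rng = np.random.default_rng(0)
x = rng.random(400)
for name, y in {"X + 1": x + 1, "5X + 1": 5 * x + 1,
                "X^2": x ** 2, "random": rng.random(400)}.items():
    print(name, round(correlation_coefficient(x, y), 3))
```

For X uniform on [0, 1], the exact population value of ρ(X, X²) is √15/4 ≈ 0.968, a reminder that ρ measures linear dependence only.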

PERFORMANCE MONITORING IN TURBINE SYSTEM
Thermodynamic calculation of steam turbines is an important part of online economic analysis. The relative internal efficiency of a steam turbine reflects its operating efficiency and, to some extent, the flow characteristics of the cylinders. Combined with empirical values or off-design (variable condition) calculations, it can help diagnose broken blades inside a cylinder, blocked flow passages, scaling and other abnormal conditions.

Flow passage and main steam flow
The measurement of turbine main steam flow in a power plant usually relies on throttling devices such as flow nozzles or orifices. However, these devices cause throttling losses that directly reduce the output efficiency of the unit, especially in large-capacity units. Therefore, large-capacity steam turbine systems are normally designed without a main steam flow throttling device in order to reduce system resistance, and the main steam flow is instead calculated from related parameters. Currently, the main steam flow displayed in the field is commonly calculated by Equation (9), where G_1 is the calculated main steam flow, P_0 is the governing stage pressure, K_1 is a proportional coefficient determined from the unit's design THA condition (turbine heat acceptance: the turbine working continuously at rated inlet steam parameters and rated pressure, with the regenerative heating system in normal operation and a make-up water rate of 0%), and K_2 is the temperature correction coefficient. This calculation model shows that the governing stage pressure plays an important role in the main steam flow and, further, in turbine performance calculation. The turbine system is highly coupled, with many interdependent parameters, and the governing stage pressure is strongly coupled with the unit output (Load), the first extraction steam pressure (P_1) and the second extraction steam pressure (P_2). Under normal working conditions, the relation between P_0 and [Load, P_1, P_2] should be steady, and so should their mutual information entropy values. Conversely, a disorder in P_0 can be detected from the mutual information entropy between P_0 and [Load, P_1, P_2], and different kinds of disorder can be distinguished.
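The coupling argument can be illustrated with a hedged sketch. The three companion signals are binned and combined into one joint label per sample (a mixed-radix code, which is an assumed way of handling the vector [Load, P_1, P_2]), and the normalized mutual information with P_0 is compared for a coupled and a decoupled P_0. All signals and coefficients below are synthetic and hypothetical, not plant data:

```python
import numpy as np


def discretize(x, scale):
    """Equal-width binning into `scale` intervals; labels 0..scale-1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(x.size, dtype=np.int64)
    return np.digitize(x, np.linspace(lo, hi, scale + 1)[1:-1])


def entropy(labels):
    """Plug-in Shannon entropy in nats."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def joint_labels(columns, scale):
    """Encode several binned signals as one label per sample (mixed-radix code)."""
    code = np.zeros(len(columns[0]), dtype=np.int64)
    for col in columns:
        code = code * scale + discretize(col, scale)
    return code


def normalized_mi(target, companions, scale=5):
    """Normalized MI between one signal (e.g. P_0) and a group of signals."""
    a = discretize(target, scale)
    b = joint_labels(companions, scale)
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    h_ab = entropy(a * (b.max() + 1) + b)  # unique code per (a, b) pair
    return (h_a + h_b - h_ab) / np.sqrt(h_a * h_b)


rng = np.random.default_rng(1)
n = 346
load = rng.uniform(300.0, 600.0, n)              # hypothetical load samples
p1 = 0.012 * load + rng.normal(0.0, 0.05, n)     # hypothetical linear couplings
p2 = 0.007 * load + rng.normal(0.0, 0.03, n)
p0_coupled = 0.03 * load + rng.normal(0.0, 0.1, n)
p0_decoupled = rng.uniform(9.0, 18.0, n)         # P_0 no longer follows the load

nmi_coupled = normalized_mi(p0_coupled, [load, p1, p2])
nmi_decoupled = normalized_mi(p0_decoupled, [load, p1, p2])
print(round(nmi_coupled, 3), round(nmi_decoupled, 3))
```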

Influence factors in heat rate analysis
The heat rate is defined as the ratio of the heat that the steam turbine system receives from the external heat source to its output power, as in Equation (10), HR = Q_SR / N, where HR is the heat rate (kJ/kWh), Q_SR is the heat consumption (kJ/h), and N is the load (kW). The heat consumption of a reheat unit is calculated according to Equation (11), and each symbol is described in Table 2. The uncertainty of the turbine heat rate depends to a large degree on parameters such as power load, main steam flow, main steam temperature, reheat steam temperature, cold reheat steam temperature, feed water temperature and main steam pressure. These parameters influence turbine performance to different degrees, and identifying the parameters most closely correlated with the heat rate deserves close attention because it is the basis of operation monitoring and optimization.
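A worked sketch of Equation (10) follows. Because the symbols of Equation (11) are defined in Table 2, which is not shown in this section, the reheat heat consumption below uses an assumed simplified textbook form, main steam flow × (main steam enthalpy − feed water enthalpy) + reheat steam flow × (hot reheat enthalpy − cold reheat enthalpy), ignoring spray-water terms. The parameter names and input values are hypothetical round numbers:

```python
def heat_rate(q_sr_kj_per_h: float, load_kw: float) -> float:
    """Equation (10): HR = Q_SR / N, in kJ/kWh."""
    if load_kw <= 0:
        raise ValueError("load must be positive")
    return q_sr_kj_per_h / load_kw


def reheat_heat_consumption(d_ms, h_ms, h_fw, d_rh, h_rh, h_crh):
    """Assumed simplified reheat-unit heat consumption Q_SR in kJ/h.

    Flows in kg/h, enthalpies in kJ/kg; spray-water terms are ignored.
    """
    return d_ms * (h_ms - h_fw) + d_rh * (h_rh - h_crh)


# Hypothetical round numbers for a 600 MW unit, for illustration only
q_sr = reheat_heat_consumption(1_800_000, 3400.0, 1200.0, 1_500_000, 3540.0, 3040.0)
print(q_sr)                      # 4710000000.0 (kJ/h)
print(heat_rate(q_sr, 600_000))  # 7850.0 (kJ/kWh)
```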

Sensor error detection on governing stage pressure (P_0)
The main steam flow calculated in the SIS (Supervisory Information System) of a power plant plays an important role in evaluating turbine performance. In this article, P_0 and [Load, P_1, P_2] data samples are obtained from the DCS (distributed control system) under both normal and abnormal working conditions, as shown in Figures 2 and 3. Figure 2 displays 346 operation data under the normal working condition; Figure 2A-D shows the load, the first extraction steam pressure, the second extraction steam pressure and the governing stage pressure, respectively. Figure 3A-D shows [Load, P_1, P_2] and P_0 for 1100 data samples, during which the unit suffered a sensor fault in the P_0 measurement. The mutual information entropy values for both working conditions are calculated and displayed in Figure 4, with the scale equal to 5 and every 10 data grouped into a sample set.
FIGURE 2 P_0 and [Load, P_1, P_2] in the normal situation
FIGURE 3 P_0 and [Load, P_1, P_2] in the abnormal situation
In general, the mutual information entropy calculated from the normal data (Figure 4A) is higher than that calculated from the abnormal data (Figure 4B). The curve for the abnormal working condition can be divided into three parts: the first covers sample sets 1 to 316, the second covers 317 to 953, and the third covers the remaining sets. In the first part, the entropy values of the normal and abnormal situations partly coincide, indicating that the data are still in the initial, fault-free period; the P_0 values here can be judged normal, as seen in Figure 3D and the corresponding part of Figure 4B. The entropy values in the second part are almost all zero, revealing the sensor fault: P_0 stays at an invariant constant (Figure 3D) instead of varying with [Load, P_1, P_2] as it does under normal working conditions, and a constant has zero entropy and hence zero mutual information with any other variable. The entropy values in the third part are lower than the normal ones, indicating a weaker relationship between P_0 and [Load, P_1, P_2] than under the normal working condition.
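The windowed calculation behind Figure 4 can be sketched as follows. The sketch assumes that each sample set is a 10-sample window sliding one sample at a time (consistent with 1100 data giving 1091 entropy values), equal-width binning at scale 5 within each window, a joint label for [Load, P_1, P_2], and the √(H(X)H(Y)) normalization. The signals are synthetic and the function name is hypothetical; freezing P_0 reproduces the zero values described above:

```python
import numpy as np


def discretize(x, scale):
    """Equal-width binning into `scale` intervals; labels 0..scale-1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:  # constant window: one interval only
        return np.zeros(x.size, dtype=np.int64)
    return np.digitize(x, np.linspace(lo, hi, scale + 1)[1:-1])


def entropy(labels):
    """Plug-in Shannon entropy in nats."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def window_nmi(p0, companions, window=10, scale=5):
    """One normalized MI value per window; yields n - window + 1 values."""
    p0 = np.asarray(p0, dtype=float)
    companions = [np.asarray(c, dtype=float) for c in companions]
    values = np.empty(p0.size - window + 1)
    for k in range(values.size):
        s = slice(k, k + window)
        a = discretize(p0[s], scale)
        b = np.zeros(window, dtype=np.int64)
        for c in companions:  # joint label of [Load, P_1, P_2]
            b = b * scale + discretize(c[s], scale)
        h_a, h_b = entropy(a), entropy(b)
        if h_a == 0.0 or h_b == 0.0:  # e.g. P_0 frozen at a constant value
            values[k] = 0.0
            continue
        h_ab = entropy(a * (b.max() + 1) + b)
        values[k] = (h_a + h_b - h_ab) / np.sqrt(h_a * h_b)
    return values


rng = np.random.default_rng(2)
n = 1100
t = np.linspace(0.0, 6.0 * np.pi, n)
load = 450.0 + 100.0 * np.sin(t) + rng.normal(0.0, 2.0, n)  # hypothetical signals
p1 = 0.012 * load + rng.normal(0.0, 0.02, n)
p2 = 0.007 * load + rng.normal(0.0, 0.02, n)
p0 = 0.03 * load + rng.normal(0.0, 0.05, n)
p0[316:953] = p0[316]  # simulated stuck sensor (0-based indices 316..952)

e = window_nmi(p0, [load, p1, p2])
print(len(e))  # 1091
```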
In Figure 5, the load and P_0 data samples are plotted together, and the first bad samples in the different periods are marked. Figure 4B shows 1091 entropy values, consistent with a 10-sample set sliding forward one sample at a time (1100 − 10 + 1 = 1091); the 317th, 490th and 954th values are the first bad entropy values in the respective periods. The position of the first bad entropy value can be used to calculate the position of the first bad sample. Since 10 data are grouped into a sample set, the first bad sample is calculated by Equation (12):
Pos_BD = Pos_BE, if E(Pos_BE − 1) = normal and E(Pos_BE) = abnormal;
Pos_BD = Pos_BE + 10 − 1, if E(Pos_BE − 1) is already abnormal   (12)
where Pos_BD is the position of the first bad sample, Pos_BE is the position of the first abnormal entropy value, and E(·) is the entropy value at a given position. With this equation, the first bad sample positions in the three periods are calculated as 317 (317 = 317), 499 (490 + 10 − 1) and 963 (954 + 10 − 1), respectively, which match the positions marked in Figure 5.
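Equation (12) can be expressed as a small helper. The branch structure is read from the worked positions given in the text (317 maps to itself, while 490 and 954 map to 499 and 963); positions are 1-based, and the function name and the `previous_normal` flag are hypothetical:

```python
def first_bad_sample(pos_be: int, previous_normal: bool, window: int = 10) -> int:
    """Map the 1-based position of the first abnormal entropy value (Pos_BE)
    to the position of the first bad data sample (Pos_BD)."""
    if previous_normal:
        # E(Pos_BE - 1) normal and E(Pos_BE) abnormal: the positions coincide
        return pos_be
    # the preceding entropy value was already abnormal: last sample of the window
    return pos_be + window - 1


for pos_be, prev_ok in [(317, True), (490, False), (954, False)]:
    print(pos_be, "->", first_bad_sample(pos_be, prev_ok))
# 317 -> 317
# 490 -> 499
# 954 -> 963
```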

Ranking in influence parameters of heat rate (uncertainty analysis)
Uncertainty analysis, or sensitivity analysis, quantifies the relationship between the operation parameters and the heat rate. Simplified test data from a 600 MW coal-fired unit, based on the ASME PTC 6-1996 steam turbine performance test, are used for the case study. Following previous work, the heat rate together with the turbine running parameters power load (Load), main steam flow (D_zq), main steam temperature (T_zq), reheat steam temperature (T_zr), cold reheat steam temperature (T_lzr), feed water temperature (T_gs) and main steam pressure (P_zq) are taken as test data.
First, E_0 is calculated as the mutual information entropy between the heat rate and all of the parameters above, as in Equation (13), E_0 = I(Y; X[1], …, X[7]).
Second, E_i (i = 1, 2, …, 7) is calculated between the heat rate and all parameters except the ith one, as in Equation (14). From the definitions of E_i and E_0, the closer E_i is to E_0, the weaker the relation between the ith parameter (X[i]) and the heat rate (Y). Therefore, |E_0 − E_i| can be taken as a measure of the uncertainty contributed by the ith parameter to the heat rate. The absolute entropy differences are shown in Figure 6.
In the figure, the |E_0 − E_i| indexes calculated with the proposed method are shown as a histogram, together with those by GUM 16 and MC. 17 Table 3 lists the index values obtained by the GUM and MC methods in the references. The results reveal that, although the values differ, the order of the sensitivity indexes is almost the same as that given by GUM and MC. Power load and main steam flow have similarly strong relations with the heat rate. Main steam temperature, reheat steam temperature, cold reheat steam temperature and feed water temperature have relatively weaker influences on the heat rate than power output and main steam flow. The main steam pressure has the weakest relationship with the heat rate of all the parameters, a result consistent with constant-pressure and sliding-pressure operation of the unit. Therefore, compared with real operation and the reference results, the order calculated from mutual information entropy values can be considered reasonable.
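The ranking procedure of Equations (13) and (14) can be sketched as follows, as an illustration rather than the article's code. E_0 and E_i are assumed to be plug-in mutual information estimates (in nats, unnormalized) between the binned heat rate and the joint label of the binned parameters, and the data are synthetic, with one strong, one weak and one irrelevant hypothetical parameter standing in for the seven plant parameters:

```python
import numpy as np


def discretize(x, scale):
    """Equal-width binning into `scale` intervals; labels 0..scale-1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(x.size, dtype=np.int64)
    return np.digitize(x, np.linspace(lo, hi, scale + 1)[1:-1])


def entropy(labels):
    """Plug-in Shannon entropy in nats."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_information(y, params, scale):
    """I(Y; X_1, ..., X_k) with the binned parameters combined into one joint label."""
    a = discretize(y, scale)
    b = np.zeros(a.size, dtype=np.int64)
    for x in params:
        b = b * scale + discretize(x, scale)
    return entropy(a) + entropy(b) - entropy(a * (b.max() + 1) + b)


def entropy_differences(y, params, scale=5):
    """|E_0 - E_i|: E_0 uses every parameter, E_i leaves the ith one out."""
    e0 = mutual_information(y, params, scale)
    return [abs(e0 - mutual_information(y, params[:i] + params[i + 1:], scale))
            for i in range(len(params))]


rng = np.random.default_rng(3)
n = 5000
strong, weak, irrelevant = rng.random(n), rng.random(n), rng.random(n)
heat_rate = strong + 0.2 * weak + rng.normal(0.0, 0.02, n)  # synthetic target
diffs = entropy_differences(heat_rate, [strong, weak, irrelevant])
ranking = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
print([round(d, 3) for d in diffs], ranking)
```

Leaving a parameter out only merges cells of the joint partition, so the plug-in E_i never exceeds E_0 and the differences are non-negative even before taking the absolute value. With few samples and many parameters, however, the plug-in estimate is biased upward, which a real study would need to account for.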

CONCLUSIONS
In this article, mutual information theory is proposed as a new method for sensor fault detection, as an alternative to neural network approaches. With the inverse calculation, the first bad sample can be located from mutual information values, and its accuracy has been verified against actual operation conditions. Uncertainty analysis of the parameters related to the heat rate index is also performed, with the parameters ordered by entropy difference values; the results are consistent with those of the existing GUM and MC methods. Since power plants generate large volumes of streaming data, future research should study the characteristics of mutual information entropy in dynamic data monitoring.

ACKNOWLEDGEMENT
This work is supported by the National Natural Science Foundation of China (51706093), the Science Foundation of Nanjing Institute of Technology (YKJ201711, YKJ201607) and the Natural Science Research Project of Jiangsu Higher Education Institutions (18KJB470012).

PEER REVIEW INFORMATION
Engineering Reports thanks Xiaofeng Yuan and other anonymous reviewers for their contribution to the peer review of this work.