Research on transformer fault diagnosis: Based on improved firefly algorithm optimized LPboost–Classified regression tree

The information of dissolved gas in transformer oil can reflect potential faults in oil-immersed power transformers. To improve the accuracy of transformer fault diagnosis, a transformer fault diagnosis model based on IFA-LPboost-CART is proposed here. First, an LPboost-CART model is established: classification and regression trees (CART) are used as the weak classifiers, and the linear programming boosting (LPboost) ensemble learning method is used to adjust the weight of each weak classifier to construct a strong classifier. Then the improved firefly algorithm (IFA) is adopted to optimize the number of CART and the maximum number of splits of CART in LPboost-CART, which yields the IFA-LPboost-CART model. The experimental results show that, compared with existing methods such as CART and support vector machine (SVM), the proposed IFA-LPboost-CART model has higher fault diagnosis accuracy and can provide technical support for transformer fault diagnosis.


INTRODUCTION
The power transformer is a core component of the power system and plays an irreplaceable role in electricity transformation and transmission. The stable operation of transformers is not only related to the safety and stability of the power system, but also has an important impact on the national economy [1]. Transformer faults are costly: they entail expensive maintenance and renewal, and the resulting power failures cause huge economic losses. Therefore, accurately diagnosing transformer faults and reasonably arranging the maintenance plan are of great significance [2-4]. With the aging and insulation failure of a power transformer, the transformer oil decomposes and produces a large amount of gas dissolved in the oil, such as hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), carbon monoxide (CO) and carbon dioxide (CO2). Detecting and evaluating transformer faults by analysing the gas dissolved in the oil is called dissolved gas analysis (DGA) [5,6]. Traditional DGA-based diagnosis methods include the three-ratio method, the Dornenburg method, the Duval method, etc. These methods are simple to operate and are suitable for most voltage levels and types of transformers. However, because the relationship between dissolved gas information and transformer fault type is complex, traditional methods such as the three-ratio method suffer from low diagnostic accuracy, so it is difficult to diagnose transformer fault types with these simple ratio methods. With the rise of intelligent algorithms, machine learning algorithms combined with DGA have been applied to transformer fault diagnosis [7,8], such as k-nearest neighbour (KNN), CART, Bayesian networks and artificial neural networks [9]. With the development of deep learning, some scholars have tried deep belief networks (DBN) for transformer fault diagnosis: the studies in [10,11] build DBN transformer diagnostic models based on DGA. However, deep neural networks often require a large number of training samples.
Because power equipment is generally well operated and maintained, the failure of a power transformer is a low-probability event. As a result, few samples of equipment in abnormal conditions are available and the data sets are unbalanced, which restricts the training of deep neural network models [12]. The support vector machine (SVM) generalizes well on small-sample data sets, which matches the characteristics of transformer fault sample sets [13-20]. However, the c and g parameters of SVM have a great influence on model performance [21,22], so optimizing the model parameters to improve the diagnosis accuracy is a key problem. For example, Zhang et al. [13] used the IKH algorithm to optimize SVM, which effectively improved the accuracy of transformer diagnosis. Li et al. [14] combined a genetic algorithm with SVM and put forward the optimal input features for the model. Tian et al. [15] proposed an ICA-SVM transformer fault diagnosis model. Illias et al. [16] proposed a hybrid feature selection technique based on GSA, ANN and SVM. Chen et al. [17] used the bat algorithm to optimize an LS-TSVM transformer fault diagnosis method.
Although SVM has been widely used in transformer fault diagnosis, studies have shown that a single SVM is inferior to some ensemble learning methods [12,23-26]. Wang et al. [12] proposed a transformer fault diagnosis method based on a Bayesian-optimized random forest, which has a higher accuracy than a single intelligent diagnosis method. Boosting is another ensemble learning approach: Liu et al. [24] proposed an AdaBoost-RBF method and showed that the accuracy of fault diagnosis can be effectively improved through the boosting ensemble method. There are also the PSO-ELM-Adaboost [25] and Adaboost-cloud [26] ensemble learning methods for transformer fault diagnosis. Senoussaoui et al. [27] compared the impact of different ensemble learning algorithms on the accuracy of transformer fault diagnosis and concluded that ensemble learning can effectively improve it. Therefore, ensemble learning algorithms have certain advantages in transformer fault diagnosis.
The linear programming boosting (LPboost) ensemble learning algorithm is a variant of the Adaboost algorithm and belongs to the boosting family. It obtains the best linear combination of weak classifiers through linear programming. Because of its better convergence and classification characteristics than Adaboost, it has been widely used in various fields in recent years [28-33]. For example, Thaseen et al. [30] proposed an integrated intrusion detection model based on LPboost, and the results showed that the strong classifier built via LPboost performs better than a single model. Liu et al. [31] proposed a hybrid prediction model based on LPboost to predict the PM2.5 content in a city, and the hybrid model has higher accuracy. Chen et al. [32] applied LPboost in the medical field, and the results showed that it is effective in the diagnosis of colorectal cancer.
In this research, to improve the accuracy of transformer fault diagnosis, an LPboost-CART transformer fault diagnosis model based on DGA is proposed. In this model, the classification and regression tree (CART) is used as the weak classifier, and the LPboost algorithm is used as the ensemble framework to adjust the weight of each weak classifier and construct a strong classifier. Because LPboost-CART is strongly influenced by the number of CART and the number of splits of CART, an optimization strategy based on the improved firefly algorithm is proposed to improve the performance of the LPboost-CART model, which provides guidance for selecting the parameters of the model. To demonstrate the effectiveness of the proposed model, a DGA data set is used to compare IFA-LPboost-CART with existing methods such as SVM and CART, and actual cases are used to verify the performance of the model. The results show that the established transformer fault diagnosis model has higher accuracy than the existing methods, and that the diagnosis results are consistent with the actual cases. The proposed model is therefore well suited to transformer fault diagnosis and can provide technical support for it.

Decision tree
The decision tree is a machine learning classification algorithm that is simple to implement, easy to understand and fast. It can be seen as a tree-shaped classification model that includes three kinds of nodes: root node, internal node and leaf node. During the formation of the decision tree, the best feature of the data set, called the attribute, is used as the basis for dividing the tree layer by layer. At present, the criteria for choosing the optimal attribute are information gain, information gain ratio and the Gini index. The corresponding algorithms are the ID3 decision tree, the C4.5 decision tree and CART. In this study, CART is used as the weak classifier [18]. Suppose that there are n types of samples in the sample set D, in which the proportion of class i samples is $p_i$; then the Gini index is:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{n} p_i^{2} \tag{1}$$

Generally speaking, to ensure the calculation speed of the ensemble learning model, the weak classifier should be a relatively simple model. Because CART has a simple structure and runs fast, it is often used as a weak classifier in ensemble learning algorithms; for example, the weak classifiers in random forest [23] and XGboost [34] are CART. Therefore, in this study, we choose CART as the weak classifier in the transformer fault diagnosis model.
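As a small illustration of the Gini index (one minus the sum of the squared class proportions), the following sketch computes it for a list of class labels and for a two-way split. The function names and the example labels are hypothetical and are not taken from the paper's implementation.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a label collection: 1 - sum_i p_i^2."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini index of a two-way split, as used to rank candidate splits."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


# Illustrative labels only: two normal samples, one T3 and one D2 sample.
example = ["N", "N", "T3", "D2"]
print(gini_index(example))                    # impurity before splitting
print(split_gini(["N", "N"], ["T3", "D2"]))   # impurity after a candidate split
```

A split is preferred when its weighted Gini index is lower than that of the parent node; a node containing a single class has a Gini index of zero.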

Linear programming boosting (LPboost)
Ensemble learning (EL) is a branch of machine learning (ML). EL is not a single ML algorithm, but a combination of multiple weak classifiers with some "combination strategy" to form a strong classifier.
LPboost is one of the boosting algorithms in the ensemble learning family. It is a supervised multi-class classification algorithm. By continuously adjusting the weight of each weak classifier, it integrates the weak classifiers into a strong classifier [28-33]. The final strong classifier is given by formula (2):

$$f(x) = \sum_{j=1}^{m} a_j h_j(x) \tag{2}$$

where $h_j(x)$ is the jth weak classifier and $a_j$ is its weight, j = 1, 2, 3, …, m. LPboost uses linear programming to obtain the best linear combination of all weak classifiers, and the weights of all weak classifiers are adjusted in each iteration. Given a data set S = {(x1, y1), (x2, y2), …, (xn, yn)}, the linear programming problem is expressed by formula (3), where ζ is a slack variable and D is the penalty parameter [32]. Formula (3) can be transformed into the dual problem (4).

In this study, we use CART as the weak classifier, construct N different CART models, and use the LPboost ensemble method as the ensemble framework. By adjusting the weight of each weak classifier, we construct a strong classifier and establish the LPboost-CART transformer fault diagnosis model based on DGA. The process is shown in Figure 1.

FIGURE 1
The LPBoost-CART transformer diagnosis model

The specific process is as follows:
1. Normalise the transformer fault data set according to formula (5), where $x^{*}_{mn}$ is the normalized data, $x_{mn}$ is the original sample, and p is the number of input features;
2. Initialize u and β: $u = (1/l, \dots, 1/l)$, $\beta = 0$;
3. Train the weak classifiers;
4. Examine whether the termination condition, formula (6), is satisfied, where $h_m$ is the weak classifier;
5. If the termination condition is not satisfied, update u and β according to formula (4);
6. Output the final diagnostic result. The model is given by formula (2), where $h_j(x)$ is a weak classifier (CART) and $a_j$ is the weight of that CART.

The main parameters that affect the performance of the LPboost-CART model are the number of weak classifiers (the number of CART, N) and the maximum number of splits of CART (D).
Generally, if N is too small, the model is difficult to converge and generalizes poorly; if N is too large, the model is prone to overfitting. D mainly affects the generalization ability of the model. The ranges of these two parameters [23,28] are shown in Table 1.

TABLE 1
The range of N and D parameters in LPboost-CART
Parameter                              Symbol   Range
The number of CART                     N        [1, 200]
The maximum number of splits of CART   D        [2, 20]
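For orientation, the soft-margin linear program commonly associated with LPboost in the boosting literature can be written as below. This is a sketch of the standard textbook form and is not necessarily identical to formulas (3), (4) and (6) of this paper. The margin variable ρ and the binary labels $y_i \in \{-1, +1\}$ are assumptions introduced here, and the sample count n is assumed to play the role of l in step 2. The symbols $a_j$, $h_j$, ζ, u, β and the penalty parameter D follow the text; note that this D is the penalty parameter of the linear program, not the maximum number of splits.

```latex
% Primal soft-margin LP (rho is an assumed symbol for the margin)
\[
\begin{aligned}
\max_{a,\,\zeta,\,\rho}\quad & \rho - D\sum_{i=1}^{n}\zeta_i \\
\text{s.t.}\quad & y_i\sum_{j=1}^{m} a_j h_j(x_i) + \zeta_i \ge \rho, \quad i = 1,\dots,n, \\
& \sum_{j=1}^{m} a_j = 1, \quad a_j \ge 0, \quad \zeta_i \ge 0 .
\end{aligned}
\]

% Dual LP over the sample weights u and the bound beta
\[
\begin{aligned}
\min_{u,\,\beta}\quad & \beta \\
\text{s.t.}\quad & \sum_{i=1}^{n} u_i\, y_i\, h_j(x_i) \le \beta, \quad j = 1,\dots,m, \\
& \sum_{i=1}^{n} u_i = 1, \quad 0 \le u_i \le D .
\end{aligned}
\]

% Stopping test for a newly trained weak classifier h_m
\[
\sum_{i=1}^{n} u_i\, y_i\, h_m(x_i) \le \beta
\]
```

In this form, each iteration trains a weak classifier on the samples weighted by u, stops if the weighted edge of the new classifier does not exceed β, and otherwise adds it to the dual problem and solves again for u and β.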

Firefly algorithm
The firefly algorithm (FA) [35,36] is a heuristic algorithm. It has few initial parameters, is simple to implement, has a strong global optimization ability, and shows good results on most optimization problems. In this research, we use the firefly algorithm to optimize the parameters of the LPboost-CART diagnostic model. Fireflies attract other fireflies and interact with each other by flashing light. Fireflies with low brightness move towards those with high brightness. The light intensity decays with the distance from the light source according to the inverse square law. In addition, the air absorbs part of the light, so the light becomes weaker and weaker as the distance increases. These two factors act at the same time, so most fireflies can only be seen by other fireflies within a limited distance. In the firefly algorithm, there are two important issues: the variation of light intensity and the attractiveness. The light intensity of a firefly is given by Equation (7):

$$I(r) = I_0 e^{-\gamma r^{2}} \tag{7}$$

where $I_0$ is the initial light intensity of the firefly, r is the distance between two fireflies, and γ is the optical absorption coefficient.
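The attractiveness of a firefly is proportional to the light intensity, as shown in Equation (8):

$$\beta(r) = \beta_0 e^{-\gamma r^{2}} \tag{8}$$

where $\beta_0$ is the attractiveness at r = 0. The Cartesian distance between any two fireflies i and j is:

$$r_{ij} = \lVert X_i - X_j \rVert = \sqrt{\sum_{k=1}^{d} \left(X_{i,k} - X_{j,k}\right)^{2}} \tag{9}$$

where $X_{i,k}$ is the coordinate of the ith firefly in the kth dimension and d is the number of dimensions. Firefly i moves towards firefly j, which is brighter than firefly i, according to Equation (10), in which α represents the step parameter.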
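The quantities described above can be sketched in a few lines. The exponential decay of attractiveness and the movement rule below follow the form commonly given for the firefly algorithm in the literature; they are assumptions and are not necessarily identical to Equations (7) to (10) of this paper. The function names and the default values of `beta0` and `gamma` are hypothetical.

```python
import math
import random


def distance(xi, xj):
    """Cartesian distance between two firefly positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness at distance r, assumed to decay as beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(xi, xj, alpha, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly i towards a brighter firefly j (assumed standard rule).

    The new position is xi + beta * (xj - xi) plus a random perturbation
    in [-alpha / 2, alpha / 2) per dimension, where alpha is the step parameter.
    """
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


# Attractiveness fades quickly with distance, so only nearby fireflies interact.
for r in (0.0, 0.5, 1.0, 2.0):
    print(r, attractiveness(r))
```

With these defaults the attractiveness is already small at a distance of two units, which is why candidate parameters are often rescaled to a common range before the distance is computed.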

Improvement of FA algorithm
In the FA algorithm, the larger the step parameter α ∈ (0,1) is, the faster the algorithm converges, but the optimization accuracy is difficult to guarantee. The smaller α is, the better the convergence accuracy is, but the convergence speed is difficult to guarantee and the algorithm easily falls into a local optimum. To achieve global optimization and high convergence speed at the same time, a memetic FA that employs a dynamic α is proposed, which is expressed by Equation (11), where i is the current number of iterations and T is the maximum number of iterations [36]. The flow chart of IFA optimizing LPboost-CART is shown in Figure 2.
The steps are as follows:
1. The training set and test set are randomly divided in a ratio of 7:3, and the diagnostic error, which is to be minimized, is taken as the fitness function of FA. The parameters to be optimized are the number of weak classifiers (N) and the maximum number of splits (D) of CART. The range of N and D is shown in Table 1;
2. The initial parameters of FA in this paper are shown in Table 2. They are the number of fireflies (n), the number of iterations (T), the step parameter (α), the attractiveness at r = 0 (β0), and the optical absorption coefficient (γ) [35];
3. The initial positions of the n fireflies are randomly distributed within the ranges in Table 1 (the ranges of the N and D parameters of LPboost-CART to be optimized);
4. The light intensity and attractiveness of the fireflies are calculated according to Equations (7) and (8);
5. The positions of the fireflies are updated: each firefly moves towards brighter fireflies according to Equation (10);
6. The brightness of the fireflies is calculated, and it is checked whether the maximum number of iterations T has been reached;
7. If the termination condition is not satisfied, α is updated according to Equation (11), and the process returns to step 4;
8. If the termination condition is satisfied, the optimal N and D parameters are output and passed to LPboost-CART, and the process ends.
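The steps above can be sketched as a short search loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_error` is a hypothetical stand-in for the diagnostic error of LPboost-CART, the linear decay of α is a placeholder schedule and not Equation (11), and the default values of `n_fireflies`, `max_iter`, `alpha0`, `beta0` and `gamma` are illustrative rather than the values in Table 2. Positions are kept in the unit square and decoded to integer (N, D) pairs within the ranges of Table 1.

```python
import math
import random

BOUNDS = [(1, 200), (2, 20)]  # ranges of N and D from Table 1


def decode(pos):
    """Map a position in the unit square to integer (N, D) within BOUNDS."""
    return tuple(int(round(lo + p * (hi - lo))) for p, (lo, hi) in zip(pos, BOUNDS))


def toy_error(n_trees, max_splits):
    """Hypothetical stand-in for the diagnostic error; minimum at (120, 8)."""
    return ((n_trees - 120) / 200) ** 2 + ((max_splits - 8) / 20) ** 2


def clip01(value):
    return min(1.0, max(0.0, value))


def firefly_search(fitness, n_fireflies=10, max_iter=50, alpha0=0.5,
                   beta0=1.0, gamma=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(BOUNDS)
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    errors = [fitness(*decode(p)) for p in swarm]
    best_err = min(errors)
    best_pos = list(swarm[errors.index(best_err)])
    for it in range(max_iter):
        # Assumed schedule: the step parameter decays linearly (not Equation (11)).
        alpha = alpha0 * (1.0 - it / max_iter)
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if errors[j] < errors[i]:  # firefly j is brighter (lower error)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [
                        clip01(a + beta * (b - a) + alpha * (rng.random() - 0.5))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    errors[i] = fitness(*decode(swarm[i]))
                    if errors[i] < best_err:
                        best_err, best_pos = errors[i], list(swarm[i])
    return decode(best_pos), best_err


(best_n, best_d), err = firefly_search(toy_error)
print(best_n, best_d, err)
```

In practice the fitness function would train LPboost-CART with the decoded N and D and return its error on the held-out data, which makes each evaluation far more expensive than this toy function.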

EXPERIMENTAL RESULTS AND ANALYSIS
The fault types of power transformers are mainly divided into external faults and internal faults. The model proposed in this paper is mainly used to diagnose whether there is a potential internal fault in a transformer. According to IEC 60599-2015, the fault types can be divided into normal (N), medium and low temperature overheating (T1 and T2), high temperature overheating (T3), low energy discharge (D1), high energy discharge (D2) and partial discharge (PD) [4]. The transformer fault data used in this paper come from published literature [37-45]. There are 475 groups of data samples, each including five characteristic gases: hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4) and acetylene (C2H2). The sample data set is shown in Table 3 (see Appendix: data for 475 samples).

Binary tree diagnosis model
LPboost is a multi-class classification algorithm, so in principle all fault types of a transformer can be diagnosed with a single model. However, when the multi-class classification method is applied to the above data, the algorithm cannot effectively identify each fault type because of the complex nonlinear relationship between transformer faults and the dissolved gas content in the oil. As shown in Figure 3, the training error gradually decreases as the number of iterations increases, but it still fails to converge. Thus the diagnosis performance of this model is poor.
To solve the problem, the binary tree method is applied to transform the multi-class classification problem into a binary classification problem. The binary tree process is shown in Figure 4.
Therefore, according to the binary tree process in Figure 4, only five fault diagnosis models need to be built to effectively identify all fault types of the transformer. The initial parameters of FA are shown in Table 2. The range of the N and D parameters in LPboost-CART is shown in Table 1. The default parameters of LPboost-CART are N = 100 and D = 1 (the number of CART is 100 and the maximum number of splits is 1). The range of the parameters c and g in SVM is the interval [0.01, 100], and the kernel function is the RBF kernel [16].
Take the diagnosis of a normal or faulty transformer as an example (node ① of the binary tree process). The sample set contains 125 groups of normal transformer data and 350 groups of faulty transformer data. First, the training set and test set are randomly divided in a ratio of 7:3, namely 330 groups of training data and 145 groups of test data. The diagnostic performance of LPboost-CART with the default parameters is shown in Figure 5.
In Figure 5(a), ordinate 1 represents the faulty transformer and 0 represents the normal transformer. With the LPboost-CART parameters N = 100 and D = 1, the diagnostic accuracy is 91.724%, which means that 133 of the 145 groups of test data are correctly identified. To further analyse the performance of the model, its confusion matrix is drawn, as shown in Figure 5(b). The diagnostic performance of LPboost-CART optimized by FA is presented in Figure 6. Compared with Figure 5, the diagnostic accuracy is obviously improved. The average fitness curve of the FA-optimized LPboost-CART population is shown in Figure 6(c). The population begins to converge after 24 generations, and the minimum error on the test set is 0.05517.
After the step parameter of FA is changed to a dynamic step parameter, the diagnostic performance is as shown in Figure 7(a), and the diagnostic accuracy is 95.172%. Figure 7(b) is the confusion matrix of IFA-LPboost-CART. It can be seen that all 109 groups of fault samples in the test set are correctly identified. The average fitness curve of the population is shown in Figure 7(c); the population begins to converge in the 69th generation, and the minimum error on the test set is 0.04828.
Before convergence, the fitness curve fluctuates strongly and is unstable. This is because LPboost-CART is relatively complicated and LPboost is unstable at the beginning of iteration [28], so the traditional FA algorithm tends to fall into a local optimum when optimizing LPboost-CART. In the traditional FA algorithm, the step parameter is constant, so once the search falls into a local optimum it is difficult to escape. After the step parameter of the FA algorithm is improved, it can be adjusted adaptively according to the previous generation of the population, so it is easier to jump out of the local optimum. The fluctuation in the figure reflects the process of the fireflies jumping out of local optima to find the global optimum.
Compared with the FA-LPboost-CART algorithm in Figure 6, the diagnostic accuracy is further improved. This is because the step parameter in FA is fixed and the fireflies in the population move over a large distance: although they reach a solution quickly (as shown in Figure 6(c), the population begins to converge after 24 generations), the accuracy is lower than that of IFA-LPboost-CART. It can be seen that, by changing the fixed step parameter to a dynamic step parameter, the IFA algorithm can overcome the tendency of FA to fall into a local optimum to a certain extent.
The diagnostic performance of SVM optimized by IFA is shown in Figure 8. The diagnostic accuracy is 90.345%, with the c and g parameters of SVM being 88.633 and 11.41, respectively. The confusion matrix is shown in Figure 8(b); it shows that only 98 groups … The diagnostic performance of CART is shown in Figure 9, and its diagnostic accuracy is 90.345%. The confusion matrix is shown in Figure 9(b): 26 groups of normal samples and 105 groups of faulty samples are correctly identified.
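The reported percentages can be checked against the size of the test set (145 groups) with simple arithmetic. The counts 133 and 26 + 105 are stated in the text; the counts 138 and 137 are derived here from the reported errors 0.04828 and 0.05517 (7 and 8 misclassified groups) and are an inference, not figures quoted from the paper.

```python
TEST_SIZE = 145  # groups in the test set


def accuracy_percent(correct, total=TEST_SIZE):
    """Diagnostic accuracy in percent, rounded to three decimals."""
    return round(100.0 * correct / total, 3)


def error_rate(correct, total=TEST_SIZE):
    """Fraction of misclassified groups, rounded to five decimals."""
    return round((total - correct) / total, 5)


print(accuracy_percent(133))        # LPboost-CART with default parameters
print(error_rate(137))              # FA-LPboost-CART (derived count)
print(accuracy_percent(138))        # IFA-LPboost-CART (derived count)
print(accuracy_percent(26 + 105))   # CART: normal plus faulty groups identified
```

Each misclassified group changes the accuracy by about 0.69 percentage points, so the gap between 91.724% and 95.172% corresponds to five test groups.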
To avoid chance results, the training set and test set are randomly divided in a ratio of 7:3 and the above steps are repeated 30 times. The average accuracy of the 30 diagnosis models is taken as the final performance index of the transformer fault diagnosis model, and the standard deviation (std) of these results is also calculated. The performance of each model in diagnosing whether the transformer has a fault (node ① of the binary tree process) is shown in Table 4.
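The repeated random split can be sketched as follows. The model here is a hypothetical majority-class baseline that only serves to make the example runnable; it is not one of the models compared in the paper. The class sizes (125 normal, 350 faulty) and the test-set size (145) come from the text, while the function names and the seed are assumptions.

```python
import random
import statistics


def split_holdout(samples, n_test, rng):
    """Shuffle a copy of the samples and hold out the last n_test for testing."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:-n_test], shuffled[-n_test:]


def majority_baseline(train, test):
    """Hypothetical stand-in model: always predict the majority training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)


def repeated_holdout(samples, train_and_score, n_test, repeats=30, seed=0):
    """Mean and sample standard deviation of the score over repeated random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = split_holdout(samples, n_test, rng)
        scores.append(train_and_score(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder features (None); label 0 = normal, label 1 = faulty.
data = [(None, 0)] * 125 + [(None, 1)] * 350
mean_acc, std_acc = repeated_holdout(data, majority_baseline, n_test=145)
print(f"mean accuracy = {mean_acc:.3f}, std = {std_acc:.3f}")
```

Because the classes are unbalanced, even this trivial baseline scores well above 50%, which is a useful reference point when reading the accuracies in the tables.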
After diagnosing whether the transformer has a fault or not, the electrical fault or thermal fault in transformer is identified according to the process in Figure 4. The performance of each model at this time is shown in Table 5.
According to Figure 4, if the transformer has a thermal fault, the severity of the thermal fault is judged. Thermal faults are divided into high temperature overheating (T3) and low temperature overheating (T1 and T2). The performance of each model in diagnosing whether the transformer has high temperature overheating or low temperature overheating is shown in Table 6.
According to Figure 4, if the transformer is diagnosed as an electrical fault, the next step is to diagnose whether the transformer is high-energy discharge (D2). The performance of each model for diagnosing whether the transformer has high-energy discharge is shown in Table 7.
If the diagnosis result is not high-energy discharge, then, according to Figure 4, the next step is to diagnose whether the transformer has low-energy discharge (D1) or partial discharge (PD). The diagnostic performance of each model is shown in Table 8. Following this process, all types of faults can be diagnosed.
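The cascade of five binary decisions described above can be sketched as a simple routing function. The dictionary keys, the stub classifiers and the node numbering beyond node ① are hypothetical; in the paper each decision would be made by a trained binary model such as IFA-LPboost-CART.

```python
def diagnose(sample, models):
    """Route one DGA sample through five binary classifiers arranged as a binary tree."""
    if not models["fault"](sample):        # node 1: normal vs faulty
        return "N"
    if models["thermal"](sample):          # node 2: thermal vs electrical fault
        # node 3: high temperature vs low/medium temperature overheating
        return "T3" if models["high_temp"](sample) else "T1/T2"
    if models["high_energy"](sample):      # node 4: high-energy discharge or not
        return "D2"
    # node 5: low-energy discharge vs partial discharge
    return "D1" if models["low_energy"](sample) else "PD"


# Stub classifiers that read a precomputed flag from the sample (illustration only).
NODES = ("fault", "thermal", "high_temp", "high_energy", "low_energy")
stub_models = {name: (lambda s, name=name: bool(s.get(name, False))) for name in NODES}

print(diagnose({"fault": True, "thermal": False, "high_energy": True}, stub_models))
```

A sample passes through at most three classifiers, so an error at an early node cannot be corrected later; this is why the accuracy at node ① matters most.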
It can be seen from Tables 4-8 that the diagnostic accuracy of the FA-LPboost-CART model is improved compared with the LPboost-CART model: FA can effectively optimize the parameters in LPboost-CART, overcome the shortcomings of LPboost-CART, and further improve the performance of the model. Compared with FA-LPboost-CART, the diagnostic accuracy of IFA-LPboost-CART is further improved, which indicates that the improved firefly algorithm mitigates the shortcomings of the traditional FA and is less prone to falling into a local optimum. The reason is as follows. When the step parameter is too large, the fireflies in the population move over a large distance; although convergence is faster, they can easily jump past the global optimum while moving. When the step parameter is too small, the fireflies move over a short distance and easily fall into a local optimum. When the fixed step parameter in the FA algorithm is changed to a dynamic step parameter, the next generation of fireflies can adjust the step parameter according to the previous generation, and this adaptive dynamic step parameter balances convergence speed against convergence accuracy. Compared with the IFA-SVM and CART models, the IFA-LPboost-CART ensemble learning model shows a significant improvement in diagnostic accuracy, which demonstrates the effectiveness of the LPboost ensemble learning method.

Example verification
To explore whether the model proposed in this study has practical value, actual faulty transformers are analysed. Some oil chromatogram data from the day on which the first transformer failed are shown in Table 9.
Following the process and parameter selection method described in Sections 4.1 and 4.2, the IFA-LPboost-CART transformer diagnostic model is trained, and the contents of the five characteristic gases in Table 9 are used as its input (Table 9 shows the data collected by the transformer oil chromatographic online monitoring device). In this case, all samples are used as training data, and the fitness function is the average error under 4-fold cross-validation (CV). The model indicates that there may be a high energy discharge fault (D2) in the transformer. After the fault, the transformer was taken out of service for maintenance and a short-circuit impedance test was carried out. The test results showed that the low-voltage winding was seriously deformed. The transformer was then returned to the factory for disassembly, and it was found that the primary coil was seriously deformed. Analysis showed that a three-phase short-circuit fault had occurred in a switch cabinet about 600 m away from the substation, which subjected the low-voltage side of the main transformer to a short-circuit current of about 20 kA; this damaged the low-voltage side coil and led to the internal discharge fault of the transformer. The main coil after the impact of the short-circuit current is shown in Figure 10. The diagnosis result is consistent with the actual fault.
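The 4-fold cross-validation fitness mentioned above can be sketched as follows. The callback `fold_error` is a hypothetical placeholder for training the model on the training indices and returning its error rate on the validation indices; the contiguous fold layout assumes that the samples have already been shuffled.

```python
def k_fold_indices(n_samples, k=4):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_error(n_samples, fold_error, k=4):
    """Average validation error over k folds.

    fold_error(train_idx, val_idx) is a hypothetical callback that trains on
    train_idx and returns the error rate measured on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(fold_error(train_idx, val_idx))
    return sum(errors) / k


# With 475 samples the folds hold 119, 119, 119 and 118 samples.
print([len(fold) for fold in k_fold_indices(475)])
```

Using the cross-validated error as the fitness lets every sample contribute to training while still giving the optimizer an estimate of the error on unseen data.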

FIGURE 11
The lead wire and the shielded copper pipe equipotential line are broken

Some oil chromatogram data from the day on which the transformer failed are shown in Table 10.
Taking the contents of these five gases as the input to the model and using the process and parameter optimization methods in Sections 4.1 and 4.2, the model diagnosed a possible high temperature overheating fault (T2) inside the transformer.
A temperature rise test was conducted on the transformer. The average temperature rise was relatively high and differed between the three phases, with the B-phase data being the most abnormal. Subsequently, the transformer was disassembled, and breaks in the B-phase lead wire and in the shielded copper pipe equipotential line were found. Transformer maintenance experts believe that the internal temperature of the transformer was too high because of the break of the shielded copper wire, as shown in Figure 11. The diagnosis result is consistent with the actual fault.
Subsequently, the IFA-LPboost-CART fault diagnosis model was used to diagnose 220 kV transformers in multiple substations in a certain area. Among them, five transformers were suspected to be faulty. The fault types diagnosed by IFA-LPboost-CART and the fault types determined after the power outage are shown in Table 11. It can be seen from Table 11 that the fault types diagnosed by IFA-LPboost-CART are consistent with the actual fault types. The proposed diagnostic model therefore has certain practical value and can provide technical support for transformer fault diagnosis.

CONCLUSIONS
In this research, to improve the accuracy of transformer fault diagnosis, an IFA-LPboost-CART transformer fault diagnosis model based on DGA is proposed. It is a universal model that is applicable to oil-immersed power transformers of any voltage grade and model. Its performance is compared with existing methods, such as the IFA-SVM and CART transformer fault diagnosis models. Through theoretical analysis and verification with actual cases, the following conclusions are obtained:
1. Because of the complex relationship between the content of dissolved gas in the oil and the fault type of the transformer, the multi-class classification method is not suitable for diagnosing transformer faults. The binary tree method is therefore used to convert the multi-class classification problem into two-class classification problems, which can effectively improve the accuracy of transformer fault diagnosis;
2. The firefly algorithm can effectively optimize the parameters in LPboost-CART, which overcomes the shortcomings of LPboost-CART and improves the diagnostic accuracy of the model. The improved firefly algorithm overcomes, to a certain extent, the tendency of FA to fall into a local optimum. Therefore, IFA-LPboost-CART has a higher diagnostic accuracy;
3. Compared with conventional and existing methods, such as the CART and SVM diagnostic models, the proposed model has a higher diagnostic accuracy and can provide technical support for the fault diagnosis of oil-immersed power transformers.