Defect identification of wind turbine blade based on multi‐feature fusion residual network and transfer learning

As a key component of wind turbines (WTs), the blade directly influences the efficiency of the WT. Because defect detection technology for WT blades is not yet widely used, and traditional detection methods lack robustness, this paper proposes a multi‐feature fusion residual network combined with transfer learning. A WT blade image dataset is constructed and enhanced to train the convolutional network. Two multi‐feature fusion residual structures (two-feature fusion and three-feature fusion) are proposed and compared. At the same time, transfer learning is used to improve the training process and accelerate convergence. Compared with several convolutional neural networks on metrics including training loss, test accuracy, F1-score and confusion matrix, the proposed method greatly reduces detection time while achieving accurate detection.


| INTRODUCTION
In recent years, the world has been suffering from climate warming and an energy crisis, forcing many countries to develop environmentally friendly new energy sources. 1 Wind energy is one of them, and its exploitation has driven the development of the wind power industry. 2,3 Wind power generation technology is becoming increasingly mature, and the utilization rate of wind energy has reached 95%. As a key component of wind turbines, the blade accounts for about 20% of the installation cost of a WT, and it is also the wind energy capture device. Hence, the detection and maintenance of WT blades is particularly important. 4 Since WT blades are exposed outdoors all year round, they are easily affected by harsh environments and complex working conditions. 5 This leads to many defects in WT blades, such as oil stains, paint peeling, corrosion and sand holes, which greatly reduce the efficiency and performance of the wind turbine. Consequently, the detection of WT blade surface defects is very important. 6 At present, there has been considerable research on the detection of WT blades. Traditional blade defect detection methods are mainly based on nondestructive testing technology and vibration signals. 7,8 Abundant sensors are also installed to detect blade defects, such as dynamic sensors, acoustic emission sensors (AES) and multi-dimensional sensor monitoring networks. 9 Inspired by the concept of transitivity of the frequency response function, W. Yang studied a new method of WT blade condition monitoring (CM). 10 The results show that the method can effectively detect and locate defects in WT blades. Chin Shun Tsai proposed a new method of blade damage detection based on the continuous wavelet transform. 11 This method can be applied to the identification of blade structural damage in many cases. Z. Fu proposed a reliability analysis method for a wind turbine blade condition monitoring network based on wireless sensor networks. 12 The research provides theoretical and technical guidance for the establishment of high-performance wind turbine blade condition monitoring and control systems.
However, some problems generally exist in these methods; in particular, the accuracy and safety of traditional detection methods cannot be guaranteed. In recent years, the development of machine learning has also promoted the development of feature processing, gradually giving rise to LogitBoost, decision trees (DT), support vector machines (SVM) and others. 13 L. Wang proposed a data-driven framework for automatic surface crack detection on WT blades, in which the Haar-like feature is used to describe the crack area and a cascade classifier is trained to detect cracks in the WT. 14 Gian Antonio Susto proposed a multiple-classifier machine learning method for predictive maintenance; the proposed PDM method allows the use of dynamic decision rules for maintenance management and can handle high-dimensional and censored data problems. 15 However, the types of defects that can be diagnosed from signals are limited, the signal processing is complicated, and these traditional detection methods cost considerable time and effort. Since most WT blade defects are visible to the naked eye, WT blade defect detection can be regarded as an image classification problem in the field of computer vision.
After AlexNet improved image classification accuracy from the traditional 70% to 80% and won the competition in 2012, 16 convolutional networks have grown deeper to improve model accuracy; typical examples include VGG, GoogLeNet and ResNet. [17][18][19] The effectiveness of these models has been fully verified in many fields. D. Xu et al. used a UAV to capture images of WT blade defects and applied a traditional convolutional neural network to detect them. 20 The detection performance exceeds that of traditional detection methods. C. Zhang proposed a convolutional network, Mask-MRNet, to detect blade defects, establishing a WT blade fault mask data set and a data set of different WT defect types. 21 Based on the proposed network, a blade image can directly yield the mask, bounding box and fault type. However, applications of deep learning to WT blade defect detection remain relatively few, and there is still room for improvement in detection time and accuracy.
In order to explore a fast and accurate method to detect WT blade defects, we improve the residual network with feature fusion, and a transfer learning method is also used to further improve detection efficiency. The contributions of this paper are summarized as follows: 1. Deep learning is applied to the task of WT blade defect detection, and data preprocessing is carried out. 2. A multi-feature fusion residual network structure is proposed. 3. Through experiments, the improved network is compared with 5 models, and its effectiveness is verified. 4. Transfer learning is carried out to speed up training.
This paper is organized as follows: Sec II introduces the acquisition and optimization of WT blade defect data. A detailed description of the algorithm is given in Sec III. Sec IV focuses on training and comparison with different algorithms. Finally, the conclusion and future work are summarized.

OPTIMIZATION OF WT BLADE DEFECT DATA SET
Data are the basis of deep learning, and the quality of the data largely determines the quality of the deep network model. 22 About 10,000 blade images were taken by UAV in a wind farm in East China under stable conditions. We used a DJI Mavic 2 Pro UAV to shoot the WT blades, with the pitch angle of the gimbal (PTZ) controlled by the left dial wheel of the remote control. After deleting invalid images that do not contain WT blade parts and a large number of overlapping shots, 7546 wind turbine blade images were collected. The data set is divided into a training set and a test set: the former contains 6035 images, accounting for about 80%, and the latter contains 1511 images, accounting for about 20%. At the same time, the blade images are divided into five categories: normal, paint falling off, gelcoat falling off, surface erosion and oil stain.
Because the distribution of data between classes in the WT blade defect data set is uneven, data enhancement operations such as rotation, brightness adjustment, flipping and color dithering are performed, making the set more suitable for training a deep network. Finally, the number of images was expanded to 9785, including 7819 training images and 1966 testing images. The resolution of each blade image taken by the UAV is 5472 × 3684; because the deep convolutional network requires an input size of 224 × 224, we resize the WT blade images accordingly. Figures 1 and 2 show the defect images of the WT blade and a series of enhanced WT blade fault images. The proportion and quantity of each category of image data are shown in Figure 3A,B.
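As a minimal illustration of the geometric augmentations mentioned above (flipping and rotation), the sketch below applies them to a toy image stored as a nested list; the helper names are our own, and a real pipeline would of course operate on image tensors rather than lists:

```python
def hflip(img):
    """Horizontal flip: reverse each row of the image grid."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
flipped = hflip(img)   # [[2, 1], [4, 3]]
rotated = rot90(img)   # [[3, 1], [4, 2]]
```

Each such transform produces a new training sample from an existing one, which is how the class counts above were expanded without new flights.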

| The network structure
In this section, a multi-feature fusion residual structure is proposed, including two-feature fusion and three-feature fusion residual structures. These two network structures are compared with the residual structure and GoogLeNet's Inception structure. The characteristics of the four structures are introduced in turn, and the advantages of the multi-feature fusion residual structure are described.
The four structures are shown in Figure 4. The residual network (ResNet) is mainly formed by stacking multiple residual structures, and such networks can be trained up to 152 layers. In the shallow residual structure, features are extracted by two 3 × 3 convolution kernels, and the input x is superimposed on the residual branch output F(x) to obtain the final output H(x) = x + F(x), where F(x) is the residual between x and H(x). The deep residual structure consists of two 1 × 1 convolution kernels and one 3 × 3 convolution kernel; the former raise and lower the dimensionality, while the latter extracts features. Because the residual network can effectively solve the degradation problem of deep networks, the residual structure is selected as the research object. There have been many improvements on the residual network. For example, M. Zhao proposed a deep residual shrinkage network to improve the accuracy of fault diagnosis and the feature learning ability on highly noisy vibration signals. 23 Y. Zhang combined the residual network with adaptive shortcuts to restore high-quality images and achieved good results. 24 GoogLeNet proposed the Inception structure, which fuses multi-scale features to enhance network performance; this connection structure can extract multiple kinds of feature information and effectively alleviate the overfitting caused by excessively deep network hierarchies. Features are extracted by three filters of size 1 × 1, 3 × 3 and 5 × 5, stacked as a series of sparse structures. In addition, two auxiliary classifiers are added to mitigate the vanishing gradient problem.
The deep network model adopted in this paper is improved on the basis of the ResNet structure. Inspired by the Inception structure of GoogLeNet, the robustness of the model is improved by increasing the width of the network and combining various features. In addition, the proposed fusion feature replaces the single feature in the residual structure. Comparing the performance of the two-feature and three-feature fusion residual structures, the latter is better, so the three-feature fusion residual structure is selected to form the final network framework. In the two-feature fusion residual structure, features are extracted and fused by convolution kernels of size 3 × 3 and 5 × 5. Feature extraction in the three-feature fusion residual structure is completed by convolution kernels of size 1 × 1, 3 × 3 and 5 × 5, respectively. In both fusion methods, a 1 × 1 convolution kernel is used for dimensionality reduction and to increase the depth of the network. Finally, a Dropout layer with a probability of 0.2 is added to prevent overfitting. 25
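A block of this kind can be sketched in PyTorch as follows. This is our own minimal reading of the three-feature fusion residual structure (parallel 1 × 1, 3 × 3 and 5 × 5 convolutions, concatenation, a 1 × 1 projection, and a residual addition); the class name, channel choices and layer ordering are assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class ThreeFeatureFusionResidual(nn.Module):
    """Hypothetical sketch: three parallel convolutions extract multi-scale
    features, which are concatenated, projected back by a 1x1 convolution,
    and added to the input as in a residual structure."""

    def __init__(self, channels):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.branch3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        # 1x1 convolution reduces the concatenated channels to the input width
        self.project = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        fused = torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x)], dim=1
        )
        return self.relu(x + self.project(fused))

block = ThreeFeatureFusionResidual(8)
out = block(torch.randn(1, 8, 16, 16))  # output keeps the input shape
```

Padding is chosen so all three branches preserve the spatial size, which is what allows the concatenation and the residual addition.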

| Loss function and optimization algorithm
After preprocessing the WT defect images by data enhancement, the definition of the loss function and the gradient descent algorithm is a crucial part of the whole deep network, directly affecting the accuracy of the model. The correlation between predicted and true values is measured by the loss function, and the weights of the model are continuously updated by the gradient descent rule. The loss function adopted in this paper is the cross-entropy loss, 26 the Adam optimization algorithm is adopted to perform gradient descent, 27 and Dropout is used to solve the overfitting problem. 28 The loss function is defined as

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h(x^{(i)}) + (1 - y^{(i)})\log(1 - h(x^{(i)}))\right]

where m is the total number of samples, h(x^{(i)}) is the predicted value and y^{(i)} is the true value. The cross-entropy function describes the distance between the actual output probability and the expected output probability: the smaller the cross entropy, the closer the two probability distributions. It avoids the gradient dispersion of the traditional squared-error loss function and is more suitable as a loss function. The final loss is shown in Equation (4), where J(\theta)_1 represents the loss of the auxiliary classifier and J(\theta)_2 represents the loss of the main network. Following the final loss weighting of GoogLeNet, the two losses are multiplied by weights of 0.2 and 0.8 and added:

J(\theta) = 0.2\,J(\theta)_1 + 0.8\,J(\theta)_2 \quad (4)
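As a plain-Python check on these definitions, the sketch below computes the mean cross-entropy over a batch and the 0.2/0.8 weighted combination of the auxiliary and main losses; the function names are ours, and in practice a framework routine such as a built-in cross-entropy loss would be used:

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over m samples; y_true holds one-hot labels and
    y_pred holds predicted class probabilities."""
    m = len(y_true)
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        total -= sum(t * math.log(p + eps) for t, p in zip(yt, yp))
    return total / m

def final_loss(aux_loss, main_loss):
    """Weighted sum of auxiliary-classifier and main-network losses,
    using the 0.2/0.8 weights borrowed from GoogLeNet."""
    return 0.2 * aux_loss + 0.8 * main_loss

loss = cross_entropy([[1, 0]], [[0.5, 0.5]])  # -log(0.5) ~= 0.693
combined = final_loss(1.0, 2.0)               # 0.2 + 1.6 = 1.8
```

A confident correct prediction drives the loss toward 0, while a uniform prediction over the five blade classes yields -log(1/5).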
The gradient descent algorithm is defined as follows. The Adam (adaptive moment estimation) optimization algorithm is used to perform gradient descent. This algorithm maintains exponentially decaying averages of past gradients and past squared gradients, and computes an adaptive learning rate for each parameter to update the weights:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \quad (5)

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \quad (6)

\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \quad (7)

\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \quad (8)

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \quad (9)

In Equations (5) and (6), g_t represents the gradient; the gradient and the square of the gradient are averaged so that each update is relative to previous values, with m_t and v_t representing the first moment (mean) of the gradient and the second moment (uncentered variance) of the gradient, respectively. \beta_1 and \beta_2 are constants that control the exponential decay. The purpose of Equations (7) and (8) is to correct the large bias of the moving averages in the initial steps; as t becomes larger and larger, \beta_1^t and \beta_2^t approach 0, the denominators approach 1, and the correction fades out. \eta is the learning rate, chosen so that the model tends to converge. In Adam, learning rate decay is realized through the adaptive learning rate mechanism. Equation (9) is the final parameter update formula; it can be seen that the effective learning rate is not fixed and differs in each iteration. Compared with other adaptive learning rate algorithms, Adam converges faster and requires less memory, and it effectively prevents the learning rate from vanishing and the loss function from fluctuating due to high-variance parameters.
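A single Adam update following Equations (5)–(9) can be traced in plain Python for a scalar parameter (names and default hyperparameters are the commonly used ones, shown here only for illustration):

```python
import math

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter theta with gradient g at step t."""
    m = beta1 * m + (1 - beta1) * g            # Eq. (5): first moment
    v = beta2 * v + (1 - beta2) * g * g        # Eq. (6): second moment
    m_hat = m / (1 - beta1 ** t)               # Eq. (7): bias correction
    v_hat = v / (1 - beta2 ** t)               # Eq. (8): bias correction
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)  # Eq. (9)
    return theta, m, v

# First step from zero moments: the bias correction makes the effective
# step size close to the learning rate, regardless of the gradient scale.
theta, m, v = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
```

At t = 1 the raw moments are tiny (m = 0.1, v = 0.001), but after the corrections in Equations (7) and (8) the step is approximately the learning rate, which is exactly the early-stage bias the corrections exist to remove.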

| Transfer learning
The purpose of transfer learning is to apply knowledge or models learned in one field or task to different but related fields or problems. 29 In deep learning, transfer learning is mainly used when performing new convolution tasks. During training, the weights learned from previous data can be fully reused to speed up retraining on the new input data: some weights of the convolutional network are no longer updated by gradient descent, which reduces the training time of the model. Yu extracted features of blade images through a convolutional neural network trained on the ImageNet data set, and completed defect recognition of wind turbine blades by combining transfer learning and convolutional networks. 30 Yang combined transfer learning and ensemble learning in a blade defect detection task; the results show that transfer learning not only improves the feature extraction ability but also accelerates network training. 31 In this paper, weights trained on the Pascal VOC data set are used for the transfer learning task. Because the final output categories of our data set differ from those of Pascal VOC, all parameter layers used for feature extraction in our convolutional network are frozen and not updated; only the parameters of the fully connected layer used for classification are updated.
In the multi-feature fusion residual network, the previously trained weights are saved. During transfer learning, the feature extraction parameters are not updated; only the parameters of the classification layer are updated.
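The freezing step can be sketched in PyTorch as below; the tiny model, its attribute names (`features`, `fc`) and the helper function are illustrative assumptions, not the paper's actual code:

```python
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Illustrative model: a feature extractor plus a classification head."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(8, num_classes)

def freeze_feature_extractor(model):
    """Stop gradient updates for the feature-extraction layers; only the
    classification layer keeps learning, as described above."""
    for param in model.features.parameters():
        param.requires_grad = False
    return model

model = freeze_feature_extractor(TinyClassifier())
```

An optimizer would then be built only over `model.fc.parameters()`, so each training step updates the classification layer while the pretrained feature weights stay fixed.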

| The multi-feature fusion residual network
The residual structure of the multi-feature fusion is shown in Figure 5. In a traditional ResNet34 network, each residual structure is repeated multiple times (3, 4, 6, 3). In this paper, each structure is used only once, which greatly reduces the depth of the network. The final network is formed by connecting four multi-feature fusion residual structures. After each multi-feature fusion residual structure, a 3 × 3 convolution kernel is connected to adjust the size of the feature map before the next structure. In addition, before the penultimate residual structure, an auxiliary classifier is added to mitigate the vanishing gradient problem. Finally, transfer learning is carried out to accelerate training.
The feature extraction module is shown in Table 1. Among them, conv1–conv4 are the modules for size adjustment and feature extraction, while inception1–inception4 are the feature fusion modules. Filter size is the size of the convolution kernel, and filter number is the number of convolution kernels.

FIGURE 5 The network of the multi-feature fusion residual

| TRAINING AND COMPARISON
This chapter introduces the performance of seven neural network models in different situations, covering 21 experimental results under the original data set, the enhanced data set and transfer learning. The experiments are based on the Windows 10 operating system, with GPU acceleration on three configurations: RTX 2080 Ti, Titan XP and GTX 1080 Ti. The performance of the models is evaluated by training time, test accuracy, training loss, confusion matrix and F1 score. The experiments are carried out in the PyTorch deep learning framework. The hyperparameters involved in the experiments are shown in Table 2.

| The process of training
We implemented seven classification models and conducted 21 comparative experiments based on the original WT blade data set, the enhanced data set and transfer learning. Figure 6 shows the training loss and test accuracy of the 7 classification models under the original WT data set, the enhanced data set and transfer learning. In the figure, enhanced data (ED) denotes feeding the enhanced data into the convolutional network, and transfer learning (TL) denotes the combination of the convolutional network with transfer learning. All networks are stable after 300 iterations. Figure 6O shows the test accuracy comparison of the 7 models, and Figure 6P compares the test accuracy under transfer learning.
From the performance of AlexNet, we can see obvious overfitting in the test accuracy due to the uneven class distribution of the original WT data set. After enlarging the data set by data enhancement, the test accuracy not only increases but also becomes stable. Likewise, after applying the Pascal VOC weights for transfer learning, the training loss and test accuracy improve significantly. The final test accuracy of AlexNet on the WT blade defect data set reaches about 75%. The advantages of data enhancement and transfer learning are further confirmed on VGG, GoogLeNet and ResNet: the additional data improves the test accuracy of each model, resolves the overfitting, and stabilizes the curves. The introduction of transfer learning significantly improves the iteration speed of each model, with training loss and test accuracy stabilizing after about 50 iterations. It is worth mentioning that transfer learning does not reduce model accuracy; in the VGG and GoogLeNet networks, it even improves accuracy further. Finally, the test accuracies of VGG and GoogLeNet on the WT blade defect data set are about 90% and 88%, and the accuracies of ResNet18 and ResNet34 are 91% and 92%.
Two kinds of multi-feature fusion residual networks are tested: ResNet_2F denotes the two-feature fusion residual network, and ResNet_3F the three-feature fusion residual network. Figure 6L,M shows that the accuracy of both networks meets the requirements for accurate detection of WT blade defects, and the final curves are stable. Figure 6O shows that the test accuracy of the proposed multi-feature fusion residual network is slightly higher than that of the other networks. Most importantly, the enlarged view in Figure 6O shows that the iteration speed of the multi-feature fusion residual network is significantly faster than the others. Therefore, the multi-feature fusion residual network offers fast detection while meeting the accuracy requirements for WT blade defect detection, which is the rapid blade defect detection method we need. At the same time, the last figure shows that the introduction of transfer learning does not affect the final detection accuracy, while improving computational efficiency to a certain extent. In addition, transfer learning also greatly reduces detection time and the load on the device. To sum up, the combination of transfer learning and the proposed network is the new method we sought for quickly detecting WT blade defects.

| Comparison of F1 score of each model
The F1 score is used to further evaluate the performance of the models; the F1 scores of the five networks are shown in Table 4. The F1 score is a measure for classification problems that considers both the precision and the recall of the model; it is the harmonic mean of precision and recall:

F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \quad (10)

The F1 score ranges from 0 to 1, where 1 represents the best output of the model and 0 represents the worst. As used in Equation (10), precision refers to the prediction results and indicates how many of the samples predicted as positive are truly positive; recall refers to the original samples and indicates how many of the positive samples are predicted correctly. Table 4 shows that the performance of the proposed network is excellent, and the F1 scores before and after transfer learning are both good.
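These definitions can be checked with a small pure-Python sketch (counts and function name are illustrative only):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 correct positives, 2 false alarms, 2 missed defects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)  # 0.8, 0.8, 0.8
```

As the harmonic mean, F1 equals precision and recall when the two agree, and is pulled toward the smaller of the two when they differ, which is why it penalizes unbalanced detectors.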

| Comparison of confusion matrix of each model
The confusion matrix is the most intuitive and simple method to measure the accuracy of classification models. 32 We select 149 WT blade images to evaluate the confusion matrices of the two feature fusion networks, as shown in Figure 7.
In Figure 7, the diagonal of the confusion matrix represents the number of correct detections in each category. It can be seen that ResNet_3F performs better, correctly detecting three more WT blade images than ResNet_2F.
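The construction of such a matrix, and the way correct detections accumulate on its diagonal, can be sketched in a few lines of plain Python (the toy labels below are illustrative, not the paper's data):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def diagonal_accuracy(cm):
    """Correct detections sit on the diagonal of the matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Toy example with 3 classes and 4 samples, one of them misclassified.
cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```

Comparing two models on the same evaluation set then reduces to comparing their diagonal sums, which is exactly the three-image margin reported for ResNet_3F over ResNet_2F.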

| CONCLUSION
In this paper, deep learning is applied to the identification of WT blade defects. In order to detect WT blade defects quickly, a residual network structure based on multi-feature fusion is proposed. First, a UAV is used to capture WT blade images; in view of the uneven class distribution, the data set is expanded by data enhancement to form a complete data set. At the same time, transfer learning is used to further compress the parameters of the model and reduce detection time. Finally, the results show that transfer learning greatly accelerates network training while maintaining accurate detection; the accuracy of the proposed network reaches 93%, its detection time is shorter than that of the other networks, and the comparisons of confusion matrix and F1-score further confirm the superiority of ResNet_3F.
Since the defect types of the WT blade are relatively few in this experiment, more blade defect types need to be labeled and fed into the multi-feature fusion residual network to further verify its performance. At the same time, it is necessary to investigate increasing the number of fused features to improve detection accuracy and to determine the optimal number of fused features for the network. In addition, the design of a WT blade defect detection system interface that can accurately determine defect types is also important future work.