Multi‐attribute quantitative bearing fault diagnosis based on convolutional neural network

China National Key Research and Development Project, Grant/Award Number: No.2017YFE0113200; Natural Science Foundation of university in Anhui Province, Grant/Award Number: No.KJ2019A0086

Abstract
Existing bearing fault diagnosis methods have some disadvantages, one being that most methods cannot completely consider all specific fault attributes. Another disadvantage is that the qualitative diagnosis method considers different fault types as a whole, and qualitative diagnosis of a single fault attribute is complicated. A convolutional neural network is proposed for application in the multi‐attribute quantitative bearing fault diagnosis. Multiple combinations of convolutional layers are adopted to directly extract features from one‐dimensional vibration signals. In addition, a softmax layer is designed to realise the simultaneous recognition of different fault attributes. The advantage of this approach is that it can realise diagnostic results for any combination of fault attributes and corresponding types, which overcomes the disadvantage of single attribute recognition in the traditional method. The method is simple but has strong generalisation ability with average diagnostic accuracy of more than 95%. According to bearing data from Case Western Reserve University and laboratory experiments by the authors, the results verify that the method can accurately and quantitatively diagnose bearing faults.


| INTRODUCTION
Bearings are among the most widely used and most frequently malfunctioning mechanical parts [1,2] in industry. In the past, most methods addressed only the qualitative diagnosis of bearing faults, that is, diagnosing the bearing fault location (normal bearing, outer ring fault, inner ring fault and rolling element fault) [3-5], or considered fault location and degree but ignored specific load [6,7] and other factors. As a result, traditional equipment maintenance causes parts loss and serious waste. Diagnosing the fault degree and the bearing load is therefore significant as well. A method that can simultaneously diagnose fault attributes and types is called quantitative fault diagnosis. This method can realise accurate and specific fault diagnosis that overcomes the shortcomings of traditional diagnosis.
Because the diagnosis is difficult, the traditional pattern recognition method struggles to achieve good results and has made slow progress in recent years. Nowadays, the extreme learning machine (ELM) [8] is widely used for diagnosis, and it achieves good results. However, its diagnosis is mostly based on small samples of a few attributes, leading to a lack of robustness [9,10]. In addition, wavelet denoising [11,12] can improve diagnostic effectiveness, but the improvement gradually tends towards saturation. Some researchers [13,14] have considered fault location and degree but have viewed each combination of fault location and fault degree as a single class. As a result, they cannot obtain diagnostic results for the individual attributes of a bearing fault, which affects the recognition effect. Beyond that, other researchers [15,16] have considered the quantitative diagnosis of fault degree or load only at a fixed fault location.
In recent years, deep learning has developed rapidly and achieved great success in speech recognition [17], image recognition [18] and other fields. As an important member of deep learning, the convolutional neural network (CNN) [19] has the following advantages: (1) it can automatically learn features from a large number of samples, thus eliminating the complicated feature extraction process of traditional methods and reducing dependence on expert knowledge; (2) the network structure adopts local connection and weight sharing, which greatly reduces the number of network parameters and the training difficulty, thereby improving training speed and enhancing generalisation ability. Therefore, researchers have gradually started using deep networks to diagnose bearing faults. Zhao Guangquan [14] and Lei Yaguo [20] have proposed a bearing fault diagnosis method based on the deep belief network (DBN) that does not require artificial feature extraction. However, many network parameters require pretraining, and the training difficulty is high. In addition, the generalisation ability of the network is weak, so the recognition effect is not satisfactory. Luyang Jing [21] has proposed a CNN-based feature learning and fault diagnosis method and has compared it with traditional methods. He Qingbo [13] and Zeng Xueqiong [22] have proposed qualitative diagnosis of bearing faults based on CNN. However, the network input is a time-frequency image. It is necessary to convert one-dimensional vibration data into a two-dimensional time-frequency image by wavelet transform, which is complicated and subject to information loss during the transformation process. We [23] proposed a qualitative bearing fault diagnosis method based on CNN by directly training on one-dimensional vibration signals, which improved diagnostic accuracy using fewer network parameters. None of the foregoing research, however, has realised complete quantitative fault diagnosis.
In addition, the traditional CNN has only one category label, whereas a multi-label CNN can output multiple category labels. The multi-label CNN [24,25] is widely used for image annotation and multi-feature recognition of the human body where the number of network labels is not fixed. For human feature recognition, Zhu Jianqing [26] has proposed a multi-label CNN to judge the attributes of humans. However, methods from the image field cannot be directly used for quantitative diagnosis of bearing faults.
Therefore, referring to the multi-label CNN, we propose a CNN for quantitative diagnosis. The traditional CNN is a single-label CNN, which is a special case of the multi-label CNN. The advantages of the CNN applied in the quantitative diagnosis of bearing faults are the following: (1) the structure of the network is simple, which means the network uses fewer parameters and is less difficult to train; (2) automatic feature extraction improves accuracy effectively; (3) without manual involvement, the method is easy to understand and to apply in engineering projects; and (4) the method has strong generalisation ability, which helps the quantitative diagnosis of actual faults. Bearing data from Case Western Reserve University and our laboratory are used to verify the effectiveness of this method in the quantitative diagnosis of bearing faults.

| CONVOLUTIONAL NEURAL NETWORK
The CNN has been widely used, but it is not often applied to fault diagnosis. In general, the CNN input is an image, so the input is at least two-dimensional. However, the CNN can also be used for recognising one-dimensional signals. Essentially, as long as the signal has translation invariance, it can be learnt by a CNN. Bearing vibration signals have translation invariance, as is verified in this work. In addition, a one-dimensional signal can be regarded as a special case of a two-dimensional signal whose width is 1.
The typical CNN structure is shown in Figure 1. A CNN for qualitative diagnosis can be designed on the basis of LeNet [27], AlexNet [28], VGGNet [29], GoogLeNet [30] or ResNet [31], or we can design it ourselves. Its structure includes an input layer, a convolutional layer, a maximum pooling layer, an average pooling layer, and a softmax output layer. Each convolutional layer is followed by an activation layer. The input layer is H × 1 × K, where H is the data length of the sample, 1 is the data width, and K is the data dimension (namely the number of sensors). Using the signals collected by multiple sensors at the same time for quantitative fault diagnosis can improve diagnostic accuracy. The softmax output layer of the single-label CNN is composed of one score vector, whereas that of the multi-label CNN is composed of multiple score vectors. This means that each fault attribute of the bearing is represented by a score vector, and the dimension of each score vector is equal to the number of fault types under the corresponding fault attribute.
Take the data of Case Western Reserve University as an example, as shown in Figure 2. It can be seen that there are three attributes of bearing faults, namely fault location, fault degree and bearing load. Therefore, the softmax output layer has three score vectors. The different positions of each score vector represent the different fault types under the corresponding attribute. The position of maximum value of each score vector is the predicted fault type. Therefore, each fault attribute is diagnosed independently, which means the diagnostic result of any combination of fault attributes can be obtained.
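The independent per-attribute decision described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `decode_prediction`, the flat layout of the score vectors, the assumed sizes (4 fault locations, 5 fault degrees, 4 loads) and the 1-based type index are all assumptions made for the example.

```python
import numpy as np

# Assumed sizes of the three score vectors: location, degree, load.
ATTRIBUTE_SIZES = (4, 5, 4)


def decode_prediction(scores, sizes=ATTRIBUTE_SIZES):
    """Split a flat score vector into one block per attribute and
    return the 1-based position of the maximum of each block."""
    scores = np.asarray(scores, dtype=float)
    if scores.size != sum(sizes):
        raise ValueError("score vector length does not match attribute sizes")
    label, start = [], 0
    for n in sizes:
        block = scores[start:start + n]
        label.append(int(np.argmax(block)) + 1)  # predicted type, 1-based
        start += n
    return label
```

Because each block is reduced separately, any combination of fault types across the attributes can appear in the output, including combinations that never occurred together in the training data.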
The loss risk of the single-label CNN is a single cross-entropy function, whereas the loss risk of the multi-label CNN is the weighted average of multiple cross-entropy functions.
The loss risk of the i th sample of the single-label CNN is as follows:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}\right)$$

where $L_i$ is the loss risk of the i th sample, vector $s$ is the score vector, $y_i$ is the label of the i th sample, and $s_j$ is the score of category $j$ for the i th sample, with the sum running over all categories. The loss risk of the i th sample of the multi-label CNN is as follows:

$$L_i = \sum_{k=1}^{M} \lambda_k L_{ik}, \qquad L_{ik} = -\log\left(\frac{e^{s_{k,y_{ik}}}}{\sum_{j=1}^{n_k} e^{s_{k,j}}}\right)$$

where $L_i$ is the weighted sum of the loss risks of the fault attributes, $M$ is the number of fault attributes, $L_{ik}$ is the loss risk of fault attribute $k$, and $\lambda_k$ is the weight of fault attribute $k$. The vector $s_k$ is the score vector of fault attribute $k$ with elements $s_{k,j}$, $n_k$ is the number of fault types of fault attribute $k$, and $y_{ik}$ is the label of fault attribute $k$.
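A minimal NumPy sketch of the multi-label loss described above is given here for one sample. It assumes softmax cross-entropy per attribute, a flat score vector that concatenates the per-attribute score vectors, and 0-based target indices; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np


def softmax_cross_entropy(scores, target):
    """Cross-entropy of one score vector against a 0-based target index."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]


def multi_label_loss(scores, targets, sizes, weights):
    """Weighted sum of per-attribute cross-entropies for one sample.

    scores  : flat array, concatenation of the per-attribute score vectors
    targets : 0-based type index for each attribute
    sizes   : number of fault types n_k for each attribute
    weights : weight lambda_k for each attribute
    """
    scores = np.asarray(scores, dtype=float)
    total, start = 0.0, 0
    for n, y, w in zip(sizes, targets, weights):
        total += w * softmax_cross_entropy(scores[start:start + n], y)
        start += n
    return total
```

With a single attribute and a weight of 1 the function reduces to the ordinary single-label cross-entropy, which mirrors the statement that the single-label CNN is a special case of the multi-label CNN.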

| THE PROCESS OF QUANTITATIVE DIAGNOSIS OF BEARING FAULT
The quantitative diagnosis of bearing fault by the CNN is divided into four steps: create training database, create CNN, train CNN and quantitatively diagnose bearing fault.
(1) Create training database: In the CNN, creation of the training database is divided into the creation of the sample data and of the multi-label. For the sample data under each working condition, a random method is adopted. A run of consecutive data points longer than one data cycle (i.e. the number of data points in one bearing rotation) is intercepted from the vibration data at a random location as one sample. A length of k × 2^n data points is preferred, where k = 1 or 3 and n is a positive integer. The random creation method makes the generalisation ability of the network stronger. Repeat the above process to create enough samples, then create a multi-label for each sample. The number of bearing fault attributes is M, so the multi-label of the sample has M scores.
[...] label, and multiple samples are diagnosed with multiple labels. The label with the most occurrences is used as the quantitative diagnostic result of the bearing fault.
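The random sample creation in step (1) can be sketched as follows. The function name `draw_samples`, the `(N, K)` signal layout and the use of NumPy's random generator are assumptions for illustration; the output follows the H × 1 × K sample shape used in this paper.

```python
import numpy as np


def draw_samples(signal, length, count, rng):
    """Cut `count` windows of `length` consecutive points at random positions.

    signal : array of shape (N,) or (N, K), N time points from K sensors
    Returns an array of shape (count, length, 1, K).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a single sensor as K = 1
    n_points = signal.shape[0]
    if length > n_points:
        raise ValueError("window is longer than the recorded signal")
    # Random start positions; the upper bound is exclusive.
    starts = rng.integers(0, n_points - length + 1, size=count)
    windows = np.stack([signal[s:s + length] for s in starts])
    return windows[:, :, None, :]  # insert the width axis of size 1
```

For example, `draw_samples(x, 512, 600, np.random.default_rng(0))` would produce 600 samples of 512 points each from one recording `x`.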

| EXAMPLES OF QUANTITATIVE DIAGNOSIS OF BEARING FAULT
The example of the quantitative diagnosis of bearing faults uses the bearing fault database from Case Western Reserve University, and the experimental device is shown in Figure 3.

| Create database
When the rotation speed is 1730 rpm and the sampling frequency is 12 kHz, the data cycle is 416 points (12,000 × 60/1730 ≈ 416). In Figure 4, 512 consecutive data points are intercepted at random locations; 512 (= 1 × 2^9) is the smallest value of the form k × 2^n that is larger than one data cycle. The vibration signals are shown in Figure 7a-c. Reducing the number of sample data points helps to reduce the training difficulty and complexity of the CNN. In addition, it speeds up training and reduces test time.
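The choice of window length can be checked with a short calculation. The sketch below assumes the data cycle is rounded down to a whole number of points and that "smallest" means the smallest k × 2^n strictly greater than the cycle; the function names are hypothetical.

```python
def data_cycle(sampling_hz, speed_rpm):
    """Number of sampled points in one shaft rotation, rounded down."""
    return int(sampling_hz * 60 // speed_rpm)


def smallest_window(cycle, ks=(1, 3)):
    """Smallest value k * 2**n (n >= 1, k in ks) strictly greater than `cycle`."""
    best = None
    for k in ks:
        value = k * 2
        while value <= cycle:
            value *= 2  # move to the next power of two for this k
        if best is None or value < best:
            best = value
    return best


cycle = data_cycle(12000, 1730)
window = smallest_window(cycle)
```

For the Case Western settings this gives a cycle of 416 points and a window of 512 points, since the candidates just below and above the cycle are 384 (3 × 2^7) and 512 (1 × 2^9).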
When creating the labels, ordinal numbers represent the different fault types. For the fault location, the numbers 1-4 represent normal bearing, inner ring fault, rolling element fault and outer ring fault, respectively. The fault types of the other fault attributes are numbered in the same way. For example, if the bearing has an inner ring fault, the fault degree is 14 mils and the load is 0 hp, the label is [2,3,1]. If the bearing is normal and the load is 1 hp, the label is [1,1,2].
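The labelling rule can be written as a small lookup. The ordering of fault locations follows the text; the fault degree list (0, 7, 14, 21, 28 mils) and the load list (0 to 3 hp) are assumptions inferred from the two worked examples and from the 5 degree types and 4 load types used later, and `make_label` is a hypothetical helper.

```python
# Order defines the ordinal number (1-based) of each fault type.
LOCATIONS = ("normal", "inner ring", "rolling element", "outer ring")
DEGREES_MILS = (0, 7, 14, 21, 28)  # assumed ordering; 0 stands for no fault
LOADS_HP = (0, 1, 2, 3)            # assumed ordering


def make_label(location, degree_mils, load_hp):
    """Return the multi-label [location, degree, load] as 1-based ordinals."""
    return [
        LOCATIONS.index(location) + 1,
        DEGREES_MILS.index(degree_mils) + 1,
        LOADS_HP.index(load_hp) + 1,
    ]
```

Under these assumptions `make_label("inner ring", 14, 0)` reproduces the label [2, 3, 1] and `make_label("normal", 0, 1)` reproduces [1, 1, 2] from the examples above.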
For the data of 48 working conditions, 600 samples are created for each working condition. Since there are three installation positions for a single working condition of the outer ring, 1800 samples are created for each outer ring working condition. The corresponding label is set for each sample to complete the database, as shown in Table 1.

| Create CNN
The network structure follows VGGNet and includes the input, convolutional, maximum pooling, average pooling and softmax output layers. Using the average pooling layer instead of the fully connected layer can greatly reduce the number of network parameters and training difficulty, which improves the diagnostic accuracy of the network [33]. The one-dimensional vibration signal is used directly as the input. After the input layer, every two consecutive convolutional layers are followed by a maximum pooling layer. The convolutional layer does not change the size of the feature map, but the pooling layer halves it. Each convolutional layer is followed by an activation layer, where the shifted Rectified Linear Unit [34] is used as the activation function.
The number of network layers is set as 19, where the convolutional, maximum pooling and average pooling layers have 11 layers, 5 layers and 1 layer, respectively. The size of the input layer is 512 × 1 × 1, and that of the softmax layer is 1 × 1 × 13. The convolution kernel size of the first 10 convolutional layers is 3 × 1, and that of the last convolutional layer is 1 × 1, and the step size of both is 1. The pooling window size of the maximum pooling layer is 2 × 1, and the step size is 2. The network structure is shown in Figure 5. The softmax output layer consists of three score vectors (corresponding with three fault attributes). The number of fault types of each fault attribute determines the dimension of each score vector. So the final softmax output layer has 13 (= 4 + 5 + 4) dimensions. In order to ensure that the size of the feature maps remains unchanged after each convolution operation, the feature maps are zero-padded at the beginning and end of the sample points before performing the 3 × 1 convolution operation. The number of feature maps (channels) of each layer is 1, 12, 12, 12, 24, 24, 24, 48, 48, 48, 96, 96, 96, 128, 128, 128, 13, 13, 13. The loss is the average loss risk of the three score vectors, which is as follows:

$$L_i = \frac{1}{3}\sum_{k=1}^{3} L_{ik} = -\frac{1}{3}\sum_{k=1}^{3} \log\left(\frac{e^{s_{k,y_{ik}}}}{\sum_{j=1}^{n_k} e^{s_{k,j}}}\right)$$

where vectors $s_1$, $s_2$ and $s_3$ are the score vectors of the fault attributes, $s_{k,j}$ is the j th element of $s_k$, $n_k$ is the dimension of $s_k$, and $[y_{i1}\ y_{i2}\ y_{i3}]$ is the corresponding label.
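The layer bookkeeping above can be verified with a shape trace. This sketch does not build or train a network; it only follows the stated rules (padded 3 × 1 convolutions keep the length, 2 × 1 pooling with stride 2 halves it). It assumes that the average pooling layer averages over the whole remaining length, which is consistent with the 1 × 1 × 13 softmax size but is not stated explicitly; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_length=512, input_channels=1,
                 block_channels=(12, 24, 48, 96, 128), n_outputs=13):
    """Return (layer name, signal length, channels) for every layer."""
    layers = [("input", input_length, input_channels)]
    length = input_length
    for channels in block_channels:
        # Two zero-padded 3x1 convolutions: the length is unchanged.
        layers.append(("conv3x1", length, channels))
        layers.append(("conv3x1", length, channels))
        # 2x1 max pooling with stride 2: the length is halved.
        length //= 2
        layers.append(("maxpool", length, channels))
    layers.append(("conv1x1", length, n_outputs))
    layers.append(("avgpool", 1, n_outputs))   # assumed global average pooling
    layers.append(("softmax", 1, n_outputs))
    return layers


for name, length, channels in trace_shapes():
    print(f"{name:8s} length={length:4d} channels={channels:4d}")
```

The trace yields 19 layers in total, with 11 convolutional and 5 maximum pooling layers, and the channel sequence matches the list given in the text. The signal length entering the last convolution is 512 / 2^5 = 16.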

| Set network hyperparameters
Adam is selected as the optimiser, with a learning rate of 0.003. The regularisation coefficient is 0.0005. The mini-batch size is 32. The weights are initialised to random numbers drawn from a Gaussian distribution whose mean is zero and variance is 0.1. The biases are initialised to zero.
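The settings above can be collected in one place, together with the stated initialisation. This is a framework-independent sketch: the dictionary keys, the function name `init_conv_params` and the weight layout (kernel size, input channels, output channels) are assumptions. Note that a variance of 0.1 corresponds to a standard deviation of about 0.316, which is the value a Gaussian sampler usually expects.

```python
import numpy as np

# Hyperparameters as stated in the text; key names are illustrative only.
HYPERPARAMS = {
    "optimiser": "adam",
    "learning_rate": 0.003,
    "regularisation_coefficient": 0.0005,
    "batch_size": 32,
}


def init_conv_params(kernel_size, in_channels, out_channels, rng, variance=0.1):
    """Zero-mean Gaussian weights with the given variance, and zero biases."""
    std = np.sqrt(variance)  # samplers take the standard deviation
    weights = rng.normal(0.0, std, size=(kernel_size, in_channels, out_channels))
    biases = np.zeros(out_channels)
    return weights, biases
```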

| Test I
The samples are randomly divided into training, verification and test sets according to a 6:2:2 ratio. The training set is used to train the CNN, and the early termination method is used to stop the training at 12,000 iterations. The average accuracy of the test set is 89.74%, as shown in Table 2.
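A random 6:2:2 division can be sketched with a shuffled index array. The function name `split_indices` and the rounding of the subset sizes are assumptions; whether the authors split per working condition or over the whole database is not stated, so the sketch simply splits whatever index range it is given.

```python
import numpy as np


def split_indices(n_samples, rng, ratios=(0.6, 0.2, 0.2)):
    """Shuffle sample indices and cut them into train, verification and test parts."""
    order = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```

Because the three parts are slices of one permutation, every sample lands in exactly one subset.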
The results show that the accuracy is above 95% under most working conditions, which indicates that the method can accurately diagnose three fault attributes at the same time.
However, the lowest accuracy is only 21%. Observation of the vibration signals shows that, in some of the working conditions with low accuracy, the load has little influence on the vibration waveform, so the network has difficulty distinguishing the loads. As shown in Table 3, when only the fault location and degree are considered, the average accuracy reaches 98.96%, which is significantly higher than the accuracy of the three-attribute diagnosis (89.74%). This indicates that the errors come mainly from the load being diagnosed incorrectly. In particular, for a rolling element fault with 28 mils fault degree, the accuracy is much lower than the average accuracy. When the load is not considered, its accuracy is up to 100%, which clearly shows that bearing load affects the diagnosis. Further analysis of the vibration waveform under these conditions finds that the waveform is almost the same under different loads. In short, except for the conditions in which the load has little influence on the vibration waveform, the network can simultaneously and accurately diagnose three fault attributes.
Note that the average accuracy of the test set is 89.74% (three attributes) or 98.96% (fault location and degree only). Although it is not 100%, these are per-sample accuracies. In the actual test, multiple samples can be created, and the label with the most occurrences is used as the diagnostic result of the bearing. Therefore, the probability of misjudgement can be close to zero.
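The voting step and its effect on the error probability can be illustrated as follows. The estimate assumes that the samples are classified independently with the same per-sample accuracy p, which real samples from one recording may not satisfy, and it requires a strict majority of correct samples, which is sufficient but not necessary for the vote to succeed. It should therefore be read as a rough, assumption-laden indication rather than a guarantee; both function names are hypothetical.

```python
from collections import Counter
from math import comb


def vote(labels):
    """Return the multi-label that occurs most often among per-sample predictions."""
    counts = Counter(tuple(label) for label in labels)
    return list(counts.most_common(1)[0][0])


def majority_correct_probability(p, n):
    """Probability that more than half of n independent samples are correct,
    when each sample is correct with probability p (binomial model)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))


result = vote([[2, 3, 1], [2, 3, 2], [2, 3, 1]])
estimate = majority_correct_probability(0.8974, 11)
```

Under this model the probability of a correct vote grows quickly with the number of samples, which is the reasoning behind the statement that misjudgement can be made very unlikely.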
The test set and training set are from the same sample data, which means the working condition of the test set has been learnt by the CNN during training. In the actual test, this is impossible because the actual test data is different from the training data. To avoid the above problem, test Ⅱ is designed. In test Ⅱ, only fault location and fault degree are considered, and load is left out, which is in line with the actual situation. In fault diagnosis, the fault location and fault degree are often unknown, so they must be quantitatively diagnosed. The load, by contrast, can be calculated from the working state, and it often does not need to be diagnosed at all.

| Test Ⅱ
As shown in Table 4, T denotes the test set, and Tr/V denotes data that are randomly divided into the training set and verification set according to an 8:2 ratio. The early termination method is used to stop the training at 10,000 iterations. The accuracy of the 12 test sets is shown in Table 5.

FIGURE 5 The convolutional neural network structure for qualitative diagnosis

TABLE 2 Accuracy of bearing fault diagnosis. A represents fault location and degree. B represents accuracy. C represents load

Unlike test Ⅰ, the working conditions of test Ⅱ do not appear in the training data, which means that the CNN has not learnt such working conditions. However, the proposed quantitative diagnosis method still achieves an average accuracy of up to 96.3%. This further indicates that the method has a very strong generalisation ability.

| Comparison with other diagnostic methods
We select several other fault diagnosis methods for comparison in Table 6, all of which use samples from Case Western Reserve University. The comparison methods include ELM and CNN. The methods of Tian, Yan and Rodriguez use only two fault attributes, which means they ignore the load or the fault degree. However, they consider detailed fault types for each fault attribute (the number of classes is more than 10). In contrast, Ding's method contains only six fault types but has three fault attributes. Evidently, that method treats combinations of some of the fault attributes as a whole. In addition, these methods share a common limitation: their satisfactory accuracy is based on a small sample. By contrast, we build a complete data set that includes 3 fault attributes and 48 combinations of different fault types. The large number of samples improves the generalisation and robustness of the diagnosis model, although it leads to a small decrease in accuracy for some hard working conditions (the reason has been introduced in Section 4.4).
In order to further verify the generalisation and robustness of our method, data from our laboratory are used as tests in the next section (the accuracy is up to 99.6%). Because most of the above comparison methods do not use actually collected data for testing (and the others lack complete data), it is difficult to make a direct comparison with ours, which indirectly indicates our advantage in generalisation and robustness.

| EXAMPLE OF QUANTITATIVE DIAGNOSIS OF BEARING FAULTS IN THE LABORATORY
The accuracy of diagnosis can be improved by collecting vibration signals through multiple sensors. In traditional methods, the vibration data of each sensor are usually processed separately and then considered together, so the process is difficult and the effect is not satisfactory. For the CNN, however, the multi-dimensional data of multiple sensors can be used directly as the input, which simplifies the process and improves the diagnosis effect. The quantitative diagnosis of the bearing fault data from our laboratory verifies the advantages of the proposed method for data fusion of multiple sensors.

| Laboratory bearing data
The device in our laboratory is shown in Figure 6. Sensor A collects the axial vibration signal, which is relatively weak. Sensor B collects the radial vibration signal, which is stronger.

| Creation of the bearing data set
The sampling frequency is 5120 Hz. The speed is 1800 rpm, and the data cycle is 170. According to the principle of k × 2^n, the data length is selected as 256. The vibration signals are shown in Figure 7d-f. The experimental bearing data include normal bearing, inner ring fault and outer ring fault, with loads of 0, 1 and 2 hp. It is worth noting that there is only 0 hp for the normal bearing. Two acceleration sensors are used to collect the axial and radial vibration data. The sample data size is 256 × 1 × 2, where 2 represents the data of the two sensors. 600 samples are collected for each working condition, which gives a total of 4200 samples. The distribution of samples is shown in Table 7. The samples are randomly divided into the training set, verification set and test set according to the ratio of 6:2:2.
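The laboratory data set layout can be checked with a short sketch. It enumerates the working conditions described above and stacks one axial and one radial window into a single two-channel sample. The function names and the channel order (axial first, radial second) are assumptions made for illustration.

```python
import numpy as np


def lab_conditions():
    """Enumerate (fault location, load in hp): normal bearing at 0 hp only,
    inner and outer ring faults at 0, 1 and 2 hp."""
    conditions = [("normal", 0)]
    for location in ("inner ring", "outer ring"):
        for load in (0, 1, 2):
            conditions.append((location, load))
    return conditions


def stack_sensors(axial, radial):
    """Combine two equal-length 1-D windows into one sample of shape (H, 1, 2)."""
    axial = np.asarray(axial, dtype=float)
    radial = np.asarray(radial, dtype=float)
    if axial.ndim != 1 or axial.shape != radial.shape:
        raise ValueError("both windows must be 1-D and of equal length")
    return np.stack([axial, radial], axis=-1)[:, None, :]


n_conditions = len(lab_conditions())
n_samples = 600 * n_conditions
sample = stack_sensors(np.zeros(256), np.ones(256))
```

The enumeration yields 7 working conditions, so 600 samples per condition gives the 4200 samples stated above, and each stacked sample has the 256 × 1 × 2 size used as network input.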