Dimension reduction graph‐based sparse subspace clustering for intelligent fault identification of rolling element bearings

Sparse subspace clustering (SSC) is a spectral clustering methodology. Since high-dimensional data are often dispersed over the union of several low-dimensional subspaces, their representation in a suitable dictionary is sparse. SSC is therefore an effective technique for diagnosing mechanical system faults. Its main purpose is to build a representation model that can reveal the real subspace structure of high-dimensional data, construct a similarity matrix from the sparse representation coefficients of the data, and then cluster the representation coefficients and similarity matrix in the subspace. However, SSC is designed around a global representation in which each data point is represented by data points from all possible clusters. This produces nonzero entries in the off-diagonal blocks of the similarity matrix, which degrades its recognition performance. To improve the clustering ability of SSC for rolling bearings and the robustness of the algorithm under strong background noise, a simultaneous dimensionality-reduction subspace clustering method is proposed in this work. After feature extraction from the envelope signal, the dimension of the feature matrix is reduced by singular value decomposition, and the Euclidean distance between samples is replaced by the correlation distance. A dimension reduction graph-based SSC method is thus established. Simulations and the bearing data of Western Reserve University show that the proposed algorithm improves the accuracy and compactness of clustering.

and efficient means of achieving intelligent machine diagnostics. 4 Methods for identifying rolling bearing faults using vibration signals may be categorized as supervised classification and unsupervised clustering.
Supervised classification is different from unsupervised classification.
It requires class labels and trains a classifier according to the data and class labels. 5,6 Currently, convolutional neural networks (CNN), support vector machines (SVM), artificial neural networks (ANN), deep autoencoders (DAE), and other machine learning models are widely utilized for pattern recognition. Numerous intelligent fault diagnostic techniques have been developed on this premise, with some success. 7,8 Although these techniques are extensively utilized in the detection of mechanical faults, they have certain inherent disadvantages. 9 For instance, the final diagnostic ability of SVM and ANN relies mostly on feature extraction and advanced signal processing techniques, 10,11 and the diagnostic accuracy of CNN and DAE is likewise highly dependent on numerous labeled samples. 12 However, a shortage of labeled training data is a frequent occurrence in actual industrial applications, and acquiring accessible labeled samples is often time-consuming. As a result, the development of a new intelligent diagnostic technique for rolling bearings has become a priority. 13,14 Data clustering is an extremely powerful analytical technique in machine learning and data mining. 15 Different from the above classification techniques, 16 clustering divides an object collection into multiple classes made up of related items. It is an effective method for clustering fault data and distinguishing data associated with a particular fault type from data associated with other fault types. It provides a flexible approach that requires neither failure data nor prior knowledge of fault types. The method is also viable in the absence of adequate labeled fault samples for supervised training. Clustering algorithms have been developed for many years. For instance, k-means clustering and fuzzy c-means clustering are used for fault identification in wind turbines. 
17,18 To assess the noise and vibration of automobiles, the unsupervised hierarchical clustering technique is employed. 19 An intelligent signal processing approach for bearing failure detection has been presented based on affinity propagation clustering of vibration signals. 13 Spectral graph theory is a major topic in the field of graph theory. 20 A Laplacian matrix, with its eigenvalues and eigenvectors, is used to describe the properties of graphs. 21 A graph is a data structure made up of vertices and the connections between them (i.e., a set of edges). The most often used clustering techniques are feature-based, using the signal feature matrix as input.
Clustering performance is governed by the recognition information contained in the input features. However, because the similarity information between features is mined incompletely, the feature matrix composed of signal features cannot be utilized for optimal clustering.
For example, the block diagonal affinity matrix of the graph is constructed by adaptive probabilistic neighborhood learning. At the same time, a flexible embedding scheme is used to reveal the inherent cluster structure in the low-dimensional subspace, so as to effectively suppress the irrelevant information and noise in high-dimensional data. 22 Multiple features are used to learn a robust affinity matrix; the distance in the low-dimensional space and the affinity weights are optimized jointly to deal with redundant high-dimensional features. 23 Sparse subspace clustering (SSC) may utilize an optimization technique to exploit similarity information and produce an optimal sparse clustering graph, 24 which can help address the problems mentioned above. 25 The sparse coefficients corresponding to the clustering data are presented in the sparse graph in block form, in which coefficients for data of the same cluster lie in the diagonal blocks and those for data of different clusters in the off-diagonal positions. 26 Since the block diagonal structure of the sparse graph includes the recognition information for clustering, it is possible to create an optimized sparse graph by using the optimization technique for solving SSC. 27 However, conventional SSC is a spectral clustering subspace learning method based on global linear representation, and the global approach will discover similarities between the target data and all other types of data. This method generates nonzero sparse coefficients between the target data of a particular cluster and data from other clusters, and thus does not create tight block diagonal graphs. To address this issue, Ref. proposes a composite graph-based SSC (CG-SSC) method. The objective function of sparse representation incorporates distance information between data, and the coefficient is made inversely proportional to the distance to modify the amplitude of the sparse coefficient.
In this paper, based on the previous research, we propose a sparse subspace representation algorithm to overcome the shortcomings of traditional algorithms, such as poor robustness and compactness. First, the envelope of the sample data is computed to increase the signal-to-noise ratio (SNR) of the sampled signal. Then, to accomplish the goal of dimensionality reduction, the time- and frequency-domain indexes of the sample data are computed for feature extraction. Following that, singular value decomposition (SVD) is used to reduce the dimension of the feature matrix and select the most important features. When constructing the objective function of sparse representation, the correlation distance between data, rather than the Euclidean distance, is introduced to adjust the amplitude of the sparse coefficient. If the distance between two data points is large, the amplitude of the corresponding sparse coefficient is restricted to a narrow range. A strict clustering block diagonal graph is constructed via this technique. For a specific data cluster, the eigenvector corresponding to the strict block diagonal graph is a constant without amplitude fluctuation. Compact data clusters are produced when these eigenvectors are used for data clustering, and clustering errors are minimized as a result.
The structure of this work is as follows: In Section 2, the theoretical basis is briefly introduced and the algorithms of dimension reduction graph (DRG)-based SSC are discussed. In Section 3, through the design of simulated signals similar to actual working conditions, it is verified that this method has clear advantages in clustering tasks under different SNRs. Then, using the bearing data of Western Reserve University, it is shown that the performance of this method in clustering accuracy and compactness is better than that of traditional algorithms. Finally, Section 4 presents conclusions and discussion.

| THEORETICAL BASIS AND ALGORITHM INTRODUCTION
The primary goal of bearing fault diagnosis is to distinguish between the data associated with a specific bearing state and the data associated with other bearing states, which is known as discrimination.
Spectral graph theory 20 is a significant area of study in graph theory, with the goal of characterizing graphs in terms of the eigenvalues and eigenvectors of the Laplacian matrix. A graph data structure consists of vertices and the links between them (i.e., a set of edges). The correlation coefficient is a statistical technique for determining the degree of connection between two data samples. Correlation coefficients lie between −1 and 1, and the absolute value of the correlation coefficient indicates the degree to which the two samples are correlated.
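As a quick illustration of the correlation coefficient at its extremes, the following sketch (using NumPy's `corrcoef`, with made-up sample vectors) shows a perfectly correlated and a perfectly anti-correlated pair:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])   # b = 2a: perfectly correlated with a
c = np.array([4.0, 3.0, 2.0, 1.0])   # reversed: perfectly anti-correlated

rho_ab = np.corrcoef(a, b)[0, 1]     # -> 1.0
rho_ac = np.corrcoef(a, c)[0, 1]     # -> -1.0
```

Both pairs have the maximum absolute correlation of 1, even though their signs differ, which is why the absolute value measures the degree of association.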
To extract fault features, 29,30 identify abnormalities, 31 and recognize machine structures with forces, 32 sparse representation is employed in signal processing. Sparse subspaces combined with the capacity to recognize signal patterns are an excellent technique for health monitoring. 33 It is possible to cluster data from various bearing states into specific subspaces using the SSC described in Ref., where each cluster represents a different bearing state and the distance between clusters identifies the defect. The goal of SSC is to find a subspace suitable for segmenting and sorting data with unknown class labels. A sparse global similarity matrix is generated to locate the subspace. This matrix represents the similarity between data samples taken from the same mechanical state.
The ability to construct a similarity matrix from a variety of perspectives has been shown to improve the effectiveness of SSC. In SSC, each data sample $y_i$ is expressed as a sparse linear combination of the other samples in the data matrix $Y$, as in Equation (1):

$$y_i = Y \alpha_i, \quad \alpha_{ii} = 0, \tag{1}$$

where $\alpha_i$ denotes the sparse coefficient vector. Minimize the $L_1$ norm of the sparse coefficient $\alpha$ to generate an objective function, 36 as in Equation (2):

$$\min_{\alpha_i} \|\alpha_i\|_1 \quad \text{s.t.} \quad y_i = Y\alpha_i + N, \tag{2}$$

where $N$ represents the noise signal. We try to find the best sparse subspace to construct the similarity matrix. The above objective function can be solved after the following transformation into an unconstrained form, as in Equation (3):

$$\min_{\alpha_i} \|\alpha_i\|_1 + \lambda \|y_i - Y\alpha_i\|_2^2. \tag{3}$$

The sparse coefficients in the above objective function can be obtained by optimization, and the resulting coefficients are assembled into the similarity matrix of the data set. A graph is usually described by its set of vertices and set of edges; the similarity matrix represents the graph structure between the data samples.
A zero value indicates that two corresponding samples are not connected, and a nonzero value indicates that they are connected; each sample corresponds to a vertex in the graph structure, and the amplitudes of the nonzero elements represent the weights of the edges connecting the vertices. Due to the sparsity of the similarity matrix, the majority of the matrix entries, which correspond to the edge weights of the graph, are zero. Because of the absence of connections between most vertices, the graph created from the similarity matrix is sparse and is referred to as a sparse graph.
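To make the vertex/edge reading of the similarity matrix concrete, here is a minimal sketch with a hypothetical sparse coefficient matrix for four samples forming two clusters; the coefficient values are invented for illustration, and the symmetrization step follows the common SSC convention:

```python
import numpy as np

# Hypothetical sparse coefficient matrix for 4 samples (two clusters of two).
alpha = np.array([
    [0.0, 0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.6, 0.0],
])

# Symmetrize to get the similarity (adjacency) matrix of the sparse graph.
S = np.abs(alpha) + np.abs(alpha).T

# Edges exist only where S is nonzero; most entries are zero, so the graph is sparse.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if S[i, j] > 0]
# -> [(0, 1), (2, 3)]: one edge inside each cluster, none between clusters.
```

The zero off-diagonal blocks mean the two clusters are disconnected components of the graph, which is exactly the block diagonal structure the method aims for.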

| DRG-based SSC
The Hilbert transform can demodulate the collected modulated signal. First, the sample data are transformed by the Hilbert transform, and the envelope is obtained as in Equation (4):

$$e(t) = \sqrt{x^2(t) + H[x(t)]^2}, \tag{4}$$

where $H[\cdot]$ is the Hilbert transform operator.
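A minimal sketch of this envelope calculation, implementing the Hilbert transform via the FFT in plain NumPy; the test signal is an invented amplitude-modulated carrier whose modulation the envelope should recover:

```python
import numpy as np

def envelope(x):
    """Envelope via the analytic signal z(t) = x(t) + j*H[x(t)]:
    zero out negative frequencies, double positive ones, take |ifft|."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    z = np.fft.ifft(X * h)   # analytic signal
    return np.abs(z)

# Amplitude-modulated carrier: 100 Hz carrier, 5 Hz modulation.
t = np.linspace(0, 1, 1000, endpoint=False)
x = (1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 100 * t)
env = envelope(x)            # recovers 1 + 0.5*cos(2*pi*5*t)
```

Because all spectral content of this test signal sits at positive frequencies well below Nyquist, the computed envelope matches the modulation essentially to machine precision.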
When calculating the similarity matrix in SSC, a global approach is often used over the data samples $y_i$, $i = 1, 2, 3, \ldots, n$. To circumvent this constraint, the dimension of the feature matrix is reduced using SVD, and the most significant $k$ orders are utilized as the input for SSC, 37 as in Equations (5) and (6):

$$Y = U \Sigma V^{T}, \tag{5}$$

$$Y_k = U_k \Sigma_k V_k^{T}, \tag{6}$$

where $U_k$, $\Sigma_k$, and $V_k$ retain the $k$ largest singular values and their corresponding singular vectors. In current research, 34 when the Euclidean distance is used to measure similarity in the presence of noise, similarity information will be seriously lost. This paper uses the correlation distance $D(y_i, y_j)$ instead of the Euclidean distance to measure similarity, as in Equations (7) and (8):

$$\rho_{y_i y_j} = \frac{\mathrm{Cov}(y_i, y_j)}{\sqrt{D(y_i)}\sqrt{D(y_j)}}, \tag{7}$$

$$D(y_i, y_j) = 1 - |\rho_{y_i y_j}|, \tag{8}$$

where $E(\cdot)$, $D(\cdot)$, and $\mathrm{Cov}(\cdot)$ are the mean, variance, and covariance of the data sets, respectively, and $\rho_{y_i y_j}$ is the correlation coefficient. Second, the data correlation distance is introduced into the construction process of the similarity matrix to control the amplitude of the sparse coefficients. Therefore, the objective function in Equation (2) can be rewritten as in Equation (9):

$$\min_{\alpha_i} \sum_{j} D(y_i, y_j)\, |\alpha_{ij}| \quad \text{s.t.} \quad y_i = Y\alpha_i + N. \tag{9}$$

Minimizing the product of the correlation distance $D(y_i, y_j)$ and the sparse coefficient makes it possible to limit the sparse coefficient.
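A sketch of these two preprocessing ingredients, assuming a feature matrix with samples in rows and assuming the correlation distance takes the form 1 − |ρ| (the exact form used in the paper may differ):

```python
import numpy as np

def reduce_svd(F, k):
    """Project feature matrix F (samples x features) onto its
    k most significant right singular vectors."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return F @ Vt[:k].T

def corr_distance(yi, yj):
    """Assumed correlation distance D = 1 - |rho|,
    rho = Cov(yi, yj) / sqrt(Var(yi) * Var(yj))."""
    rho = np.corrcoef(yi, yj)[0, 1]
    return 1.0 - abs(rho)

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 6))
F2 = reduce_svd(F, 2)               # 10 samples, 2 retained dimensions
a = np.array([1.0, 2.0, 3.0, 4.0])  # any linearly related pair has distance 0
```

Note that the correlation distance is invariant to scaling and offset of the samples, which is the source of the claimed robustness to amplitude-type noise compared with the Euclidean distance.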
Equation (9) is written in Lagrangian notation as in Equation (10):

$$\min_{\alpha_i} \sum_{j} D(y_i, y_j)\, |\alpha_{ij}| + \lambda \|y_i - Y\alpha_i\|_2^2. \tag{10}$$

To improve the effect of SSC, an ideal sparse coefficient must be obtained before the similarity matrix can be constructed, and a constraint is added to the sparse coefficient. Therefore, constraints are added to the objective function as in Equation (11):

$$\min_{\alpha_i} \|\alpha_i\|_1 + \beta \sum_{j} D(y_i, y_j)\, |\alpha_{ij}| + \lambda \|y_i - Y\alpha_i\|_2^2 \quad \text{s.t.} \quad \alpha_{ii} = 0, \tag{11}$$

where $\beta$ is used in Equation (11) to denote a trade-off between the sparse vector $\alpha_{ij}$ and the data distance $D(y_i, y_j)$. When the distance $D(y_i, y_j)$ between two data points $y_i$ and $y_j$ is large, the sparse coefficient $\alpha_{ij}$ will be tiny. This is the primary distinction between the technique described in this paper and conventional SSC, which relies only on global sparse information for clustering. Therefore, we call this method the DRG-based SSC.
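A distance-weighted L1 term of this kind can be handled with a standard proximal-gradient (ISTA) iteration, whose soft-threshold simply becomes coefficient-wise. The sketch below is not the paper's actual solver; it minimizes a generic weighted objective Σ_j w_j|α_j| + λ‖y − Yα‖², where the weights w_j would be the correlation distances:

```python
import numpy as np

def weighted_ista(Y, y, w, lam=0.1, n_iter=500):
    """Minimize sum_j w_j*|a_j| + lam*||y - Y a||_2^2 by proximal
    gradient descent with a per-coefficient soft-threshold."""
    n = Y.shape[1]
    a = np.zeros(n)
    step = 1.0 / (2.0 * lam * np.linalg.norm(Y, 2) ** 2)  # 1/Lipschitz
    for _ in range(n_iter):
        grad = -2.0 * lam * Y.T @ (y - Y @ a)     # gradient of smooth part
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - step * w, 0.0)
    return a

# With Y = I and unit weights, the solution is the soft-thresholded input:
a_hat = weighted_ista(np.eye(3), np.array([1.0, 0.5, 0.0]),
                      np.ones(3), lam=10.0)
# -> approximately [0.95, 0.45, 0.0] (threshold 1/(2*lam) = 0.05)
```

Larger weights (i.e., larger correlation distances) shrink the corresponding coefficients harder, which is exactly the mechanism that suppresses links between samples from different clusters.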
The similarity matrix $S$ is built as in Equation (12), utilizing the sparse coefficients of all the data:

$$S = |\alpha| + |\alpha|^{T}. \tag{12}$$
The above similarity matrix defines a DRG of the data to be represented. The weight of each edge in the DRG reflects the amplitude of the corresponding element, and this weight is utilized to assess the similarity of the two data samples. Normalized spectral clustering is applied to the similarity matrix to obtain a graph partition. 21 Its objective function is expressed in terms of eigenvalues, as in Equation (13):

$$L u = \kappa u, \tag{13}$$

where the diagonal elements of $D$ equal the sums of the corresponding rows of the similarity matrix, i.e., $D_{ii} = \sum_{j} S_{ij}$; $L = I - D^{-1/2} S D^{-1/2}$ is a Laplacian matrix; and $u$ and $\kappa$ represent the eigenvectors and eigenvalues, respectively. The eigenvectors $u$ are a low-dimensional representation of the original data and may be used to identify the clusters with techniques such as k-means.
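A compact NumPy sketch of this normalized spectral clustering step (eigendecomposition of L plus a small k-means on the eigenvector rows); the similarity matrix in the example is a toy two-cluster case, not bearing data:

```python
import numpy as np

def spectral_cluster(S, k):
    """Normalized spectral clustering: L = I - D^{-1/2} S D^{-1/2},
    embed with the k smallest eigenvectors, cluster rows by k-means."""
    n = len(S)
    d = S.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(n) - d_is[:, None] * S * d_is[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues ascending
    U = vecs[:, :k]                         # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-point initialization, then Lloyd iterations.
    idx = [0]
    for _ in range(1, k):
        dmin = np.min(((U[:, None] - U[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(dmin)))
    centers = U[idx].copy()
    for _ in range(50):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Two disconnected pairs: samples {0,1} and {2,3} form two clusters.
S = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = spectral_cluster(S, 2)
```

For a perfectly block diagonal similarity matrix, as here, the embedding rows of samples in the same block coincide, so k-means separates the clusters exactly.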

| The specific fault diagnosis process of the proposed method
The envelope of the vibration signal is used as the sample input. First, the features of the sample signal are extracted. Then, the correlation distance is used to evaluate the similarity between samples, and SVD is used to reduce the dimension of the input sample features.
In this paper, an intelligent fault identification method based on reduced dimension graph SSC is proposed. Figure 1 shows the flowchart of the proposed method. The specific diagnosis process of the algorithm is as follows:

Extraction of characteristics
First, the envelope of the vibration signal is calculated to reduce the influence of noise on feature extraction. Then, according to the 29 calculation indexes given in Table 1, the time- and frequency-domain features of the envelope signal are extracted to form the feature matrix.
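A sketch of a few representative time-domain condition indexes of the kind typically collected in such index tables; the actual 29 indexes of Table 1 are not reproduced here, these are common examples only:

```python
import numpy as np

def time_domain_indexes(y):
    """A handful of common time-domain condition indexes:
    mean, RMS, kurtosis, skewness, peak, and crest factor."""
    mean = y.mean()
    rms = np.sqrt(np.mean(y ** 2))
    std = y.std()
    kurtosis = np.mean((y - mean) ** 4) / std ** 4
    skewness = np.mean((y - mean) ** 3) / std ** 3
    peak = np.max(np.abs(y))
    crest_factor = peak / rms
    return np.array([mean, rms, kurtosis, skewness, peak, crest_factor])

# Sanity check on a pure sine: RMS = 1/sqrt(2), kurtosis = 1.5, crest = sqrt(2).
t = np.linspace(0, 1, 1000, endpoint=False)
feats = time_domain_indexes(np.sin(2 * np.pi * 5 * t))
```

Impulsive fault signatures raise kurtosis and crest factor well above these sine-wave baselines, which is why such indexes carry fault discrimination information.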

Graph creation and sparse representation
In the coefficient solving stage, the objective function in Equation (11) is constructed and solved to obtain the sparse coefficients α i . Then, the adjacency distance matrix is constructed by graph theory, and the similarity matrix S and degree matrix D are constructed from this matrix.

Clustering
According to the similarity matrix and degree matrix, the normalized spectral clustering of Equation (13) is performed to obtain the cluster labels. The input is a feature matrix with n data vectors and m features. In the paper, it is assumed that the quantity of data samples from each class is equal. Let c be the total number of classes and n/c the number of data samples from each class. To create an n × n distance matrix D(y i , y j ), the pairwise distance between a vector y i and an arbitrary vector y j is calculated. This matrix constrains the sparse representation used to construct the DRG-SSC objective function, as in Equation (11). The objective function is solved to provide sparse coefficients for all vectors, and the similarity matrix is then built using these coefficients. Table 1 summarizes the procedure for conducting DRG-SSC for machine failure diagnostics, and Figure 1 depicts the operation flowchart.

| Case I: Simulation analysis
Local pitting of the inner and outer rings is a common symptom of early failure for rolling bearings. When the fault part of the rolling bearing contacts other parts, it will produce a transient impact.
Because of the periodic motion of the rotating parts and random fluctuations, the simulated fault signal is constructed as a series of modulated impulse responses, as in Equations (15) and (16):

$$A(t) = A_0 \cos(2\pi f_A t + \varphi_A) + C_A, \tag{15}$$

$$s(t) = e^{-\zeta t} \sin(2\pi f_n t), \tag{16}$$

with $\zeta$ being the coefficient of resonance damping attenuation, $f_n$ the natural frequency related to the bearing or system, $\varphi_A$ and $C_A$ arbitrary constants, and $f_A$ the modulation frequency. 40 The time-domain indexes are computed from the signal series $y_i$, where $N$ signifies the number of points in the signal; the frequency-domain indexes are computed from the frequency spectrum $y(f_j)$, where $M$ denotes the length of the series and $f$ is the frequency component. In such a configured environment, the computational times of the different clustering methods under different noise levels are shown in Figure 8. It can be concluded that PCA has the shortest running time and SSC the longest. However, the running time of the proposed method is also relatively long and not yet ideal.
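A simulated bearing fault signal of this type can be sketched as follows; all parameter values (natural frequency, damping, fault frequency, modulation constants, SNR) are invented for illustration and are not the paper's settings:

```python
import numpy as np

# Hypothetical simulation parameters (illustrative only).
fs = 12000          # sampling frequency (Hz)
fn = 3000           # natural (resonance) frequency (Hz)
zeta = 800          # damping attenuation coefficient
f_fault = 100       # fault characteristic frequency -> impulse period 1/f_fault
f_A, phi_A, C_A = 30, 0.0, 1.0   # modulation frequency and constants
T = 1.0             # signal duration (s)

t = np.arange(0, T, 1.0 / fs)
x = np.zeros_like(t)
for k in range(int(T * f_fault)):          # one decaying impulse per fault period
    t0 = k / f_fault
    tau = t[t >= t0] - t0
    A_k = np.cos(2 * np.pi * f_A * t0 + phi_A) + C_A   # amplitude modulation
    x[t >= t0] += A_k * np.exp(-zeta * tau) * np.sin(2 * np.pi * fn * tau)

# Additive Gaussian noise at a chosen SNR (0 dB here).
snr_db = 0
noise_power = np.mean(x ** 2) / (10 ** (snr_db / 10))
rng = np.random.default_rng(0)
y = x + rng.normal(0.0, np.sqrt(noise_power), len(x))
```

Varying `snr_db` reproduces the kind of noise sweep used to compare clustering methods under different noise standards.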

| Case II: Experimental study
To test the bearing with different types of faults, the bearing data of Western Reserve University are used.

| CONCLUSION AND DISCUSSION
In this paper, we propose a DRG-based SSC method. When measuring the similarity between samples, the correlation distance rather than the traditional Euclidean distance is used, which improves the robustness of the algorithm with respect to noise and interference. The correlation distance is integrated into the coefficient solution stage of SSC. The correlation distance constraint makes the similarity matrix constructed from the coefficients have a strict diagonal structure, increases the distance between clusters, reduces the distance within clusters, and finally achieves high cluster compactness. Compared with the spectral clustering, sparse subspace clustering, and PCA algorithms on both simulation and experimental sample data, the proposed method shows strong robustness, improved clustering compactness, and high accuracy.
Although the proposed algorithm has achieved good clustering results, it still has some limitations. For example, its computational time is relatively long, and the robustness of the method needs further improvement for a wider range of applications.

ACKNOWLEDGMENTS
The present work is supported by the National Key R&D Program