Retracted: Manifold learning‐based fuzzy k‐principal curve similarity evaluation for wind turbine condition monitoring

As wind energy is experiencing an unprecedented development in today's world, the condition monitoring of wind turbine systems which can avoid serious accidents and economic losses are gathering more and more attentions. Considering the evaluation methods based on machine learning are complicated and unstable in terms of model training and parameter selection, this paper proposed a novel assessment algorithm based on similarity analysis of fuzzy k‐principal curves (FKPCs) in manifold space. Initially, 38 fusion features containing time‐domain, frequency‐domain, and wavelet node energy features are extracted from the vibration signal of the wind turbine bearing. Then, the nonlinear local algorithm Laplacian eigenmaps was introduced to transform the original features to the projected lower dimensional space and obtain the more typical parameters. Combining fuzzy clustering with the k‐principal curve creatively, smooth principal curves of the feature sets were then extracted according to point distribution in three‐dimensional manifold space. Finally, the Hausdorff distance algorithm was employed to calculate the similarity between the principal curves of healthy and test datasets to assess the running condition of wind turbine. To validate the present method, the experiments of accelerated bearing degradation were carried out. A series of comprehensive and persuasive comparisons and analysis with different feature conversion methods (principal component analysis [PCA] and Isomap), different distance analysis algorithms (Euclidean distance and Mahalanobis distance), and different assessment models (hidden Markov model [HMM] and deep belief network [DBN]) verified that the proposed FKPC technique can monitor the operating status of the machine more accurately without complex model training and parameter selection processes.


| INTRODUCTION
As the problem of environmental pollution and resource depletion becomes increasingly serious all over the world, wind power, water power, and some other clean energy has been experiencing an ever rapid development, among which wind power that produces no known severe atmospheric pollution has gained a lot of attention. However, since usually being located in remote areas such as the desolate steppes and seacoasts with harsh environment, the relatively high cost of operation and maintenance of wind power systems is hindering its advancement. 1 Wind turbines, the most important parts of wind power generation system, are also the most vulnerable parts due to the complicated internal constitution, especially for the key transmission components such as the gearbox and bearings. However, in practice, online condition monitoring of wind power systems often relies on datasets contaminated with outliers. 2 In view of the large amount of expenses of maintenance when wind turbines break down, running condition monitoring and fault diagnosis under certain circumstances are of tremendous benefit for this issue.
In recent years, traditional periodical maintenance of mechanical equipment has gradually been transforming to condition-based maintenance (CBM). As the prerequisite for a reasonable maintenance strategy, the assessment methods of performance degradation received widespread attention and research interest. 3 Local mean decomposition (LMD) technology, 4 stacked multilevel-denoizing autoencoders (SMLDAEs) were all developed for wind turbine gearbox fault diagnosis. 5 And the recently proposed iterative atomic decomposition thresholding (IADT) method which can extract true constituent components of complex signals and suppress background noise interferences also succeeded in fault diagnosis of wind turbine. 6 As the statistical theory and artificial intelligence are becoming the most widely adopted methods for fault diagnosis and condition monitoring of mechanical equipment, 7 researchers in this field have achieved a wealth of results. A local mean decomposition method, 8 the multimodal deep support vector classification, 9 and the synchronously averaged electric motor signals 10 were applied for gearbox fault diagnosis. The method combining the hidden Markov model (HMM) with Mahalanobis distance successfully reflected the running state of rolling bearings. 11 In addition, a novel prediction method based on least squares support vector machine (LS-SVM) was presented to estimate the degradation trend of a slewing bearing with small sample data. 12 More recently, in a new area of machine learning, deep learning algorithms have successfully been applied in the related fields, too. For example, a novel hierarchical diagnosis network that can collect deep belief networks (DBNs) by layer was proposed for the hierarchical identification of mechanical systems. 13 However, although have seen great developments in both theoretical research and practical applications, statistical methods and artificial intelligence still require a large amount of data for training the mathematical models, 14 and the evaluation results are very sensitive to the parameters of the model. 15,16 The signal features extracted from mechanical equipment for performance evaluation are usually high-dimensional, nonlinear, information-redundant, and mutual coupling. 17 Manifold learning is a real low-dimensionality reduction method that can extract the real low-dimensional manifold structure of target data in the original observation space. 18 As one of the classical manifold learning algorithms, Laplacian Eigenmaps (LE) algorithm has been successfully applied to dimensionality reduction and data representation. 7 In contrast, the principal manifold describes the original dataset through constructing an embedded manifold structure (curve or surface) by using a geometric interpretation of principal component analysis (PCA) and approximating the curve or surface to the data distribution and curvature degree of the manifold. 19 The Hausdorff algorithm is generally used to evaluate the similarity among curves, which has been widely studied in the field of object matching and recognition. Also, an improved Hausdorff algorithm with properties of rotation and scale invariance has achieved successful image comparison compared with Mahalanobis distance and Euclidean distance. 20 In view of the deficiencies about model training and parameter selection of statistics and artificial intelligencebased evaluation techniques, this paper proposed a novel assessment method of analyzing the similarity of fuzzy kprincipal curves (FKPCs) in manifold space. The remainder of this paper is organized as follows: Section 2 describes the feature extraction in time domain, frequency domain, and wavelet packet node energies. In addition, using Laplacian Eigenmaps for manifold space projection is explained in detail. In Section 3, the novel FKPC method is proposed and Hausdorff distance is employed to assess the similarity of curves. In Section 4, the experiments comparing various accelerated bearing degradation data among methods like single feature, HMM, and DBN were conducted. Conclusions are presented in Section 5.

| Original feature extraction
Vibration signals whose amplitude and frequency change with the performance of bearings are becoming the most commonly used test parameters in bearing experiments. 21 Since it is difficult to extract feature sets that can comprehensively and effectively reflect bearing running state from a single signal analysis method, 22   Time-frequency domain features are generally used in the analysis of bearing vibration signals. Time-domain features, which are intuitive and intelligible, constitute the raw data of the bearing running state. 23 While frequency-domain features are usually used to describe variation in the frequency band in the signal spectrum and spectrum energy distribution. The time-frequency domain features from p 1 to p 24 adopted in this paper for further analysis are shown in Tables 1 and 2.
Wavelet packet transform (WPT) can make an even finer decomposition of the high-frequency portion of the signal without redundancy and oversight, based on decomposition of the low-frequency portion by a discrete wavelet transform. Thus, it can provide better local time-frequency analytic ability than a wavelet transform in a signal process containing middle and high-frequency information about rotating machinery. Wavelet packet node energy is one of the most widely applied features in mechanical fault analysis. The energy value of each node can be solved by Equation (1).
where E ij , m, x jm are the corresponding signal energy to the reconstructed signal of jth band in kth layer, the discrete node and the amplitude of the discrete node respectively.
The feature vector of the wavelet packet can be obtained through normalizing the characteristic parameters calculated by the following formula: where E = ∑ l k=1 E jk is the total energy of the whole signal that equals to the sum of the energy of each subband. After selection, this paper extracted 14 WPT original features for further research.

| Laplacian Eigenmaps feature space projection
The original high-dimensional feature matrix extracted in Section 2.1 is usually nonlinear, information-redundant and mutual coupling, leading to dimension disaster and over-fitting, which will increase the algorithm's space and time complexity. It is thus necessary to generate an essential representation of the original feature matrix by space conversion. The sample data of high-dimensional space (the D dimension) is actually in a lowdimensional manifold (the L dimension), of which the manifold structure is able to retain the geometrical characteristics of the original data. As a classical manifold learning theory, Laplacian Eigenmaps algorithm can obtain a low-dimensional representation of the dataset and can also maintain local features with robust performance when the sample has outliers. 24 Given a set X = [x 1, …, x M ] of M points in R D , the purpose of Laplacian eigenmaps is to find an optimal embedding Ω for the manifold. Supporting set Y = [y 1 , … , y M ] ∈R L is lowdimensional representation of X·Y = F(X), F represents the embedding map.
A good map F make the neighboring points in highdimensional space R D still close to each other in low-dimensional space R L through minimizing the objective function as Note that minimizing Equation (3) can be transformed into a generalized eigenvalue problem: where Γ is the Laplace Beltrami operator. Let f 0 , f 1 , …, f M be the solutions of Equation (4), ordered according to their ei- (4) ΓF = −div∇(F).

Features
Equations Features Equations

Features Equations Features Equations
s(a) is signal spectrum, a is spectral line. f a is the frequency of ath spectral line, A is the total number of spectral lines. a = 1, 2, … A. R e t r a c t e d map can be described as The main procedures of Laplacian Eigenmaps feature space conversion of rolling machine vibration signals is represented in Figure 1. The extracted time and frequency domain features and WPT features constitute the original high-dimensional (D dimension) feature matrices of the healthy sample and test sample, M D 1 and M D 2 (t) respectively. Then, M D 1 is transformed to the low-dimensional manifold space expressed as matrix M L 1 by Laplacian Eigenmaps, and meanwhile, the project map F is also obtained, which is then employed to calculate M L 2 (t), the low-dimensional representation of M D 2 (t).

| Extraction of fuzzy k-principal curves
The principal curve is a nonlinear generalization of the first principal component, which is a curve that passes through the center of the data distribution. 25 To calculate the principal curve from the dataset is to select a curve satisfying certain optimization objectives and constraints from a set of curves. The principal curve is essentially a one-dimensional manifold embedded in Euclidean space. 26 Setting the probability density of random variable Y = (Y 1 , Y 2 ,…, Yp) to g y (y), if the curve f(s) through the center of Y data distribution satisfies the following expression then f(s) is one of the principal curves of Y, where s y (y) = sup{s ||y − f(s)|| = inf||y − f(τ)||} is the projection value from data point Y to point s in curve f(s). The projection of the principal curve is shown in Figure 2.
The k-principal curve, also known as the principal curve with length constraint, stresses the length of the constraint curve to ensure good condition rather than abandoning the differentiable conditions of curves, 27 is the only existing proven principal curve. Based on literature, 28 this paper proposes a new method of extracting the principal curve using fuzzy k-means clustering. Fuzzy k-means is a classical dynamic clustering algorithm that constantly minimizes the square error and updates the membership degree of each data sample point to the cluster center, 29 it is, therefore, the cluster with the highest membership degree is actually the classification of the data sample.
For N samples with p variables after normal transformation and standardization, a fuzzy matrix of the initial m groups of each sample is given as where u ij ∈[0,1], ∑u ij = 1, u ij is the membership degree of sample i for class j. Let v j be the initial cluster center of class j where v j = [v j1 ,v j2 , … , v jp ], then determining an objective function J z (U,V)  This problem can be attributed to solve the extreme value of the objective function on the condition of ∑u ij = 1. When m > 1, x i ≠ v j , by using Lagrange multiplication, which can be proved as follows 30 The formation of k-principal curve is to minimize Equation (10) continuously, then, a 2-opt algorithm is used to optimize the Hamilton path formed by line segments until they are connected into a smooth curve. 31 where x denotes a point, s i the ith line segment, d(x,s i ) the distance from x to s i , and k the total number of line segments.
On account of the above illumination, the proposed method for extracting principal curves can be summarized as: (a) Using Fuzzy k-means clustering algorithm to divide the original data points into m classes; (b) Iterating continuously to minimize Equation (10) and obtaining k line segments; (c) Optimizing the Hamilton path to connect k line segments into a smooth curve.
The contrast effect of the first principal component and the k-principal curve is displayed in Figure 3, which shows that the principal component reflects the overall distribution of dataset, while k-principal curve is the nonlinear extension of the fitting and principal component with the ability of describing the trend of dataset to the greatest extent.

| Curve similarity evaluation
The judgment analysis of curve similarity is one of the foremost research contents of computer graphics and pattern recognition, of which the techniques mainly include the eigenvalue method and the similarity functions definitionbased method. The eigenvalue method applies machine learning to compare characteristic parameters, whereas the similarity functions definition-based method analyzes similarity through calculating the distance between two curves, and according to the previous researches the later is a little superior to the former.
As a widely adopted similarity measurement theory between two-point sets, the Hausdorff distance was successfully applied in object retrieval, 20 information processing 32 and some other fields. The Hausdorff distance is a nonlinear distance, which only calculates the degree of similarity (maximum distance) between two-point sets without establishing the corresponding relationship between the points. 33 For points sets A = {a 1 , a 2 , … , a p } and B = {b 1 , b 2 , … , b q } then the Hausdorff distance based on average distance value is Note that ||a − b|| is the Euclidean distance between points a and b.

| Machine performance assessment based on the present FKPC
In this work, the original high-dimensional feature matrix is transformed to the three-dimensional space with the help of Laplacian eigenmaps, then the fuzzy k-principal curve f 0 of the healthy samples as well as the f 1 (t) of the test samples are extracted, respectively, note that t is the running time of the machine. Next, the outcome of machine performance assessment can be calculated through resolving Dis(t) = H MHD ( f 0 , f 1 (t)), of which the algorithm procedures are presented in Table 3.   Step 1: Extracting the feature matrix, Healthy samples, Step 2: Space conversion based on Laplacian eigenmaps Calculate Equation (4) Step 3: fuzzy k-principal curve extraction Calculate the fuzzy k-principal curves of healthy samples f 0 and test samples f 1 (t) base on 3.1 Step 4: Similarity assessment,

R e t r a c t e d | 733
YUAN et Al.

| Experimental setup
To simulate the scene in which the wind turbine gradually fails day-by-day in practical applications as realistically as possible, the bearing run-to-failure experiment was implemented on the test devices shown in Figure 4. The experimental platform is composed of the rotating part, the degradation generation part (with a radial force applied on the tested bearing) and a measurement part. The rotating par includes the asynchronous motor with a gearbox and its two shafts: the first one is near to the motor and the second one is placed at the ride side of the incremental encoder. Components of the loading part are grouped in a unique and same aluminum plate, which supports a pneumatic jack, a vertical axis and its lever arm, a force sensor, a clamping ring of the test bearing, a support test bearing shaft, two pillow blocks and so on. To obtain the accurate vibration signal data, the test bearing was fixed with two high sensitivity miniature accelerometers positioned at 90° to each other, and the NI DAQ Card 6062E was also applied in the data acquisition system. The parameters of the bearings and operation conditions are listed in Table 4.
In order to avoid propagation of damages to the whole test bed (and for security reasons), experiments were stopped when the amplitude of the vibration signals (horizontal and vertical) overpassed 20 g. Figure 5 depicts an example of the bearing components before and after the test. When the experiment was finished, 985 individual ASCII format data files were got, of which each file consists of 2560 data points with the recording interval of 10 seconds. And the outer ring of selected bearing was found to be faulty after the test.

| Experimental results
After the data preprocessing of the signal data collected from the above experiment, four data files were found to be abnormal and removed, the rest 981 files were adopted for further analysis. For every 256 data points, time and frequency domain analysis and WPT were utilized to extract the 38 features in original feature space, here a 9810 × 38 array composed of original features was obtained. The eight representative original features that have been widely used for further research are displayed in Figure 6, in which all the waveforms of the eight features have an obvious mutation at time point 7000 second, however, only four of them (MF, WPT1, WPT5, and WPT6) show the slight abnormality at about time point 5300 seconds, while other four features (Skewness, Kurtosis, Crest, and SDF) seem unable to detect the early slight abnormality.
Then, the original feature matrix of the healthy bearing state was transformed into three-dimensional space with preserved projection map information by using Laplacian Eigenmaps, while the feature matrix of the test bearing was also transformed to three-dimensional space according to the projection map. Figure 7 shows the feature distributions after space conversion, where the bearing samples can be clearly distinguished.
The fuzzy k-principal curve is extracted from the feature sets after space conversion, and then the similarity between the principal curves of test samples and healthy samples is calculated by using Hausdorff during each time period. Figure 8 is a schematic diagram of the fuzzy k-principal curve of a bearing running for 5000 seconds, in which the red line and blue dots represent fuzzy k-principal curve and original dataset of the bearing running on 5000 seconds respectively.

R e t r a c t e d
The normalized trend chart for similarity assessment of the principal curve after space conversion is plotted as Figure 9, it can be discovered that no significant variance in the distance appears from 0 to 5100 seconds, which indicates that there is no fault in the early running period and the bearing is under normal condition. However, the distance begins to increase after 5100 seconds, indicating the appearance of subtle wear and a gradual deviation from the normal operating condition. After 7020 seconds, the distance suddenly changes and then falls back with a slight fluctuation, which suggests that the bearing fault deepened and the bearing was gradually polished. While after 9600 seconds, the rapid increase of the distance points out a serious bearing degradation as the bearing reached the failure state. Comparing with the actual experiment situations, it can be discovered that the assessment result of the proposed FKPC method in this study R e t r a c t e d did be consistent with the real operation status of the bearings, and is able to portray the degradation trend of bearing performance accurately.

| Comparisons with different feature conversion methods
The Laplacian Eigenmaps method adopted in this study is a nonlinear local feature space conversion method, in order to make comparative analysis and highlight its effectiveness for the work, several contrast experiments with linear space conversion method PCA and global nonlinear space conversion method Isomap that are proverbially applied for feature transformation were carried out. The feature distributions after dimension reduction of PCA and Isomap are displayed in Figure 10, compared with the result of Laplacian Eigenmaps in Figure 7 one can find that it is more obviously to distinguish the bearing samples by using Laplacian feature mapping and PCA, while Isomap is not able to distinguish the samples effectively.
The k-principal curves of the PCA and Isomap features were extracted after feature conversion, and then the principal curve similarity analysis was also carried out through Hausdorff theory during each time period, of which the results is shown in Figure 11. It can be learned that although the methods of Laplacian and PCA can both portray the general degradation trend of bearing performance apparently, the Laplacian features are more stable before 5100 seconds and can reflects the distance variation after 7020 seconds more obviously. While the distance fluctuation without obvious tendency indicates that the Isomap method is not capable for describing the trend of bearing degradation, which may be caused by the fact that Isomap is suitable for the low-dimensional manifolds whose interior is flat but not appropriate to the manifold with larger internal curvature.

| Comparisons with different distance analysis methods
In addition to Hausdorff distance, Euclidean distance and Mahalanobis distance are also commonly employed in the analysis of curve similarity. Next, the comparison experiments adopting these two kinds of methods to evaluate the similarity of the k-principal curves between test bearing and healthy bearing were carried out. Figure 12A,B are the normalized assessment curves of Euclidean distance and Mahalanobis distance, respectively, from which it can be learned that both Euclidean distance and Mahalanobis distance is able to evaluate the performance degradation of the machine, especially

R e t r a c t e d
for the serious faults. However, they only detected the early faults occurred near 6100 and 6400 seconds, respectively, which indicated that both of them are not as sensitive to the early machine failure as Hausdorff distance who successfully detected the early minor faults at about 5100 seconds.

| Comparisons with HMM and DBN
In the existing research achievements, the HMM and DBN algorithms are successfully and widely employed in pattern recognition, data processing, machine state assessment, and some other fields. Typically, HMM uses the normal feature sets to train the model and then uses the test feature sets and the welltrained model to calculate the probability value, during which the greater the deviation away from the healthy state is, the smaller the probability value becomes. While as a probabilistic generative model composed of a series of restricted Boltzmann machines (RBMs), DBN is commonly utilized in combination with a back propagation neural network (BPNN) in the field of mechanical fault analysis. To highlight the advantages of the proposed FKPC method in this paper, the similar comparative experiments with the above two artificial intelligence models were conducted in the following study.
In the comparison experiments, the feature space conversion method remained unchanged as Laplacian Eigenmaps, while the evaluation models were replaced by HMM and DBN, respectively, of which the results after normalization are represented in Figure 13. The inconspicuous tendency of performance degradation and the slight abnormality, which did not appear until 5900 seconds in Figure 13A, reflected the fact that HMM is not sensitive to early machine failure. Furthermore, the curve did not feature the obvious mutational rising fluctuation as seen in Figure 9 at around 7000 seconds either, so it can be considered that HMM cannot accurately describe the running state of a machine from fault deepening to total failure. As described in Figure 13B, DBN can effectively detect the serious degradation such as crackle, fatigue spalling etc. occurred at 7000 seconds, but it failed to detect the early slight faults of wear, pitting, or overheat began in the vicinity of 5100 seconds, and the undulate forepart as well as the disorganized end of the assessment curve also further indicated that DBN is not competent enough for this task. Additionally, the complicated parameters such as the number of hidden states and Gaussian mixture functions of HMM and the number of hidden layers and neurons of DBN, and the erratic training processes of these two models would make the applications of them more difficult.
In order to carry out a more accurate quantitative analysis of the assessment effectiveness of the above three comparative algorithms, the time points when the relative change rate rises to 2.0%, 5.0% and 30.0% of each result curve were calculated as displayed in Table 5. The relative change rates were adopted to represent the early slight  fault, normal fault, and serious fault respectively. It can be learned from the quantitative analysis table that the present FKPC method is the quickest one to achieve the change rate at 2.0%, 5.0%, and 30.0%, while the DBN achieved a little worse and HMM performed worst, which is pretty in line with the analysis results of the trend curves. The above comparison analysis further indicated that the proposed FKPC method in this paper can portray the degradation trend of wind turbine bearing performance more accurately and possesses greater sensitiveness to early slight faults than the HMM and DBN.

| CONCLUSIONS
The condition monitoring and performance assessment of wind turbine has important pragmatic significance. This work reports a novel assessment algorithm of wind turbine bearing performance degradation based on fuzzy k-principal curve similarity in manifold space. First, three popular approaches including time and frequency domain analysis as well as WPT were applied to extract 38 features of the vibration signals collected from machines in the original high dimension space. Next, the nonlinear local algorithm Laplacian eigenmaps was introduced to transform the original features to the projected lower dimensional space and obtain the more typical parameters. Then, the FKPCs of the feature sets after space conversion were extracted. Finally, the Hausdorff distance theory was applied for calculating the similarity between the curves of the test samples and healthy samples to analyze the degradation tendency of the wind turbine bearing performance. In order to validate the effectiveness of the proposed algorithm, the bearing runto-failure experiment was carried out. A series of comprehensive and persuasive comparisons and experiments with different feature conversion methods (PCA and Isomap), different distance analysis algorithms (Euclidean distance and Mahalanobis distance) and different assessment models (HMM and DBN) verified that the proposed FKPC technique can monitor the operating status of the machine more accurately without complex model training and parameter selection processes.