Wind turbine condition monitoring based on double‐layer ensemble KNN method

An ensemble K‐nearest neighbor model based on a double‐layer sampling method was proposed for the condition monitoring of wind turbine (WT) gearboxes. The distance function was improved to the affinity distance, which helped overcome the tendency of the conventional distance to fall into local optima. The feature priority was calculated based on the regularized mutual information, and a double‐layer sampling method combining data sampling and feature sampling was designed. On the basis of statistical process control technology, a warning threshold was designed and the real‐time residual was analyzed. The condition of the gearbox was monitored in combination with the failure rate. Experimental results with real supervisory control and data acquisition (SCADA) data demonstrated that the proposed method improved the estimation accuracy of the model and could realize early fault warnings of a WT gearbox approximately 20 days earlier than the SCADA system.


| INTRODUCTION
Owing to the intensification of global warming, China has committed to curbing carbon dioxide emissions by 2030. 1 The rapid development of renewable energy power generation technology and the reduced use of fossil energy are important measures toward the goal of "carbon neutrality." China has abundant wind energy resources, which has significantly promoted research on wind power generation technology. 2 To harvest wind power resources, wind turbines (WTs) are often sited in offshore or mountainous areas. The harsh natural environment may cause WT failures, especially in rotating mechanical parts such as the gearbox, and the resulting WT shutdowns significantly increase maintenance expenses. 3 Therefore, real-time condition monitoring (CM) of WTs is of great research significance.
The main signal sources of existing CM methods can be divided into the following categories: vibration signals, acoustic emission, lubrication oil parameters, strain, electrical signals, and Supervisory Control and Data Acquisition (SCADA) data. 4 Collecting the signals listed above requires additional professional sensors, which incurs extra expense. Almost all WTs are equipped with a SCADA system, which records the operating and environmental data of WT components, such as the gearbox oil temperature, generator speed, impeller speed, and wind speed; these data robustly support subsequent analysis. 5 Therefore, wind turbine condition monitoring (WTCM) based on SCADA data has attracted extensive attention.
The basic idea of CM based on SCADA data is to establish a normal behavior model (NBM) of the monitored operating parameters through a data-driven method and then input the real-time data (observed values) into the model to obtain the model output (estimated values). If the residual between the observed and estimated values exceeds a set threshold, the equipment may be faulty. 6 Data-driven methods are the core of the NBM, and scholars have studied them in depth. According to whether the model has an explicit objective function, data-driven methods can be divided into parametric and nonparametric modeling methods. However, the gearbox of a WT is a complex system, and it is challenging to obtain a universally applicable function model using parameter-based methods to model SCADA data in batches. 7 Nonparametric methods do not require general inference formulas established from samples. Instead, they directly use the stored data to address classification or regression problems, demonstrating the potential for flexibly handling massive amounts of data. 8 Nonparametric modeling methods include the multivariate state estimation technique, 9,10 auto-associative kernel regression, 11 and K-nearest neighbor (KNN). 12 KNN is a classical nonparametric method often used to solve classification and regression problems. 13,14 As mentioned above, KNN does not rely on a pretrained model and is therefore well suited to the CM of WTs with complex working conditions. 15 Currently, the utilization of the KNN algorithm in WTCM is limited. Zhang et al. 16 proposed integrating ensemble learning (EL) with the KNN regression algorithm, leading to approximately a 30% improvement in computational efficiency without compromising accuracy. Bao et al. 17 introduced a WTCM approach that combines active learning with the KNN algorithm. This method leverages active learning to select high-quality samples and construct a training set, resulting in a significant enhancement in the accuracy of the KNN model. Although the application of ensemble KNN (EKNN) in WTCM has been studied, the research in this area lacks sufficient depth.
EL is a strategy for improving the generalization accuracy of a model by combining multiple algorithm modules, each of which is called a base learner. 18 Wang et al. 19 proposed a multiple-kernel EL method to predict software defects by combining EL and multiple-kernel learning based on the characteristics of the metrics mined from the software. Feng et al. 20 combined EL with undersampling to obtain higher-quality balanced sets for each base classifier, which handles imbalanced classification. Chen et al. 21 overcame the disadvantage that the base learners of EL must sample all the samples by means of edge calculation, improving the operating efficiency of the ensemble model.
In this study, an improved EKNN (IEKNN) model is proposed for WTCM. To overcome the limitation that the distance function of the conventional KNN (CKNN) algorithm ignores feature weights, the affinity distance, which accounts for feature weights, is introduced as the distance function. Because the KNN algorithm as a base learner is insensitive to base training sets constructed by bootstrap sampling, a double-layer sampling method based on feature priority is designed. To address the difficulty of detecting early deterioration, a sliding-window method is designed to calculate the failure rate of the WT and identify faults early. The SCADA data of a field WT in Hebei Province were used to verify the proposed method. The experimental results demonstrate that the proposed CM method can be used for early fault diagnosis of WTs before they are shut down.
The rest of this paper is organized as follows. Section 2 introduces the improved KNN algorithm and the definition of feature priority based on regularized mutual information (RMI). Section 3 presents the double-layer sampling method and the proposed SCADA data-based CM method in detail. Section 4 reports the validation results of the proposed CM method for the anomaly detection of field WTs using SCADA data. Finally, Section 5 concludes the paper.

| KNN regression algorithm
The KNN regression algorithm selects the K training samples most similar to the testing sample in the feature space and then calculates the regression value of the testing sample from their outputs.
For the training set $X = (\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_m)$ and a testing sample $\mathbf{x}_t = (x_{t1}, x_{t2}, \ldots, x_{tn}, y_t)$, the steps of the KNN regression algorithm are as follows 16 :

(1) Calculate the Euclidean distance between the testing sample and each training sample:

$$d(\mathbf{x}_i, \mathbf{x}_t) = \sqrt{\sum_{j=1}^{n} (x_{ij} - x_{tj})^2},$$

where $d(\mathbf{x}_i, \mathbf{x}_t)$ is the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_t$.
(2) Find the K training samples $X' = (\mathbf{x}'_1, \mathbf{x}'_2, \ldots, \mathbf{x}'_K)$ that are most similar to $\mathbf{x}_t$.

(3) Calculate the regression value of $\mathbf{x}_t$:

$$\hat{y}_t = \frac{1}{K} \sum_{i=1}^{K} y'_i,$$

where $y'_i$ is the output value of the ith nearest training sample.

In addition to the distance measurement, the selection of K is a crucial factor determining the performance of the KNN algorithm. 22 Currently, there is no established theory or method to definitively determine the optimal value of K. Because this study does not focus on optimizing the value of K, a trial-and-error approach is employed to select a suitable value.
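The three steps above can be sketched in a few lines of NumPy (a minimal illustration; the function and variable names are ours):

```python
import numpy as np

def knn_regress(X_train, y_train, x_test, k):
    """Plain KNN regression: average the outputs of the k nearest
    training samples under the Euclidean distance."""
    d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))  # step (1): distances
    idx = np.argsort(d)[:k]                             # step (2): k nearest
    return float(y_train[idx].mean())                   # step (3): average
```

For example, with training inputs 0, 1, 2, and 10 and K = 3, a test point at 1.1 is regressed onto the average of the three nearest outputs.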

| Affinity distance
The KNN algorithm commonly uses the Euclidean distance to measure the similarity between samples. However, the Euclidean distance does not handle special data distributions well and is sensitive to outliers; moreover, because it does not fully use the internal information of the sample set, it may lead to local optima. Conversely, the affinity distance 23 takes into account the influence of each sample on the sample set and can correct the unfavorable biases of traditional distance measurements. The affinity distance between the testing sample and a training sample is calculated as

$$d_{\mathrm{aff}}(\mathbf{x}_t, \mathbf{x}_i) = \sum_{j=1}^{n} \frac{S_t^{(j)}}{S_i^{(j)}} \left| x_{tj} - x_{ij} \right|,$$

where $S_t^{(j)} = \sum_{k=1}^{m} |x_{tj} - x_{kj}|$ is the sum of the distances between the testing sample $\mathbf{x}_t$ and all training samples on the jth feature, and $S_i^{(j)} = \sum_{k=1, k \neq i}^{m} |x_{ij} - x_{kj}|$ is the sum of the distances between the training sample $\mathbf{x}_i$ and the other training samples on the jth feature.
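As an illustration, the ratio-weighted form described above can be vectorized as follows. This is a sketch under the assumption that per-feature distances are absolute differences; the exact formula in reference 23 may differ in detail:

```python
import numpy as np

def affinity_distance(X_train, x_test):
    """Affinity distance from a test sample to every training sample.
    Each per-feature distance |x_tj - x_ij| is scaled by the ratio of
    (a) the summed distance from the test sample to all training samples
    on that feature to (b) the summed distance from the training sample
    to the other training samples on that feature (assumed form)."""
    d = np.abs(X_train - x_test)        # per-feature |x_ij - x_tj|, shape (m, n)
    s_t = d.sum(axis=0)                 # test-to-all sum per feature, shape (n,)
    # training-sample-to-others sum per feature, shape (m, n)
    s_i = np.abs(X_train[:, None, :] - X_train[None, :, :]).sum(axis=1)
    return (d * s_t / np.maximum(s_i, 1e-12)).sum(axis=1)
```

The returned vector can replace the Euclidean distances in step (1) of the KNN regression procedure.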

| Feature priority ranking method based on RMI
Feature reduction is a crucial step in improving model performance. Traditional filter-based feature selection methods mostly consider the correlation between the input and output features without considering the redundancy among the input features. In this study, a method based on RMI is proposed to calculate the selection priority of the features, which is used to guide the order of feature selection.
RMI 24 is a form of normalized mutual information whose value lies in the range [0, 1]. For any two features $v_i$ and $v_j$, their RMI is defined as

$$R(v_i; v_j) = \frac{2\, I(v_i; v_j)}{H(v_i) + H(v_j)},$$

where $I(v_i; v_j)$ is the mutual information between $v_i$ and $v_j$, and $H(v_i)$ and $H(v_j)$ are the information entropies of $v_i$ and $v_j$, respectively.
For an input feature set $S = \{v_1, v_2, \ldots, v_m\}$ containing m features and the output feature $v_y$, the calculation of feature priority is described in Algorithm 1.

Algorithm 1. Pseudocode of feature priority calculation.
Input: the input feature set $S = \{v_1, v_2, \ldots, v_m\}$, the output feature $v_y$.
Output: the input feature set $S_1$ arranged in descending order of priority.
1. For i = 1 → m
2.   Calculate $R(v_i; v_y)$ according to Equation (5)
3. End
4. Note the input feature with the highest RMI as $v'_1$, add it to $S_1$, and eliminate it from S.
5. While S is not empty
6.   For each $v_j$ in S
7.     $J(v_j) = R(v_j; v_y) - \frac{1}{q} \sum_{v'_s \in S_1} R(v_j; v'_s)$
8.     % q is the number of features in $S_1$; $J(v_j)$ is the selection priority of $v_j$
9.   End
10.  Designate the feature with the highest J as $v'_{q+1}$ and incorporate it into $S_1$, while excluding it from S.
11. End
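The feature priority calculation amounts to a max-relevance/min-redundancy ranking: relevance to the output minus mean redundancy to the already-selected features. A compact sketch is given below; the histogram-based estimation of mutual information and the normalization $2I/(H_i + H_j)$ are our assumptions, not the paper's exact estimator:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (zero bins ignored)."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def rmi(a, b, bins=10):
    """Regularized mutual information 2*I(a;b)/(H(a)+H(b)),
    estimated from a 2-D histogram (assumed normalization)."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy = entropy(px), entropy(py)
    i = hx + hy - entropy(pxy.ravel())        # I(a;b) = H(a)+H(b)-H(a,b)
    return 2 * i / (hx + hy) if hx + hy > 0 else 0.0

def rank_features(X, y, bins=10):
    """Rank feature columns of X: the most y-relevant feature first,
    then greedily by relevance minus mean redundancy to selected ones."""
    remaining = list(range(X.shape[1]))
    rel = {j: rmi(X[:, j], y, bins) for j in remaining}
    order = [max(remaining, key=rel.get)]
    remaining.remove(order[0])
    while remaining:
        j_best = max(
            remaining,
            key=lambda j: rel[j] - np.mean([rmi(X[:, j], X[:, s], bins)
                                            for s in order]),
        )
        order.append(j_best)
        remaining.remove(j_best)
    return order
```

A feature identical to the output ranks first, ahead of an independent noise feature, as expected from the relevance term.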

| CM BASED ON IEKNN MODEL
This section introduces the conventional EKNN model, the proposed IEKNN model, and the framework of the IEKNN-based CM method.

| Conventional EKNN model
EL mainly has two structures: boosting and bagging. Boosting is a serial structure predominantly used for binary classification problems; multiple base learners are connected step by step, with strong dependencies between them. Bagging is a parallel structure in which the base learners do not affect each other, and it can be used for multi-class classification, regression, and other problems. The complexity of training a bagging model is of the same order of magnitude as that of training a single base learner, so it has high practicability. 20 Figure 1 shows the framework of the bagging ensemble algorithm based on KNN, and this subsection describes the conventional EKNN model in depth.
Bagging is based on random sampling with replacement, that is, bootstrap sampling. For a training set containing m samples, one sample is randomly selected and put back each time, and the sampling process is repeated m times to form a base training set. This operation is conducted N times to form N base training sets, each containing m training samples, and then N base learners are trained, one on each base training set. Finally, these base learners are combined to obtain an ensemble model. To combine the base learners, the voting method is often used for classification problems, and the averaging method is often used for regression problems.
The KNN model, neural networks, decision trees, and other models can be used as the base learner of a conventional EL model. However, experiments show that an EL model using KNN as the base learner exhibits no distinct improvement in generalization ability when the base training sets are constructed by bootstrap sampling. According to the literature, KNN is a stable learner. Compared with unstable learners such as neural networks and decision trees, KNN is insensitive to the base training sets generated by bootstrap sampling, so the advantage of EL in generalization accuracy cannot be realized.

| IEKNN method based on double-layer sampling
As mentioned above, the generalization accuracy of the conventional EKNN model is not as good as expected because the KNN algorithm, as a stable learner, is insensitive to the base training sets formed by bootstrap sampling. The key to improving an EL model is to select base learners with high generalization accuracy and to increase the differences between the base learners. Because the KNN algorithm as a base learner already meets the requirement of high generalization accuracy, the key to improving the generalization accuracy of the EKNN model is to increase the differences between the base training sets. These differences can be increased in the following ways: (1) input different training samples, (2) input different features, and (3) adopt different input methods. Considering both the input samples and the input features, in this study, a double-layer sampling method combining bootstrap sampling and feature sampling based on feature priority is designed to construct the base training sets (Figure 2).

As the pseudocode shows, the first layer of the double-layer sampling method samples the features regularly based on the feature priority. Unlike randomly extracting features to construct models, regular feature sampling increases the differences between the input features while increasing the generalization accuracy of each base learner. Simultaneously, the generalization accuracy of each base learner is always maintained within a certain range, so the generalization accuracy of the EL model after combining the base learners is also maintained within a stable range. The bootstrap-based sample sampling in the second layer increases the differences between the input samples of the base learners, further increasing the differences between the base learners on top of the first-layer sampling. In conclusion, compared with an EKNN using bootstrap sampling alone, this sampling method further improves the generalization accuracy, and the accuracy of the EL model remains within a relatively stable range.
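A minimal sketch of constructing the base training sets with double-layer sampling follows. The rule used here for the first layer (keep the top-priority features and rotate the lowest-priority slot across base learners) is our assumption about "regular" feature sampling; the paper's exact schedule is given in its Algorithm 2:

```python
import numpy as np

def double_layer_sample(X, y, n_learners, sorted_idx, n_feat, rng):
    """Construct base training sets by (1) priority-driven feature
    sampling and (2) bootstrap sampling of the rows.

    sorted_idx: feature indices in descending priority order.
    n_feat:     number of features per base training set."""
    m = len(X)
    n_rot = len(sorted_idx) - (n_feat - 1)  # features left to rotate over
    sets = []
    for i in range(n_learners):
        # first layer: keep top-priority features, rotate the last slot
        feats = sorted_idx[:n_feat - 1] + [sorted_idx[n_feat - 1 + i % n_rot]]
        # second layer: bootstrap-sample m rows with replacement
        rows = rng.integers(0, m, size=m)
        sets.append((X[np.ix_(rows, feats)], y[rows]))
    return sets
```

Each base KNN learner is then fitted on one returned (X_i, y_i) pair, and the ensemble output is the average of the base-learner regressions.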

| The framework of CM based on IEKNN method
The basic idea of CM based on a data-driven method is to establish a model of the monitored features using historical normal data; the corresponding diagrams are shown in Figures 3 and 4.
The input variables of the model can encompass external factors, such as the environmental temperature and wind speed, as well as state variables, including the active power and generator-bearing temperature. The output variable is a metric that reflects the operational state of the WT, such as the gearbox-bearing temperature. Subsequently, the residual is computed by comparing the observed value of the output variable with the model's estimated value. If the residual between the two values is large, the monitored feature may be abnormal. Because the residual sequence is a continuous curve, a warning threshold is generally set to convert the continuous variable into a binary variable and achieve a real-time alarm. The CM method can be divided into three parts: data preprocessing, offline model establishment, and online monitoring.
Data preprocessing: First, the raw data collected from the SCADA system were preprocessed, including eliminating abnormal and missing data, selecting features for modeling, and normalizing the retained data, to obtain a high-quality original training set.
Offline model establishment: According to the proposed RMI-based feature priority calculation method, the features of the original training set were sorted; then, several base training sets were obtained using the double-layer sampling method based on the feature priority. Finally, the optimal parameters of the model were selected according to the experimental results.
Online monitoring: First, the real-time SCADA data were preprocessed and input into the established IEKNN model. After obtaining the model output, the residual sequence between the output and the observed values was calculated. According to the residual sequence of the verification set, the early-warning threshold was set and the residual curve was analyzed. Finally, the sliding-window method was used to define a daily equipment failure rate index to realize WTCM.

| CASE ANALYSIS
This section uses the data collected by the SCADA system of a field WT as an example to demonstrate the effectiveness of the proposed method and obtain the CM results of the WT. Note that the experiments reported in this paper are offline experiments based on MATLAB.

| Data description
The research object of this paper is a WT with a rated power of 1.5 MW located in Hebei Province, China. Its basic parameters are as follows: the cut-in wind speed is 3 m/s, the cut-out wind speed is 25 m/s, the rated wind speed is 12 m/s, and the sampling interval is 5 min. According to the alarm record of the SCADA system, at 8:31 on November 17, 2017, the WT reported the fault "gearbox oil temp higher than the upper limit." The SCADA data from 2017/09/07 0:00 to 2017/11/17 8:31 were exported from the SCADA system as experimental data.

| Data preprocessing
The first step of data preprocessing was to eliminate missing and abnormal data. The abnormal data include the following categories: missing attributes, active power ≤ 0, wind speed ≤ the cut-in wind speed, wind speed ≥ the cut-out wind speed, and WT status code marked as "fault." After the SCADA data were denoised via the Laida (3σ) criterion, 25 12,000 samples were available, which were divided into three parts: the training, verification, and testing sets. Samples 1-6000 formed the training set for establishing the IEKNN model; samples 6001-7000 were used as the validation set to determine the parameters and early-warning threshold of the EKNN model; and samples 7001-12,000 formed the testing set to verify the validity of the proposed model.
Because the WT used in the experiment had the fault that the gearbox oil temperature exceeded the upper limit, the gearbox oil temp (v 7 , °C) was used as the output feature. There were dozens of status features in the SCADA system; however, to avoid the "curse of dimensionality," according to modeling experience, the active power, generator speed, ambient temp, wind speed, main shaft speed, gearbox drive-bearing temp, and gearbox nondrive-bearing temp were selected as candidate modeling features. Among them, the Pearson coefficient between the generator speed and the main shaft speed was 0.99, indicating a strong correlation. Using both for modeling could cause feature redundancy; thus, only the generator speed was retained. To summarize, the auxiliary features required for modeling were the active power (v 1 , kW), generator speed (v 2 , rpm), ambient temp (v 3 , °C), wind speed (v 4 , m/s), gearbox drive-bearing temp (v 5 , °C), and gearbox nondrive-bearing temp (v 6 , °C) (Figure 5).
The training, verification, and testing sets were normalized as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where x is the raw data, $x_{\min}$ is the minimum of the data, and $x_{\max}$ is the maximum of the data.

| Selection of K
First, the value of K was selected. As the focus of this paper is not the determination of K, several commonly employed values of K were compared, namely, 10, 20, 40, and $[\sqrt{N}]$, where N is the number of training samples. At this stage, the KNN model used was a CKNN model with the affinity distance metric, and the root-mean-square error (RMSE), mean absolute error (MAE), and R-square (r 2 ) were used as the evaluation indices:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|, \qquad r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $y_i$ is the observed value of the output feature, $\hat{y}_i$ is the model output, and $\bar{y}$ is the mean of the observed values (Table 1). Through trial and error, it was determined that the performance of the model is optimal at $K = [\sqrt{N}]$. Therefore, in this study, the value of K was set to $[\sqrt{N}]$.
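The three evaluation indices translate directly into NumPy (a straightforward transcription of the formulas above):

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def r_square(y, y_hat):
    """Coefficient of determination (1 = perfect fit)."""
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))
```

A perfect estimator gives RMSE = MAE = 0 and r² = 1; larger residuals push RMSE and MAE up and r² down.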

| The determination of the number of base learners
This section describes the selection of the relevant parameters of the IEKNN model according to the estimation accuracy on the validation set. The input features were first sorted into the feature set $S_1$ according to the method outlined in Section 2.2. In the step of determining the EKNN model parameters, the proposed sampling method based on feature priority was used to generate the base training sets. The number of base learners was varied from 1 to 10; owing to the randomness of the sampling method, each EKNN experiment on the validation set was repeated 200 times.
Figure 6 shows the overall distribution of the RMSE for each EKNN model as the number of base learners increases. Initially, the interquartile range (IQR) of the box plots tightens at a high rate, and later it fluctuates within a specific range, approaching a stable state. Specifically, when the number of base learners is ≤5, the IQR is larger. When the number exceeds 5, the IQR shrinks, indicating that the stability of the EKNN model improves with a larger number of base learners.
In practical applications, the determination of the number of base learners should balance computational efficiency and model accuracy. When the number of base learners was 9, the IQR was at its lowest, indicating the most stable model performance. However, compared with using six base learners, the computation time increased by approximately 33%, whereas the IQR of the model decreased by only 7%. The average RMSE over the 200 experiments was 0.0286 with six base learners and 0.0285 with nine, a decrease of only about 0.3%. Therefore, the number of base learners in this study was set to 6.

| Controlled experiment
This section conducts horizontal and longitudinal analyses of the proposed EKNN model. Horizontal analysis compares the proposed model with CKNN and several other modeling methods, whereas longitudinal analysis compares several EKNN models with different sampling methods. All the above experiments were conducted on the validation set.

| Horizontal analysis
The models used for comparison are (1) the conventional KNN model with the Euclidean distance (CKNN1); (2) the conventional KNN model with the affinity distance (CKNN2); (3) a Gaussian process regression model with a squared exponential kernel; (4) a least-squares support vector machine with a Gaussian kernel; and (5) the proposed EKNN model. The parametric methods were implemented with the regression toolbox of MATLAB. RMSE, MAE, and r 2 were selected as the evaluation bases (Table 3).
(1) For CKNN2, the RMSE decreased by 9.8% and the MAE by 2.8%, and r 2 increased by 3.4% compared with CKNN1, indicating that the accuracy and fitting degree of the KNN model were improved by using the affinity distance. (2) Compared with the other four models, the RMSE and MAE of the EKNN model are the lowest and its r 2 is the highest, indicating the best overall performance.

| Longitudinal analysis

Four EKNN variants with different sampling methods were compared, including (1) bootstrap sampling only (KNN1) and (4) the proposed bootstrap and priority combined double-layer sampling (KNN4). Each model was tested 200 times, and the number of base learners was 6.
On the basis of the IQR values of the various KNN models, KNN1 had the lowest IQR, indicating superior stability. However, its model performance was significantly poorer than those of the other three models, suggesting that bootstrap sampling alone does not significantly improve the performance of the KNN model when the number of base learners is the same (Figure 7 and Table 4).
KNN2 exhibits decreases in Q3 and Q1 compared with KNN1, but its Q3, Q1, and IQR are higher than those of KNN3 and KNN4. KNN3 and KNN4 have similar values of Q2, but KNN4 has a smaller IQR than KNN3, indicating superior stability, which is particularly important in practical applications. Therefore, the EKNN model designed in this study is more suitable for WTCM. Figure 8B shows the residual between the observed and estimated values.
However, the residual curve is a continuous quantity. Although a trend change can be observed, the specific time of the abnormality cannot be determined from the continuous quantity alone. Therefore, statistical process control (SPC) technology was introduced to set a threshold.

| Threshold setting and CM of testing set
SPC 26,27 is a statistical tool used to monitor and control a production process. By setting control thresholds, it can promptly detect signs of systematic factors from the feedback information and trigger measures to eliminate their influence, so that the process is maintained in a controlled state affected only by random factors. In SPC, when the process is affected only by random factors, it is in a controlled state; when systematic factors are present, the process is out of control. Owing to the statistical regularity of process fluctuations, when the process is controlled, the process characteristic generally follows a stable random distribution, and when the process is out of control, the distribution changes. Because this study focuses on fault manifestations of the gearbox bearing and elevated oil temperature, only an upper threshold was set. Assuming that the quality characteristic X follows a normal distribution with mean μ and standard deviation σ, the probability that X lies in $(-\infty, \mu + 2.326\sigma]$ is approximately 0.99:

$$P(X \leq \mu + 2.326\sigma) = 0.99.$$

In practical applications, the sample mean $\bar{X}$ and standard deviation S replace the mean and standard deviation of the normal distribution, so the threshold is calculated as

$$T = \bar{X} + 2.326\, S.$$

The mean $\bar{X}$ of the residuals of the verification set is −0.0034, the standard deviation S is 0.0271, and thus the threshold T is 0.0596. If the residual of a sample exceeded the threshold, the sample was considered potentially abnormal, and an alarm was triggered. However, owing to the poor working environment of the WT, sensor failures, and other problems, sporadic false alarm points may occur. 28 Therefore, to avoid misjudgment, an abnormality was generally declared only when the residual continuously exceeded the threshold for a long time, that is, when multiple alarms were triggered consecutively.
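The one-sided warning threshold is a single line of code (2.326 being the 0.99 quantile of the standard normal); on the verification-set statistics quoted above it reproduces T ≈ 0.0596:

```python
import numpy as np

def spc_upper_threshold(residuals, z=2.326):
    """Upper SPC warning threshold T = sample mean + z * sample std."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(r) + z * np.std(r, ddof=1))
```

For example, plugging in X̄ = −0.0034 and S = 0.0271 gives T = −0.0034 + 2.326 × 0.0271 ≈ 0.0596.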
As shown in Figure 9A, for the first 1000 samples, the fitting degree between the observed and estimated values is high, with no obvious difference. After about the 1000th sample, the observed value suddenly increases sharply and fluctuates significantly, suggesting a possible abnormality, although it cannot yet be confirmed. In Figure 9B, the green line represents the threshold, the blue curve represents the residual of each sample, and the red rectangles represent triggered alarm conditions; the width of a rectangle indicates the continuous alarm duration, with wider rectangles denoting longer alarms. According to Figure 9B, there is no long-term alarm exceeding the threshold in the first 1000 samples, but after approximately 1000 samples, several long-term alarms occur, and the residual significantly exceeds the threshold. As time elapses, the duration of a single continuous alarm keeps increasing. It can therefore be determined that the gearbox is operating abnormally.

| Failure rate analysis
Figure 9 helps determine whether the WT is abnormal by examining whether each sample exceeds the threshold, converting the continuous quantity into a binary quantity and thereby realizing early fault warning. However, the discrete alarm signal cannot reflect the current fault degree and the overall change trend of the WT. To solve this problem, a failure rate calculation method with daily granularity is designed.
Let the length of the sliding window N be 576, that is, each window contains data for two consecutive days (at the 5-min sampling interval) and advances by one step each time. The failure rate of each window is then defined as

$$F = \frac{M}{N} \times 100\%,$$

where M is the number of samples in the window whose residual exceeds the threshold (Figure 10). According to the failure rate curve, the deterioration process of the WT can be roughly divided into three stages. Over approximately the first 400 sliding windows, the failure rate remained stable at a low level, indicating that the WT was operating normally. From about the 400th to the 2250th sliding window, the failure rate first rose rapidly to about 60% and then fluctuated between 40% and 60%, indicating that the WT was in the early failure stage. After about the 2250th window, the failure rate rose to approximately 80% and did not decrease, remaining at a high level, indicating that the WT had entered the stage of serious failure. After consulting the relevant records, the 400th sliding window corresponds to 2017/10/27, denoting that the proposed method raises an alarm about 20 days earlier than the SCADA system.
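The sliding-window failure rate amounts to a moving average over the binary alarm sequence; a minimal sketch (window = 576 samples = two days at the 5-min sampling interval):

```python
import numpy as np

def failure_rate(residuals, threshold, window=576):
    """Percentage of samples in each sliding window (step 1) whose
    residual exceeds the warning threshold."""
    alarms = (np.asarray(residuals, dtype=float) > threshold).astype(float)
    kernel = np.ones(window) / window           # moving-average kernel
    return np.convolve(alarms, kernel, mode="valid") * 100.0
```

The "valid" convolution mode yields one failure-rate value per fully populated window, so a residual sequence of length L produces L − window + 1 values.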

| CONCLUSIONS
In this study, an EKNN CM method with an improved distance function and sampling method was proposed, and the SCADA data of a real WT were used as the experimental object. The experimental results demonstrated the following: (1) As a distance function, the affinity distance overcomes the tendency of the Euclidean distance to fall into local optima and improves the accuracy of the KNN model. (2) The RMI-based feature sampling method, which calculates the feature priority, effectively addresses the insensitivity of the KNN model to bootstrap sampling. Among several EKNN models with different sampling methods, the EKNN model proposed in this paper exhibited the best performance and the highest generalization accuracy. (3) The case analysis showed that the proposed EKNN CM method can realize early warning of WT failure; for the data reported in this study, the alarm occurred approximately 20 days earlier than that of the SCADA system. (4) The proposed method is general: it can serve as a reference for the failure warning of important subcomponents of a WT and can be applied to similar problems, such as WT generator CM.
However, the proposed method has certain limitations: (1) It specifically targets gradual anomalies, allowing the early prediction of a fault and providing guidance for predictive maintenance; however, several types of gearbox faults, such as sudden impact overloads or tooth surface fatigue, cannot be detected by this method. (2) The current method involves batch modeling of historical data. For objects with long operating times and varying conditions, real-time learning algorithms should be considered to periodically update the model and adapt to the latest operating conditions. This will be our next research objective.

FIGURE 1 EKNN model based on bootstrap sampling. EKNN, ensemble KNN; KNN, K-nearest neighbor.

Algorithm 2. Pseudocode of the double-layer sampling method.
Input: the original training set $X_{m \times n}$, the number of base learners N, and the feature set $v = (v'_1, v'_2, \ldots, v'_m)$ sorted by priority.
Output: the base training sets $X_1, X_2, \ldots, X_N$.

FIGURE 2 EKNN model based on double-layer sampling. EKNN, ensemble KNN; KNN, K-nearest neighbor.
FIGURE 3 Diagram of the condition monitoring method. IV, input variable; OV, output variable.
FIGURE 4 Flowchart of the EKNN model based on double-layer sampling. EKNN, ensemble KNN; KNN, K-nearest neighbor; SCADA, supervisory control and data acquisition.

| Online monitoring

| Online CM of the verification set

When the WT was operating normally without faults, the gearbox oil temp fluctuated within a normal range. The observed and estimated values of the NBM fit well, and the residual is small and does not exceed the set threshold. If the gearbox oil temp rises abnormally beyond the controllable range, the estimated value of the NBM deviates greatly from the observed value, and the residual rises significantly. Figure 8A shows the curves of the observed values of the verification set and the estimated values of the model, which reflect the relationship between the estimated and observed values under the normal operating state.

FIGURE 8 Validation-set results. (A) Curves of the observed and estimated values of the validation set and (B) curve of the residual between the observed and estimated values.

FIGURE 9 Condition monitoring of the testing set. (A) Curves of the observed and estimated values of the testing set and (B) residual of the testing set and alarm condition.
TABLE 2 Values of Q3, Q1, and the IQR for each box.
FIGURE 5 Some typical supervisory control and data acquisition (SCADA) data before gearbox failure.
TABLE 1 Model performance with different Ks.
FIGURE 6 Box diagram of root-mean-square error (RMSE) with different numbers of base learners (Q1, the lower quartile; Q2, the median; Q3, the upper quartile; IQR, the interquartile range).
TABLE 3 Result of horizontal analysis.