Noise‐robust gas path fault detection and isolation for a power generation gas turbine based on deep residual compensation extreme learning machine

One of the major challenges facing fault diagnosis tools is their exposure to noise. The presence of noise may cause false alarms or the inability to detect a progressive fault in the early stages of its occurrence. Continuing previous efforts to address such a problem, in this paper, a noise‐robust diagnosis system for an industrial gas turbine is presented. The proposed structure employs a set of deep residual compensation extreme learning machines (DRCELMs). In this model, an optimal number of compensating blocks are trained to recover some of the useful information lost in the face of noise. Training and testing data required to develop the fault diagnosis model are generated by a performance model of the studied gas turbine. The t‐distributed stochastic neighbor embedding algorithm is employed for visualizing the gas path faults. Furthermore, the performance of the DRCELM is evaluated by comparing it with six other diagnosis models. The results indicate higher robustness of the DRCELM compared to other fault diagnosis systems. The proposed model achieves a classification accuracy of >97% in noisy data and an accuracy of >98% in noise‐free data and combined data, while the average of false positive rate and false negative rate in noisy data is less than 2.5%.

examining the relationship between these deviations, gas path faults can be detected, isolated, and identified. 3 Commonly, the efficiency and flow capacity of the gas path components are considered as health indices. These parameters are inherently unmeasurable and therefore need to be derived indirectly from measurable parameters. In this scenario, the process of identifying the aforementioned health parameters from the measurement parameters relies on the utilization of system identification techniques. This identification process is executed through two distinct approaches: physics-based models that leverage the governing physical laws of the problem, and data-driven models. Several physics-based methodologies have been suggested, encompassing techniques such as Kalman filters, 4,5 weighted least squares, and influence coefficient matrix. 6 In a recent study, Li et al. presented a physics-based method for diagnosing gas turbine faults within a power plant, considering the variable geometry of the compressor and operating under transient conditions. 7 The data-based method has garnered significant attention due to its inherent advantage of not requiring an explicit mathematical model of the turbine. In this regard, methods such as fuzzy expert systems, 8 genetic algorithms, 9 and a variety of neural network and deep neural network-based methods have been used for performance monitoring and availability improvement in gas turbines. 10,11 In this regard, Yazdani and Montazeri proposed the use of type-2 fuzzy logic to detect, isolate, and identify gas turbine faults. 
12 Given the propensity for type-2 fuzzy logic to exhibit superior performance in tackling intricate nonlinear problems characterized by significant data pattern overlaps, Montazeri and Yazdani have adopted this approach for the identification of gas path faults in industrial gas turbines. The employed method demonstrated commendable performance when confronted with uncertainties that fall beyond the range of the training data.
Artificial neural networks have emerged as a prevailing technology in data-driven approaches, notably utilized for fault detection and isolation across a broad range of equipment and devices. These approaches have gained substantial traction due to their exceptional capabilities. Recent applications include predicting the remaining useful life of drilling pumps, 13 identifying sensor anomalies in internal combustion engines, 14 enabling fault detection and isolation of water reactors, 15 and vessels, 16 and even facilitating sensor fault diagnosis of autonomous vehicles. 17 In the context of equipment like gas turbines, which exhibit nonlinear behavior, these machine learning tools offer the capability to accurately simulate the inherent nonlinear nature of the data. 18 In this context, Fentaye et al. recently estimated the magnitude of gas path faults by using a multilayer perceptron. 19 To achieve a higher success rate (SR) in fault classification, Togni et al. recently used an artificial neural network in combination with a fuzzy system and the Kalman filter method to detect faults in a two-shaft engine. The system had an SR of about 95% for the nominal noise level. 20 Extreme learning machines (ELMs) stand as a prominent method derived from artificial neural networks. By eliminating the weight learning process in the middle layer, ELM offers a significantly faster training process compared to backpropagation-based methods. ELM-based approaches have also shown good accuracy in many applications and provide proper generalization. 
21 Unlike conventional backpropagation networks, whose training can become trapped in local minima, these networks are not susceptible to this issue. Additionally, unlike support vector machines (SVMs), the hyperparameters of these networks do not require meticulous fine-tuning. Recently, an example of this type of fast network was adopted by Zhao and Chen in a transfer learning approach to solve the problem of data deficiency in the fault diagnosis of an aero engine. 22 Also, Montazeri-Gh and Nekoonam applied a bank of online sequential ELMs (OSELM) to develop a gas path fault diagnosis system for a power generation gas turbine. This system could adaptively improve its performance when a change in loading conditions occurs. 23 The development of fault diagnosis tools faces a fundamental challenge related to their inadequate performance in the presence of noise and outliers. In instances where the system fails to robustly handle noise, the likelihood of false alarms or misdiagnosis increases, subsequently hindering the achievement of gas turbine monitoring plan objectives. As a result, researchers have consistently sought models capable of effectively managing high levels of data uncertainty and noise. Wang et al. introduced a hybrid intelligent technique for diagnosing bearing faults, denoted as wavelet kernel network-bidirectional long-short-term memory-attention mechanism (WKN-BiLSTM-AM). This approach integrates WKN, BiLSTM, and AM to effectively handle challenges associated with the temporal nature of data, noise interference within bearing fault diagnostics gathered from industrial contexts, and the issues of slow convergence speed and suboptimal diagnostic precision encountered by prior methods. This model provided an accuracy of 98.40% in the bearing fault data set. 24 Amirkhani et al. 
applied the series-parallel structure of nonlinear autoregressive exogenous models to tackle the fault detection challenge within a heavy-duty gas turbine to achieve more robustness and stability against uncertainties and perturbations. 25 Zhang et al. recently introduced a model referred to as the residual compensation extreme learning machine (RCELM). These models, built upon the foundation of ELMs, incorporate residual compensator blocks into their structure. 26 In this model, each of the compensating blocks should recover some of the useful information lost due to poor performance of the baseline ELM. According to their research, this network is more robust than other methods such as SVM, ELM, and backpropagation neural network (BPNN) in solving the problem of gas utilization ratio prediction and device-free localization. Considering the network's inherent robustness against outliers, Montazeri-Gh et al. 27 employed it for fault diagnosis in an industrial gas turbine. They conducted a comparative analysis of its performance against fault diagnosis systems based on SVM, regularized ELMs (RELM), and BPNNs. To address the issue of overfitting encountered in the RCELM model with more than two blocks, Chen et al. introduced an enhanced model by further expanding the previous model. They evaluated the performance of this model for the gold price forecasting and the airfoil self-noise forecasting problem. The results demonstrated the robustness of the model when compared to other commonly used models. 
28 Building upon previous endeavors aimed at developing a noise-robust diagnosis system, this study introduces a novel model. By employing a bank of deep residual compensation extreme learning machine (DRCELM) networks with an optimized number of compensating blocks, this model effectively handles data noise while maintaining satisfactory speed in both the training and testing processes. This diagnostic system is developed for a two-shaft industrial power-generating gas turbine and can detect the fouling and erosion of its main components. Training and testing data for setting up the fault diagnosis system are created by a model of the studied gas turbine developed in the T-mats toolbox. Moreover, the t-distributed stochastic neighbor embedding (t-SNE) method is adopted and compared with principal component analysis (PCA) to visualize the gas path fault patterns. The performance of the proposed system is compared with gradient descent backpropagation (GDBP), Levenberg-Marquardt backpropagation (LMBP), SVM, ELM, OSELM, and RCELM. The contributions of the paper are summarized as follows:
1. Introducing a robust gas turbine fault detection and isolation (FDI) system using DRCELM: This paper presents a noise-robust fault diagnostic system for gas turbines, leveraging a bank of DRCELMs with optimized compensating blocks.
The remainder of the paper is organized as follows: In Section 2, the studied gas turbine and gas path components are introduced and, in Section 3, the general structure of the proposed fault diagnosis system and its different components are displayed. In Section 4, the simulation of performance deterioration, data generation in various fault classes, and data visualization are presented. In Section 5, the model estimating the health parameters based on DRCELM is introduced and its optimal structure is determined. Section 6 presents the results of the performance comparison of the DRCELM-based fault diagnosis system with other systems, and the conclusions are drawn in Section 7.

| INTRODUCING THE STUDIED GAS TURBINE
The purpose of this study is to develop a noise-robust fault diagnosis system for identifying gas path faults in the SGT600 turbine. The SGT600 turbine is a two-shaft industrial gas turbine used for both mechanical drive and electrical power generation. However, this study specifically focuses on its application in electrical power generation. With a pressure ratio of 14:1, the turbine produces a net power of 24.5 MW under design conditions. Detailed specifications of the SGT600 turbine can be found in reference 29. Figure 1 illustrates the main gas path components, including the compressor, combustion chamber, gas generator turbine (GGT), and power turbine (PT). The study does not consider the performance degradation of the combustion chamber, as faults affecting the overall gas turbine performance are infrequent in the combustor. Additionally, any malfunction in the combustion chamber would result in increased emissions, leading to a violation of international regulations that prohibit the operation of the gas turbine under such circumstances.

| PROPOSED GAS PATH FAULT DIAGNOSIS SYSTEM
In Figure 2, the overall structure of the gas path fault diagnosis system is depicted. This research investigates seven fault detection and isolation models: ELM, RCELM, SVM, LMBP, OSELM, GDBP, and DRCELM.
Each model consists of six subsystems that map the delta vector of measurement parameters (ΔY) to the delta vector of health parameters (ΔX). The ΔY vector is derived from the difference between the measured values of the gas turbine and the values obtained from the gas turbine model developed using the T-mats toolbox, under the same control and environmental conditions.
The model's output indicates the deviation of engine health parameters from their normal state. By following the principles outlined in Table 1, the model can identify the specific type of fault that has occurred. The degradation levels for each component were established based on findings from published experiments. In this investigation, we utilized the values cited in Zhao and Chen 22 as a basis for estimating the implanted faults. For instance, a ratio of 3:1 in the deviation of flow capacity and efficiency of the compressor might indicate a potential fouling fault in the compressor.
To generate the necessary training data, the T-mats model is adopted in this research. From the nine available measurement parameters, the optimal measurements are selected, which are then used as inputs for the models estimating health parameters. This research utilizes the variable length genetic algorithm-extreme learning machine (VLGA-ELM) method to achieve the optimal sensor selection. The approach involves the variable length genetic algorithm suggesting diverse sensor combinations to a basic fault diagnosis model built upon the ELM framework, for the purpose of training and evaluation. Following this, the objective function (CF), formulated with respect to the fault diagnosis SR and the number of sensors ($N_m$), is employed to evaluate the effectiveness of the current sensor combination within the optimization cycle. In this objective function, the weights $W_1$ and $W_2$ are employed to achieve a balance between the SR and the number of sensors. Since CF is a composite function involving both the SR and the number of sensors, the genetic algorithm aims to discover an optimal solution that maximizes the SR while minimizing the number of sensors. The mechanism of the VLGA-ELM algorithm is further detailed in previous research. 23 The optimal parameters selected by VLGA-ELM include GGT speed (NGG), compressor outlet pressure (P2), inlet pressure to the PT (P4), and PT output temperature (T5). Additionally, the proposed structure encompasses a component for visualizing gas path faults based on the t-SNE algorithm.
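The trade-off between SR and sensor count can be illustrated with a small sketch. The exact form of the objective function is not given here, so the weighted sum below, the function name `sensor_cost`, the default weights, and the example success rates are all assumptions made purely for illustration.

```python
def sensor_cost(success_rate, n_selected, n_available=9, w1=0.8, w2=0.2):
    """Hypothetical weighted cost for one sensor combination (lower is better).

    success_rate: fault diagnosis SR in [0, 1] for the candidate sensor set.
    n_selected:   number of sensors in the candidate set (N_m).
    w1, w2:       illustrative weights, not the values used in the study.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    if not 1 <= n_selected <= n_available:
        raise ValueError("n_selected must lie in [1, n_available]")
    # Penalize misclassification and the fraction of sensors used.
    return w1 * (1.0 - success_rate) + w2 * (n_selected / n_available)


# Two made-up candidates: four sensors vs. all nine sensors.
cost_four = sensor_cost(0.97, 4)
cost_nine = sensor_cost(0.98, 9)
print(cost_four, cost_nine)
```

Under these assumed weights, the four-sensor candidate scores better even though its SR is slightly lower, which is the kind of compromise a genetic search over sensor subsets would favor.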

| GAS TURBINE PERFORMANCE DETERIORATION
There are two approaches for training and evaluating data-driven models: utilizing experimental data or simulated data. As experimental data can be challenging to obtain and are typically accessible only to gas turbine manufacturers, this study employs the second approach, which involves using simulated data. The use of simulated data is further elaborated upon in the following section.

| Modeling of the gas turbine in T-mats
In this study, the T-mats toolbox is utilized for the development of a performance model of the gas turbine, conducting performance deterioration simulations, and generating data. To assess and validate the performance of the gas turbine model, the approach outlined in Chapman et al. 31 is implemented. This evaluation process involves three levels, with the outputs at each level compared to the results obtained from the simulation in Gasturb software.
The first level of evaluation focuses on the component level. At this stage, the inputs are provided to each component of the gas path, including the compressor, combustion chamber, and turbines, to examine the validity of their respective outputs. The component-level evaluation results are presented in Table 2.
TABLE 1 Relationship between gas path faults and health parameters. 30
Moving to the subsequent level, the model's performance is examined at the system level after connecting the blocks of gas path components. At this stage, no solver is utilized. The modeling results at this level are presented in Table 3. Generally, the errors are slightly higher compared to the previous level. This could be attributed to the fact that the output of one component, which may contain some error, serves as an input to the subsequent component. The highest error is observed in the outlet temperature of the combustor, amounting to 0.42%.

At the third level, a solver is introduced to the model. This enables the system to converge toward the final solution, starting from an initial point and utilizing the Newton-Raphson numerical method. Selecting an appropriate initial point significantly impacts the convergence rate. Hence, to expedite the model, the steady-state working line and performance deterioration simulation adopt the converged solution from each operating point as the initial condition for the subsequent point. As demonstrated in Table 4, the errors pertaining to temperatures and output power are reduced at this level compared to the previous level. The largest error is observed in the output temperature of the PT and the net output power, both registering a value of 0.12%.
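The benefit of reusing each converged solution as the next initial point can be sketched with a scalar Newton-Raphson iteration. The toy balance equation `x**3 + x - load = 0`, the starting guess of 5.0, and the load sweep are assumptions chosen only to make the effect visible; they do not represent the T-mats solver or the gas turbine equations.

```python
def newton(f, dfdx, x0, tol=1e-10, max_iter=50):
    """Scalar Newton-Raphson; returns (root, iterations used)."""
    x = x0
    for k in range(1, max_iter + 1):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x, k
    raise RuntimeError("Newton-Raphson did not converge")


def sweep(loads, warm_start):
    """Solve the toy equation at each load; count total Newton iterations."""
    x, total, roots = 5.0, 0, []  # 5.0 is a deliberately poor first guess
    for load in loads:
        if not warm_start:
            x = 5.0  # cold start: forget the previous operating point
        x, k = newton(lambda v: v**3 + v - load, lambda v: 3 * v**2 + 1, x)
        roots.append(x)
        total += k
    return roots, total


loads = [1.0 - 0.05 * i for i in range(11)]  # 100% down to 50% (toy scale)
roots_warm, iters_warm = sweep(loads, warm_start=True)
roots_cold, iters_cold = sweep(loads, warm_start=False)
print(iters_warm, iters_cold)
```

Both sweeps reach the same roots, but the warm-started sweep needs fewer iterations in total because neighboring operating points have nearby solutions.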

| Simulation of performance deterioration
Once the overall functionality of the model is verified, the simulation of gas turbine degradation is conducted in the subsequent step. The T-mats compressor and turbine blocks can simulate performance deterioration by incorporating the percentage deviation of health parameters, namely flow capacity and efficiency. Since the focus here lies on examining fouling and erosion faults in components, the deviation ratio of the health parameters is applied to the input of the specific component block. Based on Table 1, these faults are generated at varying severities for each component. For instance, Figure 3 illustrates the generator turbine block, where the inputs "SF_det_m_ggt" and "SF_det_etta_ggt" represent the percentage deviation from the normal condition for the flow capacity and efficiency of the generator turbine, respectively.
To model a healthy turbine, these inputs are set to zero. Alternatively, based on Table 1, the ratio and degree of deviation for these parameters can be adjusted to simulate the desired fault with a specific severity. In this study, it is assumed that the maximum severity of fouling and erosion in the compressor occurs when the efficiency deterioration in this component reaches 5%. Similarly, the maximum severity for the generator turbine and PT is set to 4%. In Figure 3, HPT refers to the high-pressure turbine, that is, the gas generator turbine.
TABLE 3 Model evaluation at the system level without the Newton-Raphson solver.
Modifying the health parameters and introducing faults in the aforementioned sequence will result in deviations in the performance parameters of the model from their normal state. However, the manner and extent of these performance parameter deviations caused by faults are not solely dependent on the fault type. They also vary based on the gas turbine's control mode and operating conditions. For instance, when the gas turbine operates in turbine inlet temperature (TIT) control mode, the fuel valve adjustment by the controller ensures that the TIT remains at the set point, even in the presence of performance faults. Thus, under such conditions, the occurrence of erosion faults in the compressor is not expected to significantly affect the turbine outlet temperature, which directly relates to TIT.

Since the investigated turbine is analyzed in its generator application, the simulation of performance deterioration is conducted under PT speed control mode. In other words, the PT speed is assumed to remain constant, ensuring the generation of electric power at a consistent frequency of 50 Hz, as typically required in power plants. Moreover, various loading conditions of the turbine are examined, specifically ranging from 100% to 50% loading.
Figure 4 illustrates the simulation results displaying the degradation of the studied gas turbine due to fouling and erosion faults in the compressor and turbine. For example, Figure 4A depicts the variations in the performance parameters of the gas turbine following the occurrence of 100% fouling in the compressor. Notably, the turbine loading conditions significantly influence the magnitude of performance parameter deviations. Consequently, when developing a data-driven gas path fault diagnosis system, it should ideally be used only under the conditions for which it has been trained (e.g., solely at 100% loading). Alternatively, the system should be pretrained to accommodate all potential operating conditions, or new conditions should be gradually introduced to the system by incorporating new data.

| Simulation of sensor noise
The data produced by the performance model is in its raw form and free from noise. However, to incorporate the level of uncertainty specified in Table 5, Gaussian noise is introduced to each measurement parameter. To ensure the accuracy of the simulation process, the data generated by T-mats is compared to real data. Figure 5 illustrates the comparison between some of the noisy simulated data and actual data obtained from a similar gas turbine during test conditions for the CF fault. As depicted in Figure 5, the results exhibit a reasonable level of consistency with the test data. Consequently, this simulated data is utilized for training the fault diagnosis system in the subsequent stages.
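A minimal sketch of this noise-injection step is given below. It assumes that each sensor's uncertainty is expressed as a standard deviation in percent of the reading; the function name `add_sensor_noise`, the sample values, and the percentages are hypothetical and are not the Table 5 figures.

```python
import random


def add_sensor_noise(samples, sigma_percent, seed=0):
    """Return a noisy copy of `samples` (rows: operating points, columns: sensors).

    sigma_percent[j] is the assumed standard deviation of sensor j,
    given as a percentage of the clean reading.
    """
    rng = random.Random(seed)  # seeded for reproducible data sets
    return [
        [value + rng.gauss(0.0, abs(value) * pct / 100.0)
         for value, pct in zip(row, sigma_percent)]
        for row in samples
    ]


# Made-up clean readings in the order NGG, P2, P4, T5.
clean = [[9500.0, 14.0, 3.9, 815.0],
         [9300.0, 12.6, 3.5, 790.0]]
sigma_percent = [0.1, 0.5, 0.5, 0.4]  # illustrative uncertainty levels
noisy = add_sensor_noise(clean, sigma_percent, seed=42)
print(noisy[0])
```

Fixing the seed keeps the noisy training and testing sets reproducible, which makes comparisons between diagnosis models fair.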

| Gas path fault visualization
In this study, t-SNE is utilized to extract significant features from the measurement parameters. For the specific problem at hand, there are a total of nine input features. By reducing the dimensionality, three components are derived from these features and visualized in a three-dimensional space. Figure 6A,B shows the visualization of gas path faults using PCA and t-SNE methods, respectively. A qualitative comparison of Figure 6A,B shows that the t-SNE method offers superior separability compared to PCA. While distinguishing classes at lower fault severities is challenging in both methods, it proves to be more difficult with PCA. Furthermore, it is observed that the t-SNE method is less impacted by noise across all severities.
For the quantitative evaluation of the t-SNE model, the extracted features from this model are used as inputs to an SVM classifier. The classification accuracy of this combined model is then compared with that of a combined PCA-SVM and Kernel PCA-SVM (KPCA-SVM). For this purpose, a data set of 786 gas path samples belonging to seven different classes is divided into training and testing sets with an 80:20 ratio. The hyperparameters of the dimensionality reduction models are optimized using a grid search. t-SNE has three hyperparameters, namely the learning rate, perplexity, and distance metric. In this study, the learning rate is assigned a value of 10, while the perplexity is set to 17. The distance metric, which quantifies the similarity between input data points, can be specified as either Euclidean distance or cosine distance. For this particular investigation, the cosine distance is chosen as the optimized parameter. Apart from the number of components in the PCA and KPCA models, which remains fixed at three for this particular study, there are two additional parameters specific to the KPCA. One of these parameters is the kernel function, which has been chosen as sigmoid due to its optimal performance in this specific context. Moreover, the gamma parameter has been tuned and set to a value of 0.014 to achieve optimal results.
Table 6 provides a comparison of the t-SNE-SVM, PCA-SVM, KPCA-SVM, and SVM classifiers without dimensionality reduction. The results presented in this table provide quantitative confirmation of the superior performance of the t-SNE model, as observed qualitatively in Figure 6. As the results demonstrate, the t-SNE-SVM model achieves the highest accuracy at 96.8%, while the PCA-SVM method presents the lowest accuracy at 89.9%. The lower accuracy of the PCA-SVM model, which performs even worse than a plain SVM, may be due to the nonlinear nature of the problem. This nonlinearity causes the features extracted by the PCA-based model not to encode the necessary information for gas path fault classification effectively. Nevertheless, the findings of this section confirm that the t-SNE method outperforms both PCA and kernel PCA methods in the visualization of gas path data.

| Extreme learning machine
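The ELM network, which was introduced in 2006, 32 is renowned for its fast training process and excellent generalization performance. The structure of an ELM network is illustrated in Figure 7.
This network consists of a hidden layer and an output layer, which can be computed analytically by randomly assigning the network parameters in the hidden layer. Given $N$ distinct samples of ordered pairs $(x_i, y_i)$, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, the output of a single-hidden-layer feedforward neural network (SLFN) with $L$ nodes in the hidden layer can be expressed as follows:

$$o_j = \sum_{i=1}^{L} \beta_i \, G(a_i, b_i, x_j), \quad j = 1, \dots, N, \tag{2}$$

where $a_i = [a_{i1}, \dots, a_{in}]^T$ represents the weight vector connecting the input nodes to the $i$th hidden node, $\beta_i = [\beta_{i1}, \dots, \beta_{im}]^T$ represents the weight vector connecting the $i$th hidden node to the output nodes, $b_i$ represents the bias of the $i$th hidden node, and $G(\cdot)$ represents the activation function of the hidden layer. During the training process of an SLFN network, the objective is to determine the parameters $a_i$, $b_i$, and $\beta_i$ by solving the following optimization problem:

$$\min_{a_i,\, b_i,\, \beta_i} \lVert H\beta - Y \rVert, \tag{3}$$

where $\beta = [\beta_1, \dots, \beta_L]^T \in \mathbb{R}^{L \times m}$, $Y = [y_1, \dots, y_N]^T \in \mathbb{R}^{N \times m}$, and the matrix $H$ is referred to as the hidden-layer output matrix:

$$H = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{bmatrix}_{N \times L}. \tag{4}$$

Equation (3) is typically solved using gradient-based methods. However, in an ELM network, the matrix $H$ remains constant throughout the training process due to the random determination of $a_i$ and $b_i$. Consequently, the aforementioned equation is transformed into the following optimization problem:

$$\text{Minimize: } \lVert H\beta - Y \rVert^2 \ \text{and} \ \lVert \beta \rVert. \tag{5}$$

According to this equation, the ELM aims to minimize both the norm of the training error and the norm of the output weights simultaneously. In line with Bartlett's theory, which links smaller weight norms to better generalization, it demonstrates superior performance compared to gradient-based networks. Equation (5) can also be expressed in the following manner:

$$\min_{\beta} \ \frac{1}{2} \lVert \beta \rVert^2 + \frac{C}{2} \sum_{i=1}^{N} \lVert \xi_i \rVert^2 \quad \text{subject to} \quad h(x_i)\beta = y_i^T - \xi_i^T, \quad i = 1, \dots, N. \tag{6}$$

In the given equation, $h(x_i)$ is the $i$th row of $H$, $\xi_i$ represents the training error of sample $x_i$, and $C$ is a regularization parameter that balances the trade-off between the network error and the model complexity. This optimization problem is solved using the Lagrange method and Karush-Kuhn-Tucker conditions, as illustrated below:

$$\beta = H^T \left( \frac{I}{C} + H H^T \right)^{-1} Y. \tag{7}$$

In this case, $I$ denotes the identity matrix. Ultimately, the output of the ELM, determined by the value of $\beta$, can be computed as shown below:

$$f(x) = h(x)\beta = h(x) H^T \left( \frac{I}{C} + H H^T \right)^{-1} Y. \tag{8}$$

It is important to note that one drawback of this network is its performance dependence on randomly assigned weights.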
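The closed-form training described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `elm_fit` and `elm_predict`, the value of `C`, the seed, and the synthetic regression target are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, Y, n_hidden=10, C=1e3, seed=0):
    """Train a regularized ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], n_hidden))  # input weights a_i (fixed)
    b = rng.standard_normal(n_hidden)                # hidden biases b_i (fixed)
    H = sigmoid(X @ A + b)                           # hidden-layer output matrix, N x L
    # Output weights: beta = H^T (I/C + H H^T)^{-1} Y
    beta = H.T @ np.linalg.solve(np.eye(len(X)) / C + H @ H.T, Y)
    return A, b, beta


def elm_predict(X, model):
    A, b, beta = model
    return sigmoid(X @ A + b) @ beta


# Synthetic two-input regression problem (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))
Y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)
model = elm_fit(X, Y)
rmse = float(np.sqrt(np.mean((elm_predict(X, model) - Y) ** 2)))
print(rmse)
```

Only the output weights are learned, and they come from a single linear solve, which is where the speed advantage over iterative backpropagation comes from. Changing `seed` changes the random hidden layer and therefore the fit, which reflects the dependence on random weights noted above.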

| DRCELM
One of the challenges faced by fault detection and identification systems is their susceptibility to noise.
The presence of noise can lead to false alarms or failure to detect faults in their early stages. To develop a noise-robust fault diagnosis system, RCELM has recently been employed for gas turbine fault detection and identification. 26 However, this network does not perform well in cases where more than two compensating blocks are required. To address this limitation, Chen et al. proposed a new network by deepening its structure. 28 The revised network, with slight modifications compared to Chen et al.'s paper, is presented in Figure 8. This structure, similar to the previous model proposed by Zhang et al., consists of a baseline ELM and multiple compensating blocks. One notable difference between this model and Zhang's model is the input to the compensating blocks. In addition to the input from the baseline ELM, each compensator also takes the last output from its preceding network as an input, resulting in a deep network architecture. In this structure, the residual is defined as the difference between the desired output in the training process and the last estimated output. For a more detailed explanation of the DRCELM network theory, readers are referred to Chen et al. 28
To design this network for gas turbine fault diagnosis, an optimal baseline ELM is first established based on accuracy and F1 index measurements. As shown in Figure 9A,B, increasing the number of hidden layer neurons in the baseline ELM leads to improved accuracy and F1 scores. However, beyond 10 neurons, these indices show little change. Consequently, the baseline ELM is configured with 10 neurons in the hidden layer.
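The stacking of compensating blocks can be sketched as follows. This is a simplified reading of the description above and not the algorithm of Chen et al.: it assumes that each compensator receives the original inputs concatenated with the latest estimate, is trained on the current residual, and that its output is added to the running estimate. All function names, the block count, and the synthetic data are hypothetical.

```python
import numpy as np


def _fit_block(X, T, n_hidden, C, rng):
    """One regularized ELM block with a random sigmoid hidden layer."""
    A = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return A, b, beta


def _apply_block(X, block):
    A, b, beta = block
    return (1.0 / (1.0 + np.exp(-(X @ A + b)))) @ beta


def drcelm_fit(X, Y, n_blocks=5, n_hidden=10, C=1e3, seed=0):
    rng = np.random.default_rng(seed)
    blocks = [_fit_block(X, Y, n_hidden, C, rng)]  # baseline ELM
    y_hat = _apply_block(X, blocks[0])
    for _ in range(n_blocks):
        Z = np.hstack([X, y_hat])                  # inputs plus last estimate
        block = _fit_block(Z, Y - y_hat, n_hidden, C, rng)  # target: residual
        y_hat = y_hat + _apply_block(Z, block)     # compensate the estimate
        blocks.append(block)
    return blocks


def drcelm_predict(X, blocks):
    y_hat = _apply_block(X, blocks[0])
    for block in blocks[1:]:
        y_hat = y_hat + _apply_block(np.hstack([X, y_hat]), block)
    return y_hat


# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))
Y = (np.sin(3.0 * X[:, 0]) * X[:, 1]).reshape(-1, 1)
baseline = drcelm_fit(X, Y, n_blocks=0)
deep = drcelm_fit(X, Y, n_blocks=5)
mse_baseline = float(np.mean((drcelm_predict(X, baseline) - Y) ** 2))
mse_deep = float(np.mean((drcelm_predict(X, deep) - Y) ** 2))
print(mse_baseline, mse_deep)
```

In this sketch the training error cannot increase as blocks are added, because each ridge-regularized block can always fall back to a zero correction. Whether the test error also improves depends on the data, which is why the number of blocks has to be selected empirically.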

After determining the optimal structure of the baseline ELM, the next step is to determine the appropriate number of compensating blocks. By setting the number of neurons in the hidden layer to 50, the accuracy and F1 scores are examined for each compensating network as the number of blocks increases. Based on Figure 10A,B, deeper networks exhibit higher accuracy and F1 scores. However, for more than 45 compensating blocks, there is minimal improvement in these metrics, while the model's complexity increases. Hence, in this study, based on the available data, the optimal number of compensating blocks for the gas turbine fault diagnosis model is determined to be 45 units. It is worth mentioning that the activation function for the hidden layer of the baseline ELM and all the compensators is set to the logistic sigmoid function. Furthermore, the hidden layer's weights and biases have been randomly generated from the normal distribution.

| Benchmark FDI systems
This paper aims to compare the performance of the DRCELM-based FDI system with other conventional approaches, including ELM, SVM, GDBP, OSELM, LMBP, and RCELM. Each method offers certain benefits while also facing some drawbacks. The OSELM algorithm, derived from ELM, utilizes a recursive equation to learn the training data set sequentially. As a result, the OSELM proves to be a suitable solution for nonstationary systems like gas turbines. The theory of OSELM closely resembles other ELM-based algorithms discussed in the previous section. For more in-depth information on the OSELM, please refer to Montazeri-Gh and Nekoonam. 23
On the other hand, SVM is a promising solution for small and high-dimensional data sets. This algorithm focuses on optimizing the empirical risk (ER) rather than the network error 33:

$$ER = \frac{1}{N} \sum_{i=1}^{N} \lvert t_i - y_i \rvert_{\varepsilon}, \qquad \lvert t_i - y_i \rvert_{\varepsilon} = \max\left(0, \lvert t_i - y_i \rvert - \varepsilon\right).$$

In the given equation, $t_i$ represents the target value, $y_i$ represents the model output corresponding to the $i$th sample, and $N$ represents the number of data points. The parameter $\varepsilon$ serves as a threshold that defines two boundaries and determines the maximum allowable deviation between the model outputs and target values. Using this ER, the optimization problem can be formulated as follows:

$$\max_{\alpha^+,\, \alpha^-} \ -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i^+ + \alpha_i^-) + \sum_{i=1}^{N} t_i (\alpha_i^+ - \alpha_i^-)$$

$$\text{subject to} \quad \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0, \qquad 0 \le \alpha_i^+, \alpha_i^- \le C_{SVM}.$$

In the provided equation, $C_{SVM}$ is the regularization parameter that bounds the Lagrange multipliers $\alpha_i^+$ and $\alpha_i^-$, and $K(x_i, x_j)$ denotes the kernel function, which can be expressed as follows:

$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right).$$

Due to the nonlinear nature of the system, this article utilizes kernel SVM. The performance of the SVM model is greatly influenced by its hyperparameters ($\varepsilon$, $\sigma$, $C_{SVM}$). Consequently, a genetic algorithm is employed in this study to optimize the parameters of the SVM-based FDI system. The outcomes are presented in Table 7.
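The roles of the hyperparameters ε and σ can be made concrete with two small helper functions. The sketch assumes the standard ε-insensitive loss and a Gaussian kernel with width σ; the function names and the numbers in the example are illustrative, and the tuned values of Table 7 are not used.

```python
import math


def eps_insensitive_risk(targets, outputs, eps):
    """Mean epsilon-insensitive loss: deviations inside the eps tube cost nothing."""
    losses = [max(0.0, abs(t - y) - eps) for t, y in zip(targets, outputs)]
    return sum(losses) / len(losses)


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian kernel; sigma sets how quickly similarity decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Only the second sample lies outside the tube of half-width 0.1.
risk = eps_insensitive_risk([1.0, 2.0, 3.0], [1.05, 2.5, 3.0], eps=0.1)
similarity = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)
print(risk, similarity)
```

A larger ε ignores more of the small deviations, which tends to improve noise tolerance at the cost of resolution, while a smaller σ makes the kernel more local. This sensitivity is the reason these parameters are tuned by a search procedure.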
GDBP and LMBP are multilayer perceptron models commonly utilized for regression and classification tasks. The backpropagation method is typically employed to train multilayer perceptrons. 34 If we define the cost function as follows:

$$\chi^2(p) = \sum_{i} \left[ \frac{y(t_i) - \hat{y}(t_i; p)}{\sigma_{y_i}} \right]^2 = \left( y - \hat{y}(p) \right)^T W \left( y - \hat{y}(p) \right),$$

where $t$ represents the input, $p$ denotes the network weights, $\sigma_{y_i}$ signifies the measurement error for data point $y(t_i)$, and $W$ is a diagonal matrix with diagonal elements $W_{ii} = 1/\sigma_{y_i}^2$. Therefore, the gradient of this cost function in the GDBP model can be expressed as follows:

$$\frac{\partial \chi^2}{\partial p} = -2 \left( y - \hat{y}(p) \right)^T W J, \qquad J = \frac{\partial \hat{y}(p)}{\partial p}.$$

The parameter update that guarantees movement in the direction of steepest descent is given by the following equation:

$$h_{GD} = \alpha J^T W \left( y - \hat{y}(p) \right).$$

In this equation, $\alpha$ is referred to as the learning rate, which is set to 0.015 in this paper. One of the main limitations of the gradient descent technique is that the activation functions of the model must be differentiable. In LMBP, the Gauss-Newton method is combined with gradient descent as shown below:

$$\left[ J^T W J + \lambda I \right] h_{LM} = J^T W \left( y - \hat{y}(p) \right),$$

where $\lambda$ represents the damping factor and $h_{LM}$ represents the update parameter in the Levenberg-Marquardt algorithm. The Gauss-Newton method demonstrates satisfactory performance when the initial condition is sufficiently close to the optimal solution. Therefore, $\lambda$ is initially set to a large value to enable the algorithm to behave like gradient descent. Subsequently, the damping factor gradually approaches a smaller value to …
These tables encompass seven fault patterns: CF, compressor erosion, gas generator turbine fouling, gas generator turbine erosion, power turbine fouling, power turbine erosion, and healthy. The closer these matrices are to a diagonal matrix, the better the performance of the diagnostic system. By analyzing the confusion matrices, the accuracy and other indexes can be calculated for the DRCELM model, as well as for other models, as shown in Figures 11 and 12A.
Figure 11A compares the classification accuracy of the models in the testing process. The DRCELM-based fault diagnosis system provides an accuracy of over 97% in noisy data and over 98% in noise-free and combined data. The accuracy of the DRCELM-, LMBP-, and OSELM-based systems, and to some extent the RCELM-based system, is noticeably higher than that of the other models. Furthermore, the results show that for the other models there is a significant difference in performance between the noise-free and combined data categories, while the performance of the DRCELM system is nearly consistent across both data categories. This observation is also valid for the $F_1$ index, as shown in Figure 11B. The LMBP model achieved the highest $F_1$ index value for noise-free data, while the DRCELM model achieved the highest value for combined data. For the DRCELM model, the $F_1$ index is over 98% for noisy data and over 99% for the noise-free and combined data categories.

Abbreviations: CE, compressor erosion; CF, compressor fouling; DRCELM, deep residual compensation extreme learning machine; GGTE, gas generator turbine erosion; GGTF, gas generator turbine fouling; PTE, power turbine erosion; PTF, power turbine fouling.
Figure 12A illustrates the mean values of FNR and FPR. For the DRCELM model, this index is below 2.5% for noisy data and less than 2% for noise-free and combined data. Such low values indicate the system's capability to accurately detect faults at the early stages of anomaly development in gas turbines. Notably, the DRCELM and LMBP fault diagnosis systems demonstrate superior performance compared to other models across all three data sets. Additionally, the results indicate that the DRCELM model achieves the same level of accuracy as the LMBP model and other models in the noise-free and combined data sets.
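The indexes discussed above (accuracy, $F_1$, FPR, and FNR) can all be derived from a multiclass confusion matrix. The sketch below assumes that rows hold the true class and columns the predicted class, and that the per-class rates are computed one-vs-rest; how the paper averages these rates is not stated here, so the final mean of FPR and FNR is one plausible reading. The function name and the small 3-class matrix are hypothetical.

```python
import numpy as np

def confusion_matrix_metrics(cm):
    """Accuracy plus one-vs-rest F1, FPR, and FNR for each class.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # missed samples of each true class
    fp = cm.sum(axis=0) - tp          # samples wrongly assigned to each class
    tn = cm.sum() - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "f1": 2.0 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

# Made-up 3-class example; the paper's matrices have seven classes.
cm = [[9, 1, 0],
      [0, 10, 0],
      [1, 0, 9]]
metrics = confusion_matrix_metrics(cm)
# One possible summary index: average of the macro-averaged FPR and FNR.
mean_fpr_fnr = 0.5 * (metrics["fpr"].mean() + metrics["fnr"].mean())
```

A matrix closer to diagonal drives FPR and FNR toward zero and accuracy toward one, which is the property the confusion matrix discussion relies on.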
In this study, we aimed to assess the robustness of the DRCELM, OSELM, RCELM, and LMBP models, which exhibited favorable performance with in-domain data compared to other models. To evaluate their performance under different conditions, we conducted tests using out-of-domain data sets. The outcomes of this evaluation are depicted in Figure 13. For the evaluation, the mentioned fault diagnosis systems were subjected to 26 diverse data sets with signal-to-noise ratios ranging from 5 to 30 dB. The 30 dB level corresponds to mild noise, making the data nearly indistinguishable from noise-free data during testing. Conversely, the 5 dB level poses a substantial challenge to fault diagnosis systems due to its noticeable noise interference. The DRCELM-based fault diagnosis system achieved an accuracy of 98% at a signal-to-noise ratio of 30 dB. However, this accuracy gradually declined to 62% as the signal-to-noise ratio decreased to 5 dB. Nonetheless, the DRCELM system consistently outperformed other models across various noise levels. Surprisingly, the LMBP model, which had previously shown satisfactory performance within the specified range, experienced a significant decline in accuracy when exposed to higher noise levels outside its domain. Its accuracy dropped from 98% at a 30 dB signal-to-noise ratio to 46% at a 5 dB signal-to-noise ratio.
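To make the signal-to-noise levels above concrete, the following sketch generates noisy copies of a signal at prescribed SNR values. It assumes additive zero-mean white Gaussian noise, an SNR defined as the ratio of mean signal power to mean noise power, and 1 dB steps between 5 and 30 dB (which yields 26 levels); none of these details is confirmed by the text, and the test signal is a hypothetical stand-in for a measured gas path parameter.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so the result has the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy signal relative to its clean version."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

# Hypothetical normalized measurement trace.
rng = np.random.default_rng(0)
clean = 1.0 + 0.1 * np.sin(np.linspace(0.0, 20.0, 50_000))

# Assumed 1 dB spacing: 5, 6, ..., 30 dB.
snr_levels = np.arange(5, 31)
noisy_sets = {int(s): add_noise_at_snr(clean, s, rng) for s in snr_levels}
```

Under this definition, the noise power at 5 dB is roughly 316 times the noise power at 30 dB, which is why the lower end of the range is so much harder for the classifiers.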
Furthermore, the OSELM model demonstrated weaker performance compared to DRCELM and LMBP in the presence of light noise, achieving an accuracy of 94% at a 30 dB signal-to-noise ratio. Nevertheless, it displayed relative resilience in higher noise environments, maintaining an accuracy of 61% at a 5 dB signal-to-noise ratio.
Regarding the RCELM model, its most noteworthy characteristic was its consistent accuracy, which remained relatively stable for signal-to-noise ratios from approximately 30 dB down to 19 dB. In contrast, the other models consistently experienced performance degradation as noise levels increased. The results highlighted the unstable behavior of the other models when confronted with varying noise levels, particularly evident in the case of the RCELM model.

The bar chart in Figure 12B compares the training and testing times of the models. For better visualization, the vertical axis is presented on a logarithmic scale. The fault diagnosis models based on OSELM and ELM were the fastest during both training and testing, whereas the SVM network presented the weakest performance in this regard.
Due to the inclusion of compensatory blocks, the RCELM- and DRCELM-based models are slower compared to OSELM and ELM. However, they still outperformed other methods in terms of training speed and, as previously demonstrated, provided higher accuracy in both noisy and noise-free conditions.

CONCLUSION
In this paper, we introduced a noise-robust fault detection and isolation system based on a set of DRCELM models, and compared its performance with fault diagnosis systems based on ELM, RCELM, OSELM, SVM, LMBP, and GDBP. First, we generated the data required to train and evaluate the diagnostic model by simulating the deterioration in gas turbines. We then explored the effectiveness of the t-SNE algorithm in visualizing the gas path fault diagnosis problem and found that it provided better separability in the three-dimensional space compared to PCA and KPCA. Next, we established an optimal DRCELM network with 45 blocks to detect gas path faults and evaluated its performance using various metrics including accuracy, $F_1$ score, mean of FPR and FNR, and execution time. The results demonstrated that the DRCELM-based fault diagnosis system achieved over 97% accuracy in noisy data and over 98% accuracy in noise-free and combined data sets. Moreover, the proposed system achieved $F_1$ scores above 98% for noisy data and above 99% for noise-free and combined data sets. The mean of FPR and FNR was less than 2.5% for noisy data and less than 2% for noise-free and combined data sets. Furthermore, a comparison of evaluation indicators between the DRCELM model and the other fault diagnosis models developed in this study highlighted the superior robustness of the DRCELM-based fault diagnosis system at different signal-to-noise ratios. Although the execution time of the DRCELM model was slightly higher than that of ELM, OSELM, and RCELM due to the larger number of compensating blocks in its structure, it still exhibited better training speed compared to other methods. Overall, the findings of this study demonstrate the effectiveness of the proposed DRCELM-based FDI system in achieving accurate and robust fault diagnosis in gas turbines. Future research may focus on further improving the execution time of the system without compromising its performance. For future studies, we propose exploring the implementation of OSELM blocks within the DRCELM structure, as an alternative to ELM units. This modification could enhance the fault diagnosis system's potential by providing it with incremental learning capabilities.

FIGURE 1 Main gas path components of the monitored gas turbine and measurement parameters. GGT, gas generator turbine; PT, power turbine.

FIGURE 2 Structure of the gas path fault diagnosis system.

FIGURE 3 Applying the fault to the gas generator turbine block in the T-mats toolbox. GGT, gas generator turbine.

FIGURE 4 Deviation of performance parameters due to gas path faults: (A) compressor fouling, (B) compressor erosion, (C) gas generator turbine (GGT) fouling, (D) GGT erosion, (E) power turbine (PT) fouling, and (F) PT erosion. AMF, air mass flow; FMF, fuel mass flow.

FIGURE 5 Comparing noisy data obtained from the T-mats model with test data. AMF, air mass flow.

FIGURE 6
Abbreviations: PCA, principal component analysis; SVM, support vector machine; t-SNE, t-distributed stochastic neighbor embedding.

FIGURE 8 Structure of deep residual compensation extreme learning machine (DRCELM).

NEKOONAM ET AL.

FIGURE 9 Effect of increasing the number of hidden-layer neurons on the performance of the fault diagnosis system based on the baseline extreme learning machine. (A) Accuracy and (B) $F_1$ score.

FIGURE 10 Effect of increasing the number of blocks on the performance of the fault diagnosis system based on a deep residual compensation extreme learning machine. (A) Accuracy and (B) $F_1$ score.

FIGURE 12 Evaluating the performance of gas path fault diagnosis systems. (A) Mean of false negative rate (FNR) and false positive rate (FPR). (B) Training and testing time. DRCELM, deep residual compensation extreme learning machine; ELM, extreme learning machine; FDI, fault detection and isolation; LMBP, Levenberg-Marquardt backpropagation; OSELM, online sequential ELM; RCELM, residual compensation extreme learning machine.

FIGURE 13 Performance of deep residual compensation extreme learning machine (DRCELM), Levenberg-Marquardt backpropagation (LMBP), online sequential extreme learning machine (OSELM), and residual compensation extreme learning machine (RCELM) in out-of-domain data sets. SNR, signal-to-noise ratio.
Model evaluation at the system level with the Newton-Raphson solver.