Simplified automatic fault detection in wind turbine induction generators

This paper presents a simplified automated fault detection scheme for wind turbine induction generators with rotor electrical asymmetries. Fault indicators developed in previous works have made use of the presence of significant spectral peaks in the upper sidebands of the supply frequency harmonics; however, the specific location of these peaks may shift depending on the wind turbine speed. As wind turbines tend to operate under variable speed conditions, it may be difficult to predict where these fault-related peaks will occur. To accommodate variable speeds and the resulting shifts in frequency peak locations, previous works have introduced methods to identify or track the relevant frequencies, which necessitates an additional set of processing algorithms to locate these fault-related peaks prior to any fault analysis. In this work, a simplified method is proposed to instead bypass the issue of variable speed (and shifting frequency peaks) by introducing a set of bandpass filters that encompass the ranges in which the peaks are expected to occur. These filters are designed to capture the fault-related spectral information to train a classifier for automatic fault detection, regardless of the specific location of the peaks. Initial experimental results show that this approach is robust against variable speeds and further shows good generalizability in being able to detect faults at speeds and conditions that were not presented during training. After training and tuning the proposed fault detection system, the system was tested on ''unseen'' data and yielded a high classification accuracy of 97.4%, demonstrating the efficacy of the proposed approach.

machine terminal quantity analysis, for detecting induction machine faults. [10][11][12] Stator current is commonly used in MCSA since it is sensitive to rotor faults, and it provides a suitable diagnostic index allowing discrimination between faulty and healthy conditions. 13 Rotor asymmetry has been shown to induce a change in the generator stator current spectral content at slip-dependent sidebands of the dominant supply frequency harmonic components. Closed-form analytical expressions have been derived as to the specific locations of these sidebands 11,[14][15][16][17][18] that can be monitored for diagnostic purposes for machines operating in steady state. MCSA has been extensively investigated for rotor asymmetry detection in WRIGs under steady-state conditions, ranging from analysis of experimental data only 19,20 to investigation on the basis of both simulation and experimental data. [21][22][23] In the WT industry, electrical signal-based CM has been gaining increasing attention, as it requires almost no additional capital expenditure. 24 This is due to the fact that electrical signals are already available in existing WT control and protection systems, and no additional sensors or data acquisition devices are needed. However, despite the increasing concern about WT electrical component reliability and growing interest in electrical signal-based CM, monitoring generator electrical faults has not yet become standard practice in the wind industry. The majority of WT CM systems (CMS) are mainly based on monitoring high-frequency vibration in gearbox and generator bearings. 25 In WT time-varying operating conditions, as the speed, and hence the slip, changes, the frequency components associated with rotor faults are spread over a frequency range proportional to the operational speed variation range.
This represents one of the main limitations associated with the effective implementation of MCSA techniques for WT generator CM, as accurate information about the machine speed and specialist knowledge, with advanced signal analytics experience, are required to interpret large amounts of complex monitoring data. 3 Further to this, if the machine speed is unknown, it may not be possible to predict where these fault-related peaks will occur. In order to achieve a good spectral resolution, the signal should also be sampled at high sampling rates, requiring large memory space for data storage and processing. To use CM information successfully for optimizing the O&M strategies, systems that can automatically analyse and interpret large volumes of CM data are required.
Recent works have dealt with the problem of WRIG rotor fault detection under nonstationary conditions. Gritli et al 26 and Vedreño et al 27 proposed approaches on the basis of detecting the faults through increases in signal energy in specific frequency ranges, including the fault component of interest. However, they rely on the use of the computationally intensive discrete wavelet transform (DWT) as a filtering tool to isolate frequency intervals of interest and require knowledge of the machine speed to identify in which band of the DWT decomposition the fault components appear. The harmonic order tracking analysis method presented by Sapena-Bano et al 28 rearranges the information in the current time-frequency spectrograms into simplified graphs displaying a unique pattern for each type of fault. However, this methodology strictly relies on knowledge of the rotor position at every sampling instant, which is usually not available on operational WTs. All these approaches, besides requiring knowledge of the machine speed or rotor position, imply complex additional calculations to characterize the machine fault signatures for every machine operating condition. This makes their industrial implementation difficult. This paper proposes an automated fault detector of WT induction generator rotor electrical asymmetry that requires minimal prior knowledge of the machine operating conditions and is invariant to the issue of shifting slip-dependent fault-related peaks in the stator current spectral content.
Building on previous research, 14,16,17,20,23 the proposed method is a simplified but effective approach that negates the need for supplementary processing to identify operational parameters and is based on the combination of simplified signal processing tools for feature extraction from the machine stator current signal and the use of a linear classifier for fault detection. Unlike previous works on artificial intelligence (AI)-based algorithms for automatic induction motor fault detection, such as Fernandez-Temprano et al, Frosini et al, and Haroun et al [29][30][31] and those reviewed in Riera-Guasp et al, 12 tested under machine steady-state conditions, this work has been specifically developed to improve the fault diagnostic reliability during WT nonstationary load and speed operating conditions. In general, the approach for developing an automated detection/classification system is to first extract a set of features from input data (eg, images and speech signals) and then use these features to train a classifier. 32 Subsequent data can then be classified by applying the same feature extraction approach and then inputting the features into the previously trained classifier to arrive at a decision about the class of the input data (eg, healthy or faulty), or even a categorization of the level of the fault severity, if a fault exists. The selection of an appropriate set of features is undoubtedly critical in this process towards implementing a successful detection system, as it is possible to extract features that do not contain information relevant for classification. Therefore, the suitability of the proposed set of features for automated fault detection is also investigated in this work, by applying a form of supervised dimensionality reduction to allow for both visual and numerical analyses of experimental healthy and faulty data. 
The applicability of the proposed approach to detecting different fault levels is experimentally validated in a laboratory test rig.
The main advantages of the proposed method are where f is the fundamental supply frequency, s is the induction generator fractional slip, p is the machine pole pair number, k = 0, 1, 2, 3, ..., and j = 0, 1, 2, 3, ... relate to air-gap field space harmonics resulting from the layout of the machine and supply time harmonics in the current, respectively.
Given these findings regarding the manifestation of significant spectral peaks due to rotor asymmetry and the variability of the peak locations with the machine operating speed, the extracted features should incorporate information across the various frequency bands of interest. To this end, a set of bandpass filters is proposed (hereafter referred to as a filter bank), where the cut-off frequencies of the bandpass filters are determined on the basis of an expected range of slip-dependent sideband peak locations, derived from both theory and actual experimentation.
The average spectral magnitude contained in each frequency band of the filter bank is computed and concatenated to form a vector of features, whose length is equal to the number of bandpass filters.
With the introduction of these bandpass filters, any fault-related information contained within the frequency bands of interest can be found without the need for any frequency tracking or identification of the relevant spectral components. Each bandpass filter is expected to cover the range of where the relevant spectral peaks occur, and taking the average magnitude in each band reduces the spectral content contained in that range to a single metric in which the fault-related information is naturally captured. These metrics, which will be used as the ''features,'' can then be combined (ie, concatenated) for usage in automatic classification.

Feature extraction
The mathematics for computing the features from the input signals are as follows. For all data, the features are extracted using a filter bank approach on the stator current spectra whereby a set of $R$ bandpass filters, $H_r$ for $r = 1, \ldots, R$, are applied (ie, multiplied by the spectrum). Each filter encompasses a frequency band of range $f_{\text{range}} = f_{r,\max} - f_{r,\min}$, which can discretely be represented as
$$H_r[k] = \begin{cases} 1, & \lfloor f_{r,\min}(N/f_s) \rfloor \le k \le \lfloor f_{r,\max}(N/f_s) \rfloor \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
where $f_{r,\min}$ and $f_{r,\max}$ are the lower and upper cut-off frequencies, respectively, for the $r$th bandpass filter; $N$ is the length of the signal being analysed, where $N \ge N_w \cdot f_s$ (with the inequality accounting for optional zero-padding), and $k \in \mathbb{N}$. Application of these bandpass filters is essentially an extraction of the spectral content in the frequency range of interest (ie, zeroing out the spectral content outside of the range of the bandpass filter, while ''passing,'' or multiplying by 1, the spectral content contained within the range of the bandpass filter). Different stator current bandpass filters are constructed under the guidance of work done by Zappalá, 34 which states that in faulty spectra (compared with ''healthy'' spectra), higher amplitude peaks will appear at the $2sf$ upper sidebands of the supply frequency harmonics $hf$, where $h = 1, 2, 3, \ldots$ is the supply harmonic order. Therefore, the filter bank will be constructed such that each bandpass filter encompasses this $2sf$ upper sideband of each supply frequency harmonic order $h$:
$$f_{r,\min} = hf + f_{\text{shift}} \tag{3}$$
$$f_{r,\max} = f_{r,\min} + f_{\text{range}} \tag{4}$$
where $f_{\text{shift}}$ marks the start of the bandpass filter. The frequency range containing the sideband peaks, $f_{\text{range}}$, can be computed from the expected machine operational speed range. Note that while there could be an equivalence of $r \equiv h$ if there is a filter for each harmonic, in some cases it may not be necessary to have filters placed at each harmonic so that $r \ne h$. 23
From the frequency spectra computed from each windowed signal, a feature vector, $\mathbf{v}$, will be constructed with each element $v[r]$ representing information from one bandpass filter, generated by the following equation:
$$v[r] = \frac{1}{k_2 - k_1 + 1} \sum_{k=k_1}^{k_2} |X[k]|\, H_r[k] \tag{5}$$
where $|X[k]|$ is the magnitude of the stator current spectrum at frequency index $k$, $k_1 = \lfloor f_{r,\min}(N/f_s) \rfloor$ and $k_2 = \lfloor f_{r,\max}(N/f_s) \rfloor$ (ie, the start and stop indices of the range in which the filter is nonzero), and $\mathbf{v} \in \mathbb{R}^R$. A flowchart of this proposed feature extraction approach is shown in Figure 1.
Note that there are several parameters that can be varied.
• Length of the analysis time window, $N_w$.
• Number of bandpass filters, $R$.
• Frequency range of the bandpass filters, $f_{\text{range}}$.

FIGURE 1 Feature Extraction Process
During experimentation, these parameters will be tuned (ie, the ''best'' values will be determined) by classifying the data using a held-out development set (not to be used in testing), in part to quantify the effects of these parameters on the resulting classifications.
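The feature computation described in this section can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `filter_bank_features`, its arguments and defaults, the use of an unwindowed one-sided FFT, and the choice of one filter per harmonic (r = h) are all hypothetical choices made for the example.

```python
import numpy as np

def filter_bank_features(signal, fs, f_supply=50.0, n_filters=10,
                         f_shift=1.0, f_range=7.0, n_fft=None):
    """Average spectral magnitude in the band just above each supply harmonic.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal) if n_fft is None else int(n_fft)  # N (n_fft > len allows zero-padding)
    magnitude = np.abs(np.fft.rfft(signal, n=n))      # one-sided magnitude spectrum
    features = np.empty(n_filters)
    for r in range(1, n_filters + 1):                 # assumes one filter per harmonic (r = h)
        f_min = r * f_supply + f_shift                # lower cut-off frequency
        f_max = f_min + f_range                       # upper cut-off frequency
        k1 = int(np.floor(f_min * n / fs))            # first index inside the band
        k2 = int(np.floor(f_max * n / fs))            # last index inside the band
        features[r - 1] = magnitude[k1:k2 + 1].mean() # average magnitude in the band
    return features

# Synthetic check: a 50-Hz "healthy" signal versus one with an added 54-Hz sideband.
fs_demo = 1000.0
t_demo = np.arange(2000) / fs_demo
healthy_demo = np.sin(2 * np.pi * 50 * t_demo)
faulty_demo = healthy_demo + 0.1 * np.sin(2 * np.pi * 54 * t_demo)
v_healthy = filter_bank_features(healthy_demo, fs_demo, n_filters=3)
v_faulty = filter_bank_features(faulty_demo, fs_demo, n_filters=3)
```

In this synthetic case only the first feature changes noticeably, because the added component lies in the band above the first harmonic. The highest band must stay below the Nyquist frequency fs/2, otherwise the slice would run past the end of the spectrum.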

Dimensionality reduction and classification
To investigate the suitability of these features for automatic fault detection, Fisher linear discriminant (FLD) will be applied to the features since its usage permits visualization of the proposed features in a lower dimensional subspace and can help identify whether or not a distinct separation between healthy and faulty data can be attained. Note that although FLD is used and discussed herein, other popular classifiers such as support vector machines or artificial neural networks could also easily be applied, but the choice of classifier is not the primary focus of this work. FLD is merely used as a tool here to investigate the feasibility and demonstrate the efficacy of the proposed feature extraction approach. With FLD, the ability to achieve the desired distinctive separations can also be further analysed to determine whether or not such linear classifiers may be capable of detecting faults at multiple levels (eg, the fault detector should ideally be capable of detecting various levels of rotor imbalances). FLD further provides an added benefit of dimensionality reduction, as the application of FLD computes a linear function of the input data as follows:
$$y = \mathbf{w}^{T}\mathbf{v} \tag{6}$$
where $y$ is the (one dimensional) FLD output, $\mathbf{w}$ is an $R$-dimensional weight (ie, projection) vector, and $\mathbf{v}$ is an $R$-dimensional input vector (eg, a feature vector derived from an input data sample). The weight vector $\mathbf{w}$ is determined using labelled ''training'' data to solve an optimization problem that minimizes the within-class variability (ie, spread of the data) and maximizes the between-class separation of the projected output data.
A derivation of FLD will be described here to provide additional details on how FLD works.
Mathematical representations should first be defined for two quantities of interest: the within-class variability and the between-class separation of the resulting projected FLD outputs. While FLD has a multiclass variant, 32 for simplicity, the following description of FLD will be restricted to two classes. The within-class variability of the projected output from Equation (6) will be denoted as $S_k^p$ and can be defined as the total sample variance, which is given by 32
$$S_k^p = \sum_{n \in C_k} \left( y_n - m_k^p \right)^2 \tag{7}$$
where $y_n$ is the projected data point of the $n$th input vector for $n \in C_k$, where $C_k$ is the class and $k \in \{1, 2\}$ is the class index, and $m_k^p$ is the mean of the set of projected data points for class $C_k$.
The between-class separation of the projected data for classes $C_1$ (ie, class 1) and $C_2$ (ie, class 2) will be denoted as $m_{12}^p$ and can be defined as a distance between the means of the projected data from each class:
$$m_{12}^p = \left( m_2^p - m_1^p \right)^2 \tag{8}$$
where $m_1^p$ and $m_2^p$ are the means of the projected data belonging to $C_1$ and $C_2$, respectively, and are computed using their respective sample mean:
$$m_k^p = \frac{1}{N_k} \sum_{n \in C_k} y_n \tag{9}$$
where $N_k$ is the number of samples in class $C_k$. To both maximize the between-class separation and minimize the within-class variability, a ratio between the two measures can be constructed to formulate the following optimization function 32 for $\mathbf{w}$, the $R$-dimensional weight (ie, projection) vector:
$$J(\mathbf{w}) = \frac{m_{12}^p}{S_1^p + S_2^p} = \frac{\left( m_2^p - m_1^p \right)^2}{S_1^p + S_2^p} \tag{10}$$
where $S_1^p$ and $S_2^p$ are the variances of the projected data belonging to classes $C_1$ and $C_2$, respectively. The projected means and variances on the right-hand side of Equation (10) can be rewritten in terms of the original data and the weight vector $\mathbf{w}$ (making the dependence of the function on the weight vector explicit) to obtain the following final form of the desired optimization function:
$$J(\mathbf{w}) = \frac{\left( \mathbf{w}^{T} (\mathbf{m}_2 - \mathbf{m}_1) \right)^2}{\mathbf{w}^{T} \mathbf{S}_W \mathbf{w}} \tag{11}$$
where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the sample means of the input data (feature vectors) and are given by
$$\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in C_k} \mathbf{v}_n \tag{12}$$
where $\mathbf{v}_n$ is from Equation (6) but explicitly for $n \in C_k$, and $\mathbf{S}_W$ is the total within-class covariance:
$$\mathbf{S}_W = \sum_{n \in C_1} (\mathbf{v}_n - \mathbf{m}_1)(\mathbf{v}_n - \mathbf{m}_1)^{T} + \sum_{n \in C_2} (\mathbf{v}_n - \mathbf{m}_2)(\mathbf{v}_n - \mathbf{m}_2)^{T} \tag{13}$$
Maximizing $J(\mathbf{w})$ from Equation (11) with respect to $\mathbf{w}$ results in the following closed-form solution for $\mathbf{w}$ 32 :
$$\mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1) \tag{14}$$
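As a rough illustration of the two-class FLD described above, the following Python/NumPy sketch computes the standard closed-form weight vector (within-class scatter inverse applied to the difference of class means) and the ratio of between-class separation to within-class variability. The function names `fit_fld`, `project`, and `fisher_criterion` are hypothetical, the unit-length normalization of the weight vector is an arbitrary choice, and the random data are synthetic stand-ins for feature vectors.

```python
import numpy as np

def fit_fld(v_class1, v_class2):
    """Two-class Fisher linear discriminant weight vector (unit length)."""
    v1 = np.asarray(v_class1, dtype=float)   # rows are feature vectors of class 1
    v2 = np.asarray(v_class2, dtype=float)   # rows are feature vectors of class 2
    m1, m2 = v1.mean(axis=0), v2.mean(axis=0)
    d1, d2 = v1 - m1, v2 - m2
    s_w = d1.T @ d1 + d2.T @ d2              # total within-class scatter matrix
    w = np.linalg.solve(s_w, m2 - m1)        # direction of S_W^{-1} (m2 - m1)
    return w / np.linalg.norm(w)             # scale is arbitrary; normalize for convenience

def project(w, v):
    """Project feature vectors (rows of v) onto the FLD direction."""
    return np.asarray(v, dtype=float) @ w

def fisher_criterion(w, v_class1, v_class2):
    """Squared distance of projected means over summed projected scatter."""
    y1, y2 = project(w, v_class1), project(w, v_class2)
    between = (y2.mean() - y1.mean()) ** 2
    within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return between / within

# Synthetic stand-in data: two 3-dimensional classes separated along the first axis.
rng = np.random.default_rng(0)
class_a = rng.normal(size=(200, 3))
class_b = rng.normal(size=(200, 3)) + np.array([3.0, 0.0, 0.0])
w_demo = fit_fld(class_a, class_b)
```

Using `np.linalg.solve` avoids forming the explicit inverse of the scatter matrix. If the scatter matrix is close to singular (for example, with fewer training samples than features), some form of regularization would be needed; that case is outside this sketch.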

Experimental rig and data curation
The proposed approach was tested using data collected from a small-scale CM test rig used for WT drive train analysis. The rig was designed to act as a model for a WT drive train with the purpose of producing signals comparable with those encountered on an operational WT. It features a 54-kW direct current (DC) motor, operated as a prime mover and simulating the WT rotor input, driving an industrial four-pole, three-phase, 50-Hz, 30-kW WRIG. The WRIG is driven at either constant speed or nonstationary, variable speed conditions to reflect the stochastic effects of wind torque driving, via a commercial DC machine drive. A schematic diagram of this experimental facility is shown in Figure 2, and two photographs are shown in Figure 3.
Details of the test rig are given in Crabtree 33 and Zappalá. 34 Seeded-fault conditions can be induced or removed from the test rig drive train as required, enabling several electrical and mechanical faults to be implemented repeatedly on demand and under controlled driving conditions.
Rotor electrical asymmetry was simulated on the test rig WRIG by using a resistive load bank externally connected to the rotor circuit via the machine slip rings to vary the resistance in one rotor phase winding circuit. For experimental purposes, to represent the development of rotor electrical faults on an induction generator, such as brush-gear or slip-ring wear, two seeded-fault levels were implemented on the test rig by successively adding two additional external resistances of 0.3 Ω and 0.6 Ω, respectively, to phase 1 of the rotor circuit through the external load bank. The corresponding levels of rotor electrical asymmetry, given as a percentage of the rotor balanced phase resistance, were 21% and 43%, respectively. These values compare very favourably with other studies. 17,21 Data were collected both at steady-state speeds, ranging from 1520 to 1600 rpm, and at wind-like variable speeds. In each constant speed test, the rig was driven for 300 seconds, while in each variable speed test, it was driven for 450 seconds to allow for sufficient data acquisition. Variable speed machine testing was performed according to speed profiles derived from a 2-MW variable speed WT model. This model, developed by the  A variety of wind speeds and turbulence intensities, defined as the measure of the overall level of turbulence, 35 were applied to the model. The driving conditions were then scaled to the test rig on the basis of the generator speed data from the model as described in detail in Crabtree. 33 The use of the 2-MW variable speed WT driving model has allowed the simulation of the different dynamic speed behaviours that a full-size WT four-pole double-fed induction generator (DFIG) exhibits both below and above rated wind speed. The scaled generator variable speed signals used for testing, shown in Figure 4, are:
1. 7.5 m/s mean, 6% turbulence intensity, representative of a low mean wind speed with low turbulence, with the WT operating at or below rated wind speed under generator speed control (hereafter denoted as ''7.5m6t'').
2. 15 m/s mean, 20% turbulence intensity, representative of a high mean wind speed with high turbulence, with the WT operating above rated wind speed under blade pitch control (hereafter denoted as ''15m20t'').
Three-phase generator stator terminal voltages and currents are measured using transducer boards with a bandwidth of DC to 100 kHz. A Magtrol TMB 313/431 torque transducer, capable of outputting 60 pulses per revolution, is used to measure the shaft torque and also serves as a shaft pulse tachometer. The signal acquisition is performed using an NI 6015 data acquisition pad at a rate of 5 kHz. The pad is in turn connected via a shielded universal serial bus (USB) connection to the NI LabVIEW environment, which also operates as the control environment of the rig. Only one line current signal is presented and analysed here, as is usually the case for MCSA. 33 Three main sets of experimental data were curated and processed in this work that encompass a set of constant speeds spread across the experimental range and also include variable speed data. Table 1 shows the details of each experimental data set. TrainSet is used to train FLD, DevSet is used to determine reasonable feature extraction parameters using the trained FLD weight vector, and EvalSet is used to test the final fault detection system. Note that EvalSet is never used during the training or ''development'' phase (during which the system parameters are tuned); the idea is that at least one data set should be held out to test the generalizability of the proposed detection system (ie, how the detection system performs on never-before-seen data). Also of note is that during training, only the ''healthy'' and ''21% rotor asymmetry'' data are used. This is to test how well the proposed fault detector can distinguish between different levels of fault (eg, ''43% rotor asymmetry'') even when the different fault levels are not present in the training data.
The above data sets are curated in a way to demonstrate the robustness of the scheme against different and variable speeds (ie, to demonstrate the speed invariance of the proposed scheme). The training data set only contains one set of data collected at a single, constant speed, yet the approach is tested on five to six other sets of data (within the DevSet and EvalSet) that are collected at other speeds that are not contained in the training set and also include variable speed data.

RESULTS AND DISCUSSION
A single line current stator signal has been used in the study of the proposed approach to WRIG fault detection. With an expected range of speeds between 1520 and 1600 rpm, each bandpass filter was designed to start at each supply frequency harmonic +1 Hz (ie, for this set of experimental data, $f_{\text{shift}}$ = 1 Hz) and has an $f_{\text{range}}$ of 7 Hz to capture the upper sideband. The number of bandpass filters used was initially (arbitrarily) selected to be 10 (ie, one filter is placed at the upper sideband of the 1st through 10th supply frequency harmonics), along with a time window of $N_w$ = 10 s.
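A short worked calculation suggests why a band starting 1 Hz above each harmonic with a 7-Hz range is a reasonable match for this speed range. The sketch below assumes the usual slip definition s = (n_sync − n)/n_sync, a synchronous speed of 1500 rpm for the four-pole, 50-Hz machine, and that the sideband sits 2|s|f above each supply harmonic; the function name `sideband_offset_hz` is hypothetical.

```python
def sideband_offset_hz(speed_rpm, f_supply=50.0, pole_pairs=2):
    """Offset of the 2sf sideband from a supply harmonic, in Hz (magnitude only)."""
    sync_rpm = 60.0 * f_supply / pole_pairs     # 1500 rpm for a four-pole, 50-Hz machine
    slip = (sync_rpm - speed_rpm) / sync_rpm    # negative above synchronous speed (generating)
    return 2.0 * abs(slip) * f_supply

offset_low = sideband_offset_hz(1520)    # slowest speed in the tested range
offset_high = sideband_offset_hz(1600)   # fastest speed in the tested range

band_start, band_stop = 1.0, 1.0 + 7.0   # f_shift and f_shift + f_range, in Hz above the harmonic
covered = band_start <= offset_low <= offset_high <= band_stop

resolution_hz = 1.0 / 10.0               # frequency resolution of a 10-s window without zero-padding
```

Under these assumptions the offsets are about 1.33 Hz at 1520 rpm and 6.67 Hz at 1600 rpm, both inside the 1 to 8 Hz band above each harmonic, and a 10-second window resolves the spectrum in 0.1-Hz steps.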
After extracting the proposed set of features and training the classifier, the training data (ie, TrainSet) was projected to determine an appropriate threshold for classifying ''healthy'' and ''faulty,'' which was selected as the halfway point between the means of each projected class.
Numerical measures of the system performance were taken to be the system accuracy (ie, the percentage of correctly identified samples) and the false positive rate (FPR) (ie, the false ''alarm'' rate), which is computed as the number of ''healthy'' samples incorrectly categorized as ''faulty'' over the total number of samples that were determined to be ''faulty.'' The FPR is reported in addition to the system accuracy, as it is an important measure in health monitoring: declaring a fault when the system is healthy would likely result in unnecessary expenditure of time and money to investigate a non-issue, so the FPR should be extremely low. The resulting classifications for the DevSet (categorizing both ''21%'' and ''43%'' as faulty) are shown in Table 2 for the initially selected feature extraction parameters.
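The thresholding and scoring steps described above can be sketched as follows. This is an illustrative Python/NumPy example with hypothetical function names and made-up projected values; the FPR follows the definition given in the text (healthy samples flagged as faulty, divided by all samples flagged as faulty), and the assumption that faulty data project to higher values is exposed as a parameter because the sign depends on how the classes are ordered during training.

```python
import numpy as np

def midpoint_threshold(y_healthy, y_faulty):
    """Halfway point between the means of the two projected training classes."""
    return 0.5 * (np.mean(y_healthy) + np.mean(y_faulty))

def classify(y, threshold, faulty_is_higher=True):
    """Return True where a projected sample is declared faulty."""
    y = np.asarray(y, dtype=float)
    return y > threshold if faulty_is_higher else y < threshold

def accuracy_and_fpr(pred_faulty, true_faulty):
    """Accuracy and false alarm rate as defined in the text."""
    pred = np.asarray(pred_faulty, dtype=bool)
    true = np.asarray(true_faulty, dtype=bool)
    accuracy = float(np.mean(pred == true))
    n_flagged = int(pred.sum())                          # samples declared faulty
    false_alarms = int(np.sum(pred & ~true))             # healthy samples declared faulty
    fpr = false_alarms / n_flagged if n_flagged else 0.0
    return accuracy, fpr

# Made-up projected values, for illustration only.
thr_demo = midpoint_threshold([0.0, 1.0], [4.0, 5.0])    # midpoint of means 0.5 and 4.5
pred_demo = classify([0.5, 3.0, 4.0, 6.0], thr_demo)
acc_demo, fpr_demo = accuracy_and_fpr(pred_demo, [False, False, True, True])
```

In this toy case one of the two healthy samples lies above the threshold, so three samples are flagged and one of them is a false alarm.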
The initial system accuracy and the FPR show the efficacy of the proposed solution in detecting faults in the stator current spectra as the resulting accuracy is quite high with a low FPR. In addition, this approach was coded and timed in MATLAB, and the average processing time to determine system health (ie, healthy or faulty) for one 10-second window of data, at a sampling frequency of 5 kHz, was 9.4 ms. This time includes extracting a set of features and classification, so in addition to high accuracy, there is also very little computational cost required.
It may also be possible to achieve better system performance by ''tuning'' the feature extraction parameters described in Section 2.1.
Tables 3--5 show the system performance results for varying the length of the time window, the number of bandpass filters, and the frequency range of the bandpass filters, respectively.
Note that in Table 5, the lower cut-off frequency (ie, the supply harmonic +1 Hz) is the same for all frequency ranges; it is only the higher cut-off frequency that is varied. The other parameters were set as follows:
• In Table 3, 10 bandpass filters were used with a frequency range of 7 Hz each (ie, initially selected feature extraction parameters).
• In Table 4, a 7-second time window was used, as this yielded a ''best'' result shown in Table 3, and each filter had a frequency range of 7 Hz.
• In Table 5, a 7-second time window was used with 20 bandpass filters, as this yielded a ''best'' result shown in Table 4.
The results shown in Tables 3 through 5 show that (1) longer time windows yield more accurate results, which is to be expected as better spectral estimates are obtained with more data; (2) adding more filters (ie, generating more features) helps, but having too many filters (in the case of 25 filters) results in lower accuracy; it is likely that the much higher order harmonics contribute little additional information and more noise, which negatively impacts the ability to discriminate; and (3) the frequency range of the bandpass filters can significantly impact the classification rate. As for the last point regarding the filter frequency range, this should be selected on the basis of the expected range of machine speeds in order to include relevant slip-dependent fault indicators. Too small a range will miss fault information, but too large a range (beyond the expected range of speeds) may include too much noise. This range should be set according to the field equipment specifications, but further studies can be done on the effects of applying noise reduction techniques to the spectra to allow for wider frequency ranges.
The resulting FLD projections for the DevSet are shown in Figure 5.
These features were extracted using a 7-second time window and 20 bandpass filters with a frequency range of 7 Hz. A clear delineation between the healthy and faulty data can be seen. Also of interest is that the faulty data with 43% rotor asymmetry appear at a distinct level (ie, range of projected values) compared with the ''21% rotor asymmetry'' data, even though the ''43%'' data were not included during training.
This result further highlights the potential of the proposed approach to be generalized to detect alternate fault levels beyond those present during training.
It should also clearly be noted that the final results presented on the DevSet are tuned (by varying the system parameters) so that better performance can be obtained. When there is a need or desire for tuning parameters, a typical approach in pattern recognition is to use cross validation 32 within the training set, where rotating portions of the training set are held out to test the various parameter settings. The parameters that yield the best system performance when tested on the held-out data are selected as the parameters that should be used on future test data.
The approach used in this work follows this same concept, where the EvalSet can be considered the held-out portion, and the DevSet is used to tune the system parameters. This is also reflected in the naming of the curated data sets; the DevSet, or development set, is used to develop the system, and the EvalSet, or evaluation set, is used to evaluate the system. Therefore, without any further system tuning, using the ''best'' parameters found from the DevSet, the proposed approach was tested on the remaining held-out EvalSet. The EvalSet results are shown in Table 6, and the FLD projections for the EvalSet are shown in Figure 6.
The accuracy still remains relatively high with a 2% relative decrease in accuracy, although the FPR has a significant increase. It can be seen in Figure 6 that one particular set of test data included in the EvalSet (namely, the data on the left-hand side of Figure 6, which was collected at 1530 rpm) does not exhibit as much of a separation between classes as can be seen in the rest of the EvalSet data. It is possible that the data collected at this particular speed contain more noise than the others; future investigations may include noise-reduction techniques and further analyses on potential overfitting.
The proposed methodology can easily be scaled to the higher field WT power levels. WTs are monitored by commercial CMSs, and both electrical and mechanical signals are readily available to the operator, so no additional sensors/data acquisition systems would be required. Its application in the field would probably imply an increase in the spectral background noise because of the higher complexity of WT drive trains compared with the small-scale laboratory rig.

CONCLUSIONS
This paper proposes an automated WT induction generator fault detector that requires very little training data, is robust against variable speeds, and achieves high accuracy at very little computational cost. The novelty of the approach includes a new way to analyse the data, in that other methods require frequency tracking or identification or conventional manual comparison of the spectra (as reported in previous works), but this method does not. The proposed methodology has been validated experimentally on a WT drive train test rig with two rotor fault levels under both constant and variable speed driving conditions, representative of WT generator field operation. The following specific conclusions arise:
• A set of bandpass filters is proposed to capture the stator current fault-related spectral information and to train a classifier, making the approach independent of the machine instantaneous speed.
• Unlike previous studies on the basis of the analysis of single speed-dependent fault-related frequencies in stator current, this work is more robust as it draws information from multiple frequency components into a single fault indicator.
• The proposed approach can provide clear differentiation between healthy and faulty conditions, under both constant and variable speed operating conditions, even though the classifier has been only trained on a single fault level and a single constant speed condition.
• Experimental results have initially shown clear discrimination between fault levels, with an almost linear response, even at variable speed. This suggests the ability of the proposed approach to provide early fault detection of developing damage, crucial for effective maintenance optimization.
• The developed method can be easily implemented into WT CMSs for efficient real-time analysis of data captured without requiring expert knowledge.
Future work in this area includes further investigations into the robustness of the proposed approach, including the impact of different training data (eg, training under variable speed conditions) on the ability to differentiate between different fault levels and healthy data. A comparative study using other classification approaches may also be undertaken as these different training conditions are explored.