Classification of multiple power quality disturbances based on continuous wavelet transform and lightweight convolutional neural network

To address noise interference and excessive network parameters in deep-learning-based classification of power quality disturbances (PQDs), a lightweight convolutional neural network combined with a maximum likelihood Kalman filter and the continuous wavelet transform is proposed. In the proposed method, the noisy PQD signals are first denoised by the maximum likelihood Kalman filter; the denoised signals are then converted to time-frequency diagrams, which provide rich time- and frequency-domain information; finally, the lightweight convolutional neural network automatically extracts features from and classifies multiple PQDs. To verify the effectiveness of the proposed method, a variety of PQDs were tested under different noise levels. The experimental results indicate that the average classification accuracy exceeds 99% at signal-to-noise ratios of 20 dB and above and remains at about 94% even under 10 dB noise. Compared with existing classification methods, both accuracy and noise immunity are improved. Additionally, the proposed method has clear advantages in cost, as evidenced by its low parameter count of 0.73M and short average test time of only 0.7 ms.

interruptions in power supply. The other is that multiple PQDs are generated by the mutual interference of various disturbance sources. Because more disturbances are combined, the fault features become more complex and identification becomes more difficult, which seriously threatens the safe, stable, and economic operation of the power system. Therefore, it is of great importance to accurately identify and classify disturbance signals. Doing so makes it possible to better understand their characteristics, determine their sources, and take appropriate measures to mitigate their impact.
Generally, there are two main steps in PQD classification and recognition: feature extraction and classification. With the advancement of technology and the improvement of theory, many signal processing methods, including the Fourier transform (FT), 3 wavelet transform, 4 S-transform (ST), 5 Hilbert-Huang transform, 6 and Kalman filter (KF), 7 have been widely used for feature extraction. These techniques specialize in enhancing the temporal and spectral resolution of signals. However, they struggle to adapt to the variations found in multiple PQDs. Furthermore, when PQD signals are corrupted by noise, distinguishing the characteristics of the original signal from those of the noise becomes considerably more difficult.
Regarding classification, the support vector machine (SVM), 8 artificial neural network, 9 decision tree (DT), 10 probabilistic neural network, 11 and several others are among the most frequently employed classifiers. Despite achieving relatively high classification accuracies, the efficiency of these techniques depends on the extraction of valid features 12 and the selection of an appropriate classifier. In Yuan et al., 13,14 the signal was analyzed using multiscale entropy, and improved multiscale sample entropy values were computed to measure irregularity and complexity as quantitative characteristics for bolted joint monitoring. Therefore, for a given classifier, it is crucial to decide which features are used for training. However, feature selection mostly depends on experience and lacks a reference, which may result in the loss of information owing to human factors. In addition, manually extracted features cannot cope with noisy PQDs. Thus, the current automatic classification methods for multiple PQDs need substantial improvement in terms of accuracy, flexibility, and consistency.
Deep learning, a closed-loop process, has recently emerged as a technique that can automatically learn optimal features from raw signals for the classification. With high classification accuracy, deep learning approaches such as stacked autoencoder, long short-term memory (LSTM), and convolutional neural network (CNN) have gained widespread usage for PQD classification.

| RELATED WORKS
Over the past few years, researchers have introduced many classification methods for the identification of multiple PQDs. In Mahela et al., 15 a hybrid combination of indices calculated using the Hilbert transform and Stockwell transform (ST) was proposed for recognizing and classifying complex PQDs. In Borges et al., 16 a combination of ST and DT was proposed to classify complex PQDs, such as second-, third-, and fourth-order disturbances. In Biswal and Dash, 17 a fast ST- and DT-based classifier was proposed to detect and classify multiple PQDs. In Zhao et al., 18 a novel approach that combines variational mode decomposition with a two-layer multilabel extreme learning machine (ELM) neural network was proposed for the detection and classification of multiple PQDs. In Thirumala et al., 19 an automated recognition approach based on adaptive filtering and a multiclass SVM was proposed for the classification of PQDs, which include eight single disturbances and seven two-combination disturbances.
As the number of mixed disturbance types increases, the number of classification labels grows exponentially, rendering the complexity and computational burden of the classifier unmanageable. To address this, various deep networks, such as the convolutional neural network (CNN), stacked sparse denoising autoencoders (SSDAEs), gated recurrent units (GRUs), and LSTM, have been proposed for PQD classification. In Deng et al., 20 a sequence-to-sequence model based on a bidirectional GRU was proposed for the recognition of combined PQDs and their temporal locations. In Xiao et al., 21 the detection, feature extraction, and classification of PQDs were investigated using the maximal overlap discrete wavelet transform, space phasor, and improved SSDAE models. In Cai et al., 22 a hybrid approach that combines the Wigner-Ville distribution with a CNN was proposed for PQD classification. Focusing on the fusion of features from multiple sources, a multifusion CNN was proposed as an automatic identification method for the recognition of complex PQDs. To address low accuracy and poor noise immunity, a novel full closed-loop approach based on a deep convolutional neural network (DCNN) was proposed for classifying multiple PQDs in Wang and Chen. 23 To categorize both single and combined PQDs in real time, a method integrating self-adaptive variational mode decomposition, DCNNs, and online-sequential random vector functional link networks was proposed in Sahani and Dash. 24 To address the issues of low convergence speed, low accuracy, and poor generalization in PQD identification and classification in microgrids, a novel deep convolutional network structure was proposed in Gong and Ruan; 25 the architecture comprises a five-layer one-dimensional modified inception-residual network (ResNet) and a three-layer fully connected tier. CNNs can achieve very high performance, but too many parameters result in heavy computation.
To address the challenges of noise interference and manual feature extraction in PQD classification, a novel method that combines adaptive wavelet threshold denoising with a deep belief network fusion ELM was proposed in Gao et al. 26 Garcia et al. 27 investigated the effectiveness of various deep architectures (CNN, LSTM, and CNN-LSTM) for PQD detection and classification.
The use of advanced deep learning models not only enhances the accuracy of classification, but also reduces human intervention and simplifies the overall process. However, these models suffer from poor noise immunity, high computational complexity, and low accuracy in identifying multiple PQDs (triple and quadruple PQDs). First, deep learning models used for PQD classification typically take only the time-domain waveform of PQD signals (a one-dimensional signal) as input and do not consider additional feature groups, such as frequency-domain information. This results in insufficient feature information and limits improvements in classification accuracy, particularly for multiple PQDs, owing to the complex interactions among their components. In contrast to one-dimensional time series, two-dimensional images make PQDs visually detectable and present distinctive features. 28 Second, the number of network parameters grows with network depth, leading to excessive computation and overfitting. Third, the classification accuracy deteriorates in the presence of noise.
Considering the abovementioned limitations of existing classification methods, this paper presents a hybrid method combining adaptive Kalman filtering and lightweight CNNs for classifying multiple PQDs in realistic noisy environments. The method achieves high classification accuracy with low computation in the presence of noise. Figures 1 and 2 give the block scheme and the flowchart of the proposed classification method, respectively. The main contributions of this paper are as follows:

1. The continuous wavelet transform (CWT) is used to transform data from the time domain to the time-frequency domain. By treating the time-frequency energy matrix as the pixel matrix of a digital image, a time-frequency map is constructed. On the one hand, the time-frequency maps provide an enhanced representation of signals, enabling the extraction of more critical characteristics and information. On the other hand, converting signals into two-dimensional images, which are used as input for deep learning models, allows the models to extract high-level fault features and classify the faults visually, which cannot be easily achieved with one-dimensional signals. Thus, the proposed method offers a promising avenue for advancing the use of image processing technology in the identification of single and multiple PQDs.

2. A lightweight CNN model based on SqueezeNet, which has fewer parameters and a smaller model size, is proposed for the classification of multiple PQDs. Owing to its compact architecture, SqueezeNet requires less training time and computation. Moreover, it can still achieve good classification accuracy in comparison with larger networks.

3. A performance comparison between the proposed method and several existing classification methods is carried out to confirm the superiority of the proposed method.
The remainder of this paper is organized as follows. Section 1 gives the introduction, and Section 2 describes related works. The time-frequency method based on CWT is introduced in Section 3. The SqueezeNet CNN for the classification of multiple PQDs is presented in Section 4. Experimental results and analysis are described in Section 5. Section 6 concludes the paper.

| TIME-FREQUENCY ANALYSIS BASED ON CWT
The CWT is a mathematical technique used to analyze nonstationary and nonperiodic signals. Unlike the FT, which decomposes a signal into a sum of sine waves of different frequencies, the CWT decomposes a signal into a set of wavelets that are localized in both time and frequency. The CWT calculates the inner product of the signal with the wavelet at each time and scale, producing a two-dimensional time-scale representation of the signal, which is known as the scalogram.
The CWT formula can be interpreted as a convolution of the signal with the wavelet function at different values of a and b, in which a and b represent the scale parameter and the translation parameter, respectively. By varying the values of a and b, we can obtain different views of the signal in the time-frequency domain, allowing us to detect and analyze different features of the signal. The CWT of signal x(t) is defined as follows:

$$W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t, \qquad (1)$$

where ψ(t) is the mother wavelet function and ψ* denotes its complex conjugate. In this paper, Section 5 will provide a detailed analysis of the time-frequency characteristics of PQD signals by constructing a time-frequency diagram based on the amplitude spectrum. This diagram is effective in reflecting the intensity and distribution of signal moments and frequencies.

Table 1 (fragment): C10, sag with oscillator; C14, impulsive with oscillator; oscillator with flicker.
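As an illustration of how such a time-scale representation can be computed, the following minimal sketch discretises the inner product between the signal and scaled, shifted copies of a mother wavelet. It is not the implementation used in this paper: the complex Morlet wavelet with centre frequency parameter `w0 = 6`, the function names, and the 2 kHz test signal are all assumptions chosen to keep the example small and dependency-free.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(x, fs, freqs, w0=6.0):
    """Discretised CWT: one row of |W(a, b)| per analysis frequency in `freqs`."""
    dt = 1.0 / fs
    times = [n * dt for n in range(len(x))]
    rows = []
    for f in freqs:
        a = w0 / (2.0 * math.pi * f)  # scale whose centre frequency is f (Morlet)
        row = []
        for b in times:  # b is the translation parameter
            inner = sum(xn * morlet((tn - b) / a, w0).conjugate()
                        for xn, tn in zip(x, times))
            row.append(abs(inner) * dt / math.sqrt(a))
        rows.append(row)
    return rows


# Hypothetical test signal: five cycles of a 50 Hz sine sampled at 2 kHz.
fs = 2000.0
signal = [math.sin(2.0 * math.pi * 50.0 * n / fs) for n in range(200)]
scalogram = cwt_magnitude(signal, fs, freqs=[50.0, 150.0, 450.0])
```

Each row of `scalogram` corresponds to one scale and each column to one translation, so the 50 Hz row dominates for this test signal. The double loop costs O(N^2) per scale; a practical implementation would typically use an FFT-based convolution instead.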

| THE SqueezeNet CNN
To obtain good accuracy with low computation, the lightweight CNN called SqueezeNet was proposed by Iandola et al., 29 in which 1 × 1 filters are used to compress the convolutional feature maps, effectively reducing the number of convolution weights while preserving high accuracy. The SqueezeNet network 30 is built from blocks named Fire modules, shown in Figure 3.
In Figure 3, the Fire module comprises a squeeze layer and an expand layer. The squeeze layer has only 1 × 1 convolution kernels, and their number is s1. The expand layer has a mix of 1 × 1 and 3 × 3 convolution kernels, and their numbers are e1 and e3, respectively. Normally, the squeeze layer is designed with fewer convolution kernels than the expand layer, namely, s1 < e1 + e3. This compression technique brings a substantial reduction in both parameter quantity and computational load, thereby improving the efficiency of the CNN.
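The saving can be made concrete with a short parameter count. The sketch below compares one Fire module with a plain 3 × 3 convolution that produces the same number of output channels; the channel sizes (64 inputs, s1 = 16, e1 = e3 = 64) and the function names are illustrative assumptions, not values reported in this paper.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def fire_params(c_in, s1, e1, e3):
    """Parameters of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze = conv_params(c_in, s1, 1)
    expand_1x1 = conv_params(s1, e1, 1)
    expand_3x3 = conv_params(s1, e3, 3)
    return squeeze + expand_1x1 + expand_3x3


# Illustrative sizes (assumed, not taken from the paper): 64 input channels,
# s1 = 16 squeeze kernels, e1 = e3 = 64 expand kernels, so s1 < e1 + e3.
fire = fire_params(c_in=64, s1=16, e1=64, e3=64)
plain = conv_params(c_in=64, c_out=128, k=3)  # same 128 output channels
print(fire, plain, round(plain / fire, 2))
```

With these assumed sizes the Fire module needs 11,408 parameters against 73,856 for the plain convolution, roughly a 6.5-fold reduction, because the 3 × 3 kernels only ever see the 16 squeezed channels.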
The SqueezeNet model, which has 26 convolutional layers, is constructed by stacking Fire modules, as illustrated in Figure 4. It consists, in order, of a convolution layer (conv1), eight Fire modules (fire2-fire9), and a final convolution layer (conv10). SqueezeNet performs max-pooling with a stride of two after layers conv1, fire3, and fire5, and average pooling after conv10. Average pooling, which has no weight parameters, maps the features extracted from the final convolution layer to individual classes and thus reduces overfitting during training. In short, the SqueezeNet network can achieve the same level of accuracy as AlexNet but with more than 50 times fewer parameters. 31
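The layer count and the overall parameter budget can be checked with a few lines of arithmetic. The following sketch assumes the channel configuration of SqueezeNet v1.1 (the paper does not state which variant is used), a three-channel input image, and a 20-class output layer; `FIRE_CONFIG` and `squeezenet_summary` are hypothetical names introduced for the example.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


# (s1, e1, e3) for fire2..fire9; assumed SqueezeNet v1.1 channel sizes.
FIRE_CONFIG = [(16, 64, 64), (16, 64, 64), (32, 128, 128), (32, 128, 128),
               (48, 192, 192), (48, 192, 192), (64, 256, 256), (64, 256, 256)]


def squeezenet_summary(num_classes, in_channels=3, conv1_out=64):
    """Return (number of convolution layers, number of learnable parameters)."""
    conv_layers = 1
    params = conv_params(in_channels, conv1_out, 3)  # conv1 (3x3)
    channels = conv1_out
    for s1, e1, e3 in FIRE_CONFIG:
        params += conv_params(channels, s1, 1)  # squeeze 1x1
        params += conv_params(s1, e1, 1)        # expand 1x1
        params += conv_params(s1, e3, 3)        # expand 3x3
        conv_layers += 3
        channels = e1 + e3                      # expand outputs are concatenated
    params += conv_params(channels, num_classes, 1)  # conv10 (1x1)
    conv_layers += 1
    return conv_layers, params


layers, params = squeezenet_summary(num_classes=20)
print(layers, params)
```

Under these assumptions the count is 1 + 8 × 3 + 1 = 26 convolution layers and 732,756 parameters, which is in line with the 0.73M figure quoted in the abstract. The pooling layers contribute no parameters.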

| Data generation
To demonstrate the effectiveness of the proposed method, 20 different types of power quality disturbance signals, including seven kinds of single disturbances and 13 kinds of multiple disturbances, are generated using MATLAB based on IEEE Standard 1159-2009, and their mathematical models are listed in Table 1. To generate diverse sample sets, each category has 1500 sets of PQDs, which are characterized by randomly chosen parameters, including onset time, magnitude, duration, and frequency. Meanwhile, considering that actual PQD signals are corrupted by noise in real operating conditions, various levels of Gaussian noise from 10 to 40 dB are added to the generated PQD signals. In total, there are 150,000 groups of PQDs, of which 90,000 are used for training, 10,000 for validation, and 50,000 for testing. Each PQD signal includes five cycles and 1000 sampling points, and the sampling frequency is 10 kHz.
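The generation procedure can be sketched as follows. Since Table 1 is not reproduced here, the rectangular-window sag model, its default parameters, and the function names are assumptions made for illustration; the sampling settings (1000 points at 10 kHz, five cycles of 50 Hz) follow the text. The dataset arithmetic assumes five noise conditions (no noise, 40, 30, 20, and 10 dB), which is what the quoted total of 150,000 implies.

```python
import math
import random


def voltage_sag(n=1000, fs=10_000.0, f0=50.0, depth=0.5, t_start=0.03, t_end=0.07):
    """Unit-amplitude sine whose amplitude drops by `depth` between t_start and t_end."""
    samples = []
    for k in range(n):
        t = k / fs
        amplitude = 1.0 - depth if t_start <= t < t_end else 1.0
        samples.append(amplitude * math.sin(2.0 * math.pi * f0 * t))
    return samples


def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power matches snr_db."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, noise_std) for v in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy record against its clean reference."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the example is reproducible
clean = voltage_sag()
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)

# 20 classes x 1500 signals x 5 assumed noise conditions.
total_signals = 20 * 1500 * 5
```

Randomising `depth`, `t_start`, and `t_end` per call would give the kind of parameter diversity described above; multiple disturbances can be built by combining several such single-disturbance models.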
To improve the noise immunity of the proposed classification method, the Kalman filter based on maximum likelihood (KF-ML) 7,33 is used to denoise the raw PQD signals in this paper. To demonstrate the efficacy of KF-ML in reducing noise, Figure 5 compares the PQD waveforms before and after denoising. The signals estimated by KF-ML (green) are visibly smoother than the original PQD signals (blue), which are corrupted by 20 dB noise. Therefore, the denoised signals yield more precise characteristics than the original PQD signals.
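To give a feel for Kalman-filter denoising of a power-frequency waveform, the sketch below tracks a 50 Hz sinusoid with a two-state (in-phase and quadrature) linear Kalman filter. It is a simplified stand-in, not the KF-ML algorithm: the process and measurement noise variances `q` and `r` are fixed by hand here, whereas KF-ML estimates the noise covariances by maximum likelihood. The state model and all names are assumptions.

```python
import math
import random


def kalman_sine_denoise(y, fs, f0=50.0, q=1e-6, r=0.005):
    """Linear KF with a rotating two-state (in-phase, quadrature) sinusoid model."""
    w = 2.0 * math.pi * f0 / fs
    c, s = math.cos(w), math.sin(w)
    x0, x1 = 0.0, 0.0              # state estimate
    p00, p01, p11 = 1.0, 0.0, 1.0  # symmetric error covariance P
    estimates = []
    for yk in y:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[c, s], [-s, c]].
        x0, x1 = c * x0 + s * x1, -s * x0 + c * x1
        a00 = c * p00 + s * p01
        a01 = c * p01 + s * p11
        a10 = -s * p00 + c * p01
        a11 = -s * p01 + c * p11
        p00 = a00 * c + a01 * s + q
        p01 = -a00 * s + a01 * c
        p11 = -a10 * s + a11 * c + q
        # Update with the scalar measurement y = x0 + noise (H = [1, 0]).
        innovation_var = p00 + r
        k0, k1 = p00 / innovation_var, p01 / innovation_var
        residual = yk - x0
        x0, x1 = x0 + k0 * residual, x1 + k1 * residual
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
        estimates.append(x0)
    return estimates


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


rng = random.Random(1)
fs = 10_000.0
clean = [math.sin(2.0 * math.pi * 50.0 * k / fs) for k in range(1000)]
# Noise variance 0.005 corresponds to 20 dB SNR for a unit-amplitude sine.
noisy = [v + rng.gauss(0.0, math.sqrt(0.005)) for v in clean]
denoised = kalman_sine_denoise(noisy, fs)
```

Comparing `rmse(denoised, clean)` with `rmse(noisy, clean)` gives the same kind of RMSE figure used to assess the filters. A small `q` suits a stationary sine; tracking abrupt disturbances such as sags would need a larger value, which is the trade-off that data-driven covariance tuning is meant to resolve.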
To further assess the KF, the unscented Kalman filter (UKF) is also applied to PQDs corrupted by 20 dB Gaussian noise for comparison. In Figure 5, the signals estimated by the UKF (red) are almost as smooth as those estimated by the KF (green). Moreover, Table 2 summarizes the performance of the KF and UKF in terms of root mean square error (RMSE). As seen in Table 2, the RMSE values of the KF are slightly lower than those of the UKF. This may be because the noise covariance matrices are optimized by maximum likelihood, which improves the KF algorithm.

Figure 6 illustrates the time-frequency diagrams of the denoised PQD signals obtained using CWT. The regions exhibiting significant and concentrated amplitude are situated around the 50 Hz band, that is, the energy is mainly concentrated at the fundamental frequency of the PQD signals. In addition, there are significant high-frequency components in C2, C6, and C10-C20, and different frequency components exhibit energy fluctuations over the entire time domain. For different PQD types, there are significant differences in frequency distribution and energy change. In Figure 6, the large amplitude of C2 is distributed between 0 and 0.02 s, while the amplitude is small after 0.02 s. This coincides with the drastic decrease of the C2 signal in the time domain after 0.02 s in Figure 5. The PQD signals of C9, C15, and C18 drop during a certain time period in Figure 5, and the wavelet time-frequency diagrams in Figure 6 show the same decrease in amplitude during the same period. Compared with time-domain or frequency-domain information alone, the time-frequency diagrams based on CWT contain richer state change information, which contributes to the improvement of classification accuracy.

Figure 6. The wavelet time-frequency diagrams of denoised power quality disturbances.
It is worth mentioning that the color bar is positioned on the right of the subgraph, where different hues correspond to different magnitudes. Specifically, the more yellow the color is, the greater the absolute value indicated.
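The step from a time-frequency magnitude matrix to an image can be sketched as a simple intensity mapping. The example below min-max normalises a small, made-up magnitude matrix to 8-bit pixel values; the normalisation scheme and the function name are assumptions, and the colour map applied on top of the intensities (yellow for large values in the figures) is omitted.

```python
def to_pixel_matrix(magnitude, levels=256):
    """Min-max normalise a 2-D magnitude matrix to integer intensities 0..levels-1."""
    flat = [v for row in magnitude for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[round((v - lo) / span * (levels - 1)) for v in row]
            for row in magnitude]


# Hypothetical 3 x 4 magnitude matrix (rows: frequencies, columns: time instants).
magnitude = [[0.0, 0.5, 1.0, 0.5],
             [0.1, 0.2, 0.4, 0.2],
             [0.0, 0.0, 0.1, 0.0]]
pixels = to_pixel_matrix(magnitude)
```

The resulting integer matrix has the same shape as the magnitude matrix and can be resized to the input resolution expected by the network.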

| Simulations analysis and experimental results
During training, the ReLU function is used as the activation function. The model is trained for 15 epochs with 4500 steps per epoch, giving a maximum of 67,500 iterations. The optimizer is stochastic gradient descent with a momentum of 0.9. To determine the learning rate and mini-batch size, the model is trained with a range of values: the mini-batch size ranges from 10 to 64, and the learning rate from 0.0001 to 0.01. Table 3 lists the classification results on the validation set for the different learning rates and mini-batch sizes. In Table 3, the highest accuracy, 98.56%, is obtained with a learning rate of 0.001 and a mini-batch size of 20, so these values are adopted. Figure 7 shows the loss and the accuracy of the model during training and validation. As seen from Figure 7, as the number of iterations increases, the loss converges to a relatively small value, and the training loss and validation loss curves almost overlap. This demonstrates that the model is well fitted and generalizes well.
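The iteration count and the optimizer update can be illustrated with a short sketch. The arithmetic uses the values quoted above (90,000 training samples, mini-batch size 20, 15 epochs); the momentum update is shown on a toy one-dimensional quadratic that merely stands in for the network loss, and the function name is hypothetical.

```python
def sgdm_minimise(grad, w0, learning_rate=0.001, momentum=0.9, steps=2000):
    """Gradient descent with momentum, shown here on a deterministic gradient."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


train_samples, batch_size, epochs = 90_000, 20, 15
steps_per_epoch = train_samples // batch_size
max_iterations = steps_per_epoch * epochs

# Toy objective (w - 3)^2 with gradient 2 (w - 3); not the actual network loss.
w_final = sgdm_minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

The quotient 90,000 / 20 gives the 4500 steps per epoch, and 4500 × 15 gives the 67,500 iterations stated in the text. In real training the gradient is evaluated on a different mini-batch at every step, which is what makes the descent stochastic.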
For a detailed evaluation of classification performance, Figure 8 presents the confusion matrix obtained on the test set, where the x-axis represents the predicted label and the y-axis represents the actual label. The diagonal and off-diagonal elements of the confusion matrix correspond to the correct and incorrect classification of a specific type of PQD, respectively. It can be observed from the confusion matrix that most of the classifications are correct when the signal-to-noise ratio (SNR) varies from 40 to 20 dB or there is no noise. When the SNR decreases to 10 dB, the number of misclassified samples increases. For C2 (interruption) in particular, the number of misclassified samples reaches 106. This may be because the voltage instantly drops below 10% during an interruption, so that noise severely hinders the feature extraction of C2. To conduct a comprehensive assessment of the proposed model, several evaluation indexes, namely accuracy, precision rate (PR), recall rate (RR), and F1 score, were chosen as in Equation (2):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{PR} = \frac{TP}{TP + FP}, \quad \mathrm{RR} = \frac{TP}{TP + FN}, \quad \mathrm{F1} = \frac{2 \cdot \mathrm{RR} \cdot \mathrm{PR}}{\mathrm{RR} + \mathrm{PR}} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad (2)$$

where TP denotes the accurately labeled positive signals, TN represents the accurately labeled negative signals, FP denotes the erroneously labeled positive signals, and FN represents the erroneously labeled negative signals. Table 4 summarizes the four evaluation indexes and their average values. The classification results in Table 4 show that the highest classification accuracy of the proposed method is 99.76%, and that noise has only a modest effect on classification accuracy. Even in the case of 10 dB noise, the classification accuracy is as high as 93.95%, which verifies that the method has good noise immunity. When there is no noise or the SNR varies from 40 to 20 dB, the precision, recall, and F1 score of every class are higher than 95%, and their average values exceed 99%.
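The four evaluation indexes can be computed per class directly from a confusion matrix by treating each class in turn as the positive class. The sketch below does this for a small, made-up three-class matrix; the numbers are illustrative and are not taken from Figure 8 or Table 4.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of actual class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # actual c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, actual other
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        })
    return results


# Hypothetical confusion matrix: rows are actual classes, columns are predictions.
confusion = [[48, 2, 0],
             [1, 47, 2],
             [0, 5, 45]]
metrics = per_class_metrics(confusion)
overall_accuracy = sum(confusion[i][i] for i in range(3)) / 150
```

Averaging the per-class values over all classes gives macro-averaged figures of the kind reported as average values, while `overall_accuracy` is the fraction of all samples on the diagonal.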
As the SNR decreases to 10 dB, the average precision, recall, and F1 score remain above 93%. These experiments demonstrate that the proposed method achieves a satisfactory level of classification performance. Figure 9 depicts the average values of the four evaluation indexes under different noise levels. Apart from test accuracy, a comprehensive evaluation of the proposed model also covers model size, parameter count, and average test time. Four other CNN models, namely AlexNet, 34 GoogleNet, 35 ResNet-50, 36,37 and VGG-16, 38 are selected for comparison. These experiments were conducted on a computing platform comprising an Intel Xeon Gold 5220 CPU and an NVIDIA GeForce RTX 2080 Ti GPU, using MATLAB R2022a as the programming environment. Table 5 lists the comparison results for classifying PQDs corrupted by 20 dB Gaussian noise. The proposed model obtains relatively high test accuracy among all these models. Although its accuracy is slightly lower than that of GoogleNet, it performs better in terms of computational complexity and test time.
Notably, it has a smaller model size of 1.72 MB and a shorter average testing time of 0.7 ms per sample, which are far less than those of the other four models.
It is worth mentioning that the Gaussian noise assumption may not hold in practical applications, where the noise may be non-Gaussian, such as Laplace noise and Cauchy noise, which exhibit heavy-tailed characteristics and deviate from the standard Gaussian distribution. In this case, the KF is sensitive to non-Gaussian noise and large outliers and cannot suppress their effects. Thus, the performance of the proposed classification method will degrade.

To assess the superiority of the proposed method, some existing classification methods 39-43 are implemented for comparative analysis, and the results are presented in Table 6 and Figure 10. Table 6 shows that the proposed method performs almost the same as the other methods under low-intensity noise (40 dB noise or no noise). However, under stronger noise (e.g., 30 and 20 dB), the proposed method outperforms the other methods. Even at 10 dB, the classification accuracy exceeds 90%. The comparison results further demonstrate that the proposed classification method has better noise immunity and higher accuracy, making it more suitable for practical environments.

Figure 11. Grayscale image of power quality disturbance signal.
To further illustrate the benefit of time-frequency diagrams for classification, PQD signals are also converted to the grayscale images shown in Figure 11, in which the range from white to black is divided into several grades according to a logarithmic relationship. Here, a darker color corresponds to a larger grayscale value (amplitude). The grayscale images are likewise fed as input to the network in Figure 4. Table 7 and Figure 12 give the classification results on the test data. The precision, recall, F1 score, and accuracy are between 98% and 99% when the SNR varies from 40 to 20 dB, which demonstrates that classification with grayscale images is also accurate and resistant to noise interference. However, compared with the results in Table 4, the values of the four evaluation indexes in Table 7 are lower by about 1 percentage point. Thus, the grayscale images yield somewhat lower classification performance. This may be because the wavelet time-frequency diagram combines time- and frequency-domain information.

| CONCLUSIONS
To address noise interference and excessive network parameters in deep-learning-based PQD classification, a novel approach that integrates adaptive Kalman filtering and lightweight CNNs is proposed for the classification of multiple PQDs. Moreover, PQD signals are encoded as images with CWT to benefit from the power of the two-dimensional CNN framework. Twenty types of PQDs, including seven kinds of single and 13 kinds of combined PQDs, are simulated to investigate the effectiveness of the proposed hybrid technique. To verify the efficacy of the proposed approach, a comprehensive performance evaluation is conducted by comparing it with several other classification methods. The main conclusions are as follows.
(1) One-dimensional PQD signals are transformed into two-dimensional images used as input, which allows the deep model to extract high-level fault features from the power quality signals and offers a promising avenue for exploring the application of image processing technology in the identification of both single and multiple PQDs.
(2) This paper proposes a lightweight CNN with a maximum likelihood Kalman filter to classify PQDs under noisy conditions, and the hybridization of these techniques helps improve accuracy and noise immunity with less computational burden.
(3) To investigate the effectiveness of the proposed method, 20 different types of PQDs were considered in the simulation, and the simulation results demonstrate that it can obtain 93.95% accuracy for classifying multiple PQDs even when 10 dB noise is added.
(4) This study presents a comparative analysis of the proposed lightweight CNN with several CNN models, including AlexNet, GoogleNet, ResNet-50, and VGG-16. The comparison results demonstrate that the proposed method achieves competitive classification accuracy, slightly below that of GoogleNet, while requiring a much smaller model size and a shorter test time.
(5) The comparison between the proposed method and the existing methods shows its superiority in classification accuracy and noise immunity.
Despite extensive research, the complexity of power systems remains a challenging issue, and limits the consideration of certain complex PQDs that remain unknown. Additionally, in practical applications, noise may be heavy-tailed non-Gaussian noise besides Gaussian noise. Future work will focus on more complex PQDs under non-Gaussian conditions.