Fault detection and classification of an HVDC transmission line using a heterogeneous multi-machine-learning algorithm

This paper presents a novel integrated multi-Machine-Learning (ML) system architecture for the protection of a bipolar HVDC transmission line, in which two different ML models, a Support Vector Machine (SVM) and a K-Nearest Neighbours (KNN) classifier, are used for fault detection and classification. The KNN fault-type classifier is designed as a dual-purpose module: it not only identifies the fault type but also acts as a redundant check on uncertain fault declarations from the startup unit. Gradients and standard deviations of the DC current, voltage, and harmonic current, together with a correlation coefficient between the aerial and zero modes of the DC current, form an appropriate feature vector extracted from single-end signal measurements. Overall, 154 training cases and 53 main test cases are obtained by simulating various fault and non-fault states on a ±650 kV, 1000 km Current Source Converter (CSC)-HVDC link in the EMTDC/PSCAD platform. The ML modules are trained in MATLAB and tested under severe conditions with a total of 2220 test cases. Thanks to the appropriate feature vector and the proposed system architecture, the results show that the proposed algorithm reliably detects and distinguishes a variety of internal faults from pseudo-faults and external faults, while requiring little training data.


INTRODUCTION
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Generation, Transmission & Distribution published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

In the last few decades, owing to developments in power electronics technology, two kinds of HVDC transmission, based on different converter types, have been introduced as a solution for challenging issues such as long-distance and offshore transmission. Fast and flexible control, large transmission capacity, economic justification for distances over 500 km (depending on the power electronics technology), and a smaller Right of Way (RoW) compared to HVAC transmission [1] are some advantages of Current Source Converter (CSC)-HVDC transmission. As in other electrical systems, protection is an integral part of every transmission system. Previously, some protection methods originally proposed for AC systems were laboriously adapted for DC systems. A statistical analysis of an HVDC transmission system in China shows that 36.8% of 114 valve-group outages were caused by faults in the line protection zone [2]. Similar events, in which the fault detection algorithm fails to satisfy the security or dependability of protection, lead to hazardous hardware damage or unwanted trips and customer power outages, as mentioned in [3]. Thus, a reliable protection method can prevent wrong fault detections and thereby reduce the total power outage. One of the well-known protection methods is differential protection, a traditional solution that has also been employed for AC transmission. In [4], a commercialized HVDC line protection configuration is studied, and a differential protection unit is introduced as a backup protection unit. An improved differential protection scheme of CSC-HVDC transmission
lines is proposed in [5], where a combination of a blocking unit and a defined differential-current criterion is used for fault detection. [6] introduces a signal distance between the rectifier-side and inverter-side currents that discriminates external and internal faults. The effect of capacitive current is mitigated in [7, 8] by considering a delay block. Transient power and other combinations of the two-side voltage and current measurements are studied in [9-11]. However, all the mentioned studies rely on two-side measurements, resulting in expensive and low-reliability protections. Since the function of the protection system in such schemes depends on the joint health of three segments (the communication channel and the two side measurements), it can cost a lot to provide segments with a low probability of transition from a healthy state to a defective state. Although keeping up a fast and reliable communication channel with a low probability of failure and delay is a matter of concern for protection schemes, such a channel can also be used for accurate fault-location estimation [12]. On the other hand, there are protections based on one-side measurements. The presence of a smoothing reactor and DC-side filters at both ends of a CSC-HVDC link allows some methods to be implemented, such as using the impedance characteristic of these elements under faulty conditions [2, 13, 14]. It should be noted that these inductors are also used to define boundaries in DC grids [15]. Travelling Wave Protection (TWP), which is used as primary protection [16], can benefit from methods such as Principal Component Analysis (PCA) and Wavelet Transformation (WT)
[17-19]. However, the attenuation and distortion of travelling waves caused by fault resistance and fault location should be considered. It is worth mentioning that almost all of the mentioned studies use the concept of a threshold level as a criterion, usually obtained via worst-case studies. There is usually a trade-off between security and dependability in protection systems, which may affect threshold-based methods; for instance, the event that happened on 21 March 2005 at the Tian-Guang HVDC system can be cited [20]. In this paper, a new composite Machine Learning (ML) algorithm based on one-side measurements is proposed that omits the use of thresholds in protection. The suggested solution contains two different ML methods: a binary Support Vector Machine (SVM) as a startup unit, and a K-Nearest Neighbours (KNN) classifier for the fault classification and redundancy tasks. This new combination of two different ML models within the proposed system architecture provides new insight into the contribution of integrated ML approaches to power system problems. The article is outlined as follows: Section 2 studies the characteristics of internal faults on a bipolar CSC-HVDC system and introduces the appropriate fault features for the proposed algorithm. Section 3 presents the proposed algorithm and the principles of the utilized ML models. In Section 4, the proposed algorithm is applied to the test model, the effectiveness of the selected features is endorsed, simulations are carried out, and the obtained results are presented. Finally, Section 5 is dedicated to the conclusion.

DC LINE FAULT CHARACTERISTICS
The occurrence of a fault can be modelled as a step voltage source at the fault location [16], which gives rise to several characteristic signatures. These characteristics can be used to distinguish faults from normal operation and other disturbances. Among them, some are more salient and contribute to perfect distinction criteria. The following subsections introduce the test system and clarify the appropriate features for the proposed algorithm.

Case study area
In this paper, a ±650 kV, 12-pulse bipolar CSC-HVDC link is simulated as the study system in PSCAD/EMTDC with a sampling frequency of 20 kHz. As illustrated in Figure 1, measurements are located at the rectifier side, after the DC-side filters. The maximum current passing through the transmission link under normal conditions is 2 kA. The length of the HVDC link is 1000 km, the transmission capacity is 2600 MW, and the frequency-dependent model of the transmission line is adopted.

Appropriate feature vector for fault detection
Usually, protection systems based on one-side measurement use current and voltage derivatives or wavelet transforms of arriving travelling waves to detect the occurrence of a fault [16, 17]. The characteristics of internal and out-of-zone positive-pole faults are depicted in Figure 2. As can be seen, an internal fault makes the current and voltage vary sharply compared to external faults. In addition, the sign of the gradients can determine forward and backward faults; hence, the gradient is an efficient feature that specifies the fault direction and severity. It is worth noting that choosing the length of the sampling window properly can avoid protection-system malfunctions, because the mentioned gradient is calculated over a predefined sampling window. In addition to the voltage and current gradients, external faults cause less harmonic-current distortion than internal ones. The reason is the presence of smoothing reactors at both ends of the line, which act as low-pass filters, while the DC-side filters provide a low-impedance path for harmonic currents, especially the (Kq × f) harmonics, where Kq is an integer multiple of the pulse number q of the HVDC system and f is the utility frequency. Figure 3 illustrates the input impedance for internal and external faults. According to the figure, high-frequency harmonic currents generated by external faults face a higher-impedance path (almost 60 times higher) compared to internal ones. The low-impedance path appears as local minima at 600, 1200, and 1800 Hz, caused by the DC filters. Therefore, the presence of high-frequency harmonic currents can be considered a criterion for distinguishing external from internal faults.

Appropriate feature vector for fault classification
There are three types of faults in HVDC transmission lines that should be considered: Positive pole to Ground (PG), Negative pole to Ground (NG), and Positive to Negative pole (PN) faults. Since there is electromagnetic coupling between the negative and positive poles, the travelling-wave signal arising from a fault can also be detected on the healthy pole. To mitigate this problem, the phase-mode transformation matrix of Equation (1) is used to decouple the current and voltage of the two poles from each other [21]. In Equation (2), i+1 and i+0 are the aerial and zero modes of the fault current of the positive pole, and i−1 and i−0 are the aerial and zero modes of the fault current of the negative pole, respectively. Whenever a pole-to-ground fault such as PG or NG occurs, the aerial- and zero-mode components of the backward travelling wave show the same polarity of variation on the faulty pole, contrary to the healthy pole. Also, for a symmetrical fault (PN), the zero mode of the fault current is almost zero. Using Pearson's correlation coefficient for zero-mean real-valued samples [22], the linear correlation between the aerial- and zero-mode components over a sampling window of length k is obtained as

$$\rho_{i_0,\,i_1} = \frac{\sum_{j=1}^{k} i_{0j}\, i_{1j}}{\sqrt{\sum_{j=1}^{k} i_{0j}^{2}}\;\sqrt{\sum_{j=1}^{k} i_{1j}^{2}}} \qquad (3)$$

where i0j and i1j are the zero- and aerial-mode fault-current samples of the positive or negative pole, respectively. Thus, the criteria ρ(i+0, i+1) and ρ(i−0, i−1) can be used to determine the faulty pole.
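The mode decomposition and correlation check above can be sketched in a few lines of Python. The 1/√2 normalization of the decoupling matrix and the synthetic pole-current samples are assumptions for illustration, not the paper's exact values; a constant scaling factor does not change the sign or magnitude of the correlation coefficient.

```python
import numpy as np

def phase_mode(i_pos, i_neg):
    """Decouple the two pole currents into aerial (difference) and zero
    (sum) mode components. The 1/sqrt(2) scaling is an assumed
    normalization; any constant factor leaves the correlation unchanged."""
    i_aerial = (i_pos - i_neg) / np.sqrt(2)
    i_zero = (i_pos + i_neg) / np.sqrt(2)
    return i_aerial, i_zero

def mode_correlation(i_zero, i_aerial):
    """Pearson correlation for zero-mean samples over a window of length k:
    rho = sum(i0*i1) / (sqrt(sum(i0^2)) * sqrt(sum(i1^2)))."""
    den = np.sqrt(np.sum(i_zero**2)) * np.sqrt(np.sum(i_aerial**2))
    return np.sum(i_zero * i_aerial) / den

# Hypothetical six-sample window: a positive-pole fault transient with weak
# coupling onto the negative pole gives a correlation near +1.
i_pos = np.array([0.4, 1.1, 2.3, 3.0, 2.6, 1.8])
i_neg = 0.1 * i_pos
i1, i0 = phase_mode(i_pos, i_neg)
rho = mode_correlation(i0, i1)
```

For a symmetrical pole-to-pole fault (i_neg = −i_pos) the zero mode vanishes, which is why a near-zero zero-mode current is treated as the PN signature.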

PROPOSED PROTECTION ALGORITHM
The diagram of the proposed protection algorithm is shown in Figure 4, in which a binary SVM classifier is used as the startup unit. A six-sample window is chosen for calculating the gradients of the pole current, voltage, and harmonic current. Simultaneously, the Standard Deviations (SDs) of the pole current and harmonic current are computed. Once the SVM startup unit detects a fault in the protection zone, it changes its output signal from 0 to 1; this trigger signal immediately activates the KNN classifier, and another six-sample window then begins for calculating the zero and aerial modes of the fault currents and the associated correlation coefficients. The SDs of the fault current and harmonic current and the correlation coefficients of both poles are the KNN classifier inputs. Based on the training data, the KNN module predicts the type of fault. In addition to the three fault classes (P-G, P-N, and N-G), a fourth class is considered for pseudo-faults, that is, disturbances that can be mistaken for real faults. This additional class yields a more reliable and secure protection system by re-examining instances with low membership scores using a different ML method. In this way, the KNN module compensates for the startup unit's probable errors and increases the total accuracy. The criterion for calculating the membership score is presented in the following subsection.
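As a rough sketch of the startup unit's input stage, the gradient and SD features can be computed over a sliding six-sample window. The least-squares slope used as the gradient estimate and the exact composition of the 6-D vector (gradients plus SDs of all three signals) are assumptions made here for illustration; the paper does not spell out its gradient formula.

```python
import numpy as np

WINDOW = 6  # six-sample window, as in the proposed algorithm

def startup_features(i_dc, v_dc, i_harm):
    """Return a 6-D feature vector from the latest window of single-end
    measurements: least-squares gradients of pole current, voltage, and
    harmonic current, followed by the standard deviations of the three
    signals (a plausible assembly of the paper's 6-D startup input)."""
    t = np.arange(WINDOW)
    windows = [np.asarray(s[-WINDOW:], dtype=float)
               for s in (i_dc, v_dc, i_harm)]
    grads = [np.polyfit(t, w, 1)[0] for w in windows]  # slope per signal
    sds = [w.std() for w in windows]
    return np.array(grads + sds)

# e.g. a linear current ramp yields a gradient of 1 unit per sample:
feats = startup_features([0, 1, 2, 3, 4, 5], [5.0] * 6, [0, 2, 4, 6, 8, 10])
```

In a real implementation this function would run once per incoming sample, feeding the SVM startup unit continuously.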

SVM startup unit
The startup unit is expected to detect fault occurrence as soon as possible. As this unit must repeat the whole detection process for each incoming sample, a fast-response ML method should be utilized. The Support Vector Machine is a supervised learning model used for tasks such as classification and regression analysis [23]. In this method, a line, plane, or hyperplane separates the training data according to their labels, and a new query sample can be classified into the defined classes with respect to the separating hyperplanes obtained in the training process.
To understand the SVM classification process, consider a two-dimensional, two-class dataset $\{(x_j, y_j)\}_{j=1}^{N}$, where $x_j = (x_{j1}, x_{j2})$ holds the values of the jth observation along the two dimensions, $y_j \in \{-1, +1\}$ is its label, and N is the total number of observations. As in Equation (4), there are various hyperplanes that can separate the two classes correctly:

$$w^{T} x + b = 0 \qquad (4)$$

where w and b are the weight vector and bias term. If the training dataset is linearly separable, we can consider two parallel lines that partition the classes. With a standardized training dataset, these two hyperplanes can be described as

$$w^{T} x + b = +1, \qquad w^{T} x + b = -1. \qquad (5)$$

The area between these two lines is called the "margin", which is equal to $2/\lVert w \rVert$. Adding the constraint

$$y_j \left(w^{T} x_j + b\right) \geq 1 \qquad (6)$$

prevents data from falling into the margin, with $w^{T} x_j + b \geq +1$ for data belonging to class +1 and $w^{T} x_j + b \leq -1$ otherwise. The hyperplane that maximizes the margin (or minimizes $\lVert w \rVert$) is the best separating hyperplane and yields the minimum generalization error. In the training phase, the SVM model maximizes the margin by minimizing $\lVert w \rVert$ [23]:

$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2} \quad \text{s.t.} \quad y_j \left(w^{T} x_j + b\right) \geq 1. \qquad (7)$$

When two classes are not linearly separable, as in the classification faced in this paper, the condition for the optimal hyperplane can be relaxed by adding an extra term $\xi_j$ to the constraint, and the optimization problem can be written as Equation (8):

$$\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{j=1}^{N} \xi_j \quad \text{s.t.} \quad y_j \left(w^{T} x_j + b\right) \geq 1 - \xi_j,\ \ \xi_j \geq 0 \qquad (8)$$

where $\xi_j$ is a slack variable expressing how deeply a data point falls into the wrong class, and C is a penalty factor expressing the cost of misclassification. By solving the optimization problem with the penalty factor as described in (8), the best separating hyperplane can be obtained. A higher penalty factor prevents misclassification but can result in overfitting. The penalty factor C = 20 is adopted in this paper based on studies carried out in the ML validation phase. By applying Lagrangian optimization theory to the maximum-margin classifier problem, a dual view of maximum-margin classifiers can be obtained [23], which has some salient consequences such as the capability of generalizing a linear SVM to a nonlinear one. This task is done by applying what
is called the kernel trick to a linear SVM, which maps the input data as $\mathbb{R}^{n} \to \mathbb{R}^{m}$ with $n \leq m$ and results in a nonlinear SVM. There are popular kernel functions such as the linear, RBF, sigmoid, and polynomial kernels. Since the classification in this paper is nonlinear, we use the polynomial kernel of degree d:

$$K(x_i, x_j) = \left(1 + x_i^{T} x_j\right)^{d} \qquad (9)$$

As is known, the SVM method is based on statistical learning theory, and the output of this method is not limited to an estimated label [24]. Once the query instance is provided to the algorithm, the posterior probability that X_query belongs to class Y_i can be calculated. The posterior probability is a conditional probability defined for an event after observing the relevant evidence; in this paper, it expresses the certainty of assigning an observed sample to a specific class. With P(Y_i) as the prior probability of class Y_i, the posterior probability P(Y_i | X_query) is obtained by multiplying the prior probability and the normal density function:

$$P(Y_i \mid X_{query}) = \frac{P(X_{query} \mid Y_i)\, P(Y_i)}{P(X_{query})} \qquad (10)$$

where P(X_query) is a normalization constant equal to the summation of P(X_query | Y_i) P(Y_i) over all classes (i = 1, 2, …, n), and P(X_query | Y_i) is the normal density function with mean $\mu_{Y_i}$ and covariance $\Sigma_{Y_i}$ as expressed in (11):

$$P(X_{query} \mid Y_i) = \frac{1}{(2\pi)^{d/2}\, \lvert \Sigma_{Y_i} \rvert^{1/2}}\, \exp\!\left(-\tfrac{1}{2}\,(X_{query} - \mu_{Y_i})^{T}\, \Sigma_{Y_i}^{-1}\, (X_{query} - \mu_{Y_i})\right) \qquad (11)$$

The value of P(Y_i | X_query), which we call the "membership score", is an appropriate probabilistic criterion for activating the fourth class of the KNN method, letting the KNN double-check the occurrence of the fault via a different ML method. This scheme overcomes the low certainty of the SVM output and increases the total security through redundancy. It is worth noting that SVMs are considered eager learners, in which explicit learning is done. Unlike lazy learners, eager learners do not postpone the learning process until a query instance arrives. Although eager learners need more time in the training process, they predict the class of query data rapidly. This is the most important reason to use an eager learner for the startup unit, where a fault prediction is needed for every incoming measurement sample.
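A minimal sketch of the startup unit's training step is shown below, assuming scikit-learn as the ML toolkit. The 3rd-order polynomial kernel and the penalty factor C = 20 come from the paper; the synthetic two-cluster data and the use of Platt-scaled posteriors (scikit-learn's `probability=True`) as a stand-in for the paper's Gaussian-density membership score are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the paper's 154 training cases: 6-D feature vectors
# with binary fault / no-fault labels (the real features come from the
# PSCAD/EMTDC simulations; these clusters are purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (40, 6)),    # fault-like instances
               rng.normal(-2.0, 0.5, (40, 6))])  # normal / disturbance
y = np.array([1] * 40 + [0] * 40)

# 3rd-order polynomial kernel and penalty factor C = 20, as in the paper.
clf = SVC(kernel="poly", degree=3, C=20, probability=True, random_state=0)
clf.fit(X, y)

query = np.full((1, 6), 2.0)            # a clearly fault-like query
label = clf.predict(query)[0]
score = clf.predict_proba(query)[0, 1]  # membership score for the fault class
```

In the proposed architecture, a low membership score for an instance declared as a fault would hand the final decision over to the KNN unit's pseudo-fault class.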

KNN classification unit
KNN is a lazy learning model based on the principle that instances with the same properties occur close to each other in an N-dimensional space, where N is the dimension of the features describing the position of the instances [25]. Once the instances are labelled according to their classes, the label of a query instance can be predicted by observing the most frequent label among its K nearest instances, where K is a positive integer. The majority-voting scheme for predicting the query sample can be expressed mathematically as follows [26]:

$$\hat{y} = \arg\max_{v \in \Omega_v} \sum_{z=1}^{K} \delta(v, C_z) \qquad (12)$$

where $\delta$ is a binary indicator that equals 1 when the class label $v \in \Omega_v$ (the set of class labels) matches $C_z$ (the label of the zth closest neighbour) and 0 otherwise. There are several distance functions used to determine the distance between the query instance and the training instances, and the accuracy of the model can be improved by choosing an appropriate distance function [27].
In this paper, we use the Euclidean distance to calculate the distance between two instances:

$$d(A, B) = \sqrt{\sum_{i=1}^{N} (A_i - B_i)^{2}} \qquad (13)$$

where A and B are two different instances and N is the dimension of the feature vector. To prevent an exhaustive KNN search, a KD-tree search algorithm [28] is adopted in this paper, which leads to a fast protection system.
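The classification unit described above can be sketched with scikit-learn's KNN implementation, using K = 5 neighbours, the Euclidean metric, and a KD-tree search as in the paper. The four hypothetical cluster centres and the reduced 4-D feature space below are purely illustrative stand-ins for the real simulated training data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic four-class training set (PG, NG, PN, pseudo-fault). The
# cluster centres are hypothetical; the paper's real inputs are SDs of
# fault and harmonic current plus the two mode-correlation coefficients.
rng = np.random.default_rng(1)
centres = {0: [3, 0, 0.9, 0.1],   # PG-like region
           1: [0, 3, 0.1, 0.9],   # NG-like region
           2: [3, 3, 0.0, 0.0],   # PN-like region
           3: [0, 0, 0.0, 0.0]}   # pseudo-fault region
X = np.vstack([rng.normal(c, 0.2, size=(30, 4)) for c in centres.values()])
y = np.repeat(list(centres.keys()), 30)

# K = 5, Euclidean distance, KD-tree search, as in the proposed algorithm.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean",
                           algorithm="kd_tree")
knn.fit(X, y)
pred = knn.predict(np.array([[3.0, 0.0, 0.9, 0.1]]))[0]  # PG-like query
```

The KD-tree avoids computing the distance to every stored training instance at query time, which matters when the unit must respond within a few samples.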

SIMULATION STUDIES AND ANALYSIS
In this section, in addition to the total algorithm study, each unit is also studied independently during the training and testing phases.

SVM model simulation and training validation
As noted in Section 3, the reason for utilizing the SVM binary classifier in the startup unit is its type of learning: it is an eager learner. A 3rd-order polynomial kernel, as expressed in Equation (9), is used to map the 6-D input data. The training data include 154 cases, of which ten are dedicated to non-fault and pseudo-fault disturbances; the other 144 cases cover three different fault types, 12 locations ranging from 10 to 990 km, and the resistance range R = {2, 50, 150, 300} Ω. Since the visualization of a 6-D scatter plot of the input features is impractical, only three of the dimensions are shown in Figure 5, coloured by fault type. With regard to the scatter plot of the SVM inputs, a salient discrimination between fault and non-fault instances is noticeable. This significant distinction is due to a smart choice of the input feature vector, which increases the startup machine's accuracy and improves its efficiency [29]. For training-performance validation, a tenfold cross-validation model of the SVM module is built and the confusion matrix is formed. As illustrated in Figure 6, a 3rd-order polynomial kernel with a 6-D feature vector contributes to a perfect binary classification with 100% accuracy in training validation.
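The tenfold cross-validation step can be sketched as follows, again assuming scikit-learn. The well-separated synthetic clusters stand in for the 154 real training cases; with the paper's kernel and penalty settings, the out-of-fold confusion matrix is then formed from the held-out predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic, well-separated 6-D data standing in for the real training set.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(2.0, 0.3, (77, 6)),
               rng.normal(-2.0, 0.3, (77, 6))])
y = np.array([1] * 77 + [0] * 77)

# Tenfold cross-validation with the paper's hyperparameters (degree 3, C=20).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_hat = cross_val_predict(SVC(kernel="poly", degree=3, C=20), X, y, cv=cv)
cm = confusion_matrix(y, y_hat)        # rows: true class, columns: predicted
accuracy = np.trace(cm) / cm.sum()
```

Each instance is predicted exactly once, by a model that never saw it during training, so the confusion matrix reflects generalization rather than memorization.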

KNN model simulation and training validation
Once the startup unit detects a fault occurrence, it activates the KNN classifier and the correlation-calculator unit via a logical signal. The correlation-coefficient calculator begins to compute Pearson's correlation between the zero and aerial modes of the currents over a new six-sample frame. The pre-calculated SDs of the fault current and of the 300-4800 Hz harmonic current, which were obtained simultaneously with the SVM input features, together with the Pearson correlation coefficients, constitute the KNN module's input feature vector. Since the KNN algorithm computes the Euclidean distance between instances, using per-unit quantities is a basic requirement, to prevent wide-range attributes from overwhelming the effect of the other features [30]. The SDs of the mentioned features are calculated as

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^{2}} \qquad (14)$$

where N is the total number of observations in a sampling frame and $\bar{x}$ is the mean of the observations. A four-class KNN model is trained with a 6-D feature vector on the same training data used for the SVM. The pseudo-fault class considered for the KNN module prevents the whole algorithm from issuing a wrong line-trip command due to a probable mistake of the SVM module. For a faster response, a KD-tree is employed as the search method, as mentioned in Section 3, and the Euclidean distance function is chosen. One of the challenging parameters of the KNN algorithm is the number of neighbours (K). Depending on the KNN model, the input features, and the number of classes, the KNN accuracy can be improved by changing the number of neighbours. The subsequent studies were carried out with the assumption K = 5, and the confusion matrix presented in Figure 7 endorses an accurate training validation for this assumption.
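The per-unit requirement can be illustrated with a tiny helper. The base values below (650 kV and 2 kA, matching the test system's ratings) are chosen only as an example of how kV-scale and kA-scale attributes are brought to comparable ranges before the Euclidean distance is computed.

```python
import numpy as np

def per_unit(X, base):
    """Divide each feature column by its base quantity so that kV-scale and
    kA-scale attributes contribute comparably to the Euclidean distance."""
    return np.asarray(X, dtype=float) / np.asarray(base, dtype=float)

# Voltage (V) and current (A) features on very different raw scales:
X = [[650e3, 2000.0],
     [325e3, 1000.0]]
Xpu = per_unit(X, base=[650e3, 2000.0])  # -> [[1.0, 1.0], [0.5, 0.5]]
```

Without this scaling, the voltage column alone would dominate every distance computation by several orders of magnitude.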

Unit tests and results
For the test phase, each unit has been tested with 53 cases, including 40 fault cases and 13 non-fault cases, which constitute one-third of the training data. It is important to note that the mentioned training/testing ratio is a commonly accepted practice in ML; however, we go further and examine the proposed method with a total of 2220 test cases (636 cases for Tables 1-3 and 1584 remote, high-impedance faults for Table 8).
In addition to the confusion matrix, Table 1 provides detailed information. Note that "F", "N", "T/NT" and T_d refer to "Fault", "Non-fault", "Trip/Not Trip" and the detection time, respectively. The redundancy provided by the 4th class of the KNN module compensates for the startup unit's lack of accuracy in the case of an 8/20 μs, 100 kA positive lightning strike and increases the total accuracy of the protection system. In addition to the overall algorithm process, every pseudo-fault case is also applied to the KNN unit independently, and the prediction results are given in parentheses in Table 1. As a summary, the whole algorithm is considered as an integrated protection system with all its units, and the related confusion matrix is presented in Figure 8.

Effect of sampling frequency
Reducing the sampling frequency discards the momentary features of the signal. The proposed algorithm is applied to 265 test cases with lower sampling frequencies, and the results are presented in Table 2.
As can be seen, the total algorithm cannot completely satisfy the security of protection at lower sampling frequencies, as it mistakes lightning strikes for internal faults. Since Table 2 reports the duration of the sampling window, the effect of the number of samples in the window, known as the length of the sampling window, is studied separately in later subsections, to avoid confusing the sampling-window duration with the sampling-window length. All simulations are done without line surge arresters, to represent the worst case. Thus, adding surge arresters to both line terminals can improve the efficiency of the algorithm and mitigate the errors related to lightning strikes. It is worth mentioning that the misclassified test data in Table 2 are related to the lightning strikes for which the algorithm fails to distinguish lightning from faults. Thus, security is the only parameter of the protection system that deteriorates with decreasing sampling frequency, while the other parameters, such as dependability, speed, and selectivity, remain unchanged.

Effect of noise
Additive White Gaussian Noise (AWGN) is a wideband noise with constant power spectral density across frequencies, which arises from the many random processes occurring in a realistic environment. To evaluate the protection algorithm in a practical situation, AWGN with different Signal-to-Noise Ratios (SNRs) is applied to the test data. The results are shown in Table 3 and endorse the protection algorithm's effectiveness in noisy environments. Since this algorithm is based on one-side measurements and does not need a communication channel, the effect of the Bit Error Rate (BER) is not considered. All misclassified test cases in Table 3 are related to lightning strikes, as mentioned in Section 4.4.
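Applying AWGN at a prescribed SNR can be sketched in a few lines: the noise power is set to the signal power divided by 10^(SNR/10). This is a generic helper for illustration, not the paper's exact test harness.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the resulting signal-to-noise power
    ratio equals snr_db: noise power = signal power / 10**(snr_db / 10)."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal**2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
```

At 20 dB, for instance, the injected noise power is 1% of the signal power; the same measured waveforms can then be re-run through the trained detectors at each SNR level.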

Effect of different operation and control modes
The operating point of the converters is obtained from the intersection of the steady-state control characteristics, each consisting of two segments per converter: Constant Ignition Angle (CIA) and Constant Current (CC) for the rectifier, and Constant Extinction Angle (CEA) and Constant Current (CC) for the inverter. Because the direction of power flow must be reversible, each converter is expected to act as a rectifier or an inverter depending on the power-flow direction; adding CIA and CEA segments to the inverter and rectifier control characteristics, respectively, therefore yields a truly bi-directional HVDC link. Under normal operation, with sufficient AC voltage behind the rectifier valves, the operating point lies at the intersection of the CC segment of the rectifier and the CEA segment of the inverter in the Vdc-Idc control characteristics, denoted by G in Figure 9. In this condition, the rectifier controls the DC current by changing the ignition angle α and the inverter controls the DC voltage; we call this Current Controlled by Rectifier (CCR). Under reduced AC voltage, on the other hand, the operating point moves to the intersection of the reduced CIA segment of the rectifier and the CC segment of the inverter, where the inverter controls the DC current and the rectifier the DC voltage. This point, denoted by G' in Figure 9, corresponds to the rectifier operating at its minimum allowed ignition angle α_min; we call this mode Current Controlled by Inverter (CCI). All of the previous test cases in this article applied faults at 0.4 s while the system was operating in the CCI mode (α ≃ 6°). To study the effect of a mode shift on the protection algorithm, the AC voltage of the rectifier is increased by a tap changer and the operating mode is switched to CCR, as depicted in Figure 10. The fault is then applied to the system in CCR mode. It should be noted that in the first instants after fault inception, the thyristor ignition angle remains unchanged because of the time constant and delay of the control loop [31]. In Section 4.8, Table 8 reports the protection-system results with the mode shift taken into account.

Effect of window length
As a principle of machine learning, the feature-extraction process for the training and test datasets should follow the same rules and formulations [33]. In the first steps of designing a machine-learning model, some assumptions must be made; one of these assumptions in this problem is the length of the sampling window. Hence, to ensure the scientific validity of studying the effect of the sampling-window length on the model, the change must also be applied to the training set. It should be noted that the resulting models differ from those examined in the earlier parts of the paper and have different hyperplane equations.

SVM training validation
Using 10-fold cross-validation, the SVM model is validated with the same training data for five different window lengths (4, 6, 8, 10, and 12 samples), as shown in Table 4.
As illustrated, tenfold cross-validation for the SVM models shows similar deterministic results for the different windows. From SVM theory, however, we know that these models have different class-separating hyperplanes. By using the margin as introduced earlier, the preferable model can be selected. The width of the margin is equal to the difference between the classification score for the true class and the classification score for the false class. Since we are using binary classification, the margin is equal to twice the classification score, as expressed in Equations (15) and (16):

$$f(x) = \sum_{j=1}^{n} \alpha_j\, y_j\, G(x_j, x) + b \qquad (15)$$

$$m = 2\, y\, f(x), \qquad y = \begin{cases} +1 & \text{if the true label of } x \text{ is the positive class} \\ -1 & \text{otherwise} \end{cases} \qquad (16)$$

where $(\alpha_1, \dots, \alpha_n, b)$ are the estimated SVM parameters (known as the alphas and bias) and $G(x_j, x)$ is the dot product in the predictor space according to the utilized kernel (using a 3rd-order polynomial kernel on the 6-D feature vector results in an 84-dimensional predictor space). Also, f(x), called the distance, expresses how far an instance lies from the separating hyperplane, and m is the margin. Including y in the margin equation as above gives a better visualization and places all correctly classified instances on the positive axis; with this formulation, misclassified instances yield a negative margin. Calculating the margin for different sampling-window lengths leads to Figure 11, which represents the width of the margin in a standardized space. The classification margin helps to compare different machine-learning models with each other: among models trained on the same data, the model with the higher margin is preferable. It should be noted that the SVM is used in a standardized space, where the margin for each class is almost 1 (a total margin width of 2). Since in the standardized space the margins do not provide a comprehensive comparison (they are almost the same), another criterion is needed to evaluate the models.
The weighted average of the distances to the separating hyperplane, called the "classification edge", can serve as a yardstick for appraising the obtained models. As with margins, models with a higher classification edge provide better predictions. Since the classification edge is a weighted average of distances to the border, prior knowledge about faults and disturbances can be exerted through the weights, in the form of prior probabilities. For example, [34] states that positive lightning strikes are rare, constituting less than 10% of all lightning strikes. Since a statistical study of fault-type probabilities in HVDC transmission systems is out of our scope, we consider the same weight for all instances. Calculating the distance f(x), its average value, and the margin m for the different SVM models yields Table 5.
The outcome of Equation (15) for the training dataset and the weighted average of each class are depicted in Figure 12. It is worth mentioning that the visualized margins and distances do not literally exist in the way that threshold values do in classical protection algorithms.
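The margin and edge computations above can be sketched with scikit-learn's decision function, which plays the role of f(x). The synthetic data and the equal instance weights are assumptions for illustration, matching the equal-weight choice stated in the text.

```python
import numpy as np
from sklearn.svm import SVC

# Sketch of the margin/edge comparison: f(x) is the signed score from the
# decision function, the per-instance margin is m = 2*y*f(x) with
# y in {-1, +1}, and the classification edge is the (weighted) mean margin.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(1.5, 0.4, (50, 6)),
               rng.normal(-1.5, 0.4, (50, 6))])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="poly", degree=3, C=20).fit(X, y)
f = clf.decision_function(X)  # distance-like score f(x) per instance
margins = 2 * y * f           # negative margin <=> misclassified instance
edge = margins.mean()         # equal-weight classification edge
```

To favour rare but costly events (for example, positive lightning strikes), the plain mean would be replaced by `np.average(margins, weights=...)` with weights derived from prior probabilities.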

KNN training validation
To evaluate the KNN model with different sampling windows, the same procedure expressed in Section 4.7.1 is needed, but it must be noted that the classification margins and edges follow a different formulation from the SVM unit. In the KNN algorithm, the margin is defined as the difference between the classification score of the true class and the maximum classification score among the false classes, and the score of a class equals the number of neighbours with that class tag divided by the number of nearest neighbours (K = 5). Except for the 4-sample window, illustrated in Figure 13, the KNN models provide 100% accuracy in the validation phase. Figure 13 clearly shows that the result of the 4-sample window for the KNN model is not acceptable. Further parameters are listed in Table 6. For a better understanding, consider the 12-sample window KNN model. The minimum margin in this model indicates that three out of five neighbours have the true class tag while the other two neighbours are labelled with another, single class. According to the majority-voting scheme, 3/5 of the neighbours vote for one class (the true class) while 2/5 vote for another (a false class), and the difference between 3/5 and 2/5 equals 0.2, as given in Table 6. It should be noted that an instance whose majority vote goes to a false class yields a negative margin.
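The KNN margin definition above (true-class score minus the highest false-class score, with scores as neighbour vote fractions) can be sketched as follows; the tiny 1-D dataset is hypothetical and chosen so that a query reproduces the 3/5-versus-2/5 example from the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_margins(knn, X, y):
    """KNN classification margin: score of the true class minus the highest
    score among the false classes, where a class score is the fraction of
    the K nearest neighbours carrying that label."""
    proba = knn.predict_proba(X)       # neighbour vote fractions per class
    idx = np.searchsorted(knn.classes_, y)
    rows = np.arange(len(y))
    true_score = proba[rows, idx]
    others = proba.copy()
    others[rows, idx] = -np.inf        # mask out the true class
    return true_score - others.max(axis=1)

# Hypothetical 1-D example with K = 5: the query's five nearest neighbours
# split 3 votes (true class) to 2 votes (false class) -> margin 3/5 - 2/5.
X_train = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [10.0], [11.0]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
m = knn_margins(knn, np.array([[0.15]]), np.array([0]))
```

A margin below zero means the majority of neighbours voted for a false class, i.e. the instance would be misclassified under majority voting.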

Total algorithm with test dataset
In this section, each model is tested with the main test dataset (53 cases); as an example, the confusion matrix of the total algorithm for the 4-sample window, which does not provide an accurate prediction, is represented in Figure 14. In addition, Table 7 provides the accuracy and response time of the models. Although these models could reach more accurate predictions by tuning the hyperparameters of the associated model in the validation process, we did not tune the hyperparameters of each model, in order to consider the effect of the sampling window alone with the same hyperparameters as used for the main model (6-sample window). Performing the same distance measurement according to (15) for the test cases results in Figure 15. It is worth mentioning that the transmission link operates in the normal state at first, and the fault instances move from the normal region to the fault region. This movement has its own dynamics; the algorithm makes its decision as soon as the instances pass the decision border. As expected, the first instant at which the algorithm declares the fault leads to a doubtful decision (since the instance is placed within the margin), but within a few samples it reaches its maximum assurance.
In addition to the kernel trick, implicit mapping, and the optimal-hypothesis guarantee, another reason for the superiority of our algorithm is its system architecture, which can be considered "ensemble learning". Ensemble methods use multiple learning algorithms to make better predictions than individual machine learning algorithms [29]. Using different learners, as done in this manuscript, results in heterogeneous ensemble learning. The main reason for using different machine learning methods is to enhance the total performance, since each learner has its own advantages and disadvantages. They learn the generalization rule for the observations (the training set) from different perspectives, which lets them act as complementary components and construct a larger and more accurate learning system.
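The heterogeneous two-stage architecture can be sketched as below: an SVM startup unit screens every sample, and a 4-class KNN module classifies the fault type while double-checking low-confidence SVM declarations through its non-fault class. The class encoding, the 0.9 confidence threshold, and the synthetic data are illustrative assumptions, not the paper's exact values.

```python
# Sketch of the SVM-startup + KNN-classifier ensemble described above.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 6)), rng.normal(4, 1, (60, 6))])
y_fault = np.array([0] * 60 + [1] * 60)                      # 0 = normal, 1 = fault
y_type = np.array([3] * 60 + list(rng.integers(0, 3, 60)))   # 3 = non-fault class

svm = SVC(probability=True).fit(X, y_fault)                  # startup unit
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y_type)     # fault-type module

def protect(sample, p_min=0.9):
    """Startup SVM triggers; KNN classifies and provides redundancy."""
    p_fault = svm.predict_proba(sample.reshape(1, -1))[0, 1]
    if p_fault < 1 - p_min:                 # confidently normal: stay idle
        return "no trip"
    fault_type = int(knn.predict(sample.reshape(1, -1))[0])
    if fault_type == 3:                     # KNN redundancy overrules startup
        return "no trip"
    return f"trip: type {fault_type}"

print(protect(np.full(6, 4.0)))   # deep in the fault region
print(protect(np.zeros(6)))       # deep in the normal region
```

The point of the redundancy is that the KNN's fourth class can veto a marginal SVM trigger, so a single learner's error does not propagate to the trip decision.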
To prove the claim above, consider Figure 15, where a pseudo-fault (lightning strike) is wrongly classified into the fault category. Although there are fault instances with lower decision assurance than the misclassified instance (e.g., instance 40), the total algorithm does not mistake faulty instances for misclassified normal ones. Hence, the behaviour of the proposed algorithm differs entirely from other single-end and threshold-based methods: if it followed the same rule as threshold-based methods, the three fault instances 12, 13, and 40 would be more likely to be misclassified as disturbances than instance No. 52 (since they are closer to the decision border than the misclassified instance 52).

Effect of faraway high-impedance faults with simultaneous consideration of operation modes, presence of noise, and reduced sampling frequency
As mentioned in Section 4.1, the training dataset includes fault impedances of {2, 50, 150, 300} Ω and 12 locations covering the range of 10 to 990 km. The test dataset related to the faults shown in Table 1 was generated randomly, covering the whole length of the transmission line and fault impedances ranging from 1 up to 500 Ω.
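The randomized test-case coverage described above can be sketched as a simple sampler. This is purely illustrative bookkeeping; the actual cases come from PSCAD/EMTDC simulations, and the field names are placeholders.

```python
# Sketch: drawing random fault locations over the whole 1000 km line
# and fault impedances from 1 to 500 ohm, as described in the text.
import random

random.seed(42)
test_cases = [
    {"location_km": round(random.uniform(0, 1000), 1),
     "impedance_ohm": round(random.uniform(1, 500), 1)}
    for _ in range(53)   # 53 main test cases, per the paper
]
print(len(test_cases))
```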
In this subsection, faraway high-impedance test cases are provided and tested considering the presence of noise and down-sampled data in both CCI and CCR modes. Note that each case is tested several times to obtain a comprehensive result, due to the random nature of the white Gaussian noise. The accuracies shown in Table 8 are the average values of extensive algorithm runs over 11 cases for each type of fault, covering fault locations from 990 to 1000 km in steps of 1 km.
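The noise and down-sampling stress applied to each repeated run can be sketched as below. The SNR value and the decimation factor are illustrative assumptions; the text does not fix them here.

```python
# Sketch: perturbing a measured signal with white Gaussian noise at a
# given SNR and crudely down-sampling it, as done per test repetition.
import numpy as np

def add_awgn(signal, snr_db=30.0, rng=None):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    rng = rng or np.random.default_rng()
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)

def downsample(signal, factor=2):
    """Decimate by keeping every `factor`-th sample."""
    return signal[::factor]

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, 200))   # stand-in measurement
noisy = downsample(add_awgn(clean, snr_db=30.0, rng=rng), factor=2)
print(noisy.shape)   # half the original length
```

Because the noise realization changes on every call, averaging the accuracy over repeated runs (as in Table 8) is what makes the reported figures stable.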

Highlighted advantages of the proposed algorithm
• As mentioned in Section 1, the majority of protection algorithms are based on the threshold concept in different ways, such as [16, 19, 35] with travelling waves, [2, 13, 36] via thresholds on derivatives of current and voltage, and [5, 7, 8] using the differential current method. The main question about thresholds concerns the process of choosing an appropriate value; most researchers have selected it according to a worst-case study. In this research, we use no threshold value, and the task of classification and detection is fulfilled by comparing the similarities between the query data and the training dataset.
• Other research projects such as [37-39] have used ML for fault detection and classification too, but the main contribution of this paper and the proposed algorithm is a new insight into an integrated ML system architecture, where a combination of two or more different ML models through a new system architecture can provide salient capabilities, as can be seen in [40]. We consider different ML models for detection and classification and connect them in an appropriate manner, which results in ensemble learning and a more accurate protection model. An SVM model is used for the startup unit because of its classification manner: the startup unit needs a fast decision for every upcoming sample continuously, and the SVM is an appropriate choice as it is regarded as an eager learner, whose superiority to ANN models in such problems is reviewed in [41]. A KNN model is used for the fault-type classification module because it is not activated continuously (mostly operating in an idle phase), and the fourth class of this module (the non-fault class) provides redundancy, which compensates the startup unit's probable error with a different ML model. Compared to machine-learning-based protection algorithms such as [38, 39, 42, 43], which cover only the dependability of protection (since all their test cases are in-zone faults), our results also examine the security aspect of protection reliability. Although reference [44] is proposed for VSC-based HVDC systems, its results are limited to fault detection with an accuracy of 86.5%. We have studied fault resistances and locations beyond the range of the training dataset (training set up to 300 Ω/990 km and test set up to 500 Ω/1000 km), while the fault resistance for the training dataset in [44] covers the range of 0 to 1500 Ω and the test dataset is examined only up to 1000 Ω. Rare and severe lightning strikes, which constitute less than 10% of cloud-to-ground lightning strikes [35], are successfully detected and classified in the pseudo-fault category. As a further challenge to our proposed algorithm, the lightning strike is applied to the power conductors instead of the overhead earth wires, without any lightning surge arresters on the line. In addition to the lightning strike simulations, a much smaller training-case requirement compared to [37] is highlighted, and the proposed algorithm successfully identifies converter faults and reactor faults as external faults, whereas [37] makes errors in such cases.
• Choosing a well-separated and suitable feature vector improves the classification accuracy, so less training data is needed. Thus, a comprehensive study on appropriate features leads to selecting distinctive attributes, which can provide well-separated classes.
• This paper challenges the protection algorithm with a variety of internal/external faults and pseudo-faults, such as severe lightning and valve faults, which cannot be seen in other papers like [7, 9, 11, 45].
• The high detection speed is notable compared with similar research, where the detection time (from the occurrence of the fault) reaches 4.2 ms for faults at L = 1000 km and R_f = 500 Ω.
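The well-separated feature vector credited above can be illustrated with a small extraction sketch: windowed gradients and standard deviations of the pole currents, plus the correlation coefficient between the aerial and zero modes of the DC current. The mode decomposition (half-sum/half-difference of the pole currents), the window length, and all signal names are assumptions for illustration only.

```python
# Sketch of single-end feature extraction: gradients, standard
# deviations, and the aerial/zero-mode correlation coefficient.
import numpy as np

def features(i_pos, i_neg, window=6):
    """Features from the last `window` samples of both pole currents."""
    ip, im = i_pos[-window:], i_neg[-window:]
    i_zero = (ip + im) / 2      # zero (ground) mode, assumed decomposition
    i_aerial = (ip - im) / 2    # aerial (line) mode, assumed decomposition
    return {
        "grad_pos": np.gradient(ip).mean(),
        "grad_neg": np.gradient(im).mean(),
        "std_pos": ip.std(),
        "std_neg": im.std(),
        "corr_modes": np.corrcoef(i_aerial, i_zero)[0, 1],
    }

rng = np.random.default_rng(3)
f = features(rng.normal(1.0, 0.05, 20), rng.normal(-1.0, 0.05, 20))
print(sorted(f))
```

Attributes like these separate fault from non-fault classes widely in feature space, which is why comparatively few training cases suffice.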

CONCLUSION
This paper proposes a multi-machine-learning-based protection for CSC-HVDC transmission lines, formed by two different ML modules. An SVM model is employed as the startup module, since it is considered an eager learner that needs less time to classify the queried sample, with a 6D feature vector containing the gradients of the DC current, harmonic current, and DC voltage of both poles. The output of the startup unit includes a trigger signal and the posterior probability of the predicted labels. These two outputs are applied to a 4-class KNN module, which is responsible for fault-type classification and acts as a redundant unit for the startup unit in the case of a low posterior probability of the query samples. The feature vector utilized for the KNN module contains the correlation coefficient of the zero mode and the aerial mode of the DC current, as well as the standard deviations of the DC current and harmonic current of both poles. A total of 154 cases, including different internal faults and external/pseudo-fault cases, are used for training. The training phase is validated by a tenfold cross-validation technique. The algorithm's performance is then tested with 53 test cases under different conditions, such as different levels of noise, sampling frequencies, and operating modes (11 × 53 test cases in total), and different lengths of the sampling window. Additionally, numerous faraway high-impedance faults located at 990-1000 km of the transmission line with R_f = 500 Ω are simulated and tested under noisy conditions, different operating modes, and down-sampled frequency simultaneously, while the training dataset covers fault locations from 10 to 990 km and a fault impedance range of 2-300 Ω. It is worth mentioning that the protection process based on the proposed algorithm does not suffer from any communication channel failure or delay, as it uses single-side measured signals. The obtained results endorse the accuracy and speed of the algorithm, whose maximum detection time is as low as 4.2 ms. The high performance of the proposed algorithm is due to the selection of well-separated attributes as the elements of the feature vector, which bring a salient distinction between the fault and non-fault classes. Also, using a heterogeneous ensemble learning scheme provides a more accurate model than a single machine learning approach. There is no direct use of a threshold value in the algorithm; the process of detection and classification is based on the similarity of gradients and standard deviations between the query samples' signals and the trained cases.

FIGURE 1 Schematic circuit of the studied bipolar CSC-HVDC system [21]
FIGURE 3 Characteristics of the external and internal fault input impedance
FIGURE 4 Schematic of the proposed protection algorithm
FIGURE 5 3D scatter plot of the input features
FIGURE 6 SVM module confusion matrix of training validation
FIGURE 7 KNN module confusion matrix of training validation
FIGURE 8 Total protection algorithm confusion matrix for the test case
FIGURE 9 Steady-state control characteristics of an actual converter [32]
FIGURE 10 Ignition angle in CCI and CCR modes without fault
FIGURE 11 Classification margin for SVM unit
FIGURE 13 Confusion matrix for the KNN model with the 4-sample window in the validation phase
FIGURE 14 Confusion matrix for the total algorithm with the 4-sample window model in the test phase
FIGURE 15 Distances of test instances from decision border

TABLE 1 Detailed specification of the test cases
TABLE 2 Accuracy of units and protection algorithm at lower sampling frequency
TABLE 4 Confusion table for SVM unit in the validation phase
TABLE 5 Evaluation parameters for different SVM models
FIGURE 12 Distances and edges of instances
TABLE 6 Evaluation parameters for different KNN models
TABLE 7 Evaluation parameters for the total algorithm with different lengths of the sampling window
TABLE 8 Accuracy of detection for faraway high-impedance faults in different situations