A method for battery fault diagnosis and early warning combining isolated forest algorithm and sliding window

The power battery of an electric vehicle consists of a large number of cells connected in series or in parallel. Because of manufacturing tolerances and differing operating environments, the cells differ from one another, and the resulting inconsistency within the battery pack increases the safety risk. Identifying and warning about power battery inconsistency is therefore of great practical significance. Based on data from an internet of vehicles platform, this paper proposes an improved isolated forest method for identifying abnormal power battery cells and issuing early warnings. The method uses a sliding window (SW) to segment the dataset and update the data of the diagnosis model in real time. The scores of normal and abnormal battery cells were analyzed, and the fault threshold was determined to be 0.75. The results show that the recall and precision of the algorithm are 0.91 and 0.95, respectively, making it more suitable for identifying inconsistent battery cells than the other methods compared. The warning effect is best with an SW size of 15. The algorithm can issue fault warnings before the vehicle alarm occurs, which helps avoid the safety problems caused by inconsistency faults.


| INTRODUCTION
Lithium-ion batteries are widely used as power sources for new energy vehicles because of their high energy density, high power density, and long service life. 1,2 However, pure electric vehicles usually require hundreds of battery cells connected in series and parallel to meet their driving-range and voltage requirements. 3 Differences introduced during production are continuously amplified in subsequent use, and temperature, vibration, and mechanical stress also affect battery performance. The result is an inconsistency fault among battery cells that seriously shortens battery life and can even compromise vehicle safety. 4,5 Cell inconsistency is very harmful. In a series module, charging and discharging of the battery pack suffer from a short-board (weakest-link) effect, 6 that is, the cell with the smallest capacity determines the maximum usable capacity of the whole pack, so the remaining capacity of the other cells is wasted. Battery performance also declines as the battery ages. In every charge and discharge cycle, the cells with smaller capacity are always charged and discharged to the greatest extent, so their service life is seriously reduced, and they eventually cause them to be scrapped together with the other cells of the module. 7,8 Cells with inconsistent internal resistance generate different amounts of heat, which makes the temperature field around the battery pack more uneven. The uneven temperature field in turn aggravates the internal resistance inconsistency, forming a self-reinforcing feedback loop that keeps increasing the inconsistency. 9 It is therefore of great practical significance to diagnose inconsistency faults of power batteries and identify abnormal cells in real time to reduce the harm to vehicles.
Karden et al. 12 analyzed the impedance spectra of batteries. By measuring the impedance spectra of nine batteries, they concluded that the impedance inconsistency was caused by the manufacturing temperature and the cycling history. Can et al. 13 used factor analysis and fuzzy clustering to analyze, in the time domain, the acoustic emission signals collected from lithium-ion batteries. Based on the K-means algorithm, Lin et al. 14 identified lithium-ion battery pack inconsistency online, using the voltage outlier, voltage range, and voltage difference as characteristic variables. Eldesoky et al. 15 examined the impact of depth of discharge (DOD), C-rate, upper cut-off voltage (UCV), and temperature on cell lifetime; the results show that capacity loss increases slightly with DOD and C-rate, and that capacity retention and impedance are related to UCV. Apart from manufacturing causes, temperature inconsistency is the main factor behind inconsistency between cells. Temperature inconsistency faults of power batteries are also one of the main causes of electric vehicle fires. 16 Ouyang et al. 17 quantitatively selected EIS features that are insensitive to SoC, adopted an EIS-based internal temperature estimation method, and used support vector regression to estimate the temperature. Haimin et al. 18 argue that temperature seriously affects the capacity and service life of lithium-ion batteries: low temperatures lead to battery degradation, while high temperatures can trigger thermal runaway, and the most suitable operating temperature is 25-40°C. Wu et al. 19 established a hybrid battery thermal management system based on heat pipes, a microchannel liquid cooling plate, and phase change material, which effectively improved the uniformity of the temperature distribution and can thus reduce temperature inconsistency.
Gasper et al. 23 used machine-learning-assisted model identification to predict battery life, quantifying uncertainty by bootstrap resampling, which greatly reduced the uncertainty of the life prediction. Xia et al. 7 established a data-driven model for short-term capacity estimation and long-term remaining useful life prediction of lithium-ion batteries; the results revealed that the method could effectively cope with capacity regeneration and inconsistency. Zhao et al. 24 proposed a 3σ multi-level screening strategy to diagnose inconsistency in battery voltage data, which eliminates the influence of outliers in the data and finds the ideal center point. Cadar et al. 25 proposed using the voltage difference between battery cells to control voltage consistency and achieve fuzzy equalization. Kang et al. 26 proposed a method based on cross-voltage measurement and statistical analysis: the voltage of each cell is reflected on two voltmeters, and voltage sensor faults, connection faults, and short-circuit faults of the battery pack are analyzed using the correlation coefficient. Wang et al. 27 proposed combining Mahalanobis distance with density-based spatial clustering of applications with noise (DBSCAN) to comprehensively evaluate the multi-parameter inconsistency of the battery system; an adaptive least squares method is used to reduce fluctuations in model parameter identification, and the results show that the method is highly robust and suitable for practical application. Deng et al. 28 used a multi-class support vector machine to map the original data into a high-dimensional space and separated the data with a hyperplane in that space to determine whether a fault had occurred.
There is little research on online fault identification and early warning. Gao et al. 29 proposed a fault warning method for the electric vehicle charging process based on an adaptive deep belief network, combining the accelerated adaptive moment estimation algorithm, a deep belief network, and the Pearson coefficient; the method provides voltage fault warnings during electric vehicle charging. Yu et al. 30 proposed a fault diagnosis method based on an improved model with voltage as input and current as output to detect current sensor faults. The similarity of vectors is defined by an exponential function, which effectively measures the regularity of time series and is little affected by noise; however, the choice of the fuzzy function in this method is subjective. Hong et al. 31 proposed an entropy-based battery voltage anomaly detection method for electric vehicles, in which abnormal cells are identified by Shannon-entropy analysis of real-time voltage fluctuation data monitored online. Qiu et al. 32 used the difference between the theoretical maximum entropy of the random variable X and the actual Shannon entropy to measure inconsistency, but the rationality of the threshold selection must be considered when using this method for inconsistency fault diagnosis.
Because the current changes with large frequency and amplitude during startup and disconnection, it is difficult to extract fault features from the current, and identification accuracy is low. Many battery faults eventually lead to a temperature rise, but batteries usually have a heat dissipation system: the temperature change is small at the initial stage of a fault, and the temperature rise takes time. Temperature-based fault identification therefore responds slowly. An obvious feature of overcharge and overdischarge faults is that the cell voltage is above the upper cut-off voltage or below the lower cut-off voltage, respectively. An internal short-circuit fault reduces the voltage, and in the late stage of an internal short circuit the voltage may even drop to 0 V. When the internal temperature of the battery is too high, the separator melts and causes an internal short circuit, after which the battery voltage decreases. Voltage therefore responds to most battery faults.
In summary, current research on power battery cell inconsistency focuses mainly on identifying abnormal cells. Although these methods can identify abnormal cells to a certain extent (often misidentifying normal cells as well), few studies address early warning of abnormal cells, and most existing methods are complex. This paper therefore proposes a method for identifying abnormal power battery cells and issuing early warnings that combines the isolated forest (IF) algorithm with a sliding window (SW). To keep the model simple and efficient, voltage is selected as the main identification parameter. The method uses an SW to divide the time series data into multiple subdatasets, each corresponding to one period, and builds a separate IF diagnosis model for each subdataset that flows into the window. The threshold is determined by statistical analysis of the scores that the IF model assigns to normal and abnormal samples, which makes the sample data easier to interpret. The score of each window then determines whether a cell is abnormal at that time. In this way, abnormal cells can be identified and warned about when a battery becomes abnormal because of overcharge, overdischarge, internal short circuit, or other causes.
The rest of this article is organized as follows. Section 2 introduces the basic principles of the IF algorithm and the SW. Section 3 proposes an IF-based method for identifying abnormal power battery cells and issuing early warnings. Section 4 introduces the data used in this paper, determines the fault threshold, evaluates the anomaly recognition and early warning performance, and verifies the model. Section 5 presents the conclusions.

| RELATED PRINCIPLES
2.1 | IF algorithm
IF is a typical anomaly detection algorithm based on an ensemble learning strategy. 37 It constructs and fuses multiple subdetectors (isolation trees) to obtain better detection performance, with relatively low complexity and high accuracy. Its principle is to keep partitioning the dataset until every data point is isolated. In an IF, outliers are points that are easy to isolate: normal data must be partitioned many times before they are isolated, whereas outliers are separated after only a few cuts.
Whether a point is an outlier is judged by comparing the length of its path from the root node, at the moment it is isolated, with a reference value. The main steps of the IF algorithm are as follows.
Step 1: Randomly select multiple subsamples from the preprocessed dataset as the sampled dataset and place them in the root node.
Step 2: Randomly select a dimension and randomly generate a cut point p in that dimension, between the minimum and maximum values of the selected dimension in the current node.
Step 3: The cut divides the data into two parts: samples whose value in the selected dimension is less than p are placed in the left child node, and the remaining samples are placed in the right child node.
Step 4: Repeat Steps 2 and 3 on each child node until no new child nodes can be created (for example, when a node contains only one sample).
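Steps 1 to 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: samples are tuples of numbers, the height limit ceil(log2(subsample size)) follows the original isolation forest formulation rather than anything stated in this paper, and the helper names `build_itree`, `build_forest`, and `path_length` are hypothetical. The raw depth returned by `path_length` omits the c(n) correction that full implementations add at unsplit leaves.

```python
import math
import random

def build_itree(points, depth=0, max_depth=8, rng=random):
    """Steps 2-4: recursively split `points` (a list of equal-length tuples)."""
    # A node becomes a leaf when it holds at most one sample or hits the depth limit.
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))            # Step 2: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                   # identical values cannot be cut
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                      # Step 2: random cut point p
    left = [p for p in points if p[dim] < cut]     # Step 3: values below p go left
    right = [p for p in points if p[dim] >= cut]   # Step 3: the rest go right
    return {"dim": dim, "cut": cut,                # Step 4: recurse on both children
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}

def build_forest(data, n_trees=100, sample_size=256, rng=random):
    """Step 1: grow each tree from its own random subsample of the data."""
    size = min(sample_size, len(data))
    limit = max(1, math.ceil(math.log2(size)))     # height limit ceil(log2(size))
    return [build_itree(rng.sample(data, size), max_depth=limit, rng=rng)
            for _ in range(n_trees)]

def path_length(x, tree):
    """Number of edges from the root to the leaf that sample x falls into."""
    depth = 0
    while "cut" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["cut"] else tree["right"]
        depth += 1
    return depth
```

Averaging `path_length` over all trees of the forest gives the empirical E(h(x)) used in the anomaly score; an isolated low voltage typically ends up much closer to the root than a voltage inside the normal cluster.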
The algorithm uses an anomaly score to determine whether a data point is an outlier. The anomaly score is defined as

$$S(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \tag{1}$$

where n is the number of samples, x is the sample point, and E(h(x)) is the expected path length of x. The average path length c(n) is

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n} \tag{2}$$

Here, H(i) is the harmonic number, approximated as H(i) ≈ ln(i) + ξ, where ξ ≈ 0.5772 is the Euler constant.
In Equation (1), S ranges from 0 to 1; the closer S is to 1, the more likely the data point is abnormal. When S is well below 0.5, the data point can be regarded as normal. When S is about 0.5, the data show no distinct abnormal characteristics and remain to be observed.
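Equations (1) and (2) can be evaluated directly once the mean path length E(h(x)) has been obtained from the trees. In this minimal sketch the special cases c(n) = 0 for n ≤ 1 and c(n) = 1 for n = 2 follow common implementations (for example scikit-learn) rather than this paper; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant xi in H(i) ~ ln(i) + xi

def c(n):
    """Equation (2): average path length used to normalise h(x) for n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # special case used by common implementations
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(mean_path, n):
    """Equation (1): S(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path / c(n))
```

For n = 256, c(n) ≈ 10.24, so a point whose mean path length equals c(n) scores exactly 0.5, and the fault threshold S = 0.75 adopted in Section 4 corresponds to a mean path length of about 4.25 splits.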
In practical application, the IF model involves several important parameters, listed in Table 1. Zhao et al. 24 point out that the accuracy of IF approaches convergence when the number of trees (n_estimators) is 100 and the number of samples per tree (max_samples) is 256. The other two parameters are optimized here by grid search, which finds parameters exhaustively: a range for each parameter is set in advance from experience, every combination within the selected ranges is tested, and the best-performing combination is taken as the final result. The model performed best when the number of features per tree (max_features) and the contamination were set to 10 and 0.05, respectively.
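The grid search described above can be sketched generically. The parameter names mirror scikit-learn's `IsolationForest` constructor arguments (`max_features`, `contamination`); the candidate ranges in `GRID` and the `evaluate` callback (which might, for instance, return F1 on labelled alarm vehicles) are assumptions, since the paper states neither the ranges nor the selection criterion.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best-scoring one.

    evaluate: callable mapping a parameter dict to a score (higher is better).
    grid: dict mapping each parameter name to its list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:           # keep the first best on ties
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate ranges; the paper reports only the selected values.
GRID = {"max_features": [5, 10, 15], "contamination": [0.01, 0.05, 0.1]}
```

The cost grows with the product of the range sizes, which is why the ranges must be narrowed from experience beforehand.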

2.2 | Sliding windows
SW is a data structure and algorithmic problem-solving technique suited to arrays and lists. It can convert a nested-loop problem into a single-loop problem, reducing time complexity, and it can be used to access online information: as the window slides, old data are removed and new data are added. 38 This reduces the complexity of the problem and also enables online detection of data. In general, a data sequence of length n is represented as $X = (y(t_1), y(t_2), \ldots, y(t_n))$, where y(t) is the data collected at time t.
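As a generic illustration of the technique (not code from the paper), a running mean over the last w samples shows how a sliding window turns a nested loop into a single pass: each step adds the newest sample and evicts the oldest, so the work per step is constant instead of proportional to w. The mean is an arbitrary illustrative choice of window statistic.

```python
from collections import deque

def rolling_means(stream, w):
    """Yield the mean of the latest w values, updating in O(1) per step."""
    if w <= 0:
        raise ValueError("w must be positive")
    window = deque()
    total = 0.0
    for y in stream:
        window.append(y)                 # new data enter the window
        total += y
        if len(window) > w:
            total -= window.popleft()    # old data leave the window
        if len(window) == w:
            yield total / w
```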
In data segmentation, the parameters (w, s) must be set in advance, where w is the window length and s is the step by which the window moves after each segmentation. Sliding the window ⌊(n − w)/s⌋ times produces ⌊(n − w)/s⌋ + 1 equal-length subsequences. The SW principle is shown in Figure 1.
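A minimal sketch of (w, s) segmentation, assuming the series is an ordinary Python list and that trailing samples which do not fill a whole window are discarded (the paper does not say how the remainder is handled):

```python
def segment(series, w, s):
    """Split a series into windows of length w, moving s samples each time.

    Produces floor((n - w) / s) + 1 windows when n >= w, otherwise none.
    """
    if w <= 0 or s <= 0:
        raise ValueError("w and s must be positive")
    n = len(series)
    return [series[start:start + w] for start in range(0, n - w + 1, s)]
```

For example, 100 records with w = s = 20 give ⌊(100 − 20)/20⌋ + 1 = 5 non-overlapping windows; choosing s < w makes consecutive windows overlap.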

| INCONSISTENT CELL IDENTIFICATION AND WARNING METHOD FOR POWER BATTERIES
The IF exploits two quantitative characteristics of abnormal data. First, abnormal data make up only a small fraction of the points. Second, some of their attribute values differ markedly from those of normal data, so they are easier to isolate than normal points. 39 When an inconsistency fault occurs, the voltage of the faulty cell differs from that of normal cells, and the number of abnormal cells in a battery pack is usually small, so the data meet the requirements of the IF algorithm. The low complexity and high efficiency of the IF algorithm can therefore be exploited to detect abnormal cells.
This paper proposes an abnormal cell identification and early warning method based on the SW and IF algorithm for the cell voltage data of a battery pack. The IF diagnosis model is obtained by training on historical data, and an SW is created to segment the dataset. Each time the window slides, the model is triggered to recompute the data in the window, and after each computation new data are added to the model, so the diagnostic model is continuously updated. The method consists of the following steps; the flowchart is shown in Figure 2.
1. Determine the threshold. Using the historical voltage data of power batteries as the dataset, an IF diagnosis model is built, and the scores of normal and abnormal samples are calculated and compared statistically to obtain the score interval of each sample type. The anomaly threshold is then set at the boundary between these intervals.

| MODEL VERIFICATION
4.1 | Data introduction
According to the standard GB/T 32960 "Technical specifications of remote service and management system for electric vehicles," new energy vehicle companies must upload vehicle data to a big data internet of vehicles monitoring platform, including fuel cell data, drive motor data, vehicle data, extreme value data, alarm data, and vehicle location data. While a vehicle is powered on, its onboard sensors continuously collect data, so the running status and other data of online vehicles can be monitored in real time. The platform not only accumulates a large amount of historical data, which provides a training basis for inconsistent cell detection algorithms, but also receives real-time data, which provides the basis for online early warning of inconsistent cells.
In this paper, vehicle data provided by Jiangling Motors Co. Ltd are taken as the research object for identifying and warning about abnormal battery cells. Data from multiple new energy vehicles of the same type are analyzed, with data received once every 20 s. The original data contain 89 fields, including the cell voltages, total voltage, total current, and other data relevant to this paper. The basic information of the data packet is shown in Table 2.
Figure 3 shows the cell voltage curves of an alarm vehicle during charging over 1 month: Figure 3A shows the cell voltages of the alarm vehicle, and Figure 3B is a partial charging data segment taken from Figure 3A. Figure 3 shows that the voltage of every cell is stable during the early charging sessions of the alarm vehicle, with no outlier cell. In later charging sessions, however, the voltage of cell No. 55 becomes an obvious outlier, and the deviation grows as charging time increases, indicating that this cell has an inconsistency problem. Figure 4 shows that the cell voltages of a non-alarm vehicle remain similar over multiple charging sessions, with no obvious abnormality. In summary, a cell of the alarm vehicle exhibits voltage outliers, and the voltage differences between cells reflect the degree of inconsistency to a certain extent, so normal and abnormal cells can be distinguished by the algorithm.

4.2 | Threshold determination
The IF is an unsupervised anomaly detection algorithm with no clear boundary between normal values and outliers, so a threshold must be set to make the diagnosis clear. In this paper, the threshold is determined by statistical analysis of the score intervals of normal and abnormal samples. Twenty-seven vehicles, each with a mileage of more than 10,000 km, were randomly selected as non-alarm vehicles from more than 1000 normal vehicles, and 27 vehicles, also with more than 10,000 km, were randomly selected as alarm vehicles from more than 100 vehicles with inconsistency fault alarms. To ensure that the extracted samples are representative, the random extraction proceeds as follows: the vehicles are divided into three groups by data volume, each group is subdivided into three groups by the number of charging sessions (nine groups in total), and three vehicles are randomly selected from each group, giving 27 vehicles. The battery voltage data are extracted from the vehicle data for threshold determination. Table 3 lists the battery-related parameters. The cell voltage data of both types of vehicles are input into the IF model for score calculation, and the statistical results of the scores are shown in Figure 5. The figure shows a demarcation at a score of 0.75; the scores of three vehicles are doubtful but within the acceptable range. The threshold on the IF score is therefore set to T = 0.75 in this paper: any cell whose score exceeds the threshold is identified as abnormal, and the higher the score, the more pronounced the anomaly.
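The 3 × 3 stratified draw described above can be sketched as follows. The record keys `n_records` (data volume) and `n_charges` (number of charging sessions) are hypothetical, and using equal-sized tertiles as group boundaries is an assumption; the paper does not state how the groups were delimited.

```python
import random

def stratified_pick(vehicles, rng, per_stratum=3):
    """Pick per_stratum vehicles from each of 3 x 3 strata (9 groups)."""
    def tertiles(items, key):
        # Sort by the key and cut into three groups of (nearly) equal size.
        ranked = sorted(items, key=lambda v: v[key])
        k = len(ranked)
        return [ranked[i * k // 3:(i + 1) * k // 3] for i in range(3)]

    picked = []
    for volume_group in tertiles(vehicles, "n_records"):
        for charge_group in tertiles(volume_group, "n_charges"):
            picked.extend(rng.sample(charge_group, per_stratum))
    return picked
```

With the default of three vehicles per stratum, the draw yields 9 × 3 = 27 vehicles, matching the sample size used for each vehicle type.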

4.3 | Abnormal battery cell identification
After the threshold is determined, the change in a cell's score can be used to judge whether the cell is abnormal. Alarm vehicle A is selected as a case study, and its charging voltage data for the month of the alarm are extracted; the voltages at some of the alarm times are shown in Figure 6. First, the data are preprocessed to eliminate invalid values, and the charging voltage data before the alarm are selected as the model input. Second, the data are divided by an SW of size (w, s), with both w and s preset to 20, where w is the window size of one calculation and s is the step by which the data advance. After each calculation, the window moves forward by s to form a new window, until all data have been calculated. The method is compared with the original IF algorithm, and the comparison results are shown in Figure 7.
Both methods identify cell No. 32, while the SW IF additionally identifies cell No. 98, which the original algorithm misses.
Figure 6 shows that cells No. 32 and No. 98 exhibit obvious outliers in the later charging process. The voltage of cell No. 32 stays lower than the other voltages for a long time; this kind of anomaly is obvious, and both methods identify it. The voltage of cell No. 98, however, fluctuates abnormally only at a certain time. Because the IF algorithm is not sensitive to local data, it cannot detect this subtle change when applied to the whole dataset. After the SW is introduced, the data are subdivided and the algorithm's ability to analyze local data is strengthened, so the SW IF method correctly identifies cell No. 98.
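The identification procedure in this section (remove invalid values, cut the pre-alarm data into windows with w = s = 20, score every window, compare with T = 0.75) can be sketched as below. `score_window` is a hypothetical hook standing in for an IF fitted on one window that returns one score per cell; the voltage plausibility bounds and the rule "flag a cell if its score exceeds T in any window" are assumptions made for illustration.

```python
T = 0.75  # fault threshold from the threshold-determination step

def identify_abnormal_cells(voltages, score_window, w=20, s=20, threshold=T,
                            v_min=0.0, v_max=5.0):
    """Return the (1-based) numbers of cells flagged in at least one window.

    voltages: list of frames, one per 20 s record, each a list of cell voltages.
    score_window: hypothetical callable, window (list of frames) -> one score
                  per cell, for example an isolation forest fitted on that window.
    v_min, v_max: assumed plausibility bounds used to drop invalid frames.
    """
    # Preprocessing: drop frames containing missing or implausible readings.
    clean = [f for f in voltages
             if all(v is not None and v_min < v < v_max for v in f)]
    flagged = set()
    for start in range(0, len(clean) - w + 1, s):
        scores = score_window(clean[start:start + w])
        flagged.update(i + 1 for i, sc in enumerate(scores) if sc > threshold)
    return sorted(flagged)
```

Because every window is scored on its own, a cell that deviates only briefly can dominate the window in which it deviates, which is the effect the comparison with the plain IF illustrates.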
Recall and precision are the most widely used measures in information retrieval. 40 To further verify the effectiveness of the proposed method, three indexes, Precision, Recall, and F-measure, are used for evaluation, and the method is compared with the local outlier factor (LOF) anomaly detection algorithm proposed in the literature. Precision is the proportion of cells flagged by the algorithm that are truly abnormal, and Recall is the proportion of truly inconsistent cells that the algorithm identifies. The F-measure combines the two and is used to evaluate the overall quality of the method:

$$\text{Precision} = \frac{T_P}{T_P + F_P} \tag{3}$$

$$\text{Recall} = \frac{T_P}{T_P + F_N} \tag{4}$$

$$F_\beta = \frac{(1+\beta^2)\cdot\text{Precision}\cdot\text{Recall}}{\beta^2\cdot\text{Precision} + \text{Recall}} \tag{5}$$

In these formulas, abnormal cells are the positive class: $T_P$ is the number of abnormal cells predicted as abnormal, $F_P$ is the number of normal cells predicted as abnormal, and $F_N$ is the number of abnormal cells predicted as normal. β adjusts the relative weight of Precision and Recall. In this paper they are equally important, so β = 1, and the F-measure becomes $F_1$.
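Equations (3) to (5) translate directly into code when cells are identified by number. This sketch treats abnormal cells as the positive class and returns 0.0 when a ratio is undefined, a convention chosen here rather than one stated in the paper.

```python
def precision_recall_f(true_abnormal, predicted_abnormal, beta=1.0):
    """Equations (3)-(5) with abnormal cells as the positive class."""
    true_abnormal, predicted_abnormal = set(true_abnormal), set(predicted_abnormal)
    tp = len(true_abnormal & predicted_abnormal)   # abnormal, flagged abnormal
    fp = len(predicted_abnormal - true_abnormal)   # normal, flagged abnormal
    fn = len(true_abnormal - predicted_abnormal)   # abnormal, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f
```

For instance, if cells No. 32 and No. 98 are the abnormal ones and a method flags only cell No. 32, it scores precision 1.0, recall 0.5, and F1 ≈ 0.67; flagging both gives 1.0 on all three.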
Based on the battery data of the 27 vehicles with inconsistency faults, the overall indicators for all vehicles are calculated, and Table 4 lists the results. As Table 4 shows, both the precision and the recall of IF with SW are high. Although the precision of the original IF is 1, its recall is too low, so it fails to identify all abnormal cells. The LOF algorithm is more sensitive to local data than the IF algorithm, 39,41 so when combined with an SW it mistakes some normal cells for abnormal ones, which lowers its precision. Considering the F1 scores of all methods together, SW-IF scores highest. The proposed method is therefore better suited to identifying abnormal power battery cells.

4.4 | Abnormal warning
Based on the anomaly recognition model, real-time warning for vehicles is implemented. Specifically, the charging voltage data of an alarm vehicle around the time of its alarm fault are selected as the input of the early warning model. The SW segments the dataset, and each window is recorded as one time point, so each slide of the window generates a new time point with its own score result; this continues until all data have been calculated. The result is a trend chart of each cell's score over time, and early warning is achieved by comparing a cell's score trend with the fault threshold.
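Turning a score trend into a warning time can be sketched as below. The following are assumptions not stated in the paper: window k covers samples k·s to k·s + w − 1 of the 20 s record stream, its time point is the last sample in the window, and the platform alarm is given as a sample index. The function names are hypothetical.

```python
SAMPLE_PERIOD_S = 20  # platform reporting interval stated in Section 4.1

def first_warning_index(scores, threshold=0.75):
    """Index of the first window whose score exceeds the threshold, or None."""
    for k, score in enumerate(scores):
        if score > threshold:
            return k
    return None

def lead_time_hours(scores, alarm_sample, w, s, threshold=0.75):
    """Hours by which the algorithm's warning precedes the platform alarm.

    scores: one cell's score per window, in window order.
    alarm_sample: index of the record at which the platform raised its alarm.
    """
    k = first_warning_index(scores, threshold)
    if k is None:
        return None                      # no warning was issued
    warning_sample = k * s + w - 1       # last record covered by window k
    return (alarm_sample - warning_sample) * SAMPLE_PERIOD_S / 3600
```

A positive result means the warning came before the alarm; None corresponds to the "no warning" outcome discussed for overly large windows.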
To verify the influence of the window size on the early warning model, voltage data containing the alarm segments of four alarm vehicles were selected, and the early warning time was calculated for multiple window sizes to find the optimal window, as shown in Figure 8. The figure shows that the warning effect is best when the window size is about 15, whereas an overly large window can produce no warning at all. A large window contains a lot of data, and the model evaluates normal and abnormal data together, so it may fail to fully identify the abnormal cell and reflect the warning time. A window that is too small may not give the best effect either, and the calculation takes too long, reducing efficiency. It is therefore appropriate to set the SW size to about 15.
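The window-size comparison can be expressed as a small selection rule. The rule below (largest mean lead time across vehicles, with any "no warning" result disqualifying a size) is an assumption that captures the qualitative argument above; it ignores computation time, which the paper also weighs for small windows.

```python
def best_window(lead_times_by_w):
    """Return the window size with the largest mean lead time, or None.

    lead_times_by_w: dict mapping window size -> list of lead times in hours,
    one per alarm vehicle; None marks a vehicle that received no warning.
    """
    candidates = {w: sum(t) / len(t) for w, t in lead_times_by_w.items()
                  if t and all(x is not None for x in t)}
    return max(candidates, key=candidates.get) if candidates else None
```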
When the SW size is 15, the scores of all battery cells of the four alarm vehicles over time are shown in Figure 9. The proposed method can recognize the voltage outlier signal before an obvious outlier appears and warn of the abnormal cell, so it can effectively identify and warn of abnormal power battery cells. Although the original IF algorithm has a high precision rate, its recall rate is too low. The proposed algorithm effectively avoids both the misidentification of normal cells and the failure to identify abnormal cells. With an SW size of 15, the fault early warning effect is best, and the algorithm can warn of battery inconsistency faults before the vehicle alarm occurs.
In practical application, although other faults are difficult to identify from the current, short-circuit faults can be identified from it simply and rapidly. To keep the algorithm running efficiently, the current sensor and temperature sensor data can first be checked to confirm that their values are within the normal range. The proposed algorithm can then use the voltage parameters to identify faults of the battery as a whole, making fault identification faster and more accurate.

TABLE 1 Parameter setting of isolated forest model.

2. Create an SW. The ordered power battery voltage data are divided into multiple subdatasets along the time series, and each subdataset flows into the IF diagnostic model as a separate data segment. After each calculation, the old data are removed and new data flow into the model, until all data have been calculated.
3. Identify abnormal cells and issue early warnings. The IF algorithm calculates the data in each window in turn, giving the score of each cell in each window; each window is denoted as one time point. After all data have been calculated, the score of each cell is compared with the fault threshold, and at each time point a cell whose score exceeds the threshold is diagnosed as abnormal. If the score of a cell increases over time and exceeds the threshold, the time at which it exceeds the threshold can be compared with the actual vehicle alarm time to evaluate the warning effect.

TABLE 2 Basic packet information.
Total number of single cells: total number of cells in the battery pack, 108.
Alarm status: 19 alarm items, such as the poor cell consistency alarm, each divided into normal and alarm states; values alarm/non-alarm.
Recorded time: June 21-28, 2022.

FIGURE 3 Cell voltage distribution curve of the alarm vehicle. (A) Voltage curve of cells of the alarm vehicle, (B) data segment of the alarm vehicle.

FIGURE 4 Cell voltage distribution curve of the non-alarm vehicle. (A) Voltage curve of cells of the non-alarm vehicle, (B) data segment of the non-alarm vehicle.
TABLE 3 Battery-related parameters.
FIGURE 5 Threshold statistical diagram of the anomaly recognition model.

FIGURE 6 Vehicle A cell voltage diagram.
FIGURE 7 Vehicle A anomaly recognition results. (A) Isolated forest identification results, (B) sliding window isolated forest recognition results.
In Figure 9, the vertical dotted line marks the time of the platform alarm, and the horizontal dotted line marks the algorithm's warning point. Figure 9 shows that the scores of cell No. 3 in Vehicle A and cell No. 32 in Vehicle B always exceed the threshold, indicating that these two cells are always abnormal, whereas the scores of cell No. 65 in Vehicle C and cell No. 64 in Vehicle D rise from the normal range into the abnormal range. When a cell's score exceeds the fault threshold, the algorithm issues an early warning. Compared with the vehicle alarm times, the algorithm's warnings come 3.62, 0.2, 1.65, and 2.25 h earlier, respectively.

TABLE 4 Comparison of performance indexes of different methods.
FIGURE 8 Impact of different window sizes on alert time.

| CONCLUSION
To meet the need for battery inconsistency fault diagnosis, this paper proposes an improved IF algorithm for fault diagnosis and early warning of power batteries. The algorithm divides the vehicle data with an SW and builds a separate IF diagnosis model for each subdataset that flows into the window, which remedies the low recall of the ordinary IF algorithm. Analysis of the platform data and model results leads to the following conclusions. The IF fault diagnosis algorithm provides no clear boundary between outliers and normal values; by comparing the score intervals of the samples in the two cases, the threshold is determined to be 0.75, and analyzing data against this threshold makes the diagnosis results more intuitive and accurate. Compared with the LOF anomaly detection model, the anomaly recognition model based on IF and SW achieves higher precision and recall. Because the LOF algorithm is overly sensitive, combining it with an SW causes normal cells to be misidentified and reduces precision. The original IF algorithm does not segment the data.

FIGURE 9 Early warning effect of four vehicles at a sliding window size of 15. (A) Vehicle A, (B) Vehicle B, (C) Vehicle C, (D) Vehicle D.