SOH estimation and RUL prediction of lithium batteries based on multidomain feature fusion and CatBoost model

In this paper, a lithium-ion battery State of Health (SOH) estimation algorithm based on multidomain feature fusion and a CatBoost model is proposed to address the low prediction accuracy caused by single-feature extraction techniques. The algorithm extracts time-domain, frequency-domain, entropy, and time-series features from the original charge–discharge data. After evaluating feature importance, a feature selection step eliminates redundant features that contribute little to the predictive results. Subsequently, multiset discriminant correlation analysis is employed to fuse the high-dimensional features. To attain accurate predictions, the CatBoost model is further optimized with the sparrow search algorithm. Experimental results demonstrate that the proposed algorithm achieves accurate SOH estimation within individual batteries, with mean square error consistently below 4e−4 and goodness of fit of at least 0.98. Additionally, the algorithm exhibits reliable prediction capability across different batteries operating under the same charge/discharge strategy. Comparative analysis indicates that multidomain feature fusion yields higher prediction accuracy than single feature extraction methods.


| INTRODUCTION
Lithium-ion batteries, among the most widely used energy storage devices, are extensively employed in aerospace, new energy vehicles, and portable electronic devices. A battery gradually ages, and may even fail, over time due to its internal chemistry and the external environment. [1][2][3][4][5][6][7][8] Timely prediction of lithium-ion battery health contributes to battery planning and management, thus reducing the probability of accidents and ensuring safe and reliable operation of devices. Accordingly, accurate prediction of the State of Health (SOH) and the remaining useful life (RUL) of Li-ion batteries takes on critical significance in the field of Li-ion battery usage. [9][10][11][12] Theories and technologies in artificial intelligence, machine learning, and data mining have progressively matured over the past few years, and data-driven battery SOH prediction has attracted increasing attention. [13][14][15][16] Researchers have developed mapping relationships between battery SOH and RUL and extracted features using advanced machine learning methods to achieve the prediction task. In a recent study concerning SOH estimation of Li-ion batteries, a novel health factor (HF) was introduced by Yue et al., who employed the instantaneous voltage-drop amplitude of the initial segment of discharge under constant-current conditions. To mitigate the impact of noise, the HF data were reconstructed using multiorder Bezier curves, and an empirical degradation model was constructed to account for the influence of noise contamination on the new HF data. 17 In a separate study, Zhou et al. proposed a data-driven model utilizing a temporal convolutional network (TCN) to establish a mapping relationship between the battery charging curve and the SOH.
TCN, a neural network comprising multiple layers of causal convolutions, was utilized to encode the sequence of sampling points from the battery charging curve, and the resulting encoding facilitated the establishment of a mapping relationship with the SOH. Experimental findings revealed that the proposed TCN-based SOH estimation model exhibited high accuracy and demonstrated good adaptability across different battery types. 18 Wen proposed a model based on incremental capacity (IC) analysis and a BP neural network to predict battery SOH at different ambient temperatures. By analyzing the correlation between IC curve characteristics and SOH, the mapping relationship between temperature and IC curve characteristics was established by the least-squares method, and an SOH prediction model at different temperatures was obtained. 19 Wang et al. proposed a Gaussian-regression-based method combined with principal component analysis for processing the charging curve to obtain indirect health features; the battery aging model of their study achieved high accuracy. 20 Pan et al. proposed constructing health indicators (HI) characterizing battery decline under dynamic operating conditions and then introduced the extreme learning machine (ELM) to train a degradation model offline over the whole life cycle of the battery for online SOH estimation, which is more robust than conventional estimation methods. 21 Song et al. proposed an SOH estimation method for lithium-ion batteries based on the XGBoost algorithm to increase the accuracy of SOH estimation; they extracted the average voltage, voltage difference, and temperature difference as features to describe the aging process of batteries. 22 For data-driven battery SOH prediction, typical features should be extracted from the capacity degradation data of the battery, and a mapping relationship should be developed between them and the health status.
If a single feature is employed for battery SOH and RUL prediction, the prediction accuracy will be low, since one feature cannot carry sufficient information. Thus, multiple features are often extracted from the battery data. Zhengyu et al. employed the ratio of the current capacity to the nominal capacity of a Li-ion battery as the HF, but useful information is easily ignored during training with such a single factor, reducing generalization performance. 23 In the research of Ji Wu, the voltage during the charging process of Li-ion batteries served as an indirect HF, and a model for predicting the RUL of Li-ion batteries was built based on an artificial neural network (ANN). Nevertheless, few features were extracted, and the generalization ability of a model with so few features in actual prediction remains to be verified. 24 Yi et al. proposed an algorithm combining ensemble empirical mode decomposition (EEMD) and a gated recurrent unit (GRU) network to predict RUL. This algorithm is capable of lithium-ion battery RUL prediction, but the correlation between factors is not considered in the selection of HFs, so the prediction suffers from slow speed and low accuracy. 25 Feature fusion seeks to integrate the features extracted from the original data into a single feature that is more discriminative than the input features. Combining features at different scales has been identified as a crucial technique for enhancing model performance in many studies. Presently, the primary methods of feature fusion comprise the convolutional neural network (CNN), canonical correlation analysis (CCA), and discriminant correlation analysis (DCA). In a CNN, feature extraction from low level to high level is achieved by deepening the layers. Low-level features possess higher resolution and contain more location and detail information, but they are also noisier, as they have been subject to less convolution.
The approach to feature fusion utilizing CCA leverages the correlation between two input feature sets to compute two transformed features that exhibit a higher correlation than the input sets. However, CCA suffers from a notable limitation: it disregards the interclass relationships within the data set, resulting in poor separation of classes within each feature group. To address this drawback, DCA has been introduced. DCA aims to maximize the correlation between two features and their corresponding class-related features while simultaneously maximizing the differences among different classes. In scenarios involving more than two feature sets, multiset discriminant correlation analysis (MDCA) can provide more comprehensive decision-making information regarding the input feature data, while also ensuring that features pertaining to different classes are decorrelated. Notably, this method is computationally efficient and does not encounter the small-sample problem faced by other feature fusion algorithms such as CCA.
Neural networks and ensemble learning algorithms are the prominent algorithms employed in data-driven SOH and RUL prediction for lithium-ion batteries. Among these, the Gradient Boosting Decision Tree (GBDT) stands out as a category of ensemble learning algorithms. GBDT reduces overall error by mitigating bias, requires minimal tuning, exhibits enhanced robustness, and finds extensive application in diverse domains such as transportation, healthcare, finance, and lithium battery lifetime prediction. In 2017, the Russian company Yandex introduced CatBoost, a machine learning library founded upon the GBDT framework. Compared with other GBDT algorithms such as XGBoost and LightGBM, CatBoost demonstrates improvements across various dimensions. It addresses the issue of gradient bias during iterations through the ordering principle, the ordered boosting algorithm, and a greedy strategy. This approach effectively reduces the probability of model overfitting and accelerates model execution. The performance of CatBoost heavily relies on selecting an appropriate set of hyperparameters. Currently, many ensemble learning models employ the grid search method for hyperparameter exploration. However, this approach entails traversing a large parameter set, leading to an inefficient search that becomes intractable in high-dimensional parameter spaces. Thus, the adoption of optimization algorithms becomes necessary for effective hyperparameter search.
The Sparrow Search Algorithm (SSA) is an emerging metaheuristic algorithm that was proposed in 2020 and is characterized by its robust global and local search capabilities. Its effectiveness has been demonstrated in diverse domains, including but not limited to vehicle lane change decision, 26 engine parameter prediction, 27 data feature selection, and multiobjective optimization. Given the remarkable performance of the SSA algorithm in various applications, this paper elects to employ it in optimizing the CatBoost model, thereby circumventing challenges associated with dimensionality explosion arising from grid search during the training process.
To address the problem of low accuracy of single-feature prediction, a multidomain feature extraction method is proposed in this study, that is, extracting time-domain, frequency-domain, entropy, and time-series features from the battery data, and then screening and fusing the above features as the input of the subsequent prediction model. Moreover, an SSA-CatBoost model is proposed in this study for lithium battery SOH estimation and RUL prediction to further increase the prediction accuracy.

| MULTIDOMAIN FEATURE FUSION ALGORITHM FLOW FOR SOH PREDICTION OF LI-ION BATTERIES
For the SOH prediction of Li-ion batteries, a multidomain feature fusion algorithm flow is built in this study. The process is shown in Figure 1.
For SOH prediction of Li-ion batteries, a multidomain feature fusion algorithm is built in this study. First, multidomain features (i.e., time-domain, frequency-domain, entropy, and time-series features) are extracted from the collected battery signals to form a high-dimensional feature set. Second, to reduce the computational burden of the model and the possibility of overfitting, the Gini coefficient is used to calculate the importance of each feature, and a suitable screening threshold is set to eliminate redundant features and obtain a low-dimensional feature set. Next, MDCA is used to fuse the features between different domains. Lastly, the optimal feature set is obtained as the feature input of the prediction model for battery SOH prediction.
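As a concrete illustration of the screening step, the sketch below ranks features with the impurity-based importance of a random forest (a common stand-in for the Gini-coefficient importance described above) and drops those below a threshold. The data, the threshold, and the model settings are illustrative assumptions, not values from this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def screen_features(F, y, threshold=0.05, random_state=0):
    """Rank features by impurity-based importance and keep those
    whose normalized importance reaches `threshold`."""
    rf = RandomForestRegressor(n_estimators=200, random_state=random_state)
    rf.fit(F, y)
    importance = rf.feature_importances_          # normalized to sum to 1
    keep = np.flatnonzero(importance >= threshold)
    return F[:, keep], keep, importance

# Toy demonstration: the target depends on columns 0 and 1 only,
# so the remaining columns should be screened out as redundant.
rng = np.random.default_rng(0)
F = rng.normal(size=(300, 6))
y = 2.0 * F[:, 0] - 1.5 * F[:, 1] + 0.01 * rng.normal(size=300)
F_low, keep, imp = screen_features(F, y)
```

The retained low-dimensional set would then be passed to the MDCA fusion stage.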

| ALGORITHM INTRODUCTION
3.1 | MDCA

Information fusion can occur at different levels (e.g., the feature level, the match-score level, or the decision level). However, feature-level fusion is considered more effective, since a feature set contains richer information regarding the input data than the matching scores or the output decisions of a classifier. 27,28 MDCA is a statistical analysis method used to assess the correlation between multiple sets of variables. It maximizes the differences between categories through linear combinations of variables, making it suitable for exploring the differences and similarities between different groups in a data set.
Taking two data sets with different domains as an example, the specific steps of MDCA are as follows.
It is assumed that there are two data sets $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$, whose $n$ columns are divided into $c$ classes. For the data set $X$, let $x_{ij} \in X$ denote the feature vector corresponding to the $j$th sample in the $i$th class. On that basis, the between-class scatter matrix is defined as

$$
S_{bx} = \sum_{i=1}^{c} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{T} = \Phi_{bx}\Phi_{bx}^{T},
\qquad
\Phi_{bx} = \left[\sqrt{n_1}\,(\bar{x}_1 - \bar{x}), \ldots, \sqrt{n_c}\,(\bar{x}_c - \bar{x})\right],
\tag{1}
$$

where $\bar{x}_i$ and $\bar{x}$ denote the mean of $x_{ij}$ in the $i$th class and in the full feature set, respectively, and $n_i$ is the number of samples in class $i$. The diagonalizing transformation matrix is developed from

$$
P^{T}\left(\Phi_{bx}^{T}\Phi_{bx}\right)P = \Lambda,
\tag{2}
$$

where $P$ denotes the orthogonal eigenvector matrix and $\Lambda$ represents the diagonal matrix of real, nonnegative eigenvalues in decreasing order. Let $Q\,(c \times r)$ consist of the first $r$ eigenvectors, corresponding to the $r$ largest nonzero eigenvalues. The $r$ most significant directions of $S_{bx}$ are obtained through the mapping

$$
W_{bx} = \Phi_{bx}\,Q\,\Lambda_r^{-1/2},
\tag{3}
$$

where $\Lambda_r^{-1/2}$ expresses the transformation that reduces the dimensionality of the matrix $X$ from $p$ to $r$:

$$
X' = W_{bx}^{T}X,
\tag{4}
$$

where $X'$ represents the projection of the matrix $X$ into the space in which its between-class scatter matrix is the identity $I$. Likewise, the data set $Y$ is processed to develop a transformation matrix $W_{by}$, yielding $Y' = W_{by}^{T}Y$. The between-set covariance matrix $S'_{xy} = X'Y'^{T}\,(r \times r)$ is then diagonalized by singular value decomposition,

$$
S'_{xy} = U\,\varepsilon\,V^{T},
\tag{5}
$$

where $\varepsilon$ represents a diagonal matrix with nonzero diagonal elements. Setting $W_{cx} = U\varepsilon^{-1/2}$ and $W_{cy} = V\varepsilon^{-1/2}$, the following transformations are conducted:

$$
X^{*} = W_{cx}^{T}X' = \left(W_{bx}W_{cx}\right)^{T}X = W_{x}X,
\qquad
Y^{*} = W_{cy}^{T}Y' = \left(W_{by}W_{cy}\right)^{T}Y = W_{y}Y,
\tag{6}
$$

where $W_x$ and $W_y$ denote the final transformation matrices of $X$ and $Y$, respectively.
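For the two-set case, the steps above can be sketched with NumPy as follows. The toy dimensions and function names are illustrative, and extending to more than two sets (MDCA) would apply the same construction pairwise.

```python
import numpy as np

def _between_class_projection(X, labels, r):
    """Build W_bx so that X' = W_bx.T @ X has an identity between-class
    scatter matrix. X is (p, n) with samples in columns."""
    classes = np.unique(labels)
    xbar = X.mean(axis=1)
    # Columns of Phi: sqrt(n_i) * (class mean - overall mean)
    Phi = np.column_stack([
        np.sqrt(np.sum(labels == c)) * (X[:, labels == c].mean(axis=1) - xbar)
        for c in classes
    ])                                            # (p, c)
    lam, P = np.linalg.eigh(Phi.T @ Phi)          # (c, c), ascending order
    top = np.argsort(lam)[::-1][:r]               # r largest eigenvalues
    Q, lam_r = P[:, top], lam[top]
    return Phi @ Q @ np.diag(lam_r ** -0.5)       # (p, r)

def dca_fuse(X, Y, labels, r):
    """Two-set discriminant correlation analysis; returns the fused
    (2r, n) feature matrix stacking X* over Y*."""
    Wbx = _between_class_projection(X, labels, r)
    Wby = _between_class_projection(Y, labels, r)
    Xp, Yp = Wbx.T @ X, Wby.T @ Y                 # projections X', Y'
    U, eps, Vt = np.linalg.svd(Xp @ Yp.T)         # S'_xy = U eps V^T
    Wcx = U @ np.diag(eps ** -0.5)
    Wcy = Vt.T @ np.diag(eps ** -0.5)
    Xs, Ys = Wcx.T @ Xp, Wcy.T @ Yp               # X*, Y*
    return np.concatenate([Xs, Ys], axis=0)
```

Note that $r$ must not exceed the number of nonzero eigenvalues of the between-class scatter, which is bounded by the number of classes.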

| Multiscale permutation entropy
Permutation entropy (PE) refers to an average entropy function to measure the complexity of time series and detect kinetic mutations, with a simple algorithm and fast calculation speed. Multiscale permutation entropy (MPE) is a technique that extends the concept of PE to perform coarse-grained analysis of time series data at multiple scales. By applying this approach, the time series is effectively granulated into coarser representations. Subsequently, the PE is computed for each of these coarsely granulated series at different scales. MPE captures the variation of information across different scales, providing a more comprehensive characterization of the signal's properties. It enables signal segmentation at various scales as required, thereby enhancing its adaptability and flexibility across diverse fields, signal types, and applications.
In the field of rolling bearing fault diagnosis, a study referenced as Correa et al. 29 employed MPE with optimized parameters to extract entropy-based features from collected bearing data. The results demonstrated the capability of the method to extract crucial information regarding fault conditions. Another study, cited as Jigang and Gang, 30 focused on analyzing and extracting fault features from the original vibration signal. It quantified these fault features using multiscale permutation entropy and achieved high recognition accuracy for different fault types.
In the domain of small-current system faults, a publication referred to as Liu et al. 31 decomposed the zero-sequence currents of each line during faults. Subsequently, it calculated the multiscale permutation entropy of the intrinsic mode functions and employed this entropy measure as a feature vector for training a classifier. This approach demonstrated rapid line selection, simplicity in principle, and high accuracy in fault identification.
The specific calculation process of MPE is illustrated as follows. For a time series $\{x_i,\; i = 1, 2, \ldots, N\}$, the coarse-grained series at scale $s$ is

$$
y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x_i, \qquad j = 1, 2, \ldots, \left\lfloor N/s \right\rfloor,
$$

where $s$ is the scale factor and $y_j^{(s)}$ expresses the multiscale time series.

A temporal reconstruction of $y_j^{(s)}$ yields

$$
Y_t^{(s)} = \left\{ y_t^{(s)},\, y_{t+\tau}^{(s)},\, \ldots,\, y_{t+(m-1)\tau}^{(s)} \right\},
$$

where $m$ denotes the embedding dimension, $\tau$ denotes the delay factor, and $t$ indexes the $t$th reconstructed component.

Arranging the elements of each reconstructed component in ascending order and recording the original position index of each element yields an ordinal pattern; there are $m!$ possible permutations. It is assumed that the $n$th permutation occurs $D_n$ times; the probability of this pattern at scale $s$ is then

$$
p_n^{(s)} = \frac{D_n}{\lfloor N/s \rfloor - (m-1)\tau}.
$$

The MPE value is determined as follows:

$$
\mathrm{MPE}(s) = -\sum_{n=1}^{m!} p_n^{(s)} \ln p_n^{(s)}.
$$

When all $m!$ permutations are equiprobable, the above equation takes its maximum value $\ln(m!)$; normalizing the MPE gives

$$
\mathrm{MPE}_{\mathrm{norm}}(s) = \frac{\mathrm{MPE}(s)}{\ln(m!)}.
$$

The larger $\mathrm{MPE}_{\mathrm{norm}}$ is, that is, the closer to 1, the more random and nonstationary the time series is; conversely, the smaller it is, the more regular the time series is.
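The coarse-graining, ordinal-pattern counting, and normalization steps above can be sketched as follows (the parameter defaults are illustrative):

```python
import math
import numpy as np

def multiscale_permutation_entropy(x, m=3, tau=1, scales=range(1, 7)):
    """Normalized multiscale permutation entropy of a 1-D signal.
    m: embedding dimension, tau: delay factor, scales: coarse-graining factors."""
    x = np.asarray(x, dtype=float)
    values = []
    for s in scales:
        n = len(x) // s
        y = x[: n * s].reshape(n, s).mean(axis=1)        # coarse-grained series
        counts = {}
        for t in range(n - (m - 1) * tau):
            # Ordinal pattern: ranks of the m delayed samples
            pattern = tuple(np.argsort(y[t : t + m * tau : tau]))
            counts[pattern] = counts.get(pattern, 0) + 1
        total = sum(counts.values())
        pe = -sum(c / total * math.log(c / total) for c in counts.values())
        values.append(pe / math.log(math.factorial(m)))  # normalize by ln(m!)
    return values
```

A monotonic series produces a single ordinal pattern and hence zero entropy, while white noise drives the normalized value toward 1.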

| SSA algorithm
SSA is an emerging meta-heuristic algorithm whose goal is to obtain a set of input parameters that minimizes or maximizes a function. Compared with traditional algorithms, SSA focuses more on individual exploration, has a simple structure, is easy to implement, and offers fewer control parameters and better local search ability. 32 Let there be $n$ sparrows in the population; the population consisting of all individuals can then be expressed as $X = [x_1, x_2, \ldots, x_n]^T$, with the corresponding fitness vector $F_X = [f(x_1), f(x_2), \ldots, f(x_n)]^T$. The discoverer location is updated as follows:

$$
x_{i,j}^{t+1} =
\begin{cases}
x_{i,j}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST,\\[2mm]
x_{i,j}^{t} + Q \cdot L, & R_2 \ge ST,
\end{cases}
$$

where $iter_{\max}$ is the maximum number of iterations and $t$ is the current iteration; $x_{i,j}^{t}$ denotes the position of the $i$th sparrow in the $j$th dimension in the $t$th generation; $\alpha \in (0, 1]$ is a random number; $R_2$ denotes the warning value and $ST$ the safety threshold; $Q$ is a random number that follows a normal distribution; and $L$ is a $1 \times d$ matrix of ones.
The follower positions are updated as follows:

$$
x_{i,j}^{t+1} =
\begin{cases}
Q \cdot \exp\!\left(\dfrac{x_{worst}^{t} - x_{i,j}^{t}}{i^{2}}\right), & i > n/2,\\[2mm]
x_{P}^{t+1} + \left|x_{i,j}^{t} - x_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise},
\end{cases}
$$

where $x_{worst}^{t}$ denotes the position of the worst-adapted individual in generation $t$, and $x_{P}^{t+1}$ denotes the position of the best-adapted discoverer in generation $t+1$. Vigilante locations are updated as follows:

$$
x_{i,j}^{t+1} =
\begin{cases}
x_{best}^{t} + \beta \cdot \left|x_{i,j}^{t} - x_{best}^{t}\right|, & f_i > f_g,\\[2mm]
x_{i,j}^{t} + k \cdot \dfrac{\left|x_{i,j}^{t} - x_{worst}^{t}\right|}{(f_i - f_\omega) + \epsilon}, & f_i = f_g,
\end{cases}
$$

where $x_{best}^{t}$ denotes the global optimal position in generation $t$; $\beta$ is the step control parameter and follows a normal distribution with mean 0 and variance 1; $k \in (-1, 1)$ is a random number; $\epsilon$ is set as a small constant to avoid the denominator being 0; $f_i$ denotes the fitness value of the current individual; and $f_g$ and $f_\omega$ denote the fitness values of the current global optimal and worst individuals, respectively.
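A minimal sketch of SSA for function minimization is given below. It follows the three update rules above in simplified form; population sizing, the handling of the random quantities, and boundary clipping vary across implementations, so the constants here are assumptions rather than this paper's settings.

```python
import numpy as np

def sparrow_search(f, dim, n=30, iters=100, lb=-5.0, ub=5.0,
                   pd_frac=0.2, sd_frac=0.1, st=0.8, seed=0):
    """Minimize f over [lb, ub]^dim with a simplified sparrow search.
    pd_frac: fraction of discoverers; sd_frac: fraction of vigilantes;
    st: safety threshold ST."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))
    n_pd = max(1, int(pd_frac * n))
    best_x, best_f = None, np.inf
    for _ in range(iters):
        fit = np.array([f(x) for x in X])
        order = np.argsort(fit)                   # ascending: index 0 is best
        X, fit = X[order], fit[order]
        if fit[0] < best_f:                       # keep the best-ever solution
            best_x, best_f = X[0].copy(), fit[0]
        worst = X[-1].copy()
        # Discoverers: exploit the current region or jump away on alarm.
        r2 = rng.random()                         # warning value R2
        for i in range(n_pd):
            if r2 < st:
                X[i] = X[i] * np.exp(-(i + 1) / ((rng.random() + 1e-12) * iters))
            else:
                X[i] = X[i] + rng.normal(size=dim)
        xp = X[0].copy()                          # best discoverer position
        # Followers: forage around the best discoverer or relocate.
        for i in range(n_pd, n):
            if i > n / 2:
                X[i] = rng.normal() * np.exp((worst - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                X[i] = xp + np.abs(X[i] - xp) * A / dim
        # Vigilantes: a random subset reacts to perceived danger.
        for i in rng.choice(n, max(1, int(sd_frac * n)), replace=False):
            if fit[i] > fit[0]:
                X[i] = X[0] + rng.normal() * np.abs(X[i] - X[0])
            else:
                X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) \
                    / (fit[i] - fit[-1] + 1e-12)
        X = np.clip(X, lb, ub)
    return best_x, best_f
```

In the SSA-CatBoost setting, `f` would score a CatBoost hyperparameter vector by cross-validated error.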

| CatBoost algorithm
The CatBoost algorithm is an emerging boosting algorithm based on improved gradient boosting decision trees. Its most distinctive advantage is an ordered boosting strategy that addresses the gradient-bias and prediction-shift problems in gradient boosting decision trees, thus reducing overfitting. Moreover, symmetric trees are used as base learners to improve the generalization ability and prediction speed of the model, ensuring training speed and prediction accuracy with the advantages of few parameters, high accuracy, and robustness. Figure 2 presents the basic principle of the boosting algorithm. First, the first subset of data is obtained via initial weights. Subsequently, a weak learner is trained with this data subset, and the weights are recalculated based on the prediction error of the weak learner; that is, the weights of training samples with high error rates are adjusted upward. The next weak learner thus focuses on the samples with high error rates in the next round of learning, and the performance of the learner is continuously enhanced by iterating. Lastly, weak learners with high accuracy are given larger weights, and multiple weak learners are sequentially combined into a strong learner by weighting.

| GBDT
GBDT accumulates the results of multiple regression trees following the additive principle. The problem of easy overfitting of a single tree is effectively avoided by controlling the weights of the regression-tree results in each iteration.
The steps of the GBDT algorithm are presented as follows.

Step 1. Input the training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$.

Step 2. Initialize the weak learner $f_0(x)$:

$$
f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c),
$$

where $L(y_i, c)$ represents the loss function, $y_i$ denotes the $i$th target value, and $c$ is the constant that minimizes the loss.

Step 3. For iterations $t = 1, 2, \ldots, T$, a weak learner is constructed in each round from the training samples $D$ by solving the negative gradient:

$$
r_{ti} = -\left[\frac{\partial L\left(y_i, f(x_i)\right)}{\partial f(x_i)}\right]_{f = f_{t-1}}, \qquad i = 1, 2, \ldots, N.
$$

Step 4. Using the solved negative gradients, that is, the pairs $(x_i, r_{ti})$, $i = 1, 2, \ldots, N$, a CART regression tree is fitted under the squared-error minimization criterion to obtain the $t$th regression tree. Its leaf regions are denoted $R_{tj}$, $j = 1, 2, \ldots, J$, where $J$ is the number of leaf nodes of regression tree $t$, with the best fitting value per leaf

$$
c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L\left(y_i, f_{t-1}(x_i) + c\right).
$$

Step 5. Update the strong learner:

$$
f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I\left(x \in R_{tj}\right),
$$

where $I(x \in R_{tj})$ is the indicator of the $j$th leaf region.

If the corresponding decision function meets the convergence condition, the iteration stops and the expression of the final strong learner $f(x)$ is

$$
f(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj}\, I\left(x \in R_{tj}\right).
$$
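The steps above can be sketched for the squared loss, where the negative gradient reduces to the residual. Depth-1 trees (stumps) stand in for the CART learner, and the shrinkage rate is an illustrative choice:

```python
import numpy as np

class Stump:
    """Depth-1 regression tree fitted by squared-error minimization."""
    def fit(self, x, r):
        best = (np.inf, None)
        for thr in np.unique(x):
            left, right = r[x <= thr], r[x > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() \
                + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, (thr, left.mean(), right.mean()))
        self.thr, self.c_left, self.c_right = best[1]
        return self

    def predict(self, x):
        return np.where(x <= self.thr, self.c_left, self.c_right)

def gbdt_fit(x, y, T=60, lr=0.1):
    """f_0 = mean(y); each round fits a stump to the negative gradient
    (the residual under squared loss) and adds it with shrinkage lr."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(T):
        r = y - pred                  # negative gradient of 1/2 (y - f)^2
        tree = Stump().fit(x, r)
        pred = pred + lr * tree.predict(x)
        trees.append(tree)
    return f0, trees

def gbdt_predict(x, f0, trees, lr=0.1):
    pred = np.full(len(x), f0, dtype=float)
    for tree in trees:
        pred += lr * tree.predict(x)
    return pred
```

Each stump contributes a small correction, so the ensemble error decays geometrically with the number of rounds.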

| CatBoost model principle
Like all standard gradient boosting algorithms, CatBoost fits the gradient of the current model by constructing a new tree. To address the overfitting that often occurs in classical boosting algorithms, CatBoost makes several improvements to the classical gradient boosting algorithm. 33 Let $D$ be the training set

$$
D = \{(X_i, y_i)\}, \qquad i = 1, 2, \ldots, n,
$$

where $n$ is the number of samples, $X_i$ is the feature vector of the $i$th sample, and $y_i$ is the target value of the $i$th sample. The transformed value $\hat{x}_i^{\,k}$ of the $k$th categorical feature of sample $i$ in the CatBoost model is expressed as follows:

$$
\hat{x}_i^{\,k} = \frac{\sum_{j=1}^{n} \varphi\left(x_j^{k} = x_i^{k}\right) y_j + a P}{\sum_{j=1}^{n} \varphi\left(x_j^{k} = x_i^{k}\right) + a},
$$

where $\varphi$ is the indicator function, $P$ represents the prior value, and $a$ is the prior weight. These values help mitigate the noise issue for categories with a low frequency of occurrence. CatBoost replaces the gradient estimation method of the conventional algorithm with ordered boosting. To obtain unbiased gradient estimates, CatBoost trains a separate model for each sample in the training set, using all other sample data that do not contain the sample itself. The base learner is continuously trained by computing the gradient estimates of the sample data to obtain the final model, thereby enhancing its generalization ability.
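The target-statistic idea, in the ordered form that CatBoost uses to avoid target leakage, can be sketched as follows. Each sample is encoded only from the samples preceding it in a random permutation; the prior and weight values are illustrative.

```python
import numpy as np

def ordered_target_statistic(cat, y, prior, a=1.0, seed=0):
    """Ordered target-statistic encoding of one categorical feature.
    cat: category labels, y: targets, prior: prior value P, a: prior weight."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(cat))
    sums, counts = {}, {}
    encoded = np.empty(len(cat))
    for idx in perm:                      # walk the permutation as a "history"
        c = cat[idx]
        s, m = sums.get(c, 0.0), counts.get(c, 0)
        # Encode from preceding samples of the same category only
        encoded[idx] = (s + a * prior) / (m + a)
        sums[c] = s + y[idx]              # reveal the target afterwards
        counts[c] = m + 1
    return encoded
```

The first occurrence of each category in the permutation receives the smoothed prior, which is what keeps rare categories from memorizing their own targets.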
| EXPERIMENTAL SETUP

In this study, a battery tester is used to perform cyclic charge/discharge experiments on the battery and to record the voltage, current, power, capacity, and other parameters during the process. The test cell is an NCM battery, currently the most extensively used chemistry in new energy vehicles. The experimental equipment comprises a battery tester, a thermostat box, an upper computer, and alligator clips. The thermostat box is used to create an experimental environment of 15°C.
The battery to be tested is an 18650 Li-ion battery as the experimental object, with its details listed in Table 1.

| Experimental steps
The experimental procedure is illustrated in Figure 3.
As depicted in Figure 3, the charging method adopted in this study is constant-current followed by constant-voltage charging, which shortens the charging time while ensuring the safety of the test battery. The battery is charged to 4.2 V at a constant current of 0.5C (1.3 A). Subsequently, it is charged at a constant voltage of 4.2 V until the current declines to 0.2 A. The battery is then discharged at a 1.5C (3.9 A) constant current to a cut-off voltage of 2.7 V. Lastly, a predetermined number of charge/discharge cycles serves as the experimental termination condition, and the experimental data are recorded. Furthermore, the discharge capacity curve is plotted and then analyzed.

| Introduction to the data set
Two data sets are selected for the experiments, which are denoted as Data set A and Data set B, to verify the prediction effect of the proposed algorithm for Li-ion batteries based on different charging and discharging strategies.
Data set A: lithium-ion battery test data from the NASA Prognostics Center of Excellence (PCoE). The battery applied is an 18650 cell with a rated capacity of 2 Ah. The B0005, B0006, and B0007 battery data (e.g., the voltage over the cyclic charge and discharge process) were taken as the experimental data. The experimental process is as follows: in the charging process, the battery is charged at a 1.5 A constant current until the voltage reaches 4.2 V, then switched to constant-voltage charging until the charging current falls below 20 mA; in the discharging process, the battery is discharged at a 2.0 A constant current until the voltage falls below 2.5 V. The battery is charged and discharged for 168 cycles in accordance with the above process. [34][35][36][37] Data set B: the data acquired from the measurements in accordance with the experimental procedure in Section 4.1, including the voltage, current, and power during the charging, resting, and discharging processes. The parameters of 148 cycles of the battery are determined after processing.

| INSTANCE VALIDATION
In this study, Data set A is taken as the research object: multidomain feature extraction, screening, and fusion are performed, and a prediction model is then built to predict the SOH and RUL of B0006 and B0007 with B0005 as the training set. Moreover, to verify that the feature engineering proposed in this study exhibits strong generalization ability, Data set B is taken as the validation object: feature engineering is built again for Data set B, a prediction model is built, and the prediction effect is evaluated.

| Multidomain feature extraction
The data set selected in this study covers the battery charging and discharging cycles and includes a total of 10 attributes (voltage measured, current measured, temperature measured, current load, and voltage load for the charging experiment; voltage measured, current measured, temperature measured, current load, and voltage load for the discharging experiment). The voltage load attribute of battery No. 5 serves as an example to illustrate the experimental process of feature extraction, screening, and fusion. Figure 4 plots the time-domain decay curve of the voltage load attribute in the discharge experiment.
As depicted in Figure 4, each curve represents the decay of the voltage load attribute in one cycle of the discharge experiment; the number of sampling points differs between curves, and the decay of each curve suddenly accelerates when it approaches 1.8 V. The attribute is therefore difficult to use directly as a feature quantity for battery life prediction, and feature extraction is required. Here, feature extraction is performed in terms of time-domain, frequency-domain, entropy, and time-series features, respectively.

| Time domain feature extraction
The simplicity and directness of time-domain analysis make it extensively employed in various online signal detection systems. The signal is analyzed by calculating its simple statistical features and then selecting appropriate characteristic parameters to accurately characterize a wide variety of signals.
In general, the time-domain characteristic parameters are classified into two categories. One category comprises the dimensioned characteristic parameters carrying the original signal units (e.g., maximum value, average value, root mean square value, peak-to-peak value, standard deviation, and variance). The other comprises the dimensionless characteristic parameters (primarily products or ratios of the dimensioned parameters, such that the units of the final parameters cancel), for example, the kurtosis factor, waveform factor, peak factor, pulse factor, margin factor, and skewness factor.
For the battery data, the dimensioned feature parameters can indicate the trend and rate of battery capacity decay, which can indicate the information contained in the battery under different operating conditions. The dimensionless feature parameters are correlated with the operating environment in which the battery is located. Accordingly, the dimensioned and dimensionless feature parameters are selected to form the time-domain feature set T. Figure 5 shows the trend of the time domain characteristics of the voltage load sample section.
As depicted in Figure 5, the mean of this attribute tends to decrease as cycling proceeds, while the margin factor, on the contrary, shows an increasing trend. The peak-to-peak value captures well the capacity regeneration phenomenon of the battery during charging and discharging, effectively reducing the influence of capacity fluctuation on the prediction results.
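A sketch of the dimensioned and dimensionless time-domain features described above, computed per cycle curve (the exact composition of the feature set T in this study may differ slightly):

```python
import numpy as np

def time_domain_features(x):
    """Dimensioned and dimensionless time-domain features of one cycle curve."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    abs_mean = np.mean(np.abs(x))
    peak = np.max(np.abs(x))
    return {
        # dimensioned parameters (carry the signal units)
        "max": x.max(), "mean": x.mean(), "rms": rms,
        "peak_to_peak": x.max() - x.min(),
        "std": x.std(), "var": x.var(),
        # dimensionless parameters (units cancel)
        "kurtosis": np.mean((x - x.mean()) ** 4) / x.std() ** 4,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
        "skewness": np.mean((x - x.mean()) ** 3) / x.std() ** 3,
    }
```

Applied to each cycle's voltage load curve, the resulting rows form the time-domain feature set over the battery's life.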
On the other hand, the time-domain parameters of the battery contain more limited information regarding the capacity features, and some of such features have large uncertainties. Thus, other aspects of battery features need to be extracted.

| Frequency domain feature extraction
The battery signal is sampled by the signal collector and then obtained as a discrete signal, and the spectrum function can be obtained through fast Fourier transform. On that basis, some frequency parameters can be determined.
Common frequency domain parameters comprise the center-of-gravity frequency, mean square frequency, root mean square frequency, frequency variance, frequency standard deviation, amplitude spectrum, and power spectrum. The center-of-gravity frequency expresses the frequencies carrying the larger components of the spectrum, indicating the power spectrum distribution of the signal. The mean square frequency and root mean square frequency describe changes in the position of the main band of the power spectrum. Frequency variance is a measure of the energy dispersion of the power spectrum. The frequency standard deviation changes with the spectrum amplitude: the larger the spectrum amplitude, the smaller the frequency standard deviation, and vice versa. Notably, the frequency standard deviation can be adopted to describe the dispersion of the power spectrum energy distribution.
Five of the above feature parameters (the center-of-gravity frequency, mean square frequency, root mean square frequency, frequency variance, and frequency standard deviation) are extracted, constituting the frequency domain feature set P of the battery data. Figure 6 presents the frequency domain feature parameters of the voltage load attribute.
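The five spectral parameters can be sketched from the FFT power spectrum as follows (the sampling rate and the mean-removal step are illustrative choices):

```python
import numpy as np

def frequency_domain_features(x, fs=1.0):
    """Spectral features from the one-sided FFT power spectrum."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean()))      # drop the DC component
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spec ** 2                                  # power spectrum
    p_sum = p.sum()
    fc = (freqs * p).sum() / p_sum                 # center-of-gravity frequency
    msf = (freqs ** 2 * p).sum() / p_sum           # mean square frequency
    rmsf = np.sqrt(msf)                            # root mean square frequency
    fvar = ((freqs - fc) ** 2 * p).sum() / p_sum   # frequency variance
    return {"fc": fc, "msf": msf, "rmsf": rmsf,
            "fvar": fvar, "fstd": np.sqrt(fvar)}   # frequency standard deviation
```

For a narrowband signal, the center-of-gravity frequency sits on the dominant tone and the frequency standard deviation is small.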

| Entropy feature extraction
Entropy is a measure of information uncertainty, such that entropy features often serve as a class of effective features for feature extraction in regression models. The entropy of different frequency bands can be used to quantify the information contained in the signal to measure the uncertainty and complexity of the signal distribution. The mean value feature of entropy can indicate the complexity of the information in the system as well as the error orientation.
The most used entropy functions comprise dispersion entropy and fuzzy entropy, as well as the derived multiscale entropies. Multiscale entropy comprises multiscale dispersion entropy, multiscale fuzzy entropy, and multiscale permutation entropy, among which multiscale permutation entropy shows the advantages of robustness, simple calculation, and fast operation speed.
In this study, MPE is selected as the entropy feature, and the optimal scale factor is selected by analyzing the trend of the mean value of MPE at different scales s. Figure 7 illustrates the mean MPE values of the voltage load attribute under different cycles when different scale factors are selected. The most appropriate scale is selected based on the scale-mean decay relationship, that is, the dimensionality of the ranked entropy value feature vector is determined.
As depicted in Figure 7, as the scale factor increases, the decreasing trend of the mean MPE of each cycle's voltage load signal in the first six discharge cycles slows; thereafter, the mean value rises and the curves overlap and cross. If a larger scale factor is selected to extract the feature vector, feature information overlaps and degrades the model's predictions; under too small a scale factor, the feature information cannot be fully extracted. Accordingly, the MPE values of the first six scale factors are taken to form the entropy feature sample set S.
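The MPE computation can be sketched as follows: coarse-grain the series at each scale factor s, then compute the normalized permutation entropy of the coarse-grained series. This is a minimal illustration with an assumed embedding dimension of 3, not the paper's exact implementation:

```python
import numpy as np
from math import factorial

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy (in [0, 1]) with embedding dimension m."""
    n = len(x) - (m - 1) * delay
    counts = {}
    for i in range(n):
        # Ordinal pattern: the ranking of the m embedded samples
        pattern = tuple(np.argsort(x[i:i + m * delay:delay]))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-np.sum(p * np.log(p)) / np.log(factorial(m)))

def multiscale_permutation_entropy(x, scales, m=3):
    """MPE: non-overlapping coarse-graining followed by permutation entropy."""
    x = np.asarray(x, dtype=float)
    values = []
    for s in scales:
        n = len(x) // s
        coarse = x[:n * s].reshape(n, s).mean(axis=1)  # average s consecutive points
        values.append(permutation_entropy(coarse, m=m))
    return values
```

A monotonic series yields entropy 0 (a single ordinal pattern), while white noise yields values near 1; selecting the first six scale factors then corresponds to taking the first six entries of the returned list.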

| Time-series feature extraction
In time series forecasting, the autocorrelation and partial autocorrelation coefficients are adopted to measure the correlation between current and past series values and to indicate which past values are most useful for predicting future values.
Autocorrelation, also called serial correlation, represents the correlation of a signal with itself at different points in time, that is, the similarity between two observations as a function of the time difference between them. It can identify repetitive signals (e.g., periodic signals masked by noise), as well as fundamental frequencies hidden implicitly in a signal's harmonics. Partial autocorrelation summarizes the relationship between an observation and an observation at a prior time step once the effects of the intervening time steps are removed; the plain autocorrelation between an observation and a previous time step comprises both direct and indirect correlation.
The autocorrelation coefficients and partial autocorrelation coefficients of each attribute are extracted to form the time-series feature set K. Figure 8 presents the autocorrelation and partial autocorrelation coefficients of the voltage load attribute.
As depicted in Figure 8, the upper and lower horizontal lines indicate the confidence bounds of the autocorrelation and partial autocorrelation coefficients, respectively, and points beyond the bounds indicate the existence of correlation. The absolute value of the autocorrelation coefficient remains large over many lags while slowly decreasing, the "tailing" phenomenon. The partial autocorrelation, by contrast, fluctuates around 0 after the second order, the "truncation" phenomenon, suggesting that the time series is stationary.
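For a concrete view of the "tailing" and "truncation" behavior, the two coefficient sequences can be computed directly (libraries such as statsmodels provide equivalents); the sketch below implements the sample ACF and the PACF via the Durbin-Levinson recursion:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation coefficients for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelation coefficients via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    for k in range(1, nlags + 1):
        # phi[k, k] is the lag-k partial autocorrelation
        num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
        out[k] = phi[k, k]
    return out
```

The horizontal confidence bounds in Figure 8 correspond roughly to ±1.96/√N for a series of length N; an AR(1) series shows exactly the tailing ACF and second-order-truncated PACF described above.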

| Feature significance assessment
After the preceding multiscale feature extraction, a 240-dimensional high-dimensional feature set Q1 is obtained. Excessively high dimensionality increases the training cost and can even cause overfitting. Thus, a dimensionality reduction operation should be performed on the high-dimensional data to select the features richest in battery aging information.
In the field of battery SOH estimation, researchers commonly employ Pearson correlation analysis to assess the association between aging features derived from raw battery cycle data and capacity. This entails analyzing the magnitude of the correlation coefficient between these variables, with the aim of identifying features that closely align with the capacity variation pattern. Nonetheless, Pearson correlation analysis exhibits certain limitations: (1) it is susceptible to the influence of outliers; and (2) when the sample size is small, the correlation coefficient tends to fluctuate more and may approach an absolute value of 1. These limitations motivate an alternative screening criterion.

F I G U R E 7 Mean multiscale permutation entropy values for the voltage load property.
The Gini coefficient is an index for assessing feature significance. Its underlying concept is to evaluate the contribution of each feature within each tree, average these values, and compare the magnitudes of the contributions across features. One notable advantage of the Gini coefficient is its ability to handle both continuous and discrete values, making it suitable for constructing high-precision models. Moreover, it yields a more intuitive measure than Pearson correlation analysis for feature screening. Consequently, this paper adopts the Gini coefficient as the metric for evaluating feature importance in the subsequent feature selection procedure.
In this study, the Gini coefficient is adopted to calculate feature significance and to filter the features that contribute most to the prediction results, which constitute the low-dimensional feature set Q2. 38 The Gini coefficient of a node s is calculated as

$$GI_s = \sum_{v=1}^{t} \sum_{j \neq v} \lambda_{sv}\lambda_{sj} = 1 - \sum_{v=1}^{t} \lambda_{sv}^{2}, \quad (29)$$

where t denotes the number of categories in the new training subset, and λ_sv and λ_sj represent the proportions of categories v and j in node s.
The significance of feature F_i at node m is

$$VIM_{im} = GI_m - GI_1 - GI_2, \quad (30)$$

where GI_1 and GI_2 denote the Gini coefficients of the two new nodes after branching. The significance of feature F_i in the kth tree is

$$VIM_{ik} = \sum_{m \in M} VIM_{im}, \quad (31)$$

where M indicates the set of nodes at which feature F_i appears in the kth tree. The significance of feature F_i across all K trees is then

$$VIM_i = \sum_{k=1}^{K} VIM_{ik}. \quad (32)$$

Lastly, the weight of the ith feature is normalized as

$$VIM_i' = \frac{VIM_i}{\sum_{j=1}^{n} VIM_j}, \quad (33)$$

where n denotes the number of features. On the basis of Equations (29)-(33), the weight of each input variable can be obtained and its percentage output visually; this is the significance of the input variables, so all features can be ranked by significance.
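In practice, this significance measure corresponds to the normalized mean-decrease-in-impurity importance that tree ensembles expose directly. The sketch below uses scikit-learn's random forest on synthetic data as a stand-in for the battery feature matrix (for regression trees the impurity is the variance rather than the Gini index, but the normalization scheme is the same); the data and threshold usage here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the high-dimensional feature matrix Q1:
# only the first two columns actually drive the target.
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ is the normalized mean decrease in impurity,
# i.e. the per-tree node-importance sums averaged over the forest
# and scaled to sum to 1, mirroring the normalization step above.
importance = forest.feature_importances_

# Screening step: keep only features above the significance threshold R
R = 0.02
selected = np.where(importance >= R)[0]
```

On this toy data the two informative columns dominate the ranking and survive the 0.02 threshold, while the noise columns are screened out or nearly so.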
The significance assessment of each domain's features of the voltage load is performed separately, and a suitable threshold value is selected from the feature significance assessment graph, as shown in Figure 9.
The higher the feature significance, the more battery aging information the feature contains, that is, the higher its correlation with SOH. Here, the feature significance value is used to filter the time-domain, frequency-domain, and entropy features, and the significance threshold R is set to 0.02; features below this threshold are screened out, thus excluding features that contribute little to the prediction results.

F I G U R E 8 Plot of autocorrelation coefficient and partial autocorrelation coefficient.
As depicted in Figure 9, for the voltage load attribute, both the time-domain and frequency-domain features are of high significance in the model, while many components of the entropy features contribute little to the results. The features with significance values below 0.02 in Figure 9 are discarded to reduce the dimensionality of the input data.

| MDCA-based multidomain feature fusion
MDCA eliminates correlation between classes and restricts correlation to within classes. It maximizes the correlation of corresponding features between multiple feature sets while decorrelating features belonging to different classes within each feature set. The method is computationally efficient and does not suffer from the small-sample problem faced by other feature fusion algorithms (e.g., canonical correlation analysis, CCA).
For the battery data, the low-dimensional feature set Q2 is obtained by the preceding multidomain feature extraction and significance-based feature screening. The features in Q2 are then fused using the MDCA feature fusion algorithm to obtain the optimal feature set M, containing 10-dimensional feature vectors, which serves as the model input and significantly reduces the model's computational burden. The SHAP method is adopted to characterize the fusion results. The core idea of SHAP is to calculate the marginal contribution of a feature to the model output and then average it, since the marginal contribution of a feature differs across feature orderings. All features are considered "contributors." For each prediction sample, the model generates a prediction value, and the SHAP value is the value assigned to each feature in that sample.
The SHAP value of each of the 10-dimensional feature variables is calculated and a feature density scatter plot is made. In the density scatter plot, the horizontal coordinate represents the SHAP value and each row represents a feature; wider regions indicate that many samples aggregate there. Each point represents a sample, and its color represents the relative magnitude of the feature's value: the redder the color, the larger the value; the bluer, the smaller. Figure 10 presents the density scatter plot of SHAP values for each feature, with features ordered vertically by descending mean absolute SHAP value.
In Figure 10, numbers 0-9 denote the 10-dimensional features obtained in the previous section. As depicted in Figure 10, the larger the mean absolute SHAP value, the more important the feature, suggesting that it plays a greater role in predicting the target value. The mean absolute SHAP value of feature 1 is the largest and affects the sample prediction most significantly; feature 4 is also of great significance to the prediction.
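The "average marginal contribution" idea behind SHAP can be made concrete with a brute-force computation of exact Shapley values for a toy model. Libraries such as shap use fast tree-specific algorithms for models like CatBoost; the mean-imputation of absent features below is a common but assumed convention, feasible only for a handful of features:

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shap(model, x, background, n_features):
    """Exact Shapley values for one sample by enumerating all coalitions."""
    base = background.mean(axis=0)  # features outside a coalition -> background mean

    def value(subset):
        z = base.copy()
        for j in subset:
            z[j] = x[j]
        return model(z)

    phi = np.zeros(n_features)
    features = list(range(n_features))
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Shapley kernel weight for a coalition of size k
                w = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi
```

For a linear model f(z) = 2z0 + 3z1 with a zero-mean background, the Shapley values of a sample at (1, 1) recover the coefficients exactly, and their sum equals the gap between the sample prediction and the background prediction, which is the additivity property the density plot in Figure 10 relies on.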

| SOH prediction based on CatBoost model
In this paper, SOH is defined by comparing the current capacity of the battery (C_batt) with its initial capacity (C_init) to quantify the current degree of battery degradation. Typically, the battery is considered to have reached its termination life when the current capacity falls to 80% of the initial capacity. Consequently, SOH is expressed by the following equation:

$$SOH = \frac{C_{batt}}{C_{init}} \times 100\%.$$

In addition, the CatBoost algorithm is sensitive to its hyperparameters, so the SSA is adopted to optimize four CatBoost hyperparameters, namely the regularization parameter l2_leaf_reg, the learning rate parameter learning_rate, the iteration count parameter iterations, and the model's random strength parameter, to enhance prediction performance.
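The definition above, together with the 80% failure threshold, translates directly into code; this small sketch (the helper names are our own) computes the SOH and locates the end-of-life cycle from a capacity trace:

```python
import numpy as np

def soh(c_batt, c_init):
    """State of Health as the ratio of current to initial capacity."""
    return c_batt / c_init

def eol_cycle(capacities, c_init, threshold=0.8):
    """Index of the first cycle whose SOH drops below the failure threshold.
    The RUL at cycle t is then eol_cycle(...) - t. Returns None if the
    threshold is never reached within the record."""
    s = np.asarray(capacities, dtype=float) / c_init
    below = np.flatnonzero(s < threshold)
    return int(below[0]) if below.size else None
```

For a capacity trace decaying from 2.0 Ah toward 1.5 Ah, SOH falls from 1.0 toward 0.75, and the threshold crossing pinpoints the end-of-life cycle used as the RUL reference point.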
The features fused by MDCA serve as the input of the CatBoost model for battery SOH prediction. The SOH prediction experiment based on the CatBoost model comprises two main parts: one for the same battery, using one battery's data to train the model and predict that battery's SOH; the other for different batteries, using one battery's data to train the model and predict the SOH of other batteries.

| Prediction results within the same battery
For the 168 cycles of data of batteries B0005, B0006, and B0007 obtained after feature processing, the first 84 cycles are employed as the training set and the 85th to 168th cycles as the test set to compare model prediction results within the same battery. The CatBoost prediction model built in the previous section is adopted to predict the three sets of batteries, and the four CatBoost hyperparameters listed above are optimally searched to enhance prediction performance. The SSA is employed with a population size of 20 and 100 iterations. Additionally, the multidomain feature extraction and feature fusion operations elaborated in the preceding section are performed independently for each group of batteries, which keeps the training and testing sets separate. The objective function is the mean square error on the training set, and the battery capacity serves as the label for training and prediction. Figure 11 showcases the SSA optimization iteration curve on Data set A, while Table 2 presents the final results of each hyperparameter search.
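A minimal version of the SSA optimizer can be sketched as follows. This is a simplified illustration of the producer/scrounger/scout update rules on a toy objective, with assumed fractions and constants, not the exact SSA variant or search space used in the paper (where the objective is the training-set MSE of CatBoost and the decision variables are the four hyperparameters):

```python
import numpy as np

def ssa_minimize(objective, dim, lb, ub, pop=20, iters=100, seed=0):
    """Simplified sparrow search algorithm for minimization (illustrative)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(pop, dim))
    fit = np.apply_along_axis(objective, 1, X)
    gbest, gbest_fit = X[np.argmin(fit)].copy(), fit.min()
    n_prod = max(1, int(0.2 * pop))    # producers (discoverers)
    n_scout = max(1, int(0.1 * pop))   # scouts (vigilance sparrows)
    ST = 0.8                           # safety threshold

    for _ in range(iters):
        order = np.argsort(fit)
        X, fit = X[order], fit[order]
        best, worst = X[0].copy(), X[-1].copy()

        # Producers: contract their position or take a Gaussian step
        for i in range(n_prod):
            if rng.random() < ST:
                X[i] = X[i] * np.exp(-(i + 1) / (rng.random() * iters + 1e-9))
            else:
                X[i] = X[i] + rng.normal(size=dim)

        # Scroungers: follow the best producer (the worse half drifts instead)
        for i in range(n_prod, pop):
            if i > pop // 2:
                X[i] = rng.normal() * np.exp((worst - X[i]) / (i ** 2))
            else:
                X[i] = best + np.abs(X[i] - best) * rng.choice([-1.0, 1.0], dim)

        # Scouts: a random few jump relative to the current best
        for i in rng.choice(pop, n_scout, replace=False):
            X[i] = best + rng.normal() * np.abs(X[i] - best)

        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(objective, 1, X)
        if fit.min() < gbest_fit:              # track the global best
            gbest, gbest_fit = X[np.argmin(fit)].copy(), fit.min()

    return gbest, gbest_fit
```

With the paper's settings (population 20, 100 iterations) the sketch reliably drives a 2-D sphere function close to its optimum; in the paper, the same loop wraps a CatBoost training run and returns the best l2_leaf_reg, learning rate, iteration count, and random strength.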
The test set SOH and RUL prediction results for data set A are shown in Figure 12.
As depicted in Figure 12, each prediction curve fits the test set well, and the overlap between the prediction curve and the true capacity curve is significant, suggesting that the model performs well for SOH estimation and RUL prediction within the same cell.
Mean square error (MSE), mean absolute error (MAE), root mean square error (RMSE), and goodness of fit (R-squared, R²) are selected as evaluation metrics to show the prediction accuracy for each cell in more detail. The specific values and error distributions of the prediction assessment indexes are shown in Table 3 and Figure 13, respectively.
MSE, MAE, RMSE, and R² are defined as follows:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(f(x_i) - y_i\right)^2$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|f(x_i) - y_i\right|$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f(x_i) - y_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(f(x_i) - y_i\right)^2}{\sum_{i=1}^{N}\left(f(x_i) - \bar{f}\right)^2}$$

where f(x_i) denotes the true value; y_i represents the predicted value; i = 1, 2, …, N; N expresses the sample size; and R² measures the overall fit of the regression. The RUL error in Table 3 represents the absolute value of the difference between the actual and predicted RUL of the cell. In the context of the widely adopted NASA data set, this study incorporates comparative experimental findings from other works to contrast with its own prediction results. Prior literature 39 employs an enhanced dynamic cuckoo search technique to optimize a particle filtering algorithm and derives a health indicator (HI) from measurable parameters of lithium-ion battery operation; a mapping model between the HI and the SOH is established and integrated into the state-space model for observation. The average RMSE of their prediction results is 0.6452, notably higher than the corresponding result in this paper (0.0132). Moreover, another work 40 applies principal component analysis to reduce the dimensionality of feature parameters capturing battery performance degradation and constructs an SOH prediction model based on SVR, yielding an average RMSE of 0.3600, which also underperforms the findings reported in this paper. Figure 13 characterizes the prediction error distributions of batteries B0005, B0006, and B0007, with the horizontal axis representing the error between the predicted and true values and the vertical axis the frequency of each error interval.
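The four metrics translate directly into code; this is a straightforward sketch of the definitions above (the function name is our own):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and R-squared as defined above (y_true = f(x_i))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    # R^2: 1 minus the residual sum of squares over the total sum of squares
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}
```

A perfect prediction yields MSE = 0 and R² = 1; a constant offset of 1 yields MSE = MAE = RMSE = 1 while R² drops according to the spread of the true values.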
Combined with Figure 13 and Table 3, the error intervals of the three batteries are concentrated mainly between 0 and 0.05, especially for battery B0005, whose error is mostly below 1e−3, demonstrating that the feature engineering and CatBoost model built in this study offer considerable advantages for SOH estimation within the same battery.

| Prediction results of different batteries under the same charging and discharging strategy
For different batteries of the same model under the same charging and discharging strategy, the feature engineering method and CatBoost model proposed in this study can achieve accurate SOH prediction.
In Data set A, the sampling interval of battery B0005 varies between 20 and 10 s across cycles, so its data form is richer. Thus, Data set A serves as the training source for the cross-battery prediction experiments. Here, all cycle data of battery B0005 are used to train the CatBoost model, and the trained model predicts the capacity over all cycles of batteries B0006 and B0007. The prediction results are presented in Figures 14 and 15.
As depicted in Figures 14 and 15, the prediction curves overlap significantly for different batteries of the same model under the same charging and discharging strategy, and the number of cycles to reach the failure threshold point is almost consistent with the true value, suggesting that the prediction model is effective for SOH estimation and RUL prediction and transfers well between batteries of the same model. Table 4 and Figure 16 present the specific prediction indexes and prediction error distributions, respectively.
As depicted in Table 4 and Figure 16, the overall prediction results for both batteries are good, with MSEs around 1e−4, goodness-of-fit values above 0.98, and prediction errors concentrated around 0-0.05 Ah, suggesting that the model achieves good results.

| Comparison of the prediction effect of different models
To assess the predictive performance of various integrated learning models, an experimental study is conducted to compare their respective effects. Specifically, the study examines the prediction effects of XGBoost, LightGBM, and CatBoost models with default parameters, as well as a CatBoost model optimized by SSA. As a case study, battery B0005 from Data set A is selected, and the experimental results on its test set are presented in Table 5.

F I G U R E 13 Error distribution of State of Health prediction in the same cell.
Based on the data in Table 5, CatBoost, LightGBM, and XGBoost all exhibit commendable regression prediction capability. The three models achieve similar levels of goodness of fit and RMSE, highlighting the robustness of the feature engineering employed in this study. The default CatBoost model improves on the other two in goodness of fit but still falls short of the CatBoost model optimized by SSA, which achieves an MSE of 1.3876e−04 and outperforms the other models on the RUL prediction index. Consequently, the SSA-CatBoost approach effectively enhances SOH estimation accuracy and RUL prediction, displaying noteworthy performance.

| Comparison of model predictions under different feature extraction methods
To show the superiority of the feature fusion method proposed in this study, test sets consisting of single-domain features (time domain, frequency domain, entropy, and time series) are predicted separately using the CatBoost model and compared with the prediction results after feature fusion; the results are shown in Figure 17.

F I G U R E 14 B0006 overall forecast results.
F I G U R E 15 B0007 overall forecast results.

As depicted in Figure 17, the prediction curve from multidomain feature fusion shows the most significant overlap and the best fit, and its prediction is clearly better than that of any single-domain feature extraction, indicating the superiority of the proposed MDCA multidomain feature fusion algorithm.

| Model generalizability validation
To verify the generalization ability of the proposed feature engineering and prediction model, feature extraction is performed again on Data set B, and SOH prediction is carried out with the CatBoost model. Data set B contains 148 cycles of data; the first 60 cycles serve as the training set and the last 88 cycles as the test set. The model built above is used for training and prediction, and the results are presented in Figure 18.
As depicted in Figure 18, on Data set B the CatBoost model also achieves good prediction results with the feature engineering built in this study, with an MSE of 0.1031, an MAE of 0.2655, and an R² of 0.9997, suggesting that the model can achieve accurate SOH prediction within the same cell.
In brief, the prediction method based on multidomain feature fusion and the CatBoost model proposed in this study achieves good prediction accuracy on different data sets, so the model has good generalization ability.

| CONCLUSION
To address the low accuracy of SOH and RUL prediction caused by the difficulty of establishing feature engineering for lithium-ion batteries, this study proposes an SOH and RUL prediction model based on multidomain feature fusion with CatBoost and conducts relevant experiments, with the following conclusions.
(1) The multidomain feature fusion algorithm proposed in this study captures battery aging information to the maximum extent, and the Gini coefficient-based feature screening removes redundant features intuitively and effectively.
(2) The feature engineering and prediction model built in this study achieve accurate battery SOH estimation. For SOH prediction within the same cell, the method performs well, with R² above 0.98 for each cell; for prediction across different cells of the same model under the same charge/discharge strategy, it still performs well, with R² also above 0.98.
(3) The proposed MDCA multidomain feature fusion algorithm works significantly better than feature extraction in any single domain.
(4) On Data set B, the feature engineering and model built in this study also achieve good prediction results, demonstrating good generalizability.