Cross‐domain state‐of‐health estimation of Li‐ion batteries based on transfer neural network with soft‐dynamic time warping

The success of deep learning in the field of state‐of‐health (SOH) estimation relies on a large amount of battery data and on the assumption that all data share the same probability distribution. In real situations, however, a model trained on data from one working condition may not be valid for data from another working condition because of distribution differences. Therefore, this article proposes a transfer learning method that uses soft‐dynamic time warping (soft‐DTW) as the statistical feature in a feature transfer method, called the soft‐DTW domain adaptation network (SDDAN). By combining the prediction error with the time‐series gap during model training, the feature transformation makes the obtained prediction results more similar to the source domain results, which helps obtain better prediction results in the target domain. Experimental results show that SDDAN can effectively predict the SOH of Li‐ion batteries and significantly improve the performance of feature learning and knowledge transfer.


| INTRODUCTION
Over the past decade, driven by the global energy crisis and environmental pollution, battery production technology has developed rapidly. As a green energy source with no memory effect, long service life, and low self-discharge rate, lithium batteries have been widely used in various fields. However, during use, battery capacity decreases with time, and once it falls below a certain threshold, continued use brings performance degradation, failure, or even catastrophic accidents, resulting in casualties and economic losses. Therefore, accurate prediction of the state-of-health (SOH) of lithium-ion batteries (LiBs) is an essential part of the battery use process. The extensive research on estimating SOH can be broadly classified into three main categories: empirical-based approaches, model-based approaches, and data-driven approaches.
Most of the empirical-based methods use one or more battery health factors as the basis of SOH, summarize the battery chemistry from experience, and build empirical models to simulate the aging state of batteries, 1,2 such as complete discharge voltage and internal resistance, 3 battery discharge curve, 4 and so on.
Most model-based methods for SOH estimation rely on a deeper understanding of the internal chemistry of the battery and are known as filter-based methods. In essence, they are based on Bayesian probability: starting from the idea of state estimation, the to-be-determined coefficients of an empirical model are used as model states, and the state parameters are updated and corrected in real time from observed data. These methods are mainly divided into Kalman filters and particle filters (PFs). Kalman filters assume that the noise obeys a Gaussian distribution and are based on the minimum-variance estimation criterion under the assumptions of a linear system and a Gaussian probability model; variants include the extended Kalman filter, 5 the unscented Kalman filter, 6 the adaptive unscented Kalman filter, 7 and the constrained Kalman filter. 8 Compared with Kalman filters, the state-space model of a PF can be nonlinear and the noise distribution can be of any type; 9 PFs are often used in combination with other methods. For example, Mo et al. 10 combined the Kalman filter and particle swarm optimization to propose a PF-based method for SOH estimation in LiBs, Li et al. 11 proposed an online remaining useful life (RUL) prediction method based on the unscented PF and least-squares support vector machine, and Chen et al. 12 used a novel PF framework with a gray neural network to predict lithium battery SOH and RUL.
Data-driven methods have gained popularity in recent years for their ability to predict battery health states without requiring domain-specific knowledge of battery operation mechanisms and specialized mathematics, but rather by extracting features from historical data and establishing relationships between the data and battery health states. [13][14][15] With the development of hardware and the reduction of computational costs, deep learning techniques such as long short-term memory neural network (LSTM) 16 and gated recurrent unit (GRU), 17 which are variants of recurrent neural network (RNN), have been widely used in LiB SOH prediction applications. 18,19 These deep learning techniques can fully learn from the historical data and address the problems of gradient explosion and vanishing gradient in traditional RNNs. For example, Zhao et al. 20 combined the broad learning system algorithm and LSTM neural network to develop a fusion model that predicts the capacity and RUL of LiBs. Kong et al. 21 proposed a combination of deep CNN and double-layer LSTM for online battery health prediction. Pan et al. 22 proposed a fusion of the LSTM network model based on transfer learning and PF model for long-term prediction of LiBs capacity. These studies have shown promising results and offer potential for practical application in the field of battery health prediction.
Although deep learning has demonstrated powerful capabilities in the field of LiB SOH prediction, it still suffers from two problems: first, the decay trends of the training and test data are different in real situations because LiBs are affected by external environmental changes, changes in operating conditions, and changes in chemical properties. In this regard, the generalization error of the depth estimator learned with the training data under constrained conditions can be large, so the trained estimator is not suitable for direct estimation of battery data with certain differences. Second, the success of deep learning depends on a wide variety of large amounts of LiB data from which to learn the degradation pattern of battery capacity, but collecting sufficient battery data is difficult in realistic scenarios, especially under operating conditions with limited battery sensors and data transmission capabilities. Therefore, learning specific prediction schemes from inadequate training data is challenging.
To address the above problems, transfer learning provides a way forward. Transfer learning differs from traditional machine learning in that it applies knowledge learned in a source domain to a different but related target domain. Deep transfer learning can effectively address insufficient data volume and data dependency by extracting common features from the source and target domains and reducing their differences, improving the generalization ability of the network. Deep transfer learning has been successfully applied in various fields, including image recognition 23 and fault diagnosis. 24 By introducing deep transfer learning into SOH prediction, the network can learn features shared across different battery data sets and thereby achieve cross-domain SOH prediction. Numerous studies on SOH estimation based on transfer learning have been conducted. Deng et al. 25 used early aging data of batteries to achieve degradation pattern recognition and transfer learning, both of which can effectively improve SOH estimation accuracy. Li et al. 26 proposed a framework for SOH estimation based on semisupervised transfer component analysis and validated the method using mutual information analysis. Han et al. 27 took into account the mapping relationship between terminal voltage, current, and battery capacity and inserted a domain adaptive layer into the LSTM. To address the distribution differences between domains, Ye et al. 28 proposed a multisource domain adaptive algorithm that transfers knowledge learned from multiple operating conditions to the target condition. Ma et al. 29 used a convolutional neural network (CNN) to extract features from raw charge voltage traces, while the maximum mean discrepancy (MMD) was used to reduce distribution differences between training and test battery data, extending MMD from a classification task to a regression task for SOH estimation.
The SOH estimation methods mentioned above rely on MMD to measure the similarity between two probability distributions. MMD maps the two distributions to a high-dimensional feature space and calculates the distance between them in that space. However, when dealing with time-series data, MMD may fail to capture the local information of sequences and is susceptible to noise and outliers, degrading the performance of transfer learning. In addition, the above methods all use fixed hyperparameters to balance multiple training tasks, an approach that does not suit all scenarios and can lead to overfitting or underfitting. Therefore, when dealing with time-series data such as current, voltage, and SOH values, it is crucial to develop a framework that can effectively measure the similarity between domain distributions, and it is essential to balance the estimation error and the MMD loss to enhance the robustness and performance of transfer learning methods. Such a framework would greatly benefit the field of transfer learning, especially for applications related to battery health estimation and RUL prediction.
To overcome the challenges mentioned earlier, a transfer learning approach is proposed for cross-domain SOH estimation of LiBs. This approach employs soft-dynamic time warping (soft-DTW) as a statistical feature in the transfer method to incorporate the prediction error and the time-series gap into the model training process, thereby improving prediction accuracy. During training, the feature extraction and transformation bring the obtained prediction results closer to the source domain results, allowing for better prediction results in the target domain while capturing both the commonality between the data and the private properties of the target domain.
The remainder of this article is organized as follows. Section 2 presents the proposed method. Section 3 explains the LiB data sets and implementation details. Section 4 gives the experimental results and discussion. The conclusions are explained in Section 5.

| PROPOSED METHOD
The framework is made up of three parts: pretrain, sequence adaptation, and cross-domain estimation, as shown in Figure 1.

1. To estimate and learn features from the source domain data, a GRU-based estimator is employed in feature learning.
2. For the feature transformation transfer method, the computed distribution error is propagated backward as an optimization target to optimize the network parameters of the pretrained estimator. This is achieved by comparing the learned feature distributions between the source and target domains at various scales using a nonparametric distance metric.
3. A sequential adaptive domain-sharing estimator is used to make the SOH estimation, using GRU-estimated values from various distributions in the target domain.
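As an illustrative sketch of the GRU-based feature extractor in step 1, a single GRU cell can be written in a few lines of numpy. The weight names, hidden size, and input dimension below are illustrative assumptions, not the architecture used in the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.
    params holds input weights W_*, recurrent weights U_*, and biases b_*."""
    z = sigmoid(params["Wz"] @ x + params["Uz"] @ h + params["bz"])
    r = sigmoid(params["Wr"] @ x + params["Ur"] @ h + params["br"])
    h_tilde = np.tanh(params["Wh"] @ x + params["Uh"] @ (r * h) + params["bh"])
    # New state is a gated blend of the old state and the candidate state.
    return (1 - z) * h + z * h_tilde

def gru_encode(seq, params, hidden=4):
    """Run the cell over a sequence of feature vectors; the final hidden
    state serves as the learned feature representation."""
    h = np.zeros(hidden)
    for x in seq:
        h = gru_step(x, h, params)
    return h
```

In the full model this representation would feed a fully connected regression head that outputs the SOH value.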

| SOH estimation based on traditional domain adaptation
Transfer learning aims at transferring knowledge learned in one domain to another, where the existing knowledge is called the source domain and the new knowledge to be learned is defined as the target domain. Domain adaptation is an important branch of transfer learning; it maps source domain features and target domain features to a high-dimensional space via a kernel function and reduces their distance in that space. The classical method for computing the distance between features is MMD, which is described by the following equation:

$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{N_S} \sum_{i=1}^{N_S} \phi\left(x_i^S\right) - \frac{1}{N_T} \sum_{j=1}^{N_T} \phi\left(x_j^T\right) \right\|_{\mathcal{H}}^2,$$

where $X_S$ and $X_T$ represent the source and target domain data, respectively, $N_S$ and $N_T$ are the lengths of the $X_S$ and $X_T$ feature representations, and $\phi(\cdot)$ maps features into a reproducing kernel Hilbert space $\mathcal{H}$; the whole formula represents the MMD between the two domains of data in that space.
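Expanding the squared norm with the kernel trick gives a biased empirical estimate of the squared MMD. A minimal sketch follows; the Gaussian kernel and its bandwidth are illustrative assumptions, not choices taken from the article:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd(Xs, Xt, sigma=1.0):
    """Biased empirical squared MMD between source features Xs (Ns, d)
    and target features Xt (Nt, d): E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    k_ss = np.mean([gaussian_kernel(a, b, sigma) for a in Xs for b in Xs])
    k_tt = np.mean([gaussian_kernel(a, b, sigma) for a in Xt for b in Xt])
    k_st = np.mean([gaussian_kernel(a, b, sigma) for a in Xs for b in Xt])
    return k_ss + k_tt - 2 * k_st
```

Identical distributions give a value near zero; the estimate grows as the two feature clouds drift apart.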
Unlike the traditional SOH estimation, the domain-based SOH estimation will have data from both source and target domains, with the source domain being our known knowledge and the target domain being the domain where we want to learn new knowledge, both of which have similar but not identical distributions, that is, different materials, shapes, and usage scenarios in the battery domain. In transfer learning, the model is pretrained on the source domain to get a pretrained model, which shortens the time for the model to get the best performance in the domain adaptation phase.
In this stage, the model learned in the source domain is further trained using data from the target domain. If the model is trained using only the target domain data, this reduces to transfer learning's traditional fine-tuning approach. In addition, we examine the difference between the source and target domains and use it as one of the objective functions for domain adaptation. The loss function is

$$L = \frac{1}{N} \sum_{t=1}^{N} \left( SOH_t - \widehat{SOH}_t \right)^2 + \lambda \, \mathrm{MMD}(X_S, X_T),$$

where $SOH_t$ and $\widehat{SOH}_t$ are the true and estimated SOH at time step $t$, $N$ is the total length of observed data, and $\lambda$ is the hyperparameter balancing the two terms. For the selection of $\lambda$ in deep transfer learning, the typical method is to fix it at a constant value and experiment to find the ideal setting. Instead, we introduce the dynamic weight average (DWA), which dynamically adjusts the value to equalize the weights of the various tasks.
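A minimal sketch of this combined objective, assuming a mean-squared-error estimation term and an already-computed domain distance (the function name and fixed `lam` default are illustrative):

```python
import numpy as np

def transfer_loss(soh_pred, soh_true, domain_distance, lam=0.5):
    """Total loss = mean-squared estimation error + lambda * domain distance.
    domain_distance would be the MMD (or, later, soft-DTW) term."""
    mse = np.mean((np.asarray(soh_pred) - np.asarray(soh_true)) ** 2)
    return mse + lam * domain_distance
```

With a fixed `lam` this is the conventional weighting the article argues against; DWA replaces it with a per-round adaptive weight.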

| Soft-DTW
A soft-DTW approach, suitable for use with neural networks, is presented to reduce the discrepancy between the common characteristics of time series.
The DTW algorithm is widely used in speech recognition, 30 online signature matching, 31 gesture recognition, 32 data mining, 33 and time-series clustering, 34 as well as music and signal processing. When there is some drift in the time series or when the series lengths differ, Euclidean distance is not a reliable measure of dissimilarity, but DTW can still determine how similar two time series are. By enabling "elastic" alignment of the time series to search for comparable patterns at different phases, the DTW algorithm reduces the effects of temporal shifts and distortions. The specific algorithm is as follows.
Given two time series $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$, the algorithm begins by creating a distance matrix $C \in \mathbb{R}^{n \times m}$, where $c_{i,j}$ stands for each pairwise distance between $X$ and $Y$; this distance matrix is known as the local cost matrix for the alignment of the two sequences. The cumulative distances of the best regularized paths, which define the overall matching cost, are summed to obtain the DTW distance. DTW uses dynamic programming to find the minimum matching cost, with the recursion

$$R_{i,j} = c_{i,j} + \min\left( R_{i-1,j},\; R_{i,j-1},\; R_{i-1,j-1} \right).$$

The minimal matching cost $R_{n,m}$ gives the degree of similarity between the two sequences.
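The recursion above can be implemented directly with dynamic programming. This sketch uses the squared difference as the pairwise cost, one common choice (the cost function is an assumption, not specified by the article):

```python
import numpy as np

def dtw(x, y):
    """Classic DTW distance between 1-D series x and y.
    R[i, j] = cost(x_i, y_j) + min(R[i-1, j], R[i, j-1], R[i-1, j-1])."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0  # only the empty-vs-empty alignment starts at zero cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + min(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1])
    return R[n, m]
```

Note that a series warped by repetition still matches its original perfectly, which is exactly the drift tolerance Euclidean distance lacks; the nested loops cost O(nm).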
However, the nondifferentiable nature of the DTW computation means it cannot be used directly as a loss function in neural networks, so Cuturi and Blondel 35 developed the soft-DTW formulation, which smooths the dynamic-programming distance over all alignments. Since soft-DTW is a differentiable loss function, both its value and its gradient can be computed. Equation (7) expresses DTW as a minimization over alignment matrices:

$$\mathrm{dtw}(x, y) = \min_{A \in \mathcal{A}(n, m)} \langle A, \Delta(x, y) \rangle. \quad (7)$$

Since the minimization $\min$ is a discrete operation, this yields a nondifferentiable DTW. Soft-DTW replaces it with a continuous soft-min,

$$\min{}^{\gamma}(a_1, \ldots, a_k) = -\gamma \log \sum_{i=1}^{k} e^{-a_i/\gamma}.$$

Equation (9) is the objective function of soft-DTW, in which each element $A$ of $\mathcal{A}(n, m)$ represents an alignment matrix of the two time series; for a particular alignment matrix, only the points $(i, j)$ on the path are 1 and the rest are 0, so the matrix inner product $\langle A, \Delta(x, y) \rangle$ denotes the cost sum along that path:

$$\mathrm{dtw}_{\gamma}(x, y) = \min{}^{\gamma}_{A \in \mathcal{A}(n, m)} \langle A, \Delta(x, y) \rangle. \quad (9)$$

In the backpropagation of neural networks, the gradient of the objective function, $\nabla_x \mathrm{dtw}_{\gamma}(x, y)$, must be computed, as in Equation (10).
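A minimal sketch of soft-DTW: the hard min in the DTW recursion is replaced by the soft-min, with γ as the smoothing parameter (squared-difference cost assumed, as before; as γ → 0 the result approaches ordinary DTW):

```python
import numpy as np

def soft_min(args, gamma):
    """Differentiable soft-min: -gamma * log(sum(exp(-a / gamma))).
    The max is subtracted first for numerical stability."""
    a = np.asarray(args) / -gamma
    m = a.max()
    return -gamma * (m + np.log(np.sum(np.exp(a - m))))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW: the DTW recursion with min replaced by soft-min,
    making the distance differentiable so it can serve as a loss."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[n, m]
```

Because the soft-min averages over all alignment paths, soft-DTW of two identical series can be slightly negative; this does not matter when it is used as a relative training objective.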

| DWA
Appropriate loss weights must be set for the various tasks to balance several training objectives. We dynamically adjust the loss weight of each task in each training round using DWA, 36 ensuring that all tasks are treated as equally important.
In Equation (11), the relative descent rate of the $k$th task in the $t$th training round is defined as the ratio of the corresponding loss values from the two previous rounds:

$$w_k(t-1) = \frac{L_k(t-1)}{L_k(t-2)}. \quad (11)$$

If $w_k(t-1)$ is small, the $(t-1)$th training round reduced the loss and learning for this task is progressing well, so the attention paid to this task can be suitably lowered. The $k$th task's weight is computed as in Equation (12):

$$\lambda_k(t) = \frac{K \exp\left( w_k(t-1)/T \right)}{\sum_{i=1}^{K} \exp\left( w_i(t-1)/T \right)}, \quad (12)$$

where $K$ is the number of tasks. The temperature $T$ in Equation (12) determines how flat the weight distribution is; for a large $T$, the weights of all tasks are close to 1. For the first and second training rounds, the weights are set to 1.
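The DWA update can be sketched in a few lines (function and argument names are illustrative):

```python
import numpy as np

def dwa_weights(prev_losses, prev2_losses, T=2.0):
    """Dynamic weight average: tasks whose loss fell more slowly get
    larger weights.  w_k = L_k(t-1) / L_k(t-2); the weights are a
    temperature-T softmax of w, scaled so they sum to K."""
    w = np.asarray(prev_losses) / np.asarray(prev2_losses)
    e = np.exp(w / T)
    return len(w) * e / e.sum()
```

When all tasks improve at the same rate, every weight equals 1, recovering an unweighted sum of losses.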

| SOH estimation based on transfer neural network with soft-DTW
We propose a deep transfer learning method based on soft time-series differences for lithium SOH estimation. It addresses two shortcomings of conventional domain adaptation based SOH estimation: the lack of a good measure of the distance between time series, and the poor balance between the two optimization objectives during backpropagation.
Soft-DTW is used in forward propagation to calculate the distance between the two input time series, which is added to the backpropagated loss as one of the optimization objectives. DWA is used in backpropagation to strike a good balance between the twin goals of reducing the network's estimation error and reducing the time-series discrepancy.
The proposed method's structure is shown in Figure 2. To obtain the SOH values, the data in the source and target domains are passed through two gray correlation (GC) and GRU layers. The optimization has two goals: first, to close the gap between the estimated and actual SOH values in the target domain; second, to shorten the distance between the source and target SOH values. The aim is achieved by employing DWA to balance the two objectives during backpropagation of the gradient.

| EXPERIMENTAL DATA AND DETAILS
Capacity deterioration data from the LiB testbed at the NASA PCoE Center 37 were used for the test. An 18650 LiB with a 2 Ah rated capacity was utilized. The tests included electrochemical impedance spectroscopy, charging, and discharging measurements. The SOH values are shown in Figure 3. The battery data comprised capacity, voltage, impedance, charging current, and discharging current.
In addition to capacity, we also extracted health indicators (HIs) from the data: the time interval of equal charging voltage difference (constant current) 38 and the time interval of equal charging current difference (constant voltage). 39 GC analysis was then used to determine how closely the HIs track capacity. The experimental findings show the efficacy of both HIs as degradation indicators.
For deep learning, the selection of hyperparameters is crucial and greatly affects model performance. The hyperparameters in this work are divided into network framework, pretraining process, and domain adaptive process; the specific values are shown in Table 1.

FIGURE 2 Flowchart of the training procedure for the soft-dynamic time warping (soft-DTW) domain adaptation network. The primary training objectives for the proposed approach are twofold: (1) to minimize the discrepancy between the estimated and actual state-of-health (SOH) values, and (2) to reduce the distance between the source and target SOH values. To balance these objectives, we employed dynamic weight average to adjust the weights assigned to each objective during the training process. FC, fully connected; GRU, gated recurrent unit.

FIGURE 3 State-of-health (SOH) changes of the three NASA batteries.
For pretraining in the source domain, the number of epochs is 500. Too few epochs cannot fit the estimated SOH curve of the LiB or capture the characteristics common to the source and target domains, while too many epochs lead to overfitting, which is counterproductive to performance in the target domain.
In domain adaptation, the pretrained model already performs well in the target domain, so we set the learning rate small so that the model eventually finds the optimal solution. We set different learning rates for different layers: the learning rate of the GRU layer is one-tenth that of the FC layer, because the GRU layer at the beginning of the network learns more general, generic features, while the FC layer near the output learns more specific, less transferable features. The larger learning rate of the FC layer helps the network better learn the higher-level features of the target domain.
In this article, root-mean-square error (RMSE) and mean absolute error (MAE) are used as the performance evaluation metrics:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( SOH_t^{\mathrm{actual}} - SOH_t^{\mathrm{predicted}} \right)^2}, \quad (13)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| SOH_t^{\mathrm{actual}} - SOH_t^{\mathrm{predicted}} \right|. \quad (14)$$

| RESULTS AND DISCUSSION

| Comparison of different SOH estimators without domain adaptability
In this section, three cells from the NASA data set (Nos. 05, 06, and 07) were chosen to test the SOH estimation capability of SDDAN. The cells were charged at a constant current of 1.5 A until the cell voltage reached 4.2 V and then in constant-voltage mode until the charging current decreased to 20 mA; they were discharged at a constant current of 2 A until the voltages of cells 05, 06, and 07 dropped to 2.7, 2.5, and 2.2 V, respectively, so each cell had a distinct state of charge at the end of discharge. Table 2 lists the data sets utilized for the studies. For three sets of tests, we chose one battery as the source domain and another as the target domain. Only the top 25% of the target domain is used for training in domain adaptation. Figure 4 contrasts SDDAN with a nontransfer learning approach, which trains the model using only the top 25% of the target domain's data. The results demonstrate that the SOH curves predicted by SDDAN are consistently closer to the actual SOH values, with an absolute SOH error smaller than 0.1. Despite the variations in cell distribution, SDDAN is able to learn domain-invariant characteristics from both the source and target domains. The performance of SDDAN with and without transfer learning is further contrasted in Table 3.

| Comparison of SDDAN with traditional transfer learning methods
The traditional domain adaptive approach is based on using the probability density function to measure the distance between the source and target domains, and using fixed weights to balance the tradeoff between fitting error and domain divergence. In this section, a series of comparisons are made between SDDAN and other transfer learning methods.
According to Figure 5, we compared SDDAN with traditional transfer learning methods, where fine-tuning means training the pretrained model with additional data from the target domain, and MMD-based adaptation maps the data to a shared feature space and minimizes the discrepancy between the source and target distributions in that space.
As shown in the comparison results, SDDAN can precisely quantify and minimize distribution differences under a variety of operating settings, making it a useful tool for battery state estimation in real-world engineering applications.

| Effect of train epoch for cross-domain SOH estimation
The training runs in the aforementioned trials all use 50 epochs. Because soft-DTW can achieve the desired effect in a short amount of time, the number of training epochs should be kept small to prevent overfitting. In this section, we discuss the effect of the number of training epochs on the estimator. Figure 6 plots RMSE as a function of epoch count for different transfer learning methods. As shown in Figure 6, the MMD method performs better than fine-tuning by taking the differences between domains into account, and the SDDAN method performs best by better capturing the differences between time series. All methods converge after around 50 epochs and obtain adequate SOH estimation accuracy. Although longer training would undoubtedly produce slightly better results, it would require more computing time and resources.

| Robustness testing of SOH estimator
LiB data from the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland, 40 College Park, were used to further confirm the robustness of the procedure. In this study, four cells from the experiments, CX33, CX38, CS33, and CS38, were chosen. CX denotes lithium cobalt oxide batteries with a capacity of 1350 mAh, while CS denotes lithium cobalt oxide batteries with a capacity of 1100 mAh. Cells CX33 and CS33 were discharged at 0.5 C, and cells CX38 and CS38 at 1 C. The SOH curves are shown in Figure 7. We performed two sets of experiments using the CALCE data, with CX33 and CS33 as the source domains and CX38 and CS38 as the corresponding target domains. As with the NASA data set, the results of comparing different transfer learning methods using only 25% of the data in the target domain are shown in Figure 8 and Table 5.
The results demonstrate that, among these transfer learning methods, SDDAN achieves the best SOH estimation. The experimental results also demonstrate that, in practical engineering applications, SDDAN can still capture the intricate link between process data (such as current, voltage, and temperature) and battery SOH. Additionally, the use of soft-DTW substantially reduces the distribution discrepancies under various operating situations, making SDDAN a useful tool for battery SOH estimation in real-world engineering applications.

| CONCLUSION
SDDAN is proposed in this research as a solution to the problem of inconsistent data distributions across time-series domains. SDDAN measures the distance between time series using soft-DTW and backpropagates it as a loss function. In the experiments, the loss weights of the tasks were balanced using DWA. The proposed approach produces better prediction results than the underlying neural network as well as other transfer methods, and it does not require complex weight tuning. According to the experimental results, SDDAN performs knowledge transfer more effectively and captures common features among time series. It provides new ideas for SOH estimation and RUL prediction when data are insufficient, which is beneficial for practical applications.