Drift‐Aware Feature Learning Based on Autoencoder Preprocessing for Soft Sensors

In this article, a novel approach is presented for drift-aware feature learning aimed at calibrating drift biases in soft sensors for long-term use. The proposed method leverages an autoencoder for data preprocessing to extract expressive signal drift trace features and incorporates drift characteristics through the latent space representation in a long short-term memory (LSTM) regression neural network. The results demonstrate that the proposed approach outperforms other typical recurrent neural networks, such as LSTM, gated recurrent unit, and bidirectional LSTM, reducing the root mean square error by 60% for the training dataset (≈2.5 h) and by 80% for the testing dataset (≈20 h). The proposed approach has the potential to optimize the performance of soft sensors subject to long-term drift and to reduce the need for frequent recalibration. By compensating for sensor drift using existing prior information and limited time data, the proposed neural network can effectively reduce the complexity and computational burden of the system, without the need for additional settings or hyperparameter fine-tuning.


Introduction
The development of soft sensors has garnered significant attention in the field of soft robotics due to their compliance with the robot body and potential for miniaturization. Piezoresistive sensors [1][2][3][4][5][6][7] are widely used in soft robots for proprioception and exteroception. The accuracy of sensor measurement data is crucial for the soft robot's perception and control. However, soft sensors may exhibit high nonlinearity, hysteresis, and long-term drift of sensing signals, which pose significant challenges to calibration algorithms. [5,6,8] The main causes of drifting bias in the sensor output signal are polymer degradation and the viscoelasticity of the material. [9,10] If not calibrated properly, the drift in the sensor signal can adversely affect the controller's performance, leading to a failure to achieve the desired optimal performance. Such biases narrow the mapping of the predicted detection range and accumulate over time, potentially causing malfunction of the subsequent controller. While the pre-straining method has been shown to mitigate signal drift to a certain extent, [2,11,12] it introduces a trade-off, as it may result in a nonreproducible sensor response characterized by sensitivity fluctuations.
The data-driven approach has become widely adopted in soft sensor calibration. [8,13,14] Machine learning approaches can perform regression by leveraging datasets without explicitly modeling the sensor's characteristics (e.g., the long short-term memory [LSTM] neural network). [14] However, long-term drift can cause the sensor signal to deviate significantly from its initial state. As the drift level increases over time, typical data-specific machine learning methods can become less reliable and trustworthy in terms of prediction, [8] as depicted in Figure 1a. Moreover, repetitive calibration is time-consuming and ineffective.
Some researchers have redefined the drift problem as domain adaptation and solved it using optimal transportation transfer learning, [15] while others have utilized Gaussian mixture domain adaptation [16] and a deep subdomain learning adaptation network [17] to address it. To alleviate time-variant shift, a stateful LSTM network was implemented by training various initial conditions through a batch sampling process. [18] Although some researchers have attempted to mitigate sensor baseline drifting through machine learning approaches, the investigation and validation of solutions specifically tailored to long-term sensor usage drift (over one hour) remain unexplored. Drawing inspiration from the successful application of unsupervised learning methods, such as the autoencoder (AE), [19][20][21] in detecting sensor faults, there have been notable instances where unsupervised learning methods have been employed to address the drift detection problem. For example, Kim et al. utilized a variational AE model to detect semiconductor faults characterized by time-varying process drift. [19] In a separate study, Tian et al. employed Bayesian inference in conjunction with an AE to estimate and calibrate sensor drift bias caused by high thermal density, thereby achieving optimal control in a data center cooling system. [20] Moreover, Mirzaei et al. utilized sparse AE-based transfer learning to estimate hydrogen gas concentration and address the data distribution shift problem arising from instrumental variation. [21] However, it is worth noting that these methods primarily focus on generating signals for normal conditions to detect sensor drift. Conversely, extracting drift pattern features through the representation learning of AEs becomes crucial for learning the regression relationship with drift correction.
The AE is an unsupervised learning model that does not require labeled input data during the training process and learns dense latent feature representations. [22,23] It is commonly utilized for feature extraction, [24] dimensionality reduction, and denoising. [25] Furthermore, AEs find widespread application in anomaly detection [26] and feature recognition. [27] The origins of AE research can be traced back to the 1980s, when neural networks were employed to learn input encodings for high-dimensional complex data processing. [28] Since the 2000s, the exploration of AEs has been further accelerated by the rapid advancement of machine learning techniques, resulting in the development of various types such as the undercomplete AE, sparse AE, stacked AE, and denoising AE. [29] The undercomplete AE prevents overfitting by reducing the dimension of the hidden layer compared to the input dimension, thereby capturing the most significant features of the training data. In contrast, the sparse AE incorporates a sparsity loss term to penalize the activation level of hidden units and includes a regularization term (L1 or L2 regularization) in the loss function to encourage the model to learn more general features. [30] Stacked AEs can learn multiple levels of representation and abstraction by stacking multiple AEs. [31] Following layer-wise unsupervised pretraining, a classifier or regressor is added to the final layer, and the entire neural network is fine-tuned using backpropagation to optimize the hyperparameters. Additionally, the denoising AE employs corrupted data, such as the original data with added Gaussian noise, as input to generate lossless output data, thereby enhancing generalization ability and robustness against data corruption. In this study, the sparse AE was selected as the proposed learning method because its sparsity loss term enables the learning of more general features.
In this article, we present an integration of the AE into the calibration process of a kirigami-inspired piezoresistive sensor. Similar sensor designs have been presented in our previous work. [5] During the sensor's operation, we observed sensing signal drift caused by material failure during stretching. To mitigate this issue, we propose the utilization of an AE for drift-aware feature learning, specifically targeting the calibration of drift biases in soft sensors for long-term applications. Our method combines a data preprocessing AE with an LSTM regression neural network to extract informative data features and incorporate drift characteristics, including drift signal traces, through the latent space representation.
By incorporating processed dynamic historical information and signal derivatives, we establish a temporal sensor regression relationship between the sensor signal and the desired physical measurement quantity. This relationship is encoded within the latent representations generated by the proposed neural network. As a result, our approach effectively compensates for sensor drift using existing prior information and limited time data, without requiring additional settings or hyperparameter fine-tuning. Consequently, the complexity and computational burden of the system are reduced significantly. Altogether, this article contributes the following. 1) A novel drift-aware feature learning approach is proposed to compensate for the long-term drift in sensor signals by leveraging existing prior information and limited time data. The approach utilizes an AE to capture and extract the drift features, which are then fed to an LSTM neural network to learn the temporal regression relationship. 2) The proposed solution employs two strategies that leverage the characteristics of the output measurement quantity to segment the drift signal trace datasets. The datasets carrying drift patterns and hidden temporal information are clustered using unsupervised learning methods based on the correlation among the sensor output signal, output quantity, and time. 3) Performance enhancement techniques are introduced and discussed, including the inclusion of signal derivatives and the avoidance of the saturation region.
The remainder of this article is organized as follows. Section 2 outlines the algorithm for drift-aware feature learning and provides an overview of the neural network structure. The experimental setup for validating the long-term drift calibration of the kirigami sensor is presented in Section 3. The results of the neural network training progress and performance are analyzed and evaluated in Section 4, while measurement noise and microcracks during sensor stretching are discussed in Section 5. A summary of the current methods and future work is presented in Section 6.

Algorithm
The soft sensor signal tends to drift from its origin due to the inherent properties of the material, such as viscoelasticity or polymer degradation, as well as material fatigue during stretching. [5,8] Soft sensors that employ viscoelastic materials experience stress relaxation during deformation, while sensors designed with a buffering-by-buckling structure (kirigami-inspired structure) to enable high stretchability may suffer from fatigue due to significant stress concentration during cyclic trials. [5] The measurement of a flexible piezoresistive strain sensor made from conductive silicone with a kirigami-inspired structure, as shown in Figure 2, provides a good illustration of this drift phenomenon. Due to the rearrangement of the conductive fillers and the viscoelastic response exhibited by the elastomer matrix, [2,32] the sensor undergoes long-term displacements (as depicted in Figure 2a), resulting in temporal variations in the level of drift observed in the sensor signal over time, as illustrated in Figure 2b. This temporal drift subsequently influences the relationship between the sensor's output signal and the corresponding physical measurement quantity, in this case the extended length, as demonstrated in Figure 2c. This deviation in the mapping relationship renders typical regression methods ineffective and requires repetitive sensor calibration.
To address this bias behavior, we propose a drift-aware feature learning approach that utilizes an AE to preprocess historical data and feeds a typical LSTM regression neural network with condensed latent representations of the drift trace information. The proposed algorithm architecture consists of a data preprocessing AE and an LSTM neural network regressor, which are used in both the learning phase (Figure 4a) and the updating phase (Figure 4b). The AE is capable of learning and extracting the featured drift characteristics, such as patterns, through the latent space representation. The regression neural network then learns the mapping between the preprocessed data carrying temporal drift information and the desired physical quantity.

Drift-Aware Feature Learning Stage
Due to the inherent nature of soft materials, the signal of soft sensors exhibits high nonlinearity, hysteresis, and long-term drift. Unsupervised learning models, such as AEs, can be adopted to capture featured information from the drifting signal trace datasets, as they do not require labeled data, in contrast to supervised learning approaches. As illustrated in Figure 3a, the signal trace datasets can be segmented into several sets (clusters) corresponding to the range of the measured physical quantity. For instance, the measured physical quantity, the extended length of the kirigami sensor in this case, can be divided into three regions (0–20, 60–80, and 80–110 mm, respectively). The clusters of these regions contain the corresponding temporal sensor readings, forming the signal drift datasets. Alternatively, the dataset partition can be treated as a clustering problem and solved using unsupervised learning techniques, such as k-means clustering, [33] as shown in Figure 3b. To be more specific, the first approach involves segmenting based on the output region, where the output range is equally divided into multiple regions. Sensor readings are then clustered based on their corresponding outputs. This method assumes that each region contains unique information regarding signal drift, which contributes to the regression learning of the mapping relationship. Alternatively, the second segmentation method utilizes unsupervised clustering techniques such as k-means clustering. Unlike the region-dependent segmentation, this approach incorporates an additional dimension of time by considering the correlation among the sensor output signal, extended length, and time. As a result, the clusters generated through this method are data driven and encompass multiple dimensions. Each time-series trace refers to a certain range of measurement and involves spatial and hidden temporal knowledge, representing the status of the sensor readings. Such information is learned and extracted as a latent space representation in the AE, enabling the generation of unseen levels of drift features.
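For concreteness, the following MATLAB sketch illustrates both segmentation strategies under stated assumptions: the variable names signal, len, and t (column vectors of the sensor reading, extended length, and time) are introduced here for illustration, the region edges follow Figure 3a, and kmeans is taken from the Statistics and Machine Learning Toolbox. It is a minimal sketch, not the exact implementation.

```matlab
% Illustrative sketch of the two segmentation strategies.
% signal, len, t: column vectors of sensor reading, extended length, time.

% Strategy 1: region-based segmentation by the measured output quantity.
regionEdges = [0 20; 60 80; 80 110];          % regions in mm, as in Figure 3a
clusters = cell(size(regionEdges, 1), 1);
for r = 1:size(regionEdges, 1)
    inRegion = len >= regionEdges(r, 1) & len <= regionEdges(r, 2);
    clusters{r} = signal(inRegion);           % temporal traces of this region
end

% Strategy 2: unsupervised k-means over [signal, length, time] (Figure 3b).
X = normalize([signal, len, t]);              % z-score each dimension
k = 10;                                       % cluster number (tuned in Section 4)
clusterIdx = kmeans(X, k, 'Replicates', 5);   % data-driven, multidimensional
```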
In addition, the drift level of the sensor signal is time variant and related to velocity. Influenced by the viscoelastic material, the sensor signal is also sensitive to the sensor's deformation speed, which has been exploited by researchers who incorporated the first-order and second-order derivatives of the sensor readings into neural network learning to improve the regression performance. [5] These parameters are directly associated with velocity and acceleration, while higher-order derivatives may hold less physical significance. Along with these derivatives, the preprocessed data and sensor signal are fed to the subsequent regression neural network, which enhances awareness of the sensor signal's long-term drift. The learning procedure is explicitly illustrated in Figure 4a.
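As a minimal sketch (assuming a uniform sampling rate fs, a name introduced here), the derivative features can be computed and concatenated as follows:

```matlab
% Sketch: augmenting the regressor input with signal derivatives.
dt = 1 / fs;                  % fs: sampling rate in Hz (assumed name)
d1 = gradient(signal, dt);    % first-order derivative (velocity-related)
d2 = gradient(d1, dt);        % second-order derivative (acceleration-related)
features = [signal, d1, d2];  % later concatenated with AE latent features
```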

Drift-Aware Feature Updating Stage
In real-world scenarios, the sensor signals are sequential time series. After estimating the first predicted measurement through the regression neural network, the drift traces of the corresponding physical quantity region (cluster) are updated using the current sensor reading. For the traces beyond the cluster range, the next data point retains the value of the previous point for each trace. All the clusters containing signal traces are processed in the AE and extracted into latent representations. The preprocessed data, along with the next sensor reading and its derivatives, are then fed to the regression neural network again for the next estimation, as illustrated in Figure 4b.
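The following MATLAB sketch outlines this updating loop under stated assumptions: updateTraces is a hypothetical helper standing in for the cluster bookkeeping described above, while encode and predictAndUpdateState are standard Deep Learning Toolbox calls; the exact implementation may differ.

```matlab
% Schematic of the updating phase (Figure 4b); updateTraces is hypothetical.
% yPrev holds the previous estimate, initialized by a first prediction.
for nStep = 2:numel(signal)
    % 1) Update the traces of the cluster covering the previous estimate;
    %    traces beyond that cluster repeat their last value.
    traces = updateTraces(traces, signal(nStep), yPrev);

    % 2) Re-encode all traces into condensed latent representations.
    latent = encode(autoenc, traces);

    % 3) Feed the latent features, new reading, and its derivatives to the
    %    stateful LSTM regressor for the next estimate.
    xStep = [latent; signal(nStep); d1(nStep); d2(nStep)];
    [net, yPrev] = predictAndUpdateState(net, xStep);
end
```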

Neural Network Architecture
The neural network architecture utilized in this study is illustrated in Figure 4, where the main components of the architecture include an AE and an LSTM regression neural network.
As shown in Figure 4a, the AE is composed of an encoder and a decoder. [34,35] In the encoder part, the input x ∈ ℝ^D is mapped to the latent space representation z through a transfer function h:

$$z = h(Wx + b)$$

where W represents the weight matrix and b represents the bias vector.
In the decoder part, the compressed representation z is mapped to a reconstruction x̂ of the original input through another transfer function h′:

$$\hat{x} = h'(W'z + b')$$

where h′, W′, and b′ denote the transfer function, weight matrix, and bias vector of the decoder, respectively.
The AE is designed to reconstruct its input x at the output x̂, and it is trained to minimize the loss function, represented by the mean squared error (MSE) between the original input x and the replicated output x̂:

$$L_{\mathrm{MSE}} = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(x_{kn} - \hat{x}_{kn}\right)^{2}$$

where N is the total number of training samples and K is the size of the input layer. For sparse AEs, a penalty regularization term Ω_sparsity is introduced into the loss function to encourage sparsity of the whole neural network. This term leverages the average output activation level of a neuron, ρ̂_i:

$$\hat{\rho}_{i} = \frac{1}{n}\sum_{j=1}^{n} h\left(w_{i}^{\top} x_{j} + b_{i}\right)$$

where n represents the total number of input samples used for training. The subscript j denotes the jth input sample, whereas the subscript i refers to the ith row of the weight matrix W and the ith entry of the bias vector, respectively. The Kullback–Leibler divergence, a function measuring the difference between two distributions, is used as the sparsity regularization term Ω_sparsity: [36]

$$\Omega_{\mathrm{sparsity}} = \sum_{i} \mathrm{KL}\left(\delta \,\middle\|\, \hat{\rho}_{i}\right) = \sum_{i}\left[\delta \log\frac{\delta}{\hat{\rho}_{i}} + (1-\delta)\log\frac{1-\delta}{1-\hat{\rho}_{i}}\right]$$

where δ is the desired sparsity proportion of the hidden units.
For feature extraction, another penalty term, the L2 regularization Ω_weights, is added to the loss function; it accounts for the sum of the squared entries of the weight matrices of each layer:

$$\Omega_{\mathrm{weights}} = \frac{1}{2}\sum_{l=1}^{L}\sum_{j=1}^{n_{l}}\sum_{i=1}^{k_{l}}\left(w_{ji}^{(l)}\right)^{2}$$

where the number of hidden layers is denoted by L, while n_l and k_l represent the output size and the input size of layer l, respectively. The eventual form of the loss function L for the sparse AE is defined as

$$L = L_{\mathrm{MSE}} + \lambda\,\Omega_{\mathrm{weights}} + \beta\,\Omega_{\mathrm{sparsity}}$$

where λ and β represent the coefficients of the L2 regularization term and the sparsity regularization term, respectively.

In this work, the performance of different neural networks is evaluated and listed in Table 3. The evaluation involves typical recurrent neural network (RNN) architectures and the proposed neural network with different clustering regions. For neural networks with fewer than five clusters, each cluster is trained with its own AE to ensure good replication capability. In comparison, unsupervised clustering can generate numerous clusters, which are trained using a single AE to extract condensed latent representations. For example, 10 clusters are generated from the k-means clustering approach and then condensed in the AE to 5 latent representations (Table 3). By convention, AE([0,20]) represents the neural network with an AE whose cluster contains signal trace datasets from 0 to 20 mm.
The training and testing of the whole neural network are implemented in MATLAB 2023a using the Deep Learning Toolbox, with the following PC configuration: an Intel i5-10400 2.9 GHz processor, an NVIDIA GeForce RTX 3060 GPU, and 16 GB of RAM.

Materials and Manufacturing Methods
The drift-aware feature learning is validated in a real-world application: a kirigami soft strain sensor developed from conductive silicone, as shown in Figure 5. The fabrication process follows previous work. [5] The material used for fabrication is conductive silicone with a Shore hardness of 62 A (KE-3601SB-U, Shin-Etsu Silicone, Japan). Initially, a 20 × 30 mm kirigami pattern layout is designed using AutoCAD. The material sheet is then laser cut following the predesigned kirigami pattern (CMA960, YUEMING Laser, China) and cleaned of surface dust using isopropyl alcohol. Finally, the two ends of the sensor are taped around copper electrodes with leading wires.

Experimental Settings
The experiments are implemented in a customized universal tensile testing machine, as shown in Figure 5. One end of the kirigami soft strain sensor was fixed by a jig, and the other end was stretched by a jig on the moving platform of the slide screw. To measure its resistance changes, the piezoresistive strain sensor was connected to a voltage divider circuit in series with a 10 kΩ resistor.
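Assuming the DAQ reads the voltage V_s across the sensor (an assumption; the circuit details beyond the 10 kΩ series resistor are not specified here), the sensor resistance follows from the standard divider relation:

$$V_{s} = V_{\mathrm{in}}\,\frac{R_{s}}{R_{s} + R_{\mathrm{ref}}} \quad\Longrightarrow\quad R_{s} = R_{\mathrm{ref}}\,\frac{V_{s}}{V_{\mathrm{in}} - V_{s}}$$

where R_ref = 10 kΩ and V_in is the supply voltage.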
Details on the data acquisition process are as follows. To acquire adequate long-term usage trace datasets, the soft strain sensor was stretched to random positions within a range of 110 mm (≈370% elongation ratio) for a duration of 22 h. The speed of the moving platform of the slide screw varied between 100 and 400 mm min−1. To ensure the stability of the entire test, the sensor was securely fixed to the moving platform, and its initial position was carefully marked before the test. After the cyclic stretching, the position of the sensor was verified to confirm the test's overall stability. A DAQ device (USB-6212, National Instruments, USA) was used to record the raw sensor measurement and the extended length.
Deep learning approaches, such as LSTM, require a large amount of data for training to prevent overfitting and poor model generalization. [37] A dataset of sufficient time scale is desired to validate the long-term performance of the proposed method. Consequently, the dataset is divided as follows: the training dataset consists of the initial 2.5 h, and the testing dataset consists of the following 19.5 h. Note that the data from the first 10 min are excluded from the training dataset, since the sensor is initially stretched only as a warm-up. Regarding the impact of temperature on the sensor's performance, the experimental environmental temperature was controlled at 23 °C.

Results
The model structure of the AE is determined through fine-tuning of various sets of hyperparameters, guided by empirical evaluation. For the activation function, the conventional sigmoid activation function is employed in the encoder. In the decoder component of the neural network, the objective is to reconstruct continuous drift signals without imposing any specific range constraints, leading to the utilization of a linear activation function.
The selection of the AE model is based on the model's capability to adequately replicate the input data (indicated by low MSE) and its capacity to extract condensed hidden representations for feature extraction, while maintaining sufficient sparsity for generalization ability.Considering the AE model that satisfies these requirements (highlighted in red color in Table 1), the associated neural network hyperparameters are employed for preprocessing the data to extract drift traces.
Following the fine-tuning process, the AE structure utilized in the proposed method consists of an encoder with a sigmoid activation function, hidden representations with a size half that of the input data dimension, and a decoder with a linear activation function. The AE is trained for 1000 iterations on historical datasets, with the following hyperparameter settings: the coefficient of the L2 weight regularization (λ) is set to 0.001, the sparsity threshold (δ) is set to 0.05, and the coefficient of the sparsity regularization term (β) is set to 1. After obtaining latent features from the AE, the sensor signal along with the preprocessed data is fed to the regression model to handle sensor drifting and train a reliable regressor. Typically, machine learning-based sensor calibration methods employ LSTM neural networks for regression learning. In conventional approaches, the sensor's output signal is used as input to establish the mapping relationship. However, in our proposed method, we introduce the condensed drift tracking information obtained after preprocessing with the AE model. This additional information, combined with the inclusion of kinematic knowledge (signal derivatives), contributes to the awareness of signal drift during the regression relationship learning process.
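A minimal MATLAB sketch of this preprocessing step, using the Deep Learning Toolbox trainAutoencoder function with the hyperparameters reported above (trainData, a features-by-samples matrix, is an assumed variable name):

```matlab
% Sparse AE preprocessing with the hyperparameters reported above.
hiddenSize = floor(size(trainData, 1) / 2);   % latent size: half the input dim
autoenc = trainAutoencoder(trainData, hiddenSize, ...
    'MaxEpochs', 1000, ...
    'EncoderTransferFunction', 'logsig', ...  % sigmoid encoder
    'DecoderTransferFunction', 'purelin', ... % linear decoder
    'L2WeightRegularization', 0.001, ...      % lambda
    'SparsityProportion', 0.05, ...           % delta (sparsity threshold)
    'SparsityRegularization', 1);             % beta

latentTrain = encode(autoenc, trainData);     % condensed drift features
```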
Following established practices in neural network research, [7,38] we trained different sets of hyperparameters to achieve optimal performance. This involved fine-tuning the number of hidden layers, the number of hidden states, and the dropout rate. To prevent overfitting, early stopping with a patience parameter of 100 and L2 regularization (set to 0.2) are applied. The neural network is trained using the Adam optimizer with a learning rate of 0.001. [39] The root MSE (RMSE) is chosen as the cost function to evaluate the performance of the neural network. As an initial step in drift-aware feature learning, we utilized AE([0,20]) as a prototype for training the neural network. Various sets of hyperparameters were explored: 1–4 hidden layers, {10, 25, 50, 100} hidden states, and dropout rates of {0.1, 0.25, 0.5}. The training results for these sets are summarized in Table 2. After considering the neural network with the lowest RMSE value (highlighted in red color), we employed the associated neural network hyperparameters for long-term data prediction estimation.
In general, the LSTM regression neural network architecture comprises an input layer, two hidden layers (each incorporating a dropout layer), and an output layer. Each hidden layer consists of 100 hidden units, along with a dropout layer having a dropout rate of 0.1 to mitigate overfitting.
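A sketch of this regressor in Deep Learning Toolbox syntax is shown below; nFeatures (the concatenated input dimension) and the training arrays XTrain/YTrain are assumed names, and early stopping via ValidationPatience presumes validation data are supplied in the options.

```matlab
% LSTM regression network matching the stated architecture.
layers = [
    sequenceInputLayer(nFeatures)            % AE features + signal + derivatives
    lstmLayer(100, 'OutputMode', 'sequence')
    dropoutLayer(0.1)
    lstmLayer(100, 'OutputMode', 'sequence')
    dropoutLayer(0.1)
    fullyConnectedLayer(1)                   % predicted extended length [mm]
    regressionLayer];                        % MSE loss; RMSE reported as metric

options = trainingOptions('adam', ...
    'InitialLearnRate', 0.001, ...
    'L2Regularization', 0.2, ...
    'ValidationPatience', 100, ...           % early stopping
    'Shuffle', 'never');                     % preserve temporal order

net = trainNetwork(XTrain, YTrain, layers, options);
```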
The training progress plot, shown as training loss against steps, is illustrated in Figure 6. Each colored line represents the training performance of the neural network with specific data-preprocessing settings, and the typical LSTM regression neural network is used as a reference (dark blue line). It is evident that the neural networks with drift-aware feature learning exhibit better and more stable training results (below 0.02 in training loss) with increasing training steps than the typical LSTM regression neural network with and without signal derivatives. With more signal trace segmentation, the training loss tends to drop significantly before 20 training steps and eventually approaches 0.01, indicating that the AE-processed drift information is critical to the regression relationship between the sensor signal and the physical output quantity.
The overall neural network performance, including training (first 2.5 h) and testing results (remaining 19.5 h), is summarized in Table 3. The RMSE and coefficient of determination (R²) are used as evaluation parameters in the assessment of overall temporal performance, similar to other research teams. [15,18] According to the results, the typical RNN regression neural networks, such as the gated recurrent unit (GRU), LSTM, and bidirectional LSTM (BiLSTM), have a high overall RMSE for both training and testing datasets, larger than 14 and 47 mm, respectively. With drift trace information, the RMSE is significantly reduced to under 10 mm for training data and 20 mm for testing data. Compared with the neural network using raw sensor signals as drift traces, the approach leveraging the AE to process drift traces results in a further reduction in RMSE, with 8.48 and 11.26 mm for training and testing results, respectively. This is also reflected markedly in the R² of the testing datasets (0.77), which is close to the training result (0.87). Together with signal derivatives and more signal trace segmentation, the corresponding test data RMSE is decreased to 9.09 mm, and R² is increased to 0.85 for the neural network (AE([0,20]) + AE([60,80]) + AE([80,110])). There is a special case with additional segmentations (AE([20,40]), AE([40,60])), resulting in a larger testing RMSE of 14.77 mm; it is discussed in depth in the following paragraphs, along with a temporal-style illustration. Regarding segmentation using unsupervised learning methods, the resulting regression neural networks yield superior testing performance, with an RMSE of ≈8 mm and an R² of 0.88.
Additionally, the evolution of the estimation error for each neural network is depicted in Figure 7. The RMSE of conventional LSTM regression neural networks tends to increase over time, while the performance of drift-aware feature learning remains at the same level, below 10 mm.
To achieve a convincing visualization in the temporal dimension, the time-variant estimation results and R² of representative neural networks are illustrated in Figure 8. The estimation of each neural network over time (blue line) is compared to the real value (orange line), while they form the light blue and light green regions, respectively, in the coefficient of determination scatter plots. Although LSTM neural networks can handle long sequential data, the estimation tends to drift over time, similar to the sensor signals, and this phenomenon is also reflected as a shift in the R² scatter plot. Thanks to the aid of signal derivatives and the AE feature extraction of signal traces, the temporal deterioration of the prediction results is expressively alleviated, corresponding to the narrowed gap between training and testing datasets in the R² graph.

For soft sensors, a saturation region exists where the sensor signals exhibit insensitivity to changes in the measurement output, [5][6][7] as depicted in Figure 9. As indicated by the pink-highlighted area, the saturation introduces a discrepancy between the estimation results and the actual extended length, resulting in a deflection with a span of 50 mm in the R² graph. This phenomenon has a significant impact on the regression of the sensor prediction when using a neural network with a segmentation that includes this region (AE([0,20]) + AE([20,40]) + AE([40,60]) + AE([60,80]) + AE([80,110])). However, by intuitively excluding this portion ([20,60] mm) from the segmentation, the performance of drift-aware feature learning remains consistent and improves with increased segmentation. This improvement is characterized by a higher level of agreement between the estimated and actual extended lengths.
Neural networks with varying cluster numbers (e.g., AE([0,20]) + 9 clusters ([60,110])) were trained to elucidate the process of determining the optimal cluster number, as demonstrated in Figure 10. Upon examination of the figure, it is evident that the RMSE diminishes as the cluster number increases (up to 9 clusters), after which it reaches a saturation point between 9 and 11 clusters for both the training and testing results. Notably, while the training RMSE continues to decline, the testing results exhibit a substantial increase due to overfitting stemming from excessively dense clusters. To mitigate the effects of overfitting, the cluster number must be selected through iterative refinement.
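This refinement can be organized as a simple sweep, sketched below; trainEval is a hypothetical helper that trains the full AE + LSTM pipeline for a given clustering and returns the training and testing RMSE.

```matlab
% Sketch of cluster-number selection by iterative refinement.
kRange = 3:2:13;
rmse = zeros(numel(kRange), 2);                        % columns: [train, test]
for i = 1:numel(kRange)
    clusterIdx = kmeans(X, kRange(i), 'Replicates', 5);
    [rmse(i, 1), rmse(i, 2)] = trainEval(clusterIdx);  % hypothetical helper
end
% Choose the smallest k at which the test RMSE saturates (here ~9-11),
% avoiding overly dense clusters that overfit.
```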

Discussion
Regarding sensor measurement noise, we conducted a power spectrum analysis using the fast Fourier transform on two sets of data: the complete 22 h dataset and the first 20 min of data. As shown in the periodogram (Figure 11), our analysis reveals relatively low levels of high-frequency noise, whereas the presence of low-frequency spectral power can be attributed to long-term signal drift and the sensor's movement.
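A sketch of this analysis (assuming the raw readings are stored in signal22h and signal20min and sampled at fs Hz, names introduced here) using the Signal Processing Toolbox:

```matlab
% Periodogram comparison of the full record and the first 20 min.
[pxx1, f1] = periodogram(signal22h,   [], [], fs);
[pxx2, f2] = periodogram(signal20min, [], [], fs);
semilogy(f1, pxx1); hold on; semilogy(f2, pxx2); hold off;
xlabel('Frequency [Hz]'); ylabel('Power spectral density');
legend('22 h record', 'first 20 min');
```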
A scanning electron microscope (JSM-7800F, JEOL Ltd., Japan) was employed at 10 kV to examine distinct areas of samples subjected to various stretching durations: an unstretched sample and a sample stretched for 20 h. Prior studies on kirigami structures have indicated the presence of stress concentration at the edge of the hole. [5,40] Consequently, closer inspections were performed on the hole (corner) and hole (center) regions, as shown in Figure 12. Upon closer examination, microcracks were observed at the edge of the holes in the sensor after being stretched for 20 h (hole [corner] and hole [center], as shown in Figure 12b). Conversely, no evident microcracks were observed in the unstretched sample following laser cutting, as depicted in Figure 12a. This observation suggests that the stretching motion contributes to the occurrence of microcracks at the sensor's edge.

Conclusions
This article presents a novel approach for drift-aware feature learning to calibrate drift biases in soft sensors for long-term use. The proposed method leverages an AE for data preprocessing to extract expressive signal drift trace features and incorporates drift characteristics through the latent space representation in an LSTM regression neural network. Our results demonstrate that the proposed approach outperforms other typical recurrent neural networks, such as the GRU, LSTM, and BiLSTM, with a significantly reduced RMSE for both the training and testing datasets. By compensating for sensor drift using existing prior information and limited time data, our proposed neural network can effectively reduce the complexity and computational burden of the system as well as the need for frequent recalibration. Our proposed approach has potential applications in various fields, including optimizing the textile manufacturing process and other fields that use soft sensors.
Moreover, our study evaluates the performance of the proposed approach using RMSE and R² as evaluation parameters. The results show that the proposed approach significantly reduces the RMSE to under 10 mm for training data and 20 mm for testing data. By incorporating signal derivatives and more signal trace segmentation, the corresponding test data RMSE is further decreased to 9.09 mm, and R² is increased to 0.85 for the neural network (AE([0,20]) + AE([60,80]) + AE([80,110])). Our findings demonstrate that the proposed approach provides a promising solution to the challenge of drift biases in soft sensors and has the potential to significantly improve the performance and reliability of soft sensors for long-term use in various fields. However, there are still some limitations of the proposed learning methods. The performance of the calibration neural network can be significantly influenced by the range of the saturation region when segmenting the drift traces using clustering techniques. If multiple saturation regions exist, the clusters may be limited, thereby reducing the effectiveness of the correction process. To address this issue, one potential solution is to adjust the cluster sizes (by reducing the range) or, alternatively, to employ hierarchical clustering to determine the appropriate level of clustering.
Finally, our approach can be further improved by incorporating unsupervised learning methods to enhance the segmentation strategy, yielding superior testing performance with an RMSE of ≈8 mm and an R² of 0.88. The proposed method can also be extended to address the saturation region of soft sensors, which produces a gap between the estimation results and the actual extended length. By intuitively skipping this portion, the performance of drift-aware feature learning remains consistent and improves with increasing segmentation.

Figure 1.
Figure 1. The long-term usage estimation results of neural networks. a) The typical calibration results of a long short-term memory (LSTM) neural network. b,c) The enhanced performance achieved through drift-aware feature learning with segmentation and unsupervised clustering, respectively.

Figure 2.
Figure 2. Long-term sensor measurement drift patterns. a) Measurement of the extended length in the kirigami conductive silicone sensor. b) Corresponding sensor output signal. c) Representations depicting the relationship between them. The gradient color indicates temporal variations.

Figure 3.
Figure 3. The segmentation of signal drift traces for autoencoder (AE) preprocessing. a) Typical segmentation based on the output region (extended length). b) Unsupervised clustering by the correlation among the sensor output signal, extended length, and time.

Figure 4.
Figure 4. Neural network architecture. The drift-aware feature learning is composed of two phases: a) the learning phase and b) the updating phase.

Figure 6.
Figure 6. The neural network training progress plot.

Figure 8.
Figure 8. The testing results over time and in terms of the coefficient of determination. Each panel represents a specific neural network setting and the corresponding neural network training computation time. The pink square area denotes the saturation region.

Figure 11.
Figure 11. Power spectrum analysis of a) the whole 22 h of sensor readings and b) the first 20 min of sensor readings.

Table 2.
RMSE of LSTM neural networks with various hyperparameters: a) one hidden layer, b) two hidden layers, c) three hidden layers, and d) four hidden layers.

Table 3.
Neural network performance. The asterisk represents the neural network setting with outstanding performance.