Adaptive Calibration of Soft Sensors Using Optimal Transportation Transfer Learning for Mass Production and Long‐Term Usage

Soft sensors suffer from large manufacturing tolerances and from signal drift during long‐term usage, which degrades their practicality. Although deep learning has recently been proposed to address these issues, it is expensive in terms of data collection and processing. Therefore, an adaptive calibration method for soft sensors, suitable for mass production and long‐term usage, is proposed. While retaining the benefits of deep learning characterization, this method enables fast and accurate calibration by capturing changes in sensor characteristics through domain adaptation using optimal transportation. An offline calibration method is first described, which alleviates the difficulty of calibrating every unit of mass‐produced soft sensors. Its main advantage is that large volumes of identically manufactured soft sensors with unit‐to‐unit variations can be calibrated with less time and effort spent collecting and processing data. Online calibration is then discussed, which compensates for parameter changes when a soft sensor is used continuously for an extended period of time. Even if a single sensor shows signal drift from long‐term usage, its calibrated network weights can be adjusted quickly online. Finally, the proposed adaptive calibration is experimentally evaluated using actual soft sensors.

DOI: 10.1002/aisy.201900178
conditions of the host polymer materials. [27,28] These problems not only lead to different offsets in the initial sensor outputs but also cause variations in the operating ranges. Therefore, to ensure accuracy, the user needs to collect training data for each sensor and then train an individual neural network for it. This step is expensive and cumbersome. It is also highly inefficient to run an individual neural network with different weights for each sensor when many sensors are used simultaneously. For example, running 20 neural networks in parallel would be significantly slow and could even cause the computing kernel to shut down. The second limitation is the assumption that the sensor can be characterized by a one-time calibration. However, if the same sensor is used for an extended period of time, a deep-learning-based calibration becomes invalid, as the sensor output drifts for several reasons, such as polymer aging and changes in sensor position in wearable applications. [29][30][31][32][33] This means the sensor needs to be periodically recalibrated and repositioned, which is tedious and time consuming.
In this article, we propose to address these two limitations by solving a domain adaptation problem. [34,35] Both limitations are caused by domain shift, in which the input set $X$ of a mapping $f: X \to Y$ shifts while the task set $Y$ remains intact. In both cases, $X$ is the set of possible sensor values, and $Y$ is the set of admissible reference stress or strain levels, depending on the purpose of the soft sensor. For characterizing a large volume of soft sensors, once a single sensor is calibrated, there exists a neural network mapping $f_\theta: X_S \to Y$ with weights $\theta$ (Figure 1a). Rather than training another neural network $g_{i,\phi_i}: X_{T_i} \to Y$ for each sensor, we introduce mappings $\gamma^*_i: X_{T_i} \to X_S$, $i = 1, 2, \ldots, n$, that connect each target domain to the source domain. The neural network $g_{i,\phi_i}$ can then be replaced with the composite $f_\theta \circ \gamma^*_i$. Therefore, if we find suitable mappings $\gamma^*_i$, we can represent multiple sensor mappings using only a single neural network. For using a single soft sensor over a long period, the source is the initial calibration with a neural network mapping $f_\theta: X_S \to Y$, and the target domain shifts over time. We can then introduce a mapping $\gamma^*_t: X_t \to X_S$, where $t$ is the evolution time (Figure 1b). Finding $\gamma^*_i$ and $\gamma^*_t$ can be formulated as an optimal transportation problem. In this work, we use microfluidic soft sensors to implement and experimentally evaluate the proposed method. The sensor is composed of a silicone matrix and an embedded microchannel filled with eutectic gallium-indium (eGaIn). Any mechanical pressure or strain applied to the sensor changes the geometry of the embedded microchannel and consequently its electrical resistance. Our contribution is the use of optimal transportation transfer learning to solve two common problems in the calibration of soft sensors: 1) when a large volume of sensors is produced with manufacturing tolerances and 2) when a single sensor is used for a prolonged period of time.
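The idea of sharing one network across many sensors can be illustrated with a minimal Python sketch of the composite $f_\theta \circ \gamma^*_i$. Everything in it is hypothetical: a fixed linear function stands in for the calibrated network $f_\theta$ (a TDANN in this article), and each $\gamma^*_i$ is a hand-chosen affine correction rather than a map obtained by optimal transportation. The point is only that all sensors share one network and differ solely in their domain maps.

```python
from typing import Callable, Dict

def f_theta(x_s: float) -> float:
    """Hypothetical stand-in for the calibrated source network f_theta: X_S -> Y."""
    return 2.0 * x_s + 0.5

def affine_map(gain: float, offset: float) -> Callable[[float], float]:
    """Hypothetical domain map gamma*_i: X_T_i -> X_S (affine for illustration only)."""
    return lambda x_t: gain * x_t + offset

# One entry per sensor; sensor 1 is the source, so its map is the identity.
domain_maps: Dict[int, Callable[[float], float]] = {
    1: affine_map(1.00, 0.00),
    2: affine_map(0.95, 0.03),
    3: affine_map(1.10, -0.05),
}

def estimate(sensor_id: int, x_t: float) -> float:
    """Evaluate the composite f_theta o gamma*_i with the single shared network."""
    return f_theta(domain_maps[sensor_id](x_t))

for sid in domain_maps:
    print(sid, estimate(sid, 1.0))
```

Adding a sensor then means adding one small map to the dictionary rather than training and storing another network.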
We validate solutions to these two problems through two experiments: 1) calibration of seven identically manufactured sensors with variations in their characteristics and 2) calibration of a single sensor attached to a human body for gait measurement. The results show that the proposed method yields a level of error similar to that of previous studies while requiring less training time. [25,36] The remainder of this article is structured as follows. Section 2 explains the concept of a time-delay artificial neural network (TDANN) used to compensate for the nonlinearity and hysteresis of soft sensors and then discusses the optimal transportation problem to describe transfer learning and the algorithm used in this work. Experimental validation of the proposed method using microfluidic soft sensors is described in Sections 3 and 4 for the two cases: large-volume characterization and single-sensor calibration for long-term usage. The fabrication of the soft sensors and the data collection process for transfer learning are also described in these sections. Section 5 presents the results, followed by the discussion in Section 6 and the conclusions in Section 7.

TDANN for Calibration
Silicone rubber was used as the main material for the soft sensors in this work, and it shows time-dependent hysteresis due to its viscoelastic behavior. Therefore, a recurrent neural network (RNN) may be suitable for modeling hysteresis, as it refers to past signals during learning. However, the goal of this article is to perform fast sensor calibration and online estimation by transfer learning, and we therefore decided to use a TDANN instead of an RNN. As we are interested in estimating the current state value rather than extracting latent features from the sensor signal or predicting future sequences, we do not expect a TDANN to perform worse than an RNN. Implementation details of the TDANN structure and the back-propagation steps are given in the Supporting Information.

Figure 1. The mass calibration and long-term usage problems can be recast as optimal transportation problems (i.e., finding the mappings $\gamma^*_i$). a) Each sensor has a different signal pattern originating from a different domain, and the mappings $\gamma^*$ connect each target domain to the source domain. b) Due to long-term usage, the polymer material of the soft sensor ages, which leads to domain evolution (shifting), and the domains are connected via the mappings $\gamma^*$.

www.advancedsciencenews.com www.advintellsyst.com
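A TDANN is a feedforward network whose input is a tapped delay line of recent sensor samples, so hysteresis is captured through the input window rather than through a recurrent state. The minimal sketch below assumes a single tanh hidden layer with eight units and random, untrained weights; these choices are illustrative, and the architecture actually used is described in the Supporting Information. The 15-sample delay matches the window used in the mass-calibration experiment.

```python
import math
import random
from typing import List, Sequence

def tdann_forward(history: Sequence[float], w1: List[List[float]], b1: List[float],
                  w2: List[float], b2: float, delay: int) -> float:
    """Estimate the current state from the last `delay` sensor samples."""
    if len(history) < delay:
        raise ValueError("need at least `delay` past samples")
    window = list(history[-delay:])  # tapped delay line x[t-delay+1], ..., x[t]
    hidden = [math.tanh(sum(w * x for w, x in zip(row, window)) + b)
              for row, b in zip(w1, b1)]  # single tanh hidden layer (assumed)
    return sum(w * h for w, h in zip(w2, hidden)) + b2  # linear output unit

DELAY, HIDDEN = 15, 8
rng = random.Random(0)
w1 = [[rng.gauss(0.0, 0.1) for _ in range(DELAY)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]
b2 = 0.0

signal = [math.sin(0.1 * k) for k in range(100)]  # synthetic sensor stream
y_hat = tdann_forward(signal, w1, b1, w2, b2, DELAY)
print(f"estimate at t = {len(signal) - 1}: {y_hat:.4f}")
```

Because only the last `DELAY` samples enter the network, inference cost is fixed per step, which suits the fast, online estimation targeted here.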

Transfer Learning and Domain Adaptation
When a function that maps inputs to outputs is designed through machine learning, the process usually amounts to fitting a neural network or kernel by supervised learning. However, a necessary condition for such supervised learning is that the training and testing datasets are drawn from the same probability distribution.
If not, the network must be redesigned, or its weights must be tuned with additional work. For example, imagine that a photographer trains a convolutional neural network (CNN) to classify pictures of dogs and cats. He takes pictures of dogs and cats in a park and then trains the network.
The next day, he goes to the park, takes pictures of dogs and cats again, and feeds them into the trained network. If the network is appropriately trained, it can readily classify the two classes with high accuracy. However, if he takes pictures of dogs and cats in a dark café and feeds them into the network, they cannot be adequately classified because of the differences in brightness and surrounding conditions. This is one of the most common drawbacks of machine learning: if the distribution of the sampled dataset changes, the network may need to be redesigned to accommodate the new distribution. Domain adaptation transfer learning, used in our work, simplifies this process. Let us define the source, the target, the domain, and the task through this example. The source is the pictures of dogs and cats taken in the park, which describe the population of the sampled training dataset. The target is the population of pictures of dogs and cats taken in the café, from which the testing dataset is sampled. The domain and the task are concepts that exist in both the source and the target. We can think of the domain as the input of the CNN (the pixel values of the pictures) and the task as the output label.
Transfer learning can be divided into inductive, unsupervised, and transductive settings according to the conditions of the source, the target, the domain, and the task. We focus on transductive transfer learning in this article. Transductive transfer learning addresses the case in which the source and target domain distributions differ while the task is the same. For example, the distribution of a sensor's output values may vary, but the distribution of the reference force or strain remains intact. The key feature of transductive transfer learning is that target task labels are not necessary; only the target domain values are required. In the picture classification problem, the pictures taken in the park must be labeled, but those taken in the café need not be. Instead, the relationship between the source and target domain distributions is studied. The mathematical definition of transductive transfer learning is as follows.
Given a source domain $D_S$ with its corresponding task $T_S$ and a target domain $D_T$ with its corresponding task $T_T$, transductive transfer learning finds the mapping $f_T: D_T \to T_T$ under the conditions $D_S \neq D_T$ and $T_S = T_T$, using only the information in $D_S$, $D_T$, and $T_S$.
One way to solve the transductive transfer learning problem is to find a mapping that connects $D_T$ and $D_S$, i.e., $g: D_S \to D_T$; finding such a mapping is often called domain adaptation. In this article, we perform this domain adaptation using optimal transportation theory.

Domain Adaptation by the Optimal Transportation Theory
In this section, we review the optimal transportation theory used to solve the domain adaptation problem in our soft sensor application. Optimal transportation theory starts with the Monge problem, which seeks a mapping that minimizes a transport cost while satisfying the image measure (push-forward) condition. [37] Let $\Omega_S, \Omega_T \subseteq \mathbb{R}^n$ be compact measurable spaces of dimension $n$ with probability distributions $\mu_S \in \mathcal{P}(\Omega_S)$ and $\mu_T \in \mathcal{P}(\Omega_T)$, and let $c: \Omega_S \times \Omega_T \to \mathbb{R}^+$ be a cost function. Then, the Monge problem seeks a mapping $T: \Omega_S \to \Omega_T$ that satisfies

$$T^* = \arg\min_{T} \int_{\Omega_S} c(x, T(x))\, d\mu_S(x) \quad \text{subject to} \quad T_{\#}\mu_S = \mu_T$$

However, this optimization problem is nonconvex, and a solution may not exist. Kantorovich therefore formulated a relaxed problem that searches for a coupling $\gamma$ in the space $\Pi(\mu_S, \mu_T)$ of joint distributions whose marginals are $\mu_S$ and $\mu_T$ [38]

$$\gamma^* = \arg\min_{\gamma \in \Pi(\mu_S, \mu_T)} \int_{\Omega_S \times \Omega_T} c(x, y)\, d\gamma(x, y)$$

This problem is a linear program, which always has a solution, although not necessarily a unique one. The Kantorovich problem can also be written in terms of the Wasserstein metric

$$W_p(\mu_S, \mu_T) = \left( \inf_{\gamma \in \Pi(\mu_S, \mu_T)} \mathbb{E}_{(x, y) \sim \gamma}\left[ d(x, y)^p \right] \right)^{1/p}$$

where $\mathbb{E}[\cdot]$ is the expectation operator and $d$ is a distance. The Wasserstein metric is often used to measure the distance between two probability distributions; when the cost is the distance itself, $c(x, y) = d(x, y)$, the optimal transportation cost equals the 1-Wasserstein metric. [39] For the strictly convex cost function $c(x, y) = \frac{1}{2}\|x - y\|^2$, the dual Kantorovich problem can be written as minimizing

$$J(\varphi, \phi) = \int_{\Omega_S} \varphi\, d\mu_S + \int_{\Omega_T} \phi\, d\mu_T$$

where $\varphi: \Omega_S \to \mathbb{R}$ and $\phi: \Omega_T \to \mathbb{R}$ are bounded, continuous functions satisfying $\varphi(x) + \phi(y) \geq x \cdot y$. Chartrand et al. showed that if $(\varphi, \phi)$ is a pair of convex conjugate functions minimizing this functional, $T = \nabla\varphi$ solves the Monge problem. [40] Other techniques, such as gradient descent, flow minimization, linear programming, and entropy regularization, can also be used.

Optimal Transportation to Sensor Calibration
We use a TDANN to map the sensor output values to their corresponding reference values. The sensor changes its electrical resistance as the geometry of the embedded microchannel changes, and the resistance can be measured by a voltage divider with a fixed resistor connected in series. The reference values for the sensors are the force and the elongation length for stress and strain sensing, respectively. The input and the output of the source are defined as a vector $x^s_i \in X_S$ and a scalar $y^s_i \in Y_S$, respectively, where $s$ denotes the source and $i$ indexes the empirical samples. Similarly, the input and the output of the target are $x^t_j \in X_T$ and $y^t_j \in Y_T$. Our goal is then to find a function $f: X_T \to Y_T$ that maps the target domain to the target task, together with the optimal transportation map $\gamma^*$, by solving the joint minimization problem of Equation (5). Courty et al. solved this minimization problem using a block coordinate descent (BCD) algorithm. [39] We define another mapping $\gamma^*: X_T \to X_S$. Then, by iteratively performing stochastic gradient descent on $\gamma^*$ and $f$, we can solve the transfer learning problem (Figure 2). Because the source domain and task data are transported to the target domain by $\gamma$, and the transformed source data act as the training dataset for the target domain, the target domain dataset need not be as large as the source domain dataset.
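The alternating structure can be illustrated with a small block-coordinate-descent sketch in the spirit of Courty et al.'s joint distribution optimal transportation. Several simplifications are assumptions of this sketch, not the article's method: inputs are scalars rather than 15-sample windows, $f$ is a straight line rather than a TDANN, the coupling is computed with Sinkhorn iterations, the task loss is squared error, and `alpha`, `eps`, and `n_outer` are arbitrary illustrative values. Each outer iteration fixes $f$ and solves for the coupling under a cost that adds input distance and task loss, then fixes the coupling and refits $f$ on the target inputs using barycentric labels transported from the source.

```python
import math
from typing import List, Sequence, Tuple

def sinkhorn(a: Sequence[float], b: Sequence[float], cost: List[List[float]],
             eps: float, n_iter: int = 300) -> List[List[float]]:
    """Entropy-regularized coupling (Sinkhorn scaling of exp(-C / eps))."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def jdot_bcd(xs, ys, xt, alpha=1.0, eps=0.005, n_outer=5):
    """Alternate between the coupling (f fixed) and the predictor f (coupling fixed)."""
    n, m = len(xs), len(xt)
    a, b = [1.0 / n] * n, [1.0 / m] * m
    slope, icpt = fit_line(xs, ys)  # start from the source calibration
    for _ in range(n_outer):
        # Joint cost: input distance plus task loss of f evaluated on the target input.
        C = [[alpha * (xs[i] - xt[j]) ** 2 + (ys[i] - (slope * xt[j] + icpt)) ** 2
              for j in range(m)] for i in range(n)]
        c_max = max(max(row) for row in C)
        C = [[c / c_max for c in row] for row in C]  # normalize to avoid exp underflow
        P = sinkhorn(a, b, C, eps)
        # Barycentric labels: each target sample inherits a weighted source label.
        y_bar = [sum(P[i][j] * ys[i] for i in range(n)) / b[j] for j in range(m)]
        slope, icpt = fit_line(xt, y_bar)
    return slope, icpt

# Synthetic example: the target sensor's output is a gain/offset-drifted copy.
xs = [0.1 * k for k in range(20)]          # calibrated source-sensor outputs
ys = [2.0 * x + 1.0 for x in xs]           # reference values (source task)
xt = [0.8 * x + 0.3 for x in xs]           # unlabeled target-sensor outputs
slope, icpt = jdot_bcd(xs, ys, xt)
print(f"adapted target calibration: y = {slope:.3f} x + {icpt:.3f}")
# For this synthetic data the true target relation is y = 2.5 x + 0.25.
```

Only the target inputs `xt` are used; no target labels enter the fit, which mirrors the transductive setting. The entropic blur slightly biases the end points, so the recovered line is close to, but not exactly, the true target relation.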

Materials and Manufacturing Methods
As mentioned earlier, we use a soft sensor made of silicone elastomer with an embedded microchannel filled with liquid metal (eGaIn). Park et al. derived the resistance change ($\Delta R$) under an applied strain ($\epsilon$) or pressure ($p$) in terms of quantities including $\rho$, $\nu$, $L$, $w$, and $h$, which are the resistivity of eGaIn, the Poisson's ratio of the elastomer, and the length, width, and height of the microchannel, respectively. [8] For fabrication, a laser cutter has been used to create a microchannel in a silicone rubber block. [41] Similarly, silicone rubber has been cured in a 3D-printed mold with microchannel patterns. [8] However, we used a pneumatic dispenser integrated with a motorized x-y-z stage to print liquid-metal microchannel patterns directly on a silicone substrate and then covered the top with another layer of silicone to fully encapsulate the patterns. [42] Details of the fabrication process are as follows. A room-temperature-vulcanizing (RTV) silicone elastomer (Ecoflex-0030, Smooth-On) was mixed using a centrifugal mixer (ARE-310, Thinky) at a weight ratio of 10:10:1 for parts A and B and a white pigment (Silc Pig, Smooth-On). The mixture was poured onto a silicon wafer and spin coated at 200 rpm for 30 s. Once the elastomer had fully spread over the wafer, it was cured in an oven at 60 °C for 20 min. The cured silicone substrate on the wafer was then transferred to the printing stage, and serpentine patterns were printed directly using eGaIn (75.5% gallium and 24.5% indium by weight). For printing, a motorized x-y-z stage (Shotmaster 300ΩX, Musashi) and a pneumatic dispensing system (Supersigma CMIII V2, Musashi) were used together with a laser distance sensor (LK-G32, Keyence) (Figure 3). The printed trace was then covered with another layer of uncured elastomer, spin coated at 200 rpm for 30 s, and cured in the oven.
Finally, copper wires were plugged into the microchannel to make interconnections with the eGaIn channel at both ends for each sensor, and silicone epoxy (Sil-Poxy, Smooth-On) was applied to fix them in place.
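As a rough illustration of how strain changes the channel resistance, the sketch below uses a first-order geometric model that is an assumption of this example and not necessarily the exact expression of Park et al.: the channel length scales by $(1 + \epsilon)$ while the width and height each contract by $(1 - \nu\epsilon)$, so $\Delta R = \frac{\rho L}{wh}\left[\frac{1 + \epsilon}{(1 - \nu\epsilon)^2} - 1\right]$. The channel dimensions are hypothetical; the eGaIn resistivity is the commonly reported value of about $29.4 \times 10^{-8}\ \Omega\,\mathrm{m}$.

```python
RHO_EGAIN = 29.4e-8  # resistivity of eGaIn in ohm*m (commonly reported value)

def baseline_resistance(rho: float, length: float, width: float, height: float) -> float:
    """R0 = rho * L / (w * h) for a rectangular channel (SI units)."""
    return rho * length / (width * height)

def delta_r_strain(strain: float, rho: float, length: float, width: float,
                   height: float, poisson: float) -> float:
    """Resistance change under uniaxial strain (first-order geometric model)."""
    r0 = baseline_resistance(rho, length, width, height)
    # Length grows by (1 + strain); width and height shrink by (1 - poisson * strain).
    return r0 * ((1.0 + strain) / (1.0 - poisson * strain) ** 2 - 1.0)

# Hypothetical channel: 100 mm long, 300 um wide, 100 um tall, in a nearly
# incompressible elastomer (nu about 0.5).
L, W, H, NU = 0.1, 300e-6, 100e-6, 0.5
r0 = baseline_resistance(RHO_EGAIN, L, W, H)
for eps in (0.0, 0.1, 0.3):
    print(f"strain {eps:.1f}: dR = {delta_r_strain(eps, RHO_EGAIN, L, W, H, NU):.4f} ohm")
```

The response is nonlinear in strain even in this idealized model, which is one reason a learned mapping such as the TDANN is used for calibration.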

Adaptive Calibration Procedure
We printed seven liquid-metal patterns (i.e., sensors) with identical geometry and printing conditions on a cured silicone substrate (Figure 3). However, the initial resistance values of the seven channels were slightly different due to the manual wiring process and nonuniform curing conditions. Without transfer learning, all seven sensors would have to be calibrated individually, which is tedious and time consuming. To calibrate a sensor with our method, we first mount a commercial load cell with one sensor and record reference data synchronized with the sensor data. We then apply force to the sensor by compressing it at different rates and magnitudes using a motorized stage and collect the output data. After collecting all the data, we preprocess them to pair each 15-sample sequence of the sensor output voltage with a single reference force value (i.e., $[X_{t-14:t}, Y_t], \forall t \geq 15$). Finally, a TDANN is trained by minibatch gradient descent.
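The preprocessing step can be sketched as a sliding window that pairs each reference value with the 15 most recent sensor samples. This minimal illustration assumes that both streams are already synchronized and equally sampled; indices are 0-based, so the first usable index 14 corresponds to $t = 15$ in the 1-based notation above. The signals are synthetic.

```python
from typing import List, Sequence, Tuple

def make_windows(x: Sequence[float], y: Sequence[float],
                 delay: int = 15) -> List[Tuple[List[float], float]]:
    """Pair y[t] with the sensor window x[t-delay+1 .. t] for every valid t."""
    if len(x) != len(y):
        raise ValueError("sensor and reference streams must be synchronized")
    return [(list(x[t - delay + 1:t + 1]), y[t]) for t in range(delay - 1, len(x))]

voltage = [0.01 * k for k in range(20)]   # synthetic sensor output
force = [0.5 * v for v in voltage]        # synthetic load-cell reference
pairs = make_windows(voltage, force)
print(len(pairs), len(pairs[0][0]))
```

A stream of N samples yields N - 14 training pairs, each of which can be fed directly to a TDANN with a 15-sample input layer.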
Instead of collecting input-output tuples for all seven sensors and training each one, we use domain adaptation transfer learning. After calibrating the first sensor, which serves as the source distribution, the remaining six sensors are compressed with the motorized stage, and only their output values, which serve as the target domain data, are collected. At this stage, we do not need the full signal tuples. Even if only a portion of the signal values is used, the optimal transportation map can still be constructed without considerable loss of accuracy. Because the source domain and task data are transported to the target region, it is the transported source data that teach the network the target input-output relation. In this problem, the source domain ($X_S$ in Equation (5)) is the output of the fully calibrated sensor, and the source task ($Y_S$) is the corresponding force. The target domain ($X_T$) is the output of the remaining six sensors, and the target task ($Y_T$) is the corresponding force on those six sensors. The influence of the size ratio between the source and target datasets is examined in the results. The target domain data contribute only to the formulation of the transportation mapping. Therefore, the effort to calibrate the remaining sensors is significantly reduced compared with repeatedly training individual sensors on their input-output signal tuples. Finally, the optimal transportation mapping and the input-output mapping of each target sensor are obtained. We evaluated the results by applying the same test pressure inputs to all seven sensors and calculating the root-mean-squared error (RMSE) between the force estimated by the TDANN and the reference force from the load cell.
We then evaluated the performance of transfer learning by comparing the RMSE from individual training with that from transfer learning, together with the time and computational costs. Finally, we investigated the influence of the ratio between the source and target domain dataset sizes. If performance does not decrease even when the target domain dataset is far smaller than the source domain dataset, we can choose a ratio that balances time and computational cost against estimation performance.
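In one dimension, the optimal transportation map for a convex cost such as the quadratic cost is the monotone rearrangement $\gamma^* = F_S^{-1} \circ F_T$, where $F_S$ and $F_T$ are the cumulative distribution functions of the source and target domains. The sketch below builds an interpolated empirical version of this map from unlabeled samples only. It is a simplification under stated assumptions: each sensor reading is treated as a scalar rather than a 15-sample window, and the gain-and-offset drift is synthetic. It also shows why a subsample of the target data can suffice, provided the subsample covers the admissible range.

```python
import bisect
from typing import Callable, Sequence

def _cdf(sorted_vals: Sequence[float], x: float) -> float:
    """Piecewise-linear empirical CDF on [0, 1] through the sorted samples."""
    n = len(sorted_vals)
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return 1.0
    k = bisect.bisect_right(sorted_vals, x) - 1   # sorted_vals[k] <= x < sorted_vals[k+1]
    lo, hi = sorted_vals[k], sorted_vals[k + 1]
    return (k + (x - lo) / (hi - lo)) / (n - 1)

def _quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Inverse of _cdf: linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    k = min(int(pos), len(sorted_vals) - 2)
    frac = pos - k
    return sorted_vals[k] * (1.0 - frac) + sorted_vals[k + 1] * frac

def monotone_map(source: Sequence[float], target: Sequence[float]) -> Callable[[float], float]:
    """gamma*: X_T -> X_S built only from unlabeled samples of both domains."""
    s, t = sorted(source), sorted(target)
    return lambda x: _quantile(s, _cdf(t, x))

source = [0.1 * k for k in range(11)]            # calibrated sensor outputs
target = [0.8 * v + 0.3 for v in source]         # drifted sensor outputs
gamma_full = monotone_map(source, target)
gamma_half = monotone_map(source, target[::2])   # only every other target sample
print(gamma_full(0.74), gamma_half(0.70))
```

Composing such a map with the source calibration gives the target estimate $f_\theta(\gamma^*(x))$; if the subsample missed part of the operating range, values there would be clamped to the range edges, which is consistent with the requirement that the target data cover the admissible region.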

Materials and Experimental Settings
In this section, we discuss the calibration issue in long-term usage of a single soft sensor. The goal is to compensate online for the degradation of the calibrated TDANN caused by sensor drift. Rather than performing a simple repeatability test under controlled conditions, we consider a less controlled and unmodeled system, such as a human wearable for motion sensing. If a soft strain sensor is attached between the heel and the calf, the roll angle of the ankle can be estimated from the soft sensor signal patterns. To obtain reference data, an inertial measurement unit (IMU) was attached to the front of the knee to acquire the roll angle. Our objective was to estimate the ankle angle without the aid of the IMU, using only the soft strain sensor (Figure 4). The experiment involving a human subject was performed with the full and informed consent of the volunteer.
A gait measurement module was developed to collect strain data from the soft sensor in real time. The resistance change of the sensor was measured by a voltage divider circuit and the analog-to-digital converter (ADC) of a battery-powered, remotely operated microcontroller (Arduino MKR-1000). The sampling frequency was 50 Hz. An IMU (EBMotion V4, E2BOX) was also attached to the module to measure the reference pitch angles at a sampling rate of 50 Hz. A digital low-pass filter with a cutoff frequency of 96 Hz was applied to the accelerometer and gyro sensors in the IMU module. The pitch angles were calculated by the IMU's own algorithm from the accelerometer and gyro outputs. All measured data were transferred wirelessly to a data acquisition module (myRIO-1900, National Instruments) and recorded with synchronization.
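Converting a raw ADC reading back to the sensor resistance follows from the voltage-divider relation. In this minimal sketch, the wiring (fixed resistor on the supply side, sensor on the ground side, ADC reference equal to the divider supply), the 12-bit resolution, and the resistor value are assumptions rather than details reported in the article. Because the ADC reference cancels out, only the ratio of the code to full scale matters.

```python
def adc_to_resistance(code: int, r_fixed: float, bits: int = 12) -> float:
    """Sensor resistance from an ADC code of a two-resistor voltage divider.

    Assumed wiring: V_ref -> r_fixed -> ADC node -> sensor -> GND, so
    code / full_scale = R_sensor / (r_fixed + R_sensor).
    """
    full_scale = (1 << bits) - 1
    if not 0 < code < full_scale:
        raise ValueError("reading at a supply rail; resistance is undefined")
    ratio = code / full_scale
    return r_fixed * ratio / (1.0 - ratio)

# Hypothetical 10-ohm fixed resistor; a reading at one third of full scale
# corresponds to a sensor resistance of half the fixed resistance.
print(adc_to_resistance(1365, 10.0))
```

Choosing the fixed resistor close to the sensor's nominal resistance keeps the divider near mid-scale, where a given resistance change produces the largest change in ADC code.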

Online Adaptation and Learning
In this problem, the source domain ($X_S$ in Equation (5)) is the sensor output in the initial state, and the source task ($Y_S$) is the corresponding IMU angle. The target domain ($X_T$) is the sensor output after time has elapsed, and the target task ($Y_T$) is the corresponding IMU angle.
The experimental procedure is as follows. First, a human subject wears the soft sensor and the IMU and starts walking on a treadmill, and the signals from both the soft sensor and the IMU are collected. The soft sensor signal is the source domain, and the IMU signal is the task value. Once a sufficient amount of data has been collected, the TDANN starts to learn from the gathered data, while only the incoming soft sensor signals are accumulated in a queue; the IMU signals are not queued. When the TDANN training is complete, the queued soft sensor signals serve as target domain signals for updating the TDANN weights by domain adaptation, and the queue is then cleared. While the domain adaptation is in progress, incoming sensor signals are again accumulated in the queue; when the update finishes, they are used for the next domain adaptation update, and the cycle repeats. Meanwhile, once the TDANN has completed its first training, it can estimate the ankle angle without the aid of the IMU. In summary, weight updates by transfer learning and state estimation are performed simultaneously online, and the domain adaptation compensates for the signal drift of the soft sensor. This process can be improved by also updating the source data using the queued soft sensor signals and the corresponding TDANN estimates. Optimal transportation works best when the Wasserstein distance between the source and target domains is small. To reduce the gap between the source and target domains, the source is updated with the best estimates for the previous target domain data: after adapting the transportation using the queued soft sensor signals, we estimate the ankle angle with the most recently updated TDANN and use the resulting input-output tuples as new source data. This process is shown graphically in Figure 5.
We evaluated the performance of this method by comparing the RMSE values for three cases: without transfer learning, with transfer learning, and with the improved method. We also present the reference and estimated values for each method and show that performance gradually improved as transfer learning was applied.
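The queue-and-update cycle can be sketched as follows. This is a sequential, single-threaded simplification (in the article, estimation and adaptation run concurrently), and the adaptation step is a hypothetical stand-in: it aligns the means of the source and queued target samples instead of solving an optimal transportation problem. The `improved` flag mimics the source-update variant, in which the queued signals, paired with the current composite model, become the new source domain. The drift and gait pattern are synthetic.

```python
from collections import deque
from typing import Callable, Deque, List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

class OnlineCalibrator:
    """Queue-based online adaptation loop (simplified, hypothetical)."""

    def __init__(self, model: Callable[[float], float], source_x: List[float],
                 improved: bool = True) -> None:
        self.model = model                  # calibrated network f on the source domain
        self.source_x = list(source_x)      # source-domain sensor values
        self.improved = improved
        self.shift = 0.0                    # stand-in for the domain map gamma*_t
        self.queue: Deque[float] = deque()  # sensor values only, no reference labels

    def estimate(self, x: float) -> float:
        self.queue.append(x)
        return self.model(x + self.shift)   # f(gamma*_t(x))

    def adapt(self) -> None:
        if not self.queue:
            return
        target = list(self.queue)
        self.queue.clear()
        shift = mean(self.source_x) - mean(target)   # stand-in for solving OT
        if self.improved:
            # Fold the map into the model; the queued signals become the new source.
            self.model = lambda x, f=self.model, s=shift: f(x + s)
            self.source_x, self.shift = target, 0.0
        else:
            self.shift = shift

def simulate(adapt_enabled: bool, improved: bool = True,
             n_blocks: int = 10, step: float = 0.1) -> float:
    """Mean absolute error in the last block for a sensor with additive drift."""
    angles = [0.0, 1.0, 2.0, 3.0, 4.0]               # synthetic reference pattern
    cal = OnlineCalibrator(lambda x: x, angles, improved)
    errors: List[float] = []
    for k in range(1, n_blocks + 1):
        drift = step * k
        errors = [abs(cal.estimate(a + drift) - a) for a in angles]
        if adapt_enabled:
            cal.adapt()
        else:
            cal.queue.clear()
    return mean(errors)

print(simulate(False), simulate(True, improved=False), simulate(True, improved=True))
```

Without adaptation the error grows with the accumulated drift, whereas with adaptation it stays at the one-block lag. In this additive-drift toy both variants behave identically; the source update matters when the transport step is sensitive to the distance between domains, as optimal transportation is.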

Offline Adaptive Calibration for Mass Production
The training set for the source sensor contained 25 000 samples, and the testing dataset contained 2500. Optimal transportation was implemented by solving Equation (5), with target domain datasets ranging from 1250 to 25 000 samples. Figure 6 shows the adaptive calibration results for two soft sensors, the source and a target sensor. The source sensor was first trained with the TDANN and tested on the testing dataset (Figure 6a), estimating the applied force with an RMSE of 0.27 N. Then, optimal transportation was implemented to find the mapping from the target domain to the source domain, with a target domain dataset 25% the size of the source domain dataset. Figure 6b shows the result without optimal transportation, and Figure 6c shows the result with optimal transportation. The RMSE improved from 0.70 to 0.29 N with optimal transportation, which is comparable with the 0.27 N of the source sensor. The RMSE results are summarized in Table 1. The results from source learning and domain adaptation are labeled "Before domain adaptation" and "After domain adaptation," respectively, and "Own calibration" denotes the result of training on each sensor's own input-output data, for comparison with domain adaptation. Calibration improved after domain adaptation was implemented. The influence of the size of the target domain dataset is shown in Table 2. The RMSE did not increase as the target domain dataset shrank, indicating only a weak correlation between the dataset size and the RMSE. However, the target domain dataset should cover at least the entire admissible region of the domain set for optimal transportation to be effective. Finally, the training time and the operating frequency before and after domain adaptation are shown in Table 3.
The results show that training a single neural network required 86 s, whereas the optimal transportation mapping required only 2.6 s. This is a dramatic reduction in calibration time. However, because the target sensor signals must pass through two mappings (the optimal transportation mapping and the original neural network), the operating frequency decreased from 40 to 25 Hz. Compared with previous works that used deep learning techniques or simple transfer learning, we maintained a similar level of error in force estimation. For example, the normalized root-mean-squared error (NRMSE) of our results is between 8% and 11%, which is slightly higher than that of previous work (8.38% [25], 6.5% to 10% [36]).
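For reference, RMSE and NRMSE can be computed as below. The article does not state how the RMSE was normalized; this sketch assumes normalization by the range of the reference signal, which is one common convention (others divide by the mean or the standard deviation), so percentages computed this way are not guaranteed to match those reported. The data are synthetic.

```python
import math
from typing import Sequence

def rmse(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-squared error between two equally long sequences."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference))

def nrmse(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """RMSE divided by the range of the reference (assumed normalization)."""
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference range is zero")
    return rmse(estimate, reference) / span

ref = [0.0, 1.0, 2.0, 3.0, 4.0]      # e.g., reference force in N (synthetic)
est = [0.1, 0.9, 2.2, 2.9, 4.0]
print(f"RMSE = {rmse(est, ref):.3f} N, NRMSE = {100 * nrmse(est, ref):.1f}%")
```

Normalizing makes sensors with different operating ranges comparable, which matters when comparing against results reported on other sensors.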

Online Adaptive Calibration for Permanent Usage
The raw soft strain sensor signals were plotted to examine the change in their distribution (Figure 7). The figure shows that sensor drift occurs when the soft sensor is used for a prolonged period of time: the IMU signal distribution remains intact, whereas the soft sensor signal distribution changes over time. The three cases (original, with domain adaptation, and with improved domain adaptation) are compared by evaluating the reference and estimated sequences at each time interval and the evolution of the RMSE over time (Figures 8 and 9). The RMSE increased when the sensor was used for a long time without adaptation, but it stayed almost the same with transfer learning. These results suggest that transfer learning (notably the improved algorithm) can prevent the performance loss caused by sensor drift.
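One simple way to quantify the distribution change seen in the raw signals is the 1-Wasserstein distance between an early and a late window. For two equal-size one-dimensional samples with uniform weights, it reduces to the mean absolute difference of the sorted values. The windows below are synthetic, and using this quantity as a drift indicator is an illustration rather than part of the reported analysis.

```python
from typing import Sequence

def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 between equal-size empirical 1-D samples via sorted matching."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

early_sensor = [1.00, 1.20, 1.10, 1.30, 1.05]   # synthetic sensor window, start of walk
late_sensor = [1.25, 1.45, 1.35, 1.55, 1.30]    # same gait, offset by drift
early_imu = [10.0, 20.0, 15.0, 25.0, 12.0]      # synthetic reference angles
late_imu = [20.0, 12.0, 25.0, 10.0, 15.0]       # same values, different order
print(wasserstein_1d(early_sensor, late_sensor), wasserstein_1d(early_imu, late_imu))
```

A growing distance for the sensor windows alongside a near-zero distance for the reference windows is the signature of domain shift with an intact task space.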

Discussion
The experimental results show that an optimal transportation mapping can be found by analyzing datasets sampled from the source and target domain distributions. In mass calibration, the target domain dataset was 25% or less of the size of the source domain dataset, and adaptive calibration was still possible with a small RMSE. However, the user must ensure that the source and target task distributions are identical or, at least, span the same range.
In the case of long-term usage, the RMSE stayed at the level of the initial calibration even as time elapsed. Although the reference corresponding to the actual sensor state (i.e., the IMU signals) is not known during online testing, it is a significant advantage that only the domain signals are needed to compensate the calibration via optimal transportation, as the task space is intact. Other types of neural networks, such as long short-term memory (LSTM) networks, can also be used in our framework. Transfer learning techniques in artificial intelligence range from simple fine-tuning to meta-learning. One possible alternative for calibrating soft sensors is multidomain learning, in which a single neural network handles data from a multidomain space and then provides results. This approach could be applied to mass calibration, but online adaptation would not be possible because multidomain learning requires a full training dataset with both domain and task data.

Conclusions
In this article, we solved an optimal transportation problem to address limitations that arise in practical applications of soft sensors, successfully handling both the mass calibration and the long-term usage of soft sensors. First, generating multiple neural networks was avoided by introducing mappings $\gamma^*_i$ that connect the source and target domains. Second, the signal drift of a soft sensor used for a long period of time was compensated by updating the transportation mapping $\gamma^*_t$ in real time. The experimental results showed that introducing optimal transportation reduced the error between the estimated and reference values. Our method is memory efficient because it reduces the number of neural networks, and its fast calibration makes online implementation possible. Finally, task reference data were not required for the target sensors. Future work will include the system-level integration of the proposed methods with soft robot applications, such as wearable sensing suits for motion sensing, assistance, and rehabilitation. We believe this research will benefit soft robotics research, in which hysteresis and signal drift are major limiting factors for real-world applications. The implementation source code for the mass calibration problem is available at https://github.com/mochacoco/AIS_Transfer_Learning.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.