Low-Cost Data Glove Based on Deep-Learning-Enhanced Flexible Multiwalled Carbon Nanotube Sensors for Real-Time Gesture Recognition


DOI: 10.1002/aisy.202200128

With advancements in artificial intelligence, wearable motion recognition systems based on flexible nanomaterial sensors exhibit excellent potential for harmonious human-machine interaction. However, the sensing stability and the demand for large-scale arrays limit the application of flexible nanomaterial sensors. Herein, a data glove system based on simple multiwalled carbon nanotube (MWCNT) sensors and a lightweight deep-learning algorithm is proposed to achieve accurate gesture recognition. A regional-crack mechanism is introduced through the microspine structure to enhance the strain sensitivity. Moreover, an efficient signal processing strategy based on an adaptive wavelet threshold function is proposed to improve the robustness and anti-interference of the signals obtained from the MWCNT sensors; this strategy exhibits strong generalization and can be used with other nanomaterial strain sensors. Based on depth-wise separable convolution, a novel hybrid convolutional neural network (CNN)-long short-term memory (LSTM) model for gesture recognition is constructed. The proposed model achieves an average recognition accuracy of 97.5% across 30 gestures with an average recognition time of 2.173 ms using only five sensors. The fabricated data glove is a promising platform for low-cost and wearable human-machine interaction that can be directly interfaced in applications such as robotic hands, smart cars, and first-person shooting games.
sensors with lightweight neural networks, [36] especially in the field of motion monitoring [37] and real-time interaction. [38,39] In contrast to common visual recognition methods, which use standardized mass data from cameras, distinguishing pattern features is difficult for wearable systems because the number of sensors and the size of the measured data are limited by the cramped mounting space. Although classical machine learning methods, such as support vector machines [40][41][42] and K-nearest neighbor, [43,44] have been used for motion recognition with nanomaterial sensors made from CNTs, [45,46] MXene, [47,48] graphene, [49] and silver nanowires, [50] current algorithms still require high hardware configuration and computational costs, which are not appropriate for applications that require real-time feedback and small device areas, such as data gloves. Specifically, few methods have been proposed for the real-time detection of finger states based on a limited number of sensors.
In this study, we fabricated a low-cost strain sensor by conveniently brushing MWCNTs on a common flexible substrate and incorporated the sensor into a data glove system to realize real-time gesture recognition (Figure 1). The working principle of the MWCNT sensors is the widely used piezoresistive principle, and the regional-crack mechanism was introduced through the formation of the microspine structure using a sandpaper template to enhance the strain sensitivity. To restrain the baseline drift and noise, we introduced an adjustment factor into the wavelet threshold function and proposed an adaptive calibration algorithm that automatically adjusts its threshold according to the degree of signal mutation. Moreover, depth-wise separable convolution was adopted to develop a lightweight and efficient hybrid CNN-LSTM model. After extracting the composite spatial-temporal features of the motion signals, more than 30 gestures were accurately recognized based on only five simple MWCNT sensors. The adaptive calibration algorithm and gesture recognition model are generic and can be extended to other nanomaterial sensors. We implemented various applications that require real-time interaction, including mechanical claw remote control, wireless car control, and virtual interaction.

Fabrication of MWCNT-Brushed Sensors
Accurate gesture recognition requires devices with sufficiently high sensitivity and high finger conformability. In this study, a simple brushing method was used to coat the conducting MWCNT films onto a resilient Ecoflex substrate with a microscale spine structure. This low-cost method can be easily scaled for large-area substrate and mass production. Figure 2a displays the fabrication process of the sensor. First, the Ecoflex solution was spin coated on precleaned sandpaper as the template. MWCNTs were directly brushed on the cured Ecoflex surface using a banister brush. After cutting and pasting copper tapes as the contact wires, the sensor was encapsulated by another Ecoflex layer.
The fabricated device and morphology of the brushed MWCNT films are detailed in Figure 2b,c, where an apparent spine structure consisting of microscale peaks and valleys was formed on the Ecoflex surface. The average sizes of the microspine, which depend on the grits of sandpaper, were 250, 106, and 35 μm for P80, P150, and P400, respectively. Crucially, as shown in Figure 2d, the MWCNTs were evenly distributed on both the valley and peak of the spine, which revealed that the nanotubes were effectively embedded in the microstructure through rubbing and pressing between the brush and the Ecoflex surface.

Strain-Sensing Mechanism
Typically, the working principle of piezoresistive MWCNT strain sensors is the resistance change resulting from the crack effect on the crosslinked MWCNT conductive networks. [51] Here, a regional-crack mechanism was introduced through the microspine structure, and the strain sensitivity of the networks can be enhanced considerably. As shown in Figure 3a, an electrical model with a resistance network was developed to specify the microspine structure effect, in which the resistances of the MWCNTs embedded in the spine peak and valley are R_p and R_v, respectively. To facilitate the calculation, the circuit can be simplified as displayed in Figure 3b, where the overall equivalent resistance R_total is described by Equation (1).
When the MWCNT film is stretched, stress is concentrated at the valley bottom and the spine valley deforms first, fracturing some conductive pathways; the resistances R_v and R_p remain almost unchanged, resulting in an increase in R_pv and R_total according to Equation (2). As the applied strain gradually increases, the spine peaks deform and more conduction paths break at both the peaks and the valleys. Therefore, the resistance R_p increases with increasing strain, and the values of R_v and R_pv both increase drastically, resulting in higher sensitivity. In the experiment, surfaces with various roughness were prepared using different sandpaper models (grits of P80, P150, and P400), and the real-time responses of these sensors under strains from 10% to 200% are displayed in Figure 3c. According to the results, the sensor fabricated with the P400 sandpaper exhibited the best performance. Figure 3d indicates the typical relationship between the resistance response (ΔR/R_0) and the applied tensile strain (up to 200%) of the sensor, where ΔR is the resistance change under various strain values and R_0 is the base resistance of the sensor. The gauge factor (GF) is a key parameter for strain sensing and is calculated from the slope of the resistance-strain curves. As shown in Figure 3e, the curves exhibit two distinct sensitive regions, which confirms that the regional crack process occurred in the spine peaks and valleys.
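The qualitative effect of regional cracking on the network resistance can be illustrated with a toy series-parallel model. This is only a sketch: the topology (one peak segment in series with several parallel valley pathways) and all resistance values are illustrative assumptions, not the paper's measured parameters.

```python
# Toy series-parallel model of the microspine resistance network.
# Topology and values are illustrative assumptions, not measured data.

def r_parallel(resistances):
    """Equivalent resistance of parallel branches."""
    return 1.0 / sum(1.0 / r for r in resistances)

def r_total(r_p, valley_paths):
    """Peak resistance in series with the parallel valley pathways."""
    return r_p + r_parallel(valley_paths)

# Intact network: one peak segment in series with three valley pathways.
intact = r_total(10.0, [30.0, 30.0, 30.0])    # 10 + 10 = 20 ohm (ideal)

# Under strain, cracks fracture a valley pathway first; fewer parallel
# paths remain, so the equivalent resistance rises.
strained = r_total(10.0, [30.0, 30.0])        # 10 + 15 = 25 ohm (ideal)
```

Breaking further pathways at both peaks and valleys compounds this effect, which is consistent with the steeper second sensitive region described above.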
Notably, the GF of the device prepared with the P400 sandpaper reached maximum values of 39.105 (0-100% strain) and 357.628 (100-200% strain), an increase of nearly tenfold compared with that of a device without the microspine structure. Furthermore, the resistance responses exhibited a highly linear increase within both ranges (R² = 0.9551 and R² = 0.9668, respectively), which is desirable for applications. Representative repeatability tests under a strain of 200% are shown in Figure 3f. During ten continuous cycles, all the hysteresis loops exhibited similar profiles without obvious fluctuation, revealing the excellent fatigue resistance of the flexible strain sensor. The response and recovery times are displayed in Figure 3g, where the stretching rate was set to 500 mm min⁻¹ according to the typical speed of finger motion. The resistance changed rapidly and symmetrically, with response and recovery times of ≈100 ms, which satisfies the requirement of capturing the fast stretching and releasing caused by finger motions. Figure 3h displays the resistance response when the sensor is attached to the finger surface to monitor various bending states. Benefiting from the sensitivity enhancement, the device with the microspine structure could perceive both small- and large-scale finger bending. The recoverability and durability were investigated through continuous loading and unloading of strain, as depicted in Figure 3i. The device exhibited a stable response after more than 1000 stretching cycles under a considerable strain of 100%. These results verified the feasibility of the proposed sensor for practical applications.
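Since the GF is defined as the slope of the ΔR/R_0-versus-strain curve, it can be estimated with an ordinary least-squares fit computed separately for each sensitive region. The sample points below are made up for illustration; they are not the paper's measured data.

```python
# Estimating the gauge factor (GF) as the least-squares slope of
# dR/R0 versus strain, fitted per sensitive region.

def gauge_factor(strains, dr_over_r0):
    """Least-squares slope of dR/R0 against strain (strain as a fraction)."""
    n = len(strains)
    mx = sum(strains) / n
    my = sum(dr_over_r0) / n
    num = sum((x - mx) * (y - my) for x, y in zip(strains, dr_over_r0))
    den = sum((x - mx) ** 2 for x in strains)
    return num / den

# Hypothetical region-1 points (0-100% strain) lying on a slope of 39.
region1_strain = [0.0, 0.25, 0.5, 0.75, 1.0]
region1_resp = [39.0 * s for s in region1_strain]
print(gauge_factor(region1_strain, region1_resp))  # → 39.0
```

Fitting each region separately is what yields the two GF values reported above (one per linear segment of the curve).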

Design of the Data Glove System
We developed a wireless data glove system using the MWCNT strain sensors to perform real-time gesture recognition. As shown in Figure 4a, five sensors were attached to the joint positions to detect the releasing and stretching caused by the motions of multiple fingers. The special structural design of the strain sensor is displayed in Figure 4b, where the red arrows indicate the current flow path. The description and explanation of this structural design are given in Note S1, Supporting Information; the cropped sensor exhibits a more pronounced signal change for the same finger bending condition (see Figure S1, Supporting Information). The customized wireless printed circuit board (PCB) integrates multiple functions, such as multichannel signal acquisition, signal amplification, and wireless transmission, using integrated circuit components, as depicted in Figure 4c. The red dashed boxes represent the locations of the integrated circuit components that correspond to the numbers in the internal circuit of the PCB in Figure 4d. The circuit amplifies the collected signals and converts them into digital signals through the analog-to-digital converter (ADC); the digital signals are then transmitted to the computer through low-energy Bluetooth. The total mass of the system is as low as 35.7 g, less than one-tenth that of typical commercial data gloves (see Table S1 and Figure S2, Supporting Information).
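On the host side, each received frame of raw ADC counts must be converted back to channel voltages before further processing. The sketch below shows this conversion; the ADC resolution and reference voltage are assumptions for illustration, not taken from the PCB design.

```python
# Host-side sketch: converting raw ADC counts from one five-channel
# frame into voltages. Resolution (12-bit) and reference (3.3 V) are
# assumed values, not the actual PCB specification.

def counts_to_volts(count, vref=3.3, bits=12):
    """Convert a raw ADC reading to volts for a given reference/resolution."""
    return count * vref / ((1 << bits) - 1)

frame = [2048, 1024, 512, 3000, 100]          # one 5-channel sample (raw counts)
volts = [counts_to_volts(c) for c in frame]    # per-channel voltages
```

At the stated 200 Hz sampling rate, one such frame arrives every 5 ms per channel set.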

Adaptive Calibration Algorithm Based on the Dynamic Wavelet Threshold
Because of interference from environmental factors and tensile fatigue, flexible strain sensors based on nanomaterials suffer from a high degree of instability and severe baseline drift during long-term use, which results in distorted data and hinders classification and recognition. Wavelet thresholding is a widely used method for analyzing nonstationary signals, especially for problems such as denoising and baseline calibration. [52] Generally, to restore the signal features after wavelet reconstruction, establishing a suitable threshold function is critical to ensure that the difference between the estimated and real values of the wavelet coefficients remains as small as possible. However, the conventional threshold functions [53,54] are not suitable for analyzing motion signals with a high mutation rate. When the wavelet decomposition coefficient ω_j,k is equal to the threshold λ, an oscillation typically occurs in the wavelet reconstruction because the hard threshold function is discontinuous. Conversely, when the soft threshold function is used, the deviation of the wavelet coefficients is difficult to eliminate, which results in an unacceptable error between the reconstructed and real signals. To address this problem, we propose an adaptive method featuring a dynamic threshold function obtained by introducing an adjustment factor X_n. Figure 5a displays the flowchart of the signal processing.
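For reference, the two conventional threshold functions discussed above have the following standard textbook forms; the dynamic variant proposed below interpolates between their behaviors.

```python
# Standard hard and soft wavelet threshold functions (textbook forms).
import math

def hard_threshold(w, lam):
    """Keep coefficients at or above the threshold unchanged, zero the rest
    (discontinuous at |w| = lam, which causes reconstruction oscillation)."""
    return w if abs(w) >= lam else 0.0

def soft_threshold(w, lam):
    """Shrink surviving coefficients toward zero by lam
    (continuous, but biases large coefficients)."""
    return math.copysign(max(abs(w) - lam, 0.0), w)

print(hard_threshold(2.0, 1.0), soft_threshold(2.0, 1.0))  # → 2.0 1.0
```

The hard function preserves large coefficients exactly but jumps at the threshold; the soft function is continuous but shifts every surviving coefficient by λ, which is the deviation the text refers to.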
First, the Db8 wavelet family was used as the wavelet base to decompose the collected motion signals. The threshold λ was determined according to the VisuShrink threshold as follows. [55]

λ = σ √(2 ln N)   (3)

where N represents the length of the signal. Furthermore, σ is the standard deviation of the noise and can be determined by the Donoho method as follows. [56]

σ = MAD / 0.6745   (4)

where MAD is the median of the amplitudes of the wavelet coefficients over all high-frequency sub-bands.
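Equations (3) and (4) can be sketched directly in code. The toy coefficient list below is illustrative, not real sensor data; in practice the high-frequency coefficients would come from the Db8 wavelet decomposition.

```python
# Sketch of Equations (3) and (4): the Donoho noise estimate and the
# VisuShrink threshold.
import math
import statistics

def noise_sigma(detail_coeffs):
    """sigma = MAD / 0.6745, MAD being the median absolute coefficient
    of the high-frequency sub-bands (Equation (4))."""
    mad = statistics.median(abs(c) for c in detail_coeffs)
    return mad / 0.6745

def visu_threshold(sigma, n):
    """lambda = sigma * sqrt(2 ln N) for a signal of length N (Equation (3))."""
    return sigma * math.sqrt(2.0 * math.log(n))

# Toy high-frequency coefficients (not real data).
details = [0.1, -0.2, 0.05, 0.3, -0.15]
lam = visu_threshold(noise_sigma(details), n=1000)
```

Note that λ grows only logarithmically with the signal length N, so the threshold stays stable as longer segments are processed.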
Considering the mutations at various moments, a dynamic adjustment factor X_n was defined in Equation (5) to evaluate the mutation degree of the motion signals f(t).
Thus, incorporating the features of the soft and hard thresholds, the wavelet threshold function can be modified as follows.

ω̂_j,k = |ω_j,k| − λ(1 − X_n),                    |ω_j,k| ≥ λ
        X_n λ (|ω_j,k| − X_n λ) / [λ(1 − X_n)],  X_n λ ≤ |ω_j,k| < λ
        0,                                        |ω_j,k| < X_n λ     (6)

Figure 5b displays the profiles of the soft and hard threshold functions and of the wavelet function with the dynamic threshold. The modified function depends on the value of X_n, which enables it to adapt to the trend of the motion signals. After processing by the algorithm, the reconstructed signal exhibits excellent signal edges and avoids signal oscillations without baseline drift (Figure 5c,d). The details of the derivation process are given in Note S2, Supporting Information.
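A direct reading of Equation (6) can be sketched as below. One assumption is made explicit: the shrinkage is taken to preserve the coefficient's sign (as soft thresholding does), which the piecewise magnitudes in the equation imply but do not state.

```python
# Sketch of the dynamic threshold function of Equation (6).
# Assumes sign-preserving shrinkage; X_n in (0, 1) is the
# mutation-degree adjustment factor.
import math

def dynamic_threshold(w, lam, x_n):
    aw = abs(w)
    if aw >= lam:
        mag = aw - lam * (1.0 - x_n)               # large coefficients: shifted pass-through
    elif aw >= x_n * lam:
        mag = x_n * lam * (aw - x_n * lam) / (lam * (1.0 - x_n))  # transition band
    else:
        mag = 0.0                                   # kill small coefficients
    return math.copysign(mag, w)
```

The piecewise branches meet continuously: both sides give λX_n at |ω| = λ and 0 at |ω| = X_n λ, which is precisely what removes the hard threshold's reconstruction oscillation. As X_n → 1 the function approaches the hard threshold, and as X_n → 0 it approaches the soft threshold.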
Furthermore, the signal-to-noise ratio (SNR) and root mean square error (RMSE) were 49.8871 dB and 0.3895, respectively, indicating a higher denoising efficiency for the dynamic threshold function than for the soft and hard threshold functions (Table 1). The adaptive calibration algorithm can be extended to other mutation-signal processing tasks, which provides a valuable solution for improving the environmental adaptability and stability of nanomaterial strain sensors.
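For reference, the SNR and RMSE metrics used to score the reconstruction can be sketched with their standard textbook definitions (the paper's own equations for them are not reproduced in this excerpt).

```python
# Standard SNR (dB) and RMSE definitions comparing a reconstruction
# f_hat against the reference signal f (textbook forms).
import math

def snr_db(f, f_hat):
    """10 * log10 of signal power over residual-noise power, in dB."""
    signal_power = sum(x * x for x in f)
    noise_power = sum((x - y) ** 2 for x, y in zip(f, f_hat))
    return 10.0 * math.log10(signal_power / noise_power)

def rmse(f, f_hat):
    """Root mean square error between the two signals."""
    n = len(f)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, f_hat)) / n)
```

Higher SNR and lower RMSE together indicate a reconstruction that both suppresses noise and tracks the true signal, which is the sense in which the dynamic threshold outperforms the soft and hard variants in Table 1.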

Hybrid CNN-LSTM Model for Gesture Recognition
The gesture motion signals collected by the strain sensors are typical time-related sequences. However, conventional machine learning and CNN models extract only the spatial features of the signals, whereas the temporal features are ignored, which limits recognition accuracy. Furthermore, complex learning networks are not suitable for wearable systems and real-time applications because of their heavy computational costs and long response times. By combining the powerful data processing capabilities of the CNN with the time-series modeling of the LSTM, we established a lightweight hybrid CNN-LSTM model to improve both prediction accuracy and execution efficiency based on spatial-temporal features. Depth-wise separable convolution was adopted to replace the standard convolution module, which drastically reduces the number of parameters without losing key information. Figure 6a displays the formatting and creation of the dataset for various gestures. Sliding windows divided the calibrated five-channel time series into 2D matrices with the same format. After labeling and combining, the dataset was created for training the model. Figure 6b reveals that the proposed CNN-LSTM model comprises only four layers, dominated by depth-wise separable convolutional layers and fully connected layers. Because depth-wise separable convolution performs single-channel convolution on multiple channels, the number of training parameters can be considerably reduced, [57,58] yielding low computational cost and high efficiency.
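The parameter saving from depth-wise separable convolution follows directly from the standard counting argument: a k×k convolution mapping c_in channels to c_out channels costs k·k·c_in·c_out parameters, whereas the separable version costs k·k·c_in (depth-wise) plus c_in·c_out (point-wise). The layer sizes below are illustrative, not the paper's actual configuration.

```python
# Parameter counts for standard vs. depth-wise separable convolution
# (bias terms omitted; layer sizes are illustrative assumptions).

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution to mix channels
    return depthwise + pointwise

k, c_in, c_out = 3, 5, 64         # e.g., five sensor channels in, 64 maps out
print(standard_conv_params(k, c_in, c_out))   # → 2880
print(separable_conv_params(k, c_in, c_out))  # → 365
```

Even at this toy scale the separable layer is roughly 8× smaller, and the ratio grows with c_out, which is why the full model stays under 100 k parameters.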
After the dataset passed through the model, three convolution kernels were used to perform convolution operations on the different channels of a motion signal sample, yielding three feature maps. To fuse the information between channels, a point-wise convolution operation was performed on these feature maps. Next, the feature vector obtained through the fully connected layer was passed to the LSTM layer (Figure 6c), where two hidden layers were used to extract the temporal features in the feature vector. Finally, the vectors containing spatial-temporal features were input into the classifier and the final predictions were obtained.
To verify the recognition performance of the proposed hybrid CNN-LSTM model, the motion signals generated by 30 gestures (Figure 6d) in Chinese sign language were collected to establish a dataset for training the CNN, LSTM, and hybrid CNN-LSTM models. As displayed in Figure 6e,f, the relatively poor loss convergence despite good accuracy convergence may be due to the high similarity of many gestures (Note S4, Supporting Information). Compared with the single CNN or LSTM models, the hybrid CNN-LSTM model exhibits higher accuracy, faster convergence, and lower loss during training. Figure 6g displays the confusion matrix of the recognition results for the 30 gestures. Although the hybrid CNN-LSTM model contains only four layers, it exhibits high accuracy and outperforms the CNN-based and LSTM-based methods. The results of the comparative experiments are shown in Figure S3, Supporting Information. For most of the gesture classes, the classification accuracy was more than 99%, and the average accuracy was 97.5%. In the hybrid CNN-LSTM model, the number of parameters was 88 618, and the average recognition time was only 2.173 ms for a single sample, which is highly desirable in real-time interactive applications. Notably, this recognition performance was achieved with only five sensors and 20 training epochs, resolving the usual trade-off between computational accuracy and efficiency. The parameters and accuracy of the different models are shown in Table 2. This lightweight model renders the system feasible for embedded mobile deployment.

Applications of the MWCNT Data Glove
As a proof of concept, the wireless data glove with MWCNT strain sensors was used to control mechanical and electrical equipment in real time. Figure 7a-c shows a demonstration of a mechanical claw imitating human gestures. The measured signals exhibit a stable amplitude with obvious motion features. More than ten random gestures, as well as various bending angles of the fingers, can be accurately predicted and trigger the mechanical claw to work seamlessly (Video S1, Supporting Information). Moreover, a smart car was controlled using the data glove, as displayed in Figure 7d,e and Video S2, Supporting Information. The moving direction and speed were controlled through the gestures and bending angles, respectively. Because of the fast response time of the sensor and the high efficiency of the CNN-LSTM model, the total execution time of the proposed system is less than 200 ms, even when the time delay from the motor and mechanical components is included.
Fast response and precise action are required in first-person shooter (FPS) games. To show the rapid and accurate performance of the proposed system, finger-computer interaction was demonstrated through the data glove by playing a commercial FPS game and a homemade flying game. Figure 8a-d displays the measured signal curves and corresponding control commands when playing the games. The data glove exhibits fast response and precise control comparable to those of a mouse and keyboard. Users can complete operations such as moving, turning, aiming, shooting, and loading bullets by wearing the glove instead of using the mouse or keyboard, as depicted in Videos S3 and S4, Supporting Information. The system provided a rapid, accurate, and effective response to several external stimuli without perceivable delay, which can satisfy the requirements of remote mechanical control, medical rehabilitation training, and immersive VR. [59] These results indicate that the proposed data glove, which allows convenient and comfortable operation, can be used as an alternative to conventional human-machine interfaces, especially when a rigid and bulky device is unavailable or not portable.

Conclusion
In this study, we proposed a low-cost MWCNT-based glove system for real-time gesture recognition. In contrast to previously reported methods that require a large-scale sensor array and high computation, the proposed system uses only five simple MWCNT sensors to capture subtle finger motion and achieves an average accuracy of 97.5% for the recognition of 30 gestures within 2.173 ms. By forming the microspine structure in the sensor, a regional-crack mechanism was proposed to considerably enhance the strain sensitivity of the device. The baseline drift and noise interference were addressed using the dynamic wavelet algorithm. Furthermore, the introduction of depth-wise separable convolution into the CNN-LSTM model improved the efficiency and reduced the complexity of extracting the composite spatiotemporal features of the motion signals, and only 20 epochs of training were required to reach nearly 98% recognition accuracy. Furthermore, we interfaced the proposed system directly with a robotic hand, a smart car, and a computer to realize virtual control with human gestures with a total execution time of less than 200 ms. The proposed sensor design, calibration algorithm, and deep-learning model enable the fabrication of highly robust MWCNT strain sensors with improved sensitivity and stability, and demonstrate the excellent potential of these sensors for practical application in HMI systems.
Fabrication of the Strain Sensors: First, sandpapers with P80, P150, and P400 grits (purchased from STARCKE) were prepared and cleaned. Then, Ecoflex 00-30 was spin coated onto the sandpaper and degassed in a vacuum oven. After the Ecoflex film was formed, the MWCNTs were brushed onto the surface of the soft substrate with a banister brush. According to the test requirements, the conductive film was cut into a 1 cm × 4 cm shape. To make the motion signals change more obviously under the same finger bending condition, the sensor was cut from the middle. Finally, an Ecoflex layer was used to encapsulate the sensor.

Signal Collection and Dataset Configuration: The motion signals were collected in real time by an 8-bit microcontroller (STM8L051F3P) sampling the multiple-channel voltages at a frequency of 200 Hz. In a Python workspace, a sliding window moved in steps of six data points to cut the collected multichannel sequence signals and extract signal segments. The signal segments were taken as samples and automatically labeled. For each gesture, 500 samples were collected, of which 400 samples (80%) were used for training and 100 samples (20%) were used for testing, giving a total of 15 000 samples for 30 gestures. The experiment was performed with the consent of the subject.
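The segmentation step above can be sketched as follows. The step size of six data points is taken from the text, while the window length is an assumption for illustration (the excerpt does not state it).

```python
# Sketch of the dataset segmentation: a sliding window moving in steps
# of six samples over the five-channel sequence. The window length of
# 32 is an assumed value, not stated in the text.

def sliding_windows(signal_rows, window_len, step=6):
    """signal_rows: list of per-timestep rows, each holding five channel values.
    Returns overlapping 2D segments (window_len x 5) as training samples."""
    segments = []
    for start in range(0, len(signal_rows) - window_len + 1, step):
        segments.append(signal_rows[start:start + window_len])
    return segments

# One second of the 200 Hz stream -> 200 rows; with a 32-sample window
# and step 6, this yields (200 - 32) // 6 + 1 = 29 segments.
rows = [[0.0] * 5 for _ in range(200)]
print(len(sliding_windows(rows, window_len=32)))  # → 29
```

Each returned segment is one 2D matrix in the Figure 6a format, ready to be labeled with its gesture class.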

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.