A Multiple-State Ion Synaptic Transistor Applicable to Abnormal Car Detection with Transfer Learning

An artificial synapse is an essential element to construct a hardware-based artificial neural network (ANN). While various synaptic devices have been proposed along with studies on electrical characteristics and proper applications, a small number of conductance states with nonlinear and asymmetric conductance changes have been problematic and imposed limits on computational performance. Their applications are thus still limited to the classification of simple images or acoustic datasets. Herein, a polymer electrolyte-gated synaptic transistor (pEGST) is demonstrated for video-based learning and inference using transfer learning. In particular, abnormal car detection (ACD) is attempted with video-based learning and inference to avoid traffic accidents. The pEGST showed multiple states of 8,192 (= 13 bits) for weight modulation with linear and symmetric conductance changes and helped reduce the error rate to 3% to judge whether a car in a video is abnormal.

DOI: 10.1002/aisy.202100231

Introduction
Artificial neural networks (ANNs) inspired by the human brain have attracted considerable attention for various kinds of networks performing artificial intelligence (AI). [1][2][3][4][5] Most of these operations rely on software developed for a von Neumann architecture, in which complex tasks consume a lot of power for computation on high-performance central processing units (CPUs) and graphics processing units (GPUs) in the cloud or on a server. Recently, neuromorphic computing, which organizes ANNs based on hardware, has gained attention as a promising alternative with drastically lower power and time consumption compared to von Neumann computing.
One of the advanced directions in neuromorphic computing, called nonvolatile computing-in-memory (nvCIM), has been intensively researched to overcome the memory wall encountered in the von Neumann architecture. [6] Unlike the conventional von Neumann architecture, which consumes a lot of time and energy transferring massive data back and forth between a processor and memory, nvCIM can process data within the memory itself by modulating synaptic weights. In addition, such data processing is particularly attractive in edge computing applications, where AI computing is performed near the physical location of a device or data source, rather than at the core of a centralized cloud or server. [6][7][8] The necessity of nvCIM in edge computing is described in detail in Supporting Information 1.
Among various candidates for nvCIM, an electrolyte-gated transistor (EGT) is attractive for realizing ideal synaptic properties: linearity, symmetry, and high dynamic range (HDR) with a number of multiple-state synaptic weights. [9][10][11][12] In addition, the EGT is advantageous for the separation of read and write operations because it is a three-terminal device. [13] However, it is difficult to implement a large number of EGT synaptic devices with high density by wafer-level integration. This is because new materials that are less compatible with complementary metal-oxide-semiconductor (CMOS) fabrication have been used for most EGTs. [14,15] To date, the significance of microfabrication with CMOS compatibility for wafer-level integration has been underestimated, because its usefulness was frequently devalued compared to finding a new mechanism with new materials for neuromorphic devices. In addition, most neuromorphic studies have focused solely on improving the linearity and symmetry of weight modulation for high classification accuracy. Meanwhile, the importance of multiple-state weights has been overlooked, because wide weight tunability with multiple states is not always required for simple AI computations such as static image recognition from the datasets of the Modified National Institute of Standards and Technology (MNIST) and the Canadian Institute for Advanced Research (CIFAR-10), [6,16] or speech recognition from an acoustic dataset. [17,18] More to the point, however, multiple states are crucial to recognizing dynamic images such as video data.
World deaths by traffic accidents were ≈1.35 million in a year, i.e., one person dies on the road every 24 s. Therefore, preemptive actions, such as prescreening neighboring abnormal vehicles (e.g., sliding on wet surfaces, fishtailing, etc.) and warning a driver in advance, are crucial to prevent traffic accidents. They can be the most important applications of AI computation for autonomous edge vehicles using a vast number of dynamic images. [19,20] Few software-based attempts to assess the feasibility of abnormal car detection (ACD) have been reported. [21] For actual ACD, however, there were two practical issues in the input video dataset: detection time length and environment. One is that the maximum allowable time of the training video was shorter than 2 s. The other is that various scenarios according to outdoor environments, including sunny, dark, rainy, and snowy conditions, were not fully reflected. In addition, large-scale networks such as LSTM, GRU, and VGG-16 cannot support real-time operation, which demands image processing faster than 15 frames per second (FPS). Moreover, a hardware-based attempt to apply neuromorphic devices to ACD by reflecting the measured synaptic characteristics of a fabricated device has not been reported to date, even though software-based simulations are still used in part. The present work is the first neuromorphic nvCIM study that can predict an abnormal car in advance through video-based learning and inference.
Herein, a polymer electrolyte-gated synaptic transistor (pEGST) was fabricated on a wafer with CMOS technology for a synapse, which is applicable to a neuromorphic chip to realize ACD with wafer-level integration. [22] In the expression of multiple states, the number of states is 2^n when the number of bits (N_bit) is n. In previous works, N_bit of 7 (= 128 states) was sufficient to classify a simple static image or acoustic data with nvCIM. [23,24] However, N_bit of at least 11 (= 2,048 states) is required to train a video-based ACD dataset with high accuracy. The fabricated pEGST showed multiple-state synaptic weights up to N_bit of 13 with good linearity (α_pot = 1.17 and α_dep = −0.41) and symmetry in potentiation and depression (P/D).
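As a quick illustration of the bit-to-state relation stated above, the following minimal Python sketch computes the number of conductance states for the bit widths mentioned in the text. The helper name `num_states` is hypothetical and introduced here for illustration only.

```python
def num_states(n_bit: int) -> int:
    """Return the number of distinguishable conductance states, 2**n_bit."""
    if n_bit < 0:
        raise ValueError("n_bit must be non-negative")
    return 2 ** n_bit


if __name__ == "__main__":
    # Bit widths mentioned in the text: 7 (static images or acoustic data),
    # 11 (stated minimum for the video-based ACD dataset), 13 (fabricated pEGST).
    for n_bit in (7, 11, 13):
        print(f"N_bit = {n_bit:2d} -> {num_states(n_bit)} states")
```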
From a network point of view, a modified VGG-16 network that trains the ACD dataset with fast operating speed is proposed for the first time. It is available for various anomalous scenarios regardless of time length and outdoor environment on the road. The attained classification accuracy for the ACD was higher than 85%, which is comparable to the ideal detection accuracy of 88% achievable by a state-of-the-art software-based classifier running on cutting-edge GPUs. As another advantage, the pEGST consumed low energy of less than 10 fJ pulse⁻¹ for the weight modulation, which is preferred for in-memory computing in today's IoT computing environment or at the edge in driverless cars.

Structure and Electrical Characteristics of pEGST
Figure 1a schematically illustrates the proposed pEGST and the chemical structure of proton (H⁺)-doped polyethylene glycol dimethacrylate (pEGDMA), which was used as a solid-state electrolyte for a gate dielectric. The background optical photograph behind the schematic in Figure 1a shows a fabricated 4-inch wafer enclosing 30 device arrays. To mimic a human brain composed of approximately a quadrillion synapses, a high packing density of the synapses is necessary. However, fabrication-induced variability can be a concern in terms of inter-array and intra-array uniformity. For uniform and conformal deposition of a thin polymer film across a wafer, an initiated chemical vapor deposition (iCVD) process was adopted to form an ultrathin solid pEGDMA electrolyte layer. [25,26] Step-by-step fabrication methods and the detailed iCVD processes are described in Supporting Information 2 and 3, respectively. There is structural similarity between a biological nerve system and the pEGST, i.e., element-to-element correspondence between them, as depicted in Figure 1b. There is also functional similarity between a biological nerve system and the pEGST.
In the biological nerve system, neuron-to-neuron connectivity is controlled by electric potential via mobile ions. In the pEGST, such connectivity is modulated by gate voltage via protons. When a pre-synaptic voltage (V_pre-synaptic = V_GS) is applied to the gate of the pEGST, protons in the pEGDMA electrolyte move close to or far from the channel of the pEGST. They influence the threshold voltage (V_T), which modulates the post-synaptic current (I_post-synaptic = I_DS) and the channel conductance (G_DS), which is defined as ∂I_DS/∂V_GS. Figure 1c shows a transmission electron microscopy (TEM) image of a cross-sectional gate stack, a fast Fourier transform (FFT) image of a single-crystalline silicon (sc-Si) channel, and an energy-dispersive X-ray spectroscopy (EDS) mapping image of the gate stack in the pEGST. The good film quality of the thermally grown interfacial gate oxide (SiO₂), which is positioned below the pEGDMA and over the sc-Si channel, contributes to improving the subthreshold slope (SS), increasing the on-state current (I_on), preventing gate leakage current (I_G), and reducing the off-state current (I_off). Due to the high I_on and low I_off in the transfer characteristic (I_DS-V_GS), a high dynamic range (HDR) of conductance was achieved; it represents the window of conductance between the lowest state and the highest state and is defined as G_DS,max − G_DS,min. The HDR of conductance allows each conductance state to be distinguished among 13 bits in the P/D. Two representative current-voltage characteristics, the transfer curve of drain current (I_DS) versus gate voltage (V_GS) and the output curve of I_DS versus drain voltage (V_DS), are shown in Figure S4, Supporting Information. In the I_DS-V_GS plot, there is hysteresis between a forward mode swept from −V_GS to +V_GS and a reverse mode swept from +V_GS to −V_GS. The hysteresis, denoted by ΔV_T, is defined as the V_T difference between the forward mode and the reverse mode.
ΔV_T is attributed to proton migration in the polymer electrolyte according to V_GS polarity. It should be noted that ΔV_T increases with a larger |V_GS|. Figure 1d shows a schematic of conduction by proton hopping along the ethylene oxide links in the percolated network. It is known that the activation energy (E_A) increases as the purity level of pEGDMA increases. [27] The increased E_A can be favorably utilized to prolong the retention of conductance weight. The aforementioned iCVD is attractive for increasing the purity level because it is a solvent-free vapor-phase process. When +V_GS is applied, the electric field (E-field) is directed downward from the gate to the channel.
Hence, protons in the polymer electrolyte hop toward the channel. Conversely, when −V_GS is applied, they hop toward the gate. Such directional proton migration by the V_GS polarity was verified in our previous work with the aid of impedance spectroscopy and time-of-flight secondary ion mass spectrometry (TOF-SIMS). [22] Figure 1e shows a pulse scheme of the gate and drain for potentiation, depression, and read, which modulates and deciphers the weight conductance (G_DS) for training video-based datasets. The optimal pulse amplitudes for potentiation (V_pot = V_GS) and depression (V_dep = V_GS) are +4.2 V and −5.5 V, respectively. V_pot and V_dep are time-invariant amplitudes, i.e., identical pulse amplitudes are used even for higher G_DS. Their pulse widths are both 10 ms and are likewise not changed for reinforcement. The gate read voltage (V_GS,read) is set to 0.1 V and the drain read voltage (V_DS,read) is 0.05 V, which are small enough to minimize proton drift in the electrolyte film. Figure 1f-i shows the modulated synaptic weight for the number of states ranging from 7 to 13 bits. The inset of Figure 1i shows a close-up view of the detailed G_DS. Note that 13 bits for the weight modulation is the largest number of G_DS states ever reported among various synaptic devices. Even though there is some write noise for the weight modulation of 13 bits, its level is relatively small compared to other types of synaptic devices, such as ferroelectric FETs, resistive random-access memory (RRAM), and charge-trap FETs. [28][29][30]
www.advancedsciencenews.com www.advintellsyst.com
Figure 1. Schematic of the polymer electrolyte-gated synaptic transistor (pEGST) with biological analogy, transmission electron microscopy (TEM) image, origin of the memory characteristics, and weight conductance modulations with applied pulse schemes. a) Illustrated pEGST overlaid on an optical photograph of the fabricated wafer and chemical structure of the pEGDMA. b) Analogy between the pEGST and a biological synapse with neurons. c) Cross-sectional TEM image of the gate stack, fast Fourier transform (FFT) image of the single-crystalline silicon channel, and energy-dispersive X-ray spectroscopy (EDS) images for the gate electrode, gate dielectrics, and channel. d) Schematic of proton hopping conduction in the pEGDMA electrolyte to show the origin of the memory characteristics. e) Applied pulse schemes for gate (V_G) and drain (V_D). f-i) Plot of potentiation and depression for modulation of synaptic weight conductance (G_DS) with 7, 9, 11, and 13 bits.
To evaluate the linearity of the G_DS modulation, a nonlinearity parameter α was calculated according to the following equation [31]
$$
G = \begin{cases} \left[\left(G_{\max}^{\alpha} - G_{\min}^{\alpha}\right) w + G_{\min}^{\alpha}\right]^{1/\alpha}, & \alpha \neq 0 \\ G_{\min}\left(G_{\max}/G_{\min}\right)^{w}, & \alpha = 0 \end{cases}
$$
where G_max and G_min are the maximum and minimum conductance, and w is an internal variable ranging from 0 to 1. For α = 1, modulation of the weight conductance follows an ideally linearized G_DS in both P/D. For α > 1, which is usually observed in potentiation, G_DS deviates from the ideal linearized G_DS and follows a concave profile. For α < 1, which is usually observed in depression, G_DS deviates from the ideal linearized G_DS and follows a convex profile. Here, α_pot and α_dep represent the level of nonlinearity for potentiation and depression, respectively. α_pot and α_dep are 1.51 and −0.38 for 7 bits, and 1.17 and −0.41 for 13 bits. Both show good linearity. The measurement setup to characterize G_DS and its extraction method are described in Supporting Information 5, and its uniformity, evaluated for four different device arrays across a wafer, is shown in Supporting Information 6. The results showed not only good linearity but also high classification accuracy for ACD. Potentiation, depression, read, and multiply-accumulate (MAC) operations are feasible with a pseudo-crossbar array configuration of three-terminal synaptic transistors. [32]

Neural Network Configuration with Transfer Learning
Figure 2a shows that transfer learning based on our modified VGG-16 model was used to construct the neural network for ACD. This network consists of a feature extractor using a convolutional neural network (CNN) and a classifier using a fully connected layer (FCL). The modified VGG-16 model is used in this work because its operating speed is faster than that of the original VGG-16 model, owing to the reduced FCL and the use of a global average pooling (GAP) layer.
The GAP reduces the size of the features from the CNN layers but maintains spatial information well. It boosts computational speed by reducing the computational volume in the FCL. [33,34] In a software-based simulation study using GPUs, the original VGG-16 showed an operating speed of 4 FPS, whereas our modified VGG-16 shows 512 FPS, which is much faster than the 15 FPS standard for real-time operation. In an experiment, the classification accuracy was 88%, 81%, and 85% for the modified VGG-16, the original VGG-16, and the MobileNetV3 model, which is one of the latest networks, respectively (see Supporting Information 7). [35] The modified VGG-16, which is advantageous for memorizing spatial information, was designed by inserting the GAP layer between the feature extractor and the FCL. The GAP layer in the modified VGG-16 helps to further improve the classification accuracy compared to the original VGG-16 model. Thus, it is attractive for performing the ACD task. [36,37] A sequential diagram of dataset processing for transfer learning is shown in Figure 2a. There are two phases: the training phase and the test phase. In the training phase, input data of the preprocessed training video are passed to the transferred CNN with kernel weights that were already trained in PyTorch. Afterward, the features extracted by convolving the input data with the kernel weights are passed to the FCL replaced by the fabricated pEGSTs. They are then trained for several epochs. In the test phase, input data of the test video are sequentially passed to the transferred CNN and the learned FCL to distinguish whether the driving pattern of an anonymous car is abnormal. The detailed preprocessing procedure of the training and test datasets is shown with the size of each dataset in Supporting Information 8.
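The two-phase flow described above (a transferred feature extractor whose weights stay fixed, followed by a classifier that is trained) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' pipeline: `frozen_features` is a hypothetical stand-in for the pretrained CNN, the data are synthetic, and a single logistic unit replaces the pEGST-based FCL.

```python
import math
import random

# Hypothetical stand-in for the transferred CNN: fixed "kernel weights"
# that are never updated during training.
FROZEN_KERNEL = (0.5, -0.25, 1.0)


def frozen_features(x):
    """Map a raw 3-value input to 2 features using the fixed kernel."""
    s = sum(k * v for k, v in zip(FROZEN_KERNEL, x))
    return (s, s * s)


def predict(w, b, feats):
    """Logistic output of the trainable classifier head."""
    z = sum(wi * fi for wi, fi in zip(w, feats)) + b
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.1):
    """Training phase: update only the classifier head by gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            feats = frozen_features(x)
            err = predict(w, b, feats) - label  # d(log-loss)/dz
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
            b -= lr * err
    return w, b


def accuracy(w, b, data):
    """Test phase: fraction of samples classified correctly."""
    hits = sum((predict(w, b, frozen_features(x)) > 0.5) == bool(y)
               for x, y in data)
    return hits / len(data)


def make_data(n, seed):
    """Synthetic two-class data: label 1 if the frozen projection is positive."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = tuple(rng.uniform(-1, 1) for _ in range(3))
        data.append((x, 1 if frozen_features(x)[0] > 0 else 0))
    return data


if __name__ == "__main__":
    train, test = make_data(200, seed=0), make_data(100, seed=1)
    w, b = train_head(train)
    print("test accuracy:", accuracy(w, b, test))
```

The point of the sketch is the division of labor: only `w` and `b` change during training, which mirrors the text's statement that the transferred kernel weights are reused while the FCL is trained.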
The configuration of the whole network is schematically illustrated in Figure 2b. The FCL network is composed of input neurons, a hidden layer, and output neurons for the classification. In the original VGG-16 network, the number of output neurons from the last convolution layer was 22,528 (channel × width × height = 512 × 11 × 4). When the aforementioned GAP layer is employed to boost the operation speed, this number is drastically reduced to 512 neurons. The hidden layer has 1,000 neurons and the output layer has 2 neurons, i.e., normal or abnormal. Further detailed information on the neural network is given in Supporting Information 9. Figure 2c shows the initial images in the left part and the trained images extracted from the ACD dataset in the right part. The light images marked with (i) to (vi) were taken in daylight, and the dark images marked with (I) to (VI) were taken at night. Therefore, it is confirmed that abnormal car detection is possible with high accuracy regardless of the external environment, including sunny, dark, rainy, and snowy conditions (refer to Supporting Video).
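The size reduction described above can be checked with a few lines of Python. The sketch below implements global average pooling on nested lists and counts the FCL weights with and without GAP. It assumes, purely for comparison, the same 1,000-neuron hidden layer in both cases, which the text does not state for the original VGG-16; the function names are hypothetical.

```python
def global_average_pool(feature_map):
    """Average each channel of a [C][H][W] nested list down to one value."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_weight_count(layer_sizes):
    """Number of weights (biases excluded) in a fully connected stack."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    channels, width, height = 512, 11, 4  # sizes quoted in the text
    flattened = channels * width * height
    print("flattened inputs:   ", flattened)
    print("weights without GAP:", dense_weight_count([flattened, 1000, 2]))
    print("weights with GAP:   ", dense_weight_count([channels, 1000, 2]))

    # Tiny 2-channel example: each 2x2 channel collapses to its mean.
    toy = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
    print("GAP of toy map:", global_average_pool(toy))
```

Under that assumption, the number of synaptic weights in the FCL drops by roughly a factor of 44, which is consistent with the speed-up argument made in the text.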

Software-Based Simulation Results of Neural Network by Reflecting Measured Synaptic Characteristics
To evaluate the performance of the ACD, transfer-learning-based semi-empirical simulations reflecting the measured synaptic characteristics of the pEGSTs were carried out. In addition, widely used weight initialization methods, namely Xavier initialization, random initialization, and Gaussian initialization, were employed and compared in the FCL. The Xavier-initialized weights follow a normal distribution determined by the number of nodes in the previous and next layers. The randomly initialized weights are initial values generated randomly by software within the range of weight modulation. The Gaussian-initialized weights are normalized to follow the Gaussian normal distribution, which has a mean value of 0 and a standard deviation of 1. As shown in Figure 3a, the Gaussian initialization method showed the best classification accuracy compared to the other initialization methods for various N_bit: 7, 9, 11, and 13. Afterward, based on the Gaussian initialization method, the classification accuracy of the ACD was extracted with the aid of transfer learning simulations by reflecting the measured G_DS of the pEGST for various N_bit. Figure 3b shows the classification accuracy of the ACD when the synaptic weight modulation of the pEGST was applied to the FCL. It increases with a larger N_bit. As expected, the highest classification accuracy was observed for N_bit = 13. While the classification accuracy for N_bit = 7 is less than 50%, that for N_bit = 13 is ≈85%. It is close to the upper limit of 88% that is achievable with an ideal software neural network using cutting-edge GPUs. Figure 3c shows the test error rate traced up to 50 epochs according to N_bit. The error rate is defined as the difference in classification accuracy between the ideal and practical cases.
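To make the idea of reflecting device characteristics at a given N_bit more concrete, the sketch below builds a table of 2^N_bit conductance states from a nonlinearity parameter α and snaps a software value to the nearest available state. It is an illustration under assumptions, not the authors' simulator: it assumes the commonly used model G = [(G_max^α − G_min^α)·w + G_min^α]^(1/α) for α ≠ 0 (and G = G_min·(G_max/G_min)^w for α = 0), and the function names, the normalized conductance window (0.1 to 1.0), and the target value are hypothetical choices.

```python
import bisect


def conductance(w, alpha, g_min, g_max):
    """Conductance at normalized pulse position w in [0, 1] (assumed model)."""
    if alpha == 0:
        return g_min * (g_max / g_min) ** w
    base = (g_max ** alpha - g_min ** alpha) * w + g_min ** alpha
    return base ** (1.0 / alpha)


def state_table(n_bit, alpha, g_min=0.1, g_max=1.0):
    """Ascending list of the 2**n_bit conductance states reachable by pulses."""
    n = 2 ** n_bit
    return [conductance(i / (n - 1), alpha, g_min, g_max) for i in range(n)]


def quantize(target, states):
    """Return the state closest to the target conductance."""
    i = bisect.bisect_left(states, target)
    candidates = states[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda g: abs(g - target))


if __name__ == "__main__":
    target = 0.4321  # hypothetical ideal (software) conductance value
    for n_bit in (7, 13):
        states = state_table(n_bit, alpha=1.17)  # alpha_pot reported for 13 bits
        g = quantize(target, states)
        print(f"N_bit={n_bit:2d}: nearest state {g:.6f}, "
              f"error {abs(g - target):.2e}")
```

With more states the spacing between neighboring conductance levels shrinks, so the worst-case rounding error of a stored weight shrinks as well, which is one intuitive reading of why accuracy improves with N_bit.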
Because the upper limit of the best achievable accuracy with an ideal software neural network using cutting-edge GPUs is 88% and the classification accuracy at N_bit = 13 is 85%, the error rate becomes 3%. At every training epoch, a higher N_bit shows a smaller error rate. The classification accuracy of ACD becomes saturated near 20 epochs because of the binary classification, i.e., normal versus abnormal. The classification accuracy of ACD was also evaluated for various nonlinear weight modulations, which are mostly observed in actual artificial synaptic devices devoted to analog deep neural networks (DNNs). Figure 3d shows various nonlinear shapes of the weight modulation according to the aforementioned nonlinearity parameter α. It can be seen in Figure 3e that the classification accuracy decreases as α deviates more from 1 in both P/D. It is worth noting that the degradation of the classification accuracy due to nonlinearity was alleviated as N_bit was increased. Figure 3f shows the robustness against degradation of classification accuracy according to N_bit. A higher N_bit results in better robustness. In the case of N_bit = 13, the classification accuracy was degraded by only 5.7% even for a large nonlinearity of α_pot = 5 with a fixed ideal linearity of α_dep = 1. Based on the aforementioned results, it is inferred that complex AI operations that classify dynamically changing video data call for a synaptic device with a larger N_bit than the range required for relatively simple AI operations that categorize static images or acoustic data (see Table S4, Supporting Information). All of the classification in this work is performed by software simulation based on measured synaptic characteristics.

Characteristics of Retention, Speed and Energy Consumption
Figure 4a shows the measured nonvolatile characteristics of the fabricated pEGST to show how long a certain conductance state is sustained, i.e., the retention time (t_ret). Sampled G_DS of 100 states was sustained for longer than 1,000 s. Retention characteristics at both low and high conductance weight were also verified in Figure S10, Supporting Information. During the conductance measurements, V_GS and V_DS were fixed at 0 V and 0.05 V to avoid ion drift caused by the applied gate voltage. Close-up views of the G_DS plotted in the inset of Figure 4a show that each state is still distinguishable without any crossover between the measured G_DS values. For quantitative analysis, the coefficient of variation (CV), defined as the ratio of the standard deviation (σ) to the mean (μ), is calculated from the ΔG_DS. The average CV is 0.08 and hence no crossover is verified up to 1,000 s. Even though a t_ret of 1,000 s may not be long enough for off-chip training, it is acceptable for online training, where real-time input data are trained with periodic and frequent updates of the synaptic weight. [38] Retention characteristics of the pEGST are related to the diffusion of protons in the electrolyte layer. The diffusion coefficient (D) in a solid can be found from a fitted Arrhenius plot governed by the following equation:
$$
D = D_0 \exp\left(-\frac{E_A}{RT}\right)
$$
where D_0 is the pre-exponential factor, E_A is the activation energy, R is the molar gas constant, and T is the absolute temperature. When a proton, which is commonly engaged in CMOS microfabrication, is used as an ion source for a solid-state electrolyte, the D_0 value is relatively high, which is disadvantageous in terms of retention time. However, there is a trade-off between the retention time and the operating voltage or switching speed. Therefore, these conflicting demands need to be resolved by further research. Cyclic endurance of weight modulation up to 10⁶ operations was also confirmed in Figure S11, Supporting Information. It is well known that the maximum number of weight updates can be calculated as the number of training images multiplied by the number of epochs. [39] However, the actual number of updates is less than the maximum number, because not every synapse is updated during training. In Figure S12, Supporting Information, the average number of updates while training each pEGST in layers 1 and 2 was counted. Even though the number of weight updates increases linearly with the number of epochs, a cycling endurance of less than 10³ is sufficient for the pEGST. This is because the classification accuracy becomes saturated when the number of training epochs is approximately 20 or less, as mentioned earlier. Figure 4b shows the switching time (t_switch) of the pEGST according to device miniaturization. The fabricated pEGST with a large size of L_G = 10 μm and W_ch = 50 μm showed a t_switch of 10 ms. Note that t_switch is proportional to the gate capacitance, which is linearly proportional to the gate area, i.e., L_G·W_ch. [40,41] Therefore, if the device size is further scaled down according to the International Roadmap for Devices and Systems (IRDS), [42] the extrapolated t_switch can be reduced to 100 ns.
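The area-scaling argument for the switching time can be written out as a short calculation. The sketch below assumes strictly proportional scaling of t_switch with gate area (L_G·W_ch), anchored at the fabricated device (10 μm × 50 μm, 10 ms); it ignores any other size-dependent effects, and the function names and the example scaled dimensions are hypothetical.

```python
REF_LENGTH_UM = 10.0     # gate length L_G of the fabricated device
REF_WIDTH_UM = 50.0      # channel width W_ch of the fabricated device
REF_T_SWITCH_S = 10e-3   # switching time reported for that device


def scaled_switch_time(length_um, width_um):
    """Switching time assuming t_switch is proportional to L_G * W_ch."""
    ref_area = REF_LENGTH_UM * REF_WIDTH_UM
    return REF_T_SWITCH_S * (length_um * width_um) / ref_area


def area_for_switch_time(t_switch_s):
    """Gate area (in square micrometers) needed for a target switching time."""
    ref_area = REF_LENGTH_UM * REF_WIDTH_UM
    return ref_area * t_switch_s / REF_T_SWITCH_S


if __name__ == "__main__":
    target = 100e-9  # 100 ns, the extrapolated value quoted in the text
    print("gate area for 100 ns [um^2]:", area_for_switch_time(target))
    # Hypothetical scaled device: 100 nm x 50 nm gate footprint.
    print("t_switch for 0.1 um x 0.05 um [s]:", scaled_switch_time(0.1, 0.05))
```

Under this proportionality, reaching 100 ns from 10 ms requires shrinking the gate area by five orders of magnitude, from 500 μm² to about 0.005 μm².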
Figure 4c shows that the energy consumption for the P/D operation is reduced by miniaturization of the device area. Note that the switching energy of a pEGST is calculated as E = V_G·I_G·t_switch. The result was 10 fJ pulse⁻¹, which is smaller than the switching energy of a biological synapse (20 fJ pulse⁻¹). Here, I_G is the gate current flowing from the channel to the gate or vice versa. A low I_G of less than 10 pA is an inherent MOSFET characteristic arising from the gate insulator. The extrapolated energy consumption during the P/D in Figure 4c shows that it can be further decreased by down-scaling according to the IRDS. In addition, if a vertical dimension of the pEGST, such as the gate dielectric thickness, is reduced, it is expected that the switching voltage needed to achieve the same vertical electric field will also be reduced. Detailed values of the switching time and energy consumption extracted from the fabricated pEGSTs according to their footprint area are summarized in Table S6, Supporting Information. In this work, energy consumption was briefly estimated as the switching energy of a single synaptic device during potentiation and depression, as in other device-level studies of neuromorphic systems. [16,43,44] Later on, a detailed estimation including the energy consumption for training and inference in the entire neural network system will be necessary.
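The switching-energy formula E = V_G·I_G·t_switch can be evaluated directly. The text gives only an upper bound for I_G, so the sketch below solves for the gate current implied by 10 fJ per pulse, assuming that figure corresponds to the potentiation pulse (+4.2 V, 10 ms), which the text does not state explicitly. The function names are hypothetical.

```python
def switching_energy_j(v_gate_v, i_gate_a, t_switch_s):
    """Energy per pulse in joules: E = |V_G| * |I_G| * t_switch."""
    return abs(v_gate_v) * abs(i_gate_a) * t_switch_s


def gate_current_for_energy(energy_j, v_gate_v, t_switch_s):
    """Gate current in amperes that yields the given energy per pulse."""
    return energy_j / (abs(v_gate_v) * t_switch_s)


if __name__ == "__main__":
    v_pot = 4.2        # potentiation amplitude from the text, in volts
    t_pulse = 10e-3    # pulse width from the text, in seconds
    e_target = 10e-15  # 10 fJ per pulse

    i_gate = gate_current_for_energy(e_target, v_pot, t_pulse)
    print(f"implied I_G: {i_gate * 1e12:.3f} pA")
    print(f"check: {switching_energy_j(v_pot, i_gate, t_pulse) * 1e15:.1f} fJ")
```

Under these assumptions the implied gate current is a fraction of a picoampere, which is compatible with the stated bound of less than 10 pA.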

Conclusion
We demonstrated ion synaptic transistors with analog multiple states of 13 bits for in-memory computing applicable to autonomous vehicles or IoT devices. pEGSTs were fully integrated on a wafer with CMOS-compatible microfabrication. An ultrathin all-solid-state pEGDMA electrolyte with numerous ethylene oxide links allows ion hopping, and its deep activation energy (E_A) supports nonvolatile memory characteristics, which are advantageous for an artificial synapse. Such multiple states of 13 bits are more than the number of bits necessary to classify static images or acoustic patterns; however, they are indispensable to classify dynamic video images for abnormal car detection (ACD) with high accuracy. A synaptic array composed of pEGSTs with nonlinearity parameters of α_pot = 1.17 and α_dep = −0.41 achieved a classification accuracy of 85.1% in ACD, which is close to the practical limit with a modified fast neural network. Degraded classification accuracy that arises from the inherent nonlinearity of synaptic devices was compensated by increasing the number of conductance states. In ACD applications, the classification accuracy cannot be compromised for any other factor. Hence, the high accuracy ascribed to the linear weight modulation and the large number of conductance states in the pEGST is greatly advantageous. Low energy consumption of less than 10 fJ pulse⁻¹ is another important advantage for an analog hardware neural network for in-memory computing, especially for IoT applications. Even though the current work reflects the measured synaptic characteristics of the fabricated devices, it still uses software-based simulations for the extraction of a recognition rate. Nonetheless, it is expected that the multiple-state pEGSTs can provide a feasible pathway to realize nonvolatile computing-in-memory (nvCIM) for use in actual edge computing.

Experimental Section
Device Fabrication: See Supporting Information 2 and our previous works for details of the fabrication process of the pEGST. [22,45]
iCVD Process: Vaporized ethylene glycol dimethacrylate (EGDMA) monomer flowed into a pressure-controlled vacuum chamber with a tert-butyl peroxide (TBPO) initiator. The monomer and the initiator were supplied at the same flow rate, and the chamber pressure was kept at 60 mTorr by using a proportional-integral-derivative (PID) controller. A heated filament decomposed the injected initiator and produced the radical, which activated the vinyl group in the monomer. The polymerization and adsorption of pEGDMA occurred simultaneously on the surface of the samples, which were kept at a stable temperature of 40 °C.
Neural Network Configuration: A modified VGG-16 network was used. For the CNN, the kernel weights learned through PyTorch were transferred to the proposed nvCIM neural network. The fully connected part of the modified VGG-16 network was composed of three layers: an input layer, a hidden layer, and an output layer. There were 512 neurons in the input layer, 1,000 neurons in the hidden layer, and two neurons in the output layer due to binary classification. Each output neuron distinguished between an abnormal (label: 0) and a normal (label: 1) car state. The hidden neurons had a ReLU activation function, while the output neurons had a softmax activation function. The modified VGG-16 network had 13 convolutional layers with 3 × 3 filters for 12 (channels) × 355 (width) × 130 (height) input images and the global average pooling (GAP) layer at the end of the feature extractor. In addition, the ReLU activation function was used after the convolution layers of the feature extractor and the sigmoid activation function at the end of the fully connected layer.
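The classifier structure described above (input layer, ReLU hidden layer, two-neuron softmax output) can be sketched as a forward pass in plain Python. This is a minimal illustration, not the authors' implementation: the text's layer sizes are 512, 1,000, and 2, whereas the demo uses a tiny 2-2-2 stack with hypothetical weights so the arithmetic is easy to follow. The label convention (0 = abnormal, 1 = normal) follows the text.

```python
import math


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax over a list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def classify(features, w_hidden, b_hidden, w_out, b_out):
    """Return (label, probabilities); label 0 = abnormal, 1 = normal."""
    hidden = relu(dense(features, w_hidden, b_hidden))
    probs = softmax(dense(hidden, w_out, b_out))
    return probs.index(max(probs)), probs


if __name__ == "__main__":
    # Hypothetical toy weights (identity matrices, zero biases).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [0.0, 0.0]
    label, probs = classify([2.0, -1.0], identity, zeros, identity, zeros)
    print("label:", label, "probabilities:", probs)
```

In the toy call, the negative second feature is zeroed by the ReLU, so the first output neuron receives the larger logit and the predicted label is 0.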
Device Characterization: DC and pulse measurements were performed in ambient conditions at room temperature by using a B1500A semiconductor parameter analyzer with a pulse generator unit (PGU) module (Agilent Technologies).
TEM Analysis: High-resolution cross-sectional TEM images were taken by using corrected scanning transmission electron microscopy (JEM-ARM200F) with EDS mapping (Bruker QUANTAX 400).

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.