Si‐Based Dual‐Gate Field‐Effect Transistor Array for Low‐Power On‐Chip Trainable Hardware Neural Networks

Herein, dual‐gate field‐effect transistors (DG FETs) fabricated on a Si substrate and a corresponding NOR‐type array designed for low‐power on‐chip trainable hardware neural networks (HNNs) are presented. The fabricated DG FET exhibits notable endurance characteristics: the subthreshold swing remains within a 2.45% range of change and ΔVth per cycle remains stable within 4.5% over repetitive program and erase operations, up to 10^4 cycles. Furthermore, a multilevel characteristic is achieved through low‐power program/erase operations based on Fowler–Nordheim (FN) tunneling, which consume 0.09 and 0.99 fJ per spike, respectively. These characteristics provide the HNN with stability, high performance, and power efficiency. The NOR‐type array in this work demonstrates selective update and bidirectional vector‐by‐matrix multiplication capabilities, enabling on‐chip training based on a gradient descent algorithm without the need for an additional array for backpropagation. Subsequently, a simulation of Modified National Institute of Standards and Technology (MNIST) classification is conducted to evaluate the accuracy and training power consumption of the proposed device in comparison with other two‐terminal memristor devices. The results show that the DG FET array achieves superior accuracy while maintaining over 180.4 times higher energy efficiency, demonstrating the potential of the DG FET as a promising candidate for low‐power HNN applications.


Introduction
To address more complex and general problems, neural networks have evolved to include more parameters along with structural improvements. [6] However, the exponential growth in the number of parameters has resulted in a significant increase in the time and cost required to train neural networks. [7] HNNs can be categorized into two types: off-chip trained HNNs and on-chip trainable HNNs. [17] In off-chip trained HNNs, the weight values are trained in software and transferred to the synaptic array; [18] consequently, only forward inference of the neural network is conducted in hardware. [19] On the other hand, in on-chip trainable HNNs, the weight training is performed directly on the hardware during runtime. The implementation of on-chip trainable HNNs has been found to be a challenging task, and as a result, a majority of studies on synapse devices have focused on off-chip trained HNNs. [20]
Moreover, the retention characteristics of synaptic devices can cause weights to change over time, requiring periodic reconnection to external software for retraining to maintain performance. [23,24] This iterative process involves measuring the conductance of numerous synaptic devices and retraining based on the measured conductance change of each synapse. [25] The substantial power consumption and long latency associated with this process pose significant challenges to the practical implementation of HNNs in low-power applications, such as edge devices or embedded hardware systems. As a result, there is growing interest in research on on-chip trainable HNNs capable of autonomous training without external software.
In this regard, we fabricated a dual-gate field-effect transistor (DG FET) and a corresponding NOR-type array to enable on-chip trainable HNNs with a gradient descent algorithm.
The concept of using transistors with serial gates in a NOR array structure has previously been introduced as a solution to issues in conventional NOR-flash arrays: device degradation and the reduction in program speed caused by the temperature increase during hot carrier injection (HCI). [26] Dual-gate transistors based on flexible materials [31,32] are also a topic of significant interest due to their flexible, recyclable, and cost-effective characteristics. However, these devices lack the memory characteristics required for storing synaptic weights, since their major target is sensor applications. [31,32] Moreover, due to their larger dimensions, these devices are not yet suitable for hardware neural networks (HNNs) that necessitate the integration of a substantial number of synaptic devices. [35,36] The program/erase operation of the fabricated DG FET is based on FN tunneling, which exhibits low-power consumption, and the corresponding NOR-type array enables bidirectional current summation and selective updates of individual synapse cells.
This article is organized as follows. Section 2 presents the fundamental principle of the gradient descent algorithm utilized in on-chip trainable HNNs, along with the specific requirements for the synapse array to enable the algorithm at low power. Subsequently, considering these requirements, a comparative analysis of various array structures of FET-based synaptic devices is conducted. In Section 3, the DG FET is introduced as a solution to the issues encountered with conventional flash arrays discussed in Section 2. This section provides an overview of the fundamental synaptic characteristics of the DG FET, including its bidirectional nature and selective update capabilities as applied to the NOR-type structure. Subsequently, a simulation is conducted for the classification of the Modified National Institute of Standards and Technology (MNIST) dataset using the synaptic characteristics of the DG FET. The results are then benchmarked against various synaptic devices in terms of classification accuracy and training power, thereby demonstrating the potential of the DG FET as a suitable candidate for low-power HNN applications.

Gradient Descent Algorithm
The gradient descent algorithm has been known as the most effective way to train artificial neural networks. [37,38] This method includes the backward propagation of the error vector from the output layer to compute the weight gradients. [39] For a loss function L, error vector δ, weight matrix W, activation function f_act, and input vector z, the error vector and weight gradient are calculated as follows:

δ^l = ((W^(l+1))^T δ^(l+1)) ∘ f'_act(z^l)   (1)

∂L/∂W^l = δ^l (z^(l−1))^T   (2)

where ∘ and l represent the element-wise multiplication and the index of the specific hidden layer, respectively. During backpropagation, the calculation of the error vector necessitates a vector-by-matrix multiplication (VMM) between the error vector and the transposed weight matrix. In this regard, the synapse array has to be capable of bidirectional operation, allowing it to perform VMM with both the weight matrix and its transpose, depending on whether the neural network is in the feedforward phase or the backpropagation phase. This ensures that the synapse array can meet the computational demands of both phases without an additional synapse array for the backward path.
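The bidirectional requirement can be illustrated with a small NumPy sketch (illustrative shapes and random values, not the fabricated array): the same weight matrix W serves the feedforward VMM, its transpose serves the backpropagation VMM, and the weight gradient is the outer product of the error and input vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # one layer: 3 inputs -> 4 outputs

# Feedforward phase: VMM with W
z_prev = rng.normal(size=3)        # input vector z^(l-1)
z = np.maximum(W @ z_prev, 0.0)    # ReLU activation f_act

# Backpropagation phase: VMM with the TRANSPOSE of the same W
delta = rng.normal(size=4)         # error vector delta^l at this layer's output
back = W.T @ delta                 # error propagated toward the previous layer

# Weight gradient: outer product of the error and input vectors
grad = np.outer(delta, z_prev)     # same shape as W
```

A bidirectional synapse array computes both `W @ z_prev` and `W.T @ delta` on the same physical cells; a unidirectional array would need a second copy of W for the transposed product.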
There have been several attempts to use resistive random-access memory (RRAM) to implement on-chip trainable HNNs, exploiting the bidirectional VMM and selective conductance modulation capabilities of the one-transistor one-resistor (1T1R) crossbar array structure. [40,41] However, RRAM devices typically exhibit high power consumption during conductance modulation, since their switching mechanisms are based on the formation/annihilation of a metallic filament or oxygen vacancies with a current of several μA. [42,43] Considering that numerous weight updates are required during the training of a neural network, it is crucial to investigate synaptic devices that consume low energy during conductance modulation. In this regard, FET-based synaptic devices are capable of selective program/erase operations using the FN tunneling mechanism, which consumes low energy during training, [44] depending on the structure of the array. The subsequent section provides an overview of various array structures of FET-based synaptic devices.

Array Structures of FET-Based Synaptic Devices
FET-based synapse arrays can be classified into three types, AND, NAND, and NOR, depending on the connection of the source, drain, and gate. The AND structure shown in Figure 1a allows selective program/erase operations based on FN tunneling, enabling low-power training of HNNs. [45,46] However, the structural characteristic involving the parallel connection between the source and drain of adjacent flash devices limits the calculation of the weighted sum for the transposed matrix. To enable both feedforward and backward propagation within a single synapse array, current summation should be possible in both the red- and blue-highlighted directions depicted in Figure 1. In the case of an AND-type array architecture, VMM can only be performed using the flash cells highlighted in red, as they are interconnected through the source-line (SL) and bit-line (BL). On the other hand, the flash cells highlighted in blue are only connected to the word-line (WL) via the gates, preventing VMM between these cells. In this regard, bidirectional VMM cannot be achieved in the AND-type flash array of Figure 1a, thereby requiring an additional array for backward propagation.
The NAND structure exhibits a high integration density and mature technology as a memory device, [47] which is advantageous for implementing HNNs that require numerous synaptic devices. However, the erase operation of the NAND-type flash array is block-wise, [48] which poses limitations for on-chip trainable HNNs that require selective update capabilities. Furthermore, due to the serial connection of the source and drain of the flash devices (highlighted in blue in Figure 1b), it is not possible to perform VMM for the transpose matrix, as in the AND structure. In contrast, the SL and BL are perpendicular in the NOR structure, as shown in Figure 1c. This enables bidirectional VMM within the array, depending on whether the weighted sum is computed through the SL or the BL. Therefore, among the three structures of flash-based synapse arrays, only the NOR structure is capable of performing VMM of the transpose matrix without an extra array for backpropagation. However, the selective program operation in conventional NOR flash memory requires a large amount of power, since it is based on HCI. [49] This feature causes excessive power consumption during training, as a large number of repetitive program operations are typically required during the training phase.
Additionally, the erase operation in NOR flash arrays is based on block-wise FN tunneling, making it unsuitable for on-chip training, which requires selective weight modulation capability. A previous study [50] proposed a scheme for selective erase in conventional NOR-type arrays using gate-induced drain leakage, but this approach is not entirely free from power consumption issues since it is still based on the HCI mechanism. Therefore, to enable on-chip trainable HNNs, a synaptic device capable of selective weight training at low power within the NOR structure is necessary. To address the aforementioned issues, we implemented a NOR-type DG FET synapse array, which exhibits low-power consumption in selective program/erase operations as well as bidirectional VMM capability. The detailed characteristics of the DG FET and the corresponding array structure are presented in the following section.

Device Structure and Characteristics of DG FET
Figure 2a-c shows the cross-sectional schematic, transmission electron microscope (TEM) image, and top-view scanning electron microscope (SEM) image of the fabricated synaptic device, respectively. The synaptic device exhibits a dual-gate structure, which consists of a select-gate (SG) with a gate insulator of 10 nm SiO2 and a memory-gate (MG) with a gate insulator stack of SiO2/Si3N4/SiO2 (3/6/9 nm) for charge-trap operation. The lengths of the SG and MG are 0.6 and 1.0 μm, respectively, and the thickness of the Si body is 100 nm. The SG governs the switching behavior of the synaptic device, whereas the MG controls the synaptic weight by adjusting the conductance of the device. The conductance of the device can be changed by controlling the V_MG bias or the amount of charge stored in the charge-trap layer. The energy band diagram of the channel according to the V_MG and V_SG biases is represented in Figure S2, Supporting Information.
Figure 2d shows the measured I_D-V_SG curves of the DG FET as a parameter of V_MG at V_D = 1 V. With an increase in V_SG, the drain current increases as the device turns on and eventually saturates after reaching a certain threshold current level. This behavior is determined by the relative ratio of the channel resistances under the SG and MG. At low V_SG, the SG channel resistance is higher than that of the MG, dominating the overall conductance of the device. Therefore, as V_SG increases, the channel resistance of the SG decreases, leading to an increase in the drain current. In contrast, when V_SG is high enough to fully turn on the SG, the channel resistance of the MG becomes dominant. Hence, the drain current saturates even when V_SG is further increased. Thus, the subthreshold swing (SS) of the transfer curves is determined by the SG, and the on-current is controlled by the MG. Figure 2e shows the measured I_D-V_D curves as a parameter of V_MG at a fixed V_SG of 2.5 V. The drain current saturates when V_D is higher than 0.5 V, and the saturated current level is determined by the V_MG bias.
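The two-series-channel picture above can be captured in a toy model: the drain current is limited by the sum of an SG-controlled resistance (falling steeply with V_SG) and an MG-controlled resistance (set by V_MG). All parameter values below are assumptions chosen for illustration, not a calibrated model of the fabricated device.

```python
import numpy as np

def drain_current(v_sg, v_mg, v_d=1.0):
    """Toy series-resistance model of the DG FET (illustrative, not calibrated).

    The SG channel resistance falls exponentially as V_SG rises
    (subthreshold-like turn-on), while the MG channel resistance is set
    by V_MG; the two channels in series limit the drain current."""
    r_sg = 1e9 * np.exp(-(v_sg - 0.5) / 0.1) + 1e4   # ohms, assumed parameters
    r_mg = 1e7 / max(v_mg, 1e-3)                      # ohms, assumed parameters
    return v_d / (r_sg + r_mg)

# Low V_SG: SG resistance dominates, current rises with V_SG.
# High V_SG: MG resistance dominates, current saturates at a level set by V_MG.
i_low  = drain_current(v_sg=0.6, v_mg=1.0)
i_mid  = drain_current(v_sg=1.5, v_mg=1.0)
i_high = drain_current(v_sg=2.5, v_mg=1.0)
```

In this sketch `i_mid` and `i_high` differ by well under 1% (saturation), while raising V_MG lifts the saturated current level, mirroring the measured behavior in Figure 2d.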
The saturation characteristic of the DG FET can prevent variations in the weighted sum caused by IR drops in the metal lines of the synapse array. As the size of the synapse array increases, the current sum flowing through the metal line also increases, shifting the voltage difference between the two ends of each synapse device. [53] Compared to two-terminal memristor devices, where the synaptic current fluctuates in response to voltage variations at the terminals, the DG FET shows high stability for implementation in large-scale synapse arrays, which is required for enabling HNNs to process complex computational tasks.

Synaptic Characteristics of the DG FET
Figure 3a,b shows the measured I_D-V_SG curves of the DG FET when 100 erase pulses (−8.5 V, 1 ms) and 200 program pulses (9 V, 50 μs) are applied to the MG, respectively. The source, drain, and SG of the device are set to 0 V during the program/erase operations. As the program and erase pulses are applied to the DG FET, electrons or holes in the channel are trapped in the nitride layer by FN tunneling. To explain the FN tunneling mechanism in the DG FET, the energy band diagrams along the charge-trap layer during the erase and program operations are shown in Figure 3e,f. Each time a program or erase pulse is applied to the n+ poly-Si gate, the energy band in the insulating layer stack between the gate and the substrate bends. This leads to an effective reduction in the height and thickness of the potential barrier, eventually increasing the tunneling probability of the charge carriers in the p-body of the DG FET. The amount of charge trapped in the nitride layer determines the conductivity of the DG FET at a fixed V_MG, thereby modulating its synaptic weight. Note that the program and erase operations based on FN tunneling enable the DG FET to exhibit low-power consumption while modulating the weight of the device, whereas a typical flash device used in a NOR-type array relies on the HCI mechanism, which consumes relatively high energy.
To calculate the energy consumption during the program/erase operations of the DG FET, a DC sweep of the MG voltage bias was performed to measure the tunneling current through the MG, as shown in Figure 3c. The tunneling currents observed under the program and erase conditions (8.5 and −9 V, respectively) have magnitudes below 100 fA, which is the minimum resolution of the measurement equipment (Agilent B1500A). Therefore, the maximum possible current of 100 fA is used to calculate the power consumption. Additionally, the current component injected into the nitride layer is derived from the V_th shift of the DG FET, as described in other studies. [50,54] The average current injected into the nitride layer when a single update pulse is applied can be calculated using the following equation:
I_avg = (C_Box × ΔV_th) / (N_pulse × t_pulse)

where C_Box represents the capacitance of the blocking oxide, N_pulse is the number of update pulses applied, ΔV_th is the change in threshold voltage when N_pulse pulses are applied, and t_pulse is the width of the update pulse. Using these two current components, the maximum energy consumption for each program and erase operation is calculated to be 0.09 and 0.99 fJ, respectively. Considering that the training of a neural network requires an extensive number of updates, the low-power operation of the DG FET provides a significant advantage for implementing on-chip trainable HNNs with reduced power consumption.
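The arithmetic behind these bounds can be sketched as follows. The pulse-energy bound uses the 100 fA measurement floor and the pulse conditions quoted above; the quoted totals of 0.09 and 0.99 fJ additionally fold in the nitride-injection component, whose formula is shown with an assumed (hypothetical) C_Box value purely for illustration.

```python
def pulse_energy(v_pulse, i_current, t_pulse):
    """Energy of one rectangular pulse: E = |V| * I * t."""
    return abs(v_pulse) * i_current * t_pulse

def injected_current(c_box, delta_vth, n_pulse, t_pulse):
    """Average nitride-injection current estimated from the V_th shift:
    I_avg = C_Box * dV_th / (N_pulse * t_pulse)."""
    return c_box * abs(delta_vth) / (n_pulse * t_pulse)

I_FLOOR = 100e-15                               # 100 fA equipment resolution
e_prog  = pulse_energy(9.0,  I_FLOOR, 50e-6)    # program pulse: 9 V, 50 us
e_erase = pulse_energy(-8.5, I_FLOOR, 1e-3)     # erase pulse: -8.5 V, 1 ms

# Example with an ASSUMED C_Box of 1 fF and a 1 V shift over 100 pulses
i_inj = injected_current(c_box=1e-15, delta_vth=1.0, n_pulse=100, t_pulse=50e-6)
```

Both tunneling-floor energies land below the quoted per-operation maxima (0.09 and 0.99 fJ), consistent with the injected-current component making up the remainder.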
Figure 3d shows the long-term potentiation (LTP) and long-term depression (LTD) characteristics of the DG FET. The read condition is set to V_SG = 2.5 V, V_MG = 1.0 V, and V_D = 1.0 V. The LTP and LTD characteristics of a synaptic device are modeled as follows: [22]

ΔG_p(n) = α_p exp(−β_p (G(n) − G_min)/(G_max − G_min))

ΔG_d(n) = α_d exp(−β_d (G_max − G(n))/(G_max − G_min))

where G(n) is the conductance of the synaptic device when n pulses are applied, and G_max and G_min are the maximum and minimum conductance values, respectively. α_p, α_d, β_p, and β_d are the fitting parameters for the LTP and LTD curves, which determine the nonlinearity of the synaptic device. The fitting parameters for the proposed DG FET are α_p = 9.2 × 10^−12, α_d = 1.6 × 10^−10, β_p = 0, and β_d = 4.9.
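This update model can be exercised with the fitted parameters from the text; the G_min/G_max limits below are placeholders chosen for illustration, not measured values. With β_p = 0, every potentiation step has the same size α_p, i.e., the LTP curve is linear.

```python
import math

# LTP/LTD conductance-update model with the fitted DG FET parameters from the text
ALPHA_P, BETA_P = 9.2e-12, 0.0    # potentiation (beta_p = 0 -> linear LTP)
ALPHA_D, BETA_D = 1.6e-10, 4.9    # depression (nonlinear)

# G limits are ASSUMED placeholders for illustration, not measured values
G_MIN, G_MAX = 0.0, 9.2e-10

def potentiate(g):
    dg = ALPHA_P * math.exp(-BETA_P * (g - G_MIN) / (G_MAX - G_MIN))
    return min(g + dg, G_MAX)

def depress(g):
    dg = ALPHA_D * math.exp(-BETA_D * (G_MAX - g) / (G_MAX - G_MIN))
    return max(g - dg, G_MIN)

# With beta_p = 0 every potentiation pulse adds the same alpha_p: linear LTP
g = G_MIN
trace = [g]
for _ in range(100):
    g = potentiate(g)
    trace.append(g)
```

The linear LTP trace reaches G_max after 100 identical steps, matching the "minimum of 100 conductance levels" noted later; depression steps shrink as G approaches G_min, reflecting β_d = 4.9.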
The endurance and retention characteristics of the fabricated DG FET are represented in Figure S3, Supporting Information. Figure S3a, Supporting Information, shows the change in the SS and ΔV_th per cycle when up to 10^5 update pulse cycles are applied. The SS of the DG FET remained consistent within a range of 2.45% over 10^5 cycles, while ΔV_th also maintained stability within 4.5% over 10^4 cycles, demonstrating the notable endurance characteristic of the fabricated DG FET. Figure S3b, Supporting Information, shows the retention characteristics of the fabricated DG FET, with a variation of −4.34 mV dec^−1 in the programmed state and 12.2 mV dec^−1 in the erased state. Since the fabricated device and algorithm target on-chip trainable HNNs, these endurance and retention characteristics are sufficient to consistently uphold network performance and accuracy.

Measured Characteristics of the NOR-type DG FET Array
Figure 4a shows the top SEM image of the fabricated 8 × 4 NOR-type DG FET array, and the inset outlined by the dotted line shows 4 DG FETs included in the array. Here, the MG, SG, source, and drain terminals of the DG FET are connected to a WL, a select-gate line (SGL), an SL, and a BL in the synapse array, respectively. Figure 4b shows the selective program/erase characteristics of the array when 200 program pulses (9 V, 50 μs) and 100 erase pulses (−8.5 V, 1 ms) are applied to M1. The specific bias conditions for selective program/erase operations are denoted in Figure 4c. By applying the turn-off voltage to the SG, the NOR-type array enables selective program and erase operations with FN tunneling, not with HCI, which incurs high power consumption. Under the given bias and pulse conditions, the synaptic current of the selected cell M1 changes by ≈1.5 nA, while the synaptic currents of the unselected cells (M2, M3, and M4) remain constant, exhibiting changes below 25 pA.
Then, the bidirectional characteristic of the NOR-type DG FET array is investigated. Figure 5 illustrates a VMM operation performed on an 8 × 4 NOR-type DG FET array. Performing the VMM operation in software is computationally intensive, but it can be easily achieved by summing currents within a synapse array. Figure 5a shows the synaptic currents of eight individual DG FET cells connected along the SL. The current sum of the eight individual DG FETs (I_1 + I_2 + … + I_8), depicted as a red line in Figure 5b, closely matches the total SL current (I_SL,total) of the eight DG FETs, represented by a black line. From Figure 5c, it is shown that the fabricated synapse array successfully performs the VMM operation along the SL within a 0.54% error. Similarly, the VMM operation along the BL is shown in Figure 5d-f, with an error rate of 0.22%. These results affirm the capability of the fabricated NOR-type DG FET array to perform bidirectional VMM within a single array.

On-Chip Trainable HNN Architecture

Figure 6 compares on-chip trainable HNN architectures depending on whether the synapse arrays exhibit bidirectional characteristics or not. During the forward propagation step, input data are conveyed through each layer and neuron circuit within the network. Subsequently, the resulting output is compared with the target vector to compute the loss vector. The calculated loss vector is then propagated backward to calculate the error vectors (δ) of each layer. As the weight gradient is calculated by multiplying the hidden values from the forward propagation phase by the error vectors from the backward propagation phase, both vectors are stored in local memory throughout each phase.

In the case of a synapse array without bidirectional characteristics, as shown in Figure 6b, the array utilized during the forward propagation phase cannot be reused for the backward propagation step. Consequently, additional arrays are necessary for the backward propagation phase, leading to an increased architecture area. Additionally, the latency and power consumption increase due to the additional steps of reading the weight values from the forward propagation array and copying them into the arrays used for backward propagation. To copy the weight matrix, the conductance of each synaptic device within the weight matrix must be read and written to the corresponding position in the additional array. This sequential process closely resembles the bottleneck problem inherent in the von Neumann architecture. [55] Given the large number of parameters typically present in neural networks, it substantially increases the overall training time and power.
On the contrary, when the synapse array exhibits bidirectional characteristics, it offers the advantage of utilizing the same synapse array for both the forward and backward propagation phases, as depicted in Figure 6a. This results in a substantial reduction in the area, latency, and power consumption of the architecture by eliminating the need for additional arrays and the associated copy steps. In this regard, the proposed NOR-type DG FET array demonstrates its bidirectional characteristic along with low-power program/erase capability through FN tunneling, thereby enabling the on-chip trainable HNN to operate with reduced power and latency. Moreover, considering that the temperature increase during HCI in the conventional NOR-flash array reduces the program speed, the proposed FN-tunneling-based scheme also shortens the response time of the on-chip HNN. The detailed operational schemes of the proposed architecture are introduced in the subsequent sections.

Simulation Result and Benchmark
To evaluate the accuracy and training power of the DG FET when applied to on-chip trainable HNNs, a simulation of MNIST classification was conducted with a fully connected neural network of dimensions 784-216-10. The on-chip training process in the HNN consists of three main phases: the feedforward phase, the backpropagation phase, and the update phase.
The first feedforward phase is represented by the red arrow in Figure 7a. The output of the former layer is received as input, and VMM is conducted to calculate the hidden output, which is then propagated to the next layer. In the output layer, the output values are compared with the true label to calculate the loss value. Note that the weight is defined as the difference in conductance between two paired synapses representing positive (G+) and negative (G−) values in HNNs, following W = G+ − G−. [56] This is due to the physical limitation that the conductance value itself cannot be negative, while the weight value can be.
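The differential encoding W = G+ − G− can be sketched as follows (a minimal mapping for illustration; the conductance scale and offset are arbitrary here):

```python
import numpy as np

def weights_to_conductances(w, g_offset=0.0):
    """Map a signed weight matrix onto two non-negative conductance arrays
    following the differential-pair encoding W = G+ - G-."""
    g_pos = np.maximum(w, 0.0) + g_offset    # carries the positive part
    g_neg = np.maximum(-w, 0.0) + g_offset   # carries the negative part
    return g_pos, g_neg

w = np.array([[0.3, -0.7],
              [-0.2, 0.5]])
g_pos, g_neg = weights_to_conductances(w)
w_back = g_pos - g_neg   # subtracting the two current components recovers W
```

Subtracting the G+ and G− array currents at the periphery then yields the signed weighted sum, even though each physical conductance is non-negative.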
Figure 7b denotes the voltage biases and inputs applied to the DG FET array and the corresponding current summation during the feedforward phase. Only the G+ synapse array is represented in the figure for simplicity, but the same biases are applied to the G− synapse array. Voltage biases of 2.5, 1, and 0 V are applied to the SGL, SL, and BL, respectively, and the input values, encoded as pulse widths, are applied to the WLs. In this configuration, the weighted sum of the synapses highlighted in red is computed through the SLs. Subsequently, the final weighted-sum output is calculated by subtracting the two current components from the G+ and G− synapse arrays.
The second phase is the backpropagation phase, indicated by the blue arrow in Figure 7a. In this phase, the loss value computed in the feedforward phase is propagated backward at each iteration. From the loss calculated at the (i+1)-th layer (δ_(i+1)), δ_i is computed through VMM with the transpose of the weight matrix and transmitted to the former layer. In this process, the bidirectional characteristic of the NOR-type DG FET array enables the reuse of the synapse array used in the feedforward phase.
The voltage biases and inputs applied to the synapse array in this phase are represented in Figure 7c. Unlike the feedforward phase, the loss vector from the next layer is applied to the SGLs, and the corresponding weighted sum is calculated through the BLs.
The third phase is the update phase, where the synaptic weights are trained using the hidden inputs from the feedforward phase and the loss values from the backpropagation phase. An update algorithm based on the Manhattan rule [57] is used instead of the conventional gradient descent method. This approach reduces the burden on the peripheral circuitry in an on-chip trainable neural network by offering a relatively simple training method. In this algorithm, the weights are updated by considering only the polarity of ΔW_ij. When ΔW_ij is positive, a single erase pulse is applied to the G+,ij synapse, while a single program pulse is applied to the G−,ij synapse to increase the weight value. Conversely, when ΔW_ij is negative, a single erase pulse is applied to the G−,ij synapse, and a single program pulse is applied to G+,ij, decreasing the weight value.
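The Manhattan-rule update can be sketched in a few lines (a minimal sketch: the step size, conductance limits, and the sign convention that the weight moves opposite to the gradient polarity are illustrative assumptions):

```python
import numpy as np

# One LTP step (~alpha_p) as the conductance quantum; G limits are placeholders
STEP, G_MIN, G_MAX = 9.2e-12, 0.0, 9.2e-10

def manhattan_update(g_pos, g_neg, delta, z):
    """Sign-only (Manhattan-rule) update of the differential synapse pairs.

    grad_ij = delta_i * z_j; only its polarity is used:
      grad < 0 -> erase pulse to G+ (conductance up), program pulse to G- (down)
      grad > 0 -> program pulse to G+ (down), erase pulse to G- (up)
      grad = 0 -> both cells inhibited (no pulse)"""
    s = np.sign(np.outer(delta, z))
    g_pos = np.clip(g_pos - STEP * s, G_MIN, G_MAX)
    g_neg = np.clip(g_neg + STEP * s, G_MIN, G_MAX)
    return g_pos, g_neg

# 2x2 example: z = (z1 > 0, z2 = 0), delta with mixed signs
delta = np.array([-1.0, 1.0])
z = np.array([1.0, 0.0])
g_pos = np.full((2, 2), 5 * STEP)
g_neg = np.full((2, 2), 5 * STEP)
g_pos2, g_neg2 = manhattan_update(g_pos, g_neg, delta, z)
w_change = (g_pos2 - g_neg2) - (g_pos - g_neg)
```

In this example W_11 increases, W_21 decreases, and the cells in the z_2 = 0 column are inhibited, mirroring the single-pulse-per-update behavior described above.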
Figure 8 shows the pulse scheme used to implement the proposed algorithm within the NOR-type DG FET array. For illustrative purposes, G+ and G− synapse arrays of size 2 × 2 are considered, where specific hidden inputs (z_1 > 0, z_2 = 0) and a loss vector (δ_1 < 0, δ_2 > 0) are applied. From Equation (2), the update of the weight value W_ij depends on the polarity of δ_i z_j. Under the given conditions, the polarities of the weight updates are as follows: ΔW_11 > 0, ΔW_21 < 0, and ΔW_12 = ΔW_22 = 0. Therefore, to increase W_11, an erase pulse is applied to G+,11 while a program pulse is applied to G−,11. The other cells are inhibited following the pulse scheme proposed in Section 3.3. Likewise, G+,21 and G−,21 are selectively programmed and erased, respectively, to decrease W_21.
Following the aforementioned update method, the accuracy and training energy of the proposed NOR-type DG FET array for MNIST classification were investigated and benchmarked against various two-terminal memristor devices. [60][61][62][63][64][65][66] By serially connecting a select transistor to the memristor device, two-terminal memristor devices can construct a 1T1R crossbar array structure, which possesses bidirectional VMM functionality and a selective conductance update scheme, as in the NOR-type DG FET array. Synaptic characteristics, including the number of conductance steps, the energy required to update the conductance, and the nonlinearity fitting parameters for each synaptic device, are given in Table S1, Supporting Information.
Figure 9a presents the number of update pulses required to reach a given MNIST classification accuracy. The DG FET exhibits the second-highest number of update pulses required to reach its maximum accuracy. This is due to the larger number of conductance levels in the DG FET, which necessitates more iterations for convergence. The results show that achieving high accuracy is influenced by two key factors: a larger number of conductance levels and linear LTP/LTD characteristics. In this regard, the DG FET, which demonstrates linear LTP behavior and a minimum of 100 conductance levels, achieves the highest accuracy among the compared devices, reaching 97.65%.
In Figure 9b, the training energy is plotted against the corresponding classification accuracy. Although the DG FET requires one of the largest numbers of update pulses during training, it exhibits over 180.4 times lower training energy consumption compared to the other devices while maintaining the highest accuracy. This notable advantage is attributed to the capability of the DG FET to modify its channel conductance through charge trapping based on FN tunneling, which differs from two-terminal memristors that require relatively high currents to change their material states.
Figure 10 shows the distribution of the number of update pulses applied to each synapse. For layer 1, the average and maximum numbers of update pulses applied throughout the entire training phase are 149.9 and 1095, respectively. Similarly, for layer 2, the corresponding numbers are 160.4 and 732, all of which lie within the stable range (≈10^4 pulse cycles) of the fabricated DG FET, shown in Figure S3, Supporting Information.

Conclusion
In this study, we successfully fabricated a DG FET and the corresponding NOR-type array suitable for on-chip trainable HNNs. The saturation characteristic of the fabricated DG FET provides robustness against voltage fluctuations, maintaining the performance of the neural network. Moreover, the energy consumption during the training phase is remarkably low due to the FN tunneling-based program and erase operations, with calculated values of 0.09 and 0.99 fJ, respectively. The fabricated NOR-type DG FET array provides bidirectional current summation and selective conductance update capabilities, thereby eliminating the need for an additional synapse array for backward propagation during on-chip training. To enable its application in on-chip trainable HNNs, the operation schemes of the NOR-type DG FET array in the feedforward, backpropagation, and update phases have been proposed. Subsequently, a simulation of MNIST dataset classification was conducted based on the measured characteristics, and the results were benchmarked against various types of synaptic devices. The results demonstrated that the DG FET exhibits superior accuracy (97.65%) and the lowest training energy consumption, owing to its large number of conductance levels, linear LTP characteristics, and the nature of FN tunneling during conductance updates.

Experimental Section
Fabrication of DG FET: Figure S1, Supporting Information, shows the main fabrication process steps for the DG FET on a 6 in. silicon-on-insulator (SOI) wafer. First, the active Si regions were patterned and p−-channel implantation was performed. The doping concentration of the p-channel was 1 × 10^18 cm^−3. Next, a 10 nm-thick gate oxide was grown by dry oxidation, and n+-doped poly-Si with a thickness of 100 nm was deposited by low-pressure chemical vapor deposition. The select-gate, which governs the switching behavior of the DG FET, was then patterned, and the charge-trap layer of SiO2/Si3N4/SiO2 was formed after removing the exposed SiO2 layer. Then, n+-doped poly-Si was deposited and patterned for the memory-gate. The memory-gate controls the current level flowing through the DG FET by changing the amount of charge stored in the charge-trap layer. Ion implantation for the source and drain was then performed, followed by rapid thermal annealing. The annealing process cured the crystalline damage caused by collisions between high-energy ions and the substrate during ion implantation and activated the implanted dopants by moving them from initial interstitial sites into bound substitutional sites. [67] Then, a SiO2 passivation layer was deposited, and contact holes were formed. Finally, a Ti/TiN/Al/TiN stack was formed by sputtering and patterned to complete the metal wiring.
Simulation on MNIST Classification: The simulation was conducted with Python, reflecting the synaptic characteristics of the various types of synaptic devices given in Table S1, Supporting Information. Training was conducted on the 60 000 MNIST handwritten digit images until the classification accuracy converged. During the feedforward phase, the input value propagated forward through the layers following

z_l = f_act(W_l · z_{l−1})

where z_l represents the output of the l-th layer, W_l the weight matrix of the l-th layer, and f_act denotes the activation function.
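The feedforward pass described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, weight initialization, and function names are assumptions.

```python
import numpy as np

def relu(x):
    # rectified linear unit, used for all layers except the last
    return np.maximum(0.0, x)

def softmax(x):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def feedforward(x, weights):
    """Propagate an input through the layers: z_l = f_act(W_l @ z_{l-1}).
    ReLU is applied at hidden layers, softmax at the last layer."""
    activations = [x]
    for i, W in enumerate(weights):
        pre = W @ activations[-1]
        activations.append(softmax(pre) if i == len(weights) - 1 else relu(pre))
    return activations

# illustrative 784 -> 16 -> 10 network for MNIST-sized inputs
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((16, 784)),
           0.1 * rng.standard_normal((10, 16))]
acts = feedforward(rng.standard_normal(784), weights)
print(round(acts[-1].sum(), 6))  # softmax output probabilities sum to 1
```

The list of per-layer activations is kept because the backpropagation phase reuses them.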
Except for the last layer, the output value passed through the rectified linear unit (ReLU) activation function. At the last layer, the softmax activation function was used to generate the output probabilities for classification. Once the feedforward phase was completed, the loss δ_L was calculated by comparing the output probabilities from the network with the target values using the cross-entropy function.
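For the softmax-plus-cross-entropy combination, the output-layer error takes the well-known simple form δ_L = y − t (network output minus one-hot target). A minimal sketch, with illustrative variable names:

```python
import numpy as np

def cross_entropy(probs, target):
    """Cross-entropy loss between softmax outputs and a one-hot target.
    A small epsilon guards against log(0)."""
    return -np.sum(target * np.log(probs + 1e-12))

def output_delta(probs, target):
    """For softmax + cross-entropy, the output-layer error reduces to
    delta_L = probs - target, which is what is propagated backward."""
    return probs - target

probs = np.array([0.1, 0.7, 0.2])   # example softmax output
target = np.array([0.0, 1.0, 0.0])  # one-hot label for class 1
print(round(cross_entropy(probs, target), 4))  # -ln(0.7) ≈ 0.3567
print(output_delta(probs, target))
```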
In the subsequent backpropagation phase, the loss was propagated backward through the network using the bidirectional characteristic of the synapse array. The related equation is given by

δ_l = ((W_{l+1})^T · δ_{l+1}) ⊙ f′_act(z_l)

where δ_l represents the loss for the l-th layer, and f′_act denotes the derivative of the activation function. The error from the next layer was multiplied by the transpose of the weight matrix and element-wise multiplied with the derivative of the activation function to compute the error for the current layer. Note that f′_act was either 0 or 1 for the ReLU activation function, enabling a simple hardware implementation.
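Under the same notation, the backward pass can be sketched as a minimal NumPy routine. The ReLU derivative is the 0/1 mask noted above; the array shapes are illustrative assumptions:

```python
import numpy as np

def backprop_errors(delta_L, weights, activations):
    """Propagate the output-layer error backward:
    delta_l = (W_{l+1}^T @ delta_{l+1}) * f'_act(z_l),
    where f'_act is 0 or 1 for ReLU."""
    deltas = [delta_L]
    # walk from the last hidden layer down to layer 1
    for l in range(len(weights) - 1, 0, -1):
        relu_mask = (activations[l] > 0).astype(float)  # 0/1 derivative
        deltas.insert(0, (weights[l].T @ deltas[0]) * relu_mask)
    return deltas

# shapes for an illustrative 784 -> 16 -> 10 network
rng = np.random.default_rng(1)
weights = [rng.standard_normal((16, 784)), rng.standard_normal((10, 16))]
activations = [rng.standard_normal(784),            # input
               np.abs(rng.standard_normal(16)),      # hidden ReLU output
               None]                                 # output (unused here)
deltas = backprop_errors(rng.standard_normal(10), weights, activations)
print([d.shape for d in deltas])  # [(16,), (10,)]
```

The transpose `weights[l].T` is exactly the operation the bidirectional synapse array realizes in hardware by summing currents along the opposite direction of the same array.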
After the backpropagation phase, the Jacobian matrix of the loss with respect to the weight matrix was calculated in order to minimize the loss of the neural network. The Jacobian matrix is given by

∂L/∂W_l = δ_l · (z_{l−1})^T

In this simulation, the weight values were updated depending on the polarity of the Jacobian matrix. The conductance change for the G− synapse when a single program pulse was applied was given as

ΔG_{l−,ij} = α_p exp(−β_p (G_{l−,ij} − G_min)/(G_max − G_min))    (9)

and the G+ synapse was updated following the analogous exponential expression.
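The exponential pulse update of Equation (9) can be sketched numerically. The conductance window and the fitting parameters α, β below are assumed values for illustration, a single exponential form is reused for both pulse polarities for brevity, and the sign convention (erase raises conductance, program lowers it) follows the update scheme of Figure 8.

```python
import numpy as np

G_MIN, G_MAX = 0.1e-6, 1.0e-6  # assumed conductance window (S)
ALPHA, BETA = 0.02e-6, 2.0     # assumed fitting parameters

def pulse_delta(g):
    """Magnitude of the conductance change for one pulse, following the
    exponential form of Eq. (9):
    |dG| = alpha * exp(-beta * (g - G_MIN) / (G_MAX - G_MIN))."""
    return ALPHA * np.exp(-BETA * (g - G_MIN) / (G_MAX - G_MIN))

def update_pair(g_pos, g_neg, grad_sign):
    """Apply one pulse to the (G+, G-) pair based on the polarity of the
    Jacobian entry: a negative gradient increases the weight W = G+ - G-
    (erase G+, program G-); a positive gradient decreases it."""
    if grad_sign < 0:  # increase weight
        g_pos = min(g_pos + pulse_delta(g_pos), G_MAX)
        g_neg = max(g_neg - pulse_delta(g_neg), G_MIN)
    else:              # decrease weight
        g_pos = max(g_pos - pulse_delta(g_pos), G_MIN)
        g_neg = min(g_neg + pulse_delta(g_neg), G_MAX)
    return g_pos, g_neg

# repeated weight-increase pulses drive G+ up and G- down
g_pos, g_neg = 0.5e-6, 0.5e-6
for _ in range(50):
    g_pos, g_neg = update_pair(g_pos, g_neg, grad_sign=-1)
print(g_pos > g_neg)  # True
```

Note that `pulse_delta` shrinks as the conductance approaches the window edge, reproducing the saturating (nonlinear) LTP behavior that the fitting parameters α and β describe.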

Figure 1. Structures of a) AND-type, b) NAND-type, and c) NOR-type arrays with FET-based synaptic devices. The synaptic devices highlighted in red and blue indicate the directions required for current summation during feedforward propagation and backpropagation, respectively.

Figure 2. a) Cross-sectional schematic of the DG FET fabricated on the 6 in. silicon-on-insulator (SOI) wafer. b) Cross-sectional TEM and c) top SEM images of the fabricated DG FET. d) I_D–V_SG curves of the DG FET as a parameter of V_MG bias. e) I_D–V_D curves of the DG FET as a parameter of V_MG bias.

Figure 3. I_D–V_SG curves when a) 100 erase pulses (−8.5 V, 1 ms) and b) 200 program pulses (9 V, 50 μs) are applied to the DG FET. c) Tunneling current through the MG depending on V_MG bias. d) Measured LTP/LTD characteristics of the proposed device. Energy band diagram along the charge-trap layer during the e) erase and f) program operations.


Figure 4. a) Top SEM image of the fabricated NOR-type DG FET array. The inset shows the schematic of 4 DG FET cells included in the array. b) Selective weight-update characteristic of the NOR-type DG FET array. c) Bias conditions for selective updates.

Figure 5. a) Synaptic current of individual DG FET cells connected along the SL. b) Comparison between the current sum of individual cells and the total current through the SL. c) VMM result along the SL. d) Synaptic current of individual DG FET cells connected along the BL. e) Comparison between the current sum of individual cells and the total current through the BL. f) VMM result along the BL.

Figure 6. Overall on-chip training system architecture based on synapse arrays a) without bidirectional characteristics and b) with bidirectional characteristics.

Figure 7. a) Schematic of the fully connected neural network used in the simulation. b) Voltage biases and pulse scheme during the feedforward phase. c) Voltage biases and pulse scheme during the backpropagation phase within the same array as (b).

Figure 8. Voltage biases and pulse scheme during the update phase. To increase a weight, a) the G+ synapse is erased to increase its conductance, while b) the G− synapse is programmed to lower its conductance. When decreasing a weight, c) a program pulse is applied to the G+ synapse, while d) an erase pulse is applied to the G− synapse.

Figure 9. Benchmark analysis between the DG FET and two-terminal synaptic devices based on a) number of update pulses and b) training energy consumption for classification accuracy. The DG FET demonstrates higher accuracy and lower training energy consumption.

Figure 10. Distribution of the number of update pulses applied to each synapse in a) layer 1 and b) layer 2.