Lateral Migration‐based Flash‐like Synaptic Device for Hybrid Off‐chip/On‐chip Training

An increase in the demand for artificial intelligence is leading to advanced research in the field of neuromorphic systems, which imitate human brain functions with the hope of increasing computational speed and lowering power consumption. In particular, the development of energy‐efficient and reliable synaptic devices is critical, as synapses are fundamental building blocks of neuromorphic systems. In this study, by adjusting the charge injection pathway of conventional flash memory devices, a lateral migration‐based synaptic device is proposed. Using an efficient program/erase method, the proposed device is operable at a significantly low voltage while maintaining strong retention and endurance characteristics. Furthermore, an efficient hybrid off‐chip/on‐chip training method using the proposed device is presented. The results demonstrate a variation‐robust neuromorphic system, indicating the superiority of the proposed device.

[8][9][10] Beyond memory function, synaptic devices also need to meet several other conditions for efficient computing architecture, including complementary metal-oxide semiconductor (CMOS) compatibility, high density, and high reliability. [11,12] There are typically two methods used for training neuromorphic systems. The first method is off-chip training, where weight values obtained through training in software are transferred to the synaptic devices. [13] However, off-chip training is known to be sensitive to hardware variations, particularly in neuromorphic systems that utilize analog conductance synaptic devices. [14] The process of transferring pre-determined weight values to the synaptic devices is greatly influenced by device-to-device variations and pulse-to-pulse variations. [15] On the other hand, the second method is on-chip training, where weight values of synaptic devices are updated during runtime using additional circuitry. [16,17] This approach allows the network to adapt to device variations during the training phase, providing more robust performance. However, since on-chip training requires repeated weight updates, the synaptic devices require a low program (PGM) / erase (ERS) energy for the low-power operation of the overall system. Also, to store the on-chip trained weights, good retention characteristics are needed. Considering these aspects, a training method that has strong immunity to variation and is operable at low power is required.
[28][29] However, the large bias pulse and current for data storage by Fowler-Nordheim (FN) tunneling and hot carrier injection, respectively, cause significant dynamic power consumption, which limits the capability of the low-power operation. [30] In addition, trap generation in the tunneling oxide due to the large electric field and electron tunneling is a major endurance failure mechanism in conventional Flash devices. [31] Previously, our research group reported the removal of the tunneling barrier in the gate insulator stack of conventional Flash devices. [32] By removing the tunneling barrier, charge transfer between the channel silicon and the charge trap layer becomes easier, enabling relatively low-voltage operation. Furthermore, the absence of a tunneling barrier yields strong endurance characteristics by excluding the large electric field and the injection of high-energy carriers. Thus, the device is able to mimic repeated signal accumulation in neuron circuits and leakage operation in neurons, potentially replacing the membrane capacitance of neuron circuits. However, the poor retention characteristics are a significant drawback for use as synaptic devices. As a result, while the utilization of a charge injection path without a tunneling barrier for low-power operation is promising, the retention characteristic of the device should be improved. Here, we suggest utilizing one of the well-known issues in charge trap Flash, charge lateral migration, to build a new memory cell that dramatically improves the trade-off relationship between retention and endurance/low-power operation.
[33] In this paper, by combining the charge migration characteristics of charge trap Flash and the direct charge transfer without a tunneling barrier, a lateral migration-based Flash memory called the Side Path device is proposed as a synaptic device. First, lateral migration of charges is verified through fabrication and measurement. The results show that with an efficient PGM/ERS method, low operational voltage can be utilized while maintaining good retention characteristics. Second, using the proposed device, a hybrid off-chip/on-chip training method is proposed. The proposed method demonstrates strong immunity to device-to-device variation, reducing the variation-induced accuracy degradation by up to 6.3× compared to the conventional method. Consequently, the proposed device has the advantage of low-power operation, improved retention, and improved endurance characteristics, making the device suitable for neuromorphic computing.

Device Structure
Figure 1a shows a schematic cross-sectional view of the proposed Side Path synaptic device. The device fabrication process is described in detail in the Experimental Section and Figure S1 of the Supporting Information. The fabricated device has a poly-silicon channel, and it features electrically separated source and drain regions, along with the p-type doped body that provides holes during memory operations, specifically during ERS operation. The height of the source and drain electrodes becomes lower than that of the p-body during the fabrication process, resulting in a curved channel. The curved shape enhances immunity to the short-channel effects as the gate bias is concentrated on the curved channels near the source and drain. [34] Notably, the tunneling oxide on the source/drain side of the gate insulator stack is removed, allowing direct contact between the source and drain n + -doped poly-Si and a Si 3 N 4 layer serving as a charge trap layer. This side path enables the injection and lateral movement of electrons and holes through the side of the charge trap layer.
For the purpose of analysis, three different versions of the Side Path device were fabricated depending on the ΔL O of (1), as shown in Figure 1a-2.
ΔL O equals the overlapped length of the tunneling oxide and the source or drain; thus, a negative value indicates that the tunneling oxide region is narrower than the p-body region. Figure 1c shows the measured transfer curves of the three devices, which exhibit stable transistor operation.
Figure 1d compares the PGM mechanisms of conventional charge trap NAND Flash with Al 2 O 3 /Si 3 N 4 /SiO 2 (A/N/O) as a gate insulator stack, a device without tunneling barrier (A/N), and the proposed Side Path device. In the charge trap NAND Flash array, strong program bias (V PGM ) applied to the gate of a selected cell incurs FN tunneling through the tunneling oxide, and the tunneled electrons are trapped in localized traps of the Si 3 N 4 layer (Figure 1d-1). [35] A spacer region, which is not gated and electrically separated by adjacent cells, exists next to the selected cell. Since the spacer region is only biased by the weak fringing field, the lateral migration of trapped electrons is limited. Hence, even though the charge trap layer is connected, adjacent cells operate independently. On the other hand, in the case of the A/N device, electrons are directly injected into the charge trap Si 3 N 4 layer because there is no tunneling SiO 2 , as shown in Figure 1d-2, so the device is operable with low V PGM . In the case of the Side Path device, FN tunneling through the tunneling oxide is suppressed due to low V PGM . However, electrons are directly injected and trapped into the Si 3 N 4 region attached to the source/drain, similar to the PGM mechanism of the A/N device, as shown in Figure 1d-3. Since all the regions along the channel direction are gated and biased by V PGM , the potential barrier for lateral migration is lowered. The injected and trapped electrons diffuse and drift laterally through the Si 3 N 4 layer due to the lateral distribution of the electron density and repulsion between closely packed electrons, respectively. When the PGM pulse is removed, the conductivity in the Si 3 N 4 layer decreases and electrons are localized in traps.

Lateral Migration Validation
In order to prove that charges migrate laterally, the electrical characteristics of the proposed device should be compared with the device with a conventional gate insulator stack (A/N/O) and the device without a tunneling barrier (A/N). Three different devices (A/N/O, A/N, Side Path) were fabricated using the same process but with different gate insulator stacks. Basic electrical characteristics of A/N/O and A/N devices are shown in Figure S2 of the Supporting Information. Figure 2a shows the measured I D -V G transfer curves in different memory states of the A/N/O device with a width/length (W/L) of 1 μm/1 μm. Note that the typical A/N/O device requires a 9 V / 100 μs pulse for program operation and a -9 V / 10 ms pulse for erase operation. Figure 2b shows the measured I D -V G transfer curves in different memory states of the A/N device with W/L = 1 μm/1 μm. The A/N device allows for easier charge injection and ejection compared to the A/N/O device, thus requiring significantly lower V PGM (6 V / 100 μs) and V ERS (-5 V / 100 μs).
Based on the operating conditions of the aforementioned devices, the electrical characteristics of the Side Path device are analyzed. Figure 2d depicts the schematic of the memory operation of the proposed Side Path device. Using low V PGM , electrons are supplied and injected only through the side path. The injected electrons migrate inside the device by a drift-diffusion mechanism through the Si 3 N 4 layer due to the existence of the spatially distributed traps. Therefore, by applying repetitive low V PGM , electrons accumulate in the Si 3 N 4 layer above the SiO 2 tunneling oxide layer. Electrons that have not migrated to the Si 3 N 4 layer above the SiO 2 layer can be erased with low V ERS . To erase the electrons trapped in the Si 3 N 4 layer above the tunneling oxide, high V ERS , the conventional voltage required to erase Flash memory that incurs FN tunneling through the tunneling oxide, is applied. Conversely, when a high V PGM is applied to the gate, electrons are injected through both the side path and tunneling oxide, resembling typical memory operations, as shown in Figure 2c. Note that the PGM characteristic of the A/N/O device with the operating condition of the A/N device (6 V / 100 μs) shows no shift in the transfer curve, indicating electron tunneling through the SiO 2 layer at 6 V is negligible, as shown in Figure 2a. On the other hand, the measured I D -V G transfer curve of the Side Path device with ΔL O = −0.1 μm shows that the device is programmable with the 6 V / 100 μs pulse, as depicted in Figure 2c. At the negative V G side, I D mainly originates from the gate-induced drain leakage (GIDL). The amount of the I D -V G curve shift on the negative V G side (ΔV GIDL ) is significantly larger than the threshold voltage shift (ΔV th ). Since the side path region coincides with the region where GIDL occurs, charges are more effectively injected into the side path region compared to the Si 3 N 4 layer above the tunneling oxide at the same V PGM . Since the diffusion or drift (electron repulsion) is concentration-driven, the injected electrons move toward the center of the channel through Si 3 N 4 . However, a higher electron concentration exists near the source/drain, resulting in larger ΔV GIDL than ΔV th .
For further validation of lateral migration, repetitive V PGM pulses are applied to the gate of fabricated devices with different ΔL O s. In Figures 2e-f, V PGM of 6 V / 100 μs × 100 is repeatedly applied 10 times and ΔV th is plotted using the constant current method on the I D -V G transfer curves. Note that as ΔL O decreases, the programming efficiency increases due to increased direct injection to the Si 3 N 4 layer. In addition, after the V PGM pulses are repeatedly applied, low V ERS pulses (-6 V / 100 μs or -7 V / 100 μs), which are insufficient for erase operation in a typical A/N/O device, are applied to the gate to discharge the charges injected into the Si 3 N 4 layer (Figure 2e). In all three devices, after the low V ERS pulses are applied 10 times, ΔV th saturates before reaching the initial V th , which indicates the injected charges in Si 3 N 4 were not completely erased. Although each -7 V / 100 μs V ERS pulse decreases ΔV th considerably faster before saturation than a -6 V / 100 μs pulse, ΔV th also saturates under a series of -7 V / 100 μs pulses. The change in ΔV th exhibits a decreasing trend followed by saturation. The rate of decrease increases with increasing V ERS , indicating that higher V ERS more effectively removes the stored charges in the Si 3 N 4 layer. However, even though the charges were stored by the low-voltage PGM operation, the saturated ΔV th is similar for both cases, indicating that both the V ERS s of -6 and -7 V are insufficient to remove all the charges.
Figure 2. [...] 1a is simply illustrated to be flat. e) Measured ΔV th of the Side Path device with different ΔL O s by repeatedly applying different V ERS pulses (Solid: V ERS = -6 V / 100 μs, Open: V ERS = -7 V / 100 μs) after V PGM (6 V / 100 μs) pulses are applied. ΔV th saturates after a certain number of V ERS pulses, indicating lateral migration of charges. The higher V ERS (-7 V / 100 μs) increases ERS efficiency. f) Measured ΔV th after applying A/N/O V ERS (-9 V / 10 ms) pulses to remove migrated charges (Solid: V ERS = -6 V / 100 μs, Open: V ERS = -7 V / 100 μs). g) ΔV th extracted by repeatedly reading transfer curves for the Side Path devices with different ΔL O s after the V PGM (6 V / 100 μs) is applied. ΔV th decreases gradually as charges in the side path region discharge naturally. h) V th s of the Side Path, A/N, and A/N/O devices after several PGM/ERS cycles, which exhibit similar standard deviations.
Note that the asymmetry in PGM and ERS operations is evidence of lateral migration of charges. The remaining ΔV th can be understood as the effect of laterally migrated charges in the Si 3 N 4 layer near the channel center (x = L/2), which are blocked by SiO 2 and can only be removed by FN tunneling through the bottom SiO 2 layer. Note that the migrated charges hardly migrate back from the channel center to the source/drain since the electric field applied by low V ERS is in the vertical direction. Also, the lateral diffusion during ERS is not significant compared to the PGM case due to the lower electron concentration gradient. Therefore, in order to remove the charges that migrated to the Si 3 N 4 layer above the SiO 2 layer, high V ERS , that is, the V ERS for the A/N/O device, should be applied, as shown in Figure 2f. Applying a high V ERS of -9 V / 10 ms resets the device into the initial state.
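The ΔV th values discussed above are extracted with the constant current method. As a minimal sketch of such an extraction, the function below returns the gate voltage at which I D first rises through a criterion current. The 1 nA default for `i_crit`, the function names, and the log-linear interpolation are assumptions made for illustration; the criterion used for the measurements is not stated in the text. The sweep is assumed to run from low to high V G , and the first upward crossing is taken so that a GIDL branch at negative V G is skipped.

```python
import numpy as np

def vth_constant_current(v_g, i_d, i_crit=1e-9):
    """Gate voltage at which |I_D| first rises through i_crit (hypothetical 1 nA default)."""
    v_g = np.asarray(v_g, dtype=float)
    # Floor the magnitude so that exact zeros do not produce log10(0).
    log_i = np.log10(np.maximum(np.abs(np.asarray(i_d, dtype=float)), 1e-30))
    target = np.log10(i_crit)
    # Indices k where the curve goes from below to at-or-above the criterion.
    up = np.nonzero((log_i[:-1] < target) & (log_i[1:] >= target))[0]
    if up.size == 0:
        raise ValueError("transfer curve never rises through i_crit")
    k = up[0]
    # Interpolate linearly in log10(I_D), which suits the subthreshold region.
    frac = (target - log_i[k]) / (log_i[k + 1] - log_i[k])
    return v_g[k] + frac * (v_g[k + 1] - v_g[k])

def delta_vth(v_g, i_d_ref, i_d_new, i_crit=1e-9):
    """Threshold shift of a new transfer curve relative to a reference curve."""
    return (vth_constant_current(v_g, i_d_new, i_crit)
            - vth_constant_current(v_g, i_d_ref, i_crit))
```

Applying the same function on the negative V G branch (with the sweep reversed) would give a comparable estimate of ΔV GIDL , again under the assumed criterion current.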
In addition, the read disturbance characteristic of the proposed Side Path device is analyzed. After the V PGM pulses are applied 10 times, the I D -V G transfer curve is repeatedly measured (Figure 2g) until the ΔV th is saturated over time to observe the natural discharge of charges. As ΔL O decreases, the change in ΔV th during the repeated read increases as well, indicating more charges are discharged. On the other hand, the ΔL O = +0.1 μm device, which is similar to the conventional A/N/O device, shows little change in V th even after 50 reads, indicating the majority of charges affecting V th migrated to the Si 3 N 4 layer above the SiO 2 layer.
Since the lateral migration in the Si 3 N 4 layer originates from the stochastic movement of charges, repeated PGM/ERS cycles may result in different ΔV th s. Hence, the controllability of the proposed Side Path device needs to be confirmed. In Figure 2h, the cycle-to-cycle variations of the A/N/O, A/N, and Side Path devices are measured. After several PGM/ERS cycles, the three devices exhibit similar negligible cycle-to-cycle variation, confirming that the lateral migration is reasonably controllable. Like the lateral migration, the tunneling through the SiO 2 layer in the A/N/O device or conventional Flash devices (or direct injection in the A/N device) also relies on quantum-mechanical stochastic processes. However, due to the large number of electrons (or holes) and the high attempt-to-escape frequency, the average movement of charges is predictable according to the law of large numbers. Therefore, the Side Path device is believed to have an intrinsically similar cycle-to-cycle variation level compared to conventional Flash devices.
Further measurements were performed to validate the degree of lateral migration of charges, as shown in Figure 3. Charge pumping is a widely used technique that applies varying pulses to the gate to measure the DC body current for the purpose of extracting interface trap density. [36] With slight modifications of fixing the low level and rise/fall speed of the gate pulses, analyzing the lateral difference in V th along the channel length direction, or in other words, the extent of lateral migration to the Si 3 N 4 layer above the SiO 2 layer of the Side Path device, is possible. [37] The charge pumping measurement technique incorporates the notions of local V th and local flat band voltage (V fb ), as shown in Figures 3a-1 and 3a-2. An arbitrary coordinate x is designated along the interface between the gate insulator and the silicon substrate, signifying the distance from the edge of the body toward the center of the channel. Here, local V th (V th (x)) is the V G required for inversion at a specific position x, while local V fb (x) represents the gate voltage required for accumulation at a specific x. As a result, when the single charge pumping pulse is increased to V high and applied to the gate, the inversion region extends only up to x inv ≡ x| Vth = Vhigh , as indicated in Figure 3b. Conversely, when the pulse is lowered to V low , the accumulation region extends up to x acc . The current measured during charge pumping (I CP ) therefore results from recombination at the interface traps between x inv and x acc . Consequently, with V low fixed and increasing V high , x acc remains constant while x inv gradually increases. When V high is low, only the ends of the channel contribute to the I CP , but as V high and x inv increase, the entire channel contributes to I CP , which enables the extraction of the V th profile along the channel length direction.
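The profiling idea described above can be sketched numerically. The snippet below is only an illustration under a stated assumption that is not taken from the text: the interface trap density is uniform along the channel, so that I CP grows in proportion to the inverted length x inv and the normalized I CP can be mapped directly to position. The function names and the `x_max` parameter (the length of the profiled region) are hypothetical.

```python
import numpy as np

def local_vth_profile(v_high, i_cp, x_max):
    """Map an I_CP(V_high) sweep to (x, V_th(x)), assuming uniform trap density.

    Each V_high is treated as the local V_th at the position x_inv reached
    by the inversion region, with x_inv = x_max * I_CP / I_CP,max.
    """
    v_high = np.asarray(v_high, dtype=float)
    i_cp = np.asarray(i_cp, dtype=float)
    x = x_max * i_cp / i_cp.max()
    # Keep only the strictly rising part so that x can be used for interpolation.
    keep = np.concatenate(([True], np.diff(x) > 0))
    return x[keep], v_high[keep]

def delta_vth_profile(x_grid, before, after):
    """Local V_th shift on a common x grid from two (x, V_th) profiles."""
    x_b, v_b = before
    x_a, v_a = after
    return np.interp(x_grid, x_a, v_a) - np.interp(x_grid, x_b, v_b)
```

Measured I CP curves are noisy, so in practice some smoothing would be needed before the monotonic mapping is meaningful; that step is left out of this sketch.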
Using this technique, the Side Path device with ΔL O = −0.1 μm is analyzed by measuring the initial I CP from the source side and the I CP after PGM (6 V / 100 μs × 1000) and ERS (-6 V / 100 μs × 500) operations, as shown in Figure 3c. Figure 3d shows the local ΔV th extracted from I CP before and after PGM. The x-axis represents the distance from the source to the midpoint of the p-body of the Side Path device. Since the charges are injected in x ≤ 0.1 and migrate laterally in the +x direction, ΔV th is more significant in x ≤ 0.1, the region where Si 3 N 4 is exposed (left of the blue dashed line), and gradually decreases as x increases. A similar measurement is performed at an elevated temperature (80 °C), and the amount of injected charges and lateral migration in the +x direction (toward the center of the channel) increase. Conversely, Figure 3e shows the local ΔV th caused by ERS of the initially programmed device. ΔV th is more dominant in the Si 3 N 4 region, indicating that the remaining charges in the Si 3 N 4 region are removed with weak V ERS . In addition, as x increases, |ΔV th | decreases since the charges that migrated to the Si 3 N 4 layer above the SiO 2 region are not fully removed.
Since the charges migrate to the Si 3 N 4 layer above the SiO 2 layer with low V PGM , the reliability characteristics of the Side Path device are expected to be improved as well. First, the retention characteristics of the A/N/O device and the A/N device are measured for comparison. Figure 4a shows the schematic energy band diagram of the A/N/O device during the retention mode after PGM operation. Due to the tunneling barrier, the trapped electrons cannot be easily emitted, as shown in the measured change in ΔV th for 10^3 seconds (Figure 4b). On the other hand, Figure 4c depicts the schematic energy band diagram of the A/N device. The absence of a tunneling barrier causes trapped charges to be easily emitted, as depicted in the measured change in ΔV th for 10^3 seconds (Figure 4d).
The retention characteristics of the Side Path device are expected to show intermediary characteristics between the A/N device and A/N/O device due to the mixed contribution of charges directly lost in the side path region, and charges migrated and preserved in the Si 3 N 4 layer above the SiO 2 layer, as schematically shown in Figure 4e. Figure 4f-g shows the retention of ΔV GIDL and ΔV th of the Side Path device with ΔL O = −0.1 μm after applying V PGM of 6 V /100 μs × 1000 and 7 V / 100 μs × 1000.ΔV GIDL decreases faster than ΔV th during the retention mode.
Together with Figure 2c, this indicates that charges can be easily injected and detrapped in the region near the side path, but the charges that migrated to the Si 3 N 4 layer above the SiO 2 layer are relatively well preserved. In addition, the Side Path device was programmed to have similar ΔV th values to those of the A/N/O and A/N devices, and the changes in ΔV th are measured for 1000 seconds (Figure 4h). The result shows a relatively stable retention of electrons, indicating the influence of the charges that migrated to the Si 3 N 4 layer above the SiO 2 layer. The rapid data loss in the Side Path device is attributed to the shallowly trapped electrons in the Si 3 N 4 layer near the SiO 2 layer. As a result, the concept of soft ERS/PGM, applying a small pulse to quickly remove the responsive charges, can be utilized to improve retention characteristics. [38] This approach will be elaborated further in the subsequent section.
In addition, the device degradation characteristics of A/N/O and Side Path devices are investigated, as shown in Figure 4i. To ensure that the same amounts of charges were stored in the A/N/O and the Side Path devices, different V PGM and V ERS were used for the cycling endurance test. The A/N/O device exhibits significant degradation in subthreshold swing (SS) after 10^4 cycles. On the other hand, the Side Path device shows minimal difference in SS even after 10^7 cycles because the endurance failure source of typical flash devices, the trap generation in the tunneling oxide, is eliminated. As shown in Figure 4j, while the trap generation site at the tunneling SiO 2 /Si interface is responsible for the degradation of the A/N/O device, the direct injection of electrons to the Si 3 N 4 layer is less likely to deteriorate the device. [39] To verify whether the devices have actually degraded or the electrons are simply not fully removed, ultraviolet (UV) treatment with a wavelength of 254 nm was conducted to remove the remaining charges in the Si 3 N 4 layer, and the change in I D -V G characteristics is observed, as shown in Figure S3 of the Supporting Information. UV is known to provide energy high enough to free the trapped charges in Flash devices. [40] The A/N/O device did not return to its original state even after the UV treatment. On the other hand, the Side Path device returned to its initial state, suggesting that the change in SS resulted from the distribution of accumulated charges in the Si 3 N 4 layer, and that performance degradation of the device rarely occurs.
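The endurance comparison above relies on the subthreshold swing extracted from the I D -V G curves. A minimal sketch of one common way to extract it is given below: a straight-line fit of log10(I D ) against V G inside a current window. The window limits `i_lo` and `i_hi` and the function name are hypothetical choices for illustration, not values from the measurements, and the input is assumed to contain only the forward subthreshold branch (no GIDL branch).

```python
import numpy as np

def subthreshold_swing_mv_per_dec(v_g, i_d, i_lo=1e-11, i_hi=1e-9):
    """Subthreshold swing in mV/decade from a linear fit of log10(I_D) vs V_G."""
    v_g = np.asarray(v_g, dtype=float)
    i_abs = np.abs(np.asarray(i_d, dtype=float))
    # Select the points inside the (assumed) subthreshold current window.
    mask = (i_abs >= i_lo) & (i_abs <= i_hi)
    if mask.sum() < 2:
        raise ValueError("not enough points inside the current window")
    slope, _ = np.polyfit(v_g[mask], np.log10(i_abs[mask]), 1)  # decades per volt
    return 1e3 / slope
```

Comparing the value returned for a fresh device with the value after cycling gives the kind of SS degradation figure discussed above.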
In summary, the reliability and operational voltages of three different devices (A/N, A/N/O, Side Path) are compared in Table 1. The Side Path device is operable with two distinct operation voltages. The A/N/O part of the Side Path device (center region) requires ±9 V for tunneling (see Figure 2c,f), while the side path (A/N) for lateral charge migration of the Side Path device operates at ≈±6 V. Since charges are mainly injected through the side path region with low voltages, overall reliability characteristics are highly improved.

Hybrid Training with the Side Path Synaptic Device
In this section, the application of the proposed Side Path device as a synaptic device in neuromorphic systems is discussed. A hybrid training approach that leverages the physical characteristics of the Side Path device to exploit the advantages of both off-chip and on-chip training is proposed. Figure 5a schematically represents the hybrid training sequence. Initially, off-chip training is performed using software to obtain a weight set, which is then transferred to the conductance of synaptic devices. The weight transfer is performed by PGM/ERS voltage higher than 7 V, denoted by "strong" PGM/ERS, for a wider dynamic range. On the other hand, the subsequent on-chip training for fine-tuning is conducted by "weak" PGM/ERS with voltages lower than 7 V for low-power operation without endurance failure. In addition, right after the weight transfer, "soft" PGM/ERS can be performed for the dynamic range adjustment and retention enhancement. Details of the hybrid training in terms of the device physics are described in this section, and the algorithmic details are described in the Experimental Section. The effectiveness of the proposed hybrid training is verified through simulations of the Modified National Institute of Standards and Technology (MNIST) database classification task.
The conventional off-chip training method involves iterative conductance mapping in a closed-loop fashion, incurring time, energy, and circuit burden. [41] However, the proposed method using the Side Path device aims to transfer weights using only the number of pulses applied without read-verify steps. Also, a wide dynamic range, or larger |ΔV th |, is necessary to ensure a large max/min ratio of the synaptic weight, which is a critical factor for enhancing the accuracy of a hardware-based neural network. [42] Although the Side Path device is operable with weak PGM/ERS voltages (< 7 V) by charge lateral migration, using strong PGM/ERS voltages (> 7 V) for A/N/O-device-like behavior provides a larger dynamic range, as shown in Figure 2c. When strong PGM/ERS is performed, not only does FN tunneling occur in the central region with a tunneling oxide, similar to A/N/O devices, but also a greater charge transfer takes place in the side path region, ensuring a large dynamic range. The black square symbols of Figure 5b show the I D change of the Side Path device while repeatedly applying identical strong ERS and strong PGM pulses. Similar to other reported synaptic devices, PGM operation changes I D more abruptly compared to ERS operation. [43] Since evenly distributed I D levels are advantageous for multi-level weight representation, strong ERS pulses are employed for the initial weight transfer after the off-chip training. In other words, while transferring the weight obtained by software, a specific number of strong ERS pulses are applied to the Side Path device for each corresponding weight level.
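The pulse-count-based weight transfer described above can be sketched as a nearest-level lookup. In the snippet below, `level_currents[n]` stands for the I D reached after n identical strong ERS pulses; the example values in the usage note are hypothetical placeholders, not measured data, and the function name is an assumption made for illustration. No read-verify step is modeled, in line with the open-loop transfer described in the text.

```python
import numpy as np

def pulses_for_weights(target_currents, level_currents):
    """Choose, for each target current, the pulse count with the nearest level.

    level_currents[n] is the current after n strong ERS pulses (assumed known
    from a prior characterization). Returns the pulse counts and the quantized
    currents that those counts are expected to produce.
    """
    levels = np.asarray(level_currents, dtype=float)
    targets = np.asarray(target_currents, dtype=float)
    # Distance from every target to every level; pick the closest level index.
    counts = np.abs(targets[..., None] - levels).argmin(axis=-1)
    return counts, levels[counts]
```

For example, with hypothetical levels `[1.0, 2.0, 3.0, 4.0]` (arbitrary units), targets of 1.2 and 3.6 map to 0 and 3 pulses. Any mismatch between the expected and the actual level after transfer is what the subsequent on-chip training is meant to absorb.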
Nevertheless, the excess charges at the side path region incurred during the weight transfer (strong ERS) may result in relatively unstable retention and degraded network accuracy after on-chip training, which will be shown afterward. To address these issues, after applying a number of strong ERS pulses, a final distinct PGM (6.5 V / 100 μs) pulse is introduced at the end of the weight transfer process to preemptively eliminate excessive charges in the side path region. Referred to as the "soft PGM," this pulse is exclusively utilized in the weight transfer process, setting it apart from the iterative application of weak PGM pulses during on-chip training. Red circle symbols in the left part of Figure 5b show the weight-transferred states by employing soft PGM. Conversely, the "soft ERS" (−6.5 V / 100 μs) scheme can be adopted after strong PGM pulses, as depicted by red circle symbols in the right part of Figure 5b.
As previously mentioned, transferring weights without read-verify steps and relying solely on the number of pulses is advantageous in reducing the weight transfer burden. However, it may result in reduced network accuracy since the actual conductance may deviate from the target value due to device-to-device variation. To address this, additional on-chip training is conducted after weight transfer, as on-chip training is known to find an optimal point in the presence of variation. [17] Unlike previously reported hybrid training approaches, [44] which utilize the same voltage for weight transfer and on-chip training, the Side Path device employs weak PGM/ERS pulses for on-chip training to overcome the following two issues observed with strong PGM/ERS pulses: 1) increased energy consumption due to repeated high-voltage operations, and 2) endurance failures, as shown in Figure 4i. The use of weak PGM/ERS pulses mitigates these issues, as the Side Path device can modulate weight at low voltages, unlike traditional Flash or A/N/O devices.
Emulating the on-chip training condition, the measurement results of repeatedly applying weak PGM/ERS pulses to the Side Path device are shown in Figure 5c,d, with each starting point corresponding to the strong ERS levels observed in Figure 5b. The use of weak PGM/ERS inhibits FN tunneling through the tunneling oxide, preventing endurance failure during on-chip training. Concurrently, weight modulation is achieved by injecting/ejecting charges through the side path region. Note that in Figure 5c, if soft PGM is not utilized during the weight transfer, starting from any level, applying weak PGM followed by weak ERS does not restore the I D to its initial level. This is because the excessive charges entering the Si 3 N 4 layer of the side path during the strong ERS operation significantly increase the I D . After the weak PGM removes these charges, the subsequent weak ERS pulses alone cannot induce the same amount of charges as initially injected. As shown in Figure 2f, repeated weak ERS pulses result in a saturated ΔV th (or I D ), which can be further changed by a single strong ERS pulse. The discrepancy in the range of synaptic weights during initial weight transfer and on-chip training negatively impacts the network training. On the other hand, if soft PGM is utilized during the weight transfer (Figure 5d), the I D can return to its initial value after weak PGM and ERS pulses.
Figure 5e shows the retention characteristics at several states of the proposed scheme. States 1-3 depict the retention of different strong ERS levels without soft PGM in Figure 5b, corresponding to the weight-transferred states after the off-chip training. Differing from the retention characteristics shown in Figure 4h, the retention is outstanding regardless of the number of pulses applied since the strong ERS pulses remove electrons trapped in the side path region. Notably, since the shallowly trapped electrons in the tunneling oxide region, which lead to short-term retention loss, are also removed, the retention is even better than the PGM state of the A/N/O device (Figure 4b). [38] State 4 depicts the retention of the strong ERS level with soft PGM in Figure 5b. By preemptively removing charges in the side path region through soft PGM, more stable retention is demonstrated. Lastly, maintaining the trained weight even after on-chip training is also essential. State 5 shows the retention emulating the post-on-chip-training situation, which is also stable.
Using the proposed bias scheme and the corresponding measured data, a neural network simulation has been performed. First, in the off-chip training phase, a fully connected neural network for the MNIST database classification task is constructed and trained. Detailed methodologies are presented in the Experimental Section. Here, the weights are quantized to the measured strong ERS levels. When the trained weights are transferred to the hardware using strong ERS pulses with or without soft PGM, the device-to-device variations are modeled using the following equation: $I_D = I_{D0} \times N(1, \sigma^2)$, where I D is the actual transferred synaptic current, I D0 is the weight obtained by the off-chip training, and N(1, σ^2) is the Gaussian distribution with a mean of 1 and standard deviation of σ. [27] As depicted in open symbols of Figure 5f, as σ increases, the test accuracy decreases due to the discrepancy between trained and actual weights. Second, in the on-chip training phase, the synaptic weights are updated following the weak PGM/ERS curves of Figures 5c,d. If the soft PGM is not applied during the weight transfer, the accuracy degrades after the on-chip training even when σ is 0. In Figure 5c, the application of weak PGM/ERS pulses without soft PGM during the on-chip training causes the weights to deviate from their originally trained values from off-chip training, which results in degraded accuracy. If the soft PGM is applied, the proposed hybrid training method effectively recovers the accuracy drop caused by the variation, reducing the accuracy drop by 6.3× at σ = 0.5. Note that the strong (high voltage) PGM/ERS with power consumption burden is used only once when transferring the weights, and the on-chip training is conducted by low-power weak PGM/ERS.
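The device-to-device variation model used in the simulation, in which each transferred current is the trained value scaled by a factor drawn from a Gaussian distribution with mean 1, can be sketched as follows. The function name and the use of NumPy's random generator are assumptions for illustration; the simulation code of the paper is not reproduced here.

```python
import numpy as np

def apply_device_variation(i_d0, sigma, rng=None):
    """Scale each trained current by an independent factor drawn from N(1, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    i_d0 = np.asarray(i_d0, dtype=float)
    # One multiplicative factor per device, with mean 1 and standard deviation sigma.
    factors = rng.normal(loc=1.0, scale=sigma, size=i_d0.shape)
    return i_d0 * factors
```

With `sigma = 0` the transferred currents equal the trained ones, which corresponds to the variation-free case of Figure 5f; larger `sigma` values widen the gap between trained and transferred weights before on-chip training.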

Conclusion
We proposed a lateral migration-based Flash-like synaptic device and demonstrated its applicability to neuromorphic computing. By utilizing direct charge transfer through the interface between the Si₃N₄ charge trap layer and the poly-Si channel without a tunneling barrier, the proposed Side Path device is operable at a significantly low voltage, making it suitable for low-power neural networks. Through a series of measurements, lateral migration of charges was verified. The degree of charge migration was further analyzed using the charge pumping method. Using the proposed device, a variation-robust hybrid off-chip/on-chip training method was presented. The proposed Side Path device demonstrated promising potential as a synaptic device due to its low-power operation, improved retention, and enhanced endurance characteristics.

Experimental Section
Fabrication Process of the Device: The proposed devices were fabricated on a 6-inch Si wafer using conventional CMOS process technology (Figure S1 of the Supporting Information). First, a 200-nm-thick poly-Si layer was formed on a 450-nm-thick SiO₂ layer, which was grown thermally via a wet oxidation process. After patterning and boron implantation of the poly-Si layer for p-body formation, a 7-nm-thick SiO₂ film and a 5-nm-thick Si₃N₄ layer were formed to electrically isolate the p-body from the source and drain and to serve as the stop layer for the chemical mechanical polishing (CMP) process performed later. Then, n⁺-doped poly-Si for the source/drain of the device was deposited, followed by a CMP process to isolate the source and drain. Additionally, the thickness of the n⁺-doped poly-Si was further reduced by chemical dry etching (CDE) for the formation of the curved channel. Before the deposition of the channel material, the Si₃N₄ and SiO₂ layers were removed via wet etching to expose the top of the p-body layer. Then, a 25-nm-thick amorphous Si layer was deposited as the channel material and recrystallized by annealing at 600 °C for 24 hours. After the channel patterning, high-dose BF₂ ion implantation was performed at the contact area of the p-body, followed by rapid thermal annealing (RTA) for dopant activation. After the device isolation, a 3-nm-thick SiO₂ tunneling oxide layer was deposited. An additional mask was used to wet etch the tunneling oxide for the formation of the Side Path and A/N devices. Then, the Si₃N₄ charge trap layer and the Al₂O₃ blocking oxide were deposited in sequence. A 40-nm-thick TiN layer was then deposited as the gate and patterned. After the deposition of 300-nm-thick silicon oxide using tetraethyl orthosilicate (TEOS) as the inter-layer dielectric (ILD), contact holes for the gate, source, drain, and p-body were formed. Finally, Ti/TiN/Al/TiN metal wires were formed through sputtering and patterning.
Neural Network Simulation: A fully connected neural network with a size of 784-256-256-10, a Rectified Linear Unit (ReLU) activation function, and no bias was constructed based on the PyTorch framework. For the off-chip training phase, 60 000 training samples were fed to the network for 100 epochs with a batch size of 100. The test accuracy was evaluated using the test set of 10 000 samples. The weights were updated using the Mean Squared Error (MSE) loss, standard backpropagation, and the stochastic gradient descent (SGD) method with a learning rate of 0.2. To accommodate the eleven quantized levels of the strong ERS, the quantized neural network (QNN) training method was used, where the floating-point weights were quantized to the measured $I_D$ levels for every forward propagation during the off-chip training phase. [45] For the hardware implementation of the network, a differential synapse pair scheme was considered, in which a single weight was represented by the difference between the conductances of two synaptic devices. [46] The transferred conductances of both synaptic devices, representing positive and negative weights, exhibit variation according to Equation (2). During the on-chip training phase, the network was trained for 2 epochs with a batch size of 1 and a learning rate of 0.1. In the standard software SGD method, the update of a weight value $W$ was calculated as follows:

$$W \leftarrow W - \eta \Delta \tag{3}$$

where $\eta$ is the learning rate and $\Delta$ is the gradient of the weight acquired by backpropagation. In the hardware neural network, since two devices compose a single weight, opposing update pulses were applied to the synaptic pair to prevent conductance saturation. If $\Delta$ was positive, a weak PGM pulse with its width proportional to $\Delta$ was applied to the positive synaptic device, and a weak ERS pulse was applied to the negative synaptic device, and vice versa. Also, the weight update is nonlinear with respect to $\Delta$. For the applied pulse number $n$ and the normalized synaptic conductance $G$ in the range between 0 and 1, the following equation describes the nonlinear conductance update: [47]

$$G = a + \frac{\ln(\beta n + c)}{\beta} \tag{4}$$

When $\beta$ is close to 0, Equation (5) converges to Equation (3). For the eleven levels of strong ERS obtained from the Side Path device (Figure 5b), the accompanying weak PGM/ERS curves (Figure 5c,d) exhibited slightly different $a$ and $\beta$ values. During the on-chip training, each synaptic conductance was updated following Equation (5) with $a$ and $\beta$ determined by the initially transferred state. Without soft PGM, the average $\beta$ values of weak PGM and ERS are −6.62 and 18.06, respectively, and the average $\beta$ of PGM was improved to −2.34 by adopting soft PGM.
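The quantization to eleven levels and the differential synapse pair described above can be sketched as follows. This is only an illustration under stated assumptions: the evenly spaced `LEVELS` are placeholders for the measured strong ERS levels, and the rule that the unused device of a pair sits at the lowest level is a hypothetical choice, since the text does not specify how a weight is split between the two devices. Gradient handling during quantized training is not covered.

```python
def quantize_to_levels(value, levels):
    """Return the level closest to value (nearest-level quantization)."""
    return min(levels, key=lambda level: abs(level - value))


def weight_to_pair(weight, levels):
    """Map a signed weight to (G_pos, G_neg) so that G_pos - G_neg approximates it.

    Assumption: the device that does not carry the weight stays at the lowest level.
    """
    g_min = min(levels)
    g_active = quantize_to_levels(abs(weight) + g_min, levels)
    if weight >= 0:
        return g_active, g_min
    return g_min, g_active


def pair_to_weight(g_pos, g_neg):
    """Effective weight of a differential synapse pair."""
    return g_pos - g_neg


# Placeholder for the eleven measured levels: evenly spaced normalized values.
LEVELS = [i / 10 for i in range(11)]

for w in (0.34, -0.76, 0.0):
    g_pos, g_neg = weight_to_pair(w, LEVELS)
    print(w, "->", (g_pos, g_neg), "-> effective weight", pair_to_weight(g_pos, g_neg))
```

In a variation study, each element of the returned pair would be perturbed independently according to Equation (2) before the effective weight is formed.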

Figure 1. a-1) Schematic view of the proposed Side Path synaptic device. a-2) Enlarged schematic view of the gate insulator stack of the Side Path device. For simplicity and emphasis on the side path, the curved channel is ignored and depicted as flat. $\Delta L_O$ indicates the length by which the tunneling oxide extends over the source and drain. The figure illustrates a negative $\Delta L_O$ case. b) Cross-sectional TEM image of the fabricated Side Path device. c) Measured transfer characteristics of the three fabricated Side Path devices with different $\Delta L_O$ values. d-1) Schematic of the stored electron distribution during the PGM operation of charge-trap NAND Flash. d-2) Schematic of the stored electron distribution during the PGM operation of the Flash device with Al₂O₃/Si₃N₄ (A/N) as the gate insulator stack. Due to the absence of a tunneling barrier, electrons are directly injected. d-3) Schematic of the stored electron distribution of the proposed Side Path device. Electrons are directly injected from the poly-Si channel through the side path and migrate laterally in the charge trap layer (Si₃N₄) above the SiO₂ layer.

Figure 2. a) Measured transfer curves of the A/N/O device. 9 V / 100 μs and −9 V / 10 ms pulses are required for PGM and ERS operations, respectively. Measured transfer curves of the A/N/O device with the $V_{PGM}$ of the A/N device (6 V) show no shift, indicating no tunneling of charges through the SiO₂ layer at 6 V. b) Measured transfer curves of the A/N device. 6 V / 100 μs and −5 V / 100 μs pulses are required for PGM and ERS operations, respectively. c) Measured transfer curves of the proposed Side Path device. The Side Path device is programmable with the A/N $V_{PGM}$ (6 V). The transfer curve shift on the negative $V_G$ (GIDL) side is larger than that on the positive $V_G$ (subthreshold) side. Applying the A/N/O $V_{PGM}$ (9 V) results in a larger threshold voltage shift ($\Delta V_{th}$) compared to the A/N/O device since electrons are injected through both the side path and the tunneling oxide. d) Schematic of the write operation mechanism of the Side Path device. To emphasize the charge transfer, the curved channel shown in Figure 1a is simply illustrated as flat. e) Measured $\Delta V_{th}$ of the Side Path devices with different $\Delta L_O$ values when different $V_{ERS}$ pulses are repeatedly applied (Solid: $V_{ERS}$ = −6 V / 100 μs, Open: $V_{ERS}$ = −7 V / 100 μs) after $V_{PGM}$ (6 V / 100 μs) pulses are applied. $\Delta V_{th}$ saturates after a certain number of $V_{ERS}$ pulses, indicating lateral migration of charges. The higher $V_{ERS}$ (−7 V / 100 μs) increases ERS efficiency. f) Measured $\Delta V_{th}$ after applying A/N/O $V_{ERS}$ (−9 V / 10 ms) pulses to remove migrated charges (Solid: $V_{ERS}$ = −6 V / 100 μs, Open: $V_{ERS}$ = −7 V / 100 μs). g) $\Delta V_{th}$ extracted by repeatedly reading transfer curves of the Side Path devices with different $\Delta L_O$ values after the $V_{PGM}$ (6 V / 100 μs) is applied. $\Delta V_{th}$ decreases gradually as charges in the side path region discharge naturally. h) $V_{th}$ values of the Side Path, A/N, and A/N/O devices after several PGM/ERS cycles, which exhibit similar standard deviations.

Figure 3. Schematic of the charge pumping method. a-1) Increasing $V_G$ increases the depletion region of the device. a-2) Decreasing $V_G$ increases the accumulation region of the device. b) Schematic of the device and potential energy when a single charge pumping pulse is applied to the gate of the device. Increasing the gate pulse to $V_{high}$ causes inversion of the device up to $x_{inv}$. Decreasing the gate pulse to $V_{low}$ causes accumulation of the device up to $x_{acc}$. c) Normalized $I_{CP}$ measured at the source side of the Side Path device after applying $V_{PGM}$ and $V_{ERS}$, at 2 MHz with rising and falling slopes of 10 ns/V. $V_{low}$ of the charge pumping is −2 V. d) Local $\Delta V_{th}$ extracted from $I_{CP}$ before and after PGM. The x-axis represents the distance from the edge of the source to the center of the channel of the Side Path device. Charges are mainly distributed in the side path region, which is to the left of the dashed line. e) Local $\Delta V_{th}$ extracted from $I_{CP}$ before and after ERS. Most of the charges in the Si₃N₄ region are emitted.

Figure 4. a) Energy band diagram of the A/N/O device during the retention state. Trapped electrons in the charge trap layer (Si₃N₄) cannot be easily emitted due to the SiO₂ barrier. b) Measured retention characteristics of the A/N/O device. c) Energy band diagram of the A/N device. Trapped electrons in the Si₃N₄ layer are easily emitted due to the absence of a barrier. d) Measured retention characteristics of the A/N device. e) Schematic of trapped electrons in the Side Path device during the retention state. f) Comparison of the retention characteristics of the Side Path device after applying $V_{PGM}$ (6 V / 100 μs × 1000). Retention of $\Delta V_{GIDL}$ is significantly worse than that of $\Delta V_{th}$, indicating that charges are mainly emitted through the side path region. g) Comparison of the retention characteristics of the Side Path device after applying $V_{PGM}$ (7 V / 100 μs × 1000). h) Measured retention characteristics of the Side Path device. The retention characteristics of the Side Path device show significant improvement compared to the A/N device. i) Comparison of the measured SS of the A/N/O device and the Side Path device after applying repeated $V_{PGM}$ and $V_{ERS}$ pulses. The SS of the Side Path device does not change even after $10^7$ pulses. j) Energy band diagrams of the A/N/O device and the A/N device during PGM operation. Trap generation in the SiO₂ layer of the A/N/O device causes device degradation.

Table 1. Comparison of operating voltages, retention, and endurance characteristics of the A/N device, A/N/O device, and the proposed Side Path device.

Figure 5. a) Overall sequence of the hybrid training using the Side Path device. When transferring weights to the Side Path synaptic devices by strong ERS pulses, electrons can tunnel through the tunneling oxide. During the on-chip training, charges move by lateral migration and through the side path by weak PGM/ERS. b) Response to repeated strong ERS (−8.7 V / 10 ms) and strong PGM (7.5 V / 100 μs) pulses, without and with soft PGM (ERS) after strong ERS (PGM). The device is initially fully programmed by sufficient strong PGM pulses. The left part of b represents the weight-transferred states after the off-chip training. c,d) Response to weak PGM (6.5 V / 100 μs) and weak ERS (−6.5 V / 100 μs) pulses, starting from each weight-transferred state in b without (c) and with (d) soft PGM. e) Retention characteristics of the Side Path device starting from the states depicted in b and d. f) Test accuracy of the neural network using the proposed Side Path device. The proposed hybrid training scheme mitigates the accuracy degradation even when device-to-device variation increases.
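In Equation (4), $a$, $c$, and $\beta$ are fitting parameters, where $\beta$ indicates the nonlinearity. Considering $n \leftarrow n - \eta \Delta$ and eliminating $n$ from Equation (4), the following equation for the weight update of hardware was derived:

$$G \leftarrow a + \frac{\ln\left(\exp(\beta (G - a)) - \beta \eta \Delta\right)}{\beta} \tag{5}$$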
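The nonlinear hardware update can be checked numerically with a short sketch. It assumes the logarithmic model $G = a + \ln(\beta n + c)/\beta$ for Equation (4) and the pulse-count update $n \leftarrow n - \eta\Delta$. The function names and the parameter values are hypothetical and are not fitted to the measured curves.

```python
import math


def conductance_from_pulses(n, a, beta, c):
    """Assumed logarithmic model: G = a + ln(beta * n + c) / beta."""
    return a + math.log(beta * n + c) / beta


def hardware_update(g, eta_delta, a, beta):
    """Update G after n <- n - eta*Delta, with n eliminated from the model above."""
    inner = math.exp(beta * (g - a)) - beta * eta_delta
    if inner <= 0.0:
        raise ValueError("update leaves the range covered by the logarithmic model")
    return a + math.log(inner) / beta


# Hypothetical parameters, chosen only to exercise the formulas.
a, beta, c = 0.0, 2.0, 1.0

# Stepping back by one pulse (eta*Delta = 1) from n = 3 should land on the n = 2 point.
g3 = conductance_from_pulses(3, a, beta, c)
print("stepped back from n=3:", hardware_update(g3, 1.0, a, beta))
print("model value at n=2:   ", conductance_from_pulses(2, a, beta, c))

# For beta close to 0 the update approaches the linear rule G - eta*Delta.
print("near-linear update:   ", hardware_update(0.5, 0.1, a, 1e-6))
```

The first two printed values agree because the update is the same curve expressed in terms of $G$ instead of $n$. The last value approaches $G - \eta\Delta$, which matches the statement that the hardware update reduces to the software SGD rule when $\beta$ is close to 0.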