Recent Advancements in Emerging Neuromorphic Device Technologies

The explosive growth of data and information has motivated technological developments in computing systems that utilize them for efficiently discovering patterns and gaining relevant insights. Inspired by the structure and functions of biological synapses and neurons in the brain, neural network algorithms that can realize highly parallel computations have been implemented on conventional silicon transistor-based hardware. However, synapses composed of multiple transistors allow only binary information to be stored, and processing such digital states through complicated silicon neuron circuits makes low-power and low-latency computing difficult. Therefore, the attractiveness of emerging memories and switches as synaptic and neuronal elements, respectively, for implementing neuromorphic systems suited to energy-efficient cognitive functions and recognition is discussed herein. A survey of the literature on emerging memories shows that novel materials and device engineering strategies have been presented to mitigate the relevant challenges, primarily with the aim of achieving nonvolatile analog synaptic characteristics. Attempts to emulate the role of the neuron in various ways using compact switches and volatile memories are also discussed. It is hoped that this review will help direct future interdisciplinary research on device, circuit, and architecture levels of neuromorphic systems.


Introduction
Artificial intelligence has become widespread and has permeated social life. Electronic devices are connected to each other, wirelessly and via other networks, and can constantly communicate. Thus, a substantial amount of data is generated every second worldwide, and the data creation period is shortening. With the unprecedented explosion in data, a new industry has been launched to extract more valuable information and utilize it beyond simply storing and managing data traffic worldwide. For example, the driving skills of autonomous vehicles have advanced rapidly by recognizing information about the surrounding environment that is constantly being input to the system in real time and accurately classifying it into specific objects and signals. One of the reasons for the new wave of data-centric paradigms was the development of semiconductor technology in the past few decades. The performance and cost of transistors, a representative semiconductor device, have been improved due to Moore's law scaling. [1] Consequently, several innovative products have been manufactured at reasonable prices, thereby creating numerous derivative industries. More specifically, increasing the number of tiny transistor elements integrated into a given silicon chip allows more versatile processing and arithmetic operations to be performed per clock cycle. The memory elements based on laterally scaled and vertically stacked structures can also significantly increase memory capacity. [2,3] As we advance into the big-data era, the demand is increasing for improved performance of computing systems, which primarily consist of these two fundamental components, i.e., central processing units (CPUs) and memories, to handle the exponentially growing amount of data.
However, in the conventional von Neumann computing architecture, data processed by the CPU must be frequently moved back and forth to the memory for storage, which can lead to a memory wall, or the von Neumann bottleneck, as shown in Figure 1. [4] Power-constrained computing systems are gaining further importance because all electronic devices should function continuously in always-connected environments. Analyses of the workload of traditional computing systems have clearly indicated that real-time applications, such as hand-tracking services and audio recognition, consume more than half of their total energy on moving and storing data rather than on performing computations. [5] These problems have necessitated the development of new computing systems that overcome power inefficiencies by minimizing sequential processing.
DOI: 10.1002/aisy.202000111
The implementation of the energy-efficient processor was based on the basic structure of the brain, which comprises biological synapses numbering on the order of 10¹⁵ connected to neurons on the order of 10¹¹. [6] The data in the form of synaptic weight (w) is transferred neuron-to-neuron through the synapses in parallel. When the sum of the weights in the neuron exceeds a certain threshold, the neuron responds by generating signals and passing them to other synapses. Because of the parallel-connected synaptic configuration, high-level cognitive functions in the brain can be performed by consuming only tens of watts. [7] Based on the expectation of attractive low-power benefits, understanding the brain's structure and essential roles has initiated the development of neuromorphic algorithms through the building of artificial neural networks. [8][9][10][11][12] The input and output neurons in each layer are linked through hidden layer neurons, forming a perceptron structure. [13] The basis of the neural network algorithms is to classify specific outputs by multiplying input vector signals and synaptic weights through forward propagation. Each neuron plays a role in linear (or binary) classification to determine whether to continue processing the signal based on the sum of the calculations. By inserting more hidden layers to perform additional perceptron processes, the multilayer neural network enables the solution of complex problems and the extension of the functionalities to logical functions such as Boolean logic. Thus, such deep neural network (DNN) algorithms outperform conventional methods, specifically in recognition and classification tasks that determine the desired output from unknown inputs.
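The threshold behavior of a neuron and the benefit of a hidden layer can be illustrated with a minimal sketch. The function names, weights, and thresholds below are hand-picked assumptions for illustration and are not taken from the cited works; the example only shows that one hidden layer of threshold neurons suffices to realize a Boolean function (XOR) that a single neuron cannot.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


def xor_network(x1, x2):
    """Two-layer perceptron with hand-picked (hypothetical) weights computing XOR."""
    # Hidden layer: one neuron behaves as OR, the other as AND.
    h_or = neuron([x1, x2], [1.0, 1.0], 0.5)
    h_and = neuron([x1, x2], [1.0, 1.0], 1.5)
    # Output neuron fires for "OR but not AND", which is XOR.
    return neuron([h_or, h_and], [1.0, -1.0], 0.5)


# Enumerate all binary input pairs together with the network output.
truth_table = [(a, b, xor_network(a, b)) for a in (0, 1) for b in (0, 1)]
```

In a trained network the weights would be found by back-propagation rather than chosen by hand; the fixed values here only keep the example short.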
The algorithm relies substantially on iterative arithmetic calculations such as vector-matrix multiplication (VMM), or the multiply-accumulate (MAC) operation, which run on graphics processing unit (GPU)-based platforms [9] that are appropriate for parallel processing, or on application-specific integrated circuits. [14] The time-consuming computation is architecturally accelerated by using the cross-point array architecture, in which synaptic elements are positioned between crossing lines that carry the input and output signals. [15] Neuromorphic hardware systems can be simply described as having multiple synaptic arrays as weight matrix blocks, as shown in Figure 1. [16] Neurons located on the edge of each array convey inputs and outputs to communicate with other segments. The voltage inputs applied in parallel via the word lines (WLs) reach the synapses and are multiplied by the stored synaptic weights, encoded in the form of conductance (G), according to Ohm's law. Unlike normal memory operations in the cross-point array, which read the conductance of a single selected cell, the multiplication takes place at every cross-point. The weighted sum current, obtained by summing each output along the bit line (BL) according to Kirchhoff's current law, is fed to peripheral circuitry (e.g., analog-digital converters and multibit sense amplifiers) serving as the neuronal element. When the output results differ from the expected values, the signal moves back to the synaptic array, and the synaptic weights are adjusted using a gradient descent method to reduce errors based on the back-propagation algorithm, [8] which is how the neuromorphic system learns newly acquired information and provides accurate inferences. VMM operations are performed where the weights are physically stored, alleviating memory wall problems. 
[17] Therefore, for the in-memory computing platform based on the cross-point array architecture, [18] selecting the appropriate devices as the fundamental building blocks for synaptic and neuronal elements is important for implementing the neuromorphic systems in hardware.
Figure 1. Transition to a non-von Neumann architecture in which multiple synaptic array blocks, which execute VMM in the place where the data are stored, are implemented in a similar manner, thereby eliminating the memory wall bottleneck. Instead of binary synaptic weights based on SRAM, nonvolatile analog synaptic weights are preferred to maximize hardware performance from the viewpoint of recognition accuracy and power efficiency. A single-transistor structure, or a resistive memory connected to either a transistor (1T-1R) or a selector (1S-1R), can be suitably used for the architecture, as shown in the bottom box. The portion occupied by the neuronal elements also needs to be made compact by exploring new devices and volatile memories, as shown in the right box.
www.advancedsciencenews.com www.advintellsyst.com
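The analog VMM described above, in which Ohm's law gives the current of each cell and Kirchhoff's current law sums the currents along each BL, can be written as a short sketch. The array size, conductances, and read voltages are hypothetical values chosen for illustration, and the array is treated as ideal (no line resistance, no device nonlinearity).

```python
def crossbar_vmm(voltages, conductances):
    """Return the bit-line currents (A) of an ideal cross-point array.

    voltages[i]        : read voltage applied to word line i (V)
    conductances[i][j] : conductance of the cell at word line i, bit line j (S)
    """
    n_bit_lines = len(conductances[0])
    currents = [0.0] * n_bit_lines
    for i, v in enumerate(voltages):
        for j in range(n_bit_lines):
            # Ohm's law per cell (I = G * V); Kirchhoff's current law
            # accumulates the cell currents along each bit line.
            currents[j] += conductances[i][j] * v
    return currents


# Hypothetical 3 x 2 array: conductances in siemens, read voltages in volts.
G = [[100e-6, 200e-6],
     [300e-6, 400e-6],
     [500e-6, 600e-6]]
V = [0.2, 0.0, 0.1]
I = crossbar_vmm(V, G)  # one weighted-sum current per bit line
```

All multiplications happen in the cells simultaneously in hardware; the nested loop is only the sequential software equivalent of that single parallel read step.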
Recently, significant advances in neuromorphic hardware have been reported and demonstrated. Most studies used static random-access memory (SRAM) with eight transistors as the synaptic device. [19][20][21] However, the SRAM with digital synaptic weights "0" and "1" is unable to satisfy the numerous parameters used in the algorithms. [21] Although the single transistor unit has been scaled down to technology nodes of a few nanometers, [22] the large footprint occupied by multiple transistors creates an area overhead. This problem has drawn significant attention to emerging memory technologies for compact and analog weight storage. [23][24][25][26][27][28][29] Notably, the newly available memory options are based on resistance changes, in contrast to the conventional storing of charges in a capacitor or floating gate. [30,31] Most resistive memories are thus essentially simple metal-insulator-metal systems, which allow the highest memory capacity in the smallest occupied cell area. The specific designation of each resistive memory is determined by how the material system responds to external electrical stimuli. Magnetic random access memory (MRAM) [32] utilizes the orientation of the spin, while the rotating objects become dipoles in ferroelectric memory devices. [33] The reversible phase transition between amorphous and crystalline states in chalcogenide materials leads to a difference in resistance, which is exploited in phase change memory (PCM). [34] Ion migration in most nonstoichiometric materials, driven locally or globally by an electric field, enables the resistance change as in resistive switching RAM (RRAM) [35,36] or electrochemical RAM (ECRAM). [37] State-of-the-art resistive memory technologies, excluding the ECRAM, have been integrated into ≈20 nm nodes. [38][39][40][41] For a fair and systematic comparison, the latest SRAM is assumed to scale up to a few tens of nanometer nodes. 
[42] The accelerator performances are benchmarked while considering end-to-end design options from the device and circuit levels to the algorithm level. Unlike in the SRAM, the multiple weights assigned to the resistive memories are retained even when the power supply is turned off, thereby minimizing standby leakage power. [43] This implies that by using resistive synapses, which function optimally with the cross-point array architecture, the entire system can afford superior throughput and energy efficiency.
The neuron node adjacent to the synaptic array is often neglected in the neuromorphic system study. After the analog computation in the cross-point array, the weighted sum current at the end of each BL should be processed (e.g., converted to voltage spike or digital pulse), [44] which is a vital role of the biological neuron that receives the current from the synapses and thereafter decides whether to activate an action potential to the next neurons in the neural network. Typically, the silicon complementary metal-oxide-semiconductor (CMOS)-based neuronal circuits comprising tens of transistors with a capacitor are used for implementing the integrate-and-fire neuron model. [45] The weighted sum current is first integrated into the capacitor placed at the end of the BL. When the charged voltage exceeds the threshold, digitized output voltage spikes are generated through the circuitry. By counting the number of the output spikes that are designed to be proportional to the amplitude of the read-out current, the neuron node is capable of determining the output firing strength following activation functions such as sigmoid, tanh, softmax, and rectified linear unit. [46] However, the complex neuronal circuit with a capacitor clearly occupies a substantially larger footprint than the BL pitch of the cross-point array. The pitch mismatch problem inevitably causes a single neuron node to be shared with multiple BLs, which implies that the weighted currents computed in parallel from the synaptic arrays have to be sequentially processed.
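The integrate-and-fire behavior described above can be sketched as follows. This is a simplified, non-leaky model in arbitrary units; the function name, the subtract-on-fire reset, and all parameter values are assumptions made for illustration rather than a description of the CMOS circuits in the cited works.

```python
def integrate_and_fire(current, capacitance, v_threshold, dt, n_steps):
    """Count the output spikes of an ideal (non-leaky) integrate-and-fire neuron."""
    v = 0.0
    spikes = 0
    for _ in range(n_steps):
        # The weighted sum current charges the capacitor: dV = I * dt / C.
        v += current * dt / capacitance
        if v >= v_threshold:
            spikes += 1
            # Assumed reset scheme: remove one threshold's worth of voltage.
            v -= v_threshold
    return spikes


# Arbitrary units; doubling the input current doubles the spike count.
spike_counts = [
    integrate_and_fire(current=i, capacitance=1.0, v_threshold=1.0,
                       dt=0.125, n_steps=64)
    for i in (0.5, 1.0, 2.0)
]
```

The spike count plays the role of the firing strength mentioned in the text; an activation function would then be applied to this count by the surrounding circuitry.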
Herein, we first discuss the advances in the PCM and RRAM, where significant progress has been achieved, to address the requirements of the neuromorphic synaptic devices in Section 2. Recent strategies that build on the specific characteristics of other candidates such as ECRAM, ferroelectric memory, and MRAM to overcome relevant challenges are also explored. In Section 3, we introduce studies that explored compact neuromorphic neuronal devices based on either two-terminal switches or volatile memories, highlighting the advantages of these devices from an area and energy perspective. Finally, we conclude the article by indicating directions for future study, based on the current status, to boost neuromorphic system performance.

PCM
Emerging resistive memory technologies are well-developed in the order of MRAM, PCM, and RRAM from a typical memory application perspective. However, in neuromorphic applications, the PCM led to the introduction of new analog synaptic weight elements by identifying and defining new important characteristics (e.g., linearity and symmetry) in addition to the conventional requirements for memory functions (e.g., endurance and retention). The resistance of the PCM depends on the crystal structure of chalcogenide materials such as Ge₂Sb₂Te₅ (GST). [47] In general, electrons are transmitted relatively easily in the crystalline state, whereas the electrical conductivity is lowered when the structure is transformed to the amorphous state. The two phases can be reversibly changed by first melting the solid-state chalcogenide and subsequently controlling the time allowed for the ions to rearrange. To generate heat effectively, a confined electrode serving as a heater is normally used to maximize the current density by reducing the region in which current flows. Applying a pulse that drives a current induces Joule heating, and the phase near the electrode begins to melt, resulting in a mushroom-shaped switching area. Abruptly cutting off the pulse quenches the molten chalcogenide into a glassy state. This results in a significantly disordered amorphous state showing a high resistance state (HRS), which is known as a reset process. Meanwhile, when sufficient time is provided during the molten state for the ions to relocate to thermodynamically stable positions, the crystalline state can be formed to obtain a low resistance state (LRS), which is known as a set process. The analog behavior in the PCM was observed by subdividing and fine-tuning the intermediate pathways from the HRS to the LRS, or vice versa. 
It was possible to experimentally achieve distinguishable 3-bit states corresponding to the synaptic weight precision by elaborately adjusting the switching current, which is directly related to the volume of the phase transition. [48] Two important stages are performed in the neuromorphic systems implemented with the cross-point PCM synaptic arrays, as shown in Figure 2a,b. [49] In the inference phase, weights predefined through a training (or learning) process in software or on external cloud servers are assigned to each PCM device and mapped to the array to extract the correct value according to input patterns after the VMM execution. The capability of the PCM to hold multiple weights allows more numerous and complicated input patterns to be recognized accurately. The accuracy and robustness of the inference are thus related to the stability of each state. However, even when the disturbance from accumulated stress induced by the repeated input voltage was excluded, the states in the PCM drifted toward the HRS over time due to structural relaxation of the amorphous phase, [48] making it difficult to ensure each state with a reasonable margin of error. To improve state stability, an additional metallic liner was introduced to mitigate the drift. Consequently, nearly negligible drift and noise reduction were achieved. [50] In addition to inference accelerators, where the system recognizes and categorizes provided information, there is a demand for systems that respond in real time to unknown trends. Because the power consumption is dominated by data movement, the training should be performed within the hardware itself. In the training phase, the synaptic weight is updated within the provided dynamic range of the multilevel states, and the manner of this update plays an important role in achieving high recognition accuracy (Figure 2c). 
[49] The resistance in the PCM was freely modulated in both upward and downward directions, but the amplitude of the reset pulse occasionally had to be higher than in the previous step (Figure 2d). [48] Identifying a specific state first and then changing the pulse conditions appropriately imposed an area overhead on the peripheral circuit and an extra burden on its complexity, consuming more power and increasing latency. Therefore, the state should be updated only by the number of identical pulses having similar widths and magnitudes. In the PCM, however, the different switching dynamics from crystalline to amorphous and vice versa caused an asymmetric response in the resistance states. [47] When the identical set pulse was applied to the initial HRS of the PCM, the partially crystallized portion expanded with the pulse number. With this conductance increase defined as potentiation, the intermediate states were controllable. Meanwhile, the identical reset pulse applied for depression, which refers to a decrease in conductance, caused a rapid drop in conductance from the LRS, and the HRS was reached promptly. Specifically, the degree of the change in conductance during the potentiation was initially high and thereafter reduced, resulting in a nonlinear response. This implies that the PCM devices in the array that have states close to the LRS are not trained properly, thereby degrading the recognition accuracy of the system. Moreover, the amount of increase or decrease in any given state of the PCM should be similar because the state of the PCM does not consistently change in a single direction in the systems. However, due to the weak linearity and symmetry of the PCM, training cannot be effectively conducted.
Figure 2. Neuromorphic systems are generally based on two important stages: a) inference and b) training based on forward and backward propagation algorithms, respectively. The input signal vectors (x_A) of the input neurons drive analog weights (w) to the next hidden neurons. The simple sum of each weight multiplied by the vector is performed on the neurons. When the output signal vectors (x_D) differ from the expected values (g), the signals go back to the synapses and adjust the weights to reduce the error term (δ). c) Here, how the weights are updated plays an important role in achieving high recognition accuracy for the training and learning of new information. The weights should be modulated linearly and symmetrically. d) The multiple conductance states were achieved using an identical pulse for potentiation, while the pulse amplitude needed to be increased for depression. Reproduced with permission. [48] Copyright 2018, AIP Publishing. e) Therefore, for hardware implementations, the weight is encoded by a pair of two PCM devices. The PCM device for G⁺ is used to increase the conductance, whereas the other PCM device for G⁻ is intended to lower the conductance. In other words, the input vectors (x_i) in the form of voltage are applied and multiplied by the weight in the form of conductance (G) assigned to each PCM synapse. Then, the actual weight is represented by subtracting G⁻ from G⁺. Reproduced with permission. [49] Copyright 2015, IEEE.
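The nonlinear potentiation described above, where the conductance change per identical pulse is initially large and then diminishes, is often illustrated with a saturating exponential. The sketch below uses such a behavioral model purely as an assumption; the functional form, the `nonlinearity` parameter, and the normalized conductance range are hypothetical and are not fitted to the PCM data of the cited works.

```python
import math


def potentiation_curve(n_pulses, g_min, g_max, nonlinearity):
    """Conductance after 0..n_pulses identical pulses (saturating exponential).

    A `nonlinearity` value close to 0 approaches a linear update; larger
    values make the first pulses dominate. The value must be positive.
    """
    norm = 1.0 - math.exp(-nonlinearity)
    return [
        g_min + (g_max - g_min)
        * (1.0 - math.exp(-nonlinearity * n / n_pulses)) / norm
        for n in range(n_pulses + 1)
    ]


# Normalized conductance range; 20 identical pulses in both cases.
nonlinear = potentiation_curve(20, 0.0, 1.0, nonlinearity=5.0)
near_linear = potentiation_curve(20, 0.0, 1.0, nonlinearity=0.01)

# Compare the conductance change of the first and the last pulse.
first_step = nonlinear[1] - nonlinear[0]
last_step = nonlinear[-1] - nonlinear[-2]
```

With the strongly nonlinear setting, the last pulses barely change the conductance, which mirrors the statement that devices near the LRS are not trained properly.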
One of the approaches used to overcome the asymmetric response of the PCM was to use only the potentiation regime, which exhibits analog conductance, while periodically resetting (or refreshing) all information to its original state. [51] For this technique, a pair of PCM elements for positive conductance (G⁺) and negative conductance (G⁻) comprises a single synaptic device to encode the actual weight (w = G⁺ − G⁻) and also represent its negative value (Figure 2e). The weight was increased to a target value in a single step by applying the identical pulses. To lower the weight, depression was performed using a two-step method in which both PCMs were first reset to the initial state. Thereafter, only the PCM in the pair that is responsible for the positive conductance was activated again by the pulses, while the other PCM representing the negative conductance maintained its state. A multilayer perceptron comprising 500 × 661 PCM arrays using the technique has been experimentally implemented. [49] A recognition accuracy of ≈82% was achieved for the Modified National Institute of Standards and Technology (MNIST) dataset; however, it was lower than the expected level of 97% due to imperfect PCM device characteristics.
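The two-device weight encoding and the two-step depression described above can be sketched as follows. The class name, the idealized linear step per pulse, and the conductance values (integers in μS) are assumptions for illustration only; real PCM devices respond nonlinearly, as discussed in the text.

```python
class DifferentialPcmSynapse:
    """Weight encoded as G_plus - G_minus, using potentiation pulses only."""

    def __init__(self, g_reset=10, g_step=5):
        # Hypothetical values in microsiemens; both devices start reset.
        self.g_reset = g_reset
        self.g_step = g_step
        self.g_plus = g_reset
        self.g_minus = g_reset

    @property
    def weight(self):
        return self.g_plus - self.g_minus

    def potentiate(self, n_pulses):
        # Single step: identical set pulses raise the positive device.
        self.g_plus += n_pulses * self.g_step

    def depress_to(self, n_pulses):
        # Step 1: reset both devices to the initial state.
        self.g_plus = self.g_reset
        self.g_minus = self.g_reset
        # Step 2: re-program only the positive device up to the lower target;
        # the negative device keeps its reset state.
        self.potentiate(n_pulses)


synapse = DifferentialPcmSynapse()
synapse.potentiate(6)
weight_after_increase = synapse.weight
synapse.depress_to(2)
weight_after_decrease = synapse.weight
```

The price of avoiding the abrupt reset transition is visible in `depress_to`: lowering a weight costs a full reset plus re-programming, instead of a single pulse.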

RRAM
The relevant findings, as detailed in previous sections, regarding the device guidelines for an analog synapse using the PCMs have highlighted the potential of RRAMs. In the case of RRAMs, which generally denote devices that use oxygen vacancies (or oxygen ions) as mobile species, oxygen vacancies are created by breaking the bonds between metal and oxygen either in the bulk oxide or at the interface. [35,36] Alternatively, cations are supplied from electrodes such as Cu or Ag outside the materials, which is known as conductive-bridge RAM (CBRAM). [52] Whether the mobile species are anions or cations, the ions driven by the applied set field are clustered, eventually bridging the two separate electrodes. A high current can thus be observed instantaneously in the RRAM through the formation of a conductive filament. Meanwhile, as the opposite reset voltage disperses the oxygen vacancies from the filament, the filament starts to dissolve through an electrochemical reaction. The current flow is limited once the filament is disconnected. In general, a compliance current that limits excess current over a preset value is applied to the RRAM to prevent permanent breakdown. The magnitude of the compliance current directly determines the amount of current flowing through the RRAM and thereby the size of the filament. As the filament thickens with increasing compliance current, a progressively lower LRS resistance is achieved. In contrast, a higher negative voltage removes more oxygen vacancies from the filament, thereby forming a switching gap between the electrode and the remaining filament. Extending the gap can yield multiple HRS levels in the lower conductance direction.
By using a cross-point array with only a single RRAM [53,54] or a one-transistor and one-resistor (1T-1R) configuration, [55][56][57][58][59] diverse classification and recognition features and functions have been explored and demonstrated experimentally. A two-layer perceptron has been constructed by building 128 × 64 Ta/HfOₓ/Pt (from top to bottom)-based 1T-1R arrays. [55] The conductance toward a higher level was precisely tuned by the gate voltage of the monolithically integrated transistor, as shown in Figure 3a. Because a pair of RRAMs was used as a single synaptic element, as discussed in the case of the PCM, the conductance in the lower direction was achieved by first applying the reset pulse to initialize the state and thereafter increasing the gate voltage. The tunable linear and symmetric update of the conductance with minimal variation allowed the hardware neural network to be trained properly, experimentally achieving an accuracy of 91.71% on the MNIST dataset.
Although the inference task has been successfully demonstrated using the well-trained analog conductance states, the neuromorphic hardware system can be made more energy efficient by creating a device environment in which the weight update can be driven by an identical pulse scheme, [60] as discussed earlier. As identical pulses were successively applied to the HfO₂-based RRAM, an asymmetric conductance response due to nonlinear potentiation was observed, [56] which was exactly the opposite of the PCM behavior, as shown in Figure 3b. Once the conductance jumped abruptly at the initial pulse due to the formation of the filament, no further conductance increase was observed during the potentiation. In contrast, the conductance could be adjusted by the number of negative pulses, and the slope of the decrease in conductance was determined by the amplitude and width of the pulse. A microscopic physical description of the RRAM that investigated the link between the filament evolution and the electrical behavior revealed that the formation of a strong filament caused the binary state during the potentiation. [61] In contrast, it was discovered that an alternative scenario, in which the radial size of the filament is changed, is preferred for obtaining a linear current response. The first attempt was to engineer the filament dynamics from the subsequent cycles onward, because the abrupt growth of the filament in the initial state is inherently difficult to control. Introducing an additional barrier layer of AlOₓ, which features a slower oxygen mobility than that of the HfO₂, retarded the dissolution of the filament during the reset. [56] This resulted in an incompletely disconnected filament. In the subsequent set cycle, the weakest constriction of the filament, located where the two layers are in contact, plausibly served as the switching region, in which vacancies moved back and forth while the filament remained connected between the two electrodes. 
Instead of growing in the vertical direction, the filament was found to expand laterally, which allowed the conductance to be updated linearly by identical pulses. Other methodologies to manage the generation and migration of the oxygen vacancies in the initial stages, before these vacancies form a strong filament, have been proposed. By using a thermal barrier of TaOₓ with low thermal conductivity, the heat produced during the device operation can be confined within the HfO₂ layer. [62] The heat spreads the distribution of the vacancies widely while the vacancies are electrically driven to form the filament as usual. The laterally expanded filament shape seemingly enabled the analog set transition in the I-V curve and pulse switching. In addition, to exploit the temperature as another kinetic factor in ion transport, the formation energy of the vacancies was reduced to lower the probability of generating the vacancies using an electric field. [63] This was realized by incorporating dopants into the HfO₂ matrix, which reduced the bonding strength. The uniformly distributed dopants facilitated the formation of multiple filaments over a broad area, resulting in analog update behaviors in both polarities. Even at a high temperature, the multiple states were distinguishable, ensuring that the information at the peripheral sensing circuit was accurate. Thus, hardware systems with eight processing blocks comprising 128 × 16 TaOₓ/HfO₂-based 1T-1R analog synapse arrays were successfully integrated to implement a five-layer convolutional neural network to perform MNIST image recognition. [58] A clear distribution of 1024 RRAM devices in 5-bit states within the conductance range of 100-900 μS without any overlap was also achieved by an identical pulse train with a substantially fast speed of 50 ns, as shown in Figure 3c. Consequently, a high accuracy of more than 96% was achieved. 
More importantly, the neuromorphic system showed more than two orders of magnitude better power efficiency and one order of magnitude better performance density than the CPU-based accelerator.
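Mapping a trained weight onto one of the 5-bit conductance states in the 100-900 μS window mentioned above can be sketched as a simple quantization step. The evenly spaced levels, the normalization of weights to the range [0, 1], and the function name are assumptions for illustration; the cited work does not necessarily program its devices this way.

```python
def weight_to_conductance(weight, n_bits=5, g_min_us=100.0, g_max_us=900.0):
    """Map a weight in [0, 1] to the nearest of 2**n_bits conductance levels (uS)."""
    n_levels = 2 ** n_bits
    # Weights outside the assumed normalized range are clipped.
    clipped = min(1.0, max(0.0, weight))
    level = round(clipped * (n_levels - 1))
    # Levels are assumed to be evenly spaced across the conductance window.
    return g_min_us + level * (g_max_us - g_min_us) / (n_levels - 1)


# Sweep the weight range to enumerate the available conductance states.
levels = sorted({weight_to_conductance(k / 31) for k in range(32)})
```

With 32 levels across an 800 μS window, adjacent states are roughly 26 μS apart, which indicates how tight the device distributions must be for the states to remain non-overlapping.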
Similar hardware performance was also verified using mass-produced Ta₂O₅/TaOₓ-based 1T-1R synapses. [59] The uniform analog states, linearly tuned from 20 to 50 μA with a verification technique, allowed a maximum recognition accuracy of 90% on the MNIST database. The 180 nm Ta₂O₅/TaOₓ-based RRAM exhibited a number of synapses per unit area similar to that of a 65 nm SRAM. However, due to the reduced operational power of the RRAM device, the efficiency of inference acceleration was more than three times higher, which was sufficient to enable a real-time recognition service. Furthermore, because the local filamentary switching allowed the RRAM to be scaled in a 40 nm test chip, the efficiency of running the neural network workloads can be further improved. In Table 1, reported array-level RRAM-based synapses are compared to identify the normal range of the conductance states and the pulse conditions typically required to control the states in most of the HfO₂ device stacks.
Figure 3. a) For the 1T-1R configuration, the conductance of the RRAM-based synapse was tuned by the gate voltage of the monolithically integrated transistor. Scale bars: 10, 2, and 0.2 μm in the third, fourth, and fifth images, respectively. Reproduced with permission. [55] Copyright 2018, Springer Nature. b) Materials and device engineering of the RRAM based on the filamentary switching mechanism allowed the conductance to be adjusted linearly and symmetrically through identical pulses. Reproduced with permission. [56] Copyright 2016, IEEE. c) The 5-bit state, achieved at fast speeds of tens of ns within the obtained conductance range of the RRAM with a TaOx/HfO2 stack, was highly reliable. Reproduced with permission. [58] Copyright 2020, Springer Nature.

Considering the operating power, the maximum conductance of the synaptic device is one of the key factors governing neuromorphic hardware performance. If the conductance is too high, the transistor of the 1T-1R cell and the peripheral circuits (e.g., the multiplexer) must be enlarged to avoid voltage drop. [42] This incurs significant area overhead and slows the system, resulting in longer latency and reduced throughput. Accordingly, a noticeable advantage of the RRAM over the PCM is its lower operating current, owing to a switching mechanism that does not rely on Joule heating, which implies synaptic weights in a lower conductance range. In practice, however, non-negligible parasitic components such as line resistance are involved in the cross-point array. [66] The voltage drop due to the line resistance inevitably increases as the feature size of the interconnect line is scaled. In the column of the array nearest to the voltage source, most of the applied read voltage is delivered properly to the synaptic devices without any noticeable loss, so the multiplication is executed accurately. However, the read voltage decays along the line and is significantly lowered at the farthest cell. The weighted sum current is thus lower than expected because the lowered read voltage is multiplied even though the given weight remains unchanged. It has been reported that the operating current of the RRAM can be reduced to ≈1 μA. [67] Note that low-current operation implies a weak filament comprising fewer vacancies, which no longer ensures the metallic behavior exhibited by a stronger filament formed from denser vacancies. Consequently, in the current-voltage (I-V) curve, the current at the LRS becomes nonlinear with respect to the voltage. The conductance measured at the reduced read voltage is then lowered exponentially, and the deviation of the actual computed weighted sum becomes pronounced. Therefore, studies have been conducted to carefully design electrode materials that can modify the conical shape of the filament to dissipate heat appropriately [68] or to compensate for the nonlinearity with circuits. [69] Linearizing the I-V trace of the RRAM yields a constant conductance that is less affected by the voltage drop. Nonideal factors such as nonlinearity, asymmetry, and limited conductance range have been intensively studied in device- and system-level analyses, [70] but reliability concerns such as data retention, cycling endurance, variability, and failure have been less explored. [71,72] The conductance states can be affected in unexpected ways by various reliability issues.
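The effect of line resistance on the weighted sum can be illustrated with a minimal sketch. It is not taken from the cited works: it assumes a single access line modeled as a resistive ladder, where every cell connects to a virtual-ground output and every wire segment has the same resistance. The function names, the 64-cell line, the 50 μS conductance, and the 2 Ω segment resistance are all hypothetical values chosen for illustration.

```python
def line_voltages(v_in, conductances, r_wire):
    """Node voltages along one access line modeled as a resistive ladder.

    Assumption: cell k has conductance conductances[k] to a virtual-ground
    output, and each wire segment between neighbours has resistance r_wire.
    """
    n = len(conductances)
    # Backward pass: admittance seen at node k, looking away from the source.
    y = [0.0] * n
    y_beyond = 0.0
    for k in reversed(range(n)):
        y[k] = conductances[k] + y_beyond / (1.0 + r_wire * y_beyond)
        y_beyond = y[k]
    # Forward pass: each segment divides the voltage by (1 + r_wire * y[k]).
    voltages = []
    v = v_in
    for k in range(n):
        v = v / (1.0 + r_wire * y[k])
        voltages.append(v)
    return voltages


def weighted_sum(v_in, conductances, r_wire):
    """Total read current: sum of G_k * V_k over the line."""
    volts = line_voltages(v_in, conductances, r_wire)
    return sum(g * v for g, v in zip(conductances, volts))


if __name__ == "__main__":
    g = [50e-6] * 64                     # 64 cells at 50 uS (illustrative)
    ideal = weighted_sum(0.2, g, 0.0)    # no line resistance
    real = weighted_sum(0.2, g, 2.0)     # 2 ohm per segment (assumed)
    print(f"ideal {ideal * 1e6:.1f} uA, with line resistance {real * 1e6:.1f} uA")
```

In this model the cell voltages fall monotonically with distance from the source, so the summed current is always below the ideal value even though no stored weight has changed, which is the behavior described in the text.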
For simplicity, the conductance degradation trends were categorized into two major types according to whether the weighted sum current changed consistently in one direction. When external parasitic components such as line resistance, or conductance drift, were considered at a specific BL, the output current always changed in the same direction because of the lowered input voltage or the conductance changing over time, causing accuracy deterioration. By contrast, when the variation of the RRAM was treated as truly stochastic behavior arising from its inherent working principle, each weight could be either lower or higher than the criterion (e.g., the median). Thus, the lowest weight was compensated by the highest weight at the selected BL, so that the total weighted sum at the end of the BL was near the expected value. This explains the reduced effect of the nonuniformity of individual devices on recognition accuracy. Depending on the neural network, nonuniformity can even be advantageous. [73] Learning with a gradient descent scheme aims to find the optimum defined by the global minimum; however, the learning process can converge to a local minimum and become stuck. The variation in the weights caused by nonuniformity can act as a driving force to escape the local minimum.
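The distinction between the two degradation types can be sketched numerically. The example below is an illustration only: it assumes a column of identical nominal conductances, a uniform 10% loss for the systematic case, and Gaussian device-to-device variation with a 20% standard deviation for the stochastic case. All names and numbers are hypothetical.

```python
import random


def column_current(v_read, conductances):
    """Ideal weighted sum along one bit line: I = sum(V * G)."""
    return sum(v_read * g for g in conductances)


def relative_error(actual, expected):
    return (actual - expected) / expected


def compare_nonidealities(n_cells=1000, g_nominal=50e-6, v_read=0.2,
                          systematic_loss=0.10, sigma=0.20, seed=0):
    """Return (systematic_error, random_error) of the column current."""
    rng = random.Random(seed)
    nominal = [g_nominal] * n_cells
    expected = column_current(v_read, nominal)

    # Case 1: every cell shifted the same way (e.g., drift or voltage loss).
    shifted = [g * (1.0 - systematic_loss) for g in nominal]
    # Case 2: zero-mean variation around the nominal value (assumed Gaussian),
    # clamped at zero because a conductance cannot be negative.
    varied = [max(0.0, rng.gauss(g, sigma * g)) for g in nominal]

    return (relative_error(column_current(v_read, shifted), expected),
            relative_error(column_current(v_read, varied), expected))


if __name__ == "__main__":
    systematic, stochastic = compare_nonidealities()
    print(f"systematic: {systematic:+.3%}, stochastic: {stochastic:+.3%}")
```

Under these assumptions the systematic shift passes straight through to the column current, whereas the zero-mean variation largely cancels when many cells are summed, shrinking roughly with the square root of the number of cells.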
To minimize the spatial and temporal variations in filamentary RRAMs that affect accuracy, a novel material engineering technique was proposed. Instead of filaments forming randomly during device operation, dislocations were deliberately threaded through the material to confine the path. [74] Thus, ions preferentially traveled through the 1D channels, significantly improving uniformity. In the early stage of RRAM-based synaptic elements, modulation of an electrical barrier such as a Schottky barrier, smoothly controlled by the movement of ions over the entire active area along the electric field, was used as a more uniform switching mechanism. [64,65,75] A gradual conductance update proportional to the number of identical pulses was available, but the conductance increase (or decrease) was substantial at the very first step from the initial HRS (or LRS). This resulted in a highly asymmetric conductance response versus pulse number. The slow speed of a few ms needed to drive the ions over the entire area was another critical problem (see Table 1). Therefore, the interfacial mechanism is now rarely studied in two-terminal device structures, but a similar ion movement mechanism has regained substantial attention and has been extended by using a three-terminal structure and new materials, as discussed below.

ECRAM
The ECRAM, which was designed specifically for ultimately linear and symmetric synaptic characteristics, has been proposed to improve the controllability of ion transport. By using the traditional three-terminal transistor structure, the channel conductance between the source and drain is precisely tuned by the number of mobile ions provided by the gate. Motivated by the principle of a solid-state ion battery in which mobile Li ions stored at the cathode are transported to the anode, [76] LiCoO2, which is capable of providing Li ions due to weak bonding, was used as the channel material, [37] as shown in Figure 4a. To promote effective migration of the Li ions, LiPON was used as the electrolyte. When a negative voltage was applied between the gate and source (VGS), the intercalated Li ions in the LiCoO2 channel were pulled toward the gate, which constituted a write operation. At the vacant site where the Li ion was released from the LiCoO2, the valence of the Co ion changed from 3+ to 4+ to maintain charge balance, generating positive charge. When the n-type oxide semiconductor MoO3 was used as the channel, an increase in conductance was observed on applying a positive gate voltage, due to the formation of electrons at the positions where Li ions were removed. [78] The read path was decoupled from the write operation by applying a voltage to the drain and grounding the source with zero VGS. The current (ISD) flowing between the source and drain, separated by a long channel distance of 2 μm, can thus be adjusted in an analog manner by the quantity of Li ions moved, which is modulated in proportion to the number of applied VGS signals.

Table 1 (extracted fragment). Entries: 1T-1R TiN/HfO2/AlOx/TiN; 1T-1R Ir/Ta […]. Notes: a) The value was represented in the form of current; b) In the 1T-1R configuration, the bit-line voltage (VBL) was applied for potentiation, whereas the source-line voltage (VSL) was used for depression. The state was tuned by adjusting the gate voltage (Vg).
As shown in Figure 4a, the conductance continued to increase while VGS was applied. A steady and constant current was observed when the gate voltage was removed to read the altered channel state. In general, the changed conductance lasted for several weeks, and it was expected to last for several months.
Other mobile cations such as the H ion, which emulates the role of the Li ion, have also been examined, as shown in Table 2. [77,79-82] Unlike the case of Li ions embedded in the host material, the gate voltage pushed the cations (e.g., H ions) toward the bottom of the electrolyte of WO3 while attracting the electrons to the top of the channel. [82] It was found that the film quality and physical properties of each layer played a crucial role in determining the dynamic range of conductance. Recently, a fully CMOS-compatible ECRAM device was reported that exploits fab-friendly oxygen anions and metal oxides as the mobile source and electrolyte/channel, respectively, as shown in Figure 4b. [77] The ECRAM satisfied the requirements of the basic synaptic characteristics and was also experimentally demonstrated in small arrays. [83] However, the achievable conductance range and operating conditions such as speed and voltage seemed to be strongly and sensitively correlated with the materials and geometry of each layer. Therefore, it is important to consider and design a material aimed at specific applications, such as by defining the required array size and implementing appropriate drive circuitries. Moreover, similar to the challenge of the interface-type RRAM, the ECRAM required a long pulse to drive ions through the entire area. Although operation was demonstrated in less than 10 ns, the tunable conductance range became very narrow as a result of the trade-off.

Figure 4. a) A voltage was applied to the gate, and the source was grounded for a write operation, thereby providing mobile ions from the gate. The degree of conductance change in the channel was fairly constant as a function of identical gate pulses in the Li cation-based ECRAM, which implied that the weight update was truly linear and symmetric. Scale bar, 500 nm. Reproduced with permission. [37] Copyright 2017, Wiley-VCH. b) The nearly perfect linear conductance update response has also been shown in fully CMOS-compatible material stacks using oxygen anions as the mobile source. Reproduced with permission. [77] Copyright 2019, IEEE.

Ferroelectric Memory/MRAM
When the device structure is not limited to the compact two-terminal structure, highly uniform and reliable synaptic characteristics are expected from exploiting domain switching dynamics in ferroelectric (or magnetic) materials instead of ion migration, which is accompanied by inevitable inherent stochasticity. As a voltage is applied to ferroelectric oxides such as PbZrTiO3 or SrBiTa2O5, the dipoles in the material begin to rotate. [33] The alignment of the dipoles in a similar direction produces a spontaneous polarization, and the state is held even when the voltage is removed. The ferroelectric material can be directly implemented as the gate dielectric of a transistor, resulting in a ferroelectric field-effect transistor (FeFET). [84] However, the complex ternary oxide systems require a substantial thickness (greater than 100 nm) to realize ferroelectricity, making them difficult to integrate into the FeFET structure. The discovery of ferroelectric behavior in thin HfO2 materials (less than 20 nm) has led to a renaissance of ferroelectric memory in the semiconductor industry. [85] To realize ferroelectricity, it is believed that the HfO2 film needs to be transformed to a particular orthorhombic phase. Additional dopants such as Al, Zr, and Si, as well as high-temperature (or high-pressure) annealing as a driving force, have been proposed to facilitate the structural evolution and stabilize the metastable phase. [86] Because HfO2 is already used for the gate dielectric, a thin ferroelectric layer meets CMOS compatibility and simultaneously restores the scaling potential. The latest FeFETs have been demonstrated in 14 [87] and 28 nm [39] technology nodes.
The FeFET operation is similar to that of FLASH memory. Applying a positive gate voltage (Vg) not only induces channel inversion in the p-type silicon substrate, as is normal, but also causes spontaneous polarization in the ferroelectric gate dielectric that promotes the accumulation of electrons. Because sufficient electrons are easily supplied, the condition for creating an inversion layer is satisfied at a lower threshold voltage (Vth) than that expected in the nFET. Meanwhile, a negative gate voltage switches the direction of the dipoles in the doped HfO2, and negatively polarized charges are induced near the channel, pushing electrons away. As the Vth is shifted in the positive direction, a memory window in Vth is exhibited. Because the polycrystalline doped HfO2 comprises multiple ferroelectric domains, it can be partially polarized, enabling fine-tuned threshold voltages, as shown in Figure 5a. [88,89,94,95] Consequently, continuous channel conductance can be extracted from the diverse Ids-Vg traces of the FeFET. For the FeFET-based synapse, three available pulse schemes were evaluated (Figure 5b). The identical pulse scheme showed gradual potentiation, whereas only several states were achieved in depression owing to a significant drop. By modulating the pulse width, the nonlinear response in depression was improved, because a long pulse sufficiently switches the dipoles in the domain. Alternatively, the amplitude modulation scheme at a given pulse width of 50 ns increased the amount of switched domains each time a pulse was applied, exhibiting the largest number of states (5-bit) and the best symmetry. Owing to the uniform synaptic behavior at high speed, the neuromorphic system performance, indirectly verified by circuit-level macro simulators, showed better accuracy of ≈90% on MNIST and lower latency than other emerging memory-based synaptic candidates.
It has also been reported that the ferroelectric layer was implemented in more advanced transistor structures such as the finFET [90] and nanowire FET [91] (Figure 5c). Interestingly, both scaled FeFET-based synaptic devices exhibited analog conductance controlled by an identical pulse train. However, the linearity deviated from the ideal trajectory, which caused a reduced accuracy of ≈80% compared with that of the planar FeFET-based synapse (see Table 3).

The ferroelectric capacitor, alternatively denoted as a ferroelectric tunnel junction (FTJ), was also used as a stand-alone memory. [92,93,96] Unlike the FeFET-based synapse, where the conductance from the source to drain is adjusted by the polarization change occurring in the ferroelectric oxide between the gate and silicon substrate, the conductance through the FTJ is directly affected by the up or down direction of the dipole. The conductance of the FTJ is modulated smoothly by identical pulses. Owing to its simple structure, the FTJ-based synapse can be integrated into a 3D vertical NAND structure, where the FTJ is formed on the sidewall (Figure 5d). [92] The inherently low conductance range from 1 to 3 nS of the usual FTJ can slow down the system during the read operation. However, neuromorphic systems usually sense the weighted sum of multiple FTJs. Thus, the weight mapping and array size must be carefully designed to produce an output current large enough not to degrade the read speed of the peripheral circuitries.
Meanwhile, MRAM utilizes the orientation of the spin, rotated by a direct voltage or magnetic field, in magnetic metal electrodes placed on either side of a thin tunneling oxide, which forms a magnetic tunnel junction (MTJ) structure. Because only two states, HRS and LRS, are achieved in MRAM driven by spin-transfer torque (STT), implementation is expected for limited neuromorphic systems that routinely perform inference on small-sized input data by adopting binary neural network (BNN) algorithms, in which the weights are quantized to binary values. [97,98] The digital states can be further extended by stacking multiple MTJs. [99] Recently, a new writing mechanism called spin-orbit torque (SOT) has been suggested, in which the write current is passed through an additional in-plane SOT layer, typically composed of heavy metals such as Pt and Ta. [100-102] The current flowing through the SOT layer creates a spin current in the vertical direction, where the MTJ is located, due to the spin Hall effect. The resistance state of the MTJ can be fine-tuned because the write current flows through the low-resistance heavy metal to generate spin-orbit coupling (see Table 4). However, as the range of achievable resistance is small, MRAM has so far been studied primarily for memory applications. Its use for synaptic applications is at a preliminary stage, and many aspects need further study.

Figure 5. […] [88] Copyright 2017, IEEE. b) Reproduced with permission. [89] Copyright 2018, IOP Publishing. c) Top: Reproduced with permission. [90] Copyright 2018, IEEE. Bottom: Reproduced with permission. [91] Copyright 2018, IEEE. d) Top: Reproduced with permission. [92] Copyright 2018, Royal Society of Chemistry. Bottom: Reproduced with permission. [93] Copyright 2019, IEEE.

Novel Hybrid Synaptic Configuration
Table 4 (extracted footnotes). [102] a) The value was shown in the form of resistance; b) The value was shown in the form of Hall resistance.

Figure 6. a) A pair of two nonvolatile memories and two volatile capacitors for representing the synaptic weight. Reproduced with permission. [103] Copyright 2018, AIP Publishing. b) An important step in this hybrid configuration is to realize a perfectly linear conductance update through the capacitor and then transfer the updated conductance to the nonvolatile components. Reproduced with permission. [104] Copyright 2018, IEEE. c) This resulted in hardware accuracy equivalent to that of software for tasks ranging from the relatively simple MNIST database to the more complex CIFAR-10/100. Reproduced with permission. [105] Copyright 2018, Springer Nature.

To date, several studies have attempted to improve the linearity of the conductance response as a function of voltage and of identical pulse trains in analog emerging memories, for the weighted sum and weight update operations, respectively. Beyond unit device improvement, the very small dynamic conductance step becomes the most significant problem, considerably affecting accuracy at the hardware level. Therefore, to compensate for the imperfect synaptic devices, a hybrid synaptic configuration has been proposed, as shown in Figure 6a. [103,105] The purpose of the configuration is to subdivide the roles in training, thus relaxing the stringent demands that a single synaptic device must satisfy. Depending on the numerical importance in the neuromorphic system, two pairs of conductance elements were configured as a single synaptic element as follows:

W = F(G+ − G−) + (g+ − g−)

where F is a significance factor that indicates the numerical significance of the weight, G+ and G− denote the normal conductance values of the higher significance conductance (HSC) pair, and g+ and g− represent the newly introduced conductance values of the lower significance conductance (LSC) pair. It has recently been shown that the use of capacitors can result in a highly linear conductance update (Figure 6b). [104] More specifically, the capacitor-based synaptic configuration comprised three parts: 1) a readout FET connected to the capacitor, 2) a p-type FET (pFET), and 3) an n-type FET (nFET) for adding and subtracting charge on the capacitor, referred to as the 3T-1C configuration. The charge on the capacitor represented the synaptic weight, and it was finely varied by the gate voltages that charge and discharge the capacitor node. However, owing to the intrinsically volatile nature of the capacitor, whose charge decays naturally, the weight must be refreshed periodically. To exploit the benefit of the linearly updated synaptic weight even for a short duration, the volatile component was defined as the LSC (g+ and g−). In other words, during training, only the LSC pair was updated linearly and bidirectionally. The trained weights were thereafter transferred to the nonvolatile PCM devices serving as the HSC (G+ and G−), so that the weights could be stored for a long time. Consequently, the 3T-1C and two PCMs represented the weight. This approach enabled software-comparable hardware performance with accuracies of ≈98% and 88% for MNIST and the Canadian Institute for Advanced Research (CIFAR)-10, respectively, as shown in Figure 6c.
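The division of roles between the two conductance pairs can be sketched as follows. This is a minimal illustration that assumes the weight combines as W = F(G+ − G−) + (g+ − g−), with F the significance factor, G+/G− the HSC pair, and g+/g− the LSC pair. The transfer step is idealized: it assumes the HSC devices can be programmed exactly, which real PCM devices cannot do, and both function names are hypothetical.

```python
def effective_weight(f, g_plus_hsc, g_minus_hsc, g_plus_lsc, g_minus_lsc):
    """Combined weight, assuming W = F * (G+ - G-) + (g+ - g-)."""
    return f * (g_plus_hsc - g_minus_hsc) + (g_plus_lsc - g_minus_lsc)


def transfer_to_hsc(f, g_plus_hsc, g_minus_hsc, g_plus_lsc, g_minus_lsc):
    """Idealized transfer: fold the LSC contribution into the HSC pair.

    Assumption: the HSC pair can be programmed exactly, and only conductance
    increases are used (G+ for a positive change, G- for a negative one).
    """
    delta = (g_plus_lsc - g_minus_lsc) / f
    if delta >= 0:
        g_plus_hsc += delta
    else:
        g_minus_hsc -= delta          # raises G- by |delta|
    # Balance the volatile LSC pair so it contributes zero after the transfer.
    midpoint = 0.5 * (g_plus_lsc + g_minus_lsc)
    return g_plus_hsc, g_minus_hsc, midpoint, midpoint


if __name__ == "__main__":
    state = (10.0, 4.0, 2.5, 1.0)     # arbitrary units, illustrative only
    print(effective_weight(3.0, *state))
    print(effective_weight(3.0, *transfer_to_hsc(3.0, *state)))
```

In this idealized picture the combined weight is unchanged by the transfer, while the volatile pair is returned to a balanced state so that small linear updates can continue to accumulate on it.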
As an extended concept, the role of the volatile capacitor component was replaced by a nonvolatile FeFET device, thereby saving area and power substantially. [106] The pFET and nFET were used for the similar purpose of supplying and removing charge, but the degree of charge accumulation, proportional to the gate voltage, directly affected the polarization in the FeFET. The gate voltage was applied to update the weight through the polarization; thus, the update was automatically encoded in the FeFET. This simplified two-transistor, one-FeFET (2T-1F) configuration can eliminate the leakage concern owing to its nonvolatility and minimize the device area otherwise occupied by the capacitors and two PCMs. These hybrid synapses were expected to exhibit better training accuracy at the expense of area.

Neuromorphic Neuronal Device
The cross-point array of densely arranged analog synapses discussed in this article, which produces VMM results for the inputs, represents one layer of a neural network. Identification of the outcome and communication between the arrays are typically conducted via silicon-based CMOS neuron circuits that convert the analog weighted sum to digital bits or spikes. A crucial function of the neuronal device is to turn on and off depending on the inputs, similar to a switch element. Fortunately, the selector serving as a two-terminal switch has been intensively studied and developed for constructing large memory arrays and stacking them in three dimensions. [107] At a particular Vth, the selector switches between the off-state (Roff) and on-state (Ron), with currents differing by several orders of magnitude. This threshold-switching behavior has been demonstrated in Mott insulators such as VO2 and NbO2, which are driven by a metal-insulator transition (MIT) mechanism. [108] Various binary, ternary, and quaternary chalcogenide systems also exhibit this current response, known as ovonic threshold switching (OTS), due to lone-pair electrons in the chalcogen atoms. [109]

3.1. Threshold Switching for Integrate and Fire Neurons

Selector with Capacitor for Neuron
When specific conditions are met, the selector temporarily supplies a high current, thereby taking the place of the CMOS circuits designed for the fire function in the neuron. As the input spikes, whose amplitude is related to the weighted sum, were applied, charging and discharging of the capacitor occurred repeatedly. No output response was detected initially, while the capacitor was still charging. When the voltage across the selector reached Vth, spike currents began to be generated, as shown in Figure 7a. [110] The rate of output spike generation can be increased or decreased by modulating either the input pulse interval or the device parameters of the threshold switch, such as Roff, Ron, and Vth, which determine how sensitively charging and discharging occur under the given input conditions.
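The integrate-and-fire behavior described above can be sketched with a simple lumped model. The sketch assumes a constant input current feeding a capacitor in parallel with a two-state threshold switch that turns on at Vth and off at a hold voltage, integrated with a forward-Euler step. The hold voltage, the constant-current input, and every parameter value are illustrative assumptions rather than values from the cited devices.

```python
def count_spikes(i_in, t_total=1e-3, dt=1e-7, c=1e-9,
                 r_off=1e6, r_on=1e3, v_th=1.0, v_hold=0.2):
    """Count off-to-on transitions of a threshold switch in parallel with C.

    Assumptions: constant input current i_in (A), a two-state switch with
    resistances r_off / r_on, turn-on at v_th and turn-off at v_hold.
    """
    v = 0.0
    is_on = False
    spikes = 0
    steps = int(round(t_total / dt))
    for _ in range(steps):
        r_switch = r_on if is_on else r_off
        # Forward-Euler update of the capacitor node: C dV/dt = I_in - V / R.
        v += (i_in - v / r_switch) * dt / c
        if not is_on and v >= v_th:
            is_on = True      # fire: the switch turns on and dumps the charge
            spikes += 1
        elif is_on and v <= v_hold:
            is_on = False     # the switch recovers and integration restarts
    return spikes


if __name__ == "__main__":
    for current in (0.5e-6, 5e-6, 10e-6):
        print(f"{current * 1e6:.1f} uA -> {count_spikes(current)} spikes in 1 ms")
```

With these parameters, an input too weak to charge the node to Vth through the off-state leakage produces no spikes, and a stronger input shortens the charging time and raises the spike rate, which mirrors the dependence on input conditions and on Roff, Ron, and Vth noted in the text.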
Threshold-switching behavior can also be realized in the aforementioned CBRAM. [112] When the amount of the source ions such as Cu (or Ag) constituting the filament was limited, or the repulsive force between the ions inside the filament was increased, the resultant instability of the filament promoted spontaneous dissolution. As a result, the CBRAM returned to the off-state when the applied voltage was removed. Using a volatile memory with a Pt/Ag/Ag:SiOx/Ag/Pt structure, [111] output spike generation adjusted by the spacing and amplitude of the input signals was demonstrated. In addition to the unit neuronal element, a prototype neuromorphic system fully integrating emerging devices, showing the interactions between nonvolatile RRAM-based synapses and volatile RRAM-based neurons with capacitors, was demonstrated experimentally (Figure 7b). To perform an inference task on letter patterns, the synaptic weights were pre-encoded in the 1T-1R devices with the Pd/HfO2/Ta structure discussed earlier (Figure 3a). For each pixel of the pattern, a different amplitude of input voltage was fed into the synapse array. The VMM results were concurrently filtered, activating the corresponding neurons.

Capacitor-Less Neuron Design Exploiting Selector
To accumulate and dynamically track the history of the applied input signals, the selector-based neuron seems to inherently require a capacitor. Attempts have been made to eliminate the capacitor; the idea is to deliberately make the selector device vulnerable to external stress, either by using glassy materials for the volatile memory [113] or by strengthening the Joule heating mechanism in the VO2-based selector. [114] Even when a voltage below Vth was applied, the input pulses caused ions to migrate and form the filament in the volatile memory, or induced the phase transition in the selector. This steadily lowered Roff and eventually turned on the selector, which implied that a single selector could emulate both the integrate and fire behaviors. The sensitivity with which such stress-induced damage accumulates in the weakly immune selector determines the integration and the timing of firing in this capacitor-less neuron design.
Meanwhile, the progressive evolution from the HRS to the LRS observed during potentiation in the nonvolatile PCM (Figure 8a) [115] and FeFET (Figure 8b) [116] has been used to achieve the integration and fire functions. However, as a cost of the nonvolatility of these memories, a reset process that restores the initial HRS for the next neuronal function must be performed periodically with additional circuitry. Accordingly, the MRAM has been proposed as an alternative. [117] The binary resistance of the MRAM was reversibly changed through spin-transfer torque. However, during normal operation, the LRS unexpectedly returned to the HRS due to a back-hopping oscillation, which is considered one of the failure mechanisms, as shown in Figure 8c. Therefore, switching on and off was regularly observed at the specified pulse. Although the on/off switching was stochastic, its frequency was found to be proportional to the output current intensity. With the 4-bit precision that can be distinguished by the MRAM-based neuron without a capacitor or reset circuit, an accuracy of 82% was obtained for CIFAR-10 image recognition.

Figure 7. […] Reproduced with permission. [111] Copyright 2018, Springer Nature.

Threshold Switching for Oscillation Neurons
The fired output can be represented in different ways. When the NbO2-based selector was connected in a voltage divider configuration to a load resistor whose resistance (Rload) lies between the Roff and Ron of the selector (i.e., Roff > Rload > Ron), an oscillation in voltage was monitored in real time. [118-121] Most of the voltage was initially applied to the selector because the Roff of the selector was greater than Rload. As the voltage charged at the selector exceeded Vth, the selector rapidly switched from the off-state to the on-state. Because the resistance of the selector was now lowered to Ron, the voltage began to discharge until the voltage remaining on the selector reached the hold voltage (Vhold), the minimum driving force required to maintain the on-state, below which the selector promptly switched back to the off-state. The reversible transition of the selector repeatedly induced back-and-forth charging and discharging, causing an oscillation with a specific frequency between Vhold and Vth. Taking one step beyond the single oscillation neuron with an off-chip discrete load resistor, a 1D 12 × 1 crossbar array that structurally resembles a column of the weight matrix, where one neuron is connected to multiple synapses in parallel for on-chip integration, has been demonstrated, as shown in Figure 9. [122] A single input pulse was delivered to only one of the RRAM-based synapses, and the remaining synapses were left floating. The input pulse multiplied by the conductance of the selected RRAM was expected to be observed as a read-out current along the BL at the NbO2-based neuron, resulting in an oscillation with a slow frequency of 110 kHz.
Figure 8. […] FeFET captured the accumulation and fire behavior, additional circuitry was required to return the nonvolatile memories to their original state. c) Meanwhile, the reset circuitry can be eliminated by deliberately exploiting the disadvantages of the MRAM. The stochastically repeated on/off switching in the MRAM followed the rule that the switching frequency was statistically proportional to the input voltage.

When more input vectors were loaded into multiple rows of the synaptic array, a larger number of weights were summed along the BL, resulting in a larger read-out current corresponding to the equivalently reduced total resistance. A faster oscillation frequency, proportional to the analog column current, was measured. This compact neuron allowed the number of synaptic columns shared by one neuron to be reduced, thereby outperforming the conventional silicon neuronal circuit in latency, area, and energy consumption.
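How the summed column conductance sets the oscillation frequency can be sketched with a textbook relaxation-oscillator model. The sketch assumes the selected synapses act as a single load resistance (the inverse of the summed conductance) in a voltage divider with a two-state selector, and that a lumped capacitance sits across the selector. The capacitance, the supply voltage, and all device values are hypothetical, and the closed-form RC expressions are a simplification of the real device dynamics.

```python
import math


def oscillation_frequency(g_column, v_in=2.0, c=1e-10,
                          r_off=1e5, r_on=1e3, v_th=1.0, v_hold=0.5):
    """Relaxation-oscillation frequency in Hz, or 0.0 if there is no oscillation.

    Assumptions: the column acts as a load R_load = 1 / g_column in series
    with a two-state selector (r_off / r_on), with capacitance c across it.
    """
    if g_column <= 0.0:
        return 0.0
    r_load = 1.0 / g_column

    def settle(r_selector):
        # Divider voltage and RC time constant for a given selector state.
        v_inf = v_in * r_selector / (r_selector + r_load)
        tau = c * (r_selector * r_load) / (r_selector + r_load)
        return v_inf, tau

    v_off, tau_off = settle(r_off)
    v_on, tau_on = settle(r_on)
    # The off-state must charge past Vth and the on-state must fall below Vhold.
    if v_off <= v_th or v_on >= v_hold:
        return 0.0
    t_charge = tau_off * math.log((v_off - v_hold) / (v_off - v_th))
    t_discharge = tau_on * math.log((v_th - v_on) / (v_hold - v_on))
    return 1.0 / (t_charge + t_discharge)


if __name__ == "__main__":
    for g in (5e-6, 50e-6, 100e-6, 1e-3):
        print(f"{g * 1e6:7.1f} uS -> {oscillation_frequency(g) / 1e3:.0f} kHz")
```

In this model the frequency rises monotonically with the summed conductance inside the window where the load resistance lies between Ron and Roff, and oscillation stops outside it. The strict proportionality reported for the measured devices is not reproduced by this simplified RC picture.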

Conclusion and Outlook
To perform cognitive and recognition workloads in the most efficient manner, hardware systems that implement neural network algorithms are required. A typical performance metric for computing systems, tera operations per second (TOPS), is extended to account for energy efficiency as TOPS per watt (TOPS/W). This review showed that nonvolatile resistive memories and selectors are attractive technologies that not only boost the TOPS/W of the systems up to a few tens in magnitude, a class sufficient for real-time recognition, but also ensure software-equivalent accuracy on various recognition tasks. Compared with digital SRAM as a binary synapse, the resistive memories store analog information even in a small cell area. However, to accelerate neural network computations in the entire neuromorphic system rather than in a single device, the multiple states of the analog resistive synaptic element need to be tuned linearly by identical voltage pulses. Therefore, the aim of this review was to address recent progress and strategies to solve this problem, considering the underlying working principle of each memory candidate. In summary, key performance indicators are benchmarked and compared in Figure 10 to provide design options for building neuromorphic systems. Owing to the commercialization of the PCM in the memory field, a solid understanding of its physical mechanisms and thorough reliability analysis, leading to reliable devices with advanced compensation circuits, can continue to expand the possibilities of the PCM for neuromorphic computing systems.

Figure 9. The input information can be judged by identifying the frequency of the oscillation observed in the NbO2-based neuron. The integration of the neuron at the edge of the RRAM synaptic array, which converts the weighted sum to the oscillation frequency, was experimentally demonstrated. Reproduced with permission. [122] Copyright 2019, IEEE.
In addition to the DNN, a spiking neural network, regarded as a next-generation neural network, was implemented on PCM-based neuromorphic chips, [123] highlighting the need for and importance of analog synaptic devices. However, the PCM seemed to be far from the ideal synaptic device due to its limited achievable conductance states and its nonlinear and asymmetric response to consecutive identical pulses, which degrade the recognition accuracy during training. This is because the phase-change behavior is very sensitive to the composition of the chalcogenide material. The composition of the well-known GST materials was the result of optimizing the trade-off between speed and operating current, making it difficult to modify the composition and materials to improve synaptic properties. Therefore, studies have primarily attempted to subdivide the synaptic components, such as by arranging two PCM devices and adding 3T-1C devices, to offset the shortcomings of the unit PCM. The RRAM, which can operate at a lower current than the PCM and can be scaled to 10 nm, has been extensively studied for synapses and has reasonably satisfied most requirements. In addition to achieving linear and symmetric weight update through innovative material and device engineering, defect-tolerant algorithms and circuitries have been developed to evaluate the reliability of each state and various failure modes. For memory applications, the range from 1 to 10 μA was the preferred operating current for a unit RRAM device considering the array size and sensing speed. Meanwhile, for neuromorphic VMM accelerators, in the worst case most RRAM devices in a column may need to be read simultaneously depending on the input vector. Thus, the number of RRAM devices placed in a column is related to the amount of current that the external drive and sense circuits can handle, constraining the maximum allowable array size.
In addition, it should be considered that reducing the current level of the RRAM distorts the I-V linearity, so that the actual weighted sum current is lower than expected, causing inference errors. Most neuromorphic test chips with peripheral circuits have been demonstrated using PCM and RRAM synapses in 1T-1R configurations. Depending on the application, the three-terminal transistor will eventually be replaced by a two-terminal selector. The area improvement from introducing the selector is clear, but the conductance linearity as a function of the voltage sweep (for the weighted sum) and of the applied pulses (for the weight update) can be affected. [66] The increase in the operating voltage of the 1S-1R synapse caused by the additional selector needs to be optimized while considering the operating power consumption. The ECRAM, which utilizes ion transport across the entire area rather than locally, is still in the early stages of research. The number of lateral conductance states can be maximized because the ions supplied vertically from the gate are precisely controlled in the ECRAM. However, the dynamic range of the conductance, extracted from the minimum and maximum levels, is low. Even at the expense of a larger occupied area, the nearly perfect synaptic behavior of the ECRAM makes it attractive as a synaptic element dedicated to on-chip training. The slow speed of driving the ions and the uncertain reliability issues that may be affected by scaling require further systematic investigation through a deep understanding of the role of each selected ion. The use of the ferroelectric polarization mechanism rather than ion migration enables the conductance of the FeFET synapses to be controlled reliably, symmetrically, and promptly. Nevertheless, the conductance is related to the number of ferroelectric domains that are rotated in the device and is updated by an energy- and area-inefficient nonidentical pulse scheme.
The variability, which is one of the noticeable reliability issues in the other resistive synaptic devices, is significantly low, but the retention and endurance of the multilevel conductance should be further verified. To date, the synaptic properties have been evaluated in conventional FeFETs fabricated for memory applications. Engineering methodologies aimed specifically at neuromorphic applications leave design space for enabling conductance updates in the ferroelectric materials through identical pulses. Device-level studies on FeFET-based synapses have advanced in recent years, and it is noteworthy that simulation models that accurately describe the physical ferroelectric behavior and match the experimental results are well established. [124] Design exploration for the kernel operations of convolutional neural networks and for DNN accelerators based on simulated FeFET devices has been extensively pursued to open up more diverse and appropriate options for using FeFET synaptic elements. [125] For the MRAM, which has the highest maturity among the emerging memory technologies from the manufacturing process and physics perspectives, analog behavior beyond the reliable binary states has been observed by adopting a new writing mechanism called SOT. However, its application flexibility is expected to be low because of the difficulty of controlling the obtainable current range and the small on/off ratio (≈2×). Using different types of resistive memories and conventional devices in a hybrid configuration is considered the fastest way to implement fully functional neuromorphic systems, compared with developing a single universal memory that perfectly satisfies all the tough criteria. This approach compensates for the drawbacks of each memory with other devices, relaxing the requirements on the synaptic devices.
It also increases the degree of freedom to use certain resistive memories that exhibit particularly prominent features, such as excellent linearity of the conductance within a very short duration. For ultimate parallel computing systems, the neuron that processes what is computed at the synapses is preferably implemented with devices of the same size as the BL of the synapse array. By utilizing the capability of the selector to provide instantaneous current, selector-based compact neurons enable effective classification of the analog weighted sum current based on integrate-and-fire or oscillation frequency modulation techniques. By precisely fine-tuning the ion migration and phase transition to obtain multiple states in the nonvolatile PCM and RRAM for the analog synapse, and by intentionally enhancing the volatility of the memory for the neuron, neuromorphic systems based entirely on emerging memories have been reported.
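The integrate-and-fire operation mentioned above can be sketched in a few lines. This is a minimal behavioral illustration, not a model of any specific device reviewed here: it assumes an ideal capacitor without leakage, and it abstracts the threshold switch as an ideal comparator with an instantaneous reset. The function name and all parameter values are hypothetical.

```python
def integrate_and_fire(currents, dt, capacitance, v_threshold, v_reset=0.0):
    """Return the time-step indices at which the neuron fires.

    currents: weighted sum current (A) sampled at each time step.
    The membrane voltage integrates the current on a capacitor and is reset
    once it reaches the threshold (idealized threshold-switch behavior).
    """
    v = v_reset
    spikes = []
    for step, i_in in enumerate(currents):
        v += i_in * dt / capacitance  # dV = I * dt / C
        if v >= v_threshold:
            spikes.append(step)
            v = v_reset
    return spikes


# Assumed values: 1 us time step, 1 nF capacitor, 0.5 V threshold.
params = dict(dt=1e-6, capacitance=1e-9, v_threshold=0.5)
spikes_low = integrate_and_fire([1e-6] * 5000, **params)   # 1 uA input
spikes_high = integrate_and_fire([2e-6] * 5000, **params)  # 2 uA input
spikes_none = integrate_and_fire([0.0] * 5000, **params)   # no input
```

In this sketch, a larger weighted sum current charges the capacitor faster and therefore produces more spikes in the same time window, so the spike count (or, equivalently, the firing frequency) encodes the analog column current.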
Several aspects of the implementation and utilization of neuromorphic hardware remain unexplored. For instance, the important features of the synaptic and neuronal devices may differ in terms of speed, energy, and capacity depending on the application, ranging from cloud to fog and edge computing. In particular, unlike conventional silicon transistors, whose performance has been improved primarily by geometrical scaling and cell design, the synaptic and reliability characteristics of each emerging device are strongly related to the materials used. Further, we believe that unconventional computing platforms are not limited to emerging device technologies and can be realized by systems that integrate CMOS and new devices. [126] Mixed CMOS-emerging memory hardware can make cognitive tasks more efficient and will be an intermediate step before future computing systems are ultimately implemented entirely with non-CMOS devices. Therefore, it is hoped that the findings and approaches discussed in this article will be a stepping stone toward significant technological advances that can lead to social change beyond building neuromorphic hardware systems.