Parallel Operation of Self-Limited Analog Programming for Fast Array-Level Weight Programming and Update

Memristive neural networks perform vector-matrix multiplication efficiently, which makes them attractive as accelerators for neuromorphic computing. To train the memristor cells in a memristive neural network, the analog conductance states of the memristors should be programmed in parallel; otherwise, the resulting long training time limits the size of the neural network. Herein, a novel parallel programming method that exploits the self-limited analog switching behavior of the memristor is proposed. A Pt/Ti:NbOx/NbOx/TiN charge trap memristor device is used for the programming demonstration.


Introduction
The use of memristive neural networks for neuromorphic computing has attracted enormous interest recently. [1][2][3][4][5][6][7][8][9][10][11] In memristive neural networks, memristor crossbar arrays are used as artificial neural networks, where the conductance of the memristor at each crosspoint represents the weight value of a synapse. To mimic synaptic behaviors such as long-term potentiation (LTP) and long-term depression (LTD), or spike timing-dependent plasticity (STDP), the memristor should have analog conductance switching characteristics. [12][13][14] Self-rectification is also highly desirable to eliminate sneak-current-associated issues, such as programming and sensing disturbances and excess energy consumption while driving the crossbar array. In this regard, a charge trap memristor, whose structure and operating principle are similar to those of charge trap flash memory, is very suitable for memristive neural networks because of its low-current analog switching and self-rectifying characteristics. [15][16][17][18][19] Also, its forming-free characteristic eliminates the stochastic nature of the forming process, leading to higher uniformity and simpler device operation. [20][21][22][23] We previously reported a self-limited analog programming (SAP) method that can efficiently program analog data into the charge trap memristor. [24] In the SAP method, the memristor (M) is initially in a high-resistance state (HRS) and is serially connected to a variable resistor (R_S). Once a programming voltage (V_PGM) is applied to the M-R_S configuration, the conductance of the memristor gradually increases and the voltage across the memristor (V_M) decreases accordingly, due to the voltage divider effect produced by the constant R_S.
[25][26][27] Eventually, the memristor conductance saturates at a specific value and self-limited set switching is achieved; that is, R_S limits the total current, acting like a compliance current. Then, by appropriately controlling the resistance of R_S, the saturated conductance state of M can be controlled. The SAP method, which fixes V_PGM and varies R_S, allows faster and more accurate programming of the analog states than conventional programming methods, which vary the amplitude of V_PGM or the number of pulses.
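The voltage divider relation above fixes the saturated conductance once V_PGM, R_S, and an effective switching threshold are chosen. A minimal sketch of that relation; the numbers and the threshold V_th are illustrative assumptions, not measured device parameters:

```python
def v_mem(v_pgm, g_m, r_s):
    """Voltage across the memristor in the series M-R_S voltage divider."""
    r_m = 1.0 / g_m
    return v_pgm * r_m / (r_m + r_s)

def g_saturated(v_pgm, v_th, r_s):
    """Conductance at which V_M has dropped to the effective switching
    threshold V_th, i.e., where set switching self-limits."""
    return (v_pgm / v_th - 1.0) / r_s

# Illustrative numbers: V_PGM = 8 V, V_th = 4 V, R_S = 10 Gohm
g_sat = g_saturated(8.0, 4.0, 1e10)   # saturated conductance selected by R_S
```

With these numbers the cell saturates at 1e-10 S (10^10 Ω), inside the device's stated 10^10-10^11 Ω range; doubling R_S halves the saturated conductance, which is how R_S selects the analog state.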
During the training of the memristive neural network, the weight values of all cells are programmed dynamically and actively. In the conventional programming method, each cell must be programmed sequentially with a sequence of write-and-verify steps. [5,[28][29][30] In that approach, the total weight programming (or updating) time is proportional to the array size. Thus, reducing programming time is a significant issue in a high-density memristive neural network. The parallel programming capability of the crossbar array can resolve this problem.
Gao and co-workers proposed a parallel write-and-read scheme for a memristive crossbar array and showed that switching speed could be theoretically improved by 30 times at an array size of 100 × 300. [31] Although that study showed the parallel programming method could reduce programming time, many unresolved issues remain before the parallel programming method can be implemented in a practical system. For example, the variations of the LTP and LTD, and the sneak currents, need to be considered in a sizeable array-level analysis. Also, the method should be able to compensate for uneven weight update behavior, which depends on the previous states during the training process, due to the nonlinear and nonsymmetric LTP and LTD of the synaptic device. Furthermore, the method needs to work in the presence of cycle-to-cycle and device-to-device variations. Otherwise, it will not be possible to realize a high-performance memristive neural network with sufficient programming accuracy and efficiency.
In this study, we show the parallel operation capability of the SAP method (which we designate SAPP, or SAP in parallel) in a crossbar array utilizing a Pt/Ti:NbOx/NbOx/TiN charge trap memristor (Figure S1, Supporting Information). We developed a novel biasing scheme for the SAPP and its optimized operation algorithm, and examined how SAPP reduces programming time for various array sizes and numbers of analog levels. We used the optimized SAPP method in the on-chip training of a convolutional neural network (CNN) and evaluated the improvement in its programming efficiency.

SAP in Parallel
The SAP method performs data programming sequentially; therefore, the total programming time is proportional to the array size (Figure S3, Supporting Information). Considering the high frequency of weight update events during training, the increase in programming time at larger array sizes will eventually degrade neuromorphic computing performance.
This programming speed issue can be resolved by performing the SAP operation in parallel (SAPP). Figure 1a shows a crossbar array layout and biasing conditions for SAPP to program multiple selected cells along the selected word (vertical) line to the target conductance state. Herein, the selected word line in the memristor array is biased to a programming voltage (V_P), the selected word lines in the resistor array are grounded (V_G), and all unselected word lines are floated (F), identical to the biasing conditions used for the SAP operation. The only difference between the SAPP and SAP operations is the biasing condition of the bit (horizontal) lines. For parallel operation, the multiple bit lines connected to the multiple target cells to be programmed are floated together. In Figure 1a, for example, two cells (yellow dashed squares) are selected. The other unselected bit lines are biased to ½V_P. Figure 1b shows an equivalent circuit of Figure 1a. In the circuit, two individual SAP units, each comprising one memristor (M) and one selected resistor (R_Sel), are present in parallel, and their middle nodes are connected through parasitic unselected resistor (R_Unsel) components. In this configuration, if the initial resistances of the two memristors are identical and the selected R_Sel values are identical, the potential difference between the two middle nodes is zero. Then, the SAPP operation can be understood as multiple independent SAP operations executed at once.
However, in the practical case, the initial state of M can vary. Therefore, we examined the effect of variation in the initial state on the final programmed state during SAPP operation. Figure 1c shows a SAPP simulation setting where eight cells with all different initial states (from state 1, the lowest conductance state, to state 8, the highest conductance state) are to be programmed to state 8. Figure 1d shows the conductance change of each cell as a function of programming time. The dynamic conductance change characteristic was simulated based on the pulse height-dependent potentiation data of the device (Figure S2, Supporting Information). The inset enlarges the moment at the end of programming. Figure 1e shows the change in the memristor potential (V_M) on all memristor cells as a function of programming time. Early in the programming, cells with lower initial conductance experience a higher applied voltage. As a result, the lower conductance states are quickly switched to higher conductance states, which reduces the conductance variation over time. After a saturation time of 0.193 ms, the conductance of all cells approaches state 8, with the variation reduced to as low as 1%. The results confirm that the final states programmed by the SAPP operation are independent of the initial states. In the SAPP operation, all of the selected cells can be programmed together regardless of their initial states, as long as the programming time is long enough. Figure 1f shows the total programming time needed to program eight cells to the target state in parallel, depending on the initial states of the cells. It shows that programming from the lowest conductance state (state 1) to the highest conductance state (state 8) takes the longest time. The experiments confirmed the simulated SAPP operation. For the experiments, an 8 × 8 crossbar array of NbOx-based charge trap memristors was prepared.
Then, eight cells along the first column were programmed to eight discrete states from the lowest to the highest conductance state by the SAP operation. The inset of Figure 1g shows a top-view image of the device. The cross-sectional area of the device was 5 μm × 5 μm. Figure 1g shows the currents read at 3.5 V before (initial states) and after (final states) the SAPP operation. As the simulation predicted, all cells were programmed to the target state successfully in parallel.
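The convergence behavior described above (cells in different initial states saturating to the same final state) can be reproduced with a simple first-order rate model, in which each cell potentiates only while its divider voltage exceeds a threshold. All constants here are illustrative assumptions, not fitted device parameters:

```python
def sapp_step(g_cells, v_pgm, r_s, v_th, rate, dt):
    """Advance all parallel cells by one time step: each cell sees its own
    divider voltage and potentiates only while V_M exceeds the threshold."""
    new = []
    for g in g_cells:
        v_m = v_pgm / (1.0 + g * r_s)          # divider voltage on this cell
        new.append(g + rate * max(v_m - v_th, 0.0) * dt)
    return new

# Eight cells in eight different initial states (state 1 ... state 8)
g = [1e-11 * s for s in range(1, 9)]
for _ in range(10000):                          # program long enough to saturate
    g = sapp_step(g, v_pgm=8.0, r_s=1e10, v_th=4.0, rate=1e-10, dt=1e-3)

spread = (max(g) - min(g)) / max(g)             # residual state-to-state spread
```

Lower-conductance cells see a larger V_M and catch up, so the spread collapses over time; in this sketch all eight cells end within 1% of the same R_S-defined state.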

Incremental Conductance Programming Algorithm for SAPP
The results in Figure 1f indicate that the key to fast programming is programming the highest conductance state as quickly as possible. To accomplish that, we developed a programming algorithm that partially programs the higher conductance states during programming of the lower conductance states. We call this i-SAPP, an incremental conductance programming algorithm applied to the SAPP operation. Figure 2a shows the baseline task used for the performance comparison of the programming methods, where the size of the memristor array is identical to that in Figure 1c, and the eight cells along the shared word line are initially programmed to state 1 and are to be programmed to eight different states. Figure 2b compares the programming algorithms of the SAPP and i-SAPP methods. In both algorithms, the programming operation starts with an initialization step that resets all cells to state 1, the lowest conductance state. Then, the first programming step programs the selected cells to the next conductance state, i.e., state 2. At this stage, the SAPP method selects only cell #2 to program it to state 2, whereas the i-SAPP method selects all of the cells whose final conducting state will be state 2 or higher, i.e., cell #2 to cell #8. Then, programming of the next conductance state, i.e., state 3, starts from state 1 by selecting only cell #3 in the SAPP method, whereas in the i-SAPP method it starts from state 2 by selecting the cells whose final states will be state 3 or higher, i.e., cell #3 to cell #8. Figure 2c shows the estimated working time for executing the baseline task in Figure 2a using the SAPP method, where the programming of each state is executed sequentially. The SAPP method is sequential, so the next programming starts as soon as the previous programming ends; it took 1.085 ms in total to complete the task.
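The cell-selection difference between the two algorithms can be sketched as a pair of schedules; the cell indices and state labels below are illustrative:

```python
def sapp_schedule(targets):
    """Sequential SAPP: the step for state s selects only the cells whose
    final target is exactly s; each is programmed all the way from state 1."""
    states = sorted(set(t for t in targets if t > 1))
    return [(s, [i for i, t in enumerate(targets) if t == s]) for s in states]

def i_sapp_schedule(targets):
    """Incremental i-SAPP: the step for state s selects every cell whose
    final target is s or higher, so high states accumulate one increment
    per step instead of being programmed from state 1 each time."""
    states = sorted(set(t for t in targets if t > 1))
    return [(s, [i for i, t in enumerate(targets) if t >= s]) for s in states]

targets = [1, 2, 3, 4, 5, 6, 7, 8]   # baseline task: cell #k -> state k
```

For the baseline task, the state-2 step of SAPP selects only cell #2, whereas i-SAPP selects cells #2-#8 together; each i-SAPP step advances the selected cells by one state, which is why its total time stays flat as the number of bit lines grows.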
Figure 2d shows the estimated working time using the i-SAPP method to program eight states as described previously. Here, each programming sequence starts from the previous states, and the total working time was 0.699 ms, which is about 35.6% faster than the SAPP method. Figure 2e shows the total time required to program cells on a single word line as a function of the number of bit lines. The i-SAPP algorithm shows a constant programming time, regardless of the number of cells. Therefore, the benefit of the i-SAPP method increases as the array size increases. Meanwhile, interconnect wire resistance may affect the i-SAPP operation.
Considering a typical resistivity of metal wires (ρ ≈ 10^-7 Ω m) and a feature size (F ≈ 10 nm) in modern technology, the wire resistance (R_w) between the nearest cells is ≈10 Ω. This R_w value is negligible compared with the resistance range (10^10-10^11 Ω) of the charge trap memristor. The simulation results confirmed the immunity of the i-SAPP operation to wire resistance (Figure S6, Supporting Information).
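The quoted R_w estimate follows from the standard R = ρL/A relation; a small check, assuming a wire segment of length F with an F × F cross section:

```python
rho = 1e-7            # typical metal wire resistivity, ohm*m
feat = 10e-9          # feature size F = 10 nm
length = feat         # assumed wire segment between nearest cells (~F)
area = feat * feat    # assumed square cross section, F x F
r_wire = rho * length / area        # wire resistance between nearest cells
ratio = r_wire / 1e10               # compare with the 1e10-1e11 ohm device range
```

This gives R_w ≈ 10 Ω, roughly nine orders of magnitude below the memristor resistance, consistent with the stated immunity to wire resistance.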

Influence of Sneak Currents on i-SAPP
In array programming, sneak currents can affect programming time. Figure 3a shows a crossbar array layout composed of an (n × m) memristor array and an (n × l) resistor array, where n is the number of rows in the array, and m and l are the numbers of columns in the memristor and resistor arrays, respectively. l also equals the number of programmable analog states. Two leakage current paths are shown: one through the memristor array (path 1, blue arrow) and the other through the resistor array (path 2, green arrow). The former path is negligible because the memristor possesses a self-rectifying ratio as high as 10^6 at the reading voltage. However, the sneak current through the resistor array (path 2) is crucial, as there is no selection device in series with the resistors. Figure 3b shows the self-limited saturation behavior for programming various states as a function of programming time, considering (lines) and neglecting (dashed lines) the sneak currents. Here, the sizes of the memristor and resistor arrays are both 8 × 8. We define the programming time (t_PGM) as the time until the conductance of M reaches 99% of the target state. The square dots indicate t_PGM, the endpoint of the programming. The saturation characteristic confirms that the SAP method works well even in the presence of sneak currents. Figure 3c shows the change in V_M as a function of time with and without the R_Unsel components, for programming eight conductance states. The sneak currents lower V_M and slow the conductance change. As a result, t_PGM grows while programming the same target state. Figure 3d,e show t_PGM while programming eight cells to different states using the SAP and i-SAPP operations, as functions of the numbers of word and bit lines of the memristor array.
In both the SAP and i-SAPP cases, the increase in the number of bit lines has a greater effect on t_PGM than the increase in the number of word lines, because the primary sneak current paths originate from the unselected bit lines biased to ½V_P. Figure 3f shows the influence of the sneak currents on the total t_PGM using the SAP and i-SAPP operations, where the sizes of the memristor and resistor arrays are both 8 × 8. After considering the sneak currents, the programming times for SAP and i-SAPP increased by 25% and 13%, respectively, which suggests that the i-SAPP operation has more tolerance for the sneak current issue. This is because, during i-SAPP operation, multiple cells are selected together, so the number of sneak paths from unselected cells is reduced. Considering this sneak current tolerance, the advantage of the i-SAPP operation holds regardless of the array size.
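The effect of the resistor-array sneak path on V_M can be illustrated with a lumped middle-node model, in which all leakage to the half-biased unselected lines is collapsed into a single effective sneak conductance; this is a simplification of the full array network, not the paper's simulation:

```python
def v_m_with_sneak(v_p, g_m, g_s, g_sneak):
    """V_M for one SAP unit whose middle node also leaks to the unselected
    bit lines held at V_P/2, lumped into one sneak conductance g_sneak.
    KCL at the middle node V_n:
        (V_P - V_n)*g_m = V_n*g_s + (V_n - V_P/2)*g_sneak
    """
    v_n = (v_p * g_m + 0.5 * v_p * g_sneak) / (g_m + g_s + g_sneak)
    return v_p - v_n

v_clean = v_m_with_sneak(8.0, 1e-11, 1e-10, 0.0)    # no sneak path
v_leaky = v_m_with_sneak(8.0, 1e-11, 1e-10, 5e-11)  # with a lumped sneak path
```

With g_sneak = 0 the expression reduces to the plain voltage divider; a finite sneak conductance pulls the middle node toward ½V_P, lowering V_M, slowing the conductance change, and thereby lengthening t_PGM, as in Figure 3c.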

Application of i-SAPP to a CNN
The i-SAPP method can reduce the weight update time, which allows faster on-chip training of the neural network. During the on-chip training process, the weight values (W) of the neural network should be updated at every training iteration. The weight update can be performed in two ways: by reprogramming (initialization followed by programming) and by differential updating. Figure 4a schematically shows the reprogramming and differential updating methods for modifying the weight values. In reprogramming, the memristor cell is initialized by reset switching and then reprogrammed to the target state. In differential updating, the target state is attained directly from the previous state. If the weight differential is positive (i.e., the new weight value is higher than the old one), the i-SAPP method can be used. If the weight differential is negative, the i-SAPP method cannot be used, and the reprogramming method is required. Considering the one-directional weight update capability of the differential updating method, a dual memristor system was adopted, where two memristors, representing the positive and negative values of the weight, respectively, are combined to define one synapse. [32][33][34] Figure 4b shows a conductance diamond map, where the G+ and G− axes refer to the conductances of the two coupled memristors. Here, the vertical component of the sum of the two vectors represents the weight value of one synapse. The blue and orange arrows represent the positive and negative weight update vectors, respectively. The figure also provides an example of weight value changes during neural network training. For a positive (or negative) weight update, the G+ (or G−) memristor is potentiated. In this method, repeating the weight update process will eventually drive the memristors to the highest conductance state (G_max).
Then, no further update is possible and the effective conductance range decreases, which deteriorates the neural network performance. Accordingly, in the dual memristor system, reprogramming of both cells should be performed regularly, at a fixed number of training iterations. We define this as the reprogramming interval.

Figure 4. Memristive CNN simulation for MNIST data recognition. a) A schematic illustration of the reprogramming and differential updating methods for tuning the memristor conductance state. The reprogramming method includes an initialization (reset) step and a programming step. b) Conductance diamond map representing the weight value of a synapse by two coupled memristors (G+ and G−). The G+ and G− axes refer to the conductances of the two coupled memristors, and the y-axis components of the sum of the G+ and G− vectors represent the weight value. c) The hierarchy of the CNN. Feature extraction (left panel) sequences are executed by software-based digital computing, and classification sequences are done by memristive hardware-based simulation. d) The flow chart of the CNN learning process.
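The dual-memristor bookkeeping described above can be sketched as follows; the G_MAX/G_MIN values and the linear weight encoding W = G+ − G− are illustrative assumptions:

```python
G_MAX = 1e-10    # illustrative highest programmable conductance
G_MIN = 1e-11    # illustrative lowest (reset) conductance

def weight(g_pos, g_neg):
    """Effective weight of one synapse in the dual-memristor scheme."""
    return g_pos - g_neg

def diff_update(g_pos, g_neg, dw):
    """Differential update: only potentiation is available, so a positive
    dW potentiates G+ and a negative dW potentiates G-."""
    if dw >= 0:
        return min(g_pos + dw, G_MAX), g_neg
    return g_pos, min(g_neg - dw, G_MAX)

def reprogram(g_pos, g_neg):
    """Periodic reprogramming: reset both cells, then re-encode the same
    weight near the bottom of the conductance diamond to restore headroom."""
    w = weight(g_pos, g_neg)
    if w >= 0:
        return G_MIN + w, G_MIN
    return G_MIN, G_MIN - w
```

Repeated diff_update calls drive one cell toward G_MAX; reprogram preserves the weight while recovering the update range, which is exactly what the reprogramming interval controls.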
The i-SAPP programming method was applied to a software-based neural network emulator, and the reduction in programming time was evaluated. In this study, we took a CNN with a transfer learning method as a demonstration platform, and the training and inference tests were performed using the MNIST dataset. [35][36][37] We also performed the same simulation with a multilayer perceptron (MLP); those results can be found in Figure S5, Supporting Information.
In the CNN, the convolved data size is greater than the original data size. Thus, its array requires more bit lines than the MLP, making the parallel programming method more beneficial. Figure 4c shows the hierarchy of the CNN. The input data, of 28 × 28 size, are delivered to a pretrained convolutional layer composed of 20 kernels of 9 × 9 size with valid convolution and a stride of one. The output data from the convolutional layer, called a feature map, have a 20 × 20 × 20 volume and are delivered to max pooling layers of 2 × 2 size for downsampling. Then, a 2000 × 1 sized downsampled feature map is delivered to a fully connected layer, in which a vector matrix multiplication (VMM) is executed in a 2000 × 10 memristor crossbar array. Figure 4d shows a flow chart of the CNN training process. The pretrained convolution, pooling, and activation function were executed by digital computing, whereas the VMM was executed in the hardware-mimicking memristor crossbar. Figure 5a shows the MNIST data recognition accuracy as a function of the number of conductance states of the memristor. At eight conductance levels, the accuracy was 98.25%, slightly lower than the ideal value of 98.74% due to the 5% intrinsic programming error rate. Figure 5b shows the training time for the SAP and i-SAPP methods as a function of the reprogramming interval. During the simulation, the programming time accounted for the sneak currents, as shown in Figure 3. Note that the presence of the sneak currents does not degrade the programming accuracy, but it does increase the programming time. Therefore, reducing the programming time is crucial at the given sneak currents. Overall, adopting the i-SAPP method reduced the total programming time to between 1/130 and 1/20 of its original value, depending on the reprogramming interval. Figure 5c shows the recognition accuracy as a function of the reprogramming interval.
As the reprogramming interval becomes smaller, the total training time grows, but the recognition accuracy improves. The decrease in accuracy at larger intervals is due to the increasing number of G_max cells, which do not allow further weight updates (Figure S4, Supporting Information).
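The layer dimensions quoted above (20 × 20 feature maps, a 2000 × 1 vector feeding a 2000 × 10 crossbar) can be checked in a few lines:

```python
def conv_out(n, k, stride=1):
    """Output size of a valid convolution along one dimension."""
    return (n - k) // stride + 1

side = conv_out(28, 9)              # 28x28 input, 9x9 kernel -> 20x20 map
pooled = side // 2                  # 2x2 max pooling -> 10x10
fc_inputs = pooled * pooled * 20    # 20 kernels -> 2000-element input vector
```

This reproduces the 20 × 20 × 20 feature volume and the 2000-row crossbar: each MNIST image ultimately drives one 2000 × 10 VMM in the memristor array.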
Lastly, the influence of cycle-to-cycle and device-to-device variability on the i-SAPP operation was investigated to confirm the practical feasibility of the method. Figure 6a shows an example of the cycle-to-cycle variability of the device during programming. Although the conductance increases linearly on average, a subtle but clear conductance fluctuation is observed. The device has a coefficient of variation in conductance of about 5% (Figure S1b, Supporting Information). Figure 6b shows the simulated programming accuracy considering this variability. It shows that the cycle-to-cycle variability does not harm the programming accuracy of the i-SAPP operation, because the saturated programming conductance (S_PGM) is determined by R_S irrespective of the device variation. Accordingly, the neural network performance is almost constant, independent of the cycle-to-cycle variability. Figure 6d shows the programming characteristic representing the device-to-device variability. Unlike the cycle-to-cycle case, as shown in Figure 6e, the programming error rate of the i-SAPP operation increases as the device-to-device variability increases. This is because the conductance does not fully saturate during the i-SAPP operation, which results in inaccurate programming. Interestingly, despite the high error rate at high device-to-device variability, the neural network performance shown in Figure 6f does not deteriorate. This implies that the programming error can be self-adjusted during the on-chip training process, reflecting the well-known error-tolerant characteristic of neural networks.
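The claimed tolerance to switching-rate variation can be illustrated with a simple first-order rate model: two cells with different potentiation rates (an illustrative stand-in for cycle-to-cycle or device-to-device variation) settle at the same R_S-defined conductance if given enough programming time. All constants are assumptions, not device data:

```python
def program_cell(g, rate, v_pgm=8.0, r_s=1e10, v_th=4.0, dt=1e-3, steps=20000):
    """Program one cell with self-limited voltage-divider dynamics; 'rate'
    models how fast this particular cycle/device potentiates."""
    for _ in range(steps):
        v_m = v_pgm / (1.0 + g * r_s)          # divider voltage on the cell
        g += rate * max(v_m - v_th, 0.0) * dt  # potentiate above threshold
    return g

g_fast = program_cell(1e-11, rate=2e-10)   # a faster-switching cycle/device
g_slow = program_cell(1e-11, rate=5e-11)   # a 4x slower one
```

Both end at the same R_S-defined saturated conductance (1e-10 S here); rate variation changes only how long saturation takes, matching the claim that S_PGM is set by R_S. Device-to-device variation becomes harmful only when the programming window ends before the slower device has saturated.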

Conclusion
In this study, we proposed a novel SAP method working in parallel (SAPP) for fast weight updating of a high-density memristive neural network. We also demonstrated the i-SAPP method, which applies an incremental programming algorithm to the SAPP, to maximize its efficiency. The estimated programming time showed that the SAPP method can perform analog data programming in less time, regardless of the array size, which makes it suitable for programming larger arrays. We applied the i-SAPP method to the CNN and the MLP to estimate the programming efficiency and concluded that the total programming time and the energy consumed in the training process could be reduced to 1/130-1/20 and 1/60-1/6 of their original values, respectively, depending on the reprogramming interval.
In closing, some open questions remain regarding the practical issues of device operation. First, device nonidealities such as wire resistance may influence the programming accuracy during the parallel programming operation, considering the large array sizes required for deep neural networks. In our device, owing to the low-current operation capability, the wire resistance is negligible (Figure S6, Supporting Information). However, for other devices where the wire resistance is not negligible, the parallel programming method may need to be revised, which is a subject for future work. In addition, the cycle-to-cycle or device-to-device variability of the device may affect the ideal i-SAPP operation. Our simulations confirmed that these variabilities may affect the programming accuracy of the device. However, they do not harm the performance of the on-chip trained neural network, because the weight update process can compensate for the possible errors in previous weight values. In conclusion, adopting the i-SAPP method in a memristive neural network introduces no drawbacks.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.

Figure 6. Influence of device variation on the i-SAPP method. a) A schematic example of the cycle-to-cycle conductance variation of the device. b) The programming error rate as a function of the cycle-to-cycle variability. The programming error rate of the i-SAPP operation was below 1% regardless of the variability. c) The estimated MNIST recognition accuracy of the MLP and CNN as a function of the cycle-to-cycle variability. The accuracy was almost constant regardless of the variability. d) A schematic representation of the device-to-device conductance variability. Each line with a different slope indicates the average conductance change rate of a different device. e) The programming error rate as a function of the device-to-device variability. f) The MNIST recognition accuracy of the MLP and CNN as a function of the device-to-device variability. Despite the high error rate, the recognition accuracy was almost constant regardless of the device-to-device variability, implying that the programming error can be self-adjusted during the on-chip training process of the neural network.