Forming-Free Resistive Switching Memory Crosspoint Arrays for In-Memory Machine Learning

In-memory computing (IMC) with crosspoint arrays of resistive switching memory (RRAM) has gained wide attention for accelerating machine learning, data analysis, and deep neural networks. With IMC, matrix-vector multiplication (MVM) can be executed in the crosspoint array in a single step, thus accelerating a broad range of tasks in machine learning and data analytics. However, a key issue for RRAM crosspoint arrays is the forming operation of the memory devices, which limits the stability and accuracy of their conductance states. In this work, a hardware implementation of a crosspoint array of forming-free devices for fast, energy-efficient MVM accelerators is reported. RRAM devices with a 1.5 nm-thick HfO2 layer show an initial low resistance without forming and an analogue-mode programming behavior suitable for high-accuracy IMC. Accurate hardware MVM is demonstrated by experimental eigenvalue/eigenvector calculation with the power-iteration algorithm, which converges to the correct solution within about ten iterations. The deflation technique and principal component analysis (PCA) enable the classification of the Iris dataset with 98% accuracy relative to a floating-point implementation. These results support forming-free crosspoint arrays for accelerating advanced machine learning with IMC.

DOI: 10.1002/aisy.202200053
together with strong polycrystallinity of the thin oxide film. The absence of the forming makes these devices ideal candidates for high-density crossbar arrays, together with their analog-grade programmability in a window ranging from 300 nS to 1 mS.
The analogue tuning precision is tested using an 8 × 8 crossbar to compute several MVMs and power iterations. Finally, a fully memristive architecture is implemented to tackle the classification of the Iris dataset, mapping the covariance matrix in the RRAM crossbar and extracting the PCs by IMC-based power iteration and deflation. The results show a clustering accuracy comparable with that of a 64-bit floating-point (FP64) processor, with 98% overlap of the projected datasets, thus supporting IMC for high-efficiency, low-power hardware accelerators for machine learning applications.

Device Characterization
RRAM is a two-terminal device whose conductance can be changed by externally applied voltage pulses. [1][2][3][4][5][6][7][8][9][10] The RRAM switching mechanism can be explained by local changes of the oxygen vacancy concentration in the oxide layer. [20,21] Metals with a high work function (such as Pt or TiN) are usually adopted as Schottky-type bottom electrode materials, as they are inert with respect to the oxide interface. [18,20] On the other hand, an active metal with good oxygen affinity, such as Ta, Ti, or Hf, [20][21][22] can act as top electrode material for oxygen scavenging, thus leading to the formation of a thin vacancy-rich oxygen exchange layer. By applying a positive voltage to the top electrode, the oxygen vacancies can migrate and redistribute inside the oxide layer with a consequent change of the electrical properties: the resulting oxygen vacancy-based conductive channel sets a low-resistance state (LRS). [13,21] The sudden decrease of resistance, known as the set transition, is commonly limited by a compliance current to prevent permanent breakdown. [13] The application of a negative voltage induces vacancy migration in the opposite direction, thus reducing the conductivity of the channel to a high-resistance state (HRS).
One of the most critical aspects of RRAM is the forming operation, i.e., the initial soft breakdown that creates a conductive path in the device. [23,24] The forming voltage is generally proportional to the oxide thickness, [9,22] but also depends on the stoichiometry and on the interface with the metallic electrodes. [13,20] Within a crosspoint array, a high forming voltage may disturb the RRAM devices [25] sharing the same row or column as the selected device. As a result, many efforts have been devoted to reducing the forming voltage, [17][18][19] mainly by tuning the oxidized interface between the oxide and the scavenger metal [20] or by changing the oxide stoichiometry. [26] Ideally, one should aim at eliminating the forming operation altogether by fabricating the device in the LRS instead of the more common HRS. In a recent work, Wang et al. [27] presented a single forming-free device with initial reset; however, the work only addressed individual devices rather than a full crosspoint array. Prezioso et al. [13] presented RRAM devices with a Pt/Al2O3/TiO2/Ti stack, where the oxygen concentration of the TiO2−x layer was adjusted during reactive sputtering growth to achieve a forming voltage close to the set voltage. Sharath et al. [28] studied an Au/HfO2/Ti stack and the effect of the Ti doping concentration in the oxide, finding suitable conditions to lower the forming voltage. Annealing in vacuum [29] or in a controlled atmosphere (O2, N2, or air) [30] has also been proposed to lower the forming voltage. Eliminating the forming operation at the level of a crosspoint array still remains an open challenge.
We prepared a forming-free RRAM device based on an extremely thin HfO2 layer with a thickness of 1.5 nm. HfO2 is reported to be a stable material fully compatible with standard CMOS processes [31] (for more information see the Experimental Section and Figure S1 in the Supporting Information). Figure 1a shows a scanning electron microscope (SEM) image of an 8 × 8 crossbar of forming-free RRAM devices. Wide bottom electrode lines were adopted to minimize the wire resistance, while the RRAM devices were located within holes etched through a passivation SiO2 layer, as shown in Figure 1b. To optimize the forming-free RRAM, we conducted extensive tests with HfO2 thicknesses ranging from 7 down to 1 nm. Figure 1d shows the measured initial leakage current before forming, together with the average forming characteristics. As pointed out in previous works, [17][18][19][20] the forming voltage decreases with decreasing oxide thickness, with the thinnest layers showing forming-free characteristics. We explain this behavior as the combination of a substoichiometric composition of HfO2 and a high concentration of defects in the thin film. Also note that atomic force microscope (AFM) measurements indicate a root mean square surface roughness of 1.1 nm, which is comparable with the oxide thickness (see Figure S2 in the Supporting Information). The fabricated crossbar array thus initially showed a variable LRS (see Figure S3a in the Supporting Information).
In contrast to conventional RRAM devices that require initial forming, our forming-free RRAMs can be initialized by a reset operation from the LRS. Figure 1d shows the reset characteristics for all 64 devices of the crosspoint array, with the final HRS having a conductance on the order of a few μS (see Figure S3b in the Supporting Information). The RRAM devices can be operated in the voltage range between −2 and +2 V under quasistatic conditions (sweep rate around 1 V s−1). The devices show the set transition at a voltage around 1 V and the reset transition around −0.5 V. During the set transition at positive voltage, the current was kept below a compliance current I_C in the range from 100 μA to 1 mA, which had a negligible impact on the switching variability (see Figure S4 in the Supporting Information). To assess the capability of tuning the device conductance in an analog way, we applied either set pulses with increasing I_C [1,22] or reset pulses with increasing negative voltage. [22] The first approach, named the increasing compliance current program-and-verify (ICCPV) algorithm, is illustrated in Figure 2a, showing the measured I-V curves of the RRAM device at increasing I_C. As I_C increases, the conductive filament size increases, thus resulting in an increase in the LRS conductance, as shown in Figure 2b. [32][33][34][35] In particular, the LRS conductance increases from 1 μS, corresponding to the HRS, to about 700 μS. The conductance increment is controlled by the I_C step adopted in the ICCPV algorithm, which was either 10 or 50 μA in the figure. A smaller step ensures almost analogue control of the conductance, which is useful to achieve high precision in the crossbar array. The second approach is the increasing reset program-and-verify (IRPV) algorithm, where the HRS conductance can be tuned over 4 orders of magnitude by increasing the amplitude of the stop voltage V_stop of the reset sweep, as shown in Figure 2c,d.
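A minimal sketch of the ICCPV loop is given below. It assumes a hypothetical device model, `toy_set`, in which the LRS conductance grows linearly with I_C and saturates near 700 μS; this is an illustrative assumption, not a fit to the measured devices, and each "verify" step stands for a low-voltage read. The function names `toy_set` and `iccpv` are likewise introduced here for illustration only.

```python
def toy_set(i_c_uA):
    """Hypothetical device model (not fitted to the reported devices):
    LRS conductance in uS after a set sweep limited to i_c_uA, assumed to
    grow linearly with I_C (0.8 uS per uA) and to saturate at 700 uS."""
    return min(0.8 * i_c_uA, 700.0)


def iccpv(target_uS, step_uA=10.0, start_uA=10.0, max_ic_uA=1000.0, set_fn=toy_set):
    """Increasing compliance current program-and-verify (sketch).

    Apply a set sweep at compliance I_C, read back (verify) the conductance,
    and raise I_C by step_uA until the target conductance is reached.
    Returns the final I_C (uA) and the conductance reached (uS)."""
    i_c = start_uA
    while i_c <= max_ic_uA:
        g = set_fn(i_c)            # program (set sweep) followed by a verify read
        if g >= target_uS:
            return i_c, g
        i_c += step_uA
    raise RuntimeError("target conductance not reached within the I_C range")


print(iccpv(205.0, step_uA=10.0, start_uA=10.0))   # fine I_C step
print(iccpv(205.0, step_uA=50.0, start_uA=50.0))   # coarse I_C step
```

With the same 205 μS target, the 50 μA step overshoots by considerably more than the 10 μA step in this toy model, which illustrates why a smaller step gives nearly analogue control.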
IRPV provides better precision and faster convergence to the target conductance. This can be explained by the reset operation being more gradual than the set operation, since defect migration reduces the field that acts as the driving force for migration. [19,22] As a result of the more gradual decrease of conductance, IRPV also shows better uniformity, with lower cycle-to-cycle variability (see Figure S5 in the Supporting Information). The devices also showed excellent properties with pulsed programming algorithms. The pulsed set operation was performed without a compliance current, resulting in an LRS conductance of about 600 μS, as reported in Figure S7a,b of the Supporting Information. However, without a compliance current or a line resistance to limit the current, the pulsed set operation may cause an uncontrolled set or even a permanent short circuit. Reset pulses with fixed amplitude and pulse duration (Figure S7c,d of the Supporting Information) can precisely tune the conductance level of the devices. Pulses shorter than 1 μs displayed the highest precision, although they could not reach the full HRS conductance. The reset transition was more efficient with relatively long pulses; however, the conductance generally showed a saturation effect, resulting in poor controllability. On the other hand, the IRPV technique (Figure S7e,f of the Supporting Information) showed tight control of the conductance for both the LRS and the HRS. Figure 3a shows the final conductance distribution for an 8 × 8 crossbar array. Each device in the array was programmed to achieve a conductance decreasing linearly along both the row and column directions.
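The IRPV loop can be sketched in the same way. The reset model `toy_reset` is hypothetical: a sweep to V_stop is assumed to lower the conductance to a level that drops by one decade per 0.3 V beyond −0.5 V, and never to raise it. The 10 mV default step matches the V_stop step used for array programming, while the stopping rule (stop at the first read at or below the target) and the names `toy_reset` and `irpv` are assumptions of this sketch.

```python
def toy_reset(g_uS, v_stop):
    """Hypothetical reset model (not fitted to the reported devices): a reset
    sweep to v_stop (< 0 V) can only lower the conductance, down to an assumed
    level that falls by one decade per 0.3 V beyond a -0.5 V threshold."""
    g_reachable = 600.0 * 10.0 ** (-max(abs(v_stop) - 0.5, 0.0) / 0.3)
    return min(g_uS, g_reachable)


def irpv(g_start_uS, target_uS, v_step=0.01, v_begin=-0.5, v_min=-2.0, reset_fn=toy_reset):
    """Increasing reset program-and-verify (sketch).

    Apply reset sweeps to an increasingly negative stop voltage V_stop
    (step v_step), reading the conductance after each sweep, and stop as soon
    as it falls to the target or below. Returns (V_stop, G, number of sweeps)."""
    g = g_start_uS
    v_stop = v_begin
    n_sweeps = 0
    while v_stop >= v_min:
        g = reset_fn(g, v_stop)               # reset sweep to V_stop + verify read
        n_sweeps += 1
        if g <= target_uS:
            return v_stop, g, n_sweeps
        v_stop = round(v_stop - v_step, 6)    # avoid floating-point drift in V_stop
    raise RuntimeError("target conductance not reached above v_min")


print(irpv(600.0, 100.0))   # start from an LRS of about 600 uS
```

Because each sweep only removes conductance, the loop approaches the target from above; the final error is set by how much conductance one 10 mV step removes near the target.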
To program the RRAM devices in the array, we applied a single positive sweep with I_C = 800 μA to set the device, followed by the IRPV algorithm with a 10 mV step of V_stop to reach the desired target, resulting in a maximum error of 3 μS (see Figure S7 in the Supporting Information). To test the accuracy of in situ MVM, we applied 0.1 V voltage pulses with a 1 ms pulse width to the array rows and measured the column currents with the oscilloscope, as shown in Figure 3b. Figure 3c shows a correlation plot of the measured currents as a function of the expected values, indicating an average error of 1.9%. The error with respect to the theoretical output increases with the current, which can be explained by the IR drop along the narrow top electrode (TE) lines, with an average resistance of 14 Ω (see Figure S8 in the Supporting Information).
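In an ideal crosspoint array, each column current is the sum of the device currents G_ij·V_i (Ohm's law at each device plus Kirchhoff's current law on each column). The sketch below computes these ideal currents and an average relative error of the kind used to compare measured and expected outputs. The linearly decreasing 8 × 8 map (600 to 180 μS) is a hypothetical example, and IR drop along the lines is neglected, so the sketch cannot reproduce the current-dependent error reported above.

```python
def crossbar_mvm(g_uS, v_in):
    """Ideal crosspoint MVM: column current I_j = sum_i G_ij * V_i, from Ohm's
    law at each device and Kirchhoff's current law on each column.
    g_uS[i][j] in uS, v_in[i] in V -> currents in uA. Line resistance
    (IR drop) is neglected, so real currents would tend to be lower."""
    n_rows, n_cols = len(g_uS), len(g_uS[0])
    return [sum(g_uS[i][j] * v_in[i] for i in range(n_rows)) for j in range(n_cols)]


def mean_relative_error(measured, expected):
    """Average of |measured - expected| / |expected| over all outputs."""
    return sum(abs(m - e) / abs(e) for m, e in zip(measured, expected)) / len(expected)


# Hypothetical 8 x 8 map decreasing linearly along rows and columns (600 to 180 uS)
g = [[600.0 - 30.0 * (i + j) for j in range(8)] for i in range(8)]
i_expected = crossbar_mvm(g, [0.1] * 8)   # 0.1 V read pulses on all rows
print([round(x, 3) for x in i_expected])
```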

In situ MVM and Power Iteration
To further support the MVM capability of our crossbar array, we performed a power-iteration experiment for computing the principal eigenvector of a coefficient matrix. Power iteration consists of repeated MVMs in which the current output of a given iteration is converted to a voltage and reapplied as the input of the next iteration. It thus makes it possible to assess the propagation of errors, due to IR drop as well as possible noise and drift of the RRAM conductance, over repeated MVM iterations. [10] As the matrix contains both positive and negative coefficients (see Figure S9a in the Supporting Information), a differential approach is needed to map the negative coefficients onto the positive conductance values of the RRAM devices. For that purpose, instead of adopting a fully differential scheme, [14] where each positive/negative weight is obtained as the difference between two device conductance values, we adopted a reference-column topology (see Figure S9b in the Supporting Information). [7,8] The power iteration was demonstrated with a 5 × 5 crossbar array programmed by IRPV between 0 and 300 μS, as shown by the conductance map in Figure 4a. The MVM was then computed in two steps: the first step executed the MVM on the first four columns, where the weights are shifted to positive values by an offset (Figure 4a), while the second step computed the offset currents from the fifth column only (Figure 4b). More details about the two-step MVM are reported in Figure S10 of the Supporting Information. The net MVM currents were then obtained by subtracting the two measurement results in software. The power iteration was then continued by converting the resulting current vector to the voltage domain and reapplying it to the crosspoint array columns for another two-step MVM operation.
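The reference-column mapping and the two-step MVM can be sketched as follows. The offset c = max(0, −min A), the scale factor k that maps the largest shifted weight to 300 μS, and the transposed layout (column j stores row j of A, so that the column currents yield A·v) are assumptions made for illustration; the layout actually used is shown in Figures S9b and S10 of the Supporting Information. The function names are hypothetical.

```python
def map_with_reference_column(a, g_max_uS=300.0):
    """Map a signed n x n matrix A onto non-negative conductances (uS).

    Every weight is shifted by c = max(0, -min(A)) and scaled by k so that the
    largest shifted weight maps to g_max_uS. Column j stores row j of A
    (transposed layout) so that the column currents give A @ v; an extra
    reference column (index n) holds k*c on every row."""
    n = len(a)
    c = max(0.0, -min(min(row) for row in a))
    span = max(max(row) for row in a) + c
    if span <= 0:
        raise ValueError("cannot scale: all shifted weights are zero")
    k = g_max_uS / span
    g = [[k * (a[j][i] + c) for j in range(n)] + [k * c] for i in range(n)]
    return g, k


def two_step_mvm(g, k, v):
    """Step 1 reads the n weight columns, step 2 reads the reference column;
    the subtraction and the rescaling by 1/k are done in software."""
    n = len(v)
    i_weights = [sum(g[i][j] * v[i] for i in range(n)) for j in range(n)]
    i_ref = sum(g[i][n] * v[i] for i in range(n))
    return [(i_j - i_ref) / k for i_j in i_weights]


a = [[2.0, -1.0], [-1.0, 3.0]]
g, k = map_with_reference_column(a)
print(two_step_mvm(g, k, [0.1, 0.2]))   # ideal result A @ v = [0.0, 0.5]
```

A single reference column costs one extra column per array instead of doubling every weight, at the price of a second read and a software subtraction.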
Figure 4d shows the resulting currents as a function of the iteration number: starting from an initial guess with a constant input vector of amplitude 0.1 V, the output rapidly converges to the stationary value in about 12 iterations. The corresponding eigenvalue, obtained as the normalization ratio between the output current and the input voltage, also shows rapid convergence, supporting the robustness of the crosspoint array for IMC. Figure 4e shows the correlation plot of the four measured eigenvector components, obtained by averaging the results from the 30th to the 50th iteration, as a function of the analytical values, indicating an error of only 2.3%.
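The power-iteration loop, including the eigenvalue estimate from the output/input ratio and the averaging over late iterations, can be sketched as follows. The `matvec` argument stands in for any MVM routine; here it is an ideal software product on a small symmetric example matrix, not the matrix of Figure S9a. Rescaling each output so that its largest component equals +0.1 V is one plausible way to perform the current-to-voltage normalization and is an assumption of this sketch, as are the function names.

```python
def power_iteration(matvec, n, v_max=0.1, iters=50, avg_from=30):
    """Power iteration driven by an MVM routine (ideal or in-memory).

    Each MVM output is rescaled so that its largest-magnitude component equals
    +v_max (volts) and fed back as the next input; the eigenvalue estimate is
    the ratio between that output component and v_max. Results are averaged
    over iterations avg_from .. iters-1 (0-based)."""
    if not 0 <= avg_from < iters:
        raise ValueError("need 0 <= avg_from < iters")
    v = [v_max] * n                          # constant initial guess
    lam_sum, vec_sum, count = 0.0, [0.0] * n, 0
    for t in range(iters):
        y = matvec(v)                        # one MVM
        peak = max(y, key=abs)               # signed output with the largest magnitude
        if peak == 0:
            raise ValueError("MVM output is zero; try another initial guess")
        lam = peak / v_max                   # output/input normalization ratio
        v = [v_max * yi / peak for yi in y]  # back to a v_max-scale input
        if t >= avg_from:
            lam_sum += lam
            vec_sum = [s + vi / v_max for s, vi in zip(vec_sum, v)]
            count += 1
    return lam_sum / count, [s / count for s in vec_sum]


def ideal_mvm(v, a=((2.0, 1.0), (1.0, 3.0))):
    """Ideal software MVM on a small symmetric example matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


lam, vec = power_iteration(ideal_mvm, 2)
print(lam, vec)
```

For this example matrix the estimates approach the dominant eigenvalue (5 + √5)/2 ≈ 3.618 and the eigenvector (0.618, 1), normalized to a largest component of 1.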

In situ PCA
To further support the good accuracy of the forming-free RRAM crossbar array, we demonstrated in situ PCA [14,15,36] for clustering the Iris dataset. [14] The Iris dataset contains 150 observations of three different species of the Iris genus, namely I. versicolor, I. setosa, and I. virginica, presented in Figure 5a and Figure S11 in the Supporting Information. Each dataset entry records the petal and sepal width and length of a specimen, together with its species acting as the classification label. Figure 5b shows the map of measured conductance in the crosspoint array, including the 4 × 4 covariance matrix of the dataset, shifted by an offset as for the power iteration in Section 3, and two additional rows and columns for offset correction and for deflation. To execute PCA on the Iris dataset, it is first necessary to extract the first and second principal eigenvectors of the covariance matrix, which can be achieved by power iteration. The first eigenvector, representing the first principal component (PC1), showed fast convergence to the stationary value in about 6 iterations and a final error of only 1.2% with respect to the analytical value. The computed eigenvector was then mapped into the sixth row and column in Figure 5b (also displayed in Figure S12 in the Supporting Information) to allow for the computation of the second eigenvector, representing the second principal component (PC2), by the deflation technique. [37] More details about the in situ calculation of the second eigenvector by the deflation algorithm are displayed in Figure S13 in the Supporting Information. The second eigenvector converged in about 3 iterations with a 10.7% error. The error is larger than that for the first eigenvector, mainly because the currents are relatively low, close to the instrument sensitivity. By averaging the current over more iterations, the error could be reduced to 5.1%.
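Deflation can be sketched with Hotelling's rule for a symmetric matrix, A2 = A − λ1·u1·u1^T/(u1^T·u1): once PC1 is known, its contribution is removed so that power iteration converges to PC2. In this sketch the correction term is computed in software, whereas in the array it is presumably supplied by the eigenvector stored in the additional row and column. Whether the hardware uses exactly this normalization is an assumption, the function names are hypothetical, and the 2 × 2 matrix is a toy example rather than the Iris covariance matrix.

```python
def make_mvm(a):
    """Ideal software MVM standing in for the crossbar read-out."""
    def mvm(v):
        return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
    return mvm


def power_iteration(mvm, n, v_max=0.1, iters=60):
    """Minimal power iteration; returns (eigenvalue, eigenvector with max |component| = 1)."""
    v, lam = [v_max] * n, 0.0
    for _ in range(iters):
        y = mvm(v)
        peak = max(y, key=abs)
        if peak == 0:
            raise ValueError("MVM output is zero")
        lam, v = peak / v_max, [v_max * yi / peak for yi in y]
    return lam, [vi / v_max for vi in v]


def deflate(mvm, lam1, u1):
    """Hotelling deflation of a symmetric matrix: v -> (A - lam1 u1 u1^T / |u1|^2) v.
    The projection onto u1, scaled by lam1, is subtracted in software here."""
    uu = sum(x * x for x in u1)
    def deflated(v):
        proj = sum(ui * vi for ui, vi in zip(u1, v)) / uu
        return [yi - lam1 * proj * ui for yi, ui in zip(mvm(v), u1)]
    return deflated


cov = [[2.0, 1.0], [1.0, 3.0]]               # toy symmetric matrix, not the Iris data
lam1, u1 = power_iteration(make_mvm(cov), 2)
lam2, u2 = power_iteration(deflate(make_mvm(cov), lam1, u1), 2)
print(lam1, u1, lam2, u2)
```

Any error in the stored u1 leaks into the deflated matrix, which is consistent with the larger error reported for the second eigenvector.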
Figure 5c shows the correlation plot of the extracted eigenvector components as a function of the software results, obtained with double-precision floating-point arithmetic, evidencing the good accuracy of our IMC results. To further assess the robustness of our IMC approach, we executed power iteration with different initial guesses as input vectors, always normalized to a maximum amplitude of 0.1 V. In general, a different initial guess may change the number of iterations needed to converge; however, it does not affect the final accuracy of the computed eigenvector (see Figure S14 in the Supporting Information). The first and second principal component scores, Score(PC1) and Score(PC2), were then computed by projecting each entry of the dataset onto the first and second PCs, respectively, via a scalar product. The computed scores were used for clustering and classification of the Iris dataset, as shown by the plot of Score(PC2) as a function of Score(PC1) in Figure 5d. Our PCs are close to those obtained with software precision, with an overlap of 98% between the projected data [38] using linear regression. In-memory deflation can be applied to high-dimensional datasets, or to datasets with more than two PCs, by repeatedly adding rows to the memory array where the eigenvectors are stored as they are computed. By appropriately scaling the corresponding row current by the matching eigenvalue, deflation steps can be accumulated in a single operation without the need to reprogram the entire matrix. Nonetheless, errors resulting from imprecise programming of the eigenvector during the storing procedure may accumulate and degrade the accuracy as deeper eigenvectors/PCs are computed. This leads to a trade-off between the programming accuracy and the number of eigenvectors that can be computed without excessive degradation.
Though the trade-off is typically problem dependent, previous studies [14] have shown that greedy programming algorithms with σ_G/G ≃ 5% can achieve accuracy comparable with floating-point implementations when the first two PCs are computed.
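The covariance matrix that is mapped into the array and the PC scores plotted in Figure 5d can be computed as sketched below. The sketch assumes mean-centred data and sample (1/(n−1)) normalization, since the preprocessing is not stated here, and rescales each PC (e.g., a max-normalized eigenvector from power iteration) to unit length before the scalar product. The three-point dataset and the function names are hypothetical examples, not the Iris data.

```python
import math


def center(data):
    """Subtract the per-feature mean (mean-centring is assumed here)."""
    n, d = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    return [[row[k] - means[k] for k in range(d)] for row in data]


def covariance(data):
    """Sample covariance (1/(n-1)) of the centred data: the matrix that would be
    offset-shifted and programmed into the crosspoint array."""
    x = center(data)
    n, d = len(x), len(x[0])
    return [[sum(r[p] * r[q] for r in x) / (n - 1) for q in range(d)] for p in range(d)]


def scores(data, pcs):
    """Score(PCk) for every entry: scalar product of the centred entry with PC k,
    after rescaling each PC to unit length."""
    units = []
    for u in pcs:
        norm = math.sqrt(sum(ui * ui for ui in u))
        units.append([ui / norm for ui in u])
    return [[sum(xi * ui for xi, ui in zip(row, u)) for u in units] for row in center(data)]


toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # toy data, not the Iris dataset
print(covariance(toy))                        # [[1.0, 2.0], [2.0, 4.0]]
print(scores(toy, [[0.5, 1.0]]))              # Score(PC1) for each entry
```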

Conclusion
In conclusion, we presented a forming-free RRAM device based on an ultrathin HfO2 switching layer as a promising candidate for large-scale passive RRAM crossbars. Forming-free operation avoids the large voltages required for forming, which might disturb devices on the same row/column as the selected device. The RRAM devices show tunable analogue programmability, which makes them extremely promising for accurate IMC. We demonstrated in situ MVM, power iteration for eigenvector computation, and PCA for clustering of the Iris dataset. The analog architecture reached 98% accuracy in clustering and recognition, close to FP64, thus supporting the forming-free RRAM array for accurate IMC.

Experimental Section
Device Fabrication: The memristors were fabricated with CMOS-compatible processes for back-end-of-line (BEOL) integration, using standard optical lithography, metal deposition, and lift-off processes. See Figure S1 in the Supporting Information for a sketch of the process flow. A SiO2 layer was grown by chemical vapor deposition (CVD) to reproduce the passivation layer. A Heidelberg MLA100 with 365 nm UV light was used for the lithographic steps, together with AZ5214E photoresist for negative processes. The substrate was etched by 70 nm with reactive ion etching (RIE) and subsequently filled with a 5 nm titanium adhesion layer and 70 nm of platinum for the bottom lines. Acetone-based wet lift-off released the bottom lines. A 70 nm SiO2 layer grown by CVD acted as the spacer between the electrodes. The small mismatch between the bottom lines and the substrate surface improved the planarity of the oxide spacer. Pad accesses and channels to host the RRAM elements were selectively opened in the oxide spacer via RIE through a lithographically defined protective mask. A final lithography step and a 10 s O2 plasma cleaning at 100 W prepared the sample for the oxide and top electrode deposition. The cleaning process was a critical step to ensure a suitable interface between platinum and oxide, removing organic residues from the oxide spacer channel. The HfO2 functional oxide and the titanium top electrodes were e-beam evaporated (EVATEK BAK640) without breaking the vacuum, with a chamber pressure lower than 3 × 10−6 mbar. The oxide evaporation rate was kept at 0.02 nm s−1 with a maximum chamber pressure of 10−5 mbar. A thin gold layer improved the bonding process.
Electrical Characterization: All DC electrical characterizations and programming procedures were carried out at room temperature using an Agilent HP4156C parameter analyzer. For the program-and-verify algorithms, the conductance values were extracted with 100 mV read sweeps. MVM and PCA experiments were performed using two AimTTi TGA12104 arbitrary waveform generators with 3 mV resolution and a Tektronix MSO58 oscilloscope with 600 nA current resolution at a 50 Ω load. The devices were measured in a probe station. The error was calculated as the maximum relative error on a single component between the measured and the expected values.
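Written out, and assuming that each component's error is normalized by its expected value, the error metric reads as follows, where y_i^meas and y_i^exp denote the measured and expected i-th output components (symbols introduced here for illustration):

```latex
\varepsilon = \max_{i} \frac{\left| y_i^{\mathrm{meas}} - y_i^{\mathrm{exp}} \right|}{\left| y_i^{\mathrm{exp}} \right|}
```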

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.