Gradient Descent on Multilevel Spin–Orbit Synapses with Tunable Variations

Neuromorphic computing using multilevel nonvolatile memories as synapses offers opportunities for future energy‐ and area‐efficient artificial intelligence. Among these memories, artificial synapses based on current‐induced magnetization switching driven by spin–orbit torques (SOTs) have attracted great attention recently. Herein, the gradient descent algorithm, a primary learning algorithm, implemented on a 2 × 1 SOT synaptic array is reported. Successful pattern classifications are experimentally realized through the tuning of cycle‐to‐cycle variation, linearity range, and linearity deviation of the multilevel SOT synapse. Also, a larger m × n SOT synaptic array with m controlling transistors is proposed and it is found that the classification accuracies can be improved dramatically by decreasing the cycle‐to‐cycle variation. A way for the application of spin–orbit device arrays in neuromorphic computing is paved and the crucial importance of the cycle‐to‐cycle variation for a multilevel SOT synapse is suggested.

Inspired by the highly efficient human brain, where information is stored in plastic synapses, neuromorphic computing architectures using nonvolatile memories [1][2][3] as artificial synapses are expected to achieve ultralow-power intelligent systems. [4][5][6][7] Spin-orbit torque (SOT) devices, [8,9] with inherent nonvolatility and radiation resistance, subnanosecond switching dynamics, unlimited endurance, excellent stability, and verified complementary metal-oxide-semiconductor (CMOS) compatibility, can also exhibit multilevel current-induced magnetization switching, which can be used to emulate synaptic plasticity. [10][11][12][13][14][15][16][17][18] Beyond the device level, artificial and spiking neural networks [19,20] consisting of spin-orbit synapses have been applied for various purposes such as associative memory, [21] deep belief learning, [22] and on-the-fly learning. [23] According to studies on other emerging nonvolatile memories, especially resistive switching random access memory (RRAM) and phase change memory (PCM), [24,25] the variation and linearity of the device are crucial parameters for neuromorphic applications. [1,26][27][28] However, few studies have addressed these parameters in multilevel spin-orbit devices, even though they could be important for developing neural networks based on SOT synaptic arrays.
Recently, tunable multilevel SOT-induced out-of-plane magnetization switching was observed by adjusting the magnitude of an in-plane magnetic field, [10] which provides a powerful way to investigate how synaptic training behaviors depend on device properties. In this study, we demonstrate prototype pattern classifications based on tunable multilevel SOT synaptic arrays using the basic gradient descent algorithm, an iterative algorithm that optimizes the result by seeking the minimum of a differentiable function (i.e., the cost function, whose gradient determines the target values of the updated synaptic weights). First, various indicators of the multilevel SOT synapse, including the cycle-to-cycle variation (the variation of the updated states between different operating cycles for an identical initial state and update input), [29] the linearity range, and the linearity deviation, were characterized as a function of the external in-plane magnetic field H_x. Then, the gradient descent algorithm was experimentally carried out on a 2 × 1 SOT synaptic array under varying H_x. The training results were analyzed using the aforementioned multilevel indicators. Finally, an extended m × n SOT synaptic array with m current-controlled transistors is proposed, and the dependence of the classification accuracy on the cycle-to-cycle variation is simulated.
As schematically shown in Figure 1a, stacks of Si/SiO₂ substrate/Pt(3 nm)/Co(0.8 nm)/AlO_x(2 nm) were patterned into Hall bars with a cross region of 20 × 20 μm². In the presence of an external magnetic field H_x, out-of-plane magnetization (m_z) switching induced by current pulses I_x was characterized by detecting the anomalous Hall resistance (R_H) between successive pulses with a measuring current of 100 μA. Consistent with our previous work, multilevel SOT-induced m_z switching and a reduction of the maximum current-induced Hall resistance (±R_max) are obtained with increasing H_x, as shown in Figure 1b. The multilevel magnetization is attributed to the magnetic-field-induced pinning potential difference, which causes the reversed domain area to enlarge gradually as the current-pulse intensity increases. [10] To characterize the cycle-to-cycle variation of the multilevel SOT device quantitatively, repeated experiments of single-pulse-induced magnetization switching from the −R_max state were carried out under various H_x. The measured R_H in the linearity ranges of single-pulse-induced switching at different H_x are shown in Figure 1c, where the minimum and maximum Hall resistances of a fitting line are denoted R_a and R_b, respectively. In this article, synaptic training is conducted using such linear fitting lines instead of the original measured R_H-I_x data. Therefore, four variables are introduced to describe the variation and linearity deviation of an SOT synapse. The first two are the averaged absolute switching variation |ΔR_var| and the averaged absolute linearity deviation |ΔR_dev|, where the former describes the average height of the error bars and the latter describes the average maximum absolute difference between the measured values and the fitting line.
Then, the averaged relative variation η_var and the averaged relative linearity deviation η_dev can be obtained as
$$\eta_{\mathrm{var}} = \frac{|\Delta R_{\mathrm{var}}|}{2R_{\mathrm{max}}}$$
and
$$\eta_{\mathrm{dev}} = \frac{|\Delta R_{\mathrm{dev}}|}{R_b - R_a}$$
where η_var is the relative variation with respect to the total current-induced switching range (2R_max), and η_dev is the relative linearity deviation with respect to the linearity range (R_b − R_a). As shown in Figure 1d, both 2R_max and |ΔR_var| decrease with increasing H_x. The decrease in 2R_max can be ascribed to the gradually tilted m_z. [10] The mechanism underlying the decrease in |ΔR_var| is not yet clear, but it may be attributed to the growing pinning effect, [10] which may stabilize the current-driven domain wall motion and thereby enhance the cycle-to-cycle reproducibility. Unlike the monotonic decrease in 2R_max, R_b − R_a as a function of H_x initially increases for H_x < 2500 Oe and then decreases when H_x exceeds 2900 Oe, as shown in Figure 1e. The enlargement of R_b − R_a from 2100 to 2500 Oe is consistent with the improved multilevel magnetization switching. [10] Meanwhile, |ΔR_dev| decreases as H_x increases in most cases, except at 2500 and 2900 Oe. Owing to the misalignment between the fitting line and the measured data points, |ΔR_dev| is unexpectedly large at 2500 and 2900 Oe, as shown in Note 1, Supporting Information. As a result, both η_var and η_dev in Figure 1f show "U"-shaped trends when H_x increases from 2100 to 3500 Oe, with bulges in η_dev at 2500 and 2900 Oe. These "U"-shaped trends enable us to tune the variation and deviation and to optimize training results by controlling H_x, which will be discussed later.
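As an illustration of how the four indicators relate to raw data, the following is a minimal Python sketch, not the authors' analysis code. It assumes that the error-bar height is the max-minus-min spread of repeated readings (a standard deviation would be an equally plausible choice), and the function names, argument names, and sample readings are all hypothetical.

```python
from statistics import mean


def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def synapse_indicators(currents, r_hall_cycles, r_max):
    """Return eta_var and eta_dev (as fractions) for one field value.

    currents      : pulse amplitudes inside the linearity range
    r_hall_cycles : r_hall_cycles[k] holds the repeated R_H readings at currents[k]
    r_max         : maximum Hall resistance, so the full range is 2 * r_max
    """
    means = [mean(cycle) for cycle in r_hall_cycles]
    slope, intercept = linear_fit(currents, means)
    fit = [slope * i + intercept for i in currents]

    # Assumption: error-bar height = max - min of the repeated readings.
    dr_var = mean([max(c) - min(c) for c in r_hall_cycles])
    # Largest |measured - fit| at each current, averaged over all currents.
    dr_dev = mean([max(abs(r - f) for r in c) for c, f in zip(r_hall_cycles, fit)])

    r_a, r_b = min(fit), max(fit)  # end points of the fitting line
    return {
        "eta_var": dr_var / (2 * r_max),
        "eta_dev": dr_dev / (r_b - r_a),
        "linearity_range": r_b - r_a,
    }


# Made-up readings (ohms) at three pulse currents, two cycles each.
result = synapse_indicators(
    [1.0, 2.0, 3.0], [[0.9, 1.1], [1.9, 2.1], [2.9, 3.1]], r_max=2.0
)
print(result)
```

Multiplying the returned fractions by 100 gives percentages comparable to the values quoted for η_var and η_dev.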
As schematically shown in Figure 2a, a 2 × 1 synaptic array comprising two such tunable multilevel SOT devices was used to carry out pattern classification using the gradient descent algorithm, where the training and testing pattern samples (X_1, X_2) are shown in Figure 2b. The array output Z, the logistic function Y (the activation function, which controls the output based on Z and mimics the step-function output of human neurons using a differentiable sigmoid function), and the cost function J (the error between the label value and the actual output value) of the pattern samples are obtained by measuring the R_H-related synaptic weights W_i of the two SOT devices. Their relations can be described by the following equations
$$Z = \sum_i W_i X_i + b$$
$$Y = \frac{1}{1 + e^{-Z}}$$
$$J = -y \ln Y - (1 - y)\ln(1 - Y)$$
where b is a bias constant and y is the label value of the pattern (an integer ranging from 0 to 9). Then, the desired changes in W_i are calculated by gradient descent as
$$\Delta W_i = -\alpha \frac{\partial J}{\partial W_i}$$
where α is the learning rate. Finally, the values of ΔW_i determine the direction and intensity of the current pulses I_i on the basis of the fitting lines in Figure 1c. Note that the updated weight (represented by the Hall resistance) also depends on the stochastic deviation ΔR_dev, as discussed in Note 2, Supporting Information. The aforementioned training procedure is shown in Figure 2c. In brief, a successful pattern classification requires the lowest possible J (→ 0), as well as significant differences between the Y values for the training (Y → 1) and testing (Y → 0) pattern samples. According to this principle, the training results for α = 0.5 under various H_x, shown in Figure 2d-g, can be qualitatively interpreted by the H_x-dependent variations, linearity ranges, and linearity deviations shown in Figure 1d-f as follows. First, the cycle-to-cycle variation is very important to gradient descent training, as training failed for η_var > 5.5% at 2100 and 2300 Oe. Second, the linearity range is also an important factor: the successful training at 2500 Oe has η_var (≈4%) and η_dev (≈5.5%) similar to those of the failed case at 3500 Oe, but a nearly tripled R_b − R_a (≈1.2 Ω). The effect of the linearity range is further confirmed by the case at 3300 Oe. Although the synaptic device at 3300 Oe has small η_var (≈3%) and η_dev (≈3.5%), it also has the slowest training process and the weakest identifiability (see cyan triangles in Figure 2d-g) among the successful cases from 2500 to 3300 Oe because of its small R_b − R_a (≈0.7 Ω). Third, for 2700 and 2900 Oe, which both have small η_var (<3.5%) and large R_b − R_a (1.1-1.2 Ω), the best training result is obtained only at 2700 Oe owing to its smaller η_dev (≈4%) compared with that at 2900 Oe (>5%).
The corresponding evolutions of W_i and R_Hi in each SOT synapse are shown in Note 2, Supporting Information, and two other training results for larger learning rates of α = 1 and α = 2 are shown in Note 3, Supporting Information, where identical trends versus H_x are observed. As a general understanding, we attribute the strong correlation between larger linearity ranges and better training results to the wider adjustable range of the synaptic weights.
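The training loop described above can be illustrated with a small numerical sketch. For a logistic output and the cross-entropy cost J, the chain rule gives ∂J/∂W_i = (Y − y)X_i, so the target update is ΔW_i = α(y − Y)X_i. The Python code below is a simplified software model under stated assumptions, not the experimental procedure: weights are dimensionless and clipped to a hypothetical range W_range, device noise is drawn uniformly from ±η_var × W_range/2, only synapses that receive a nonzero update are perturbed, and the two patterns and their labels are invented for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic activation Y = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def train_2x1(samples, alpha=0.5, eta_var=0.0, w_range=8.0,
              bias=0.0, epochs=50, seed=0):
    """Gradient descent on a two-synapse array with noisy weight updates.

    samples : list of ((X1, X2), y) pairs with binary labels y
    eta_var : relative cycle-to-cycle variation (fraction of w_range)
    Returns the final weights and the mean cost J per epoch.
    """
    rng = random.Random(seed)
    half = w_range / 2          # weights are confined to [-half, +half]
    w = [0.0, 0.0]
    costs = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            z = w[0] * x[0] + w[1] * x[1] + bias
            y_out = sigmoid(z)
            total += -y * math.log(y_out) - (1 - y) * math.log(1 - y_out)
            for i in range(2):
                target = alpha * (y - y_out) * x[i]   # -alpha * dJ/dW_i
                if target == 0.0:
                    continue    # assumption: no pulse, so no added noise
                noise = rng.uniform(-eta_var * half, eta_var * half)
                w[i] = max(-half, min(half, w[i] + target + noise))
        costs.append(total / len(samples))
    return w, costs


# Hypothetical patterns: (1, 0) labelled 1 and (0, 1) labelled 0.
patterns = [((1, 0), 1), ((0, 1), 0)]
w_ideal, j_ideal = train_2x1(patterns, eta_var=0.0)
w_noisy, j_noisy = train_2x1(patterns, eta_var=0.05)
print(j_ideal[0], j_ideal[-1], j_noisy[-1])
```

Comparing the cost traces for different `eta_var` and `w_range` values gives a qualitative feel for why both the variation and the linearity range matter, although the sketch makes no attempt to reproduce the measured curves.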
Furthermore, we propose an m × n SOT synaptic array for recognizing larger patterns. As shown in Figure 3a, n columns of SOT synapses are connected in parallel, and m transistors are designed to switch the input current pulse I_mn to a given synapse. For practical applications, H_x can be replaced using SOT devices with optimized built-in in-plane magnetic fields, which can be achieved using either a static in-plane magnetic field [30] or a "T-type" interlayer exchange coupling between an in-plane and an out-of-plane magnetized layer. [10,31][32][33] Other magnetic-field-free SOT switching strategies include wedged layers, [34] asymmetric spin currents, [35][36][37][38][39] and competing spin currents, [40] among others, [41] all of which have potential for field-free multilevel magnetization switching in the future. The feasibility of this proposal is preliminarily verified by measuring the gate-voltage-controlled multilevel magnetization switching of an SOT device driven by a thin-film transistor (TFT). Moreover, the variation dependence of the gradient descent algorithm carried out on the proposed SOT synaptic array was investigated by examining the simulated classification accuracy of a one-hidden-layer backpropagation neural network, as shown in Figure 3b. The hidden and output layers use the same logistic function as Equation (3), and the synaptic weights are updated using a gradient descent method similar to Equation (6), with stochastic deviations of ΔW_i ranging from −η_var × W_range/2 to +η_var × W_range/2. Note that η_var here plays the same role as η_dev, because we assume that the devices have ideal linearity. The classification accuracy is defined as the percentage of successfully recognized testing pattern samples among all testing samples. The success of a recognition is judged by comparing the label value of each sample with the index of the corresponding highest output Y_OL in Figure 3b.
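The two simulation ingredients described here, the bounded stochastic deviation of each weight update and the accuracy criterion, can be sketched in a few lines of Python. This is an illustrative sketch only: the uniform distribution within the stated interval is an assumption (the text gives only the bounds), and the function names and example numbers are hypothetical.

```python
import random


def noisy_delta(delta_w, eta_var, w_range, rng):
    """Ideal update plus a deviation in [-eta_var*w_range/2, +eta_var*w_range/2].

    Assumption: the deviation is uniformly distributed within that interval.
    """
    bound = eta_var * w_range / 2
    return delta_w + rng.uniform(-bound, bound)


def classification_accuracy(outputs, labels):
    """Percentage of samples whose highest output index equals the label.

    outputs : one list of output-layer activations (Y_OL) per test sample
    labels  : integer label of each test sample
    """
    correct = 0
    for ys, label in zip(outputs, labels):
        predicted = max(range(len(ys)), key=ys.__getitem__)  # argmax index
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


rng = random.Random(0)
# A 2% variation on a weight range of 2.0 perturbs each update by at most 0.02.
deltas = [noisy_delta(0.1, eta_var=0.02, w_range=2.0, rng=rng) for _ in range(1000)]
# Three made-up two-class outputs; the third sample is misclassified.
accuracy = classification_accuracy([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [1, 0, 0])
print(min(deltas), max(deltas), accuracy)
```

In a full simulation, `noisy_delta` would wrap every backpropagation weight update of the one-hidden-layer network, and `classification_accuracy` would be evaluated on the held-out test samples after training.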
In this article, pattern classifications of two typical datasets, i.e., a small-image version (8 × 8 pixels) of handwritten digits from the "Optical Recognition of Handwritten Digits" (ORHD) dataset [42,43] and a large-image version (28 × 28 pixels) of handwritten digits from the "Modified National Institute of Standards and Technology" (MNIST) dataset, [43][44][45] are simulated under various η_var, as shown in Figure 3c,d, respectively. More details of the simulations can be found in Note 4, Supporting Information. The classification accuracies of both datasets drop substantially from ≈95% to 80-90% (ORHD ≈90%; MNIST ≈80%) when the variation increases from 0% to 2%, and fall further to ≈50% when the variation reaches 8%, as shown in Figure 3e, suggesting the great importance of the cycle-to-cycle variation for large-scale multilevel SOT synaptic arrays. To improve the variation and linearity of a multilevel SOT device, a uniform pinning potential in the domain wall motion process is required. In addition, emerging technologies in skyrmion synapses may provide new strategies, where a linear weight update is expected because it depends on the number of accumulated skyrmions. [46]
In conclusion, we have carried out pattern classification using the gradient descent algorithm on a 2 × 1 SOT synaptic array under various multilevel switching conditions. It is shown that training results are improved by decreasing the cycle-to-cycle variation and linearity deviation and by increasing the linearity range. We also proposed a larger m × n SOT synaptic array with m transistors, and the simulated classification accuracy improves significantly as the cycle-to-cycle variation decreases. Our findings are an important step toward the application of spin-orbit synaptic arrays in neuromorphic computing.
www.advancedsciencenews.com www.advintellsyst.com

Experimental Section
Device Fabrication: First, Hall bars were patterned onto Si(0.5 mm)/SiO₂(190 nm) wafers by lithography. Then, Pt(3 nm)/Co(0.8 nm)/AlO_x(2 nm) stacks were deposited into the Hall bar patterns at room temperature by magnetron sputtering. Direct current (DC) sputtering was used to deposit the Pt and Co layers, while radio-frequency (RF) sputtering was used to deposit the AlO_x layer. The chamber base pressure was less than 2 × 10⁻⁶ Pa, and Ar gas was used for sputtering. The chamber pressures during the DC and RF sputtering were 1.067 × 10⁻¹ Pa and 2.666 × 10⁻² Pa, respectively. No magnetic field was applied during the sputtering. The deposition rates for the Pt, Co, and AlO_x layers were controlled to be ≈0.021, 0.013, and 0.0018 nm s⁻¹, respectively.
Device and Array Measurements: All the electrical measurements were carried out at room temperature using a Keithley 2602B as the current source and a Keithley 2182A as the nanovoltmeter. Note that there are two types of currents, i.e., the relatively large pulse currents I_x for driving SOT-induced magnetization switching, and the relatively small (100 μA) measuring DC for detecting the anomalous Hall voltage and thereby obtaining the anomalous Hall resistance for each device.
Neural Network Simulations: Details of the neural network simulations can be found in Note 4, Supporting Information.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.