Stochastic spin-orbit-torque synapse and its application in uncertainty quantification

Stochasticity plays a significant role in the low-power operation of a biological neural network. In an artificial neural network (ANN), stochasticity also contributes to critical functions such as uncertainty quantification (UQ), which estimates the probability that a prediction is correct. UQ is vital for cutting-edge applications, including medical diagnostics, autopilots, and large language models. Thanks to its high computing speed and low dissipation, a spin-orbit-torque (SOT) device exhibits significant potential for implementing UQ. However, the application of stochastic SOT devices to UQ has so far remained unexplored. In this study, based on SOT-induced stochastic magnetic domain wall (DW) motion with varying velocity, we fabricated an SOT synapse that emulates stochastic weight updates following the Spike-Timing-Dependent-Plasticity (STDP) rule. Furthermore, we set up a stochastic Spiking-Neural-Network (SNN), which, compared to its deterministic counterpart, demonstrates a clear advantage in quantifying uncertainty when diagnosing the type of breast tumor (benign or malignant).


Introduction
Stochasticity plays a crucial role in the low-power operation of a biological neural network (BNN). Beyond the stochastic firing of neurons, the synapses that modulate the strength of connections between pre-neurons and post-neurons also behave non-deterministically. The random events associated with a biological synapse stem from the spontaneous opening of intracellular Ca2+ stores, synaptic Ca2+ channel noise, and the random positioning of vesicles with a broad size distribution [1]. This stochasticity is discernible as distinct outputs under identical input stimulation [Fig. 1(a)]. Inspired by the BNN, the Artificial Neural Network (ANN) has been widely harnessed to execute various functions in artificial intelligence, including recognition, analysis, and inference. In a deterministic ANN, noise in an electronic device can be detrimental to the accuracy of predictions. However, judiciously leveraging noise can offer crucial functionalities that a conventional ANN lacks. For example, moderate noise can expedite the training of an ANN and can be utilized for probabilistic computing [2]. In addition, grounded in Bayesian inference, controlled noise also plays a pivotal role in uncertainty quantification (UQ) [3,4], which includes the calculation of posterior probability based on observational outcomes and the estimation of uncertainty in predictions [Fig. 1(b)]. UQ holds particular significance for cutting-edge applications such as medical diagnostics, autopilots, and large language models [5-7].
Among various types of ANNs, the Spiking Neural Network (SNN) holds unique potential for implementing UQ because the presence or absence of a spike in an SNN corresponds to sampling from a binary random variable [8]. Moreover, its event-driven, asynchronous, spike-based parallel processing results in low power consumption [9], a crucial aspect for UQ, which demands substantial computational effort. To date, UQ using SNNs with stochastic neurons has been widely demonstrated [10,11]. In addition to neurons, stochastic synapses also play a critical role in UQ. For example, it has been theoretically established that spiking artificial neurons in a noisy synapse environment can perform Bayesian inference based on incomplete observations [12]. Stochastic synapses have also found application in neural sampling machines for approximating Bayesian inference through Monte Carlo sampling [13].
Different electronic devices, such as memristors, ferroelectric devices, and spintronic devices, can be exploited to emulate an artificial synapse in an ANN. In a Spin-Orbit-Torque (SOT) device, current pulses can induce a continuous variation of the anomalous Hall resistance (R_AHE), characterized by ultra-low dissipation (fJ/bit) and rapid processing (ps) [Fig. 1(c)] [14,15]. Consequently, the SOT device holds promise for stochastic neuromorphic computing, which requires computational power with low energy consumption. Despite increased attention to the stochasticity of SOT devices in recent years [16,17], their application in UQ remains largely unexplored.
From a microscopic perspective, the modulation of R_AHE in an SOT device is based on current-driven domain wall (DW) motion [Fig. 1(c)]: R_AHE changes linearly when a DW moves at a uniform velocity and nonlinearly when the DW velocity varies [18,19]. Inducing uniform DW motion requires driving the DW into a narrow strip. In a more general scenario, however, DW motion comprises distinct stages with different velocities, including fast motion immediately after depinning and subsequent slower motion as the DW approaches the edge [Fig. 1(d)]. DW motion with varying velocity leads to a nonlinear variation of R_AHE, which corresponds to a nonlinear weight update for a synapse in an SNN. Furthermore, owing to thermal fluctuations and the random distribution of pinning centers, DW motion also exhibits stochastic behavior [20,21], contributing to stochasticity in weight updates.
In this paper, leveraging SOT-induced DW motion with varying velocities, we designed and fabricated a stochastic synapse to emulate nonlinear weight updates in an SNN. Employing this SOT synapse, we established a stochastic SNN for the classification of breast tumors and assessed the uncertainty of its predictions. In contrast to a deterministic neural network, the SNN equipped with the stochastic SOT synapse is able to gauge the uncertainty of its prediction outcomes.

Results and discussions
To determine the optimal structure of an SOT device suitable for nonlinear weight update, we first carried out micromagnetic simulations of SOT-induced magnetization switching in Hall bars with different cross-region shapes (Fig. 2a). Building upon the simulation results, we fabricated an SOT Hall bar made of a Ta(3.0)/Pt(5.0)/Co(1.15)/SiN(7.0) multilayer (the numbers in parentheses denote layer thicknesses in nanometers) deposited on a Si/SiO2 substrate via magnetron sputtering (Fig. 2b; refer to the "Methods" section for more details). The AHE measurement revealed that the film displays perpendicular magnetic anisotropy with a coercivity of approximately 100 Oe (Fig. 2b). We also measured R_AHE as a function of current (I) for different maximum values of I. Here we report the variation of R_AHE (ΔR_AHE) relative to its initial value before any magnetic field or current was applied. A series of stable ΔR_AHE values persists after the current is removed, forming the basis for the nonvolatile multi-states of a memristor (Fig. 2c). Additionally, we recorded ΔR_AHE as a function of pulse number at different I ranging from 10 to 30 mA (Fig. 2d). For I below 20 mA, the variation of ΔR_AHE was minimal, indicating negligible weight updates when the stimulus was below the threshold. At a current as high as 30 mA, however, ΔR_AHE rapidly approached saturation after the initial pulse injection. A continuous nonlinear variation of ΔR_AHE occurred at 25 mA, rendering this current suitable for nonlinear weight updates.
We conducted additional measurements of ΔR_AHE under a series of current pulses to characterize the stochasticity (Figs. 2e-2h). In Figs. 2e and 2g, the black curves represent the mean values, while the red error bars denote the standard deviation (σ). Under a fixed current amplitude (25 mA) and pulse width (50 μs), ΔR_AHE exhibits a Gaussian distribution with σ varying between 10 and 30 mΩ (the inset of Fig. 2e illustrates a representative distribution of ΔR_AHE at the 15th pulse). Notably, σ increases significantly during the first few pulses in both the LTP and LTD stages and stabilizes as ΔR_AHE approaches saturation (Fig. 2f). A similar nonlinear variation in ΔR_AHE and non-monotonic changes in σ were observed under a series of current pulses with varied amplitudes (Figs. 2g and 2h). However, σ in this case appears smaller than that under a fixed pulse amplitude (Figs. 2e and 2f).

To elucidate the microscopic mechanism behind the nonlinear variation of ΔR_AHE with current pulses, we observed the SOT-induced DW motion using a Magneto-Optical Kerr Effect (MOKE) microscope (refer to the "Methods" section for more details). During the first 20% of the current pulses in the LTP procedure, approximately half of the cross region undergoes rapid magnetization switching through swift DW motion immediately after depinning (Fig. 3a). Afterwards, the remaining magnetization in the cross region is gradually switched during the subsequent 60% of the pulses. In the LTD procedure, because the magnetization was not fully switched in the LTP stage, the magnetization in the cross region switched easily during the first 40% of the pulses. However, numerous additional pulses were still required to fully switch the magnetization in this region (Fig. 3b). The observed magnetization switching process agrees with the results of the micromagnetic simulations (Figs. 3c and 3d). Magnetization switching based on DW motion with varying velocity thus accounts for the nonlinear variation of ΔR_AHE with the number of current pulses.
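The pulse-response statistics described above (a nonlinear, saturating mean with Gaussian scatter from pulse to pulse) can be illustrated with a small Monte Carlo sketch. This is only a toy model under stated assumptions: the saturating-exponential form of the mean, the parameters `r_sat` and `k`, and the constant `sigma` of 20 mΩ (chosen inside the reported 10 to 30 mΩ range) are hypothetical and are not fitted to the measured data; in the experiment σ itself varies with pulse number.

```python
import math
import random


def simulate_delta_r(n_pulses, r_sat=1.0, k=0.3, sigma=0.02, seed=None):
    """Return one noisy trace of ΔR_AHE (ohms), one value per pulse.

    Assumed model (not taken from the paper): the mean follows
    r_sat * (1 - exp(-k * n)) and every reading carries Gaussian noise
    with a constant standard deviation sigma.
    """
    rng = random.Random(seed)
    trace = []
    for n in range(1, n_pulses + 1):
        mean = r_sat * (1.0 - math.exp(-k * n))
        trace.append(rng.gauss(mean, sigma))
    return trace


def mean_and_std(samples):
    """Population mean and standard deviation of a list of numbers."""
    m = sum(samples) / len(samples)
    var = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, math.sqrt(var)


# Repeat the pulse train 200 times and inspect the spread at the 15th pulse.
trials = [simulate_delta_r(30, seed=i) for i in range(200)]
mean15, std15 = mean_and_std([trace[14] for trace in trials])
```

Collecting the values at one pulse index across many repetitions, as done for `mean15` and `std15`, mirrors how a distribution at a given pulse (such as the one in the inset of Fig. 2e) could be summarized.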

Figure 4 (continued). (d) Variation of R_AHE with respect to Δt for the triangle-wave current pulses illustrated in (c).
In an SNN, the weight update follows the Spike-Timing-Dependent-Plasticity (STDP) rule, in which the weight varies exponentially with the time difference of spike arrival between the pre-synaptic and post-synaptic neurons (Δt). Typically, the spiking pulses of both the pre- and post-synaptic neurons can take the form of triangle waves, resulting in an effective square-wave pulse that varies in both width and amplitude (Fig. 4c) [22]. Alternatively, the pulses of the pre- and post-synaptic neurons can take the form of square waves, with the polarity controlled by the time sequence of the pre- and post-synaptic spikes (Fig. 4a) [23,24]. By adjusting the width and amplitude of the current pulse, we verified that the variation of ΔR_AHE with Δt adheres to the exponential STDP rule (Figs. 4b and 4d).
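For readers unfamiliar with the exponential STDP rule, the following minimal sketch shows its commonly used textbook form. The sign convention (Δt taken as post-synaptic minus pre-synaptic spike time), the amplitudes `a_plus` and `a_minus`, and the time constants `tau_plus` and `tau_minus` are illustrative assumptions, not values extracted from Figs. 4b and 4d.

```python
import math


def stdp_delta_w(delta_t, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Exponential STDP weight change for a spike-time difference delta_t.

    Assumed convention: delta_t = t_post - t_pre in milliseconds.
    Positive delta_t (pre before post) potentiates, negative depresses.
    All parameter values are placeholders for illustration.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


# The magnitude of the update decays as the spikes move apart in time.
updates = [stdp_delta_w(dt) for dt in (-50.0, -5.0, 5.0, 50.0)]
```

In a device-level model, the returned value would be scaled to the ΔR_AHE range that the current pulse can produce.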
To evaluate the efficiency of UQ with the SOT synapse, we designed a stochastic SNN composed of deterministic neurons and stochastic SOT synapses to classify breast tumors (benign or malignant), implemented in Python 3.10 (Fig. 5). For comparison, we also employed a deterministic SNN counterpart (with zero standard deviation in the weight updates) for the same application. We mapped ΔR_AHE to a weight value ω between 0 and 2 and modeled the variation of ω as a function of Δt based on the exponential STDP rule. We also fitted the standard deviation σ_ω of the weight update [σ_ω = (σ/ΔR_AHE)ω] as a function of ω. The breast tumor data, sourced from the Wisconsin Breast Cancer Data [25], comprised 699 entries, with 399 used for training and 300 for testing. Each entry contains nine tumor features (clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses) and one of two classes (benign or malignant). The network has 90 input neurons and 2 output neurons: each tumor feature is encoded by 10 input neurons, and each output neuron represents a tumor type. The output neurons follow the leaky integrate-and-fire principle, and the network follows the winner-take-all rule, ensuring that one neuron cannot fire while the other is active.
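The stochastic weight model described above, with σ_ω = (σ/ΔR_AHE)ω, can be sketched as a Gaussian draw around the mean weight. In this sketch the function name `sample_weight`, the parameter `rel_sigma` (standing for the ratio σ/ΔR_AHE), and the clipping of samples to the range 0 to 2 are assumptions made for illustration; the paper fits σ_ω as a function of ω, which is not reproduced here.

```python
import random

W_MIN, W_MAX = 0.0, 2.0  # weight range stated in the text


def sample_weight(w_mean, rel_sigma, rng):
    """Draw one stochastic weight around w_mean.

    rel_sigma plays the role of sigma / ΔR_AHE, so the standard deviation
    of the weight is sigma_w = rel_sigma * w_mean. Clipping to
    [W_MIN, W_MAX] is an assumption of this sketch.
    """
    sigma_w = rel_sigma * w_mean
    w = rng.gauss(w_mean, sigma_w)
    return min(W_MAX, max(W_MIN, w))


rng = random.Random(0)
samples = [sample_weight(1.0, 0.1, rng) for _ in range(1000)]

# Setting rel_sigma to zero recovers the deterministic counterpart.
deterministic = sample_weight(1.0, 0.0, random.Random(1))
```

Using the same function with `rel_sigma = 0` for the deterministic network keeps the two networks identical apart from the synaptic noise, which is the comparison the text describes.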
We first reduced the original 9-dimensional data to 2 dimensions using principal component analysis (PCA) [26]. Subsequently, we partitioned the 300 test data points evenly into 23 regions and visualized their distribution in a coordinate system defined by the first and second principal components (Fig. 5a). Data points located outside the region delineated by the two green dashed lines unequivocally correspond to either benign or malignant tumors. Within this region, however, the data points for the two tumor types are intermingled.
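The dimensionality reduction step can be illustrated with a dependency-free PCA sketch that projects data onto its first two principal components. It uses power iteration with deflation on the covariance matrix, which is a choice made here for compactness and is not necessarily the procedure used by the authors; the helper names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(mat, v):
    return [dot(row, v) for row in mat]


def covariance(rows):
    """Return (sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenvector(mat, iters=500):
    """Power iteration from an arbitrary non-uniform start vector.

    If the matrix is all zeros the start vector is returned unchanged.
    """
    v = [float(i + 1) for i in range(len(mat))]
    for _ in range(iters):
        w = mat_vec(mat, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto the first two principal components."""
    cov, centered = covariance(rows)
    d = len(cov)
    v1 = leading_eigenvector(cov)
    lam1 = dot(v1, mat_vec(cov, v1))
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2 = leading_eigenvector(deflated)
    return [(dot(c, v1), dot(c, v2)) for c in centered]


# Toy 3-feature data standing in for the 9-feature tumor records.
toy = [[1.0, 2.0, 0.1], [2.0, 4.0, -0.1], [3.0, 6.0, 0.1], [4.0, 8.0, -0.1]]
projected = pca_2d(toy)
```

The sign of each principal component is arbitrary, so projected coordinates may be mirrored relative to another PCA implementation.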
After fully training the SNN, we used the stochastic SNN to predict the type of breast tumor 100 times for each test datum. The prediction probabilities in each region were then averaged. The probability of classifying a tumor as benign (B) or malignant (M) was estimated from the spiking activity of the two output neurons.

For the deterministic SNN, a prediction entropy of zero implies that the network provides identical results across the 100 predictions, irrespective of their correctness (the bottom panel of Fig. 5d). Such uniformity could potentially lead to misdiagnosis. In contrast, the stochastic SNN exhibited a non-zero prediction entropy in the middle region, peaking when the probability of classifying the tumor as malignant was around 50% (the upper panel of Fig. 5d). This peak marks the cases in which the tumor type is most difficult to determine accurately, and the elevated prediction entropy serves as an indication for doctors to consider further examinations. Outside this region, the prediction entropy diminishes to nearly zero, indicating a high level of confidence in the predictions.

Conclusion
In summary, we designed and fabricated an SOT synapse that enables nonlinear weight updates following the STDP rule through DW motion with varying velocities. The R_AHE of the device followed a Gaussian distribution, allowing the device to be modeled as a stochastic synapse that introduces a noisy environment for deterministic neurons. Leveraging these stochastic synapses, we constructed a stochastic SNN and applied it to the classification of breast tumors. The network exhibited high prediction accuracy and proved capable of quantifying the uncertainty associated with its predictions.

Figure 1. (a) Illustration depicting a biological synapse facilitating communication between pre-synaptic and

In the micromagnetic simulations, we computed the SOT-induced variation of the z-component of magnetization (m_z) within the cross regions of four Hall bars with different shapes [Fig. 2a] (refer to the "Methods" section for more details). This m_z variation arises from the DW motion shown in Fig. 3. In Hall bars Ⅰ and Ⅱ, the length (L) of the cross region is much smaller than the width (W), with edge roughness taken into account in structure Ⅱ. A square-shaped cross region, where L equals W, was assumed in Hall bar Ⅲ, while L > W in Hall bar Ⅳ. The simulation results demonstrated a linear m_z variation attributed to uniform DW motion in Hall bar Ⅳ, whereas the m_z variation was clearly nonlinear for Hall bars Ⅰ and Ⅱ. In particular, the m_z variation in Hall bar Ⅱ closely resembles the Long-Term Potentiation (LTP) and Long-Term Depression (LTD) procedures associated with nonlinear weight updates in an SNN.

Figure 2. (a) Simulations of SOT-induced magnetization switching in Hall bars with varied shapes in cross

Figure 3. (a) Visualization of DW motion under a sequence of current pulses during the LTP procedure.(b)

Figure 4. (a) Waveform of square-wave current pulses with controlled polarity designed for the pre-synaptic

Figure 5. An SNN comprising stochastic DW synapses and its application in the classification of breast tumors.

In the estimate of the classification probability, N_k represents the total spiking occurrences of the output neuron z_k. As illustrated in Fig. 5b, for both the deterministic and stochastic SNNs, the probabilities of classifying the tumor as malignant closely align with those calculated directly from the test data (the blue line). We further computed the overall accuracy by dividing the number of correct predictions by the total of 300 test data points. The stochastic SNN achieved a maximum accuracy of 95%. Based on the prediction probabilities, we further calculated the prediction entropy, where c indicates the type of breast tumor, and x and y are the input data and the prediction result, respectively. Notably, in the intermediate region between the two dashed lines, the prediction entropy of the deterministic SNN was consistently zero.
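The explicit expressions for the classification probability and the prediction entropy are not legible in this text, so the sketch below uses common forms that are assumptions rather than the authors' exact formulas: the probability of class k is taken as the normalized spike count N_k divided by the sum of all counts, and the entropy is taken as the Shannon entropy over the classes c, with the natural logarithm. The function names are hypothetical.

```python
import math


def class_probabilities(spike_counts):
    """Normalize output-neuron spike counts N_k into probabilities.

    Assumed form: p_k = N_k / sum_j N_j.
    """
    total = sum(spike_counts.values())
    if total == 0:
        raise ValueError("no output spikes recorded")
    return {k: n / total for k, n in spike_counts.items()}


def prediction_entropy(probs):
    """Shannon entropy, H = -sum_c p_c * ln(p_c), skipping zero terms."""
    return 0.0 - sum(p * math.log(p) for p in probs.values() if p > 0)


# An ambiguous case: the two output neurons win equally often.
ambiguous = prediction_entropy(class_probabilities({"B": 50, "M": 50}))

# A deterministic case: the same neuron wins all 100 repetitions.
certain = prediction_entropy(class_probabilities({"B": 100, "M": 0}))
```

Under these assumed forms, the entropy is largest when the two classes are equally likely and vanishes when every repetition gives the same answer, which matches the qualitative behavior reported for the stochastic and deterministic networks.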