Spontaneous Threshold Lowering Neuron using Second‐Order Diffusive Memristor for Self‐Adaptive Spatial Attention

Abstract Intrinsic plasticity of neurons, such as spontaneous threshold lowering (STL) to modulate neuronal excitability, is key to the spatial attention of biological neural systems. In-memory computing with emerging memristors is expected to solve the memory bottleneck of the von Neumann architecture commonly used in conventional digital computers and is deemed a promising route to this bioinspired computing paradigm. Nonetheless, conventional memristors are incapable of implementing the STL plasticity of neurons due to their first-order dynamics. Here, a second-order memristor is experimentally demonstrated using yttria-stabilized zirconia with Ag doping (YSZ:Ag) that exhibits STL functionality. The physical origin of the second-order dynamics, i.e., the size evolution of Ag nanoclusters, is uncovered through transmission electron microscopy (TEM), which is leveraged to model the STL neuron. STL-based spatial attention in a spiking convolutional neural network (SCNN) is demonstrated, improving the accuracy of a multiobject detection task from 70% (20%) to 90% (80%) for the object within (outside) the area receiving attention. This second-order memristor with intrinsic STL dynamics paves the way for future machine intelligence, enabling high efficiency, a compact footprint, and hardware-encoded plasticity.

X-ray photoelectron spectroscopy (XPS) measurements along with fitting results of the (a) O 1s, (b) Ag 3d, (c) Zr 3d, and (d) Y 3d peaks of the YSZ:Ag switching layer. All spectra are calibrated by aligning the C 1s peak to 284.6 eV.

Table S1. Element concentrations in the as-deposited memristor according to the EDS point spectra in Figure 3c.

Electrically, sample #1 showed non-volatile resistive switching (Figure S8b).

STL due to relatively smaller interfacial energy between Ag and the host dielectric:
To reveal the physical origin of the second-order dynamics and STL, we fabricated lateral SiOx:Ag and YSZ:Ag memristors and inspected them by SEM before and after the electrical tests, as shown in Figures S9b and S9d, respectively. In Figure S9b, the number and size of the Ag nanoclusters between the two electrodes of the SiOx:Ag memristor did not change significantly, although some clusters relocated, consistent with the observed constant switching incubation time. In the YSZ:Ag memristor, on the other hand, the large Ag nanoclusters were broken into smaller ones and scattered along the percolation path after the electrical test (Figure S9d), consistent with Figure 3 of the main text and the observed STL. The likely reason for this difference is that the interfacial energy between SiOx and Ag is larger than that between YSZ and Ag; small Ag nanoclusters therefore tend to be more stable in YSZ:Ag than in SiOx:Ag. The mechanism of Ag-based volatile switching has been widely studied in the prior art; here we summarize the mechanism of YSZ:Ag based on previous discussions, as schematically illustrated in Figure S10. The as-deposited Ag nanoclusters were randomly distributed in the YSZ matrix (Figure S10a). Upon application of a positive bias, the established electric field polarized the Ag clusters and triggered electrochemical reactions.
These Ag clusters acted as bipolar electrodes (BPEs), with an effective anode (δ+) and cathode (δ-) on opposite sides (Figure S10b). [7][8][9] Electrochemical oxidation generates Ag+ ions at the anode side of a cluster; these ions drift along the electric field and are reduced back to Ag atoms downstream, for example by trapped electrons. As the Ag+ ions deposit downstream, a second Ag cluster emerges and grows, while the first cluster is slowly consumed and shrinks. When the second cluster exceeds a certain size, sufficient polarization develops to oxidize the Ag atoms on its anode side, resulting in the emergence of a third cluster (Figure S10c). [8] This process repeats, causing a series of Ag clusters on the left to dissolve and merge into those on the right, leading to an overall movement and redistribution of Ag clusters along the electric field direction until filaments form and bridge the electrodes. [7] Once the bias is removed, the Ag filament breaks spontaneously, leading to volatile threshold switching due to the Thomson-Gibbs effect, i.e., surface diffusion of metal atoms driven by their surface curvature radii (Figure S10d). This effect originates from the gradient of the surface atomic vacancy concentration, or equivalently the gradient of the surface atomic chemical potential, which drives the system to minimize its surface energy. [11][12]

Supporting Note 2. STL for spatial attention.

The biological background of spatial attention:
Deep-layer neurons of the visual system, such as those located in the inferotemporal cortex, feature large receptive fields. [13][14] These neurons are capable of recognizing objects regardless of the objects' spatial locations, as illustrated in Figures S11a and S11b. Such location- and orientation-invariant object recognition is a unique feature of the brain that also contributes to its remarkable efficiency. [14] When multiple objects appear simultaneously, they pose a challenge because features from different objects may get mixed up, a phenomenon commonly referred to as the "binding problem". [15] This mixing of features can confuse the object classifier (e.g., the inferotemporal cortex neurons) and invalidate the classification results, as depicted in Figure S11b. Therefore, a spatial attention mechanism was introduced to disentangle the mixed features.

Figure S11. An increasing receptive field size allows the deep-layer neurons in visual systems to recognize an object regardless of its spatial location. (a) An airplane appears at the upper left corner of the receptive field. The information is processed by pretrained convolutional kernels and spatial max pooling layers, and is classified by the pretrained classifier.
(b) A car appears at the lower right corner of the receptive field. The information is also classified by the same pretrained classifier.

"Binding problem" in spiking convolutional neuronal networks:
For the spiking convolutional neural network (SCNN) model, the "binding problem" arises when feature-layer neurons in the shallow layers spike simultaneously, resulting in the spatial and temporal mixing of features at the deep feature layer, as illustrated in Figure S12. Therefore, including bio-inspired spatial attention could improve the accuracy of multi-object classification in the SCNN.

How STL neurons are used for spatial attention to solve the "binding problem":
We address the binding problem by using STL neurons in the shallow feature layers to implement the spatial attention mechanism, while keeping the SCNN architecture (the topology and the weights of the pretrained convolutional kernels and the single-object classifiers) unchanged.

Figure S13 illustrates how the STL neurons, i.e., the YSZ:Ag second-order memristors, in the shallow feature layer practice the spatial attention mechanism in a self-adaptive manner. The STL neurons within the area of interest spike more frequently, resulting in a lower threshold. Therefore, they spike at an earlier time step (t1) and propagate features to deeper layers for classification, as shown in Figure S13c. On the contrary, the STL neurons outside the area of interest spike less frequently, leading to a relatively higher threshold than that of the neurons inside the area of interest. Thus, neurons outside the area of interest tend to spike at a later time step (t2), as shown in Figure S13d. This mechanism efficiently detects multiple objects in a self-adaptive manner. A similar idea is demonstrated in Ref. [16], where different neuron thresholds lead to a different firing sequence, which is then used for rank encoding to classify the object in the area of interest.

Supporting Note 3. Pre-training the SCNN for single-object classification.

To pre-optimize a spiking convolutional neural network (SCNN) for single-object classification (i.e., Modified National Institute of Standards and Technology, MNIST, handwritten digits), surrogate gradient-based training was employed in PyTorch. [17] Figure S14a shows the pre-optimized model, consisting of the input layer (28×28 nodes), the first convolutional layer (15 kernels, zero padding and unit stride, or 28×28×15 nodes), the second convolutional layer (4 kernels, zero padding and unit stride followed by average pooling, or 14×14×4 nodes), and a dense layer (10 nodes). The training is implemented via SNN backprop, a variant of backpropagation commonly used for SNNs. [17][18] The accuracy over the course of training is shown in Figure S14b, reaching 97% at the end. The pre-optimized kernels of the convolutional layers and the weights of the dense layer are shown in Figure S14c.

Supporting Note 4. Spiking neuron model with STL.

Spiking neural networks (SNNs) are bio-inspired networks that employ artificial spiking neurons to model biological neuron behaviors. Well-known models include leaky integrate-and-fire (LIF) [19], Izhikevich [20], and Hodgkin-Huxley [21]. SNN neurons encode messages through sparse, binary spikes. A neuron receives incoming spikes, which are integrated and increase its membrane potential (u) over time. Once the membrane potential reaches a threshold (V_th), the neuron fires an output spike (o) and resets its membrane potential. Mathematically,

u^{t,n} = u^{t-1,n} (1 - o^{t-1,n}) + W^{n} o^{t,n-1}   (1)

o^{t,n} = Θ(u^{t,n} - V_th)   (2)

where t is the time step and n is the layer index of the SNN, W^{n} denotes the pretrained convolutional kernels, Θ(x) is the Heaviside step function satisfying Θ(x) = 1 when x > 0 and Θ(x) = 0 otherwise, and the factor (1 - o^{t-1,n}) resets the membrane potential of a neuron after it fires. In addition, in an STL neuron the threshold V_th decreases by a decaying factor d upon each firing, according to Equation (3):

V_th^{t+1} = d V_th^{t} if o^{t} = 1, otherwise V_th^{t+1} = V_th^{t}   (3)

That is, if the neuron spikes at time step t (o^{t} = 1), the threshold decays by the factor d (V_th^{t+1} = d V_th^{t}); otherwise, the threshold remains unchanged (V_th^{t+1} = V_th^{t}).
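The following minimal PyTorch sketch implements one time step of Equations (1)-(3). The function name, tensor conventions, and the illustrative decay factor d = 0.9 are our assumptions rather than the paper's implementation.

```python
import torch

def stl_lif_step(u, v_th, x, d=0.9):
    """One time step of an STL neuron following Equations (1)-(3).

    u    : membrane potential tensor from the previous step, u^{t-1,n}
    v_th : per-neuron firing threshold V_th, lowered after each spike
    x    : weighted input spikes, e.g. W^n o^{t,n-1} from a conv layer
    d    : threshold decay factor (0 < d < 1); 0.9 is illustrative
    """
    u = u + x                                   # integrate inputs, Eq. (1)
    o = (u > v_th).float()                      # Heaviside firing, Eq. (2)
    u = u * (1.0 - o)                           # reset fired neurons
    v_th = torch.where(o > 0, d * v_th, v_th)   # STL lowering, Eq. (3)
    return u, o, v_th
```

Iterating this step over t1, t2, t3, ... with the convolutional layer outputs as x reproduces the threshold-lowering behavior sketched in Figure S13.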
The SCNN with self-adaptive spatial attention mimics the human visual system, employing a simple image classifier for multi-object detection. [16] The architecture of the SCNN with spatial attention is shown in Figure S15a, consisting of the input layer (56×56 nodes), the first pre-optimized convolutional layer (15 kernels, zero padding and unit stride, or 56×56×15 nodes), the second pre-optimized convolutional layer (4 kernels, zero padding and unit stride followed by average pooling, or 28×28×4 nodes, the feature layer), an additional max pooling layer that takes the maximum value across the four corners (14×14×4 nodes), and the dense layer (10 nodes, the output layer). The extracted input features are visualized in Figure S15b, where features from the four corners (upper/lower left, upper/lower right) are max pooled as shown in Figure S15c. This is followed by the dense layer, whose output spikes are shown in Figure S15d.
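For concreteness, a minimal PyTorch sketch of this topology is given below. The 3×3 kernel size is an assumption (the text specifies only kernel counts, zero padding, and unit stride), and the per-time-step spiking dynamics and STL thresholds of Supporting Note 4, which wrap these layers, are omitted for brevity.

```python
import torch
import torch.nn as nn

class SpatialAttentionSCNN(nn.Module):
    """Layer skeleton of the SCNN with spatial attention (Figure S15a)."""

    def __init__(self):
        super().__init__()
        # 56x56 input -> 56x56x15; pre-optimized kernels are loaded, not trained
        self.conv1 = nn.Conv2d(1, 15, kernel_size=3, stride=1, padding=1)
        # 56x56x15 -> 56x56x4, then average pooling -> 28x28x4 (feature layer)
        self.conv2 = nn.Conv2d(15, 4, kernel_size=3, stride=1, padding=1)
        self.avgpool = nn.AvgPool2d(2)
        # dense layer: 14x14x4 corner-pooled features -> 10 output neurons
        self.fc = nn.Linear(14 * 14 * 4, 10)

    def corner_max(self, f):
        """Max pooling across the four 14x14 corners of the 28x28 feature maps."""
        corners = torch.stack([f[:, :, :14, :14], f[:, :, :14, 14:],
                               f[:, :, 14:, :14], f[:, :, 14:, 14:]])
        return corners.max(dim=0).values           # (batch, 4, 14, 14)

    def forward(self, x):                          # x: spikes, (batch, 1, 56, 56)
        f = self.avgpool(self.conv2(self.conv1(x)))
        return self.fc(self.corner_max(f).flatten(1))
```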
Spiking regulation, which inhibits a neuron for the remainder of an inference propagation once it spikes, is applied to the feature layer to prevent features from low-threshold neurons from spiking for too long and temporally overlapping with the features produced by high-threshold neurons.
Lateral inhibition, which resets the membrane potentials of all neurons in the same layer if any of them spikes, is applied to the output layer neurons. This allows the output layer neurons to produce spikes that reflect the instantaneous features they receive from the convolutional layers, without being affected by historical features.
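A minimal sketch of these two control mechanisms is shown below, assuming binary spike tensors and a persistent fired mask maintained over one inference pass; the function names are ours, not the paper's.

```python
import torch

def spiking_regulation(o, fired):
    """Feature layer: suppress any neuron that has already spiked during
    this inference pass, so early (low-threshold) features do not keep
    spiking and temporally overlap with late (high-threshold) features."""
    o = o * (1.0 - fired)                    # block previously fired neurons
    fired = torch.clamp(fired + o, max=1.0)  # remember which neurons fired
    return o, fired

def lateral_inhibition(u, o):
    """Output layer: if any neuron spikes, reset the membrane potentials of
    all neurons in the layer, so the output spikes reflect only the
    instantaneous features from the convolutional layers."""
    return torch.zeros_like(u) if o.any() else u
```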
Temporal separation of features in the feature layer allows independent features to be correctly classified by a single-object classifier. Early output spikes correspond to the object in the area of lower threshold potential, i.e., the area of interest, which receives attention thanks to spontaneous threshold lowering. Later output spikes correspond to the object in the area of higher threshold potential, outside the area of interest.

Supporting Note 5. Analysis of the internal state of STL neurons in the shallow feature layer
We convert image pixel values into input spike intensities over repeated time steps (e.g., t1, t2, t3) in the input layer. At each time step, these spikes propagate through the SCNN with pretrained weights.
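As a sketch, this pixel-to-spike conversion can be written as stochastic (Bernoulli) rate encoding, where brighter pixels spike with higher probability at each time step. The source states only that pixel values set the input spike intensity over repeated steps, so this particular encoding scheme is an assumption.

```python
import torch

def pixels_to_spikes(image, num_steps=3):
    """Convert pixel intensities into a binary spike train over num_steps
    time steps (t1, t2, t3, ...), assuming an 8-bit grayscale image."""
    p = image.float() / 255.0                 # firing probability in [0, 1]
    return torch.stack([torch.bernoulli(p) for _ in range(num_steps)])
```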
The SCNN feature layer neurons are implemented by YSZ:Ag memristors, whose thresholds decrease upon firing events. Before the spontaneous threshold lowering, these neurons share the same threshold, as shown in the left panel of Figure S16a, and neurons within and outside the area of interest take the same three time steps to accumulate enough membrane potential to reach the threshold, as shown in the right panel of Figure S16a. The SCNN was then fed with samples from the multi-object dataset shown in Figure S17, where two handwritten digits appear in different corners of the receptive field at different frequencies. After two epochs of inference, the neurons inside the area of interest have experienced more spikes than those outside, and thus have lower thresholds, as shown in the left panel of Figure S16b. Consequently, the neurons inside the area of interest take only two time steps to accumulate enough membrane potential to fire (Figure S16b, upper right), while the neurons outside the area of interest still take three time steps (Figure S16b, lower right). As such, the STL neurons in the feature layer practice spatial attention according to the spatial locations of the multiple objects: the thresholds of neurons in the area where objects appear more frequently (the so-called area of interest) decrease, so these neurons spike faster than the rest. This spatial attention mechanism is physically achieved by the STL of the YSZ:Ag memristor in a self-adaptive manner.
In addition to the self-adaptive spatial attention, the YSZ:Ag memristor-based STL neurons also reduce the inference latency. Comparing the neuron thresholds in Figures S16b and S16c, the average neuron threshold decreases with epochs, leading to fewer time steps to spike for neurons both within and outside the area of interest, and hence faster information processing in the SCNN.

Supporting Note 6. Impact of device variations on the STL-based spatial attention.

1. Device-to-device threshold variation: First, we examined the impact of the device-to-device threshold variation on the STL-based spatial attention. We experimentally acquired the variation, as shown in Figure S18a, and fitted it to a Gaussian distribution with a mean of 0.817 V and a standard deviation of 0.045 V. In the simulation, the initial neuron thresholds were sampled from Gaussian distributions with the same mean and a varying standard deviation (from 0.740 V to 0.940 V). The multi-object classification accuracy as a function of the standard deviation is shown in Figure S18b. We find no clear degradation in the classification performance until the standard deviation reaches 0.3 V (for comparison, the experimental value is 0.045 V). This indicates that the variation of the initial neuron thresholds has relatively little influence on multi-object classification using the SCNN with STL.
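A sketch of this simulation step is shown below; `evaluate_scnn` is a hypothetical stand-in for the multi-object classification run and is not a function from the paper.

```python
import torch

def sample_initial_thresholds(shape, mean=0.817, std=0.045):
    """Sample per-device initial thresholds (in volts) from the Gaussian
    fitted to the measured device-to-device variation (Figure S18a)."""
    return torch.normal(mean, std, size=shape)

# Sweep the standard deviation as in Figure S18b (feature layer: 28x28x4).
# for std in (0.045, 0.1, 0.2, 0.3):
#     v_th0 = sample_initial_thresholds((28, 28, 4), std=std)
#     accuracy = evaluate_scnn(v_th0)  # hypothetical evaluation helper
```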

2. Device-to-device and cycle-to-cycle threshold lowering rate variation:
We also tested the impact of device-to-device and cycle-to-cycle variation of the threshold lowering rate on the STL-based spatial attention.

Figure S19. Internal states of neuron (9,7), which is within the area of interest, and neuron (7,21), which is outside the area of interest.

Supporting Note 7. The ablation experiment without utilizing the STL neurons.
To evaluate the effectiveness of the STL neurons in implementing the spatial attention mechanism, we conducted an ablation experiment using neurons without STL. The results are presented in Figure S20, which shows the shallow feature layer without and with STL in Figures S20a and S20b, and the spikes of the classifier output without and with STL in Figures S20c and S20d, respectively. Without STL, the neuron thresholds remain fixed (Figure S20a), causing the feature-layer spikes of different objects to appear at the same time.
As a result, the simple classifier was unable to recognize both objects (within and outside the area of interest) during inference, as depicted in Figure S20c. Conversely, the SCNN with STL neurons can adaptively lower the neuron threshold, as shown in Figure S20b. During inference, the neuron thresholds in the area of interest decrease more because those neurons spike more frequently than the neurons outside the area of interest. Consequently, the feature of the object inside the area of interest spikes earlier, temporally separating it from the feature of the object outside the area of interest. This leads to the classifier spiking at different time steps, as shown in Figure S20d. For example, at Epoch 2, the classifier outputs the first spike at t2, corresponding to the object inside the area of interest (the digit 7 in the illustrated case). At t3, the classifier outputs two spikes due to the presence of the object outside the area of interest, and one is arbitrarily chosen to evaluate the classification accuracy of the second object. In summary, both objects, within and outside the area of interest, show improved recognition accuracy compared to the case without STL neurons, thereby solving the feature "binding problem" in a self-adaptive manner. Moreover, the SCNN with STL generates spikes for the feature layer and the classifier as early as t2 (Figures S20b and S20d), whereas the SCNN without STL generates them at t3 (Figures S20a and S20c). This is because the lowered thresholds of STL neurons enable a quicker response of the feature layer neurons, reducing the SCNN latency.
In summary, the SCNN without STL neurons behaves similarly to the pristine state of the SCNN with STL, where all the neuron thresholds are still the same.