Neuromorphic Motion Detection and Orientation Selectivity by Volatile Resistive Switching Memories

Motion detection is a primary visual function, crucial for the survival of animals in nature. Direction‐selective (DS) neurons can be found in multiple locations in the visual neural system, both in the retina and in the visual cortex. For instance, the DS ganglion cell in the retina provides a real‐time response to moving objects, which is much faster than the image recognition executed in the visual cortex. Such in‐retina biological signal processing capability is enabled by the spatiotemporal correlation within different receptive fields of the DS ganglion cells. Taking inspiration from the biological DS ganglion cells in the retina, the motion detection is demonstrated in an artificial neural network made of volatile resistive switching devices with short‐term memory effects. The motion detection arises from the spatiotemporal correlation between the adjacent excitatory and inhibitory receptive fields with short‐term memory synapses, closely resembling the physiological response of DS ganglion cells in the retina. The work supports real‐time neuromorphic processing of sensor data by exploiting the unique physics of innovative memory devices.


Introduction
The biological visual system has been extensively investigated by biologists in the last 60 years, [1][2][3] where retina cells play a crucial role in sensing, transferring, and pre-processing the external light input. While visual information is extensively processed in the visual cortex, the retina can conduct fast, in situ cognitive functions in real time. [4] Energy efficiency is enabled by spike coding of visual information, where neuron activity is limited to the time and space of the external stimulation. This type of spike-based, in situ processing has not been presented in state-of-the-art artificial vision systems, which usually rely on a frame-based video camera for capturing the images at periodic times, converting the analog visual information into the digital domain, and processing the individual image frames in the central processing unit (CPU). The mainstream research efforts in artificial intelligence in recent decades have focused on image processing to extract relevant information by deep learning techniques. [5] However, frame-based artificial visual systems are extremely inefficient from energy and frequency perspectives, compared with their biological counterpart. To improve the energy efficiency of artificial visual systems, alternative neuromorphic concepts have been proposed. [6] For instance, the dynamic vision sensor (DVS) is a camera [7] that can detect the contrast change of images, thus only delivering a spike when and where an event takes place. Combined with spike-based neuromorphic processors, such as TrueNorth [8] and Loihi, [9] DVS results in lower bandwidth and lower power consumption. Pattern recognition in photosensors has been recently demonstrated to enable low-power and low-latency implementation of visual cognitive functions. [10,11] However, a biomimetic artificial visual system closely mimicking the biological one and enabling fast, low-power cognitive functions is still missing. [12]
A fundamental function of the biological visual system is the motion detection by direction-selective (DS) ganglion cells in the retina [13] and by striate neurons in the visual cortex. [14][15][16] As motion detection is crucial for animals to survive, natural evolution has selected a fast and robust motion detection mechanism by DS ganglion cells in the retina. [17,18] This consists of bipolar cells and starburst amacrine cells (SACs) with different receptive fields stimulating the ganglion cells with excitatory and inhibitory inputs, respectively. [3,19] The spatiotemporal correlation of these input signals results in different postsynaptic currents injected into different ganglion cells, thus enabling the sensitivity to various moving directions of the object under sight. [2] In this work, we develop a retina-inspired artificial vision system capable of motion detection by exploiting the short-term memory effect of Ag-based volatile resistive switching memory (RRAM) devices. [20,21] These volatile RRAMs act as short-term ionic channels in the neuron membrane to enable the spatiotemporal correlation in the retina. We show that, similar to biological ionic channels, our volatile RRAM synapses are affected by a significant stochasticity, [22] which, however, can be compensated by proper redundancy to achieve highly robust direction selectivity.

DOI: 10.1002/aisy.202000224
Inspired by the concept of the receptive field in retina, [23,24] we spatially distribute the excitatory and inhibitory volatile devices, thus allowing us to reproduce the direction selectivity and motion detection to moving objects. Based on these results, an artificial retina concept with real-time motion and direction selectivity is finally demonstrated. The responses of the artificial retina are used as an input for further visual cognitive processing in a spike-based neural network, allowing for fast and accurate direction sensitivity in an artificial visual cortex.

Biological and Artificial Short-Term Memory Effect
Figure 1a schematically shows a biological synapse where the presynaptic contact releases a neurotransmitter upon electrical activity of the presynaptic neuron. The neurotransmitter (for instance, L-glutamate in Figure 1b,c [25]) binds to receptors located in the plasma membrane of the postsynaptic neuron, causing the opening of the ligand-gated ion channel and a positive ion (e.g., Ca²⁺) current flow into the postsynaptic neuron (Figure 1b). Due to the slow unbinding rate of the neurotransmitter, the ionic current persists even after the neurotransmitter release is completed. The binding affinity controls the unbinding time of the neurotransmitter, hence the persistence time of the ionic current. [26] Summation of the ionic currents of each individual channel yields the excitatory postsynaptic current (EPSC; Figure 1c), where the time decay dictates the short-term memory effect.
To mimic the short-term opening of ionic channels, we have developed an artificial synaptic device using an RRAM device, consisting of an Ag top electrode, a HfO₂ switching layer, and a graphitic C bottom electrode (Figure 1d; see Experimental Section for the fabrication and characterization details of the device). The RRAM device was connected to a series field-effect transistor (FET) to control the switching current, hence its retention time. [27] A positive voltage pulse applied to the Ag top electrode (top in Figure 1e) induces the switching of the RRAM device to the on state, which is monitored by applying a small read voltage (0.15 V) to the device (Figure 1e). The gate voltage of the FET was biased to keep the current during the switching pulse and the read phase at a comparable level (see Figure S1, Supporting Information). In the on state, the RRAM device contains an Ag filament generated as a result of Ag migration during the switching pulse. The current measured during the read phase reveals a transition to the off state after a stochastic retention time t_ret, due to the retraction of the Ag filament by Gibbs-Thomson interfacial energy minimization. [28,29] Similar to the ionic channel current in the biological synapse, the RRAM retention time is highly stochastic (see Figure S2, Supporting Information) and falls in the 10-100 ms time scale. [30] As a result, the averaged current of 50 individual Ag-based RRAM devices (Figure 1f) can closely reproduce the time-decaying biological EPSC (Figure 1c).
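The averaging argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that each device carries a fixed on-state read current of 3.5 μA and that retention times follow a log-normal distribution with a 30 ms median; the function name, the distribution, and its parameters are hypothetical and are not taken from the measured statistics of Figure S2.

```python
import math
import random


def averaged_current(n_devices, times_ms, i_on_ua=3.5,
                     median_ret_ms=30.0, sigma=0.8, seed=0):
    """Average read current (uA) of n_devices after a shared switching pulse at t = 0."""
    rng = random.Random(seed)
    # Assumption: log-normal retention times (illustrative, not the measured statistics).
    t_ret = [rng.lognormvariate(math.log(median_ret_ms), sigma)
             for _ in range(n_devices)]
    # Each device contributes i_on_ua while t < t_ret and nothing afterwards,
    # so the average is a staircase that decays as devices switch off one by one.
    return [i_on_ua * sum(1 for tr in t_ret if t < tr) / n_devices
            for t in times_ms]


times = list(range(0, 200, 10))
trace = averaged_current(50, times)
for t, i in zip(times, trace):
    print(f"t = {t:3d} ms  I = {i:.2f} uA")
```

Each individual trace is an abrupt step, yet the average over 50 devices decays gradually, which is the qualitative behavior the text attributes to the summed ionic channel currents.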

Detection of Spatiotemporal Information
The short-term memory properties of the RRAM device allow for spatiotemporal recognition of spiking sequences. Figure 2a shows the experimental setup for detecting the sequence of spikes. Here, the input spikes were applied to two combined synapses A and B, each containing N volatile RRAM devices in parallel configuration to average the stochastic retention time (see Figure S3, Supporting Information). The synaptic currents and the resulting EPSC are shown in Figure 2b for N = 5. If the spike sequence is A followed by B, then the EPSC exhibits a positive spike, which can be detected by a comparison with a threshold current, e.g., I_th = 3 μA in Figure 2b. On the other hand, if the spike sequence is B followed by A, then the EPSC exhibits a negative spike, which will not cross the positive threshold. This allows for spatiotemporal computation, [31,32] where the spatial information (i.e., which pre-neuron generates the spikes) and the temporal information (i.e., when spikes are generated) are recognized by the spiking stimulation of volatile RRAM devices. Figure 2c shows the distribution of the maximum EPSC obtained by repeating the experiments in Figure 2b 100 times. The threshold to discriminate the current in the two cases of Figure 2b is determined from the statistical distribution of EPSC in Figure 2c. In the retina, this can be achieved by the adaptive behavior of the postneuron, which can automatically adjust its threshold according to the binomial distribution of the output current from the receptive fields. The experimental results indicate that the maximum EPSC of sequences A-B and B-A can be fully discriminated, remaining above and below the threshold current, respectively. Varying the number of volatile devices in each synaptic branch and the time interval of the two spikes can improve the recognition accuracy (see Figures S4 and S5, Supporting Information).
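The sequence detection described above can be sketched as a small Monte Carlo experiment. The sketch assumes that the EPSC is the difference between the averaged currents of synapses A and B, that every device switches on instantly at its spike, and that retention times are log-normal; the 3.5 μA on-current, the 20 ms spike interval, and all function names are hypothetical choices for illustration, with only the 3 μA threshold and N = 5 taken from the text.

```python
import math
import random

I_ON_UA = 3.5   # assumed on-state read current per device
I_TH_UA = 3.0   # threshold quoted in the text for Figure 2b


def draw_retention(rng, n, median_ms=30.0, sigma=0.8):
    # Assumption: log-normal retention times, one per device.
    return [rng.lognormvariate(math.log(median_ms), sigma) for _ in range(n)]


def synapse_current(t, t_spike, retention):
    # Averaged current of one synapse: a device conducts from its spike until t_ret later.
    on = sum(1 for tr in retention if t_spike <= t < t_spike + tr)
    return I_ON_UA * on / len(retention)


def max_epsc(t_a, t_b, n, rng):
    ret_a, ret_b = draw_retention(rng, n), draw_retention(rng, n)
    # EPSC = I_A - I_B, sampled on a 1 ms grid.
    return max(synapse_current(t, t_a, ret_a) - synapse_current(t, t_b, ret_b)
               for t in range(0, 400))


def detection_rate(t_a, t_b, n=5, trials=200, seed=1):
    rng = random.Random(seed)
    hits = sum(max_epsc(t_a, t_b, n, rng) > I_TH_UA for _ in range(trials))
    return hits / trials


rate_ab = detection_rate(t_a=10, t_b=30)   # A followed by B
rate_ba = detection_rate(t_a=30, t_b=10)   # B followed by A
print(f"A-B above threshold: {rate_ab:.2f}, B-A above threshold: {rate_ba:.2f}")
```

In this simplified picture the A-B sequence always crosses the threshold, because synapse A is fully on before B is stimulated, whereas a B-A sequence crosses it only in the rare trials where every B device has already switched off while every A device is still on.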

Direction Selectivity in Ganglion Cells
The spatiotemporal network of Figure 2a can be used for DS behavior similar to the receptive fields of ganglion cells in the retina, which is schematically shown in Figure 3a. Receptive fields are distributed in space and organized in excitatory and inhibitory branches connected to bipolar cells and SACs, respectively. The bipolar cells receive sensory stimuli from photoreceptors and, in turn, provide excitatory spikes to the SAC. [2] Note that the SAC transfers the excitatory input to inhibitory output locally in its dendrites. [33] When the image of an object moves across the receptive field, excitatory and inhibitory branches are stimulated in sequence, thus causing an EPSC with either positive or negative sign. [23] To mimic DS in ganglion cells, we arrange volatile RRAM devices in excitatory and inhibitory synapses, as shown in Figure 3b. The resulting circuit is schematically shown in Figure 3c, where spikes generated by the passing image stimulate excitatory and inhibitory RRAM devices at different times, similar to the receptive field of ganglion cells. Depending on the image direction, e.g., from zone A to zone B (A-B, preferred direction) or from zone B to zone A (B-A, non-preferred direction), the excitatory and inhibitory receptive fields are stimulated in opposite sequences. Figure 3d shows the currents I_A and I_B for the case of an image moving from left to right, corresponding to the sequence A-B of spiking currents. In this case, the overall EPSC shows a positive current exceeding the threshold current I_th = 2.5 μA, thus being recognized as the preferred direction. The image moving along the opposite direction causes the spike sequence B-A and a consequent negative EPSC, as shown in Figure 3e. Figure 3f shows the distribution of the maximum of I_EPSC in the experiments of Figure 3d,e. Because of the averaging within the population of RRAM devices and the spatiotemporal correlation of the input spikes, the distribution shows a sharp contrast between the preferred and non-preferred directions, which remain mostly above and below the threshold I_th = 2.5 μA, respectively. This shows that the DS of the ganglion cell can be well reproduced in a hardware circuit by spike processing in the volatile RRAMs.
DS can be made more robust by increasing the number of RRAMs in each excitatory and inhibitory branch, because of the better averaging of the stochastic retention time (see Figures S6 and S7, Supporting Information). The artificial ganglion cell is also sensitive to the velocity of the moving object (see Figure S7, Supporting Information). In fact, DS vanishes for relatively high velocity, as the timing between spikes A and B becomes less than the resolution of the circuit, and for very slow objects, as the spike delay becomes much longer than the retention time of the volatile RRAM. The dynamic response of volatile RRAMs thus enables movement detection, where the neural network detects only moving objects with high energy efficiency due to event-driven responsivity. By continuously changing the direction angle of the movement, from the non-preferred (−180°) to the preferred angle (0°) and vice versa (0° to 180°), the highest I_EPSC is detected at the preferred direction, as shown in Figure 3g.
A numerical Monte Carlo model for the DS neural network was developed based on the stochastic response of the volatile RRAMs (see Experimental Section for the details). The results of the numerical model (solid red lines) are also shown in Figure 3g and in Figure S7, Supporting Information. Figure 3h shows the polar plot of the DS response (the same data as in Figure 3g), highlighting the DS of a moving image by the spatiotemporal network of volatile RRAM.
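A direction sweep of this kind can be sketched with a toy Monte Carlo model. This is not the authors' compact model: it assumes that the delay between the excitatory zone A and the inhibitory zone B scales with the cosine of the direction angle, that switching-on times are uniformly distributed over a few milliseconds, and that retention times are log-normal. All constants and function names are hypothetical.

```python
import math
import random

I_ON_UA = 3.5        # assumed on-state read current per device
DELAY_MS = 20.0      # assumed A-to-B delay for motion along the preferred direction
ON_JITTER_MS = 5.0   # assumed spread of the switching-on time


def device_window(rng, t_spike, median_ms=30.0, sigma=0.8):
    """Return the (t_on, t_off) interval during which one device conducts."""
    t_on = t_spike + rng.uniform(0.0, ON_JITTER_MS)
    return t_on, t_on + rng.lognormvariate(math.log(median_ms), sigma)


def max_epsc(delay_ms, n, rng):
    """Maximum of I_A - I_B when zone B is hit delay_ms after zone A (negative: B first)."""
    a = [device_window(rng, 0.0) for _ in range(n)]
    b = [device_window(rng, delay_ms) for _ in range(n)]

    def epsc(t):
        n_a = sum(1 for on, off in a if on <= t < off)
        n_b = sum(1 for on, off in b if on <= t < off)
        return I_ON_UA * (n_a - n_b) / n

    # The EPSC is piecewise constant and only rises when an A device switches on
    # or a B device switches off, so checking those instants gives the exact maximum.
    candidates = [on for on, _ in a] + [off for _, off in b]
    return max(0.0, max(epsc(t) for t in candidates))


def mean_response(angle_deg, n=5, trials=200, seed=2):
    rng = random.Random(seed)
    delay = DELAY_MS * math.cos(math.radians(angle_deg))
    return sum(max_epsc(delay, n, rng) for _ in range(trials)) / trials


tuning = {angle: mean_response(angle) for angle in range(-180, 181, 45)}
for angle, value in tuning.items():
    print(f"{angle:5d} deg  mean max EPSC = {value:.2f} uA")
```

Under these assumptions the mean response is largest around 0° and drops once the excitatory and inhibitory zones are stimulated nearly simultaneously or in reverse order.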

Artificial Retina for DS
The concept of the artificial ganglion circuit allows the development of an artificial retina with full DS capability. Figure 4a shows a schematic view of an active area of the retina. [19] Excitatory synapses (blue dots in Figure 4a) are formed directly from bipolar cells to the ganglion cells, whereas inhibitory synapses are formed in the overlap region between SAC and ganglion cells (yellow dots in Figure 4a). As a moving image passes over this area of the retina, the ganglion cells (colored cells in Figure 4a) receive asymmetric excitatory and inhibitory inputs from bipolar cells and SAC, respectively. [2] Only four types of ganglion cells have been found in retina, with preferred directions being left to right, right to left, bottom to top, and top to bottom, [34] which are labeled as East, West, North, and South, respectively, in Figure 4b. Based on the response of the four ganglion cells, the movement of an object and its direction can be easily detected. Figure 4b shows an artificial version of the retina, where inhibitory volatile RRAMs are located in the center, whereas excitatory volatile RRAMs are arranged in the surrounding area horizontally and vertically. The four receptive fields of the ganglion cells were assumed to contain ten excitatory and ten inhibitory volatile RRAMs, to guarantee sufficient averaging of the stochastic retention time. To facilitate the practical hardware implementation, we arranged the volatile RRAM devices in a grid array. The current summation is straightforwardly obtained by Kirchhoff's law at the common source terminals in the 1T1R crossbar array, where the connections can be obtained by a dedicated metal layer in the integrated circuit. Figure 4c shows the EPSC response in the case of an image moving from left to right. The East cell displays the maximum positive current, whereas the West cell displays the maximum negative current. On the other hand, North and South cells only show small fluctuating currents, as the excitatory and inhibitory synapses are stimulated by the moving image roughly at the same time. Figure 4d shows similar results for a movement direction from right to left, thus resulting in maximum positive EPSC for the West cell and lower EPSC for the other cells. Figure 4e shows the distribution of the maximum positive EPSC for experiments with moving angle θ = 0°, namely, left to right, similar to the one shown in Figure 4c. In most cases, the East cell shows the highest response, whereas the current of the other three cells is less than half of the East-cell EPSC. Assuming a threshold current I_th = 2.5 μA, we could achieve 98% accuracy of detecting the correct direction angle θ = 0°. Figure 4f shows the measured maximum EPSC as a function of the direction of the moving object for all ganglion cells.
The results of the Monte Carlo model are also shown in the same plot (solid lines). Figure 4g shows the four polar plots of the maximum EPSC for the four ganglion cells. The response on the polar plot closely resembles the measured output of biological DS ganglion cells in the retina, [19,35] thus supporting the bioplausibility of the artificial retina developed by volatile RRAMs.
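The four-cell readout can be sketched in the same spirit. The sketch assumes ten excitatory and ten inhibitory devices per cell and the 2.5 μA threshold from the text, while the cosine dependence of the excitatory-to-inhibitory delay on the angle between motion and preferred direction, the retention statistics, and all names are hypothetical. A trial counts as correct only if the East cell alone exceeds the threshold for left-to-right motion.

```python
import math
import random

I_ON_UA, I_TH_UA = 3.5, 2.5   # assumed on-current; threshold quoted in the text
DELAY_MS, ON_JITTER_MS = 20.0, 5.0   # assumed timing constants
PREFERRED_DEG = {"East": 0.0, "North": 90.0, "West": 180.0, "South": -90.0}


def device_window(rng, t_spike, median_ms=30.0, sigma=0.8):
    # Assumed uniform switching-on delay followed by a log-normal retention time.
    t_on = t_spike + rng.uniform(0.0, ON_JITTER_MS)
    return t_on, t_on + rng.lognormvariate(math.log(median_ms), sigma)


def max_epsc(delay_ms, n, rng):
    exc = [device_window(rng, 0.0) for _ in range(n)]
    inh = [device_window(rng, delay_ms) for _ in range(n)]

    def epsc(t):
        n_e = sum(1 for on, off in exc if on <= t < off)
        n_i = sum(1 for on, off in inh if on <= t < off)
        return I_ON_UA * (n_e - n_i) / n

    # The EPSC rises only at excitatory switch-on and inhibitory switch-off instants.
    candidates = [on for on, _ in exc] + [off for _, off in inh]
    return max(0.0, max(epsc(t) for t in candidates))


def firing_cells(angle_deg, rng, n=10):
    """Return the set of ganglion cells whose maximum EPSC exceeds the threshold."""
    fired = set()
    for name, pref in PREFERRED_DEG.items():
        delay = DELAY_MS * math.cos(math.radians(angle_deg - pref))
        if max_epsc(delay, n, rng) > I_TH_UA:
            fired.add(name)
    return fired


def accuracy(angle_deg, expected, trials=200, seed=3):
    rng = random.Random(seed)
    hits = sum(firing_cells(angle_deg, rng) == {expected} for _ in range(trials))
    return hits / trials


acc_east = accuracy(0.0, "East")
print(f"left-to-right motion detected by East cell only in {acc_east:.0%} of trials")
```

The resulting accuracy depends entirely on the assumed retention statistics and should not be compared quantitatively with the measured 98%.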

Complex DS in the Visual Cortex
Unlike the DS ganglion cells in retina, where the preferred directions are clustered along the cardinal axes, DS cells in the visual cortex show fine-grained preferred directions along multiple axes. [34] The development of such accurate direction selectivity requires visual experience, [36] namely, training. To enable highly accurate direction selectivity, the response of DS ganglion cells in the artificial retina can be further processed in a multi-layer neural network. Here, the full EPSC traces of Figure 4c,d can be used as input information for the recognition of the motion direction with nearly 100% accuracy (see Figure S9, Supporting Information). However, using the full EPSC traces for fine direction selectivity in retina is highly inefficient and not compliant with spike coding of information in the brain. Note that recognition of an object moving closer or farther away, i.e., distance selectivity as opposed to direction selectivity, is generally achieved in animals by stereoscopic vision. As both eyes are involved, distance selectivity takes place in the visual cortex, which goes beyond the scope of this work. As a more bio-realistic approach to accurate direction selectivity, we considered a brain-inspired spike-based processing of information where ganglion cells are assumed to behave as Hodgkin-Huxley (HH) neurons [37] (see Experimental Section, and Figure S10, Supporting Information). Figure 5a shows the simulated spiking activity of the ganglion cells, modeled as HH neurons, in response to the EPSC currents of Figure 4c. The time of the first spike in each ganglion cell (t_E, t_W, t_N, and t_S in Figure 5a) is used as an analogue input to the neural network in Figure 5b, comprising 50 hidden neurons and 24 output neurons for a 15° precision of direction selectivity between −180° and 180°. A 90% fraction of the data in Figure 4 is used for the supervised training of the neural network, whereas the remaining 10% is used as a test dataset. The two-layer neural network can be viewed as the visual pathway from the retina to the visual cortex via the lateral geniculate nucleus. [38] Details about the neural network configuration and training algorithm are given in Experimental Section. After 1000 epochs of training, the neural network reaches a direction recognition accuracy of 82.92% for the testing dataset, as shown in Figure 5c. […] input directions, for training and testing datasets, respectively. Recognition errors mostly consist of nearest-neighbor DS cortex cells, which can be viewed as a tolerable error.
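The first-spike-time encoding can be illustrated with a minimal forward-Euler integration of the HH equations. The sketch uses the widely quoted squid-axon parameter set expressed in the modern convention with a resting potential of −65 mV, which is an assumption about how the original parameters are applied here; the constant test currents and all function names are hypothetical. The 0.1 cm² membrane area follows the Experimental Section.

```python
import math

C_M = 1.0                                # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3        # maximal conductances, mS/cm^2
V_NA, V_K, V_L = 50.0, -77.0, -54.387    # reversal potentials, mV
V_REST = -65.0                           # resting potential, mV


def _vtrap(x, y):
    """x / (1 - exp(-x / y)), with the removable singularity at x = 0 handled."""
    if abs(x / y) < 1e-6:
        return y * (1.0 + x / (2.0 * y))
    return x / (1.0 - math.exp(-x / y))


def _rates(v):
    """Opening and closing rates (1/ms) of the m, h, and n gates at voltage v (mV)."""
    a_m = 0.1 * _vtrap(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _vtrap(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)


def first_spike_time(i_epsc_ua, t_max_ms=50.0, dt_ms=0.01, area_cm2=0.1):
    """Time (ms) of the first upward crossing of 0 mV, or None if the cell stays silent.

    i_epsc_ua is a function of time (ms) returning the injected EPSC in uA.
    """
    v = V_REST
    m, h, n = (a / (a + b) for a, b in _rates(v))   # gates start at steady state
    for k in range(int(t_max_ms / dt_ms)):
        t = k * dt_ms
        j = i_epsc_ua(t) / area_cm2                 # current density, uA/cm^2
        i_ion = (G_NA * m ** 3 * h * (v - V_NA)
                 + G_K * n ** 4 * (v - V_K)
                 + G_L * (v - V_L))
        (a_m, b_m), (a_h, b_h), (a_n, b_n) = _rates(v)
        v_new = v + dt_ms * (j - i_ion) / C_M
        m += dt_ms * (a_m * (1.0 - m) - b_m * m)
        h += dt_ms * (a_h * (1.0 - h) - b_h * h)
        n += dt_ms * (a_n * (1.0 - n) - b_n * n)
        if v < 0.0 <= v_new:
            return t + dt_ms
        v = v_new
    return None


t_strong = first_spike_time(lambda t: 3.5)   # constant 3.5 uA (hypothetical input)
t_weak = first_spike_time(lambda t: 1.0)     # constant 1.0 uA (hypothetical input)
t_none = first_spike_time(lambda t: 0.0)     # no input
print(t_strong, t_weak, t_none)
```

A larger EPSC charges the membrane faster and therefore produces an earlier first spike, which is why the four first-spike times carry information about the relative EPSC amplitudes of the East, West, North, and South cells.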

Discussion
Motion detection in frame-based vision systems can be achieved by tracking the location of detected features in successive frames. [39] However, frame-based systems are affected by high energy consumption and relatively long latency. On the other hand, event-driven cameras only respond to contrast changes, yielding output spikes that can be further processed for motion detection or gesture recognition. [8] Energy consumption is, thus, greatly reduced; however, the latency issue still exists (>100 ms) [8] since the spike-event generation and processing are separated. The retina-inspired DS system in this work can respond to the object movement in real time and in situ. Energy consumption for each spike of the device operation is estimated to be 87.5 nJ (see Experimental Section), which is higher than previously reported memristive synapses, [40] mainly due to the relatively long retention time in the range 0.01-1 s. Reducing the retention time by engineering the device structure and materials can, thus, help reduce the energy consumption. The DS circuit can be easily integrated within complementary metal-oxide-semiconductor (CMOS) image sensors to enable preprocessing of visual information in situ within the camera at no additional area cost, thus enabling the development of a biomimetic eye. [12] The local motion information in a frame can be used for image stabilization, which usually requires postexposure image processing or complex mechanical devices in the camera lens. [41] Motion detection of the DS ganglion cells in retina is a genetic ability, because it exists before the eye experiences any light stimulus. [36] Accurate direction selectivity in visual cortex, however, requires training with visual experiences. The DS ganglion cells emit spatiotemporal spiking patterns, [32,42] which hold sufficient information for accurate direction selectivity in the visual cortex.
In principle, learning of direction sensitivity in the visual cortex can also be implemented by fully connected memristive neural networks, [43] which have gained a significant maturity in recent years. In the biological visual system, however, information is transferred from retina to visual cortex as spiking patterns. Future works may explore fully spatiotemporal networks [32] that combine the high accuracy of direction sensitivity and the high energy efficiency of spike-based computation.

Conclusion
Ag-based volatile RRAM devices show a short-term memory effect, which mimics the ligand-gated ionic channel in biological synapses. Based on this analogy between the nanoelectronic device and the biological synapse, we have harnessed the short-term memory effect to enable spike-based computation via spatiotemporal correlation. Although the retention time of the volatile RRAMs is affected by high stochasticity, we demonstrate robust computation using multiple devices that average stochastic variations of the volatile response. Through the concept of the receptive fields, we have developed an artificial ganglion cell by spatially arranging the volatile RRAM devices in receptive fields of excitatory and inhibitory cells. We can thus demonstrate the direction selectivity and motion detection of the artificial ganglion cells.
The direction selectivity and motion detection function enabled by the volatile RRAM device can be integrated to realize an image sensor array with low-power motion detection embedded at the sensor level. Finally, we show that the more refined direction selectivity of visual cortex cells can be obtained by a two-layer neural network with input from the DS ganglion cells, thus mimicking the visual pathway from retina to visual cortex.

Experimental Section
Volatile Device Fabrication and Characterization: The volatile resistive device is fabricated on top of a transistor from the standard CMOS technology. The bottom electrode is a graphitic carbon pillar with a diameter of 70 nm connected to the transistor drain via the tungsten (W) bottom contact. The HfO₂ switching layer (10 nm) and the silver (Ag) top electrode (100 nm) are sequentially deposited by e-beam evaporation without breaking the vacuum. The characterization is performed with a TTi-TGA 12102 arbitrary waveform generator providing the pulses to the top electrode of the resistive switching device and a gate voltage to the gate terminal of the transistor, and a Tektronix MSO58 oscilloscope monitoring the current responses from the source terminal of the transistor.
Short-Term Memory Effect and Analysis: The volatile device is activated by a half-triangular pulse with 10 ms width and 3.5 V peak amplitude. A small read voltage (0.15 V) is applied to the device to monitor its current. The retention time of the device, as well as the retention current, switching-on time, and switching-on current (Figure S2, Supporting Information), are analyzed after the current traces are retrieved from the oscilloscope. The gate voltage was varied to check the dependence of the retention behavior on the compliance current during switching. The retention time is found to be independent of the compliance current, except for the tails of its distribution at short times. A gate voltage of 0.9 V is used for further analysis to keep the switching-on current at a level similar to that of the retention current.
Demonstration of a DS Network: We collected 200 voltage and current traces of volatile RRAM devices from the oscilloscope and stored them in a database. For the demonstration shown in Figures 3 and 4, the starting times of the voltage pulses were calculated according to the locations of the volatile devices in their receptive fields and the velocity and direction of the light bar. Current traces of single devices were randomly selected from the database and then synchronized, averaged, and subtracted in simulation to yield the EPSC response (Figure S8, Supporting Information).
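The calculation of the pulse starting times can be sketched as follows, assuming a straight light-bar edge that is perpendicular to its direction of motion and sweeps across the array at constant velocity. The coordinate convention, the units, and the function name are hypothetical; the source does not specify the geometry in detail.

```python
import math


def pulse_start_times(positions_um, velocity_um_per_ms, angle_deg):
    """Start time (ms) of the voltage pulse for each device, relative to the first one hit.

    positions_um: list of (x, y) device coordinates in micrometers (assumed layout).
    angle_deg: direction of motion, 0 = left to right, 90 = bottom to top.
    """
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    # The bar edge reaches a device when its travelled distance equals the
    # projection of the device position onto the direction of motion.
    arrival = [(x * ux + y * uy) / velocity_um_per_ms for x, y in positions_um]
    t0 = min(arrival)
    return [t - t0 for t in arrival]


devices = [(-1.0, 0.0), (1.0, 0.0)]          # hypothetical excitatory and inhibitory sites
left_to_right = pulse_start_times(devices, 0.1, 0.0)
right_to_left = pulse_start_times(devices, 0.1, 180.0)
bottom_to_top = pulse_start_times(devices, 0.1, 90.0)
print(left_to_right, right_to_left, bottom_to_top)
```

With these example values the two devices are stimulated 20 ms apart for horizontal motion, in opposite order for the two horizontal directions, and simultaneously for vertical motion.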
Compact Modeling: The compact model for the direction selectivity was developed by considering the stochasticity of the volatile behavior, namely, the switching-on time of the device during the voltage pulse, the on-current during the pulse, the on-current during the retention (retention current), and the retention time. See Figures S2 and S8, Supporting Information, for the simulated distributions of these parameters and the modeling methods.
HH Neuron Model: The HH model is a set of nonlinear differential equations that approximates the dynamics of voltage-gated ion channels (Figure S10, Supporting Information) and describes the initiation and propagation of action potentials in the neuron membrane. The total current density J through the membrane is given by
$$J = C_m \frac{dV}{dt} + g_{Na}(V - V_{Na}) + g_K(V - V_K) + g_l(V - V_l)$$
where C_m is the membrane capacitance, g_Na and V_Na are the conductance and reversal potential of the sodium ion channel, respectively, g_K and V_K are the conductance and reversal potential of the potassium channel, respectively, and g_l and V_l are the leak conductance and leak reversal potential, respectively. The two conductances g_Na and g_K depend explicitly on voltage as well. All constant parameters and equations for the g_Na and g_K dynamics use the values provided in the original paper by Hodgkin and Huxley. [37] The EPSC for ganglion cells in Figure 4 is assumed to be injected into a membrane patch with an area of A = 0.1 cm², thus J = I_EPSC/A.
Training of the Network from Retina to Visual Cortex: The neural network shown in Figure 5b is fully connected with 4 input neurons, 50 hidden neurons, and 24 output neurons. Rectified linear unit (ReLU) and sigmoid functions are used for activations in the hidden layer and output layer, respectively. For simplicity, the input data are normalized to the range from 0 to 1, and the connection weights use floating point values. The training target is that, for a moving direction, the corresponding output neuron should have an output value of 1, whereas other neurons have output values of 0. All the EPSC data falling into the 24 directions in Figure 4f (there are 25 directions in Figure 4f, but −180° and 180° are actually the same direction) are used for training in one epoch. Each direction has 100 instances of EPSC traces, of which 90 are used for training and the remaining 10 for testing. Mean squared error (MSE) is used to calculate the output loss function, and the loss function is backpropagated to preceding layers. A stochastic gradient descent method is used to update the connection weights with a fixed learning rate of 0.1.
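The training setup described above (4 inputs, 50 ReLU hidden neurons, 24 sigmoid outputs, MSE loss, and stochastic gradient descent with a learning rate of 0.1) can be sketched in plain Python. The weight initialization, the example input vector, and all function names are hypothetical, and the sketch performs repeated updates on a single made-up sample rather than on the EPSC dataset.

```python
import math
import random

N_IN, N_HID, N_OUT, LR = 4, 50, 24, 0.1


def direction_class(angle_deg):
    """Map an angle in [-180, 180] to one of 24 classes of 15 degrees each."""
    return int(round((angle_deg + 180.0) / 15.0)) % N_OUT   # -180 and 180 coincide


def init_network(seed=0):
    rng = random.Random(seed)

    def layer(n_out, n_in):   # assumed small uniform initialization
        return [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)]

    return {"w1": layer(N_HID, N_IN), "b1": [0.0] * N_HID,
            "w2": layer(N_OUT, N_HID), "b2": [0.0] * N_OUT}


def forward(net, x):
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)          # ReLU
         for row, b in zip(net["w1"], net["b1"])]
    y = [1.0 / (1.0 + math.exp(-(sum(w * hj for w, hj in zip(row, h)) + b)))  # sigmoid
         for row, b in zip(net["w2"], net["b2"])]
    return h, y


def sgd_step(net, x, angle_deg):
    """One SGD update on a single sample; returns the MSE loss before the update."""
    target = [0.0] * N_OUT
    target[direction_class(angle_deg)] = 1.0
    h, y = forward(net, x)
    loss = sum((yk - tk) ** 2 for yk, tk in zip(y, target)) / N_OUT
    # Gradient of the MSE with respect to the pre-activations of the output layer.
    d_out = [2.0 * (yk - tk) * yk * (1.0 - yk) / N_OUT for yk, tk in zip(y, target)]
    # Backpropagate to the hidden layer before the output weights are changed.
    d_hid = [sum(d_out[k] * net["w2"][k][j] for k in range(N_OUT)) if h[j] > 0.0 else 0.0
             for j in range(N_HID)]
    for k in range(N_OUT):
        for j in range(N_HID):
            net["w2"][k][j] -= LR * d_out[k] * h[j]
        net["b2"][k] -= LR * d_out[k]
    for j in range(N_HID):
        for i in range(N_IN):
            net["w1"][j][i] -= LR * d_hid[j] * x[i]
        net["b1"][j] -= LR * d_hid[j]
    return loss


net = init_network()
sample = [0.1, 0.9, 0.5, 0.5]   # hypothetical normalized first-spike times (t_E, t_W, t_N, t_S)
losses = [sgd_step(net, sample, 30.0) for _ in range(200)]
print(f"loss before: {losses[0]:.4f}, after 200 updates: {losses[-1]:.4f}")
```

Mapping −180° and 180° to the same class mirrors the remark in the text that the 25 measured angles correspond to only 24 distinct directions.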
Energy Consumption of Artificial Retina: The energy consumption is estimated from the integration of the voltage and current traces for a single device shown in Figure S2, Supporting Information. The energy consumption can be estimated as around 2.5 ms × 3.5 V × 4 μA = 35 nJ during the stimulation voltage pulse and around 0.1 s × 0.15 V × 3.5 μA = 52.5 nJ during the read pulse. The total energy consumption per spike is thus 87.5 nJ.
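The estimate above can be checked with a few lines of arithmetic. The sketch approximates each phase as a rectangular pulse with the duration, voltage, and current quoted in the text; the function name is hypothetical.

```python
def pulse_energy_nj(duration_s, voltage_v, current_a):
    """Energy (nJ) of a rectangular pulse: E = t * V * I."""
    return duration_s * voltage_v * current_a * 1e9


write_nj = pulse_energy_nj(2.5e-3, 3.5, 4e-6)    # stimulation pulse: 2.5 ms, 3.5 V, 4 uA
read_nj = pulse_energy_nj(0.1, 0.15, 3.5e-6)     # read phase: 0.1 s, 0.15 V, 3.5 uA
total_nj = write_nj + read_nj
print(f"write: {write_nj:.1f} nJ, read: {read_nj:.1f} nJ, total: {total_nj:.1f} nJ")
```

The read phase dominates because of its 0.1 s duration, which is consistent with the remark in the Discussion that a shorter retention time would reduce the energy per spike.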

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.