Memristor‐Based Biologically Plausible Memory Based on Discrete and Continuous Attractor Networks for Neuromorphic Systems

A significant unsolved problem on the way to advanced neuromorphic systems is how to realize biologically plausible memory structures, which differ dramatically from those of classical computers. Herein, a memristor-based physical system is simulated to realize associative memory based on discrete attractor networks, which is essentially content-based storage, and the influence of device characteristics on network performance is systematically studied. An in situ unsupervised learning method is applied to make greater use of the array structure and of competition between neurons, demonstrating significant improvements in memory capacity and noise tolerance compared with existing supervised approaches. By extending this approach to continuous attractor neural networks (CANNs), working memory is realized with memristors for the first time (in simulation), and write noise and read noise in memristor arrays are found to affect the ability of the CANN to maintain dynamic information in different ways. This work lays a foundation for the construction of future advanced neuromorphic computing systems.


Introduction
Neuromorphic computing, a concept proposed in the 1990s, suggests that future computer systems can imitate the operating principles of the human brain by processing analog signals in parallel directly at the physical level. [1] It is a promising way to move beyond the deterministic von Neumann model of computation because of three significant advantages (high parallelism, low power consumption, and in-memory computing [2]), and it can be included in a Shannon-inspired statistical computation model. [3] Guided by information theory, one can explore the design principles of circuits and architectures that approach the limits of energy efficiency, latency, and accuracy. This motivates the establishment of a complete set of new mathematical tools to analyze and evaluate future neuromorphic computing systems.
A memristor, proposed in 1971 [4] and experimentally demonstrated in 2008, [5] is a resistive device that is a promising candidate for this kind of non-von Neumann neuromorphic computing. A memristor changes its resistance according to its internal state and to external stimulation, such as voltage pulses. Previous studies have shown that a memristor crossbar can accelerate various artificial neural networks (ANNs) by directly mapping vector-matrix multiplication (VMM), the most computationally intensive operation, onto electrical quantities through Ohm's law and Kirchhoff's law. [6,7] Under this principle, VMM happens in situ, which avoids the memory wall (the von Neumann bottleneck) caused by fetching data from memory. In supervised learning in particular, it can reduce the computational complexity of the feedforward and backpropagation processes from NP to P. [8] Consequently, current studies mostly focus on classification and regression tasks that exploit this new computing mechanism as a complement to complementary metal oxide semiconductor (CMOS) circuits. However, the physical mechanisms of memristors, such as conducting filament formation/dissolution and phase change, give rise to device imperfections that require further optimization. [9,10] Many theoretical models have therefore been set up to analyze the influence of nonideal device characteristics, [11][12][13] such as the number of weight states, nonlinearity, asymmetry, and variation, mostly in supervised learning. An important open question is therefore whether the device requirements of unsupervised learning, which uses training methods other than backpropagation, differ from those of supervised learning.
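As a minimal sketch of the crossbar principle described above, the snippet below models an ideal array in NumPy: each cell contributes a current V_i·G_ij by Ohm's law, and each column wire sums those currents by Kirchhoff's current law, so the column currents equal the vector-matrix product. The function name and the 3 × 2 conductance values are hypothetical, and wire resistance, sneak paths, and device nonlinearity are ignored.

```python
import numpy as np

# Illustrative sketch; the helper name and all values are hypothetical.


def crossbar_vmm(voltages, conductances):
    """Ideal crossbar read: column current I_j = sum_i V_i * G_ij.

    Each cell obeys Ohm's law (I = V * G); Kirchhoff's current law sums
    the cell currents that flow into the same column wire.
    """
    v = np.asarray(voltages, dtype=float)
    g = np.asarray(conductances, dtype=float)
    cell_currents = v[:, None] * g      # Ohm's law, one current per cell
    return cell_currents.sum(axis=0)    # Kirchhoff's law, one sum per column


# Hypothetical 3 x 2 array: conductances in siemens, read voltages in volts.
G = np.array([[1.0e-6, 2.0e-6],
              [3.0e-6, 0.0],
              [0.5e-6, 1.0e-6]])
V = np.array([0.1, 0.2, 0.1])
I = crossbar_vmm(V, G)  # same result as the matrix product V @ G
```

In a physical array this whole product is obtained in one analog read step; the NumPy version only checks the arithmetic.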
Furthermore, memristor-based ANNs are dedicated accelerators targeted at specific computing applications. Building a universal neuromorphic computer is more challenging and more important. Such a computer must not only accelerate feedforward neural networks for classification but also provide brain-inspired memory, which may lead to entirely new computing architectures. There are various kinds of memory in the brain, and previous studies have realized long-term memory, short-term memory, and their mutual transformation at the single-device level. [14] Notably, classical digital computers use address-based storage, whereas the brain uses content-based storage. From the perspective of computational neuroscience, biological neural networks have an effective energy function. Through recurrent connections, the network activity evolves over time from an initial state to a local minimum of this energy, which is called an attractor. Biological associative memory is assumed to be stored in such attractor states. A single neural network can store many memory patterns, which are determined by the synaptic weights.
If the attractors are discrete, an initial state will fall into the nearest attractor. This is the model of associative memory, which has previously been implemented on memristor crossbars. [15] However, these studies used offline learning with the Hebbian rule: the synaptic weights were calculated in software and then simply mapped onto the memristor crossbar. The Hebbian rule performs poorly and cannot support large-scale attractor networks. Other algorithms have also been applied to improve performance, [16] but the learning process remains offline.
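To make this offline workflow concrete, the sketch below computes Hebbian weights in software with the standard outer-product prescription (W = (1/n) Σ ξξᵀ with a zero diagonal) and then maps them onto a crossbar. It assumes one differential pair of memristors per weight (the scheme adopted later in this work) and a hypothetical conductance window of 1 µS to 100 µS; the helper names are illustrative and are not taken from the cited studies.

```python
import numpy as np

# Illustrative sketch; helper names and the conductance window are hypothetical.


def hebbian_weights(patterns):
    """Hebbian rule: W = (1/n) * sum over patterns of xi xi^T, with zero diagonal."""
    p = np.asarray(patterns, dtype=float)   # shape (num_patterns, n), entries +1/-1
    n = p.shape[1]
    w = p.T @ p / n
    np.fill_diagonal(w, 0.0)                # no self-connections
    return w


def to_differential_pair(w, g_min=1e-6, g_max=1e-4):
    """Map signed weights to two conductance arrays so that G_plus - G_minus = scale * W."""
    max_abs = float(np.max(np.abs(w))) or 1.0
    scale = (g_max - g_min) / max_abs
    g_plus = g_min + scale * np.clip(w, 0.0, None)     # positive part of W
    g_minus = g_min + scale * np.clip(-w, 0.0, None)   # negative part of W
    return g_plus, g_minus, scale


patterns = np.array([[1, -1, 1, -1],
                     [1, 1, -1, -1]])
W = hebbian_weights(patterns)            # computed offline, in software
G_plus, G_minus, scale = to_differential_pair(W)
```

Once mapped, the crossbar only performs inference; nothing in the array updates the weights, which is what distinguishes this offline approach from the in situ training studied here.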
By introducing a translationally invariant, bell-shaped connectivity pattern, the network attractors can form a continuum. This is called a continuous attractor neural network (CANN), which has received broad attention from both theoreticians and experimentalists. [17] Since its advent, the CANN has been successfully applied to describe theoretically the representation and processing of continuous features in neural systems, such as orientation, [18] head direction, [19] and spatial location. [20] Recently, experimental discoveries in the Drosophila central brain supported the existence of CANNs in real biological neural systems. [21] Regarding memory, the brain is believed to memorize the current state temporarily during dynamic tasks and to use this information for computation afterward. This is called working memory, which a CANN realizes naturally. Whereas associative memory represents long-term learning, working memory dynamically processes temporary information for computing. Because of the biological plausibility and powerful computational capabilities of CANNs, an early study implemented a CANN with CMOS-based electrical circuits, [22] and a CANN was recently used for the tracking function of an unmanned bicycle. [23] In this study, an effective in situ online learning method, Oja's rule, is applied to associative memory based on unsupervised learning by introducing competition and cooperation between neurons; this method achieves at least a 10-fold performance improvement, which can greatly reduce chip area and enhance the robustness of the hardware. The influence of nonideal device characteristics, including weight precision, nonlinearity, asymmetry, device-to-device variation, and cycle-to-cycle variation, on unsupervised learning is systematically studied, and the results reveal that weight precision has a more significant impact than nonlinearity.
Furthermore, a CANN is applied to neuromorphic devices for the first time to perform working memory with offline training, and the impact of device noise on the ability of the CANN to maintain dynamic information is studied. This work will pave the way toward brain-like memory for future neuromorphic computing systems.

Network Model
Discrete attractor neural networks, also known as Hopfield neural networks, are fully connected networks in which each neuron is connected to all other neurons but has no self-connection (Figure 1c). If the weight matrix is symmetric and its diagonal elements are 0, the network is guaranteed to have attractors. By iterating Equation (1) over time, the network finally converges to a pattern determined by the predefined connection weights.
The energy function can be defined as $E = -\frac{1}{2}\sum_{i}\sum_{j \neq i} W_{ij} X_i X_j$, which characterizes the state of the network; it always decreases or stays unchanged during network operation. When all states are tiled onto a 2D plane, the energy function forms a surface (Figure 1b) in which all the minima (the hollows) act as attractors. The network state converges to one of the attractors during evolution, so the neurons of the network eventually settle into a definite pattern.
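The recall dynamics can be sketched as follows. Equation (1) is not reproduced in this text, so the sketch assumes the standard Hopfield update X_i ← sign(Σ_j W_ij X_j) applied asynchronously (one neuron at a time), which is the case in which E is guaranteed never to increase. The three stored patterns are rows of a Sylvester-Hadamard matrix, a hypothetical choice made because their mutual orthogonality guarantees that a probe with four inverted pixels is restored exactly; the function names are illustrative.

```python
import numpy as np

# Illustrative sketch; the sign update rule and all names are assumptions.


def energy(w, s):
    """E = -1/2 * sum_{i != j} W_ij s_i s_j (the diagonal of w is zero)."""
    return -0.5 * s @ w @ s


def recall(w, s, max_sweeps=20, seed=0):
    """Asynchronous sign updates; returns the final state and the energy trace."""
    rng = np.random.default_rng(seed)
    s = np.array(s, dtype=float)
    trace = [energy(w, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            h = w[i] @ s                                     # local field of neuron i
            new = 1.0 if h > 0 else -1.0 if h < 0 else s[i]  # keep state on a tie
            if new != s[i]:
                s[i] = new
                changed = True
                trace.append(energy(w, s))
        if not changed:                                      # fixed point reached
            break
    return s, trace


# Three mutually orthogonal +/-1 patterns: rows of a 64 x 64 Sylvester-Hadamard matrix.
H = np.array([[1.0]])
while H.shape[0] < 64:
    H = np.block([[H, H], [H, -H]])
patterns = H[1:4]
W = patterns.T @ patterns / 64          # Hebbian weights
np.fill_diagonal(W, 0.0)                # no self-connections

probe = patterns[0].copy()
probe[[0, 10, 20, 30]] *= -1            # invert 4 of the 64 pixels
restored, trace = recall(W, probe)
```

The hardware loop in Figure 1d feeds all outputs back at once (a synchronous update). With synchronous updates the state of a symmetric network can end in a two-state oscillation instead of a fixed point, which is why the sketch uses the asynchronous form to illustrate the monotone energy descent.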
Therefore, by setting its weights, the network can store data and restore images from corrupted versions. Here, 50 images of 32 × 32 pixels selected from CIFAR-100 [24] are used to test the capacity of the network (Figure 1a). To illustrate the problem clearly, these images were binarized, with black mapped to 1 and white to −1. The network therefore has 1024 neurons and a 1024 × 1024 weight matrix.
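The binarization step is not fully specified in the text, so the following sketch assumes a grayscale conversion with the ITU-R BT.601 luma weights and a fixed threshold of 0.5 on images scaled to [0, 1]. Only the sign convention (black → 1, white → −1) and the flattening of 32 × 32 pixels into 1024 neurons come from the text; the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch; the grayscale weights and the threshold are assumptions.


def binarize(image_rgb, threshold=0.5):
    """Turn a 32 x 32 x 3 RGB image with values in [0, 1] into a +1/-1 vector of length 1024.

    Dark (black) pixels become +1 and light (white) pixels become -1.
    """
    img = np.asarray(image_rgb, dtype=float)
    gray = img @ np.array([0.299, 0.587, 0.114])   # luma weights (ITU-R BT.601)
    return np.where(gray < threshold, 1, -1).reshape(-1)


black = binarize(np.zeros((32, 32, 3)))
white = binarize(np.ones((32, 32, 3)))
```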
As the most intensive part of the computation is VMM, the network can be mapped onto a memristor crossbar, and the external circuits only need to threshold the output values and feed the binary outputs back to the inputs, as shown in Figure 1d. If the number of neurons is n, the synaptic weight matrix is n × n. To represent negative weights and simplify the external circuitry, each synaptic weight is represented by a differential pair of memristors. Note that the equivalent weights on the diagonal can always be set to zero because neurons have no self-connections. If these weights are nonzero, they act as self-feedback. With properly set self-feedback, the network becomes a chaotic neural network that can solve combinatorial optimization problems, [25] that is, find the global minimum instead of the nearest minimum. Recently, this has also been realized using the random telegraph noise in memristive devices. [26] In this associative memory task, which seeks the nearest minimum, the input image is corrupted with noise, and in simulation the output recovers the prestored images perfectly. Noise of level x is added by randomly choosing one pixel and inverting it, repeated x times, rather than by inverting x distinct randomly chosen pixels. Otherwise, when x equals the total number of pixels, the corrupted image would simply become the inverse of the original, which is not a proper noisy input.
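The noise procedure can be sketched as below (the function names are hypothetical). A pixel ends up inverted only if it is picked an odd number of times, which happens with probability (1 − (1 − 2/n)^x)/2 for n pixels and x flips. This fraction approaches 1/2 rather than 1 as x grows, so even for x = n (about 43% of pixels inverted when n = 1024) the probe never degenerates into the inverted pattern.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.


def corrupt(pattern, x, seed=0):
    """Repeat x times: pick one pixel uniformly at random and invert it."""
    rng = np.random.default_rng(seed)
    s = np.array(pattern, copy=True)
    for i in rng.integers(0, s.size, size=x):
        s[i] = -s[i]                    # a pixel picked twice returns to its value
    return s


def expected_flipped_fraction(x, n):
    """Probability that a pixel ends up inverted, i.e., was picked an odd number of times."""
    return (1.0 - (1.0 - 2.0 / n) ** x) / 2.0


clean = np.ones(1024, dtype=int)
noisy = corrupt(clean, 1024)
frac = expected_flipped_fraction(1024, 1024)   # about (1 - e**-2) / 2
```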

Oja Rule
To set the weights, we apply a more advanced learning rule, Oja's rule, in which W is the weight matrix and α is the learning rate. Unlike previous work that used software-calculated weights, this rule trains the network online in the crossbar and therefore yields a significant performance improvement. The main principle of the rule is competition between neurons, which is also used in the locally competitive algorithm (LCA) [27] and is often described as winner-takes-all (WTA). The rule requires both the weight matrix and its transpose, just as the backpropagation algorithm in supervised learning does. By exchanging the input and output ports, the transposed weight matrix can be used in situ. The computing process is also shown in Figure 1d.
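The update equation of Oja's rule used in this work is not reproduced in this text, so the sketch below substitutes the classic Oja subspace rule, ΔW = α y (x − Wᵀy)ᵀ with y = Wx; that the paper uses exactly this form is an assumption. It shows the two reads the text describes: a forward read through W and a transposed read Wᵀy, obtained by swapping the crossbar's input and output ports. The toy data (hypothetical) lie in a two-dimensional subspace of an eight-dimensional space, so after training the two rows of W form an orthonormal basis of that principal subspace.

```python
import numpy as np

# Illustrative sketch of Oja's subspace rule; data, sizes, and names are hypothetical.


def oja_subspace_step(w, x, alpha):
    """One online update of Oja's subspace rule: W += alpha * y (x - W^T y)^T, with y = W x."""
    y = w @ x            # forward read through the array
    x_hat = w.T @ y      # transposed read (input and output ports swapped)
    return w + alpha * np.outer(y, x - x_hat)


rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((8, 2)))         # orthonormal 8 x 2 basis
coeffs = rng.standard_normal((200, 2)) * np.array([np.sqrt(3.0), 1.0])
data = coeffs @ basis.T                                       # 200 samples in a 2-D subspace
w = 0.1 * rng.standard_normal((2, 8))                         # small random start
for epoch in range(50):
    for x in data:
        w = oja_subspace_step(w, x, alpha=0.005)
```

Unlike the Hebbian outer product, each update subtracts the part of the input that the current weights already reconstruct, which decorrelates the output neurons and makes them compete for input directions. The sketch uses 2 output neurons for 8 inputs, whereas the attractor network in this work uses a square 1024 × 1024 matrix.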
Oja's rule is a training method that captures the principal components of the stored patterns. It needs high weight precision and yields a more refined and functional weight structure in return. As shown in Figure 2a, when the number of stored patterns N > 5, the weight map produced by the Hebbian rule barely changes and always follows a fixed frame. In contrast, the weight map produced by Oja's rule is more evenly distributed and therefore has a more delicate structure. As reflected in the results (Figure 2b, where lines of different colors represent different noise levels), when the number of stored patterns exceeds 5, the recall accuracy of the Hebbian rule drops abruptly, whereas the recall accuracy of Oja's rule remains high even at N = 50. Evidently, with Oja's rule the neuromorphic system can still recall the full pattern with high accuracy even when 50 patterns are stored and only half of the original information is given. The memory capacity with Oja's rule is therefore roughly 10 times that with the Hebbian rule, and the neuromorphic memory system trained by Oja's rule is also more robust, as manifested by its much higher noise tolerance, demonstrating an overall significant performance improvement.

Device Characteristics
The theoretical memristor device model in NeuroSim [11] is adopted here to evaluate device characteristics. We mainly discuss weight precision, nonlinearity, conductance variation, device-to-device variation, and cycle-to-cycle variation. The detailed process is shown in Figure 2c.
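A minimal sketch of a NeuroSim-style device model is given below. It assumes the exponential-saturation potentiation curve G(p) = B(1 − e^(−p/A)) + G_min, with B chosen so that the last pulse reaches G_max, and maps a weight precision of b bits to 2^b conductance states. The shape parameter A is not the same number as the nonlinearity values quoted for Figure 4, and the mapping between them is not reproduced here. How device-to-device variation (a per-device spread of A) and cycle-to-cycle variation (Gaussian noise on every programmed level) enter is a simplifying assumption, and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; the variation models and all names are assumptions.


def ltp_conductance(p, p_max, a, g_min=0.0, g_max=1.0):
    """Conductance after p potentiation pulses, G = B * (1 - exp(-p / a)) + g_min.

    B is chosen so that p = p_max reaches g_max. Large a gives a nearly linear
    update; small a gives a strongly nonlinear (quickly saturating) update.
    """
    b = (g_max - g_min) / (1.0 - np.exp(-p_max / a))
    return b * (1.0 - np.exp(-np.asarray(p, dtype=float) / a)) + g_min


def device_levels(bits, a, d2d_sigma=0.0, c2c_sigma=0.0, seed=0):
    """Conductance of one device at each of its 2**bits states, with optional variation.

    d2d_sigma: relative spread of the shape parameter a, drawn once per device.
    c2c_sigma: std of Gaussian noise added to every programmed level.
    """
    rng = np.random.default_rng(seed)
    p_max = 2**bits - 1                      # weight precision -> number of pulses
    a_dev = a * max(1e-3, 1.0 + d2d_sigma * rng.standard_normal())
    g = ltp_conductance(np.arange(p_max + 1), p_max, a_dev)
    return g + c2c_sigma * rng.standard_normal(g.shape)


levels = device_levels(bits=6, a=5.0)          # 64 states, strongly nonlinear
linear = ltp_conductance(np.arange(64), 63, a=1e6)
```

With a small A the first pulses move the conductance much more than the last ones, so equal weight updates in the algorithm become unequal conductance steps on the device.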

Network Model
The CANN is a candidate canonical model of neural information representation and processing. Intuitively, a CANN is endowed with a family of continuous attractors that are neutrally stable (Figure 3a). Traditionally, the dynamics of a CANN describing the head-direction system are written as [28]

$$
\begin{cases}
\tau \dfrac{\partial U(x,t)}{\partial t} = -U(x,t) + \rho \displaystyle\int_{-\pi}^{\pi} J(x,x')\, r(x',t)\, dx' + I_{\mathrm{ext}}(x,t), \\[2ex]
r(x,t) = \dfrac{U(x,t)^2}{1 + k\rho \displaystyle\int_{-\pi}^{\pi} U(x',t)^2\, dx'},
\end{cases}
$$

where U(x,t) represents the total synaptic input of neuron x ($x \in [-\pi, \pi)$) at time t, r(x,t) is the instantaneous firing rate, τ is the time constant, ρ is the neuron density, and k is the strength of global inhibition. $J(x,x') = \frac{1}{2\pi a^2} \exp\left[-\frac{(x-x')^2}{2a^2}\right]$ is the connection strength between neuron x and neuron x′; it is a translationally invariant function, which ensures the neutral stability of the continuous attractors. $I_{\mathrm{ext}}(x,t) = \frac{1}{2\pi b^2} \exp\left[-\frac{(x-x_0)^2}{2b^2}\right]$ is the Gaussian-shaped external input, where $x_0$ indicates the center of the stimulus, i.e., the head direction of the animal. Note that inhibitory neurons are not explicitly defined in this model; instead, global inhibition is achieved through divisive normalization, which may be implemented by shunting inhibition. Under these settings, the network response r(x,t) is also bell-shaped, and the head direction represented by the neural system is calculated through

$$
z(t) = \sum_{x=-\pi}^{\pi} \frac{r(x,t)}{\sum_{x'} r(x',t)}\, x.
$$

Owing to its neutral stability, the internal representation z(t) can smoothly track the rotation of the head direction $x_0$, and it still holds even in a dark environment (Figure 3b).
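The dynamics above can be sketched with forward-Euler integration on a ring of 50 neurons. All parameter values are hypothetical, and the sketch makes two assumptions beyond the text: the connectivity uses the normalization J0/(√(2π) a) with a tunable amplitude J0 (which only rescales the bump relative to the prefactor given above), and the stimulus width is fixed at b = √2·a with amplitude `amp`, so that it matches the bump profile. Because ρ·Δx = 1 for n neurons on [−π, π), the integrals become plain sums over neurons. The stimulus is on from t = 0 to t = 50, and the decoded center z is read out at t = 250.

```python
import numpy as np

# Illustrative sketch; parameter values and function names are hypothetical.


def ring_distance(u, v):
    """Shortest distance between angles on the ring [-pi, pi)."""
    d = np.abs(u - v)
    return np.minimum(d, 2 * np.pi - d)


def simulate_cann(n=50, a=0.5, j0=4.0, k=1.0, amp=10.0, x0=0.5,
                  tau=1.0, dt=0.1, t_on=50.0, t_end=250.0):
    """Forward-Euler simulation of a ring CANN; the stimulus is removed at t = t_on."""
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)        # preferred directions
    # With rho = n / (2 pi) and dx = 2 pi / n, rho * dx = 1, so plain sums over
    # neurons stand in for rho * integral(... dx').
    J = (j0 * np.exp(-ring_distance(x[:, None], x[None, :]) ** 2 / (2 * a**2))
         / (np.sqrt(2 * np.pi) * a))
    stimulus = amp * np.exp(-ring_distance(x, x0) ** 2 / (4 * a**2))
    u = np.zeros(n)
    for step in range(int(round(t_end / dt))):
        r = u**2 / (1.0 + k * np.sum(u**2))                  # divisive global inhibition
        i_ext = stimulus if step * dt < t_on else 0.0
        u = u + dt / tau * (-u + J @ r + i_ext)
    r = u**2 / (1.0 + k * np.sum(u**2))
    return x, u, r


def decode(x, r):
    """z = sum_x r(x) x / sum_x' r(x'); valid while the bump stays away from +/-pi."""
    return np.sum(r * x) / np.sum(r)


x, u, r = simulate_cann()
z = decode(x, r)   # decoded center at t = 250, long after the stimulus was removed
```

With these values the bump should persist after the stimulus is removed, with z close to x0 = 0.5. Persistence depends on the global inhibition: for this normalization the standard continuum analysis gives a critical strength k_c = ρJ0²/(8√(2π) a), about 12.7 for the defaults, and with k above k_c the activity decays to zero once the stimulus is withdrawn.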

Working Memory
In this study, we explore the ability of a CANN to hold persistent activity triggered by a brief external stimulus. [29,30] Figure 3c shows the working memory task based on continuous attractor neural networks in the ideal situation, where the ordinate represents the normalized strength and the abscissa represents the neuron index (50 neurons from −π to π). First, when no noise exists in the system, Figure 3c shows that a CANN can perfectly hold persistent activity, i.e., the internal representation is anchored to the center of the external stimulus and no deviation is observed as time elapses (see Movie 1, Supporting Information). Specifically, when an external stimulus is applied, the neuron population responds to it and acquires the shape and central position of the stimulus; after the stimulus is removed, the population still remembers the previous shape and central position (Figure 3c). However, in electronic neuromorphic systems, offline learning faces three major problems: weight precision, write noise, and read noise. Regarding weight precision, offline learning usually has much lower precision requirements than online learning. In addition, it is the network structure, rather than the weight distribution, that ensures the working memory ability of the CANN. We find that a CANN can work with weight matrices of any precision, even 1-bit. When the connection strengths of the network are programmed onto memristor crossbars, the weight matrix may retain some nonideal factors even after multiple verifications, reflected in the difference between the conductance values and the target values (write noise, Figure 3d). In addition, parasitic effects and thermal fluctuations during reading also affect the final result (read noise).
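The three nonidealities can be modeled separately, as in the sketch below. It assumes multiplicative (relative) Gaussian noise for both write and read, which the text does not specify: write noise is drawn once when the array is programmed and then stays fixed, while read noise is redrawn every time the array is read (for a CANN, at every time step). Quantization rounds the nonnegative CANN weights onto 2^bits levels, so 1 bit leaves only 0 and the maximum weight. Function names and noise levels are hypothetical.

```python
import numpy as np

# Illustrative sketch; the noise models, names, and levels are assumptions.


def quantize(w, bits):
    """Round nonnegative weights onto 2**bits evenly spaced levels in [0, max(w)]."""
    w = np.asarray(w, dtype=float)
    steps = 2**bits - 1
    w_max = w.max() if w.max() > 0 else 1.0
    return np.round(w / w_max * steps) / steps * w_max


def program(w, write_sigma, rng):
    """Write noise: a relative error frozen into each device when it is programmed."""
    return w * (1.0 + write_sigma * rng.standard_normal(w.shape))


def read(g, read_sigma, rng):
    """Read noise: a fresh relative fluctuation every time the array is read."""
    return g * (1.0 + read_sigma * rng.standard_normal(g.shape))


rng = np.random.default_rng(0)
target = np.array([0.0, 0.2, 0.6, 1.0])
binary = quantize(target, bits=1)                      # 1-bit weights: only 0 and max(w)
G = program(binary, write_sigma=0.05, rng=rng)         # done once
first, second = read(G, 0.1, rng), read(G, 0.1, rng)  # differ on every read
```

In a CANN simulation, `program` would be applied once to the connectivity matrix before the run and `read` inside the time loop, which is what makes write noise a static distortion of the attractor landscape and read noise a time-varying perturbation.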
A CANN based on a memristor model is therefore set up to analyze the tolerance of neuromorphic computing to these imperfections. The location of the network center over time after the external stimulus is withdrawn reflects the reliability of neuromorphic computing in this dynamic working memory task.
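One way to quantify this reliability is the largest displacement of the decoded center after stimulus offset, sketched below. The wrap-around at ±π is handled with a circular difference, an assumption added to the plain weighted-average decoder z(t) so that a bump crossing the boundary is not reported as a jump of almost 2π. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; the drift metric and names are assumptions.


def circular_diff(a, b):
    """Signed angular difference a - b, wrapped into [-pi, pi)."""
    return (np.asarray(a, dtype=float) - b + np.pi) % (2 * np.pi) - np.pi


def max_drift(z_trace, t, t_off):
    """Largest deviation of the decoded center z(t) from its value when the stimulus ends."""
    z_trace = np.asarray(z_trace, dtype=float)
    after = np.asarray(t, dtype=float) >= t_off
    z_ref = z_trace[after][0]
    return float(np.max(np.abs(circular_diff(z_trace[after], z_ref))))


drift = max_drift([0.0, 0.5, 0.6, 0.4], t=[0, 1, 2, 3], t_off=1)
```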

Influence of Device Characteristics
The influence of device imperfections on the implementations of associative memory and working memory was systematically studied, as shown in Figures 4 and 5. As shown in Figure 4a, when the weight precision is lower than 6 bits, Oja's rule cannot work properly. However, once the weight precision exceeds 6 bits, the attractor network performs better than the network based on the Hebbian rule (Figure 2b). Figure 4b further shows that training based on Oja's rule tolerates stronger nonlinearity (>2) than backpropagation, since previous results by Chen et al. showed a dramatic performance drop when the nonlinearity exceeds 1.5 with backpropagation. [11] Figure 4c shows the actual weight matrix of the memristor crossbar (weight precision of 10 bits, nonlinearity of 1.5, N = 20), where the nonideal device effects increase the deviation between adjacent weights. A universal model of working memory based on memristive neuromorphic systems has also been set up, and Figure 5 shows the simulation results. Simulating weight matrices with different weight precisions verifies that a CANN can be easily implemented with neuromorphic devices. As shown in Figure 5a, even the simplest binary devices can realize a CANN, simply by mapping the conductance to +1 and 0. The maximum tolerable write noise in this task is 5% (Figure 5b), which can be achieved by closed-loop verification during programming or by adding transistors to regulate the currents. The maximum tolerable read noise in this task is 10% (Figure 5c). Notably, write noise and read noise affect the network activity differently. Write noise reshapes the response activity: Figure 5d shows that the network activity with a write noise of 0.1 becomes sharper.
When the write noise is stronger (e.g., 0.3), the single peak of neuronal activity even splits into multiple peaks (see Movie 2, Supporting Information). In contrast, read noise (with an amplitude of 0.1) makes the peak location drift while the overall shape remains almost unchanged (Figure 5e). When the read noise is stronger (e.g., 0.3), the peak drifts faster and over a larger range (see Movie 3, Supporting Information). This may explain why people can be distracted: working memory is inherently temporary memory for upcoming computing tasks, and in this situation its peak shifts all the time.

Figure 3 (caption fragment): … and the y-axis represents the normalized value of the external stimulus I_ext (red curve) and the network response r (blue curve). The external stimulus lasts from t = 0 to t = 50, and its center represents a direction. The external stimulus induces network responses whose center represents the neural network's encoding of that direction (t = 5, t = 45). After the external stimulus is withdrawn, the network still holds the dynamic memory (t = 55, t = 250). d) Conductance matrix mapped onto the memristor array, without noise (left) and with noise (right).

Conclusion
Biologically inspired and plausible neural networks have always been the ultimate goal of neuromorphic computing. Previous studies have shown that supervised learning is capable of classification and regression. In this work, a new unsupervised learning algorithm is adopted and implemented for associative memory. In terms of function and efficiency, associative memory is essentially a kind of content-based storage.
There are other memristor-based methods for realizing content-based storage, but they still need a search algorithm with computational complexity N, whereas the number of iterations of an attractor network is only log N. As for working memory, it has been considered an effective way of approaching decision making at the edge of chaos, because it can dynamically accept external input while maintaining its own information.
Here, associative memory and working memory are realized based on discrete and continuous attractor networks, respectively, and the influence of device characteristics on network performance is systematically studied. The in situ online training demonstrates significant improvements in memory capacity and noise tolerance. By cascading a CANN with other computing units, such as reservoir computing, the whole system can gain better dynamic information processing capability, because the CANN can constrain external input into a specific form and temporarily remember external stimuli at the network level. Previous studies have shown the application of memristors in reservoir computing, [31] and suggested that combining a discrete attractor neural network with reservoir computing can improve performance, as showcased by the generation of handwritten digits. [32] Using attractors to constrain the complex dynamics of a reservoir will help the computing system converge faster and avoid sustained oscillations. The realization of brain-inspired content-based storage may facilitate the development of real neuromorphic systems.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.