SNNSim: Investigation and Optimization of Large‐Scale Analog Spiking Neural Networks Based on Flash Memory Devices

Spiking neural networks (SNNs) have emerged as a novel approach for reducing computational costs by mimicking the biologically plausible operations of neurons and synapses. In this article, large‐scale analog SNNs are investigated and optimized at the hardware level by using SNNSim, a novel simulator for SNNs that employ analog synaptic devices and integrate‐and‐fire (I&F) neuron circuits. SNNSim is a reconfigurable simulator that accurately and very quickly models the behavior of user‐defined device characteristics and returns key metrics such as area, accuracy, latency, and power consumption as output. Notably, SNNSim exhibits exceptional efficiency, as it can process the entire 10 000‐image Modified National Institute of Standards and Technology (MNIST) test dataset in a few seconds, whereas SPICE simulations require hours to simulate a single MNIST test image. Using SNNSim, the conversion of artificial neural networks (ANNs) to SNNs is simulated and the performance of the large‐scale analog SNNs is optimized. The results enable the design of accurate, high‐speed, and low‐power large‐scale SNNs. SNNSim code is now available at https://github.com/SMDLGITHUB/SNNSim.


Introduction
[4] However, conventional artificial neural networks (ANNs) implemented on central processing unit (CPU)/graphics processing unit (GPU)-based von Neumann architectures suffer from high energy consumption and latency, limiting their real-time performance. To overcome these problems, spiking neural networks (SNNs), which use event-driven spikes for data processing and mimic the behavior of biological neural networks such as the human brain, have been proposed. These bio-plausible networks are anticipated to offer a promising solution for implementing energy-efficient platforms compared to conventional ANNs. Adopting this perspective, TrueNorth [5] and Loihi, [6] silicon chips that process SNNs with a digital architecture, have been fabricated following this principle. Digital implementations of SNNs are often favored for their compatibility with existing digital systems, and their performance can be reliably estimated within familiar digital frameworks, making them a popular choice. [7] While digital SNN systems are considerably more energy-efficient than the conventional von Neumann architecture and offer predictability compared to analog SNN systems, their reliance on synchronous processing rooted in an intrinsically digital architecture sets them apart from the human brain and can lead to different outcomes in speed and power consumption. [10] However, it remains unclear how a fully analog SNN would perform in terms of area, accuracy, latency, and power consumption when implemented at large scale. Due to their ability to significantly reduce computational resources, analog SNNs are a promising candidate to serve as a substitute processor in edge computing and on-device computing. Despite extensive research exploring SNN systems at the software level, [11][12][13] a significant gap remains in understanding the hardware implementation of analog SNNs. Some SNN algorithms are challenging to implement on hardware for various reasons, such as their reliance on bias for network operation. [14] In contrast, research on single synaptic devices or neuron circuits typically focuses solely on accuracy without adequately addressing the impact of incorporating these components into larger systems in terms of area, latency, and power consumption. It is therefore necessary to thoroughly investigate the performance of these systems while reflecting their hardware characteristics.
In this article, we investigate the performance of large-scale analog SNNs using a hardware-level simulator for SNNs with analog synaptic devices and I&F neuron circuits. To accurately investigate large-scale analog SNNs at the hardware level prior to their silicon implementation, the SPICE simulator is widely used, but it suffers from long simulation times. Therefore, a novel simulator is necessary to explore the performance of analog SNNs with both high accuracy and high speed, similar to NeuroSim, [15] a popular simulator for compute-in-memory (CIM) chips processing ANNs. The main contributions of this article are as follows: 1) We propose a hardware architecture for fully analog SNNs based on analog synaptic devices and I&F neuron circuits. [10] 2) We develop SNNSim, which enables accurate and fast simulations of customized analog SNNs that reflect the specifications of synaptic devices and neuron circuits. 3) We analyze and optimize the performance of the analog SNNs through SNNSim, including area overhead and latency, and verify the ultra-energy efficiency as well as the high accuracy of bio-inspired analog SNNs.
This article is organized as follows: Section 2 presents the background on SNNs, including their operational schemes and training methods. Section 3 provides an overview of the hardware architecture of analog SNNs, which consists of analog synaptic devices, current mirrors, and analog I&F neuron circuits. Section 4 presents the performance estimation model utilized by SNNSim, while Section 5 describes the validation of the accuracy of SNNSim by comparing its estimated system performance against that of SPICE simulation. Section 6 presents the results of SNNSim simulations on the hardware performance of multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), with their optimization guidelines. Finally, we conclude this article in Section 7.

Background on Spiking Neural Networks
Figure 1 shows the operational scheme of a biological synapse-neuron model and an analog SNN implemented in hardware. The analog SNNs mimic the biological synapse-neuron model, in which data is processed via event-driven spikes in the networks. These spikes propagate to adjacent synapses to increase or decrease the membrane voltage (V_mem) of neurons in the subsequent layer. If the V_mem surpasses the threshold voltage, the neuron generates a spike, and the V_mem is reduced. In this work, we use I&F neurons; thus, leaky behavior is not considered. This process in I&F neurons can be expressed as follows:

V_j^l(t_i) = V_j^l(t_{i−1}) + (1/C_mem) Σ_{m=1}^{N^{l−1}} S_m^{l−1}(t_i) × W_{mj}^{l−1}  (1)

S_j^l(t_i) = 1 if V_j^l(t_i) ≥ V_th, otherwise 0  (2)

V_j^l(t_i) → V_j^l(t_i) − V_th when the neuron fires  (3)

V_j^l(t_i) represents the V_mem of the j-th neuron in layer l at time step t_i, while N^{l−1} denotes the number of neurons in layer l−1. C_mem represents the capacitance of the membrane capacitor, by which the value of Σ_{m=1}^{N^{l−1}} S_m^{l−1}(t_i) × W_{mj}^{l−1} is converted to V_mem. S_m^{l−1}(t_i) represents the spike pulse generated by the m-th neuron in layer l−1, and W_{mj}^{l−1} represents the weight of the synapse between the m-th neuron in layer l−1 and the j-th neuron in layer l. This process is repeated until neurons in the output layer generate sufficient spikes to make a decision on the given task. In other words, we use the rate coding scheme, which linearly encodes the input value as the frequency of the input spikes. The final decision is then given by the index of the output-layer neuron with the highest firing rate.
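As a minimal sketch of Equations (1)-(3), one timestep of an I&F layer can be written as follows (assuming a reset-by-subtraction neuron; the threshold V_th, C_mem, and weight values are illustrative, not taken from the fabricated devices):

```python
import numpy as np

def if_layer_step(V_mem, spikes_in, W, C_mem, V_th):
    """One timestep of an integrate-and-fire layer (sketch of Eq. (1)-(3)).

    V_mem     : membrane voltages of layer l, shape (N_l,)
    spikes_in : binary spikes S from layer l-1, shape (N_lm1,)
    W         : synaptic weights, shape (N_lm1, N_l)
    """
    # Eq. (1): accumulate the weighted input spikes on the membrane capacitor
    V_mem = V_mem + (spikes_in @ W) / C_mem
    # Eq. (2): fire where the membrane voltage reaches the threshold
    spikes_out = (V_mem >= V_th).astype(float)
    # Eq. (3): reduce V_mem of the neurons that fired (reset by subtraction)
    V_mem = V_mem - spikes_out * V_th
    return V_mem, spikes_out
```

Iterating this step layer by layer over timesteps, and counting output-layer spikes, reproduces the rate-coded inference described above.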

Synapse
Spike-timing-dependent plasticity (STDP), [16] a conventional learning rule for SNNs, is difficult to extend to multi-layer networks for training large-scale SNNs following the above rules. Instead, we note that analog SNNs can achieve high accuracy, comparable to software-based ANNs, using an ANN-to-SNN conversion method. [17] The spiking behavior of I&F neurons following the rules of Equations (1)-(3) is very similar to that of the rectified linear unit (ReLU) activation function, which is commonly used in ANNs. In this regard, weights first trained in ANNs with the ReLU activation function can be transferred directly to the weights in SNNs, and the SNNs can then perform the inference task similarly to the ANNs that exhibit state-of-the-art accuracy. In addition, the ANN-to-SNN conversion method does not require additional peripheral circuitry for on-chip training, which would go unused during inference. From this perspective, we adopt the ANN-to-SNN conversion method for analog SNN hardware.
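The ReLU correspondence can be checked numerically: an I&F neuron with reset-by-subtraction, driven by a constant per-step input, fires at a rate proportional to the ReLU of that input (a toy sketch; the threshold and drive values are arbitrary):

```python
def if_rate(drive, V_th=1.0, T=1000):
    """Firing rate of an I&F neuron under a constant per-step drive
    (reset by subtraction, simulated over T timesteps)."""
    V, n_spikes = 0.0, 0
    for _ in range(T):
        V += drive                # integrate the constant drive
        if V >= V_th:             # fire and subtract the threshold
            n_spikes += 1
            V -= V_th
    return n_spikes / T

# For 0 <= drive <= V_th, the rate approaches max(drive, 0) / V_th,
# i.e., the ReLU activation of the equivalent ANN neuron.
```

This is why ReLU-trained weights can be transferred to rate-coded I&F neurons with little loss of accuracy.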

Architecture of Analog Spiking Neural Networks Hardware
We propose a hardware architecture for analog SNNs using flash memory arrays and I&F neurons, as shown in Figure 2a. Figure 2a shows the main elements for the hardware implementation. The first element is the synapse array, which performs multiply-accumulate (MAC) operations using Kirchhoff's current law, as shown in Figure 2b. W_{mj}^{l−1} in Equation (1) is represented by the conductance of each synaptic device. The conductance of each synaptic device can be fine-tuned using the methodology depicted in Figure S1, Supporting Information. Since the variation of flash memory devices in the array is much smaller than the memory window, multi-level conductance in flash memory devices is reliably implemented. Voltage spikes, represented by S_m^{l−1}(t_i) in Equation (1), generate currents proportional to the conductance of the synaptic devices. Thus, the Σ_{m=1}^{N^{l−1}} S_m^{l−1}(t_i) × W_{mj}^{l−1} term in Equation (1) is calculated as the current sums along the bit-lines (BLs) of the synapse array. The second element is the current mirrors, which transfer the current sum to the membrane capacitor of the neuron circuits. The current sum is converted to the V_mem according to Equation (1). Finally, the third element is the I&F neuron circuits, which generate spikes transmitted to the next layer when the V_mem exceeds the threshold voltage, as shown in Equation (2). By combining these elements in series, multi-layer SNNs can be implemented.
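The array MAC amounts to a vector-matrix product: a read pulse on each spiking word-line drives a current through every device on that row, and Kirchhoff's current law sums the currents along each bit-line. A sketch, with assumed pulse amplitude and conductance values:

```python
import numpy as np

# Each synaptic weight is a device conductance G[m, j] (siemens); a spike
# applies V_read on WL m, and BL j collects the resulting current sum.
V_read = 1.0                      # read pulse amplitude, V (assumed)
spikes = np.array([1, 0, 1, 1])   # S_m(t_i): which WLs receive a pulse
G = np.array([[1e-6, 2e-6],
              [3e-6, 1e-6],
              [2e-6, 2e-6],
              [1e-6, 4e-6]])      # conductance matrix, shape (WLs, BLs)

# I_BL[j] = sum_m S_m * V_read * G[m, j]  (Ohm's law + KCL along the BL)
I_BL = V_read * (spikes @ G)
```

Each bit-line current then charges the membrane capacitor of the corresponding neuron through a current mirror.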
Figure 2c depicts the circuit diagram of the current mirror and I&F neuron circuit along with an example of the changes in V_mem and output spikes. Here, we adopt a complementary metal-oxide-semiconductor (CMOS) I&F neuron circuit previously reported by our group. [8] The mechanism of the I&F neuron circuit is as follows. The current sum copied by the current mirror flows into the membrane capacitor of the neuron circuit, changing the gate voltage of the M1 transistor. Until the threshold voltage is reached, NODE1 is held at a high voltage by M10. Upon reaching the threshold voltage of the neuron circuit, NODE1 is pulled down by M1, and the inverter consisting of M5 and M6 switches the output of the neuron circuit to the high-voltage state. M3 and M4 assist M1 in pulling NODE1 down to a lower voltage so that M5 pulls the output node sufficiently high when the neuron fires. With the output node pulled up to the high voltage, M2 and the inverter comprising M7 and M8 are triggered, causing NODE2 to enter a low-voltage state while M2 simultaneously discharges the V_mem. M9 then drives NODE1 into the high-voltage state, triggering the inverter consisting of M5 and M6 and causing the output of the neuron circuit to return to the low-voltage state. Note that this cyclic process does not require synchronized clock or control signals, indicating that event-driven analog SNNs can be implemented with the I&F neuron circuits. [18]

SNNSim
We introduce SNNSim, a simulator developed in Python that estimates the area, accuracy, latency, and power consumption of analog SNNs for inference tasks. Unlike previous SNN simulators, which estimate performance without considering hardware and only in the digital domain, SNNSim estimates performance in the analog domain, covering synapse arrays, current mirrors, and I&F neuron circuits. Figure 3 illustrates the performance estimation flowchart of SNNSim. In SNNSim, the estimation is performed by calculating the currents of all synaptic devices and the resulting updates in V_mem to determine spike generation. This process is conducted layer-wise from the input to the output layer in each timestep until a neuron of the output layer fires a predetermined number of times. The remainder of this section discusses how SNNSim calculates area, accuracy, latency, and power consumption.

Area
In SNNSim, we estimate the area of the analog SNN by calculating the area of each component: synapse arrays, current mirrors, and analog I&F neuron circuits. The area of a synapse array can be calculated by multiplying the number of devices required for the array by the area per synaptic device cell. In this article, we adopt flash memory cells for the synapse array, as mentioned in the previous section. Flash memory cells in the AND-type synapse array occupy 6F² per cell. [19] For the current mirrors and neuron circuits, the total area can be estimated by determining the layout of the transistors and capacitors, following the CMOS process design rules, the general rules that determine the size and spacing of transistors. For example, we have depicted the mask layout of the neuron circuits according to the CMOS process design rules, as shown in Figure S2, Supporting Information. The area of the capacitors can be calculated using the well-known parallel-plate capacitance equation C = εA/d: given the dielectric constant ε of the capacitor dielectric, such as silicon dioxide or a high-k material, and the plate separation d set by the process design, the area A required for a given capacitance follows directly. By summing the areas of all elements, SNNSim obtains the total area of the system.
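The bookkeeping above can be sketched as follows (the feature size, array dimensions, dielectric thickness, and C_mem are assumed for illustration; the 6F² cell area and the parallel-plate relation A = C·d/ε are from the text):

```python
# Synapse-array area: number of cells times 6F^2 per AND-type flash cell.
F = 0.5e-6                 # feature size, m (0.5 um node)
n_synapses = 324 * 40      # devices in one synapse array (example MLP layer)
array_area = n_synapses * 6 * F**2

# Parallel-plate capacitor: C = eps * A / d  ->  A = C * d / eps
eps0 = 8.854e-12           # vacuum permittivity, F/m
k_SiO2 = 3.9               # relative permittivity of SiO2
C_mem = 0.4e-12            # membrane capacitance, F
d = 10e-9                  # dielectric thickness, m (assumed)
cap_area = C_mem * d / (k_SiO2 * eps0)   # area in m^2
```

With these assumptions, a 0.4 pF SiO2 capacitor occupies on the order of 100 μm², which is consistent in scale with the 30 μm² quoted later for 0.1 pF.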

Accuracy
SNNSim follows the flowchart in Figure 3 until any single neuron in the output layer generates a predetermined number of spikes, after which it returns the output based on the index of that neuron. In certain situations, two problems can arise: either no neuron produces the required number of spikes, or multiple neurons generate the predetermined number of spikes simultaneously. When no neuron generates enough spikes, the analog SNN waits indefinitely for the required number of spikes to be generated. When multiple neurons produce their last spike together, the analog SNN cannot provide a definitive answer unless additional criteria are used to determine the correct response. To address these problems, SNNSim refers to the V_mem of the neurons in the output layer. When multiple answers result, SNNSim determines the answer as the index of the output neuron with the highest V_mem. This solution is reasonable because the frequency of spikes is proportional to the rate of increase of the V_mem. [20]
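The decision rule with the V_mem tie-break can be sketched as follows (the function and variable names are ours for illustration, not SNNSim's API):

```python
import numpy as np

def decide(spike_counts, V_mem, n_required):
    """Output decision with the V_mem tie-break described above (sketch).

    Returns None while no output neuron has fired n_required spikes;
    otherwise, among all neurons that reached the count, the one with
    the highest membrane voltage wins.
    """
    reached = spike_counts >= n_required
    if not reached.any():
        return None                      # keep waiting for more spikes
    # mask out neurons that have not reached the count, then argmax V_mem
    masked_V = np.where(reached, V_mem, -np.inf)
    return int(np.argmax(masked_V))
```

When exactly one neuron reaches the count, the rule reduces to returning that neuron's index, so the tie-break changes nothing in the common case.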

Latency
The latency of the system is the time required to perform an inference, i.e., the time required for an output neuron to generate the predetermined number of spikes. The latency of the analog SNNs is calculated as the average number of timesteps multiplied by the predefined timestep unit. Although the analog SNNs are event-driven, the minimum period of the input spike trains serves as the timestep. For instance, if the simulation takes eight timesteps for the inference and each timestep is defined as 1 μs, the latency is 8 μs.

Power Consumption
SNNSim estimates the power consumption of analog SNNs by summing the power of the synapse arrays, current mirrors, and I&F neuron circuits. Power consumption is obtained by dividing energy consumption by latency. Energy is consumed by the current flowing into the circuits from the voltage sources; by integrating this current and multiplying the result by the supply voltage, we can calculate the energy consumption of the electronic circuits. Synapse arrays generate currents when voltage pulses are applied to their WLs. These currents are provided by the voltage sources connected to the current mirrors, which generate the copied currents. Because the pulses are rectangular, the current integral reduces to multiplying the currents, set by the conductance of the synaptic devices, by the applied voltage pulse width. By additionally accounting for the scale of the copied currents generated by the current mirrors, SNNSim obtains the energy consumption of the synapse arrays and current mirrors.
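With rectangular pulses, the energy bookkeeping above can be sketched as follows (the supply voltage, mirror gain, and device conductances are assumed values):

```python
import numpy as np

# E = V_supply * integral(I dt); for rectangular pulses the integral is
# simply I * t_pulse for every device that receives a spike.
V_pulse = 1.0          # read pulse amplitude, V (assumed)
V_supply = 3.3         # supply of the array / mirror branch, V (assumed)
t_pulse = 1e-6         # pulse width, s
mirror_gain = 0.1      # scale of the copied current (assumed)

spikes = np.array([1, 1, 0])
G = np.array([[2e-6, 1e-6],
              [1e-6, 3e-6],
              [4e-6, 1e-6]])                   # conductances, (WLs, BLs)

I_BL = V_pulse * (spikes @ G)                  # BL current sums
E_array = np.sum(I_BL) * V_supply * t_pulse    # energy drawn through array
E_mirror = np.sum(I_BL) * mirror_gain * V_supply * t_pulse
E_total = E_array + E_mirror
```

Dividing the accumulated energy by the inference latency then yields the average power, as stated above.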
I&F neuron circuits consume energy through dynamic switching (dynamic energy) and leakage (static energy). Dynamic energy dominates in circuits that operate with fast voltage switching, where the charge drawn from the supply can be calculated by multiplying the node capacitance by the voltage swing. [23][24][25] In the circuit diagram illustrated in Figure 2c, M1 and M10 generate a large leakage current in analog SNNs when the V_mem rises. By modeling the leakage current as a function of the V_mem, SNNSim can estimate the energy consumed by the leakage current.
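The dynamic contribution follows from the charge moved per switching event, Q = C_node·ΔV, giving E = C_node·ΔV·V_DD per event (the node capacitance and event count below are assumed for illustration):

```python
# Sketch: dynamic switching energy of a neuron-circuit node.
C_node = 5e-15        # node capacitance, F (assumed)
dV = 3.3              # full-swing voltage change, V
V_dd = 3.3            # supply voltage, V
n_events = 1000       # switching events during one inference (assumed)

# Charge per event Q = C_node * dV; energy per event E = Q * V_dd
E_dynamic = n_events * C_node * dV * V_dd
```

Adding the integrated leakage energy modeled from the V_mem-dependent currents of M1 and M10 yields the total neuron-circuit energy.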

Validation
In this section, we verify the performance estimation of our developed SNNSim. Building upon our preceding research, [8] we have fabricated single-layer analog SNNs with CMOS I&F neuron circuits integrated with a charge-trapping flash array. We initially fit the electrical characteristics of the fabricated devices with an existing metal-oxide-semiconductor field-effect transistor (MOSFET) model in the SPICE simulation. Then, we predict and compare the spike behaviors of the neurons in the SPICE simulation with the measured spike behaviors of the fabricated analog SNN. Thereafter, we upscale the network size in the SPICE simulation to compare its estimated performance with that of SNNSim. This process validates that SNNSim accurately estimates the performance of large-scale analog SNNs, with the SPICE simulation serving as a proxy for the hardware-implemented analog SNNs. The details of this process are discussed in the following subsections.

Modeling of the Fabricated Analog SNN with an AND-Type Flash Array in SNNSim
The electrical properties of the fabricated devices can be fitted by tuning the parameters of the BSIM-SOI model in the SPICE simulation, such as mobility, threshold voltage, subthreshold swing, mobility degradation, oxide thickness, and doping concentration. The fitting results of the devices are represented in Figure S3, Supporting Information. As shown in Figure S3, Supporting Information, the electrical properties of the fabricated devices, including the n-channel MOSFET, p-channel MOSFET, and flash memory devices, are successfully fitted by the BSIM-SOI model in the SPICE simulation. Based on the fitting results, we design and simulate the analog SNN in the SPICE simulation with an array architecture and neuron circuits identical to the fabricated SNN, which consists of a 25 × 4 AND-type flash array connected to four I&F neuron circuits. Figure 4a illustrates the spike behaviors of the neurons exhibited in the fabricated analog SNN and the SPICE simulation. This figure indicates that the simulated spike behaviors, including firing timing and spike frequency, accurately match the behaviors of the fabricated neurons. The comparison of the spike frequencies measured in the fabricated neurons and emulated in the SPICE simulation is represented in Figure 4b. The transient spike behaviors of the data represented in Figure 4b are shown in Figure S4, Supporting Information.
We also investigate the spike behaviors in SNNSim, as shown in Figure 4a. The firing timings of the spikes in SNNSim match those not only in the fabricated analog SNN but also in the SPICE simulation. Figure 4c shows the spike rate in the SPICE simulation versus the spike rate in SNNSim, indicating that the rates almost match each other under various conditions of pulse and membrane capacitance. Although SNNSim cannot mimic the spike shape precisely, because the spikes are binarized in Python for simplicity, SNNSim can accurately predict the spike timing in single-layer analog SNNs. Note that the latency and accuracy of analog SNNs are significantly determined by the transient spike behaviors of the I&F neurons.

SNNSim Validation in Multi-Layer Perceptron
In addition to the single-layer analog SNN, we validate the estimation of SNNSim in a multi-layer neural network by comparing it with the SPICE simulation. First, we train a network of size 324-40-10 using the ReLU activation function with ternary weights. [26] Then, we design the network in both the SPICE simulation and SNNSim with the I&F neurons and the trained weights. Subsequently, we compare the latency and power consumption estimated by the SPICE simulation with those estimated by SNNSim. Figure 4d,e show the comparison of the latency and power consumption, respectively, estimated in the SPICE simulation and SNNSim. The latency and power consumption are estimated using ten digit images from the Modified National Institute of Standards and Technology (MNIST) test dataset. According to the matched estimations in Figure 4d,e, the validation of SNNSim extends to the multi-layer perceptron (MLP), in which a large number of neurons and synaptic devices interact with each other. Figure 4f compares the classified digits from the SPICE simulation and SNNSim.

Optimization of Analog SNN in MLP
In this section, we employ the validated SNNSim to optimize analog SNNs in various aspects, such as area, accuracy, latency, power consumption, and conductance-variation tolerance, within a two-layer fully connected network structure, also known as an MLP, for the MNIST classification task. The weights of this two-layer SNN are derived from an ANN through ANN-to-SNN conversion. We optimize analog SNNs under various conditions, including the membrane capacitance (C_mem), the number of hidden neurons, and technology nodes. A 0.5 μm CMOS technology node (the same node as our fabricated analog SNNs) as well as more advanced technology nodes are investigated for the analog SNN designs. Figure 5a presents the network structure of the MLP, where the numbers of neurons in the input and output layers are fixed, while the number of hidden neurons (H) is variable. Figure 5b-e show the optimization results of analog SNNs with the MLP structure, including area, accuracy, latency, and power consumption, as a function of C_mem and H. The total area of analog SNNs increases rapidly as H increases. In addition, a slight area increase is observed with increasing C_mem, but in the 0.5 μm technology, other components, such as the synapse arrays and I&F neurons, are more significant. From the accuracy perspective, the accuracy approaches the baseline accuracy (ANN-MLP at an H of 512) at an H of 128. As C_mem increases, the accuracy of the analog SNN improves; however, the effect of C_mem on accuracy is not critical at H ≥ 128. Note that the latency of analog SNNs is significantly affected by C_mem because the firing rate of the I&F neurons is determined by C_mem through Equations (1) and (2), as shown in Figure 5d. If hidden neurons fire more spikes, more spikes propagate to the deeper layers, accelerating the firing of the output neurons and reducing latency. The power consumption of analog SNNs is shown in Figure 5e. The power consumption increases with increasing H, while it is only slightly affected by C_mem (Figure S5, Supporting Information). Additionally, Figure 5e highlights the superior energy efficiency of analog SNNs in contrast to other studies focused on CIM. [29] The optimized C_mem varies according to the input voltage pulse width: when the voltage pulse width increases, the optimized membrane capacitance should also increase proportionally. This is because a longer voltage pulse delivers a greater amount of synaptic charge within a single period. For example, if the pulse width were doubled while the membrane capacitance remained constant, the membrane voltage change per period would also double, potentially resulting in information loss. Therefore, to preserve the membrane voltage change at its optimized value, the membrane capacitance must also be doubled. In other words, once the membrane capacitance has been optimized for a given voltage pulse width, the ratio of the membrane capacitance to the voltage pulse width should remain constant.
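This scaling argument can be checked with the per-period voltage step ΔV = I_syn·t_pulse/C_mem: doubling both the pulse width and C_mem leaves ΔV unchanged (the numeric values below are illustrative):

```python
def dV_step(I_syn, t_pulse, C_mem):
    """Membrane-voltage step per input period: dV = I_syn * t_pulse / C_mem."""
    return I_syn * t_pulse / C_mem

# Doubling the pulse width while doubling C_mem preserves dV, i.e.,
# the ratio C_mem / t_pulse is the quantity to hold constant.
base = dV_step(2e-6, 1e-6, 0.4e-12)     # 2 uA drive, 1 us pulse, 0.4 pF
scaled = dV_step(2e-6, 2e-6, 0.8e-12)   # pulse width and C_mem both doubled
```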
Figure 5f depicts the accuracy degradation according to conductance variation. The conductance values transferred from the weights trained in ANNs deviate from their nominal values and exhibit variation in flash memory devices. Thus, it is important for the accuracy of analog SNNs to be tolerant of conductance variation. We simulate the tolerance of analog SNNs in SNNSim with a C_mem of 0.4 pF and various values of H. The accuracy becomes more tolerant to conductance variation as H increases. In addition to the conductance variation, other configurations, such as the on-off ratio of the synaptic devices and the spike loss rate, can also be considered for accuracy, as shown in Figure S6, Supporting Information. We also investigate the scalability of analog SNNs in Figure 5g. Using the Predictive Technology Model (PTM) library, [30] SNNSim estimates the area and power consumption at the 45 and 22 nm technology nodes. The validation of SNNSim against SPICE simulations using the PTM library is presented in Figure S7, Supporting Information. Theoretically, the transistor area is reduced by a squared factor as the technology node is scaled. However, the capacitor area is not scaled by the same factor, which leads to a large area overhead. Therefore, it is important to decrease the size of the membrane capacitors as the technology node shrinks. For instance, capacitors with a capacitance of 0.1 pF, using SiO2 as the dielectric, require a footprint of 30 μm², which is considerable. To address this problem, the requisite capacitance could be decreased from 0.1 pF to a lower value, either by reducing the on-current of the synaptic devices or by reducing the width of the current-copied side of the current mirrors. Substituting SiO2 with high-k dielectric materials, coupled with three-dimensional fabrication of the capacitors, also offers a viable route to reduce the area demanded by the capacitors. Figure 5h represents benchmarking results for various analog SNNs. Trade-offs exist in optimizing area, accuracy, latency, and power consumption. Reducing H may decrease the area of analog SNNs, but it degrades accuracy and increases latency. Likewise, to decrease the latency, C_mem should be smaller; however, this reduces the accuracy of analog SNNs. Depending on the goal of the analog SNN design, C_mem and H need to be determined according to the trade-off between area, accuracy, latency, and power consumption.
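A variation study like that of Figure 5f can be sketched as a Monte-Carlo experiment: perturb the nominal weights with multiplicative Gaussian noise and measure classification accuracy (a toy, non-spiking linear readout on toy data; it illustrates the method only, not SNNSim's numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_under_variation(W, X, labels, sigma, n_trials=20):
    """Mean classification accuracy of a linear readout when each weight
    (conductance) is perturbed by multiplicative Gaussian noise with
    relative standard deviation sigma."""
    accs = []
    for _ in range(n_trials):
        W_var = W * (1.0 + sigma * rng.standard_normal(W.shape))
        pred = np.argmax(X @ W_var, axis=1)
        accs.append(np.mean(pred == labels))
    return float(np.mean(accs))
```

Sweeping sigma for several network widths reproduces the qualitative trend that larger H tolerates more variation, since classification margins are averaged over more devices.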

Optimization of Analog Spiking CNN
We conduct optimization of analog SNNs using SNNSim on deeper and larger networks, which classify the CIFAR-10 dataset. It is assumed that the implementation of the analog SNNs is conducted with a 22 nm technology node. The networks are trained using the ANN-to-SNN conversion method, similar to the MLP cases, and their performance is evaluated under various C_mem values and network sizes. We conduct the simulation for the two networks depicted in Figure 6a. The first network consists of six layers in total: four convolutional layers with 3 × 3 filters and two fully connected layers. The second network is a variant of the first network in which the second and fourth convolutional layers are replaced with convolutional layers with 1 × 1 filters. In summary, we construct two SNN networks through ANN-to-SNN conversion: the first adopts the VGG architecture, [31] and the second adopts MobileNet. [32] We refer to the former structure as VGGSNN and the latter as MobileSNN. MobileSNN conserves resources by reducing the number of weight parameters and operations. Implementing the 1 × 1 filters in analog SNNs allows an approximately one-third reduction in the total number of required synaptic devices. Figure 6b-e show the optimization results of analog SNNs with VGGSNN and MobileSNN, including area, accuracy, latency, and power consumption, as a function of C_mem. In the 0.5 μm CMOS technology, the impact of the capacitor size on the area overhead proves to be negligible, as shown in the previous subsection. However, the transistor size shrinks proportionally to the square of the feature size, whereas the capacitor size shrinks only proportionally to the thickness of the gate oxide. Since the scaling of the gate-oxide thickness is limited compared to the feature size, the area occupied by the capacitors becomes dominant at advanced nodes. As depicted in Figure 6c, enlarging C_mem is important for attaining high accuracy. Thus, a careful examination of the area-accuracy trade-off is required when determining the capacitor size in the 22 nm CMOS technology. A reduction of the capacitor size leads to an increased firing rate across all neurons, causing information loss across the layers, which diminishes the accuracy. Figure 6d depicts the relationship between capacitor size and latency. Similar to the trend observed in the MLP, a larger capacitor size correlates with a reduced frequency of spike occurrence in the neurons, thus leading to increased latency. This latency effect is further amplified in networks with more layers, as the effect accumulates through the layers. Figure 6e illustrates the power consumption depending on the membrane capacitance. When C_mem is small, spikes are applied to the synaptic devices frequently across the layers, leading to increased power consumption in the synapse arrays. As C_mem increases, spikes fire more sparsely and the current flow in the later layers decreases, reducing the power consumed by the synapse arrays. The power consumption of the I&F neurons remains stable with respect to C_mem. In MobileSNN, the I&F neurons consume more power than those in VGGSNN. This is induced by a different pattern of voltage variation on the membrane capacitor, which leads to a larger leakage current flowing into the I&F neurons than in VGGSNN.
This section has illustrated the implementation of large-scale analog SNNs using flash memory devices and I&F neurons while preserving high accuracy and ultra-high energy efficiency. The application of CNN structures to analog SNNs using flash memory arrays and I&F neurons involves certain trade-offs. To minimize area and latency, it appears more beneficial to employ 1 × 1 filters. However, it is better to adopt 3 × 3 filters and a larger C_mem to achieve higher accuracy, at the cost of area overhead and latency.
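The device-count saving from 1 × 1 filters can be illustrated by counting convolutional weights for hypothetical channel configurations (the channel widths below are assumptions, and fully connected layers are omitted, so the exact fraction differs from the ≈ one-third figure reported above):

```python
# Synaptic devices needed by a convolutional layer with k x k filters.
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

# Assumed channel widths for the four convolutional layers of Figure 6a.
channels = [(3, 64), (64, 64), (64, 128), (128, 128)]

# VGGSNN-style variant: all four layers use 3x3 filters.
vgg_like = sum(conv_params(ci, co, 3) for ci, co in channels)

# MobileSNN-style variant: 2nd and 4th conv layers use 1x1 filters.
mobile_like = (conv_params(3, 64, 3) + conv_params(64, 64, 1)
               + conv_params(64, 128, 3) + conv_params(128, 128, 1))

reduction = 1 - mobile_like / vgg_like   # fraction of devices saved
```

Each weight parameter maps to one (or one pair of) flash devices, so the parameter count directly bounds the synapse-array area.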

Conclusion
In this work, we have introduced SNNSim, a tool designed to explore the performance metrics of large-scale, hardware-level analog SNNs using flash memory arrays and I&F neuron circuits. These metrics include area, accuracy, latency, and power consumption. Notably, SNNSim exhibits exceptional efficiency, as it can process the entire 10 000-image MNIST test dataset in just a few seconds, whereas SPICE simulations require hours to simulate a single MNIST test image. A comparison of SNNSim with previously developed simulators (SpikeSim, [7] NeuroSim, [12] and RxNN [34]) is presented in Table 1. Our findings indicate that analog SNNs, which consist of flash memory arrays and I&F neurons, deliver highly energy-efficient performance while maintaining accuracy comparable to that of ANNs. Historically, research on SNNs has primarily focused on individual components such as synaptic memory devices, neuron circuits, and network structures. Our work bridges these components by demonstrating the competitive edge of analog SNNs in neural network operation. We also highlight the potential of deep analog SNNs for edge-device computing. Moreover, our developed tool, SNNSim, can serve as a valuable resource for performance benchmarking or for verifying the performance of other analog SNNs prior to their silicon implementation. Other kinds of SNNs using different encoding schemes, such as temporal, phase, and burst coding, will be further investigated with algorithm-hardware co-optimization. [33] This will provide a wider view of the performance of analog SNNs under different conditions.

Figure 1.
Figure 1. Operational scheme of the biological synapse-neuron model and the analog SNN architecture to emulate it. By connecting neuron circuits to synapse device arrays and synapse device arrays to neuron circuits in order, an analog SNN mimics the biological synapse-neuron model.

Figure 2.
Figure 2. a) Schematic of the hardware architecture for the implementation of the SNN. b) Circuit diagram of the AND-type flash synapse array, in which the source lines and BLs are parallel and the gate (word) lines (WLs) are perpendicular to them. c) Circuit diagram of the current mirror and analog I&F neuron with their operating timing diagrams. Voltage spike inputs are applied to the synapse arrays, while the resulting output is represented as current sums accumulated in the I&F neuron membrane capacitor C_mem.

Figure 3.
Figure 3. Flowchart showing the overall operation of SNNSim. SNNSim simulates SNNs to calculate energy, accuracy, and latency over repeated time steps. Accuracy and latency are obtained by referring to the index of the first firing neuron of the output layer and its generation time.

Figure 4.
Figure 4. Verification of SNNSim. a) Spike behaviors of the fabricated analog SNNs, SPICE simulation, and SNNSim with varying input spikes. b) Correlation between the spike frequencies derived from SPICE simulations and those observed in the measured results. c) Relationship between spike frequencies in SNNSim and those in the SPICE simulations. d-f) Predicted latency, power consumption, and output comparison between SPICE simulations and SNNSim within the upscaled network.

Figure 5.
Figure 5. Optimization of the analog SNN. a) Structure of the MLP, which is converted to an analog SNN. b-e) Area, accuracy, latency, and power consumption of the analog SNN depending on the membrane capacitance (C_mem) and the number of hidden neurons (H). f) Accuracy degradation of the analog SNN versus conductance change in the synaptic devices as a parameter of H. g) Area of the analog SNN according to technology node scaling. h) Benchmarking results of analog SNNs with various H and C_mem. The benchmarking results are displayed based on the rank of the analog SNNs.

Figure 6.
Figure 6. Optimization of deeper analog SNNs. a) Structure of the CNNs, which are converted to analog SNNs. b-e) Area, accuracy, latency, and power consumption of the deeper analog SNNs as a function of the membrane capacitance.