In‐Memory Binary Vector–Matrix Multiplication Based on Complementary Resistive Switches

This work studies a computation in‐memory concept for binary multiply‐accumulate operations based on complementary resistive switches (CRS). By exploiting the in‐memory boolean exclusive OR (XOR) operation of single CRS devices, the Hamming Distance (HD) can be calculated if the center electrodes of multiple CRS cells are connected. This HD is linearly encoded in the voltage drop of the common electrode, and from it the result of a binary multiply‐accumulate operation can be calculated. A small‐scale demonstration is experimentally realized and the feasibility of the in‐memory computation concept is confirmed. A simulation study identifies the low resistance state (LRS) variability as the main reason for the variations in the output voltage. The application as a potential hardware accelerator for the inference step of binary neural networks is investigated. Therefore, a 1‐layer fully connected neural network is trained on a binarized version of the MNIST data set and the inference step of the test data set is simulated. The concept achieves a prediction accuracy of approximately 86%.

DOI: 10.1002/aisy.202000134
nonlinear dependence between the computational result and the voltage drop, which may increase the overhead for the readout circuitry.
The concept explored in this work also uses the voltage divider effect to encode the result of the binary vector-matrix multiplication, but retains a linear dependence of the output voltage on the computational result. The slope of this linear encoding depends only on the resistance ratio between the HRS and LRS. Thus, ReRAM cells with a high LRS resistance can be used, paving the way for low-power applications. With a resistance ratio of approximately 100, nearly the full read voltage is used for the encoding, which helps to separate the computation results from each other. These properties make this concept a promising alternative as an accelerator for binary vector-matrix multiplications and may simplify the design of the peripheral circuitry.

Concept
ANNs are inspired by biological neural networks, wherein the neurons are connected by synaptic weights. These weights are adjusted during the training procedure of the network to better predict the underlying data used for the training. During the inference step, when a signal propagates through the ANN, the inputs of each neuron are multiplied by the corresponding weights and the results are summed up. This operation is called the multiply-accumulate operation and contributes substantially to the energy consumption of ANNs. [19] A binary multiply-accumulate (bMAC) operation of two binary vectors $x$ and $y$ (with $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, where $x_i, y_i \in \{1, -1\}$) can be computed by exploiting boolean logic. To this end, each entry has to be transformed into a boolean value, e.g., $x_{i,\mathrm{new}} = (x_{i,\mathrm{old}} + 1)/2$. Then, a bitwise exclusive OR (XOR) comparison is performed between these two vectors and the resulting vector is accumulated (summation of the "1"-bits). This accumulation calculates the Hamming Distance (HD), which describes how many digits of two binary words are different. The HD needs to be transformed back to obtain the same result as the original bMAC operation via $\mathrm{bMAC} = n - 2 \times \mathrm{HD}$, with $n$ being the length of the compared vectors.
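The equivalence between the bMAC operation and the XOR-based Hamming Distance can be checked with a few lines of code. The following is a minimal sketch; the function names and the two example vectors are illustrative and not taken from the original work.

```python
def to_boolean(vec):
    """Map entries from {-1, +1} to {0, 1} via x_new = (x_old + 1) / 2."""
    return [(x + 1) // 2 for x in vec]


def hamming_distance(a_bits, b_bits):
    """Accumulate the bitwise XOR of two equally long bit vectors."""
    if len(a_bits) != len(b_bits):
        raise ValueError("vectors must have the same length")
    return sum(a ^ b for a, b in zip(a_bits, b_bits))


def bmac_via_xor(x, y):
    """Binary multiply-accumulate computed as n - 2 * HD."""
    hd = hamming_distance(to_boolean(x), to_boolean(y))
    return len(x) - 2 * hd


def bmac_direct(x, y):
    """Reference: ordinary dot product of two {-1, +1} vectors."""
    return sum(a * b for a, b in zip(x, y))


# Illustrative example vectors with n = 7 entries
x = [1, -1, 1, 1, -1, -1, 1]
y = [1, 1, -1, 1, -1, 1, 1]

hd = hamming_distance(to_boolean(x), to_boolean(y))
print("HD   =", hd)                  # three positions differ
print("bMAC =", bmac_via_xor(x, y))  # 7 - 2 * 3
```

Each position where the two vectors agree contributes +1 to the dot product and each position where they differ contributes -1, which is why the result equals n - 2 × HD.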
Complementary resistive switch (CRS) cells were introduced by Linn et al. to resolve the sneak path problem in passive crossbar arrays. [20] A CRS cell consists of two antiserially connected ReRAM devices with a complementary encoding where one of them is always in the HRS. Therefore, the total resistance of the CRS cell is always in the HRS state, which prevents sneak paths during the destructive readout. [20] It was also shown that CRS cells can perform many logic operations by applying certain voltage patterns. [21,22] One promising logic operation is the CRS-based XOR operation enabled by measuring the voltage drop across the center electrode. This logic operation also functions without switching the device state. With a single CRS cell, which is shown in Figure 1a,b in a vertical and horizontal configuration, this operation can be achieved by using the binary encoding ($b_i$, $b_s$) which is specified in Table 1a,b. The voltage divider formed by the elements of the CRS cell only leads to a relevant voltage output at the shared electrode if the corresponding XOR operation results in a "1." This XOR operation is also shown in Table 1c.
CRS cells can also be configured in a passive crossbar structure, as shown in Figure 1c. In this configuration, the center electrodes of parallel CRS cells intentionally share one electrode. This disables the intrinsic sneak path prevention of the CRS cell, but does not influence the parallel read operation in the crossbar array. Nevertheless, a selector device is necessary for programming each cell. Typically, transistors are used for this purpose, and as long as the on-resistance of the transistor is a fraction of the LRS, its influence during the read process can be neglected. For the following discussion, a slice of a passive crossbar array (highlighted cells of Figure 1c) will be analyzed. A circuit diagram of such a line of the array is shown in Figure 1d.
Figure 1. Circuit diagrams. a) Circuit schematic of a CRS cell as developed by Linn et al. [20] The voltage drop at the center electrode can be monitored. b) Lateral configuration of a CRS cell. c) 3D illustration of a lateral CRS-based passive crossbar array. The highlighted part corresponds to the circuit diagram of (d). d) Circuit schematic of multiple lateral CRS cells in parallel with a common center electrode. e) Equivalent circuit for one lateral CRS cell with an arbitrary binary input ($b_i$) applied. Using the encoding in Table 1a,b, the resistances $R_a$ and $R_b$ can be calculated based on the stored pattern ($b_s$) and the result of XOR($b_i$, $b_s$). f) Equivalent circuit for an arbitrary input and stored pattern of a shared electrode of multiple CRS cells in a line array. The number of CRS cells connected by the shared electrode is defined by $n$.

www.advancedsciencenews.com www.advintellsyst.com

With the help of the earlier introduced XOR operation, Figure 1b can be rearranged to Figure 1e. The resistances $R_a$ and $R_b$ can then be specified based on the result of the XOR operation of the stored and input bit. This rearrangement facilitates creating the equivalent circuit for a line of CRS cells with a common electrode for arbitrary input and stored patterns. This equivalent circuit is shown in Figure 1f, and with it an analytical expression for the voltage drop at the common electrode can be derived (Equation (1)). A similar equation can be calculated for the XNOR operation, as was done by Chowdhury et al. in their system analysis of an equivalent concept. [23] In our approach, the voltage divider effect is used to retrieve the HD from the corresponding voltage drop of the shared electrode, and from that the bMAC result can easily be computed if needed.
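The linear encoding of the HD in the voltage drop can be made explicit with a short derivation. This is a sketch under stated assumptions and not necessarily the exact form used in the original work: both states are treated as purely ohmic resistances $R_\mathrm{LRS}$ and $R_\mathrm{HRS}$ without variability, and a cell with XOR = 1 is assumed to connect the shared electrode to the read voltage $V_\mathrm{read}$ through its LRS device and to ground through its HRS device (and the reverse for XOR = 0). The symbols $G_a$, $G_b$, $V_\mathrm{read}$, and $V_\mathrm{out}$ are introduced here for illustration.

```latex
\begin{align}
G_a &= \frac{\mathrm{HD}}{R_\mathrm{LRS}} + \frac{n-\mathrm{HD}}{R_\mathrm{HRS}}, \qquad
G_b = \frac{\mathrm{HD}}{R_\mathrm{HRS}} + \frac{n-\mathrm{HD}}{R_\mathrm{LRS}}, \\
V_\mathrm{out} &= V_\mathrm{read}\,\frac{G_a}{G_a+G_b}
 = V_\mathrm{read}\,
   \frac{\mathrm{HD}\,R_\mathrm{HRS} + (n-\mathrm{HD})\,R_\mathrm{LRS}}
        {n\,(R_\mathrm{HRS}+R_\mathrm{LRS})}.
\end{align}
```

Because $G_a + G_b = n\,(1/R_\mathrm{LRS} + 1/R_\mathrm{HRS})$ does not depend on the HD, the output voltage under these assumptions grows by the constant step $V_\mathrm{read}\,(R_\mathrm{HRS}-R_\mathrm{LRS})/[n\,(R_\mathrm{HRS}+R_\mathrm{LRS})]$ for every additional differing bit.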

Hardware Realization
To confirm the linear relationship between the output voltage and the HD, a single row of lateral CRS cells, each based on two ReRAM cells, was fabricated. [24] The ReRAM devices are based on a platinum, tantalum oxide, tungsten stack capped by another platinum layer. A detailed description of the fabrication process can be found in Section 6. Images of the fabricated sample are shown in Figure 2. Figure 2a shows the sample (background) connected with a probe card (foreground) to the measurement setup. Figure 2b shows a lateral CRS cell (green) and the shared electrode (blue). A scanning electron microscope (SEM) image of a fabricated ReRAM device is shown in Figure 2c. The green-colored part corresponds to the top electrode (bitline), whereas the blue-colored part again shows the shared electrode (wordline). For the measurement, 14 ReRAM cells were combined into 7 lateral CRS cells and an exemplary "1111111" pattern was stored in these devices.
Table 1. XOR encoding for a single CRS cell and constant simulation parameters.

Figure 2. b) The center image depicts a lateral CRS cell (green) to which an input bit is applied. The output voltage is measured at the shared electrode (blue). c) The SEM image on the right shows one resistive switch with an area of 200 nm × 200 nm. The green-colored part is the top electrode and the blue-colored part is the shared electrode. d) Resistance value of each resistive switch read at a voltage of 0.3 V applied to the top electrode. Two neighboring cells are combined to one lateral CRS cell to store an exemplary "1111111" pattern. The red dots represent the resistance states with higher intrinsic variability and the blue dots represent the resistance states with the manually reduced variability. e) Voltage drop at the shared electrode for each input pattern from "0000000" to "1111111" on the y-axis. On the x-axis, the HD between the input and the stored pattern is shown. This measurement reveals that the variability from the resistance states has a significant influence on the variability of the output voltage.

The following procedure was used to program the corresponding resistance states. In the first series of measurements, the LRS (HRS) of the corresponding device was programmed by a positive (negative) triangular voltage sweep with a maximum (minimum) voltage of 3.0 V (−1.5 V) applied to the bitline and a sweep rate of 500 V s⁻¹. The wordline was connected to a virtual ground and the current was measured. For the transition to the LRS, a series resistance of 10 kΩ was included to limit the current. For the LRS (HRS), a programmed resistance of 2.5 kΩ (90 kΩ) was targeted, around which the measured resistances fluctuate due to the intrinsic resistance variability. The resistance of each device was measured with a read voltage of 0.3 V applied to the bitline while the wordline was grounded again. The measured results are visualized by the red dots of Figure 2d. In the second series of measurements, the same sweep rate was used and the maximum and minimum voltages were manually adjusted to reduce the deviation from the target resistance. The resistances were again read with a voltage of 0.3 V applied to the bitline while grounding the wordline and are represented by the blue dots in Figure 2d.

For the computation of the HD, a pair of two bitlines is used to encode one bit of the input pattern. All 128 possible input patterns from "0000000" to "1111111" were encoded in voltages as shown in Table 1a and applied to the line array. The voltage drop on the wordline was measured and the results are shown in Figure 2e. The HD between the input pattern and the stored pattern is displayed on the x-axis. The voltage drop of the shared electrode of the CRS cells is shown on the y-axis. The voltage response of the stored resistance states with the higher variability is represented by the red dots and the voltage response of the resistance states with the lower variability is shown by the blue dots. The linear dependence between the HD and the output voltage is visible in Figure 2e. The measurements also show that the resistance variability has a significant influence on the variability of the output voltage. To better understand this influence, a simulation study will be discussed in the following section.
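The expected linear trend of this measurement can be reproduced in an idealized setting. The following sketch assumes purely ohmic devices at the target resistances of 2.5 kΩ and 90 kΩ without any variability, a read voltage of 0.3 V during the computation (the text states this value only for the resistance readout), and an encoding in which a cell with XOR = 1 ties the shared electrode to the read voltage through its LRS device. The function and variable names are illustrative.

```python
from itertools import product

R_LRS = 2.5e3   # ohm, target LRS from the text
R_HRS = 90e3    # ohm, target HRS from the text
V_READ = 0.3    # volt, assumed read voltage during the computation


def output_voltage(input_bits, stored_bits, r_lrs=R_LRS, r_hrs=R_HRS, v_read=V_READ):
    """Voltage at the shared electrode for ideal ohmic CRS cells.

    Assumed encoding: XOR = 1 puts the LRS device between the read voltage
    and the shared electrode, XOR = 0 puts it between the electrode and ground.
    """
    g_to_read = 0.0    # total conductance towards the read voltage
    g_to_ground = 0.0  # total conductance towards ground
    for b_i, b_s in zip(input_bits, stored_bits):
        if b_i ^ b_s:
            g_to_read += 1.0 / r_lrs
            g_to_ground += 1.0 / r_hrs
        else:
            g_to_read += 1.0 / r_hrs
            g_to_ground += 1.0 / r_lrs
    return v_read * g_to_read / (g_to_read + g_to_ground)


stored = (1,) * 7  # the exemplary "1111111" pattern
voltages_by_hd = {}
for pattern in product((0, 1), repeat=7):  # all 128 input patterns
    hd = sum(a ^ b for a, b in zip(pattern, stored))
    voltages_by_hd.setdefault(hd, []).append(output_voltage(pattern, stored))

for hd in sorted(voltages_by_hd):
    values = voltages_by_hd[hd]
    print(f"HD = {hd}: {len(values):2d} patterns, V_out = {values[0] * 1e3:.1f} mV")
```

In this ideal case all patterns with the same HD give the same voltage, so any spread within one HD value in a real measurement has to come from device variability.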

Modeling of ReRAM Cell Conduction in LRS and HRS
As the HRS of the ReRAM cells shows a nonlinear behavior with respect to the voltage, a single ohmic resistance cannot be used to describe the device behavior properly. Instead, measured I-V sweeps for each programmed resistance state in the voltage regime from -0.3 to 0.3 V in steps of 0.01 V reveal this nonlinear behavior with a slight asymmetry with respect to the voltage polarity (cf. Figure 3). To model this I-V characteristic, the conduction is described by assuming a tunneling process at the platinum interface and an ohmic resistance in the bulk of the oxide. The tunneling process is described by the so-called intermediate current-voltage relationship derived by Simmons [25]

$$I = \frac{eA}{2\pi h d^2}\left[\left(\phi - \frac{eV}{2}\right)\exp\left(-\frac{4\pi d}{h}\sqrt{2m}\sqrt{\phi - \frac{eV}{2}}\right) - \left(\phi + \frac{eV}{2}\right)\exp\left(-\frac{4\pi d}{h}\sqrt{2m}\sqrt{\phi + \frac{eV}{2}}\right)\right]$$

Here, $e$ is the electron charge, $A$ is the device area, $h$ is Planck's constant, $V$ is the applied voltage, $m$ is the effective electron mass, and $\phi$ is the tunneling barrier height. To fit the measured resistance states, the only parameter adjusted is the effective tunnel barrier thickness $d$. The constant simulation parameters are shown in Table 1d. The Simmons equation has been used previously to describe the electron transport in ReRAM devices. [26,27] The measured I-V sweeps for each cell in the LRS are shown in Figure 3a by the colored circles. The simulated sweep based on the described model is visualized by the lines using the same color coding. The LRS is well described by this model by adjusting only the effective tunnel barrier thickness $d$. The measured I-V sweeps for the HRS are shown in Figure 3b. Again, the measurement is represented by the circles and the simulation by the accordingly colored lines. The HRS can be fairly well described by the model, and the deviations from the real nonlinearity and asymmetry are small. Using this conduction model and the fitted data, the measurement results can be simulated with the Spectre Simulation Platform of Cadence.
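The tunneling part of this conduction model can be sketched numerically. The code below assumes the standard textbook form of Simmons' intermediate-voltage expression and covers only the tunnel barrier, not the ohmic bulk resistance in series. The device area follows the 200 nm × 200 nm stated in the text, whereas the barrier height `PHI_EV` and the effective mass `M_EFF` are hypothetical placeholders, because the constant parameters of Table 1d are not available here.

```python
import math

E = 1.602176634e-19     # elementary charge (C)
H = 6.62607015e-34      # Planck constant (J s)
M_E = 9.1093837015e-31  # free electron mass (kg)

AREA = 200e-9 * 200e-9  # device area (m^2), 200 nm x 200 nm from the text
PHI_EV = 1.0            # barrier height (eV): hypothetical placeholder
M_EFF = M_E             # effective electron mass: hypothetical placeholder


def simmons_current(v, d, area=AREA, phi_ev=PHI_EV, m_eff=M_EFF):
    """Tunnel current (A) at voltage v (V) for a barrier of thickness d (m).

    Standard Simmons expression for the intermediate voltage range,
    valid only while |e * v| stays below the barrier height.
    """
    phi = phi_ev * E       # barrier height in joule
    half = E * v / 2.0     # e * V / 2 in joule
    if abs(E * v) >= phi:
        raise ValueError("voltage outside the intermediate range")
    k = 4.0 * math.pi * d * math.sqrt(2.0 * m_eff) / H
    prefactor = E * area / (2.0 * math.pi * H * d ** 2)
    forward = (phi - half) * math.exp(-k * math.sqrt(phi - half))
    backward = (phi + half) * math.exp(-k * math.sqrt(phi + half))
    return prefactor * (forward - backward)


# A thicker effective barrier gives a much smaller current at the same voltage.
for d_nm in (0.75, 1.2):
    current = simmons_current(0.3, d_nm * 1e-9)
    print(f"d = {d_nm} nm: I(0.3 V) = {current:.3e} A")
```

Note that this bare expression is antisymmetric in the voltage, so it cannot by itself reproduce the slight polarity asymmetry seen in the measured sweeps.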
In Spectre, seven lateral CRS cells are configured in the experimental "1111111" configuration based on the fitted model parameters and all possible input patterns are applied. The obtained results are shown in Figure 3c where the lower variability data of Figure 2e are visualized by the blue circles, whereas the simulation data are represented by the red circles. The data points are slightly shifted apart from each other for a better visualization. This direct comparison of the measured and simulated circuit behavior also supports the agreement between the device properties and the utilized conduction model.

Modeling of the Resistance Variability
To better understand the origin of the variability in Figure 2e, a simulation study with changing variability contributions is performed. Two truncated normal distributions for the tunnel thickness $d$ are assumed from which values are randomly taken for the simulation. The LRS is drawn from a truncated normal distribution with a mean value of $d_\mathrm{LRS} = 0.75$ nm and a standard deviation of $\sigma_\mathrm{LRS} = 0.02$ nm. The distribution is truncated at $3\sigma$ from the mean. The HRS is drawn from a distribution that has the same standard deviation, is equally truncated, and has a mean value of $d_\mathrm{HRS} = 1.2$ nm. The utilized distributions for the tunnel thickness are shown in Figure 4a. To confirm that the assumed distributions correspond to reasonable resistance variations, the coefficient of variation ($\sigma/\mu$) of the resulting LRS and HRS distributions is compared with the measured data by Sheng et al. [28] For this purpose, 100 000 values are randomly drawn from each tunnel barrier distribution and the model resistance at a read voltage of -0.11 V is calculated to match the experimental data. The simulated resistance distributions are summarized in two histograms which are shown in Figure 4b. The resulting LRS distribution has a mean value of 2.53 kΩ with a standard deviation of 206.8 Ω and the HRS distribution has a mean value of 95.33 kΩ with a standard deviation of 18.51 kΩ. The coefficient of variation is 0.19 for the HRS distribution, which is similar to the one measured by Sheng et al. for that resistance range. [28] The coefficient of variation for the LRS distribution is 0.08, which is roughly one order of magnitude higher compared with the results of Sheng et al. This deviation is intentionally chosen to be higher to account for the missing transistor in our demonstrator. Having a transistor in series to the ReRAM cell enables a better control of the LRS, which would result in a lower coefficient of variation. [28,29]

For the simulation study, 14 CRS devices are simulated with the Spectre Simulation Platform of Cadence. To this end, 14 tunnel barriers are drawn from each distribution and stored in a "11111111111111" configuration in the CRS cells. In Figure 4c, each cell is shown with its corresponding tunnel barrier drawn from the distribution visualized by the blue circles.
To be able to vary the amount of variability in the simulation, each cell also has a constant tunnel barrier thickness assigned to it. This value is defined by the mean value of the corresponding distribution and is indicated by the red circles of Figure 4c.
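Drawing the tunnel barrier thicknesses for such a study can be sketched as follows. The example uses simple rejection sampling for the truncation at three standard deviations, which is one possible way to implement it and not necessarily the method used in the original simulation; the seed and all names are illustrative.

```python
import random

D_LRS_MEAN = 0.75e-9  # m, mean LRS tunnel thickness from the text
D_HRS_MEAN = 1.2e-9   # m, mean HRS tunnel thickness from the text
SIGMA_D = 0.02e-9     # m, standard deviation of both distributions
N_CELLS = 14          # number of CRS cells in the simulated line


def truncated_normal(mean, sigma, rng, n_sigma=3.0):
    """Draw from a normal distribution, rejecting values beyond n_sigma."""
    while True:
        value = rng.gauss(mean, sigma)
        if abs(value - mean) <= n_sigma * sigma:
            return value


rng = random.Random(0)  # fixed seed so that the sketch is reproducible

# One LRS and one HRS barrier per CRS cell, including variability
d_lrs = [truncated_normal(D_LRS_MEAN, SIGMA_D, rng) for _ in range(N_CELLS)]
d_hrs = [truncated_normal(D_HRS_MEAN, SIGMA_D, rng) for _ in range(N_CELLS)]

# Constant alternative without variability: every cell uses the mean value
d_lrs_ideal = [D_LRS_MEAN] * N_CELLS
d_hrs_ideal = [D_HRS_MEAN] * N_CELLS

for k, (lrs, hrs) in enumerate(zip(d_lrs, d_hrs), start=1):
    print(f"cell {k:2d}: d_LRS = {lrs * 1e9:.3f} nm, d_HRS = {hrs * 1e9:.3f} nm")
```

Keeping both the drawn and the constant values per cell makes it possible to switch the LRS and HRS variability on and off independently in later simulations.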
In the first simulation, the influence of the variability on the output voltage is studied. The ideal case, which is represented by the red dots in Figure 4d, has no variability on the used tunnel thicknesses and follows the linear relation of Equation (1). In contrast, adding variability to the HRS and LRS leads to a dispersion of the output voltage around the ideal output voltage (blue dots).
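The influence of the variability on the output voltage, including the contribution of each resistance state on its own, can be explored with a strongly simplified Monte Carlo sketch. In contrast to the Spectre simulation, it assumes purely ohmic devices whose resistances are drawn directly from truncated normal distributions with the means and standard deviations quoted for Figure 4b, a read voltage of 0.3 V, and an encoding in which a cell with XOR = 1 connects the shared electrode to the read voltage through its LRS device. All names are illustrative.

```python
import random
import statistics

N_CELLS = 14
V_READ = 0.3                          # volt, assumed read voltage
LRS_MEAN, LRS_STD = 2.53e3, 206.8     # ohm, values quoted for Figure 4b
HRS_MEAN, HRS_STD = 95.33e3, 18.51e3  # ohm, values quoted for Figure 4b


def draw(mean, std, rng, vary):
    """Return the mean, or a 3-sigma truncated normal sample if vary is True."""
    if not vary:
        return mean
    while True:
        value = rng.gauss(mean, std)
        if abs(value - mean) <= 3.0 * std:
            return value


def output_voltage(hd, r_lrs, r_hrs):
    """Shared-electrode voltage when the first hd cells have XOR = 1."""
    g_to_read = 0.0
    g_to_ground = 0.0
    for k in range(N_CELLS):
        if k < hd:
            g_to_read += 1.0 / r_lrs[k]
            g_to_ground += 1.0 / r_hrs[k]
        else:
            g_to_read += 1.0 / r_hrs[k]
            g_to_ground += 1.0 / r_lrs[k]
    return V_READ * g_to_read / (g_to_read + g_to_ground)


def output_spread(vary_lrs, vary_hrs, hd=7, trials=2000, seed=1):
    """Standard deviation of the output voltage over many random arrays."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        r_lrs = [draw(LRS_MEAN, LRS_STD, rng, vary_lrs) for _ in range(N_CELLS)]
        r_hrs = [draw(HRS_MEAN, HRS_STD, rng, vary_hrs) for _ in range(N_CELLS)]
        outputs.append(output_voltage(hd, r_lrs, r_hrs))
    return statistics.pstdev(outputs)


spread_none = output_spread(False, False)
spread_lrs = output_spread(True, False)
spread_hrs = output_spread(False, True)
spread_both = output_spread(True, True)

print(f"no variability : {spread_none * 1e3:.3f} mV")
print(f"LRS only       : {spread_lrs * 1e3:.3f} mV")
print(f"HRS only       : {spread_hrs * 1e3:.3f} mV")
print(f"LRS and HRS    : {spread_both * 1e3:.3f} mV")
```

In this simplified model the LRS devices carry almost all of the conductance in both branches of the divider, so their relative fluctuations translate much more directly into the output voltage than those of the HRS devices.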
To investigate which variability is mainly contributing to the output voltage, the next simulation enables each variability separately. The blue and red dots in Figure 4e correspond to only LRS and only HRS variability, respectively. As shown by the red dots, the HRS variability only has a minor influence on the output voltage. Assuming only LRS variability, in contrast, results in a higher variability of the output voltage. Thus, the main reason for the output variability is the variability of the LRS. This is a notable result as it is the LRS variability which can be controlled significantly better. [28,29]

Potential as bNN Inference Accelerator

Simulation of the Inference Accuracy
To investigate the potential of the concept as a hardware accelerator for binary vector-matrix multiplications, the inference step of the MNIST data set was simulated with the Spectre Simulation Platform of Cadence.
As architecture, a 1-layer fully connected neural network with 784 input neurons and 10 output neurons was used. A representation of this network is shown in Figure 5a. Each output neuron represents the prediction of the bNN for its corresponding digit from 0 to 9. No activation functions are used because no hidden layers are implemented.
The bNN was trained in software on the MNIST data set, for which an adapted version of the binarynet source code of the nn_playground project of the user DingKe was used. [30,31] For the training in software, full precision weights are used which are binarized for the inference step. During the training, the prediction results of each training batch are used to calculate an error function. This error is backpropagated using the Adam optimizer and the straight-through estimator as approximation for the gradient. With this process, the full precision weights are updated for each training batch. [9,32] The data set had to be preprocessed because it originally consists of grayscale images with 256 quantization levels. Therefore, each pixel was binarized to a value of 1 or −1. The trained weights of each output are visualized in Figure 5b. Black pixels correspond to a weight of −1 and white pixels to a weight of 1. For a better visualization, the trained weights are reshaped into a 28 × 28 pixel image, which corresponds to the original shape of the MNIST data set. These 7840 weights are then transferred to the corresponding 7840 CRS cells by drawing tunnel barriers from the LRS and HRS distributions shown in Figure 4a. A weight of "1" ("−1") is encoded as a "1" ("0"), as described in Table 1b. A block diagram of the simulated hardware implementation is shown in Figure 5c. In this case, line resistances are neglected to only show a proof of concept simulation. A more detailed discussion of these effects is included in the supporting information.
For estimating the accuracy of the accelerator, each image of the test data set is binarized, flattened, and transformed to boolean values by mapping "1" ("−1") to "1" ("0"). The resulting vector is encoded into a voltage pattern based on the encoding described in Table 1a and applied to the crossbar array. The voltage drops at each output line are compared with each other. The lowest voltage drop is used as the prediction of the hardware accelerator. In the simulation, the network can predict the correct handwritten number with an accuracy of ≈86%. This result is promising, as it is comparable with the 1-layer neural network with grayscale inputs and analog weights by LeCun et al., which achieved an accuracy of 88%. [30] The simulated hardware accelerator not only has to deal with the binarization of the weights but also with the variability of the resistance states and therefore suffers from this accuracy loss. To understand where the hardware realization mostly makes false predictions, a visualization of the confusion matrix, which compares the prediction result with the actual number, is shown in Figure 5d. This image conveys that the network mostly confuses the digits 4 and 9.
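The decision rule described above, taking the output line with the lowest voltage drop, is equivalent to selecting the largest bMAC result, since a low voltage corresponds to a small HD. The sketch below illustrates this with a tiny hypothetical example of three stored weight rows of six bits each instead of the 10 × 784 array. It assumes ideal ohmic cells with a resistance ratio of 36 (90 kΩ / 2.5 kΩ), a read voltage of 0.3 V, and an output voltage that rises linearly with the HD; all names and data are illustrative.

```python
V_READ = 0.3    # volt, assumed read voltage
R_RATIO = 36.0  # assumed HRS/LRS ratio, 90 kOhm / 2.5 kOhm


def ideal_output_voltage(hd, n, r=R_RATIO, v_read=V_READ):
    """Ideal shared-electrode voltage, rising linearly with the HD."""
    return v_read * (hd * (r - 1.0) + n) / (n * (r + 1.0))


def predict(input_bits, weight_rows):
    """Return the index of the output line with the lowest voltage."""
    n = len(input_bits)
    voltages = []
    for row in weight_rows:
        hd = sum(a ^ b for a, b in zip(input_bits, row))
        voltages.append(ideal_output_voltage(hd, n))
    best = min(range(len(voltages)), key=voltages.__getitem__)
    return best, voltages


# Hypothetical stored boolean weights (one row per output neuron)
weights = [
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 1],
]
image = [1, 0, 1, 0, 0, 1]  # hypothetical binarized and flattened input

predicted, voltages = predict(image, weights)

# Cross-check against the largest bMAC result, bMAC = n - 2 * HD
bmacs = [len(image) - 2 * sum(a ^ b for a, b in zip(image, row)) for row in weights]
print("predicted class:", predicted)
print("bMAC per output:", bmacs)
```

With device variability the voltages of neighboring HD values can overlap, which is one way in which wrong predictions can arise in the simulated accelerator.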

Design Considerations
For deriving some design considerations, it is helpful to introduce the resistance ratio $r = R_\mathrm{HRS}/R_\mathrm{LRS}$ and to rearrange Equation (1) into Equation (3). With Equation (3), the theoretically possible voltage window for a specific resistance ratio can be calculated by $(r - 1)/(r + 1)$. For filamentary ReRAM cells, realistic resistance ratios lie between 10 and 100 and lead to a voltage window between roughly 82% and 98% of the applied read voltage. [33] Increasing the resistance ratio further always enlarges the voltage window, but the gain quickly saturates.
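The saturation of the voltage window can be checked directly from the expression (r − 1)/(r + 1). The short sketch below evaluates it for a few resistance ratios; the function name and the additional ratio of 1000 are illustrative.

```python
def voltage_window(r):
    """Fraction of the read voltage spanned between HD = 0 and HD = n."""
    if r <= 0:
        raise ValueError("resistance ratio must be positive")
    return (r - 1.0) / (r + 1.0)


for r in (10, 100, 1000):
    print(f"r = {r:>4}: window = {voltage_window(r):.1%} of the read voltage")
# r =   10: window = 81.8% of the read voltage
# r =  100: window = 98.0% of the read voltage
# r = 1000: window = 99.8% of the read voltage
```

Going from a ratio of 10 to 100 gains about 16 percentage points, whereas a further tenfold increase adds less than 2.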
Another consideration follows from the fact that Equation (3) depends only on the resistance ratio and not on any resistance state itself. Thus, technologies with a high LRS resistance and a reasonable resistance ratio benefit the most from this concept. An increased LRS resistance lowers the current for each calculation and therefore makes the calculation more energy efficient. This is confirmed by calculating the worst-case current for one operation. The current through the crossbar array depends on the HD between the stored pattern and the input pattern and is largest for HD = n/2. With the help of the equivalent circuit in Figure 1f, the equation for the worst-case current can be derived. Apart from the energy considerations, the simulation study has shown that the concept is intrinsically resistant to HRS variability, so the main challenge is the control of the LRS variability. This variability can be well controlled in integrated circuits by using a 1T1R structure or introducing write-verify schemes to reprogram the resistance if it is outside specified boundaries. [28,34]
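The worst-case current mentioned above can be sketched for the idealized equivalent circuit. The derivation assumes purely ohmic states without variability and the same encoding as in the equivalent circuit of Figure 1f, with $G_a$ and $G_b$ denoting the total conductances from the shared electrode to the read voltage and to ground; these symbols and $V_\mathrm{read}$ are introduced here for illustration, and the result is not necessarily written in the same form as in the original work.

```latex
\begin{align}
I &= V_\mathrm{read}\,\frac{G_a G_b}{G_a + G_b},
\qquad G_a + G_b = n\left(\frac{1}{R_\mathrm{LRS}} + \frac{1}{R_\mathrm{HRS}}\right), \\
I_\mathrm{max} &= I\big|_{\mathrm{HD} = n/2}
 = \frac{n\,V_\mathrm{read}}{4}\,\frac{R_\mathrm{LRS} + R_\mathrm{HRS}}{R_\mathrm{LRS}\,R_\mathrm{HRS}}
 \approx \frac{n\,V_\mathrm{read}}{4\,R_\mathrm{LRS}}
 \quad (R_\mathrm{HRS} \gg R_\mathrm{LRS}).
\end{align}
```

Since the sum $G_a + G_b$ is fixed, the product $G_a G_b$ is largest when both conductances are equal, which happens at HD = n/2. Under these assumptions the worst-case current scales inversely with the LRS resistance, in line with the statement that a higher LRS resistance lowers the energy per operation.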

Conclusion
This work presented a computing in-memory concept based on a bitwise XOR operation for CRS cells. The center electrode of multiple CRS cells is connected to perform the accumulation of each cell's boolean logic operation. A demonstrator of this concept was fabricated and the measurement results were presented. The intrinsic variability of the programmed resistance states led to a significant variation of the output voltage. For understanding the underlying mechanism, a simulation study was performed to separate the LRS and HRS variability contribution. From this, the conclusion that the majority of the output voltage variability stems from the variations in the LRS could be derived. To show that real-world problems can be tackled by the studied concept, a 1-layer fully connected bNN was trained on the MNIST data set and the inference step was simulated. In this simulation, the hardware accelerator achieved an accuracy of around 86%.

Experimental Section
Fabrication: For a reduction in processing steps, the CRS cells were fabricated in a lateral configuration. A thermally oxidized Si piece (≈430 nm silicon oxide) was covered with ≈5 nm titanium as adhesive and ≈30 nm platinum as bottom electrode by sputter deposition. Then, diluted AZ nLOF 2020 photoresist was spin-coated and patterned by electron beam lithography. Reactive ion beam etching was used to create the shared electrode of the parallel CRS cells. After a resist removal process, the whole sample was covered with a stack of ≈9 nm tantalum oxide, ≈16 nm tungsten, and ≈20 nm platinum by sputter deposition. Again, diluted AZ nLOF 2020 photoresist was spin-coated and patterned by electron beam lithography. The excess material was removed by reactive ion beam etching and the leftover mask was cleaned. With this process, lateral CRS cells with a common electrode down to a device size of 200 nm × 200 nm could be realized.
Measurement Setup: The μController module platform and a 4 × 32 switch matrix by aixACCT systems were the essential components that were used to apply binary encoded patterns to the CRS cells. A Picoscope 5444D MSO was used for measuring the voltage drop at the shared electrode (wordline) of the CRS cells. Two of the inputs of the switch matrix were connected to the voltage source of the μController module, where one of them included an ohmic series resistance of 10 kΩ. Another input was connected to the ground of the μController module. This port could also measure the current. The last input of the switch matrix was connected to a channel of the Picoscope to monitor the voltage drop across the shared electrode. The 32 outputs of the switch matrix were connected to a probe card that connected the measurement system to the sample.
Initial Measurements: For the initial measurements, the measurement signal was always applied to the top electrode (bitline) of the devices. The shared electrode (wordline) was connected to ground. After fabrication, a triangular forming voltage sweep up to 4 V and down to −1.8 V was applied. The sweep rate was set to 500 V s⁻¹ and a series resistance of 10 kΩ was included only during the positive cycle to limit the current through the devices after the forming process. After that, the devices were switched 5 times between the HRS and LRS to establish a stable switching behavior. To this end, a triangular sweep with the same sweep rate up to 3 V and down to −1.5 V was applied. Again, a series resistance of 10 kΩ was only included during the transition to the LRS.