Reliable Domain‐Specific Exclusive Logic Gates Using Reconfigurable Sequential Logic Based on Antiparallel Bipolar Memristors

The development of memristor‐based stateful logic circuits can minimize data movement during computation to achieve in‐memory computing, mitigating the von Neumann bottleneck of current computing architectures. Herein, a method is suggested that combines resistance–resistance (R–R) and voltage–resistance (V–R) logic gates to implement exclusive logic gates, namely APMR‐two‐2(XOR, IMP, RIMP) and APMR‐three‐4XOR, where APMR denotes antiparallel memristors with a series resistor. The proposed gates can perform the XOR logic operation in a single cycle and can be extended to n‐bit inputs. The performance of the proposed logic gates is then demonstrated with a 1‐bit full adder–subtractor, along with a comparison of n‐bit ripple carry adder implementations. The n‐bit adder requires 4n+1 memristors and 2n+1 steps, which significantly reduces space‐ and time‐related costs compared with other memristive logic gates. The improved adder circuit can further be used to implement an n‐bit multiplier. In addition, an evaluation of the device stress on the various logic gates confirms that the proposed logic gates are reliable.

DOI: 10.1002/aisy.202100267
(GPU), [4] the trend can be conceived for the general-purpose IMC gate for several data-centric computations. The remaining question is which operations the IMC gate should target. The exclusive OR (XOR) logic operation is essential in arithmetic operations, parity checkers, controlled inverters, digital comparators, and cryptography. However, implementing the XOR operation remains challenging in the IMC approach.
Over the past decade, many types of IMC logic gates have been reported since the IMPLY logic family was conceived by Lehtonen et al. in 2009 [15] and demonstrated by Borghetti et al. in 2010 [16] using two parallel BRS memristors with a series resistor. These gates can be divided into two families: V-R logic [17][18][19] and R-R logic. [20][21][22][23][24] The V-R logic family, also known as sequential logic, uses voltage-form inputs and resistance-form outputs. In contrast, R-R logic, that is, stateful logic, uses resistance for both inputs and outputs.
The logic operation in the R-R logic family is based on the voltage divider formed by the memristive structure of the primitive circuit. Different basic Boolean logic operations can be defined depending on the circuit configuration. In the authors' previous review, the PMR-two-2IMP, [16] PMR-two-3NAND, [21] PMASM-two-3NOR, [25] and APMR-two-2(IMP, AND, TF) [23] gates were introduced with a nomenclature rule and verified under sustainability, success, and stability constraints. [26] As both the inputs and outputs are memristor resistances, R-R logic gates have an advantage in cascading, meaning that the output of one logic gate can be passed directly as an input to the following gate. Therefore, the XOR logic function, or any other composite gate, can be implemented over multiple cycles using this cascading property. However, in these cases either a high conditional bias or the long latency of multiple cycles on combined memristive structures must be addressed, which can lead to higher power consumption and state drift.
On the other hand, logic operations in the V-R logic family are implemented by altering the bias voltage (V_a) applied to the two terminals of a memristor, providing high scalability, flexibility, and parallelism. [27] The logic reconfiguration can be represented by a finite-state machine, [17] providing highly configurable functions, such as AND [28] and majority gates, [29] in a single step. XOR in V-R logic still requires multiple, though fewer, cycles. Moreover, V-R logic can improve memristor endurance because half of the input combinations ('00' and '11') do not involve resistive switching, owing to the zero voltage drop across the memristor. However, complicated cascading between subsequent gates is a critical drawback of V-R logic gates: the resistance-form output of the first gate cannot be used directly as the voltage-form input of the second gate, so a resistance-to-voltage conversion has to be included at each cascading step, increasing size, complexity, and power consumption. [10] In this work, a new method was suggested to solve this problem and accelerate the XOR operation on the previously reported APMR-two-2(IMP, AND, TF) logic primitive circuit, [23] composed of two antiparallel BRS memristors (APM) and a series resistor (R_s). This new primitive circuit element was named the "exclusive logic (Ex-logic) gate." The proposed gate can implement the XOR, IMP, and RIMP functions in a single cycle on predefined states, with a new cascading method to perform a three-input XOR operation. Furthermore, the basic principle of the Ex-logic gate belongs to the V-R logic family, so logic reconfiguration can be applied for functional completeness. At the same time, it also has a feature of the R-R logic family, since a voltage divider across two Ex-logic gates is used for the three-input XOR operation.
In this context, the feasibility and performance of the accelerated two-input and three-input XOR operations are demonstrated in a 1-bit full adder-subtractor (FAS) using an RRAM-based model. In addition, the space- and time-related costs of the n-bit ripple carry adder (RCA) are presented and compared with those of previous memristive RCA implementations. [30][31][32][33][34][35][36][37][38][39] Next, the efficiency of the implementation is further emphasized with a memristive n-bit multiplier. Moreover, the proposed algorithm has the advantage of lower device stress, which ensures the reliability of the logic operation. Although this work focuses on arithmetic logic circuits, memristive cryptographic functions, such as the Advanced Encryption Standard (AES), [14,40] can be efficiently implemented with the proposed gate and will be presented elsewhere.

Reconfigurable Primitive Circuit and Path-Dependent State
The proposed primitive circuit consists of an APM (two antiparallel memristors) and an R_s, as shown in the inset of Figure 1. The circuit can be flexibly implemented in a 3D crossbar array (CBA) structure, where the APM is configured by connecting two word-lines (WLs) and a common bit-line (BL). The corresponding configurations of other logic primitive circuits, such as PMR-two-2IMP, PMR-two-3NAND, and PMASM-two-3NOR, are shown in Figure S1 of the online Supporting Information (SI). The logic primitive circuits can be reconfigured by selecting a combination of memristors. For instance, for the 2 × 3 CBA structure in Figure 1, applying the conditional bias to WL_3t and WL_3b and connecting BL_2 to ground (GND) yields the APM structure in the M_32 cell, whereas biasing WL_1t and WL_2t and connecting BL_1 to GND yields the PMR-two-2IMP gate in the M_11 and M_21 cells. Here, a WL is identified by its number and its top (t) or bottom (b) position, given as the subscript (e.g., WL_3t is the third WL, located above the memristor). A memristor is indicated by its WL and BL numbers in the subscript (e.g., M_11 is located at the cross point of WL_1t, WL_1b, and BL_1). The importance of the compatibility among primitive circuits will be clarified in a later section.
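As a rough illustration of the addressing convention above, the sketch below maps a cell index to the terminals that are biased or grounded when a primitive circuit is selected. The helper names and the dictionary layout are hypothetical; only the WL/BL naming and the two examples (M_32 as an APM, M_11 and M_21 as PMR-two-2IMP) come from the text.

```python
# Hypothetical helpers for the 3D CBA addressing convention described above.
# WL_<i>t / WL_<i>b are the top/bottom word-lines of row i; BL_<j> is bit-line j.


def apm_terminals(row: int, col: int) -> dict:
    """Select the antiparallel pair of cell M_<row><col>: both word-lines of
    the row receive the conditional bias and the bit-line is grounded."""
    return {"bias": [f"WL_{row}t", f"WL_{row}b"], "gnd": [f"BL_{col}"]}


def pmr_two_imp_terminals(rows: tuple, col: int) -> dict:
    """Select two memristors sharing BL_<col> (PMR-two-2IMP): only the top
    word-lines of the chosen rows are biased."""
    return {"bias": [f"WL_{r}t" for r in rows], "gnd": [f"BL_{col}"]}


if __name__ == "__main__":
    print(apm_terminals(3, 2))               # APM structure in the M_32 cell
    print(pmr_two_imp_terminals((1, 2), 1))  # PMR-two-2IMP in the M_11 and M_21 cells
```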
Meanwhile, the proposed primitive circuit structure is identical to the previously reported APMR-two-2(IMP, AND, TF) gate [23] but has a different logic configuration, as it adopts the V-R logic family. For the APMR-two-2(IMP, AND, TF) gate in the R-R logic family, different bias voltages (which select the basic logic function) are applied to the top and bottom WLs, the shared BL is connected to GND, and the initial resistive states of the APM are predefined as the inputs p and q. In contrast, the V-R logic scheme in this work applies a common bias voltage (input p) to the top and bottom WLs of the selected APM, whose resistive state is predefined (to select the basic logic function), while the shared BL is biased with another value (input q). The output is then represented by the final equivalent resistance of the combined memristors (between R_on and R_off). R_on and R_off represent logical '1' and '0,' respectively, whereas the input states '1' and '0' are defined by the bias voltages V_th and GND, respectively. The resistance of the APM, which represents the output, is thus controlled by the input combination. If the inputs are '11' or '00', no voltage drops across the APM (the corresponding WL and BL voltages are identical), leaving the memristor states unchanged. When the input combination is '10', +V_th is applied across the APM. If the initial states of the two memristors are {HRS, HRS}, +V_th switches the memristor in the forward direction (fM) from '0' to '1' (set switching), resulting in the {LRS, HRS} state, which is defined as the '1+' state. When the input combination is '01', -V_th is applied instead, and the memristor in the reverse direction (rM) is switched from '0' to '1', resulting in the {HRS, LRS} state, which is defined as the '1-' state.
Here, it should be noted that the '1+' and '1-' states of the APM conduct the same set-state current, but they can be distinguished with the reading scheme in Figure 2a if necessary.
Compared with the conventional state system of a single BRS memristor (Figure 2b), the suggested state system contains one additional logical state (Figure 2c) and can be regarded as a Moore machine. [41] This new configuration makes a single-cycle XOR function possible, which is not achievable with the conventional single BRS memristor system. The essential aspect of this system is that the combined states '1+' and '1-' have equivalent resistances but differ in their subsequent state transitions; this unique path dependency, the so-called "path-dependent state," is shown in Table 1. For comparison, the conventional state system defines only the '1' and '0' states for the LRS and HRS of a BRS memristor, respectively. Because '1+' and '1-' share the same equivalent resistance, the peripheral circuit can be minimized: a single threshold line suffices to sense the equivalent resistance in both the single and the combined memristive structures.
In the conventional system, the conversion from '1' to '0' requires only the '01' input, which is the reset process of the memristor. In contrast, to reset the '1-' state to '0' in the APM system, the '10' input must be applied instead. The '1+' state, on the other hand, responds in the same way as the conventional single BRS memristor, meaning that the '01' input resets it. This additional response can be used to efficiently increase reconfigurability for functional completeness. Another crucial merit of this primitive circuit is its convenient cascading property despite being sequential logic, which is demonstrated in the FAS.
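The transitions described above can be collected into a small behavioral state machine. The sketch below is an idealized model that assumes instantaneous switching and no device variation. The four state-changing transitions are taken from the text; the two self-loops ('1+' under '10' and '1-' under '01') are inferred from the statement that only four of the six effective cases change state. The preset-to-function mapping is derived from these transitions rather than quoted from Figure 3a.

```python
# Idealized behavioral model of the Ex-logic (APM) state machine.
# States: "0" = {HRS, HRS}, "1+" = {LRS, HRS}, "1-" = {HRS, LRS}.
# Input (p, q): p drives both word-lines, q drives the bit-line.

TRANSITIONS = {
    # '10' applies +V_th: sets fM from '0', resets rM from '1-'
    ("0", (1, 0)): "1+",
    ("1-", (1, 0)): "0",
    ("1+", (1, 0)): "1+",  # inferred self-loop
    # '01' applies -V_th: sets rM from '0', resets fM from '1+'
    ("0", (0, 1)): "1-",
    ("1+", (0, 1)): "0",
    ("1-", (0, 1)): "1-",  # inferred self-loop
}


def step(state: str, p: int, q: int) -> str:
    """Next APM state; '00' and '11' give zero voltage drop (no change)."""
    if p == q:
        return state
    return TRANSITIONS[(state, (p, q))]


def read(state: str) -> int:
    """'1+' and '1-' share one equivalent resistance, so both read as 1."""
    return 0 if state == "0" else 1


# The preset state s selects the function of APMR-two-2(XOR, IMP, RIMP).
PRESETS = {"XOR": "0", "IMP": "1-", "RIMP": "1+"}

for name, s in PRESETS.items():
    # truth table in the order (p, q) = 00, 01, 10, 11
    print(name, [read(step(s, p, q)) for p in (0, 1) for q in (0, 1)])
```

Only the '0' preset needs the path-dependent distinction between '1+' and '1-' to produce XOR; the other two presets reuse the same transitions to yield the implication functions.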

Logic Reconfiguration Based on the Proposed Primitive Circuit
The proposed primitive circuit provides three possible states, which allow three logic functions, XOR_pq, IMP_pq, and RIMP_pq, to be configured in this Ex-logic gate. The combined gate is named the APMR-two-2(XOR, IMP, RIMP) gate; each function executes in a single cycle on its predefined state (s), as shown in Figure 3a. For clarity, T represents the bias condition of the three terminals (two WLs and one BL) of the proposed primitive circuit. T = pq means that input p is applied to the two WLs of the memristor ({WL_1t, WL_1b} ← p) while input q is simultaneously applied to its BL ({BL_1} ← q). The variable s' represents the final state of the M_11 cell after the Ex-logic operation on the initial state s.
In this context, one advantage of using the path-dependent state in the primitive circuit is the ability to implement the XOR logic operation in a single cycle with a simple circuit configuration. In addition, the proposed Ex-logic gate achieves functional completeness. The logic operations reconfigured from the XOR_pq function with a reconfiguration number r of 1, where r is the number of additional steps after the configured function, are shown in Figure 3b. That is, the primitive circuit first performs XOR_pq (light blue background in Figure 3b), and the other logic operations (light green background in Figure 3b) are achieved with one additional operation step. Here, T' is introduced to distinguish the second step from the first. For example, T' = T_1T_2 means that the T_1 bias is applied to WL_1t and WL_1b and the T_2 bias is applied to BL_1. Thus, the NAND_pq logic operation can be achieved by applying either T' = p1 or T' = 1q to the result of the preceding XOR_pq function. The other functions configured from IMP_pq and RIMP_pq with r of 1 are summarized in Figure S2-1 in Section II of the Supporting Information. Consequently, all 16 Boolean logic operations can be obtained with a maximum r of 2, as presented in Table 2.
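Under the same idealized transition rules, the two-step sequence (XOR_pq, then T') can be checked against the NAND truth table. The block below re-encodes the assumed transitions so that it stands alone; the function names are hypothetical.

```python
def step(state: str, p: int, q: int) -> str:
    """One Ex-logic step: p on both WLs, q on the BL (idealized transitions)."""
    if p == q:  # '00' or '11': no voltage drop, no change
        return state
    if (p, q) == (1, 0):  # +V_th: set fM from '0', reset rM from '1-'
        return {"0": "1+", "1+": "1+", "1-": "0"}[state]
    return {"0": "1-", "1-": "1-", "1+": "0"}[state]  # -V_th, mirror case


def read(state: str) -> int:
    return 0 if state == "0" else 1  # '1+' and '1-' read identically


def nand_via_xor(p: int, q: int, second: str = "p1") -> int:
    s = step("0", p, q)  # step 1: XOR_pq on a '0' preset
    if second == "p1":
        s = step(s, p, 1)  # step 2, T' = p1: p on the WLs, '1' on the BL
    else:
        s = step(s, 1, q)  # step 2, T' = 1q: '1' on the WLs, q on the BL
    return read(s)


for second in ("p1", "1q"):
    print(second, [nand_via_xor(p, q, second) for p in (0, 1) for q in (0, 1)])
```

Both variants work because the second step only acts when its input pair differs, and in each case it either leaves a '1' result alone or turns the '00' result of XOR into '1'.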
Most logic operations can be implemented within two cycles, except for the NIMP and XNOR logic functions, which require three sequential cycles. However, the number of steps can be further reduced with the aid of other logic primitive circuits if necessary. As the Ex-logic gate is compatible with the 3D CBA structure, the PMR-two-2IMP primitive logic gate can be implemented simultaneously. For the forward and reverse memristors of a single BRS memristor gate, the next states are given by the following two equations:

$$ s'_{\mathrm{f}} = p\,\bar{q} + s\,(p + \bar{q}) \tag{1-1} $$

$$ s'_{\mathrm{r}} = \bar{p}\,q + s\,(\bar{p} + q) \tag{1-2} $$
For example, after the XOR_pq operation in the Ex-logic gate M_11, fM_11 can be selected alone, which is defined as the partial selection method, by connecting WL_1t to the bias voltage and BL_1 to GND. Then, the NIMP_pq function can be implemented in a single cycle, giving the same result as the single-BRS-memristor SM-two-(RIMP, NIMP) gate proposed by Linn et al. [17] Similarly, the RNIMP_pq function can be obtained by selecting rM_11. In particular, Equation (1-2) captures an essential idea for implementing the subtractor with Ex-logic gates, which will be discussed later. In addition, XNOR_pq can be reduced to a single cycle when one of the nodes is connected to a NOT gate: negating one input bit, defined as the negation method, turns the XOR_pq operation into the XNOR_pq function. The two reduction techniques, partial selection and negation, are exploited in the FAS implementation, as shown later.
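The partial selection and negation methods can be checked with the single-memristor update rules. The sketch assumes that the forward and reverse memristors follow s' = p·q̄ + s·(p + q̄) and s' = p̄·q + s·(p̄ + q), respectively (the SM-two-(RIMP, NIMP) behavior of Linn et al.), and that both start in HRS before the XOR step, so each half of the APM evolves independently. That independence does not hold for arbitrary presets ('1+' under '01' returns to '0' rather than '1-'), so the model is used here only from the '0' preset.

```python
def forward(s: int, p: int, q: int) -> int:
    """Forward memristor: set by (p, q) = (1, 0), reset by (0, 1)."""
    return int((p and not q) or (s and (p or not q)))


def reverse(s: int, p: int, q: int) -> int:
    """Reverse memristor: set by (p, q) = (0, 1), reset by (1, 0)."""
    return int((not p and q) or (s and (not p or q)))


def xor_then_partial(p: int, q: int) -> tuple:
    """XOR_pq on an APM preset to '0'; return (fM, rM, APM read value)."""
    f, r = forward(0, p, q), reverse(0, p, q)
    return f, r, f | r  # the APM reads '1' if either half is in LRS


def xnor_by_negation(p: int, q: int) -> int:
    """Negation method: invert one input bit before the single XOR step."""
    return xor_then_partial(p, 1 - q)[2]


for p in (0, 1):
    for q in (0, 1):
        f, r, x = xor_then_partial(p, q)
        print(f"p={p} q={q}  NIMP(fM)={f}  RNIMP(rM)={r}  XOR={x}  "
              f"XNOR={xnor_by_negation(p, q)}")
```

Reading fM alone after XOR gives NIMP_pq and reading rM alone gives RNIMP_pq, which is the partial selection described above; with s = 1, the forward rule reduces to p + q̄, the RIMP branch of the SM-two-(RIMP, NIMP) gate.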

Cascading via the V-2R Logic
The V-R logic family has an inherent cascading problem between consecutive logic gates because of the mismatch between voltage-type inputs and resistance-type outputs. A multi-input logic gate must take the result of the previous gate as an input, but this mismatch makes such a connection cumbersome. Generally, an external circuit, such as a potentiometer, is required to convert resistance to voltage, which increases the space- and time-related costs in IMC applications. This requirement, however, can be mitigated by adopting domain-specific gates. Specifically, the target domain of the proposed Ex-logic is accelerating the XOR function rather than implementing all logic functions. The domain-specific cascading method for V-R logic applies to the XOR logic function, and its generality can be increased by combining it with other logic implementation methods. Through this combination of V-R and R-R logic implementations, referred to as V-2R logic and explained below, the additional external circuit for input conversion becomes unnecessary.
The principle of V-2R logic can be understood through the voltage divider effect between two Ex-logic gates, M_11 and M_21, connected in series. This V-2R logic circuit can implement a three-input XOR operation, whose truth table is shown in Figure 4a. The M_11 cell is the first Ex-logic gate, which stores the result of the XOR operation on the two inputs d_1 and d_2 ({WL_1t, WL_1b} ← d_1 and BL ← d_2). For this XOR operation, M_11 must be initialized to s = 0 (both fM_11 and rM_11 in HRS). The third input d_3 is stored as resistance in fM_21 of the M_21 cell, and rM_21 is set to HRS at the initialization step. Then, the two Ex-logic XOR gates are connected such that WL_1t and WL_1b are biased with the conditional voltage and WL_2t and WL_2b are connected to GND, as shown in Figure 4b; fM_31 is ignored for now. When the conditional voltage V_a is applied to the two series Ex-logic gates, the voltage drop on the M_21 cell follows from the voltage divider between the two series cells:

$$ V_{M21} = V_a \, \frac{R_{M21}}{R_{M11} + R_{M21} + 2R_s} $$
where R_M11 and R_M21 represent the equivalent resistances of the M_11 and M_21 cells, respectively, and 2R_s accounts for the two Ex-logic gates connected in series, each with its own R_s. As the number of LRS memristors in the M_11 cell increases from zero to one after the first XOR operation between d_1 and d_2, the equivalent resistance of the M_11 cell decreases from R_off/2 to R_on R_off/(R_on + R_off), increasing the voltage drop on the M_21 cell. The critical feature of this operation is that the minimum equivalent resistance of M_11 is not R_on/2, because the proposed state transition diagram of Ex-logic does not allow the {LRS, LRS} state. In addition, the input (d_1, d_2) = (1, 1) leaves M_11 unchanged, exactly as (0, 0) does. Thus, cases 7 and 8 in the truth table become identical to cases 1 and 2, respectively. This makes the proposed voltage divider scheme possible for the three-input XOR operation, because the '1' result of the previous XOR operation, with exactly one LRS memristor, is always the lowest-resistance state of M_11. As a result, the cascaded XOR result is stored as the equivalent resistance of the M_21 cell. This gate is called "APMR-three-4XOR." The proposed algorithm therefore connects V-R and R-R logic without requiring an external active circuit. Furthermore, this V-2R method can be extended to the n-input XOR logic operation following Algorithm 1.
Table 2. Implementation of 16 Boolean logic operations based on the reconfiguration.
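To see the voltage divider at work numerically, the sketch below evaluates V_M21 = V_a·R_M21/(R_M11 + R_M21 + 2R_s) for the four combinations of the M_11 result and the stored d_3. V_a = -1.58 V and R_s = 100 Ω are the values selected later in the text; R_on = 1 kΩ and R_off = 100 kΩ are hypothetical placeholders, not parameters of the TiO_x device model, and switching thresholds are deliberately left out.

```python
V_A = -1.58   # conditional voltage selected in the text (V)
R_S = 100.0   # series resistor selected in the text (ohm)
R_ON = 1e3    # hypothetical LRS resistance (ohm), illustration only
R_OFF = 1e5   # hypothetical HRS resistance (ohm), illustration only


def parallel(r1: float, r2: float) -> float:
    return r1 * r2 / (r1 + r2)


# Equivalent resistance of one APM cell: '0' = {HRS, HRS}; '1' = exactly one LRS.
# {LRS, LRS} is excluded by the state transition diagram, so R_on/2 never occurs.
R_EQ = {0: parallel(R_OFF, R_OFF), 1: parallel(R_ON, R_OFF)}


def v_m21(m11: int, d3: int) -> float:
    """Voltage across M_21 from the series divider M_11 + M_21 + 2R_s."""
    r11, r21 = R_EQ[m11], R_EQ[d3]
    return V_A * r21 / (r11 + r21 + 2 * R_S)


for m11 in (0, 1):
    for d3 in (0, 1):
        print(f"M_11={m11} d_3={d3}: V_M21 = {v_m21(m11, d3):+.3f} V")
```

Under these placeholder values, the combination M_11 = '1', d_3 = 0 places the largest share of V_a on M_21, consistent with the description above; whether each case actually switches depends on the set and reset thresholds of the device model, which the sketch does not include.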

The Memristor Model and Criteria for the Ex-Logic Gate
Selecting a resistive switching memristor is vital for implementing an IMC logic gate. The energy and latency of the logic operation are closely related to the switching properties of the memristor. In addition, memristor characteristics such as endurance, retention, and variation can affect the state transitions. For simplicity, the consideration here is limited to logic feasibility constraints based on the memristor model. Moreover, to operate the Ex-logic gates in parallel, the state transition diagram in Figure 2c must be satisfied with the same magnitude of applied voltage for set and reset. Thus, the reset voltage magnitude of the memristor has to be smaller than the set voltage magnitude for conditional switching in the presence of R_s; otherwise, the reset voltage magnitude exceeds V_th, which makes the circuit operation infeasible. Memristors with this property have been reported for Cu-Te-based conductive bridge random access memory, [42] Cu/GO/Pt, [43] and TiO_x-based devices. [23,44] Among these candidates, the TiO_x-based memristor is selected to demonstrate the 1-bit FAS. Here, a compact model of conductive filament growth/dissolution dynamics [45] is used for the memristor with the switching conditions shown in Figure 5a. It is worth noting that correction factors for nonideal effects, such as sneak current and parasitic capacitance, are not considered in this work; these should be addressed in a practical implementation.
The optimal conditions of V_a and R_s for the APMR-two-2XOR and APMR-three-4XOR gates can be found using the success constraints for each case of operation. A total of eight constraints must be fulfilled (three for the two-input gate and five for the three-input gate), as detailed in Section III of the SI. The constraints can be represented by linear equations and plotted as in Figure 5b to find an optimal solution. The yellow region, bounded by two lower boundaries (blue lines) and one upper boundary (red line), is the qualified region that satisfies the success constraints of the logic operation. The constraints that form these boundaries are called active constraints and play an essential role in finding the appropriate R_s value. Although device variation is not considered explicitly, the logic constraints would span a certain range instead of a discrete value, as shown in Section IV of the SI. In this case, the applied voltage at a given R_s can be selected above the upper variation line at the cost of device stress, a concept discussed in detail in a later section. Thus, a tradeoff between device stress and variation tolerance is required. For the demonstration, the optimal V_a and R_s are selected as -1.58 V and 100 Ω, respectively. The reason for this selection with respect to the active constraints will be highlighted in a later section.
After the logic operation, the stability constraint is appraised. When V_a of the same magnitude is applied to the two- and three-input Ex-logic gates, the voltage drop on the output memristor is higher in the two-input case than in the three-input case. For example, with R_s = 100 Ω, the set voltages required in the two- and three-input cases are -1.546 and -1.577 V, respectively, as determined by the relevant stability constraints. Consequently, the selected V_a of -1.58 V imposes a larger overvoltage in the two-input case (|-1.58| - |-1.546| = 0.034 V) than in the three-input case (|-1.58| - |-1.577| = 0.003 V). Thus, if the switching mechanism is ionic, ion drift in the memristor is faster in the two-input case than in the three-input case, resulting in faster switching. The stability constraints including the dynamic switching speed are described in Section V of the SI. Moreover, because of the different overvoltages, shorter and longer pulse lengths must be selected for the two- and three-input Ex-logic operations, respectively; 35 and 750 ns are selected in this work. As a result, complete Boolean logic operations can be demonstrated. The state transitions based on the state transition diagram and the NAND logic operation are performed as a demonstration. A total of 12 cases can be obtained from the state transition diagram (four input combinations for each of three initial states). However, the '11' and '00' inputs naturally retain the initial state ('0', '1+', or '1-'), so only the six cases in which an effective bias voltage is applied to the APM are depicted in Figure 6a.
Figure 5b. The optimal regions for finding the optimal V_a and R_s in the APMR-two-2(XOR, IMP, RIMP) and APMR-three-4XOR gates. The constraint conditions for the two-input and three-input logic operations are derived from the switching cases in Section III of the Supporting Information.
Algorithm 1. Ex-logic n-input XOR logic operation.
Procedure n-input XOR operation (M_1:n-1, d_1:n)
for n ← 1 to N-1 do
if n = 1 …
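Because Algorithm 1 survives only in part, the following is a hypothetical behavioral sketch of the cascade it describes, not a reconstruction of the original listing: M_1 stores XOR(d_1, d_2) from a V-R step, and each later cell is preloaded with the next input in its fM and then updated by a V-2R step with the previous cell. The V-2R step is abstracted to its logical effect (XOR), so the sketch checks only the bookkeeping of cells and inputs, not the electrical conditions.

```python
from functools import reduce
from itertools import product
from operator import xor


def parity(bits) -> int:
    """Reference n-input XOR."""
    return reduce(xor, bits, 0)


def v_r_xor(d1: int, d2: int) -> int:
    """V-R step on M_1 (preset '0'): the read value is XOR(d1, d2)."""
    return d1 ^ d2


def v_2r_xor(prev_cell: int, preloaded_bit: int) -> int:
    """Abstracted V-2R step: the cell ends up holding XOR of the previous
    cell and the bit preloaded into its fM (electrical conditions not modeled)."""
    return prev_cell ^ preloaded_bit


def n_input_xor(d: list) -> tuple:
    """Cascade over cells M_1..M_(n-1) for inputs d_1..d_n (0-indexed lists)."""
    n = len(d)
    if n < 2:
        raise ValueError("at least two inputs are required")
    cells = [0] * (n - 1)  # every APM initialized to '0'
    cells[0] = v_r_xor(d[0], d[1])  # first Ex-logic XOR on M_1
    for k in range(1, n - 1):
        # cells[k] is M_(k+1); it combines M_k with d_(k+2) (1-indexed names)
        cells[k] = v_2r_xor(cells[k - 1], d[k + 1])
    return cells[-1], cells


for bits in product((0, 1), repeat=4):
    print(bits, n_input_xor(list(bits))[0], parity(bits))
```

In this sketch an n-input XOR occupies n - 1 APM cells, with the final result held by the last cell of the chain.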
The notation '0'|01 → '1-' denotes the transition from the '0' state to the '1-' state under the '01' input combination. Four cases (cases 1-4) change the state, whereas the other two (cases 5 and 6) do not. The NAND logic operation according to Figure 3b is demonstrated in Figure 6b. The implementation takes two sequential steps for each case: XOR_pq followed by NAND_pq (T' = p1). As each of the four input combinations involves two steps, a total of eight results is presented. In each result, the two dashed lines mark the initial and final points of the interval during which the conditional voltage V_a is applied to the Ex-logic gate. The resistance is obtained with a read voltage of 0.3 V, and the results confirm the feasibility of the Ex-logic gate.

Demonstration of a 1-Bit Full Adder-Subtractor and Implementation for the Multibit Input
A full adder (FA) is a combinational circuit used to add two binary numbers, A (A_1, A_2, …, A_n) and B (B_1, B_2, …, B_n), of bit-length n. Each FA takes two input bits and a carry-in bit and produces a sum bit and a carry-out bit, as shown in the truth table in Figure 7a. The sum of the half adder (HA) is implemented first by the APMR-two-2XOR gate, whereas the sum of the FA is implemented by the APMR-three-4XOR gate. The carry can be calculated by reconfiguring the AND and OR functions with the APMR-two-2XOR gate, but borrowing another primitive gate, such as the majority gate (SM-three-1MAJ), can be even more efficient. [29] Further, using the flexible connections of the 3D CBA structure, the carry-out bit can be obtained in a single cycle. Combined with the Ex-logic gates, the 1-bit FA can be implemented with five memristors in three cycles, as shown in Figure 7b. The circuit structure is identical to that shown in Figure 4b. The carry-in bit is initialized in the two memristors rM_21 and fM_31, and the other memristors are set to HRS in parallel. After initialization, an APMR-two-2XOR gate is biased with the two inputs V_A and V_B on M_11 to obtain the sum of the HA. Then, the carry-out bit is calculated on the single memristor fM_31 by applying V_B to WL_3t and the complement of V_A to BL_1, which corresponds to the SM-three-1MAJ gate. Finally, the sum of the FA is calculated and stored in the M_21 cell by operating the APMR-three-4XOR gate with the M_11 and M_21 cells connected in series. The 1-bit FA demonstration is shown in Figure 7c, where the four dashed reference lines divide five regions indicating the initial state, the three operation regions of the clocking scheme table, and the final state of the memristors.
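The three-cycle FA sequence can be checked at the logic level. The sketch assumes the forward-memristor rule s' = p·q̄ + s·(p + q̄) for fM_31 and takes the BL input of the carry step to be the complement of V_A; with p = B and q = Ā, the rule gives s' = A·B + C_in·(A + B), the majority function. The two XOR steps are modeled only by their logical outcome, and the function names are hypothetical.

```python
def forward(s: int, p: int, q: int) -> int:
    """Assumed single forward-memristor rule: s' = p*~q + s*(p + ~q)."""
    return int((p and not q) or (s and (p or not q)))


def full_adder(a: int, b: int, cin: int) -> tuple:
    # Cycle 1: APMR-two-2XOR on M_11 (V_A on the WLs, V_B on the BL) -> HA sum
    ha_sum = a ^ b
    # Cycle 2: fM_31 preloaded with cin; V_B on WL_3t, complement of V_A on BL_1
    cout = forward(cin, b, 1 - a)
    # Cycle 3: APMR-three-4XOR across M_11 and M_21 (carry-in preloaded) -> FA sum
    s = ha_sum ^ cin
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", full_adder(a, b, cin))
```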
Similarly, the 1-bit full subtractor (FS) can be implemented in the same circuit, as shown in Figure 8. The borrow-out bit can be calculated from the IMP and RNIMP logic operations, as expressed by the following equation:

$$ B_{\mathrm{out}} = \bar{A}\,B + B_{\mathrm{in}}\,(\bar{A} + B) = \mathrm{RNIMP}_{AB} + B_{\mathrm{in}} \cdot \mathrm{IMP}_{AB} $$
This is the same as the single reverse memristor implementation in Equation (1-2). The designer can choose either the forward or the reverse memristor to implement the borrow-out bit (or the carry-out bit in the FA) by changing the direction of the bias voltage. Here, the forward memristor fM_31 is selected for the borrow bit by applying V_B to WL_3t and V_A to BL_1. As a result, the two reduction techniques described above, partial selection (selecting a single memristor of an Ex-logic gate) and negation (inverting one of the input bits), are applied to the Ex-logic gates to further optimize the FAS implementation. Consequently, the optimized FAS operates in three cycles with five memristors.
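The same logic-level model covers the subtractor. Under the assumed update rules s' = p·q̄ + s·(p + q̄) (forward) and s' = p̄·q + s·(p̄ + q) (reverse), applying V_B to WL_3t and V_A to BL_1 on the forward memristor, or A on the WLs and B on the BL of a reverse memristor, both give B_out = Ā·B + B_in·(Ā + B); the difference bit reuses the XOR path. The function names are hypothetical.

```python
def forward(s: int, p: int, q: int) -> int:
    """Assumed forward rule: s' = p*~q + s*(p + ~q)."""
    return int((p and not q) or (s and (p or not q)))


def reverse(s: int, p: int, q: int) -> int:
    """Assumed reverse rule: s' = ~p*q + s*(~p + q)."""
    return int((not p and q) or (s and (not p or q)))


def full_subtractor(a: int, b: int, bin_: int, use_reverse: bool = False) -> tuple:
    diff = a ^ b ^ bin_  # same XOR path as the FA sum
    if use_reverse:
        bout = reverse(bin_, a, b)  # reverse memristor: A on the WLs, B on the BL
    else:
        bout = forward(bin_, b, a)  # fM_31: V_B on WL_3t, V_A on BL_1 (no negation)
    return diff, bout


for a in (0, 1):
    for b in (0, 1):
        for bin_ in (0, 1):
            print(a, b, bin_, "->", full_subtractor(a, b, bin_))
```

Compared with the FA sketch, only the BL input of the carry/borrow step changes (V_A instead of its complement), which is why the FA and FS can share one circuit.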
For a practical comparison with state-of-the-art memristive logic circuits, the n-bit FAS is considered. First, the RCA method is selected for longer n-bit binary numbers; the RCA is constructed by cascading 1-bit FAs. The main delay of the RCA originates from the carry computation, as each FA must transfer its carry-out to the carry-in of the next FA. Although the propagation delay can be decreased with other adder types, such as carry-lookahead and parallel-prefix adders, the RCA retains the advantage of a simpler memristive structure that requires no transistors, which benefits modularity and peripheral circuitry. Figure 9 shows the block schematic and clocking scheme table of a 4-bit RCA (n = 4) built from 1-bit FAs of Ex-logic gates. Instead of repeating the entire 1-bit FA process n times, the APMR-three-4XOR sum operations can be applied in parallel through the WLs, which reduces the total latency of the RCA from 4n to 2n + 1 steps. The total number of memristors in the n-bit adder is 4n + 1, where the +1 accounts for the memristor storing the carry-out bit. As a result, the 4-bit RCA can be implemented with 17 memristors in 9 steps, as shown in Figure 9b. Table 3 compares n-bit FA implementations using other memristive logic gates. Note that complex logic gates that combine transistors and memristors [46][47][48] are not considered because they require a different space-related comparison. The number of memristors (space-related cost) and the number of steps (time-related cost) are in a tradeoff relationship. Owing to its cascading ability, the IMPLY-type logic gate uses fewer memristors (low space-related cost) to implement an n-bit FA but requires more steps (high time-related cost) in return. The ORNOR gate is a type of IMP logic gate that uses three memristors instead of two, which makes it possible to implement the combination of the OR and NOR functions (ORNOR). [39] Combining the ORNOR and AND functions can reduce the cost of an XOR operation compared with using IMP functions.
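The cost formulas quoted for the n-bit RCA, and the carry propagation they rely on, can be sketched as follows. `rca_cost` simply encodes the 4n + 1 memristor and 2n + 1 step counts stated in the text; `ripple_carry_add` is a generic bit-serial behavioral model (LSB first) with the majority carry computed once per bit, and it does not model the parallel scheduling of the APMR-three-4XOR steps. Both names are hypothetical.

```python
def rca_cost(n: int) -> tuple:
    """(memristors, steps) for the n-bit Ex-logic RCA as stated in the text."""
    return 4 * n + 1, 2 * n + 1


def to_bits(x: int, n: int) -> list:
    """LSB-first list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def ripple_carry_add(a_bits: list, b_bits: list, cin: int = 0) -> tuple:
    """Behavioral RCA: sum bits via three-input XOR, carry via majority."""
    s_bits, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)
        c = (a & b) | (c & (a | b))  # carry-out feeds the next bit's carry-in
    return s_bits, c


print(rca_cost(4))
s, c = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(s, c)
```

The serial carry loop is the part that cannot be parallelized; the sum XORs depend only on each bit's incoming carry, which is what allows them to be grouped into parallel steps in the clocking scheme.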
On the other hand, the SIXOR gate takes advantage of a single-cycle XOR operation to reduce the number of steps (low time-related cost), [49] but more memristors (high space-related cost) must be included and initialized to prevent input drift. Overall, therefore, the proposed Ex-logic gate shows the best optimization in terms of space- and time-related costs. In addition, unlike other R-R logic families, the proposed Ex-logic gates rely on V-2R logic, which speeds up switching between memristive addition and subtraction in the FAS circuit. Although initialization is not considered in the comparison, the R-R logic family requires either remapping the input bits to the corresponding memristors or an additional logic operation when switching between FA and FS. This increases overall device stress and latency because memristor state transitions are involved. In contrast, the proposed V-2R logic requires no additional initialization: the switch is made by simply negating the input bias, lowering the overhead cost.

Memristive Multipliers Using Ex-Logic Gates
Along with adders, multipliers are essential in many applications. The main optimization of the memristive multiplier (mMP) comes from reducing the partial products, which has been attempted with shift-and-add, Wallace-tree, and Dadda methods.[50][51][52][53][54] By exploiting the massive parallelism of the CBA structure and IMC, the operation speed of the mMP can be enhanced while the area overhead remains minimal. In this work, a new mMP design is proposed using the improved Ex-logic FAS with a modified Wallace-tree algorithm. Figure 10 shows how the partial products of the mMP are represented; they are indicated by the black dots in the inset. The partial products are distinguished by the multiplicand (A_n) and multiplier (B_n) bit positions; for example, P_01 represents the partial product of the A_0 and B_1 input bits. Unlike in the conventional Wallace-tree reduction method, carry is not involved in the grouping step of the reduction; instead, the carry is propagated through the entire system. Although this propagation delay can become the bottleneck of the operation, as already mentioned for the 4-bit RCA, computing the sums in parallel in the simpler circuit structure mitigates the overall speed issue. The modified Wallace-tree reduction consists of three components of the Ex-logic gate: 1-bit FAs, one 1-bit HA, and one fM (a single memristor that stores the first product) (Figure 11). The reduction starts by grouping the partial products in the same row into pairs, as shown by the red boxes. The P_00 position is always left with a single partial product because the least significant bit of the product is P_00 itself, so a single fM takes its place in the structure. Next, because the first red box is composed of P_10 and P_01, it generates the first carry bit; thus, an HA can be used to calculate its sum and carry-out.
Therefore, the first red box is implemented with the HA, and the other boxes are implemented with FAs because the carry-in must be included. The remaining ungrouped partial products would ideally be added to the carry-in with an HA, but an input/output incompatibility must be addressed: the inputs of the proposed APMR-two-2XOR gate are mapped in voltage form, whereas the cascaded carry-in is in resistance form. Thus, an FA with an auxiliary input of '0' is used instead to convert the voltage-type partial product into resistance form, and the sum is calculated using the APMR-three-4XOR gate. In this way, the partial products can be combined in the block diagrams shown in Figure 11b, d, and f, where the white box represents the single memristor, the gray box represents the HA, and the dark-gray boxes represent the FAs. Thus, an n-bit mMP using Ex-logic gates can be represented with (n² + n)/2 blocks. Consequently, the proposed memristive multiplication methodology uses the same logic-operation components as the FAS, namely the APMR-two-3XOR, SM-three-1MAJ, and APMR-three-4XOR gates, but in different combinations and initializations.
Figure 9. Implementation of 4-bit RCA. a) Block schematic of the RCA. The gray blocks represent 1-bit FAs, and the carry-out of each FA block is propagated to the carry-in of the next block. b) Output address of the RCA in the 3D CBA configuration. c) Proposed clocking scheme for each memristor. A total of 17 memristors and 9 cycles are required to implement the 4-bit RCA consisting of four APMR-two-3XOR, SM-three-1MAJ, and APMR-three-4XOR gates in the proposed clocking scheme. The 4-bit FS can be implemented in the same circuit by negating the input bit A.
Table 3. Comparison between the proposed logic gates for n-bit FA implementation. Additional initializations are required during the FA operation. Also, FLEX OR and SIXOR cannot operate in parallel without separate sources and circuits.
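The partial-product naming and block count can be checked with a short sketch. It models only the generic unsigned-multiplication identity (each P_ij = A_i AND B_j carries weight 2^(i+j)), not the memristor grouping of Figure 11, and the helper names are hypothetical.

```python
def partial_products(a: int, b: int, n: int) -> dict[tuple[int, int], int]:
    """Return {(i, j): P_ij} with P_ij = A_i AND B_j for n-bit operands.

    i indexes the multiplicand bit and j the multiplier bit, so the key
    (0, 1) is P_01 = A_0 AND B_1, following the naming in the text.
    """
    return {(i, j): ((a >> i) & 1) & ((b >> j) & 1)
            for i in range(n) for j in range(n)}


def product_from_partials(pp: dict[tuple[int, int], int]) -> int:
    """Each P_ij carries weight 2**(i + j); summing them gives A * B."""
    return sum(bit << (i + j) for (i, j), bit in pp.items())


def column_heights(n: int) -> list[int]:
    """Number of partial products sharing each weight 2**k, k = 0 .. 2n - 2."""
    heights = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            heights[i + j] += 1
    return heights


def block_count(n: int) -> int:
    """Blocks in the modified Wallace-tree diagram, (n^2 + n)/2 per the text."""
    return (n * n + n) // 2


if __name__ == "__main__":
    pp = partial_products(0b101, 0b011, 3)   # 5 x 3
    print(product_from_partials(pp))          # 15
    print(column_heights(3))                  # [1, 2, 3, 2, 1]
    print(block_count(3))                     # 6
```

The column heights make the fM and HA choices visible: weight 2^0 holds only P_00, and weight 2^1 holds exactly the pair P_10 and P_01.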
For illustration, the 2 × 2 mMP is demonstrated in Section VI of the Supporting Information, and the 3 × 3 mMP is implemented in Figure 12. To avoid confusion with the previous notation, the sum of partial products is expressed as S_{r-c}, where r and c are the row and column numbers in the reduction-method diagram shown in Figure 11. Within the given blocks, the reduction of the partial products can be divided into three stages: partial product mapping, carry cascading, and folding (Figure 12a). The first stage stores the partial products in the system; they are mapped using APMR-two-2XOR gates, which sum two partial products in each designated block. Thus, the number of partial-product-mapping steps depends on the total number of blocks in the system, and the mapping location is determined by the block diagrams. Each column of the block diagram corresponds to two WLs in the Ex-logic circuit, whereas each row corresponds to a single BL. Thus, implementing the 3 × 3 mMP requires four WLs and five BLs, as shown in Figure 12b. The partial products are mapped as shown in Figure 12c to account for the carry positions. Next, the carry-in is propagated through the system starting from C_1. Because the carry-out is calculated using the SM-three-1MAJ gate, the final sums of the partial products can be found in parallel through (WL_2t, WL_2b) and (WL_4t, WL_4b). One additional step is added to reinitialize BL_4 and BL_5 to their original states for the folding stage. However, this step can be omitted when M_24 and M_25 are not required, by replacing the resistive switching layer of these memristors with an insulating layer during fabrication. In the folding stage, the sums are computed in parallel using APMR-three-4XOR gates. Consequently, after 15 cycles with 20 devices, the product is stored in the Ex-logic gates located on (WL_4t, WL_4b).
Figure 10. In-memory Wallace tree structure for mMP. The partial products are represented by the black dots, as shown in the inset of the figure, and are named by their multiplicand-bit and multiplier-bit positions.
Figure 11. Proposed modified Wallace tree reduction method for the mMP using Ex-logic FAs. a) Grouping method in the Wallace tree for the 2 × 2 partial product matrix and b) block diagram using FAs, an HA, and an fM to store the first product. c,d) 3 × 3 partial product matrix. e,f) 4 × 4 partial product matrix.
For the n-bit mMP, the latency can be found by considering each stage. The partial product mapping takes n(n + 1)/2 steps, equal to the number of blocks. The carry cascading, including the reinitialization steps, takes (3n² − 7n + 6)/2 steps, and the folding takes 2⌈n/2⌉ − 1 steps. The total number of devices can be determined from the numbers of FAs, the HA, and the single memristor. Because a single HA and a single fM are used, the number of FAs is the total number of blocks, (n² + n)/2, minus two. Each FA block takes two APM cells (four memristors), so the FAs account for 4[(n² + n)/2 − 2] = 2(n² + n − 4) memristors.
Adding the four memristors for the single-memristor operations results in 2n² + 2n − 4 devices in total. Compared with other logic gates proposed for n-bit mMPs (Table 4), the implementation significantly improves latency even though it uses RCAs, reaching a latency comparable to that of the carry-save adder in the MultiPIM design. This improvement stems from the accelerated XOR operation of the Ex-logic gate and the parallelism of IMC. Although each Ex-logic device consists of an APM (a pair of antiparallel memristors), stacking the memristors maintains high density.
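The stage-by-stage latency and the device count can be evaluated as below. This is a minimal sketch with hypothetical function names; the folding term is taken as 2⌈n/2⌉ − 1, a reading that is an assumption checked only against the 15-cycle, 20-device figures quoted for the 3 × 3 case.

```python
import math


def mmp_latency(n: int) -> int:
    """Cycle count of the n x n Ex-logic mMP as the sum of its three stages."""
    mapping = n * (n + 1) // 2                  # one step per block
    cascading = (3 * n * n - 7 * n + 6) // 2    # includes reinitialization steps
    folding = 2 * math.ceil(n / 2) - 1          # assumed reading of the folding term
    return mapping + cascading + folding


def mmp_devices(n: int) -> int:
    """Memristor count: four per FA plus four for the single-memristor operations."""
    blocks = (n * n + n) // 2
    full_adders = blocks - 2                    # one block is the HA, one the fM
    return 4 * full_adders + 4


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(f"n={n}: {mmp_latency(n)} cycles, {mmp_devices(n)} memristors")
```

Note that 3n² − 7n + 6 is even for every integer n, so the integer division in the cascading term is exact.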
Energy is another important aspect. However, an energy comparison between the different logic gates could not be performed because they were built with different memristors, which are difficult to compare directly. Still, the magnitude of the applied bias relative to V_th is closely related to the energy consumption and endurance of memristors. The proposed Ex-logic gate has the advantage that an optimal V_a can be found using active constraints with R_s. The relationship between V_a and V_th is described in detail in the next section.

κ Mapping Analysis Method Using the Calculation of Device Stress
So far, many types of IMC logic gates have been proposed and compared in terms of latency and area,[10,26,55–57] but the reliability of logic operations has not been fully considered. Reliability remains an open challenge for memristive IMC logic gates, and research is actively addressing related issues such as state drift and soft errors. Error correction codes and triple-modular redundancy have been suggested as ways to reduce errors.[58] However, these require additional initialization operations to address the errors and apply only to soft errors. Instead, identifying and minimizing the source of errors is an efficient way to increase reliability without the extra cost of additional operations. In this regard, the memristor device stress should be minimized while the operational constraints are still satisfied. Thus, the normalized device stress factor (κ) is introduced to quantify the device stress level of each type of suggested logic gate.
Regardless of whether the R-R or V-R logic family is used, the conditional operation involves the set and/or reset transition of memristors. Therefore, V_a must exceed V_th for the set transition, but the overvoltage must be minimized; otherwise, the memristor may suffer a hard breakdown (HBD).[59] The voltage drops on the selected and unselected memristors can be derived from Kirchhoff's laws. The degree of overvoltage is represented by κ, defined by Equation (5).
κ must be negative for the sustainability of unselected memristors (i.e., the voltage applied to an unselected memristor must be lower than V_th). On the other hand, a higher positive κ on the selected memristor may ensure successful and rapid switching of the cell, but at the expense of endurance. The maximum allowable κ before HBD depends on the device properties and operating conditions. For instance, a Ta2O5-based memristor in a 45 × 45 nm CBA structure showed set and reset switching with V_a sweeps of 0 to +4 V and 0 to −2.5 V, respectively, at a compliance current of 10 μA. However, it suffered HBD at +5 V when the compliance current was increased to 100 μA.[60] This means that the allowable κ is 0.25. The V_a applied to the memristors in the logic primitive circuit can be calculated by finding the common node voltage (V_c). The general equation for V_c is given by Equation (6), as derived in a previous review.[26] The output of the logic operation is stored in the selected memristor. The parameters of the representative primitive logic circuits are summarized in Table S1, Supporting Information. V_1, V_2, and V_3 are the conditional voltages, which satisfy |V_1| < |V_2| ≤ |V_3|. R* represents the effective resistance of the logic primitive circuit, whose reciprocal is the sum of the conductances of the memristors in the branches. γ_us and γ_s represent the ratios of the effective resistance to the unselected and selected memristor states of the primitive circuit, respectively. V_us and V_s are the driving voltages applied to the unselected and selected memristors, respectively. V_c is calculated as the weighted sum of the node voltages, from which the voltage drops on the memristors in the branches, V_M1, V_M2, and V_M3, are obtained.
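The common-node calculation and the overvoltage factor can be sketched as follows. Two assumptions are made here and are not taken from Equations (5) and (6), which are not reproduced in this text: V_c is computed with Millman's theorem, V_c = Σ γ_i V_i with γ_i = R*/R_i, which matches the description of a weighted sum using the effective resistance R*; and κ is taken as (|V_M| − |V_th|)/|V_th|, which reproduces the κ = 0.25 value of the Ta2O5 example. Function names and the three-branch circuit values are hypothetical.

```python
def effective_resistance(resistances: list[float]) -> float:
    """R*: parallel combination of the branch resistances, 1 / sum(1 / R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def common_node_voltage(voltages: list[float], resistances: list[float]) -> float:
    """Millman's theorem: V_c = sum(gamma_i * V_i) with gamma_i = R* / R_i.

    Assumes each branch is a single resistive element between an ideal
    source V_i and the shared node, with no other path from that node.
    """
    r_star = effective_resistance(resistances)
    return sum((r_star / r) * v for v, r in zip(voltages, resistances))


def kappa(v_m: float, v_th: float) -> float:
    """Normalized overvoltage, assumed here as (|V_M| - |V_th|) / |V_th|."""
    return (abs(v_m) - abs(v_th)) / abs(v_th)


if __name__ == "__main__":
    # Ta2O5 example from the text: set at +4 V, breakdown at +5 V.
    print(kappa(5.0, 4.0))  # 0.25

    # Hypothetical three-branch primitive: one driven node, two grounded nodes.
    volts = [3.0, 0.0, 0.0]
    ohms = [1e4, 1e4, 1e4]             # all three memristors in the same state
    v_c = common_node_voltage(volts, ohms)
    drops = [v - v_c for v in volts]   # source side minus common node: V_M1..V_M3
    print(v_c, drops)
```

Because the weights γ_i sum to one, V_c always lies between the lowest and highest node voltages, which is why a single driven node can never place its full bias across one memristor when other branches share the node.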
The R-R logic family often requires driving voltages beyond the normal operating region of a single memristor because of its connected memristive structure. Thus, even if the success constraints of the logic operation are satisfied for the given driving voltage (i.e., V_a > V_set to change the state from HRS to LRS), the upper limit of the memristor must also be considered. For example, the PMASM-two-3NOR logic gate[25] uses the reset operation of the output memristor for the NOR operation. The voltage drops and κ across the input and output memristors can be calculated with the extracted device parameters, as shown in Figures 3 and 5. The blue and orange colors indicate set and reset transitions, respectively; the light colors represent the sustainability constraints, whereas the dark colors represent the success constraints.
A negative κ on an unselected cell indicates that unwanted switching can be prevented. Nonetheless, the magnitude of the negative κ should be large enough to minimize the probability of unwanted switching, given the stochastic nature of memristors.[59,61] A voltage near V_th can still induce stochastic switching. In this context, the mathematical relationship between the reliability and switching probability of the PMASM-two-3NOR gate in an unsafe write regime has been reported.[62] When unwanted switching does occur, the affected memristors can be switched back to their original state by initialization. However, HBD is catastrophic and can hardly be recovered.
The success constraint ensures the correct result of the logic gate, which may require κ ≥ 0. However, satisfying it with minimal overvoltage stress depends on the detailed operation method. For instance, APMR-two-2(IMP, AND, TF) can implement six basic logic functions, IMP_p, IMP_q, AND_p, AND_q, TF_pq, and TF_qp, with six different applied voltages, but it has the highest device stress (κ = 5) for the TF_pq function, which results in HBD of the memristor, whereas IMP_p and IMP_q have optimal device stress (κ ≈ 0). As a result, only the IMP functions could be demonstrated experimentally. Overall, it remains challenging to find the optimal logic gate that ensures the operating condition κ ≥ 0. Thus, the algorithmic approach should be improved first. Figure 13a shows the optimal κ region for the logic operation. For the success constraint, κ should lie within the +α region to avoid HBD; for the sustainability constraint, κ should lie below −β to avoid stochastic switching. Although α and β vary depending on the memristor structure, switching speed, and measurement temperature, safe values can reasonably be set to 1.0 and 0.2, respectively, based on a survey of previous works in Section VII of the Supporting Information. Based on these criteria, the κ values of different memristive logic circuits are compared with those of the proposed Ex-logic gate in Figure 13b, which shows that the suggested gate operates under optimal conditions. In addition, the κ values of the four memristors of the suggested APMR-three-4XOR gate in the 1-bit FA are analyzed in Figure 13c. In this case, the overvoltage on the selected cells remained minimal (κ = 0.0015–0.0022), and the unselected cells were safely unaffected (κ < −0.49). These results demonstrate that the suggested Ex-logic gates can be operated without a high risk of HBD or stochastic switching.
Table 4. Comparison between the proposed logic gates for n-bit mMP.
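A minimal sketch of the κ-window check described above, using the safe values α = 1.0 and β = 0.2 quoted in the text. The function names are hypothetical, and treating the success window as 0 ≤ κ ≤ α and the sustainability window as κ < −β is this sketch's reading of Figure 13a.

```python
ALPHA = 1.0  # safe upper bound on kappa for selected cells (value from the text)
BETA = 0.2   # safe margin below threshold for unselected cells (value from the text)


def meets_success(kappa: float, alpha: float = ALPHA) -> bool:
    """Selected cell: switches (kappa >= 0) without entering the HBD region."""
    return 0.0 <= kappa <= alpha


def meets_sustainability(kappa: float, beta: float = BETA) -> bool:
    """Unselected cell: stays more than beta below threshold."""
    return kappa < -beta


if __name__ == "__main__":
    selected = [0.0015, 0.0022, 5.0]  # APMR-three-4XOR values, then the TF_pq case
    unselected = [-0.49, -0.1]        # -0.49: reported bound; -0.1: hypothetical
    for k in selected:
        print(f"selected   kappa={k:+.4f} ok={meets_success(k)}")
    for k in unselected:
        print(f"unselected kappa={k:+.4f} ok={meets_sustainability(k)}")
```

Keeping α and β as parameters reflects the point that both bounds depend on the device and measurement conditions; the defaults are only the survey-based safe values.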

Conclusion
Starting from the finite-state machine, a new state system is proposed to provide an additional path that increases logic configurability. The system can be realized with two antiparallel bipolar resistive switching memristors and a series resistor, which together provide the XOR function in a single cycle. The well-known cascading problem of V-R logic is then addressed with a novel V-2R logic method that connects V-R and R-R logic gates. Consequently, APMR-two-2(XOR, IMP, RIMP) and APMR-three-4XOR gates can be implemented as the proposed Ex-logic gates. Exploiting compatibility with a 3D CBA structure, a 1-bit FAS is demonstrated with five memristors in three steps. The ripple carry method is adopted to handle longer input bit lengths in the n-bit full adder-subtractor. A comparison of n-bit full adders shows that the new implementation has advantages in adder-subtractor conversion speed and in space- and time-related costs. This advantage is further verified using the n × n mMP. Furthermore, the proposed implementation shows low device stress per operation and a low chance of unwanted stochastic switching. Thus, the proposed Ex-logic gates are competitive with existing memristive logic gates. It is envisioned that the Ex-logic gate can be further improved and can enable efficiency in multiple applications, such as error correction and image encryption, which will be evaluated elsewhere.
Figure 13. Proposed optimization method for the IMC logic gates based on a) the optimized κ region for the logic operation, b) κ-plot for the sustainability (left) and success (right) constraints of the representative logic gates, and c) κ-mapping method for the proposed logic gate. Dark blue and dark orange indicate the device stress under the success constraint, and light blue and light orange indicate that under the sustainability constraint, for the set and reset transitions, respectively.