An approximate CNTFET 4:2 compressor based on gate diffusion input and dynamic threshold

Here, a new 4:2 approximate compressor is presented using the gate diffusion input (GDI) technique. Although GDI cells suffer from threshold voltage drop, the dynamic threshold approach and carbon nanotube field-effect transistors are combined to overcome this problem. The proposed cell has full-swing outputs, while its error rate and power-delay product are low. Therefore, low-voltage multipliers used in image processing can benefit from the proposed compressor.

Introduction: Owing to the error tolerance of applications such as multimedia, machine learning, and digital signal processing (DSP), approximate digital arithmetic circuits have become a focal area for designers [1]. Significant advances in this area have led to energy efficiency improvements in modern systems on chip. Full adders (FAs) and compressors are the main cores of sophisticated integrated circuits (ICs) such as multipliers and digital filters, and they are well known for their high energy consumption [2]. Approximate design can therefore be used to mitigate this drawback.
Approximate 4:2 compressors with appropriate compression capability are among the most vital cells for various applications [3]. Numerous approximate 4:2 compressors have been proposed in the literature [1-7]. Unlike exact compressors (Figure 1a) [8], which use four main inputs (X1-X4) along with an input carry (Cin) to produce three outputs (Sum, Cout, and Carry), most approximate designs in this area ignore Cin and Cout (Figure 1b,d) as an effective methodology for efficient design [1,6]. Two well-known 4:2 approximate compressors are shown in Figures 1b [1] and 1c [7], which offer a favourable trade-off between precision and circuit performance. Figure 1b is implemented with XNOR and NOR logic, while Figure 1c uses the stacking circuit concept and an FA. Although both circuits have an acceptable error rate, they suffer from the large area of the complementary metal-oxide-semiconductor (CMOS) technique and from high short-circuit and dynamic currents at switching time, which cause high power consumption. Reducing the area to save energy is therefore expected to yield a circuit with lower precision, such as the majority-logic design of Figure 1d [6]. In this circuit, ignoring one of the inputs, X2, along with Cin and Cout reduces the area while increasing the error rate. Here, a new 4:2 approximate compressor is presented with reliable and solid performance. The gate diffusion input (GDI) technique is used to reduce area and power.
Fig. 1 4:2 compressors. (a) Exact compressor [8]. (b) Compressor with XNOR and NOR logic [1]. (c) Compressor with stacking circuit [7]. (d) Compressor with majority logic [6].
Proposed circuit: With the emerging paradigm of Figure 2a, a new reliable 4:2 approximate compressor becomes feasible. In recent years, the GDI technique has been used instead of CMOS to achieve a smaller area because, as shown in Figure 2b, different gates can be formed with only two transistors.
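The two-transistor GDI cell referred to above is commonly described, at the logic level, as a 2:1 selector in which the gate input G chooses between the signals on the P and N diffusion terminals. The following minimal sketch models only that idealised behaviour (no threshold drop and no electrical effects). The function names and terminal assignments are illustrative assumptions and are not taken from Figure 2b.

```python
def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealised GDI cell: the P-type device passes P when G=0,
    the N-type device passes N when G=1 (logic-level model only)."""
    return n if g else p


def gdi_and(a: int, b: int) -> int:
    # Assumed wiring: G=A, P tied to logic 0, N=B  ->  A AND B
    return gdi_cell(a, 0, b)


def gdi_or(a: int, b: int) -> int:
    # Assumed wiring: G=A, P=B, N tied to logic 1  ->  A OR B
    return gdi_cell(a, b, 1)


def gdi_mux(sel: int, d0: int, d1: int) -> int:
    # Assumed wiring: G=sel, P=d0, N=d1  ->  d0 when sel=0, d1 when sel=1
    return gdi_cell(sel, d0, d1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", gdi_and(a, b), "OR:", gdi_or(a, b))
```

In the AND wiring of this model, a low output for A=0 has to be passed through the P-type device, which is the case the text identifies as weak in a real GDI gate.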
The GDI technique can thus reduce the high complexity of the previous compressor circuits. However, GDI cells have non-full-swing outputs; to solve this, the carbon nanotube field-effect transistor (CNTFET) technology of Figure 2c is used. CNTFET technology has excellent electrical characteristics, high ballistic transport, and excellent Vth controllability, and it is well suited to the body-bias dynamic threshold (DT) technique [9]. With body biasing, the threshold voltages of the CNTFETs change dynamically with the input voltage at the gate terminal. When VGATE is '0', the body terminal voltages of the P-type and N-type transistors are low, so Vbody-source,P-type = −VDD and Vbody-source,N-type = 0; otherwise, Vbody-source,P-type = 0 and Vbody-source,N-type = VDD [9]. The output is therefore less influenced by the threshold drop and becomes full-swing. For instance, in the GDI-based AND gate the P-type transistors are weak at generating a low voltage, and the DT-CNTFET technique overcomes this problem, as illustrated in Figure 2d. Here, the output of the AND gate drives a fan-out of 8. To optimize this effect, the Vth regulation feature of CNTFETs has also been considered. Unlike other techniques, in CNTFET-based GDI cells the pull-up and pull-down network dimensions can be identical. The chirality vector of these transistors is taken as (38,0), so that the CNT diameter (DCNT) and Vth are 2.97 nm and 0.144 V, respectively.
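The link between the chirality vector and the quoted DCNT and Vth values can be checked with the first-order relations commonly used for CNTFETs. The sketch below is an illustration under assumptions: the carbon-carbon bond length (0.142 nm) and the pi-bond energy (3.033 eV) are textbook values, and the formulas are the usual approximations, neither of which is stated in this letter.

```python
import math

A0_NM = 0.142    # carbon-carbon bond length in nm (assumed textbook value)
V_PI_EV = 3.033  # carbon pi-pi bond energy in eV (assumed tight-binding value)


def cnt_diameter_nm(n1: int, n2: int) -> float:
    """Approximate CNT diameter in nm for chirality (n1, n2)."""
    a = math.sqrt(3) * A0_NM  # graphene lattice constant in nm
    return a * math.sqrt(n1 * n1 + n1 * n2 + n2 * n2) / math.pi


def cnt_vth_volts(n1: int, n2: int) -> float:
    """First-order threshold voltage (half the band gap) in volts."""
    a = math.sqrt(3) * A0_NM
    return (math.sqrt(3) / 3.0) * a * V_PI_EV / cnt_diameter_nm(n1, n2)


if __name__ == "__main__":
    d = cnt_diameter_nm(38, 0)
    vth = cnt_vth_volts(38, 0)
    print(f"D_CNT = {d:.3f} nm, V_th = {vth:.3f} V")
```

For (38,0) these approximations give roughly 2.97 nm and 0.145 V, close to the values quoted above; the small difference in Vth comes from rounding and from the assumed constants.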
The proposed gate-level circuit of the 4:2 approximate compressor is shown in Figure 3a which operates according to (1) and (2).
Here, the sum producer is similar to that of [1], which is based on XNOR and OR gates, but the carry generator is different and can be separated into two parts. In the first part, the stacking circuit of Figure 1c produces the essential signals for the second part, which is a multiplexer (MUX). Table 1 compares the proposed compressor with the circuits of [1], [6], and [7]. Here, the circuits of [7] and [6], with two and eight errors, respectively, have the smallest and the largest numbers of output errors.
Although the proposed circuit combines [1] and [7] in a new implementation, which uses a MUX instead of an FA together with the GDI and DT-CNTFET techniques, it exhibits 5 errors, which is more than [1] with 4 errors and fewer than [6] with 7 errors. An important feature of the proposed circuit, unlike the other circuits, is that the FA is replaced by a single MUX, which leads to a shorter critical path and a smaller area.
Circuitry performance and error rates: The parameters of Table 2 are used to simulate the reference circuits and the proposed cell in HSPICE, with a 32 nm MOSFET-like single-walled carbon nanotube (SWCNT) technology. Table 3 shows that the proposed cell, with 18 transistors, achieves a significant reduction in power and power-delay product (PDP) compared to the other designs, particularly those based on CMOS. The closest rival to the proposed cell in terms of power is [6], while in terms of PDP it is [7]. The main problem with [6] is its high delay. In power and PDP, the proposed cell shows 44.91% and 3.81× improvements compared to [6] and [7], respectively. Since the circuits differ in transistor count, a figure of merit, the PDP-area product (PDAP), defined as PDP × (transistors × tubes), can be used. The circuits of [6], [2]-Design2, and [5] have fewer transistors than the proposed circuit, but the proposed circuit has dramatically lower power and PDP. The main point to consider is the considerable difference between the PDAP of the proposed cell and that of the other designs.
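The PDAP definition and the two styles of comparison used above (a percentage reduction and an "×" improvement factor) can be written out as a short sketch. All numeric inputs below are hypothetical placeholders rather than values from Table 3, and "tubes" is interpreted here as the number of nanotubes per transistor, which is an assumption.

```python
def pdap(pdp: float, transistors: int, tubes: int) -> float:
    """PDP-area product: PDP x (transistors x tubes)."""
    return pdp * (transistors * tubes)


def percent_reduction(reference: float, proposed: float) -> float:
    """Reduction of `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


def improvement_factor(reference: float, proposed: float) -> float:
    """How many times smaller `proposed` is than `reference`."""
    return reference / proposed


if __name__ == "__main__":
    # Hypothetical PDP (arbitrary units) and tube count; only the
    # 18-transistor figure comes from the text.
    print("PDAP:", pdap(pdp=2.0, transistors=18, tubes=3))
    print("Reduction (%):", percent_reduction(reference=10.0, proposed=5.5))
    print("Factor (x):", improvement_factor(reference=8.0, proposed=2.0))
```

Keeping the two comparison functions separate makes it explicit that a percentage such as 44.91% and a factor such as 3.81× are different measures and should not be compared directly.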
The accuracy metrics of the approximate compressors are measured using MATLAB. First, the error rates of the cells are compared under two conditions: all 16 possible input patterns and one million randomly applied input patterns. The results are shown in Table 4. The error rate (ER) is defined as the percentage of erroneous outputs among all outputs [10,11]. Among the circuits, the minimum average error rate over both outputs belongs to [7], followed by [4] and the proposed cell, respectively.
More illustrative metrics such as the mean error distance (MED), normalized error distance (NED) [10], and normalized mean error distance (NMED) [11] are used to distinguish the accuracy of the circuits more precisely, as defined in (3).
Here, N is the bit length of the 4:2 compressor. The error results are shown in Table 5. In applications that deal with the human senses, the difference between the exact and approximate results (e.g. the exact and inexact colours in a pixel) is more important than their relative difference. Hence, the most important parameter in Table 5 is NMED. It is notable that the compressor designs of [2-4,7] provide lower NMEDs than the proposed cell, at the cost of higher energy consumption and transistor count.
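The exhaustive 16-pattern evaluation of ER and MED described above can be sketched as follows. The approximate cell used here is a hypothetical toy function chosen only to exercise the metrics; it is not the proposed circuit or any of the cited designs. Two further assumptions are made: the exact reference is X1 + X2 + X3 + X4 with Cin ignored, and an output counts as erroneous when the combined value 2·Carry + Sum differs from that reference.

```python
from itertools import product


def exact_value(x1: int, x2: int, x3: int, x4: int) -> int:
    """Exact arithmetic sum of the four inputs (Cin ignored)."""
    return x1 + x2 + x3 + x4


def toy_compressor(x1: int, x2: int, x3: int, x4: int):
    """Hypothetical approximate cell, NOT any of the cited designs."""
    s = (x1 ^ x2) | (x3 ^ x4)
    c = (x1 & x2) | (x3 & x4)
    return s, c


def error_metrics(compressor):
    """Return (ER in percent, MED) over all 16 input patterns."""
    distances = []
    for bits in product((0, 1), repeat=4):
        s, c = compressor(*bits)
        # Error distance: |approximate value - exact value|
        distances.append(abs((2 * c + s) - exact_value(*bits)))
    wrong = sum(1 for d in distances if d != 0)
    er = 100.0 * wrong / len(distances)
    med = sum(distances) / len(distances)
    return er, med


if __name__ == "__main__":
    er, med = error_metrics(toy_compressor)
    print(f"ER = {er:.2f}%, MED = {med:.3f}")
```

For the toy cell this gives ER = 31.25% and MED = 0.375: five of the 16 patterns are wrong, four with an error distance of 1 and the all-ones pattern with an error distance of 2. Any other cell can be evaluated by passing a function with the same signature.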
To make a trade-off between precision and energy efficiency, a practical figure of merit (FoM1), given in (4) [6], is used; its results are shown in Figure 4a.
Here, i indicates the circuit being evaluated and max is the circuit with the highest PDMP, which here is [4]. According to the results, the smaller the FoM1, the better the hardware-accuracy trade-off. In this respect, the proposed cell performs very well compared to the other cells, with 2.68× better results than [2]-Design1. On the other hand, the proposed cell, with a higher MED but a considerably lower PDP than [7], still has the best NPDMP. Although the proposed cell shows only a 1.25% improvement over [7], its efficiency is more visible when it is compared to the other cells, specifically those that are not based on CMOS or have lower transistor counts. Here, the performance of the proposed cell is 71.46% and 83.49% better than that of [2]-Design1 and [6], with 13 and 12 transistors, respectively.
Conclusion: Approximate computing is an emerging approach to energy-efficient digital systems. In this work, a new 4:2 approximate compressor, a cell with a significant role in low-accuracy applications, is presented. The cell is based on the GDI technique. It has 18 transistors, and its outputs are full-swing. Owing to the combination of the dynamic threshold (DT) feature and CNTFET technology with the GDI concept, the presented 4:2 compressor is highly efficient. In addition, its reduced complexity and significant power reduction, with an acceptable output error rate (ER), make the proposed circuit a strong alternative to previous designs.