Dual segmentation approximate multiplier

This letter proposes a new design for a dual segmentation approximate multiplier. The proposed approximate multiplier uses the static segment method (SSM) to select an initial multiplication segment. Following that, the dynamic segment method (DSM) is utilized to further reduce the segment size of each operand.

Introduction: Approximate computing can be applied in error-resilient applications to reduce power, area, and delay [1][2][3][4][5][6][7][8][9][10]. Multiplication is a fundamental, energy-intensive operation in image processing and deep learning applications [11][12][13][14]. Prior works have explored different techniques to reduce the cost of multiplication using approximate multipliers. Examples of these techniques include rounding the multiplicands to the nearest power of two [10], partial product matrix simplification (e.g., using approximate compressors and sub-multipliers) [5][6], and segment-based approximate designs (e.g., truncating the operands to a designated bit-width) [7][8][9]. Segment-based approximate multipliers allow for a trade-off between accuracy and performance by adjusting the segment size m. In [7], a static segment method (SSM)-based multiplier was presented, which statically splits the input operand into i m-bit segments and performs the multiplication using the segment containing the most significant one. Hashemi et al. [8] extended the leading-one segment idea to implement the dynamic segment method (DSM) in approximate multiplier designs. In [9], a truncation and rounding-based scalable approximate multiplier (TOSAM) was proposed, in which the multiplicands are truncated to two different lengths and rounded so that smaller core multiplications can be performed. The DSM-based multiplier provides notably high accuracy, although it has a larger area and higher energy consumption than the SSM design. The SSM-based multiplier has lower accuracy, but it is faster and consumes less energy than other segment-based approximate multipliers such as [8][9]. In this letter, a dual segmentation approximate multiplier is proposed. The proposed multiplier first uses SSM to reduce the bit length of the input operands and then applies dynamic segment selectors to further reduce the segment size of each operand.
The proposed multiplier achieves better performance in terms of the delay, area and energy compared to the DSM-based multiplier while having almost the same accuracy level as the DSM-based multiplier.
Proposed approximate multiplier: Segment-based approximate multipliers can be classified into two categories: dynamic [8][9] and static [7]. The dynamic segment selector requires extra complex circuitry, which leads to significant energy and area overheads [7]. As seen in Figure 1a, an n-bit leading one detector (LOD) is used to locate the most significant one in an n-bit operand. Then an encoder and a MUX capture the following m-1 bits and set the least significant bit of the segment to one. The LOD is the module with the highest power consumption and the largest area in the dynamic segment selector [7]. The static segment selector is much faster and smaller than the dynamic segment selector, and it consumes less energy. However, approximation using the static segment selector results in large mean relative errors (MREs). The implementation of the static segment selector is presented in Figure 1b, where the n-bit multiplicand is split into i m-bit segments. Then, using OR gates and a MUX, the segment that contains the leading one bit is detected. Thus, the n-bit static segment selector can be implemented in the DSM-based multiplier to truncate the n-bit input operands to n/2-bit segments or even n/4-bit segments.
In the proposed dual segmentation design, the static segment selector truncates the n-bit inputs to k-bit segments, as shown in Figure 2. As a result, the dynamic segment selector of the DSM-based multiplier only needs to identify the leading one bit in the k-bit segment instead of in the n-bit operand. After the m-bit segments that contain the leading one bit of each k-bit segment are selected, they are applied to an accurate m-bit multiplier. The result of this multiplication is shifted according to the positions of the leading one bit of each n-bit operand to generate the final output.
The main benefit of the dual segmentation logic is that the size of the most power-consuming components, the LODs, is reduced from n-bit to k-bit in the dynamic segment selectors.
As can be seen in Figures 3a and 4a, the dynamic segment selector captures an (m-2)-bit segment of the number, starting from the leading one bit, and sets the least significant bit of the segment to one when the leading one bit is not in the least significant m bits. In the case where the leading one bit is within the least significant m bits of a number, the most significant bit and the least significant bit of the m-bit segment are set to '1', and the bits between them are approximated by zeros (see Figures 3b and 4b). Thus, the m-bit segments of the proposed selector and the dynamic segment selector always have the same most and least significant bits. Also, the dynamic segment selector and the proposed selector produce the same m-bit segment when the leading one bit is not within the least significant m bits of the k-bit segment. As expected, the proposed multiplier has accuracy similar to that of the DSM-based multiplier. Figure 5 shows an example of a 16-bit dual segmentation approximate multiplication using k = 8 and m = 4. As can be seen in the figure, the result of the proposed multiplication is the same as that of the DSM-based approximate multiplication, because the same m-bit segments are captured in the proposed and DSM-based multipliers.
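The selection steps described above can be sketched in software. The following is a minimal behavioural sketch in Python, not the Verilog implementation evaluated in this letter. The function names (static_select, dynamic_select, dual_segment_mul, dsm_mul) are hypothetical, and n is assumed to be a multiple of k. A leading one that falls within the lowest m bits of a segment is simply passed through unchanged. That is a placeholder assumption and does not reproduce the handling shown in Figures 3b and 4b.

```python
def static_select(x, n, k):
    """Static step: return (segment, shift) for the highest non-zero
    k-bit slice of an n-bit operand. Slices do not overlap."""
    mask = (1 << k) - 1
    for shift in range(n - k, -1, -k):
        seg = (x >> shift) & mask
        if seg:
            return seg, shift
    return 0, 0


def dynamic_select(x, m):
    """Dynamic step: return (segment, shift) for the m-bit window that
    starts at the leading one of x, with its LSB forced to one."""
    if x < (1 << m):
        # Assumption: leading one already in the low m bits, pass through.
        return x, 0
    shift = (x.bit_length() - 1) - (m - 1)  # leading-one index minus (m-1)
    return (x >> shift) | 1, shift


def dual_segment_mul(a, b, n=16, k=8, m=4):
    """Static k-bit selection, then dynamic m-bit selection, then an
    exact m-by-m multiplication shifted back into place."""
    if a == 0 or b == 0:
        return 0
    seg_a, ka = static_select(a, n, k)
    seg_b, kb = static_select(b, n, k)
    da, sa = dynamic_select(seg_a, m)
    db, sb = dynamic_select(seg_b, m)
    return (da * db) << (ka + sa + kb + sb)


def dsm_mul(a, b, m=4):
    """Reference model: dynamic selection applied to the full operand."""
    if a == 0 or b == 0:
        return 0
    da, sa = dynamic_select(a, m)
    db, sb = dynamic_select(b, m)
    return (da * db) << (sa + sb)


if __name__ == "__main__":
    a, b = 0x6B00, 45
    print(dual_segment_mul(a, b), dsm_mul(a, b), a * b)
```

Under these assumptions, for a = 0x6B00 and b = 45 with n = 16, k = 8 and m = 4, both models select the segments 1101 and 1011 and therefore return the same product. In this model the two selectors coincide whenever the leading one of the k-bit segment lies above its lowest m bits, since the k-bit LOD then finds the same bit position that an n-bit LOD would.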
Results and discussion: In this section, the proposed design is compared with an exact multiplier and three previously proposed segment-based approximate multipliers. All designs were implemented in Verilog HDL and synthesized using Synopsys Design Compiler with a 65 nm library at the typical process corner. All designs are unsigned multipliers. To evaluate the impact of the multiplier size, each design is implemented as a 16-bit and a 32-bit multiplier, with k = 8 and m = 4 fixed. The accuracy is evaluated using the MRE of the 16-bit and 32-bit designs over ten million uniformly distributed random input pairs.
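The accuracy measurement described above can be illustrated with a short Monte Carlo sketch. MRE is taken here as the mean of |approximate - exact| / exact over random operand pairs, which is the usual definition and is assumed to match the one used in this letter. The names mean_relative_error and ssm_mul are hypothetical, and ssm_mul is only a stand-in static-segment model with illustrative parameters (n = 16, m = 8), not one of the synthesized designs. Operands are drawn from 1 to 2^n - 1 so that the exact product is never zero, and the default sample count is far smaller than the ten million pairs used for the reported results.

```python
import random


def ssm_mul(a, b, n=16, m=8):
    """Stand-in approximate multiplier: keep only the highest non-zero
    m-bit slice of each n-bit operand and multiply the truncated values."""
    def segment(x):
        for shift in range(n - m, -1, -m):
            if x >> shift:
                return x & (((1 << m) - 1) << shift)
        return 0
    return segment(a) * segment(b)


def mean_relative_error(approx_mul, n=16, samples=100_000, seed=0):
    """Estimate the MRE of approx_mul over uniformly random n-bit pairs."""
    rng = random.Random(seed)
    top = (1 << n) - 1
    total = 0.0
    for _ in range(samples):
        a = rng.randint(1, top)
        b = rng.randint(1, top)
        exact = a * b
        total += abs(approx_mul(a, b) - exact) / exact
    return total / samples


if __name__ == "__main__":
    print(f"MRE of the stand-in SSM model: {mean_relative_error(ssm_mul):.4f}")
```

Any multiplier model with the same two-operand signature can be passed in place of ssm_mul, so the same routine can compare several designs on identical input pairs by reusing the seed.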
The results show that the SSM-based multiplier has the lowest delay, energy, area, energy-delay product (EDP) and power-delay-area (PDA) product, but it has the worst MRE. The DSM-based multiplier provides the highest accuracy. The proposed multiplier outperforms the DSM-based multiplier in terms of the delay, area and energy consumption while having almost the same accuracy level as the DSM-based multiplier.
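The composite metrics above follow directly from the individual ones. The short derivation below uses the standard definitions of EDP and PDA, which are assumed to be the ones intended here. The symbols E, D, P, A, r_E and r_D are notation introduced only for this illustration: r_E and r_D denote the fractional reductions in energy and delay relative to a reference design.

```latex
\begin{align}
\mathrm{EDP} &= E \cdot D, \qquad \mathrm{PDA} = P \cdot D \cdot A, \\
\frac{\mathrm{EDP}_{\mathrm{prop}}}{\mathrm{EDP}_{\mathrm{ref}}}
  &= (1 - r_E)(1 - r_D), \\
r_D = 0.12,\; r_E = 0.17
  &\;\Rightarrow\; 0.88 \times 0.83 \approx 0.73 .
\end{align}
```

With the delay and energy reductions of about 12% and 17% reported for the 16-bit design in the following paragraph, the EDP ratio of roughly 0.73 corresponds to a reduction of about 27%, which is consistent with the EDP figure quoted there.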
The delay, energy, and EDP of the 16-bit proposed multiplier are approximately 12%, 17% and 27% lower than those of the DSM-based multiplier. Also, the 16-bit proposed multiplier has a 21% smaller area than the DSM-based multiplier. As can be seen in Table 1, the MRE of the 16-bit proposed multiplier is about 11% lower than that of the 16-bit SSM-based multiplier. The 32-bit proposed multiplier improves the speed, area, and energy by up to 58%, 92%, and 98% compared to the exact multiplier, as can be seen in Table 2. Additionally, the MRE of the 32-bit proposed design is 18% lower than that of the 32-bit SSM-based multiplier.
Conclusion: This letter presented a new design for a dual segmentation approximate multiplier. The proposed design can approximate an n-bit multiplication using an n/4-bit multiplication or even an n/8-bit multiplication. The multiplier relies on static segmentation followed by dynamic segmentation. The static segment selector implemented in the proposed multiplier significantly reduces the hardware cost of the LOD of the DSM-based approximate multiplier. The efficiency of the proposed multiplier in 16-bit and 32-bit configurations was evaluated and compared against an exact multiplier and previously proposed segment-based approximate multipliers. The proposed design achieves a better performance-accuracy trade-off than previously proposed segment-based approximate multipliers.