A full-duplex transceiver for 20-Gbps high-speed simultaneous bidirectional signaling across global on-chip interconnections

This paper presents a high-speed simultaneous bidirectional transceiver (SBT) for on-chip wireline communication. A MOS hybrid transistor, assisted by two drivers (a main and an auxiliary driver), separates the received data from the superimposed signal at both ends of the on-chip interconnection. In addition, a high-pass filter (HPF) acting as a differentiator generates the echo-cancelation signal. Echo cancelation for simultaneous bidirectional signaling (SBS) is thus realized by combining the hybrid device and the differentiator. The proposed SBT has been designed and evaluated in 28-nm CMOS technology over a narrow 5-mm on-chip interconnection with 11.9-dB loss at the Nyquist frequency (half the bit rate). The energy efficiency of the proposed full-duplex transceiver (FDT) for 20-Gbps simultaneous bidirectional data transmission is 0.147 pJ/b/mm. The results show that the proposed SBT offers better overall performance than the architectures previously reported in the literature. The layout of the SBT occupies a small area of 1574 μm².

Recently, SBTs have been presented for on-chip interconnections. [27][28][29][30][31][32][33] A current-mode SBS scheme for on-chip interconnection is studied in Huang et al, 27 but it shows poor overall performance. A current-mode differential SBT with an adaptive impedance-matching structure is presented in Huang et al. 28 However, its maximum data rate of 5 Gbps is reported across a short 1-mm on-chip interconnect. Afterward, an SBS solution was presented that attains a data rate below 2 Gbps at the price of increased power consumption. 29 A hybrid circuit topology has been proposed in 30, where the designers tried to improve the achievable data rate with a MIMO architecture (eight parallel interconnects and transceivers). Nevertheless, only 2 Gb/s/channel is achieved over a short 3-mm interconnect. Another detailed analysis of an SBT is provided in Wary and Mandal. 31 This current-mode design employs a directional inverter/buffer (DIB) circuit that inverts the transmitted (outbound) signal while passing the inbound signal in phase and with a certain gain. The outbound and inbound signals are then separated by a transconductor circuit that adds the signals on both sides of the DIB with proper ratios. Nevertheless, only a low data rate of 4 Gbps is obtained. Moreover, the authors used a replica-based transceiver in Wary and Mandal, 32 and reported a speed of 10 Gbps.
In this paper, an alternative solution is proposed to overcome the obstacles to high-speed FDTs over on-chip interconnections and to improve the performance of the SBT previously reported in Ebrahimi Jarihani et al. 33 A transistor is utilized as a hybrid device that provides impedance matching at both sides of the on-chip interconnection, separates the transmitted and incoming signals, and carries out voltage-mode echo cancelation with the assistance of a main and an auxiliary driver. However, the compensation capacitance (C_c) used in 33, while creating similar loading effects for better voltage-mode echo cancelation, also lowers the overall bandwidth of the system and limits the maximum achievable data rate to 16 Gbps in full-duplex operation. Moreover, the topology in 33 cannot eliminate the high-frequency components of the self-interference signal, which appear as spikes in the middle of the data received from the transceiver at the other end of the interconnect (because of the propagation delay of the interconnect). In the proposed SBT, an RC high-pass filter (HPF) is utilized as a differentiator that generates a cancelation current to eliminate the residual echo, achieving better echo cancelation in combination with the hybrid device. Because the SBT transmits and receives data on the same interconnect simultaneously, interconnect utilization is doubled compared with unidirectional transmission, which significantly reduces the chip area. Finally, the performance of the proposed SBT is assessed by post-layout simulations in a 28-nm CMOS process across a 5-mm on-chip interconnection. 34 The proposed SBT achieves a data rate of 20 Gbps.
The rest of the paper is organized as follows: Section 2 describes the architecture of the proposed SBT, Section 3 analyzes the transistor-level design, Section 4 discusses the simulation results, and Section 5 concludes the paper.

| ARCHITECTURE DESIGN
SBTs, or FDTs, use the same interconnection to transmit and receive signals at the same time, doubling the data rate. The inbound ($V_{ib}$) and outbound ($V_{ob}$) signals therefore combine into a superimposed signal at both sides of the interconnect, and the SBT suffers from self-interference. A hybrid circuit and/or an echo-cancelation block is required to separate the received data from the superimposed signal and perform echo cancelation. To this end, a new echo-cancelation architecture for SBTs is presented together with a comprehensive analysis. A single-ended block diagram of the presented SBT is illustrated in Figure 1.
The proposed SBT includes a hybrid transistor ($M_{hybrid}$), the main and auxiliary drivers, an echo-cancelation (EC) block, and a transimpedance amplifier (TIA). The hybrid MOS device not only separates the transmitted and received signals at both sides of the on-chip interconnect but also provides a low-impedance termination that enhances the bandwidth of the system. Moreover, it converts variations in its source-gate voltage ($V_{sg}$) into a small-signal drain current ($i_D$) through its transconductance ($g_m$). 35 However, when identical (equal-magnitude, in-phase) outbound signals are produced at the source and the gate of $M_{hybrid}$ ($V_g = V_{ob}$), $V_{sg}$ no longer depends on the outbound signal. Accordingly, $i_D$ is predominantly a function of the inbound signal ($V_{ib}$) from the SBT at the other end of the interconnection, which is sensed as $V_{sg}$. The detected inbound voltage $V_{ib}$ is then converted into the inbound current ($I_{ib}$) by the $g_m$ of the hybrid transistor.
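The sensing principle can be illustrated with a minimal small-signal sketch. It assumes, hypothetically, that $M_{hybrid}$ is described only by its transconductance (body effect and output resistance ignored) and that the voltage at its source is the linear superposition of the outbound and inbound components. The function name and the 20-mS $g_m$ are illustrative, not values from the design.

```python
def hybrid_drain_current(gm: float, v_ob_source: float, v_ib: float, v_gate: float) -> float:
    """Small-signal drain current i_D = gm * V_sg of the hybrid device.

    Hypothetical model: only gm is considered, and the source node carries
    the superposition of the outbound and inbound voltages.
    v_ob_source: outbound component present at the source (node A)
    v_ib:        inbound component arriving from the far-end transceiver
    v_gate:      voltage applied to the gate by the main driver
    """
    v_source = v_ob_source + v_ib  # superimposed signal at the source
    v_sg = v_source - v_gate       # source-gate voltage sensed by the device
    return gm * v_sg


gm = 20e-3  # hypothetical transconductance (20 mS)

# Matched outbound signals at gate and source: only V_ib produces drain current.
print(hybrid_drain_current(gm, v_ob_source=0.2, v_ib=0.05, v_gate=0.2))
# Source outbound amplitude 10% too small: part of V_ob leaks into i_D as echo.
print(hybrid_drain_current(gm, v_ob_source=0.18, v_ib=0.05, v_gate=0.2))
```

The second call shows why the amplitude condition matters: any outbound mismatch between gate and source appears directly in $i_D$ as an echo term.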
The main driver generates the gate signal $V_g$ of $M_{hybrid}$ for transmission. During transmission, $M_{hybrid}$ acts as a common-drain (CD) stage, so the signal that the main driver creates at its source is $V_{ob,M} = \alpha V_g$, where $\alpha = A_{v(CD)} < 1$. For echo cancelation, however, the gate and source voltages of $M_{hybrid}$ must be in phase and of equal amplitude. To fulfill this condition, an auxiliary driver is added to the SBT. It generates the signal $V_{ob,A} = \beta V_{ob}$, which depends on the steering current ($i_{tx,A}$) of the auxiliary driver and the impedance seen at its output ($R_A$). Mathematically, $V_{ob}$ at both ends of the interconnection can be expressed as
$$V_{ob} = V_{ob,M} + V_{ob,A} = \alpha V_g + \beta V_{ob} \qquad (1)$$
To obtain $V_{ob} = V_g$, and in turn a zero echo signal, $\alpha + \beta$ must equal 1. The auxiliary and main drivers thus jointly produce the superimposed signal $V_{ob}$. However, the loading differs at nodes A and B, which causes the strong transmitted signal to leak into the receiver. Therefore, in full-duplex operation the received signal may contain an undesired component $I_e$, the residual echo, which stems from the mismatch between $V_{ob}$ and $V_g$ and includes high-frequency components.
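A short derivation shows where the condition $\alpha + \beta = 1$ comes from. It assumes, as in the definitions above, that the main-driver contribution $V_{ob,M} = \alpha V_g$ and the auxiliary contribution $V_{ob,A} = \beta V_{ob}$ superpose linearly at the source of $M_{hybrid}$, with $\beta < 1$:

```latex
\begin{aligned}
V_{ob} &= \alpha V_g + \beta V_{ob} \\
\Rightarrow\; V_{ob} &= \frac{\alpha}{1-\beta}\, V_g \\
V_{ob} = V_g \;&\Longleftrightarrow\; \frac{\alpha}{1-\beta} = 1
  \;\Longleftrightarrow\; \alpha + \beta = 1 .
\end{aligned}
```

When this holds, the source-gate voltage of $M_{hybrid}$ carries no outbound component, so only $V_{ib}$ is converted into drain current.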
Because the SBTs at both ends of the interconnect transmit data simultaneously, the drain current of $M_{hybrid}$ has two components: the current from the far-end SBT ($I_{ib}$) and the echo signal ($I_e$), which is the near-end leakage current. To eliminate $I_e$, the EC block generates an echo-canceling current ($I_c$) that is approximately equal and opposite to $I_e$ ($I_c \approx -I_e$). Finally, the received current signal ($I_{rx} = I_{ib} + I_e + I_c$) is amplified and converted into the received output voltage ($V_{rx}$) by the TIA for further processing.

| CIRCUIT DESIGN
Figure 2 shows the single-ended transistor-level schematic of the proposed FDT. The main driver generates most of the outbound signal. Its design uses a structure similar to the hybrid branch to minimize process mismatches. The portion of the transmitted signal generated by the main driver, $V_{ob,M} = \alpha V_{g,B}$, where $V_{g,B}$ is the gate voltage created by the main driver at node B, depends on the current $i_{tx,M}$ steered by the main driver and on the ohmic resistance $R_{int}$ of the interconnection.
A simple differential pair consisting of transistors $M_3$, $M_4$, and $M_5$ is utilized as the auxiliary driver. Transistors $M_4$ and $M_5$ switch ON or OFF according to the applied pseudorandom binary sequence (PRBS) data. In the ON state, they operate in the saturation region, which gives the driver a higher output impedance. Moreover, the steering current $i_{tx,A}$ from device $M_3$ controls the amplitude of the outbound signal at node A so that the echo-cancelation condition is fulfilled. The auxiliary driver thus generates the minor portion of the transmitted signal, $V_{ob,A}$.
The hybrid circuitry consists of two transistors ($M_1$ and $M_2$) and a resistor. Transistor $M_2$ and resistor $R$ bias transistor $M_1$, which operates as the hybrid device.
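As a rough sizing sketch, the main/auxiliary split can be estimated under simplifying assumptions that are not taken from the paper: the hybrid device in transmit mode is treated as an ideal source follower with gain $g_m R_S/(1 + g_m R_S)$ (body effect and $r_o$ ignored), $R_S$ and $R_A$ are hypothetical lumped resistances at node A, and the auxiliary contribution is taken as $V_{ob,A} = i_{tx,A} R_A$. All numeric values are illustrative.

```python
def source_follower_gain(gm: float, r_s: float) -> float:
    """Ideal common-drain gain alpha = gm*R_S / (1 + gm*R_S); always below 1."""
    return gm * r_s / (1.0 + gm * r_s)


def auxiliary_current(alpha: float, v_ob: float, r_a: float) -> float:
    """Steering current i_tx,A that supplies the missing fraction of V_ob.

    Uses the echo-cancelation condition alpha + beta = 1 and the assumed
    relation V_ob,A = i_tx,A * R_A.
    """
    beta = 1.0 - alpha        # share left for the auxiliary driver
    v_ob_aux = beta * v_ob    # auxiliary part of the outbound swing
    return v_ob_aux / r_a     # current needed to develop it across R_A


gm, r_s, r_a = 20e-3, 200.0, 200.0  # hypothetical device and node values
alpha = source_follower_gain(gm, r_s)
i_aux = auxiliary_current(alpha, v_ob=0.2, r_a=r_a)
print(f"alpha = {alpha:.3f}, beta = {1 - alpha:.3f}, i_tx,A = {i_aux * 1e3:.3f} mA")
```

Because $\alpha$ can never reach 1 for a finite $g_m R_S$, the auxiliary driver is always needed; the sketch simply makes that residual $\beta$ explicit.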
In general, a transceiver suffers from signal distortion, reflection, and attenuation in the absence of a proper impedance-matching termination. Therefore, impedance matching between the transceivers and the interconnect is an important issue. Unlike off-chip transmission lines (TLs), on-chip interconnections do not require termination with the characteristic impedance ($Z_0$) of a TL, because on-chip and off-chip interconnections have different characteristics and behavior. However, the bandwidth of full-duplex systems can be enhanced by a low-impedance termination at both sides of the interconnection, 11 which leads to better signal integrity and a lower bit error rate (BER). Here, the low-impedance termination is provided by the impedance seen at the source of the hybrid transistor ($\approx 1/g_m$).
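The bandwidth benefit of a $1/g_m$ termination can be sketched with a single-pole approximation. This simplification is assumed only for illustration: node A is treated as one lumped capacitance of about 1 pF (the interconnect value quoted in this section), and the distributed resistance of the interconnect is ignored. The numbers therefore show only the trend that a larger $g_m$, and hence a lower termination impedance, pushes the pole higher. The $g_m$ values are hypothetical.

```python
import math

C_NODE = 1e-12  # ~1 pF lumped capacitance at node A (assumed lumped, not distributed)


def termination_pole_hz(gm: float, c_node: float = C_NODE) -> float:
    """Pole frequency of a 1/gm termination loading a lumped capacitance."""
    r_term = 1.0 / gm  # impedance looking into the source of M_hybrid
    return 1.0 / (2.0 * math.pi * r_term * c_node)


for gm in (5e-3, 10e-3, 20e-3):  # hypothetical transconductances
    print(f"gm = {gm * 1e3:.0f} mS: R_term = {1.0 / gm:.0f} ohm, "
          f"pole ~ {termination_pole_hz(gm) / 1e9:.2f} GHz")
```

A real on-chip wire is a distributed RC line, so its own $R_{int}C_{int}$ product also limits the bandwidth; the sketch isolates only the termination's contribution.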
FIGURE 2 Single-ended transistor-level schematic of the proposed full-duplex transceiver
With $M_{hybrid}$ and the auxiliary and main drivers, perfect echo cancelation can be achieved only in the flat portion of the transmitted signal, which contains the low-frequency components. This is due to the RC-dominated behavior of the on-chip interconnects. The interconnect capacitance ($C_{int}$) of the 5-mm on-chip interconnection used here is almost 1 pF, 34 whereas the total junction capacitance of $M_1$, $M_2$, and $M_5$ is approximately 60 fF. The rise and fall times at node A are therefore slow because of its large capacitance ($C_A = C_{int} + C_{D2} + C_{S1} + C_{D5} \approx C_{int}$), whereas the signal $V_{g,B}$ at node B has faster rise and fall times. Consequently, the hybrid device, even with the help of the auxiliary driver, cannot completely eliminate the self-interference of the SBT, and some uncanceled echo signal ($I_e$) containing high-frequency components leaks into the receiver. $I_e$ appears as spikes at the rising and falling edges and is proportional to the derivative of the transmitted signal. Hence, a high-pass filter (HPF) acting as a differentiator 36 is employed to produce the echo-canceling signal $I_c$. The transfer function of a first-order RC HPF is
$$H(s) = \frac{s R_f C_f}{1 + s R_f C_f} = \frac{s}{s + \omega_c} \qquad (5)$$
where $\omega_c = 1/(R_f C_f)$ is the corner frequency of the HPF. For frequencies well below $\omega_c$, Equation (5) reduces to $H(s) \approx s R_f C_f$, so the echo-canceling signal $I_c(t)$ is proportional to the time derivative of the transmitted signal. Moreover, the series resistor $R_s$ is used to attenuate the echo-canceling signal.
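The differentiator behavior of the first-order RC HPF can be checked numerically. The sketch below discretizes $H(s) = sR_fC_f/(1 + sR_fC_f)$ with the backward-Euler method and drives it with a linear edge. The $R_f$, $C_f$, edge time, and 0.9-V swing are hypothetical values chosen so that the edge spectrum lies well below $\omega_c$. During the ramp, the output settles to $R_fC_f\,dV/dt$, and it decays to zero on the flat part of the waveform, which is the spike-like shape needed to cancel $I_e$.

```python
def rc_highpass(x, dt, r_f, c_f):
    """First-order RC high-pass H(s) = s*tau / (1 + s*tau), tau = R_f*C_f.

    Backward-Euler discretization: y[n] = a * (y[n-1] + x[n] - x[n-1]),
    with a = tau / (tau + dt).
    """
    tau = r_f * c_f
    a = tau / (tau + dt)
    y = [0.0]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y


dt = 1e-12      # 1-ps time step
edge = 50e-12   # hypothetical 50-ps linear edge
swing = 0.9     # hypothetical 0.9-V swing
v = [swing * min(n * dt / edge, 1.0) for n in range(200)]
r_f, c_f = 1e3, 5e-15  # hypothetical: tau = 5 ps, f_c = 1/(2*pi*tau) ~ 31.8 GHz
y = rc_highpass(v, dt, r_f, c_f)
expected = r_f * c_f * swing / edge  # tau * dV/dt during the ramp
print(f"mid-ramp output {y[40]:.4f} V vs tau*dV/dt = {expected:.4f} V")
print(f"output long after the edge: {y[-1]:.1e} V")
```

In the transceiver, $R_s$ then scales this derivative-shaped signal so that its amplitude matches the echo spikes.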
To evaluate the echo-cancelation performance of the proposed full-duplex transceiver, the following steps were performed. First, the transceiver is evaluated in half-duplex mode: PRBS data are applied only to the near-end transceiver, while the far-end transceiver is kept in receive mode. In this case, the signal at the output node of the near-end transceiver is essentially the echo signal $I_e$. To eliminate it, the values of $C_f$, $R_f$, and $R_s$ are optimized so that the generated echo-canceling signal $I_c$ is approximately equal and opposite to $I_e$ ($I_c \cong -I_e$). Then, PRBS data are applied to the SBTs at both ends of the interconnect to perform simultaneous bidirectional signaling. In this case, the output current signal is the sum of $I_{ib}$, $I_e$, and $I_c$. The simulated waveforms are plotted in Figure 3.
Figure 3 shows that a substantial amount of echo is canceled. Although some uncanceled echo remains, the residual echo or noise signal ($I_n$) is insignificant compared with the amplitude of the signal received from the far-end transceiver in full-duplex operation. Therefore, the received signal ($I_{ib} + I_n$) is detectable in the presence of the residual echo.
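The half-duplex tuning step can be mimicked with a simplified behavioral model. Everything in it is a hypothetical stand-in for the circuit: node B switches instantly, node A follows through a single RC pole, the echo current is $g_m(V_A - V_B)$, and the canceler injects $k \cdot \mathrm{HPF}(V_B)$, where the gain $k$ plays the role of the attenuation set by $R_s$ and the HPF time constant plays the role of $R_fC_f$. A coarse grid search picks the pair that minimizes the RMS residual, standing in for the circuit-level optimization of $C_f$, $R_f$, and $R_s$.

```python
import math
import random

DT = 1e-12  # simulation time step: 1 ps


def lowpass(x, tau):
    """First-order RC low-pass (backward Euler), initialized to x[0]."""
    a = DT / (tau + DT)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(y[-1] + a * (x[n] - y[-1]))
    return y


def highpass(x, tau):
    """First-order high-pass s*tau/(1 + s*tau), built as input minus low-pass."""
    return [xi - li for xi, li in zip(x, lowpass(x, tau))]


random.seed(1)
bits = [random.randint(0, 1) for _ in range(64)]
v_b = [0.9 * b for b in bits for _ in range(100)]  # 100 samples = 100-ps UI
TAU_A, GM = 15e-12, 20e-3  # hypothetical node-A time constant and g_m
# Echo: node A lags node B, so gm * (V_A - V_B) is nonzero only near edges.
echo = [GM * (va - vb) for va, vb in zip(lowpass(v_b, TAU_A), v_b)]


def residual_rms(k, tau_f):
    """RMS of the echo plus the canceling current k * HPF(V_B)."""
    cancel = [k * h for h in highpass(v_b, tau_f)]
    return math.sqrt(sum((e + c) ** 2 for e, c in zip(echo, cancel)) / len(echo))


# Coarse grid: k from 10 to 30 mS, tau_f from 5 to 30 ps.
grid = [(residual_rms(k * 1e-3, t * 1e-12), k, t)
        for k in range(10, 31, 2) for t in range(5, 31, 5)]
best_rms, best_k, best_tau = min(grid)
echo_rms = residual_rms(0.0, 15e-12)  # k = 0 means no cancelation
print(f"echo rms = {echo_rms * 1e3:.3f} mA")
print(f"best k = {best_k} mS, tau_f = {best_tau} ps, residual rms = {best_rms * 1e3:.2e} mA")
```

In this idealized model, the residual vanishes when the HPF time constant matches the node-A time constant, because a first-order RC lag produces an error that is itself a first-order high-pass response. In the real circuit the mismatch is not exactly first order, which is consistent with a small residual echo $I_n$ remaining.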

| RESULTS AND DISCUSSION
The transistor-level implementation of the proposed SBT has been realized in TSMC 28-nm standard CMOS technology. As shown in Figure 4, the layout of the proposed SBT occupies a small area of 33.2 μm × 47.7 μm. To evaluate the functionality of the proposed SBT, post-layout simulations were carried out across an on-chip link. The on-chip interconnect used in the simulations is 5 mm long and 0.6 μm wide, with a 1-μm spacing to the adjacent shield layers. 34 Simultaneous bidirectional signaling is performed by transmitting 10-Gbps data from each of the SBTs at both sides of the on-chip interconnection, which yields a total data rate of 20 Gbps (the bit period in each direction is 100 ps). The applied data streams are $2^7 - 1$ pseudorandom bit patterns produced by an on-chip PRBS generator. 37 Figure 5 illustrates the differential voltage eye diagrams of the received data for both FDTs.
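A $2^7 - 1$ (PRBS7) pattern can be produced with a 7-bit linear-feedback shift register. The on-chip generator of 37 is not described here, so the sketch below is only one common software equivalent, assuming the PRBS7 feedback polynomial $x^7 + x^6 + 1$ (taps at stages 7 and 6). It also computes the 100-ps unit interval of each 10-Gbps direction.

```python
def prbs7(n_bits: int, seed: int = 0x7F) -> list:
    """Return n_bits of a PRBS7 sequence (period 2**7 - 1 = 127).

    Fibonacci LFSR with feedback polynomial x^7 + x^6 + 1: state bit 0 holds
    the newest bit, so the feedback XORs the bits produced 7 and 6 cycles ago.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero (the all-zero state locks up)")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F      # shift in the new bit
    return out


BIT_RATE_PER_DIRECTION = 10e9            # 10 Gbps in each direction
UI_PS = 1e12 / BIT_RATE_PER_DIRECTION    # unit interval in ps
seq = prbs7(254)                         # two full periods
print(f"UI = {UI_PS:.0f} ps")
print(f"repeats after 127 bits: {seq[:127] == seq[127:]}, ones per period: {sum(seq[:127])}")
```

A maximal-length sequence visits all 127 non-zero 7-bit states once per period, so it contains runs of up to seven identical bits, which exercises the interconnect's low-frequency behavior as well as its edges.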
The eye diagrams have vertical openings of 165 and 170 mV and horizontal openings of 79 and 78 ps, respectively. The random jitter and peak-to-peak jitter of both eyes are approximately 5 and 22 ps, respectively. For 20-Gbps full-duplex operation, each transceiver consumes 14.7 mW from a 0.9-V supply. Thus, the SBT, including the TIA, has an energy efficiency of 0.147 pJ/b/mm. The transmitted and received data patterns of one of the SBTs are shown in Figure 6.
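The reported energy efficiency follows from dividing the per-transceiver power by the aggregate 20-Gbps data rate and the 5-mm link length, as the quoted numbers imply. The sketch below reproduces that arithmetic and expresses the horizontal eye openings in unit intervals; the function name is illustrative.

```python
def energy_per_bit_per_mm(power_w: float, data_rate_bps: float, length_mm: float) -> float:
    """Energy efficiency in J/b/mm: power / aggregate data rate / link length."""
    return power_w / data_rate_bps / length_mm


POWER_W = 14.7e-3       # per-transceiver power reported in the text
AGGREGATE_RATE = 20e9   # 2 x 10 Gbps full duplex
LENGTH_MM = 5.0         # interconnect length
UI_PS = 100.0           # bit period at 10 Gbps per direction

eff = energy_per_bit_per_mm(POWER_W, AGGREGATE_RATE, LENGTH_MM)
print(f"energy efficiency = {eff * 1e12:.3f} pJ/b/mm")
for width_ps in (79.0, 78.0):  # horizontal eye openings from the text
    print(f"{width_ps:.0f} ps = {width_ps / UI_PS:.2f} UI")
```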
Higher bit rates shorten the bit period. Consequently, jitter, measured as a fraction of the bit period (the unit interval, UI), significantly affects signal integrity and may translate into increased bit errors. Therefore, the bit error rate (BER) performance of the SBTs at both ends of the interconnect has been evaluated. Figure 7 shows that the horizontal (timing) bathtub curves have openings of 29% UI and 31% UI, respectively, at a BER of $10^{-12}$. The robustness of the echo cancelation against process, supply voltage, and temperature (PVT) variations was also tested by applying these variations to the SBT in 2 × 10-Gbps full-duplex operation. The variations of the horizontal and vertical eye openings across process corners are shown in Figure 8. The worst case occurs at the slow-slow (SS) corner, where the vertical and horizontal openings are reduced to 105 mV and 64 ps, respectively. Nevertheless, the eye remains open enough for reliable data detection.
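Bathtub openings such as those in Figure 7 are commonly related to jitter through the dual-Dirac model, in which the total jitter at a target BER is $TJ = DJ_{\delta\delta} + 2Q(\mathrm{BER})\,RJ$. The sketch below is a generic illustration of that model, not the method used to produce Figure 7: it solves for $Q$ from the Gaussian tail by bisection (transition density taken as 1) and reports the remaining horizontal opening in UI. The jitter values in the usage line are hypothetical.

```python
import math


def q_for_ber(ber: float) -> float:
    """Solve 0.5 * erfc(q / sqrt(2)) = ber for q by bisection (Gaussian tail)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eye_opening_ui(ber: float, rj_rms: float, dj_dd: float, ui: float) -> float:
    """Dual-Dirac horizontal opening in UI: (UI - DJ - 2*Q(BER)*RJ) / UI."""
    tj = dj_dd + 2.0 * q_for_ber(ber) * rj_rms
    return max(0.0, (ui - tj) / ui)


# Hypothetical jitter budget: 1 ps rms RJ, 10 ps DJ(dd), 100-ps UI.
print(f"Q(1e-12) = {q_for_ber(1e-12):.3f}")
print(f"opening at BER 1e-12 = {eye_opening_ui(1e-12, 1.0, 10.0, 100.0):.3f} UI")
```

Because the random-jitter term is multiplied by about $2 \times 7$ at $10^{-12}$, a few picoseconds of RJ consume a large share of a 100-ps UI, which is why the bathtub opening is much smaller than the eye width seen at a short simulation length.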
FIGURE 4 Layout of the proposed full-duplex transceiver
The performance of the SBT was then evaluated while the supply voltage varied from 0.85 to 0.95 V. The simulation results plotted in Figure 9A show that the eye opening increases as the supply voltage increases, and vice versa. Similarly, the variations of the horizontal and vertical eye openings were observed while the temperature was varied from −20 °C to 100 °C; the results are shown in Figure 9B. When the temperature decreases from room temperature to −20 °C, the vertical eye opening increases by 6%, and when it increases to 100 °C, the vertical opening decreases by 15%. Nonetheless, the horizontal opening changes by less than 10 ps over the whole range. Figure 10 shows the variation of the maximum height and width of the eye over 200 Monte Carlo runs. Based on the mean and standard deviation values given in Figure 10, the proposed SBT is sufficiently robust against device mismatches.
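One common way to turn Monte Carlo statistics like those in Figure 10 into a pass/fail judgment is to compare a mean-minus-3σ bound with the receiver's required margin. The sketch below only illustrates that bookkeeping: the samples are synthetic Gaussian values generated for the example, not the paper's results, and the 3σ criterion is an assumed convention rather than one stated by the authors.

```python
import random
import statistics


def summarize(samples):
    """Return the mean, sample standard deviation, and a mean - 3*sigma bound."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return mu, sigma, mu - 3.0 * sigma


random.seed(7)
# Hypothetical stand-in for 200 Monte Carlo eye heights in mV (NOT the paper's data).
eye_heights_mv = [random.gauss(160.0, 6.0) for _ in range(200)]
mu, sigma, low = summarize(eye_heights_mv)
print(f"mean = {mu:.1f} mV, sigma = {sigma:.1f} mV, mean - 3 sigma = {low:.1f} mV")
```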
Previously reported designs are implemented in different CMOS technologies, and a brief explanation of these solutions has been given in the introduction. However, most architectures are unidirectional, and only a few SBTs are reported in the literature. Table 1 summarizes the performance comparison of the proposed SBT with state-of-the-art full-duplex transceivers over on-chip interconnects. The proposed SBT doubles the maximum achievable data rate with respect to Wary and Mandal 32 and improves it by 25% compared with Ebrahimi Jarihani et al. 33 The proposed solution therefore has the highest data rate (20 Gbps) among the previously reported FDTs while consuming comparable power. [28][29][30][31][32][33] The design reported in Wary and Mandal 32 is superior in energy efficiency, but it offers a lower data rate of 10 Gbps. The performance of the solution described in 28, which was demonstrated over a 1-mm interconnect, can be expected to degrade over longer on-chip interconnections. Moreover, narrowing an interconnection increases its resistance and insertion loss, which leads to higher power consumption and a lower overall bandwidth of the full-duplex system. For this reason, the performance of the proposed SBT is assessed over a narrow link. In conclusion, the overall performance of the proposed SBT is superior to the state of the art.
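The width argument can be illustrated with the bulk-resistivity estimate $R = \rho L/(W t)$, under which halving the width doubles the resistance of a wire of fixed length and thickness. In the sketch below, the effective resistivity and metal thickness are hypothetical values (narrow on-chip copper has a higher effective resistivity than bulk copper), so only the ratio between the two widths is meaningful.

```python
def wire_resistance_ohm(length_m, width_m, thickness_m, rho_ohm_m=2.2e-8):
    """Bulk-resistivity estimate R = rho * L / (W * t); rho is a hypothetical value."""
    return rho_ohm_m * length_m / (width_m * thickness_m)


LENGTH = 5e-3       # 5-mm interconnect from the text
THICKNESS = 0.2e-6  # hypothetical metal thickness
for width_um in (1.2, 0.6):  # 0.6 um is the width used in the text
    r = wire_resistance_ohm(LENGTH, width_um * 1e-6, THICKNESS)
    print(f"W = {width_um} um -> R ~ {r:.0f} ohm")
```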

| CONCLUSIONS
This paper presented a 20-Gbps transceiver architecture for simultaneous bidirectional data transmission over an on-chip interconnection. A hybrid transistor separates the received signal from the superimposed signal at both ends of the interconnection and, together with a differentiator, accomplishes echo cancelation. In addition, the hybrid device serves as an active low-impedance termination that enhances the operating bandwidth of the transceiver. The proposed SBT has been realized in 28-nm low-power CMOS technology with a supply voltage of 0.9 V. The FDT has an energy efficiency of 0.147 pJ/b/mm for 20-Gbps full-duplex data transmission. Its performance has been validated by post-layout simulations over an on-chip interconnection with a length of 5 mm and a narrow width of 0.6 μm. The results show that the SBT is robust to PVT variations and device mismatches in full-duplex operation. The proposed architecture achieves the highest data rate among the previously reported SBTs, which makes it suitable for high-speed on-chip wireline communication in applications such as systems-on-chip (SoCs).

FUNDING
This research was funded by the Austrian Research Promotion Agency (FFG) under the Ideation project.