Adaptive data-transition decision feedback equaliser with edge emphasis

Adaptive data-transition (DT) decision feedback equalisation (DFE) with edge-emphasised (EE) taps is presented. DFE is performed when a DT is detected using a loop-unrolling algorithm. EE-taps provide enhanced equalisation at the edges of data eyes, where the impact of channel impairment is most severe, and adequate equalisation between the edges and centre of data eyes to achieve both minimum jitter and maximum vertical eye-opening of equalised data. Reference voltages for measuring the DFE error signal are adjusted to further improve the vertical eye-opening of equalised data. The effectiveness of the proposed DFE is investigated using the simulation results of a 10 Gbps (gigabits per second) link over a backplane channel with −23.9 dB channel loss at the baud-rate in TSMC 65 nm 1.2 V CMOS technology. Simulation results show that DT-DFE with EE-taps improves vertical eye-opening by 3.9 times and lowers data jitter by 4.86 times.


1 | INTRODUCTION
The data rates of serial links over wireline channels are limited primarily by intersymbol interference (ISI). ISI can be minimised by means of channel equalisation with decision feedback equalisation (DFE), which is the most robust and widely used method [1]. Most reported DFE falls into the category of data-state (DS) DFE, where taps are chosen to maximise the vertical eye-opening of equalised data [2][3][4]. DS-DFE suffers from the inherent drawback of reduced vertical eye-opening when consecutive 1s or 0s are encountered in data, as evidenced in Figure 1, where an eye diagram of 5-Gbps equalised data with 1-tap DS-DFE is shown [5]. This drawback can be eliminated by activating DFE only when a data transition is present, as evidenced in Figure 2. DFE of such a nature is termed data-transition (DT) DFE [6]. A drawback of the reported DT-DFE is that tap strength remains constant over the duration of each data symbol, even though the impact of channel imperfection is most severe at the edges of data, where high-frequency data components concentrate, and least severe at the centre of data, where the low-frequency data components reside. As a result, although DT-DFE-equalised eyes have good vertical eye-opening at the centre of data eyes, eye-opening deteriorates rapidly when moving away from the centre, resulting in poor jitter of the equalised data. To address this drawback, a DT-DFE with edge-emphasised (EE) taps is proposed that provides enhanced equalisation at the edges of data eyes and adequate equalisation at the centre of data eyes to both minimise jitter and maximise the vertical eye-opening of equalised data. The preliminary results of this study are reported in [7]. This study extends our preliminary investigation of DT-DFE with EE-taps by providing an in-depth comparison of DS-DFE and DT-DFE characteristics backed by simulation results. In addition, it details the principle, design, and implementation of adaptive DT-DFE with EE-taps. Further, it performs clock and data recovery (CDR) from data equalised with the proposed DT-DFE with EE-taps using a blind 4x oversampling approach. Finally, it provides an in-depth comparison of the simulation results of a 10 Gbps data link over a backplane channel that is equalised with DS-DFE with constant taps, DT-DFE with constant taps, and DT-DFE with EE-taps.
The remainder of the paper is organised as follows. Section 2 introduces adaptive DT-DFE with edge emphasis, specifically EE-taps; the algorithm of DT-DFE with EE-taps; the detection of data transition using a loop-unrolling approach; the adaptation of EE-taps; and the adjustment of reference voltages to maximise the vertical eye-opening of equalised data. Section 3 presents 4x blind oversampling CDR. Simulation results validating the proposed DT-DFE with EE-taps are presented in Section 4. The paper concludes with Section 5.

2 | ADAPTIVE DATA-TRANSITION DECISION FEEDBACK EQUALISER WITH EDGE EMPHASIS
The strength of the EE-tap is set to provide enhanced equalisation at the edges of data eyes and adequate equalisation at the centre of data eyes, as shown in Figure 3, with its implementation detailed in Figure 4. In this study, the duration of each data symbol is partitioned into four equal intervals of length $\tau = T_s/4$. The EE-tap consists of subtaps 1, 2, and 3, whose strengths are given by $c_1 + c_2 + c_3$, $c_1 + c_2$, and $c_3$ and whose durations are $2\tau$, $3\tau$, and $4\tau$, respectively. The gating signals of the subtaps are the buffered outputs of the digitally controlled ring oscillator shown in Figure 5. Similar to the DT-DFE with a constant tap (C-tap) proposed in [6], DT-DFE with EE-tap is activated only when a data transition occurs. The equalised data in the $n$th cycle, denoted by $y_{EE,n}$, is given by Equation (1), where $x_n$ is the unequalised data, $c_{EE,n}$ is the EE-tap, $D_{n-1}$ is the previous state, and $S_n = 1$ when a data transition is present and 0 otherwise. Because two consecutive data transitions with opposite polarities (for example, 010 and 101) contain more high-frequency components than other data patterns, EE-taps are optimised using the sign-sign-sign least-mean-square (sign$^3$-LMS) algorithm of Equation (2), which maximises the vertical eye-opening of the equalised data when two consecutive data transitions of opposite polarities occur. In (2), $\Delta_{tap}$ is the step size used to adjust $c_{EE}$, $e_n$ is the error, and sign[x] = 1 if x > 0 and −1 if x < 0. Similar to DS-DFE, the error signal $e_n$ in (2) is the difference between the voltage of the desired data and that of the equalised data, measured at the centre of the data eye.
FIGURE 1 Simulated eye diagram (data-state decision feedback equalisation, 5 Gbps, channel: Figure 12)
FIGURE 2 Simulated eye diagram (data-transition decision feedback equalisation, 5 Gbps, channel: Figure 12)
LI AND YUAN
Although one could use errors measured at the locations where subtaps are added to or subtracted from unequalised data, such an approach is costly because it requires multiple slicers. It is also difficult to implement because the voltage of the desired data is unavailable at the locations of the subtaps.
Because $S_n = D_n \oplus D_{n-1}$, where $\oplus$ is the exclusive-OR operator, and $D_n$ is not available at the time when $y_{EE,n}$ is computed, a loop-unrolling-like approach is used to overcome this difficulty. Specifically, DT-DFE is performed blindly for both $D_n = 1$ and $D_n = 0$, and the appropriate result is selected once $D_n$ becomes available, as detailed in Figure 6. Paths 010 and 101 perform DT-DFE with EE-tap when data are 010,101… and 101,010…, respectively. Paths 11 and 00 perform DT-DFE with C-tap when data are consecutive 1s and 0s, respectively; specifically, they equalise data that do not have two consecutive data transitions with opposite polarities, such as 011… and 100…. Incoming data are processed by all four paths simultaneously, and the proper equalised data are selected once information on the data transition becomes available.
Let us examine paths 11 and 00 first. Assume $D_n D_{n-1} = 11$. Because $S_n = 0$, we have $X_n = 1$. As a result, the output of path 11 is selected by MUX 2. To verify whether the selection is correct, we make use of Figure 7, where the solid and dashed lines represent unequalised and equalised data, respectively. Because $D_n = 1$, in path 11, $v_c^+$ is incremented by $c_{11}$ and $v_c^-$ is decremented by $c_{11}$, while in path 00, $v_D^+$ is decremented by $c_{00}$ and $v_D^-$ is incremented by $c_{00}$. The decision of the slicer in path 11, whose input is increased, should therefore be chosen.
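The idea of evaluating both hypotheses for $D_n$ before the decision is known can be illustrated with a short behavioural sketch. It is not the circuit of Figure 6: the function names, the bipolar sample convention, and the assumed form of the correction (the tap pushes the sample towards the hypothesised new level when a transition is hypothesised) are illustrative assumptions only.

```python
def transition_flag(d_now: int, d_prev: int) -> int:
    """S_n = D_n XOR D_{n-1}: 1 on a data transition, 0 otherwise."""
    return d_now ^ d_prev


def speculative_dt_dfe(x: float, d_prev: int, tap: float) -> dict:
    """Equalise sample x under both hypotheses for D_n.

    Assumption (not from the source): samples are centred on 0 and, when a
    transition is hypothesised, the tap is added for a move towards 1 and
    subtracted for a move towards 0. Without a transition, x is unchanged.
    """
    candidates = {}
    for d_hyp in (0, 1):
        s = transition_flag(d_hyp, d_prev)
        direction = 1.0 if d_hyp == 1 else -1.0
        candidates[d_hyp] = x + s * direction * tap
    return candidates


def select(candidates: dict, d_resolved: int) -> float:
    """Pick the precomputed result once D_n has been resolved."""
    return candidates[d_resolved]


# Previous bit was 0 and a weak positive sample arrives.
both = speculative_dt_dfe(x=0.1, d_prev=0, tap=0.3)
chosen = select(both, d_resolved=1)  # the 0 -> 1 hypothesis turned out correct
```

Both candidate values exist before the decision is made, so the selection step is only a multiplexer, which is the property that loop unrolling exploits.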
Similarly, one can show that if $D_n D_{n-1} = 00$, the decision of the slicer in path 00 will be selected. Figure 8 shows how taps $c_{11}$ and $c_{00}$ are adjusted. The slicers determine whether the output of the gain stage is larger or smaller than the reference voltages. The delay flip-flops (DFFs) following the slicers are for synchronisation. Taps $c_{00}$ and $c_{11}$ also need to be adjusted accordingly using paths 00 and 11, respectively.
We now move on to paths 010 and 101. If data are 010, path 010 will perform equalisation on both the rising and the falling edges. If data are 011, although path 010 will perform equalisation on the rising and falling edges, the result of the falling edge will not be selected, as path 11 will be selected in this case. Similar operations are performed by path 101.
$S_n$ is generated using the circuit shown in Figure 6, with its operation depicted in Figure 9. Assume that 1 is transmitted, so that the output of the slicer in path 11 is selected. If the next bit is 0, a 1 → 0 data transition will occur. Because the data is 0, the output of the slicer in path 00 is then selected. The XOR2 gate whose output is $T_{10}$ flags the occurrence of a 1 → 0 transition. Figure 9b depicts the detection of a 0 → 1 data transition.
Tap $c_{EE}$ is obtained by transmitting data with two consecutive data transitions of opposite directions ($S_n S_{n-1} = 1$) and is determined using the sign$^4$-LMS algorithm of Equation (3), which minimises the power of the difference between the desired and equalised data. Although the EE-tap has three subtaps, because unequalised data typically follow a sinusoidal profile, the weights of the subtaps are set in a sinusoidal profile to minimise cost, such that only one parameter, and subsequently one charge pump, needs to be adjusted. Figure 10 shows the circuit that implements (3).
o;A = 0, $D_{n-1} = 0$ will follow. $D_{n-1}$ is used to select charge pump $CP_{EE,A-}$ and deselect charge pump $CP_{EE,A+}$. These charge pumps drive the same low-pass filter, whose output voltage adjusts $c_{EE}$.
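To make the behaviour of a sign-based LMS tap update concrete, the following is a minimal sketch under stated assumptions. It does not reproduce the paper's sign$^3$-LMS or sign$^4$-LMS equations; the update form, the toy channel (every transition arrives attenuated by a fixed fraction `isi`), and the names `adapt_tap`, `isi`, and `delta_tap` are all hypothetical.

```python
def sign(x: float) -> int:
    """sign[x] = 1 if x > 0 and -1 if x < 0 (0 maps to 0 here, an assumption)."""
    return (x > 0) - (x < 0)


def adapt_tap(isi: float, delta_tap: float, steps: int) -> float:
    """Adapt a single tap on an alternating 0101... pattern.

    Assumed model: the desired level at the eye centre is +1 or -1, the
    received sample is the desired level scaled by (1 - isi), and the tap
    pushes the sample towards the new level on every transition.
    """
    c = 0.0
    d_prev = 0
    for _ in range(steps):
        d_now = 1 - d_prev                  # every bit is a transition (S_n = 1)
        level = 1.0 if d_now else -1.0      # desired value at the eye centre
        x = level * (1.0 - isi)             # unequalised sample
        y = x + level * c                   # equalised sample
        e = level - y                       # error = desired - equalised
        c += delta_tap * sign(e) * sign(level)  # only signs drive the update
        d_prev = d_now
    return c


c_final = adapt_tap(isi=0.3, delta_tap=0.01, steps=200)
```

Because only signs are used, the tap moves by a fixed step each cycle and settles to within about one step of the value that cancels the assumed attenuation, which mirrors how a charge pump adds or removes a fixed amount of charge per update.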

3 | CLOCK-AND-DATA RECOVERY
Two approaches, namely phase-tracking and phase-picking, are at our disposal for CDR. The former uses a phase-locked loop that locks to the edges of equalised data to synchronise a locally generated clock with incoming data, whereas the latter locates the edges of equalised data by sampling the data multiple times per symbol time and identifying transition edges using elementary digital logic. Phase-tracking offers excellent resilience to process-voltage-temperature (PVT) uncertainty but suffers from a large latency due to the time needed for the phase-locked loop to establish lock and is therefore not particularly attractive for high-rate Gbps data links. Phase-picking, on the other hand, features rapid edge location attributable to its open-loop nature but suffers from a large phase error and high sensitivity to PVT uncertainty. To improve phase resolution, phase interpolators are often needed [8][9][10]. In this work, a blind 4x oversampling approach similar to that given in [11] was used to recover both clock and data. Figure 11 provides the detailed operation of the blind 4x oversampling CDR. Each data symbol is sampled four times, thereby yielding a phase resolution of 45°. To make the edge selector less sensitive to erroneous decisions of the edge generator caused by disturbances coupled into the samplers, edge accumulators were implemented using counters that function as low-pass filters [12]. Because CDR locates the rising edge of the data, while the gating signals in Figure 5 have a specific phase arrangement, a phase selector is needed that selects the appropriate outputs of the digitally controlled oscillator as the gating signals of the EE-tap. Once the edge of data is located, the clock is recovered. A clock signal that is 90° away from the located edge is used to recover data. In this work, edge information is updated only when two consecutive edge locations are identical. Such an arrangement is important, as it minimises the impact of any erroneous decisions of the samplers in CDR.
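The phase-picking steps described above (edge detection, edge accumulation, and updating only on two identical consecutive results) can be sketched behaviourally as follows. This is an illustration rather than the circuit of Figure 11: the window length, the function names, and the choice of the data sample two sample positions (half a symbol) after the located edge are assumptions.

```python
OSR = 4  # samples per symbol (blind 4x oversampling)


def edge_phase(window):
    """Return the sample phase (0..OSR-1) with the most transitions, or None.

    The per-phase counters play the role of edge accumulators: isolated
    spurious edges are outvoted by the true edge position.
    """
    counts = [0] * OSR
    for i in range(1, len(window)):
        if window[i] != window[i - 1]:
            counts[i % OSR] += 1
    return counts.index(max(counts)) if max(counts) > 0 else None


def recover(samples, symbols_per_window=8):
    """Pick one sample per symbol once the edge phase has been confirmed."""
    win = OSR * symbols_per_window
    locked, previous, data = None, None, []
    for start in range(0, len(samples) - win + 1, win):
        window = samples[start:start + win]
        phase = edge_phase(window)
        # Update only when two consecutive windows agree on the edge phase.
        if phase is not None and phase == previous:
            locked = phase
        previous = phase
        if locked is not None:
            pick = (locked + OSR // 2) % OSR  # assumed: half a symbol after the edge
            data.extend(window[pick::OSR])
    return locked, data


bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4
stream = [b for b in bits for _ in range(OSR)]
samples = [bits[0]] + stream[:-1]  # shift by one sample to emulate unknown phase
locked_phase, recovered = recover(samples)
```

In this example the first window only establishes a candidate edge phase, so data recovery starts from the second window, which illustrates the short acquisition delay that the two-consecutive-results rule introduces.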

4 | SIMULATION RESULTS
The proposed adaptive DT-DFE with EE-taps was designed in a TSMC 65 nm 1.2 V CMOS technology and used to equalise a backplane channel whose time and frequency responses are given in Figure 12 [13]. The channel consists of two identical sections, each with three backplane traces and two connection points, with an overall channel length of 66 cm and loss of 30 dB at 100 Gbps. This channel was selected to assess the effectiveness of the proposed adaptive DT-DFE with EE-taps because measured channel-response data are available and can be readily incorporated in the simulation. The 4x blind oversampling CDR scheme detailed in Section 3 was used for CDR. The first example used to quantify the effectiveness of the proposed adaptive DT-DFE with EE-taps is a 5-Gbps data link over the channel whose characteristics are detailed in Figure 12. The EE-tap consists of seven subtaps of equal duration $T_s/8$, with three subtaps before the data transition, three subtaps after the data transition, and a large subtap at the data transition, as shown in Figure 13b. Although fewer or more subtaps could be used, the former risks inadequate equalisation, whereas the latter provides better equalisation at the expense of more power consumption and silicon area. No tap was placed at the centre of data eyes in this example, as we wanted to specifically investigate the impact of the absence of a tap at the centre of data eyes on the vertical eye-opening of equalised data. Further, no adaptive DFE was implemented in this example, as we wanted to adjust the strength of the subtaps manually to gain a better understanding of its impact on the performance of DT-DFE with EE-taps. The strength of the subtaps, specifically the current of the subtaps, was manually adjusted according to the predefined profile of equalised data.
Specifically, because unequalised data typically follow a sinusoidal profile and the ideally equalised data is a square wave, the strength of subtaps 1, 2, 3, 5, 6, and 7 was set to the difference between the ideally equalised data and the unequalised data, as shown in Figure 13b. Subtap 4 at the threshold crossing was chosen to be large in order to maximise transition slopes. Figure 14 shows the key signals of the data link. It can be seen in Figure 14d-f that the tap of DS-DFE is activated in every data eye, while that of DT-DFE is only activated when a data transition is present. Both taps are constant over their respective time intervals. The duration of the DS-DFE tap is set by the number of consecutive 1s and 0s, while that of the DT-DFE tap is only one symbol time. The unequalised and equalised data with DS-DFE are compared in Figure 14c. It is seen that DS-DFE improves the vertical eye-opening of high-frequency data at the expense of reduced vertical eye-opening of low-frequency data. The improvement in transition slopes is rather marginal. The unequalised and equalised data with DT-DFE with a constant tap are compared in Figure 14b. It is seen that because DT-DFE is activated only when a data transition is present, the vertical eye-opening is comparable to that with DS-DFE. No reduction in the vertical eye-opening of low-frequency data is observed. Both DS-DFE and DT-DFE with a constant tap marginally improve transition slopes. The unequalised and equalised data with DT-DFE with the EE-tap are compared in Figure 14a. It can be seen that DT-DFE with EE-tap sharply improves transition slopes and the vertical eye-opening of high-frequency data without sacrificing the vertical eye-opening of low-frequency data. Figure 15 and Table 1 compare the eye diagrams of equalised data with DS-DFE, DT-DFE with constant tap, and DT-DFE with EE-tap. The percentages in Table 1 indicate the improvement over the entry immediately to the left.
DT-DFE with EE-tap is shown to improve horizontal eye-opening (H-opening) by 8.5% compared with DT-DFE with C-tap. Vertical eye-opening is reduced by 19.9% due to the absence of taps at the centre of data eyes. DT-DFE with EE-tap improves edge slope by 124.7%.
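The rule used in this example for the six outer subtaps (strength equals the ideal square wave minus the sinusoidal unequalised waveform) can be sketched numerically. The waveform model, the evaluation points $t = k\,T_s/8$, the unit amplitude, and the function name are illustrative assumptions; the strength of the subtap at the threshold crossing is not specified beyond being large, so it is left out.

```python
import math


def ee_subtap_profile(amplitude=1.0, subtaps_per_side=3, slots_per_symbol=8):
    """Strengths of the subtaps around a 0 -> 1 transition at t = 0.

    Assumptions (not from the source): the unequalised waveform is
    amplitude * sin(pi * t / T_s), the ideal waveform is a square wave of the
    same amplitude, and subtap k is evaluated at t = k * T_s / slots_per_symbol.
    """
    profile = {}
    for k in range(-subtaps_per_side, subtaps_per_side + 1):
        if k == 0:
            continue  # the subtap at the threshold crossing is set separately
        t_over_ts = k / slots_per_symbol
        unequalised = amplitude * math.sin(math.pi * t_over_ts)
        ideal = math.copysign(amplitude, k)  # square wave: -A before, +A after
        profile[k] = ideal - unequalised
    return profile


profile = ee_subtap_profile()
```

Under these assumptions the required correction is largest next to the transition and shrinks towards the eye centre, which is consistent with the edge-emphasised shape of the tap.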
The second example is a 10-Gbps serial link over the same channel using the proposed DT-DFE with EE-tap and blind 4x-oversampling CDR. Although increasing the oversampling ratio of oversampling CDR improves phase resolution, it does so at the cost of more power and silicon area [14]. Four-stage differential ring oscillators are generally preferred in CDR because their quadrature phases, which are needed for CDR, are readily available without employing costly phase interpolators; blind 4x-oversampling CDR was therefore used in this work, together with a four-stage 5-GHz differential ring oscillator. Figure 16 (top) plots the eye diagram of unequalised and equalised data. Unlike the preceding example, where each data symbol is partitioned into eight subintervals of equal duration, each data symbol in this example is partitioned into four subintervals of equal duration $T_s/4$, as detailed earlier in Figure 3, allowing blind 4x-oversampling CDR. Although one could increase the number of subintervals per symbol duration to improve equalisation, doing so is costly due to the increasing number of subtap generators, the number of … Table 2 shows that DT-DFE with EE-tap improves vertical eye-opening by 146%. When reference voltages are adjusted, vertical eye-opening is further improved by 60%. DT-DFE with EE-tap improves data jitter by 59%. When reference voltages are adjusted without phase-picking, data jitter is improved by 13.7%. Once phase-picking is performed, jitter is further improved by 42%. Overall, vertical eye-opening is increased by 3.9 times, data jitter is reduced by 4.86 times, and the slope of transition edges is improved by 2.83 times. Figure 17 plots transmitted, received, equalised, and recovered data along with the key voltages of the link. As shown, the proposed DT-DFE with EE-tap and blind 4x oversampling CDR correctly recovers the transmitted data.
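As a quick consistency check on the overall figures, the stage-by-stage percentages can be compounded. The sketch assumes that each percentage is measured relative to the preceding stage; the helper names are hypothetical.

```python
def compound_gain(improvements_percent):
    """Overall multiplication factor when each stage increases a quantity by p%."""
    factor = 1.0
    for p in improvements_percent:
        factor *= 1.0 + p / 100.0
    return factor


def compound_reduction(reductions_percent):
    """Overall division factor when each stage reduces a quantity by p%."""
    remaining = 1.0
    for p in reductions_percent:
        remaining *= 1.0 - p / 100.0
    return 1.0 / remaining


eye_gain = compound_gain([146, 60])               # EE-tap, then reference-voltage adjustment
jitter_factor = compound_reduction([59, 13.7, 42])  # EE-tap, references, phase-picking

print(f"vertical eye-opening: x{eye_gain:.2f}")
print(f"jitter reduction:     x{jitter_factor:.2f}")
```

Under this assumption the compounded eye-opening gain is about 3.9 and the compounded jitter reduction is close to 4.9, in line with the overall 3.9 times and 4.86 times stated in the abstract.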
Figure 18 shows the adaptation of the charge-pump output voltages that tune $c_{11}$, $c_{00}$, and $c_{EE}$. It is seen that the tap adaptation processes converge in approximately 20 ns. Table 3 compares the cost of DT-DFE with C-tap and DT-DFE with EE-tap. Table 4 provides a breakdown of power consumption by block.