The implementation of the clustered-OFDM-based transceiver on an FPGA device: A comprehensive comparison

Aiming at reducing the hardware complexity and power consumption of multicarrier-based broadband transceivers, this contribution discusses and analyzes the implementation of a clustered-orthogonal frequency division multiplexing (OFDM)-based transceiver in a field programmable gate array (FPGA) device. To this end, several OFDM schemes covering baseband and passband data communications are implemented. The hardware resource utilization and power consumption of each scheme, after being implemented in the FPGA device, are discussed and compared. Numerical results show that the OFDM scheme based on P(⋅)-II and Q(⋅)-III, a modified version of the Hermitian symmetric (HS)-OFDM scheme that performs both baseband and passband data communications, demands the lowest hardware resource utilization and power consumption, reducing the hardware resource utilization to 1/3. As a result, P(⋅)-II and Q(⋅)-III constitute an interesting candidate for implementing new generations of high data rate, low hardware resource utilization, and low-power consumption clustered-OFDM-based transceivers for broadband data communication. Last but not least, numerical results show that the controllers used to integrate all components of a clustered-OFDM-based transceiver can demand sizeable hardware resource utilization and power consumption and, therefore, they have


INTRODUCTION
Nowadays, there are increasing demands for ubiquitous telecommunication infrastructures capable of serving applications that request disparate data network resources (e.g. Smart City, smart grids, the Internet of Things (IoT), and Industry 4.0). These applications cover the distances demanded by industrial, urban, and rural facilities [1][2][3][4][5]. Several wireless and wired technologies are being introduced and deployed (e.g. 5G, Long Range (LoRa), SigFox, narrowband and broadband power line communication (PLC), visible light communication) [6,7] to cope with these growing telecommunication demands. Moreover, [8,9] report a trend toward enormous power consumption in the telecommunication sector. Therefore, it is necessary to introduce new generations of low-cost and energy-efficient data communication devices, because their widespread use drives the astonishing increase in power consumption in this sector.
Among the existing technologies, PLC stands out because of the benefits associated with the use of already deployed electric power system infrastructures [5,10]. In PLC systems, the orthogonal frequency division multiplexing (OFDM) scheme prevails because the whole frequency spectrum is divided into multiple narrowband subchannels; the OFDM scheme can be efficiently implemented with the fast Fourier transform (FFT) algorithm; and the channel equalization is remarkably simplified when the channel convolution matrix is circulant. As a consequence, the OFDM scheme has been included in the HomePlug [11], IEEE 1901 [11], IEEE 1901.2 [11], ITU-T G.hn [11], ITU-T G.hnem [11], and PRIME [11] standards. Furthermore, the use of orthogonal frequency division multiple access (OFDMA) was suggested for allowing multiple access in a PLC network [12,13]. In OFDMA, multiple access is attained by assigning subsets of subcarriers to individual users for some time (i.e. on a time × frequency basis).
Subsequently, the suitability of clustered-OFDM for broadband PLC systems was theoretically investigated [14]. The main advantage offered by the clustered-OFDM scheme over OFDM and OFDMA is the reduced complexity of the users' transceivers, because they are designed to operate in a subband of the available bandwidth, while OFDM and OFDMA operate in the entire bandwidth. On the other hand, the base station needs one OFDM scheme per subband, meaning that the complexity of the base station transceiver is high compared to that of a user's transceiver. Due to its characteristics, clustered-OFDM may not offer the maximum data rate offered by OFDM or OFDMA schemes when a single user operates using the whole frequency band [15][16][17]. From the authors' point of view, clustered-OFDM is a compromise between OFDM and OFDMA in terms of flexibility for multiple access under a hardware resource constraint of a transceiver.
In [14], the use of a modified version of discrete multitone modulation (DMT) was introduced for performing data communications through both baseband and passband channels by using only a one-channel analog-to-digital converter (ADC) and digital-to-analog converter (DAC). Subsequently, [18] provided a comprehensive analysis of the digital filters that must be applied to implement the modified version of the DMT for both baseband and passband data communications. This analysis covered finite impulse response (FIR) and infinite impulse response (IIR) digital filters under finite precision. However, a discussion about the hardware complexity of clustered-OFDM is missing, which demands research efforts to point out the hardware resources it will entail.
Besides the power consumption analysis, the hardware resource utilization analysis of a practical implementation of a clustered-OFDM-based transceiver minimally requires the following components: physical (PHY) layer transmitter unit; PHY layer receiver unit; implementation of the medium access control (MAC) sub-layer strategies; parameterization functionality; synchronization among units; dynamic exchange of parameters between the MAC and PHY layers; internal buffers and memory control; DAC control; ADC control; Ethernet data flow control; internal clock distribution; and input/output (I/O) interface availability.
Aiming to investigate the gains associated with a clustered-OFDM-based transceiver designed under the constraints mentioned above for constituting broadband PLC systems, this paper outlines the prototype of a clustered-OFDM-based transceiver, which is supposed to allow both baseband and passband data communications, by using a reconfigurable chip (RC) based on an FPGA device. The performance is evaluated in terms of hardware resource utilization and power consumption¹ when Altera's Cyclone IV EP4CE115 FPGA device [21] is used. Overall, the following contributions are addressed in this work.
• Discussion about development platforms for implementing OFDM-based transceivers. It allows a comparison among the main issues that have to drive the choice of a platform suitable for the practical implementation of a clustered-OFDM-based PLC transceiver (e.g. cost, flexibility, and development time).
• Implementation description of a clustered-OFDM-based transceiver in Altera's Cyclone IV EP4CE115 FPGA. The implementations consider the device when five OFDM transmitters and six OFDM receivers (i.e. single-sideband (SSB)-OFDM, double-sideband (DSB)-OFDM, Hermitian symmetric (HS)-OFDM, and the modified versions of HS-OFDM²) are applied to perform both baseband and passband data communications, as a clustered-OFDM-based transceiver requires. Also, the relevance of the controllers' hardware complexities is addressed.
• Performance analysis of the clustered-OFDM-based transceiver, in terms of hardware resource utilization and power consumption, when all OFDM-based schemes are considered to implement such a transceiver.
¹ A previous contribution addressed bit error rate performance and computational complexity, see [14,15,18,19,20].
The major findings of this work are as follows:
• The hardware resource utilization and the power consumption related to the implementation of clustered-OFDM vary according to the employed OFDM scheme. For instance, HS-OFDM must be combined with SSB-OFDM or DSB-OFDM to perform both baseband and passband data communications, respectively, while a modified version of HS-OFDM can be used alone to perform both of them. As a consequence, hardware resource utilization is remarkably reduced by using a modified version of HS-OFDM. The modified version of HS-OFDM defined by the combination of P(⋅)-II and Q(⋅)-III reduces the hardware resource utilization to 1/3 in comparison to the clustered-OFDM-based transceivers that use HS-OFDM together with SSB-OFDM or DSB-OFDM to perform baseband and passband data communications, respectively.
• The majority of the hardware resources are consumed by the physical layer of the clustered-OFDM-based transceiver. Therefore, attention should be directed to the optimization of its implementation. Moreover, the hardware resource utilization associated with the transceiver's controllers was found not to be negligible and, as a consequence, the optimization of their implementation needs to be pursued.
• Regarding the modified version of HS-OFDM, numerical results show that the order of the digital filter, which is the main component for performing baseband and passband data communications, can remarkably influence the hardware resource utilization of a clustered-OFDM-based transceiver; see [18] for specific details.
This paper is organized as follows: Section 2 reviews the use of platforms for prototyping OFDM transceivers. Section 3 discusses FPGA implementation of the prototype of an OFDM transceiver. Section 4 briefly describes typical schemes for OFDM transceivers and discusses their implementations on an FPGA platform for usage in a clustered-OFDM-based PLC system. Section 5 addresses modified schemes for OFDM transceivers and their implementation on an FPGA platform. Section 6 presents numerical results, and Section 7 poses the concluding remarks.

DEVELOPMENT PLATFORMS FOR OFDM TRANSCEIVERS: A REVIEW
Development platforms for OFDM transceivers are built on software, Digital Signal Processor (DSP), Application Specific Integrated Circuits (ASIC), Application-Specific Instruction set Processor (ASIP), and FPGA devices. Software, DSP and FPGA were initially conceived to facilitate new designs and to allow easy modification of the existing ones. Currently, they are applied in several commercial applications.
The use of general-purpose processors (GPPs) and DSPs was addressed in [22,23] and [24], respectively. However, the results show that these implementations are not power efficient for high-speed data rates. The high power consumption is due to the use of several GPP or DSP cores in parallel. On the other hand, for low-speed data rates, this strategy has been successfully applied to introduce PLC technologies for smart grid communication applications [11,25].
Considering ASICs, the majority of the work is conducted by integrated circuit (IC) manufacturers addressing wireless and wireline standards. Almost all contributions focus on individual components of the OFDM transceivers, such as the FFT and inverse FFT (IFFT), time synchronization, channel coding, and channel estimation [26,27]. In [28], the design of two complementary metal-oxide-semiconductor (CMOS) based chips for digital baseband communications is presented, where they are part of an OFDM system partially conforming to the HiperLAN/2 and IEEE 802.11a standards. They are suitable for low-power devices such as mobile and hand-held terminals. Usually, low cost, high performance, and low power consumption characterize transceivers designed with ASICs; however, their lack of flexibility makes it hard for ASICs to support continuously developing standards. It is important to point out that discussions of the impact of the hardware complexity of the controllers are missing [29][30][31].
The use of ASIP was discussed in [32][33][34][35][36]. Currently, ASIP is applied to low-speed data rate transceivers. For instance, an application of smart grid communication that uses accelerator-assisted architectures is discussed in [37]. An ASIP performing the main OFDM components, by deploying specific instructions for each OFDM component, was discussed in [34]. In this case, the processor operates at a maximum clock frequency of 280 MHz and uses a total of 107,000 gates with a 0.18 μm standard cell library. Another kind of ASIP, a software-configurable processor that accommodates a cost-effective and time-saving implementation of the PHY, MAC, and stack of the IEEE 802.  standard for the base station and the customer-premise equipment, was presented in [36]. These contributions do not address the hardware resource utilization related to controllers.
RC platforms offer large computational capacity, a high level of flexibility, reconfiguration functionality, and a fast and easy design flow [38,39]. In addition, they make use of design tools that allow trading off complexity, performance, and development and design times [40]. Last but not least, they facilitate the implementation and operation of several processing units in parallel, making them appropriate for adoption in data communication systems that need to be flexible and capable of being updated in the field to fulfill specific local needs and/or new requirements or evolutions.
Reconfigurable platforms based on FPGA devices have been considered in several areas [41][42][43][44][45][46][47]. Regarding data communication based on the OFDM scheme, the majority of contributions report partial implementations of OFDM transceivers. The designed FFT/IFFT processing units are analyzed, synthesized, and simulated in an Altera Cyclone II EP2C35F672C6 [41]. The implementation of a fixed-WiMax transceiver that utilizes approximately 70% of the FPGA's resources was presented in [487576]. The implementation of an OFDM modulator based on the IEEE 802.  standard was addressed in [48]. The use of FPGA platforms to accommodate the OFDM transmitters of both IEEE 802.11a and 802.  standards was addressed in [49,50]. The implementation of individual parts of OFDM transceivers, such as the FFT and its inverse [51,52], peak-to-average power ratio (PAPR) reduction [53], interleaving [54], higher-order square M-ary quadrature amplitude modulation (QAM) detection/demodulation [19], a 64-QAM demodulator [55], channel error detection and correction [56], and a baseband transmitter [57,58], has been investigated so far. Overall, these contributions show the suitability, flexibility, and abundance of resources of FPGA platforms, such as lookup tables (LUTs), memory, multipliers, and other intellectual property (IP), for designing high-speed and reconfigurable transceivers.
Regarding the use of FPGA-based RC platforms for prototyping OFDM transceivers, two challenging issues remain open. First, the quantification of the complexities related to the controllers has not been well addressed in the literature, even though these complexities can translate into substantial hardware resource utilization and energy consumption. Second, there are several ways to implement OFDM-based schemes for baseband and passband data communications for a clustered-OFDM system; however, previous contributions have not discussed the best approaches that could be applied to implement an OFDM-based transceiver when hardware resource utilization and power consumption are brought to the center of the discussion. Both issues are addressed in the following sections.

THE OFDM TRANSCEIVER ON AN FPGA-BASED RC PLATFORM
This section discusses the implementation of an OFDM transceiver on an FPGA-based RC platform. In addition to the OFDM scheme, all controllers for a system on a chip (SoC) based on an FPGA device are covered. The idea behind it is to build an SoC that combines the flexibility of firmware upgrades, parallelism, and high performance, which are all found in an FPGA device.
Considering the requirements for a practical implementation in FPGA devices [59,60] and the recommendations for the implementation of an OFDM transceiver for high-speed  The architecture is divided into eight units: physical layer of a receiver (Rx-PHY) and a transmitter (Tx-PHY), softcore processor for running Rx-MAC and Tx-MAC, Ethernet controller, memory controller, DAC controller, ADC controller, phase-locked loop (PLL), and debugger. This general architecture allows short-time implementations of the OFDM transceivers' MAC and PHY layers as well as the inclusion of new components into the PHY and MAC layers, such as channel coding, interleaving, symbol synchronization, sampling rate correction, scheduling, and resource allocation. The OFDM transceivers implemented in this paper are evaluated within this general architecture. Brief descriptions of all units are as follows:
Ethernet controller: This unit controls the data flux between the Ethernet interface and the Tx/Rx buffers. The adopted Ethernet interface uses the 88E1111 chipset [61] because it implements the PHY layer of 1000BASE-T, 100BASE-T, and 10BASE-T. The Ethernet controller makes use of the Triple-Speed Ethernet (TSE) IP, which implements the MAC layer and allows it to control and transfer data through the 88E1111 chipset. This chipset makes use of the gigabit media-independent interface (GMII) to connect the MAC and PHY layers; see the IEEE 802.3 standard for more details. The GMII is implemented using an 8-bit data bus that is clocked at 125 MHz. The data flux between the Ethernet interface and the Tx/Rx buffers is assisted by a 32-bit Avalon-MM interface, which is a typical address-based read/write interface for master-slave connections in Altera's FPGAs. A complete description and analysis of the Ethernet interface implementation is given in [20].
Debugger: This unit applies techniques for debugging the design. Basically, it generates a priori known data to validate the outputs related to the corresponding inputs. It also analyzes buses with an embedded logic analyzer, which is an instance of the SignalTap II Logic Analyzer [62]. This embedded logic analyzer is a system-level debugging tool that captures and displays real-time signal behavior while the design is running at maximum clock speed, without using extra I/O pins. The use of a JTAG interface allows a personal computer to display and export these data.
DAC controller: This unit controls data flux and conversion between the 18-bit output data bus of the Tx-PHY unit and the 14-bit DAC. It is responsible for ensuring that all components of the Tx-PHY unit are synchronously clocked. The chosen DAC5672IPFBR DAC chipset [63] is set to work at a sampling frequency equal to 120 MHz.
ADC controller: This unit controls the data flux and conversion process between the 14-bit output of the ADC and the 18-bit input data bus of the Rx-PHY unit. It is also responsible for ensuring that all components of the Rx-PHY unit are synchronously clocked. The adopted AD9254BCPZ ADC chipset [64] is configured to work at a sampling frequency equal to 120 MHz.
PLL: This unit is responsible for generating the clock signal of the FPGA device. The PLL is sourced by an external 50 MHz, 50 parts per million (ppm) oscillator [65], whose frequency is denoted by $F_{\rm in}$. The output frequency of the PLL is equal to the frequency of the voltage-controlled oscillator, $F_{\rm vco}$, divided by the post-scale counter $L_c \in \mathbb{N}^*$. By generating the frequencies
$$F_{\rm vco} = F_{\rm in}\,\frac{L_f}{L_p},$$
we obtain the output clock with a frequency given by
$$F_{\rm out} = \frac{F_{\rm vco}}{L_c} = F_{\rm in}\,\frac{L_f}{L_p L_c}.$$
Note that $L_f \in \mathbb{N}^*$ and $L_p \in \mathbb{N}^*$ denote the feedback and pre-scale counters, respectively. In this contribution, $F_{\rm in} = 50$ MHz, and the main clock is $F_{\rm out} = 120$ MHz, which is equal to the clock frequency of the ADC and DAC controllers.
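The counter relation described for the PLL can be checked numerically. The following is a minimal sketch, assuming the usual relation $F_{\rm out} = F_{\rm in} L_f / (L_p L_c)$; the counter values and the search bound `max_count` are hypothetical and are not taken from the paper or from any device datasheet.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz, l_f, l_p, l_c):
    """Return (F_vco, F_out) in MHz for feedback, pre-scale and post-scale counters."""
    f_vco = Fraction(f_in_mhz) * l_f / l_p
    return f_vco, f_vco / l_c


def find_counters(f_in_mhz, f_out_mhz, max_count=16):
    """List every (L_f, L_p, L_c) up to max_count that yields the target output."""
    hits = []
    for l_p in range(1, max_count + 1):
        for l_f in range(1, max_count + 1):
            for l_c in range(1, max_count + 1):
                if pll_output_mhz(f_in_mhz, l_f, l_p, l_c)[1] == f_out_mhz:
                    hits.append((l_f, l_p, l_c))
    return hits


# Hypothetical counters (an assumption, not the paper's setting): 50 * 12 / 1 / 5
f_vco, f_out = pll_output_mhz(50, 12, 1, 5)
candidates = find_counters(50, 120)
```

Exact rational arithmetic is used so that the equality test against the 120 MHz target is not affected by floating-point rounding. A real design would additionally have to respect the VCO frequency range of the chosen device, which this sketch does not model.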
Memory controller: This unit controls the external and on-chip memories with the use of the 32-bit Avalon-MM interface. The external memory is a synchronous dynamic random-access memory (SDRAM) based on the IS42S16320D x2 chipset [6676], which has 128 MB of capacity and a 32-bit data bus. The total FPGA on-chip memory is 6480 Kb (kilobits).
Softcore processor: This unit implements a softcore based on Nios II [21,66,67], which is a 32-bit reduced instruction set computer (RISC) embedded processor designed by Altera. It is responsible for running and controlling the MAC layer of the implemented OFDM transceiver in the chosen FPGA device. Furthermore, it operates with F = 50 MHz and makes use of several Avalon interfaces to control the whole architecture. The MAC layer is implemented in software, written in C/C++ and compiled with Altera's GNU toolchain to make it compatible with Nios II.
Tx-PHY and Rx-PHY: These units implement the algorithms associated with the OFDM scheme at the PHY layers of the transmitter and receiver parts of a transceiver. Due to the scope of this contribution, only the algorithms related to the OFDM schemes discussed in [14] are considered. The Tx-PHY and Rx-PHY units are implemented in the FPGA device. The data exchanges between these components are mediated by the 32-bit Avalon-MM interface.

Description of Tx-PHY
The block diagram of the Tx-PHY is shown in Figure 2. Each block is described as follows:
General controller: It controls all OFDM functions associated with the transmission at the PHY layer. This unit allows the Tx-PHY to meet the requirements of timing and scheduling for real-time transmission of OFDM symbols. A 32-bit Avalon-MM slave interface and an on-chip buffer are deployed to exchange control parameters with the softcore. The control parameters, which are generated by the MAC layer, deal with the dynamics of the OFDM-based data communication.
Bit processor: It accesses the Tx buffer at the memory position dictated by the general controller to get a 32-bit data word. The bits of the data word are serialized and then delivered to the QAM mapper block. Note that the general controller receives the memory position information from the MAC layer.
forming the bit allocations in the subcarriers. The S/P block is in charge of receiving serial bit streams and segmenting them into words with length equal to $b \in \{1, 2, 4, 6, 8, 10, 12\}$ bits. To do so, it reads the parameters stored in the shared memory. Then, it sends a control data message with the number of bits in the data word to the mapper block, which is responsible for allocating the corresponding constellation point to the subcarrier selected a priori. In our implementation of the square $M$-QAM, the lower-order constellations are derived from the $2^{12}$-QAM one, which considerably reduces the need for storage memory. A further development of $M$-QAM with even and/or odd numbers of bits was introduced by the authors in [68]. This novel implementation has the potential of eliminating the need for large memory usage when high-order $M$-QAM constellations are used.
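The bit serialization and S/P steps described above can be sketched in software as follows. This is only an illustration under stated assumptions: the most-significant-bit-first order, the function names, and the example bit allocation are hypothetical, and the constellation mapping itself (including the derivation from the $2^{12}$-QAM constellation) is not reproduced.

```python
ALLOWED_B = (1, 2, 4, 6, 8, 10, 12)  # word lengths listed in the text


def serialize_words(words, width=32):
    """Yield the bits of each data word, most significant bit first (assumed order)."""
    for word in words:
        for shift in range(width - 1, -1, -1):
            yield (word >> shift) & 1


def group_bits(bits, bit_allocation):
    """Pack a serial bit stream into one b-bit word per subcarrier."""
    bits = iter(bits)
    symbols = []
    for b in bit_allocation:
        if b not in ALLOWED_B:
            raise ValueError(f"unsupported bit load: {b}")
        value = 0
        for _ in range(b):
            value = (value << 1) | next(bits)
        # The pair (b, value) mimics the control message plus data word
        # handed to the mapper block.
        symbols.append((b, value))
    return symbols


words = [0xA5A5F00F]
allocation = [4, 4, 8, 12, 4]  # hypothetical bit loading, 32 bits in total
symbols = group_bits(serialize_words(words), allocation)
```

Each tuple carries the number of bits together with the data word, mirroring the control message that tells the mapper which constellation size applies to the current subcarrier.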

P(⋅):
This unit implements an OFDM scheme to transmit OFDM symbols through baseband and passband channels. Sections 4 and 5 describe the schemes that were implemented and evaluated.

Description of Rx-PHY
Figure 4 shows the block diagram of the Rx-PHY part. Each block in it is described as follows:
General controller: It controls all OFDM components related to the reception of OFDM symbols at the PHY layer. This unit allows the Rx-PHY to meet the timing and scheduling requirements for the reception of OFDM symbols. A 32-bit Avalon-MM slave interface and an on-chip buffer are responsible for exchanging control parameters with the softcore. The control parameters, which are generated by the MAC layer, control the dynamics for the reception of the OFDM symbols. Its implementation is similar to the general controller of the Tx-PHY.
Bit processor: It accesses the Rx buffer, at the memory position dictated by the general controller, to get a 32-bit data word. Then, it converts the data words at the output of the QAM demapper block to a serial stream and stores it in a 32-bit data memory position provided by the general controller. The general controller receives the memory position information from the MAC layer.  Once the 12-bit data word, which is constituted by real and imaginary components, is available at the input of the QAM demapper block, a binary search tree related to both components is applied to associate the data word with a constellation point. Finally, it presents the bit stream associated with the constellation point at its output. In [19], the authors offer a detailed description of this implementation.
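The binary search used by the QAM demapper can be illustrated per component. The sketch below assumes a square constellation on the odd-integer grid and a natural-binary bit labelling; both are assumptions made for illustration, and the implementation in [19] may differ in scaling and labelling.

```python
def slice_axis(value, levels_per_axis):
    """Binary-search the decision thresholds of one axis; return the level index."""
    lo, hi = 0, levels_per_axis - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Threshold halfway between level mid and level mid + 1, where
        # level k has amplitude 2*k - (levels_per_axis - 1).
        threshold = 2 * (mid + 1) - levels_per_axis
        if value > threshold:
            lo = mid + 1
        else:
            hi = mid
    return lo


def demap(real, imag, b):
    """Return the b-bit word of the nearest point of a square 2**b-QAM constellation."""
    if b % 2 or b <= 0:
        raise ValueError("this sketch only handles even b (square constellations)")
    half = b // 2
    levels = 1 << half
    # Assumed labelling: upper half of the word from the real axis,
    # lower half from the imaginary axis.
    return (slice_axis(real, levels) << half) | slice_axis(imag, levels)


word_a = demap(0.7, -2.6, 4)   # 16-QAM example
word_b = demap(3.2, 3.1, 4)
```

Each axis needs only $b/2$ comparisons, which is why a search tree is attractive for high-order constellations compared with an exhaustive distance computation.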

Q(⋅):
This unit covers an OFDM scheme for receiving OFDM symbols through baseband and passband channels. Their FPGA implementations are described in Sections 4 and 5, respectively.

IMPLEMENTATION OF CLUSTERED-OFDM: THE TWO-CHANNEL ADC AND DAC APPROACH
This section briefly presents the FPGA implementation of multicarrier schemes for passband (SSB-OFDM and DSB-OFDM) and baseband (DMT) data communications, as detailed in [14,[69][70][71][72]. According to the clustered-OFDM system, the schemes for both baseband and passband data communications need to be implemented in a transceiver. We assume that the bandwidth for data communication occupies the frequency band $[0, B)$ Hz and that each scheme makes use of a bandwidth of $B/P$ for data communication, where $P \in \mathbb{N}^+$ is the number of subbands, as considered in the clustered-OFDM scheme in [14]. Furthermore, let us assume that the PLC channel is linear and time-invariant (LTI) with a representation given by the vector $\mathbf{h} = [h_0\ h_1\ \dots\ h_{L_h-1}]^T$, in which $T$ is the transpose operator. Also, we assume that the length of the cyclic prefix, $L_{\rm CP}$, is such that $L_{\rm CP} = L_h$; $D$ and $U$ refer to the downsampling and upsampling factors. The number of subcarriers of the OFDM symbol is $N$ and $L_{\rm CP} = N/4$. Following [14,18], $D = U = P$ and $P = 5$.
It is important to emphasize that the baseband and passband data transmissions are digitally implemented, which reduces the hardware complexity associated with the front-end; however, this increases the cost of the digital processing device, which is not an important concern because the cost of digital signal processing is decreasing.

SSB-OFDM transmitter P(⋅)_SSB and receiver Q(⋅)_SSB
The SSB-OFDM scheme is an SSB modulation scheme for passband data transmission. For this scheme, the bandwidth of the passband channel is equal to $\pi/U$ in the discrete-time domain, which corresponds to a frequency bandwidth equal to $B/U$ in the continuous-time domain because a sampling frequency equal to $f_s = 2B$ and a frequency band ranging from 0 up to $B$ Hz are considered. The length of the OFDM symbols is $2N$, but only half is used for data communication. Figure 6 shows the block diagram of the SSB-OFDM scheme. The output of the normalized inverse discrete Fourier transform (IDFT) is expressed as where $\mathbf{X} \in \mathbb{C}^{2N \times 1}$ is an OFDM symbol provided by a digital modulation technique, $\mathbf{W} \in \mathbb{C}^{2N \times 2N}$ is a discrete Fourier transform (DFT) matrix, $\dagger$ is the Hermitian operator, = $[\mathbf{I}_N\ \mathbf{0}_N]^T$, $\mathbf{I}_N$ is an $N$-size identity matrix and $\mathbf{0}_N$ is an $N$-size square matrix of zeros. After cyclic prefix (CP) insertion, upsampling by $U$ and low-pass filtering, the signal at the output of the Tx-PHY is given by where $x_e[m]$ is obtained by upsampling $x[n]$ by $U$, and $h^a_{\rm LP}[m]$ is an analytic low-pass filter with maximum passband frequency bandwidth equal to $\pi/U$. The symbol $\star$ is the convolution operator and $\omega_p$ is the modulation frequency. The operators $\Re\{\cdot\}$ and $\Im\{\cdot\}$ extract the real and imaginary components of their arguments, respectively. Note that $x[n]$ is constituted by the elements of $\mathbf{x}$. For a signal $z[n]$, the upsampling operator is denoted by We assume perfect synchronization. Then, after cyclic prefix removal, the vector $\mathbf{y} = \tilde{\mathbf{y}} + \mathbf{v}$, which consists of the samples of the sequence $\{y[n]\}$ at the output of the PLC channel, is obtained. As a result, the estimated OFDM symbol can be expressed as $\hat{\mathbf{X}}$ such that denote the variance of the signal and noise in the $j$th subcarrier and subband, respectively; $\mathbf{H} = \mathrm{diag}\{H_0, H_1, \dots, H_{2N-1}\}$ and $H_j$ is the $j$th element of the vector given by where $\mathbf{0}_L$ denotes an $L$-length column vector of zeros.
The SSB-OFDM scheme was implemented using the Verilog language. The block diagrams associated with the register-transfer level (RTL) representation of the units of the SSB-OFDM are briefly described in this section. For the sake of simplicity, in the following figures, the blue and green lines denote the data and control paths, respectively. According to the Avalon standard [21], the control signals are: enable (ena), start of packet (sop), end of packet (eop), and clock (clk). The data signals named real and imag correspond to the real and imaginary components of an SSB-OFDM symbol.
The description of the processing blocks applied to one OFDM symbol, which can be straightforwardly used for processing consecutive OFDM symbols, is as follows:
X: The RTL representation that implements the X block is shown in Figure 7. This block makes use of four N-length buffers (two for the real and two for the imaginary components of the data). Write and read controls guarantee that, for each component of the data, one buffer is being written while the other is being read (i.e. the ping-pong scheme). The output control imposes that N samples of the input sequence are outputted followed by N zeros for each component, resulting in 2N samples of an SSB-OFDM symbol.
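The ping-pong buffering and zero appending performed by this block can be modelled behaviourally. The sketch below is an assumption-laden illustration, not the RTL: the class name is hypothetical, and each complex sample stands for the pair of real and imaginary values that the hardware keeps in separate buffers.

```python
class PingPongZeroPad:
    """Behavioural model: one N-length buffer is written while the other is read."""

    def __init__(self, n):
        self.n = n
        self.buffers = [[0j] * n, [0j] * n]
        self.write_sel = 0  # index of the buffer currently being written

    def write_symbol(self, samples):
        if len(samples) != self.n:
            raise ValueError("expected exactly N samples")
        self.buffers[self.write_sel] = list(samples)
        self.write_sel ^= 1  # swap the roles of the two buffers

    def read_symbol(self):
        """Return the N samples written last, followed by N zeros (2N in total)."""
        read_sel = self.write_sel ^ 1
        return self.buffers[read_sel] + [0j] * self.n


pad = PingPongZeroPad(4)
pad.write_symbol([1 + 1j, 2, 3, 4])
first = pad.read_symbol()
pad.write_symbol([5, 6, 7, 8])
second = pad.read_symbol()
```

Swapping the buffer roles after every symbol is what lets the hardware accept a new symbol while the previous one is still being streamed out.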
IDFT: The IDFT block was implemented using the IFFT IP from Altera [21]. The lengths of input and output are equal to 2N and the number of bits used to quantize the input and output is 18.

CP insertion and upsampling:
The RTL representation of the unit that performs cyclic prefix insertion and upsampling is shown in Figure 8. Both operations are performed in the same unit in order to attain the maximum operating frequency of the FPGA device. For the CP insertion, the write and read controls in this block ensure that, for each component, one buffer is being written while the other is being read. The read control reads the last $L_{\rm CP}$ samples and outputs them before starting to output the stored samples from the beginning, resulting in $(2N + L_{\rm CP})$ samples. For the upsampling, the upsample control imposes that each sample is outputted and followed by $U - 1$ zeros for each component, resulting in a $(2N + L_{\rm CP})U$-length sequence at the output of this block.
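The two operations of this unit can be expressed in a few lines. The following is a minimal behavioural sketch with hypothetical function names and toy sizes (2N = 8, a CP of two samples, U = 5), chosen only to make the output lengths easy to verify.

```python
def insert_cp(symbol, l_cp):
    """Prepend the last l_cp samples of the symbol (cyclic prefix)."""
    return symbol[len(symbol) - l_cp:] + symbol


def upsample(samples, u):
    """Follow every sample with u - 1 zeros."""
    out = []
    for sample in samples:
        out.append(sample)
        out.extend([0] * (u - 1))
    return out


symbol = [1, 2, 3, 4, 5, 6, 7, 8]      # 2N = 8 time-domain samples (toy values)
tx = upsample(insert_cp(symbol, 2), 5)  # length (2N + L_CP) * U = 50
```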
FIGURE 9 The RTL representation for the implementation of the digital filter block

FIGURE 11 The RTL representation for the implementation of the digital passband demodulation

Filter: The RTL representation of the block responsible for performing digital filtering based on FIR digital filters is shown in Figure 9. This block is composed of three parts: (i) control, which is responsible for picking up the appropriate coefficients to use (read control and coefficient control); (ii) coefficient memories that store the coefficients of the digital filters that select in which frequency band the OFDM symbol will be transmitted; and (iii) filter, which is responsible for performing the filtering. The input data, named band, has three bits that select one of the five memories that store the coefficients of the five (P = 5) FIR digital filters.
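The band-selected filtering can be sketched as a direct-form FIR filter that reads its coefficients from one of five tables. The coefficient values below are placeholders chosen only for illustration; they are not the filters designed in [18], and the names are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[m] = sum_k h[k] * x[m - k], with zero initial state."""
    out = []
    for m in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if m - k >= 0:
                acc += h * samples[m - k]
        out.append(acc)
    return out


# Placeholder coefficient memories, one per subband (P = 5). Real designs
# would store low-pass or band-pass responses here.
COEFF_MEMORIES = {band: [1, band + 1] for band in range(5)}


def filter_block(samples, band):
    """Select a coefficient memory with the 3-bit band input, then filter."""
    if band not in COEFF_MEMORIES:
        raise ValueError("band must be in 0..4 (only five memories exist)")
    return fir_filter(samples, COEFF_MEMORIES[band])


impulse_response = filter_block([1, 0, 0], 2)
```

Feeding a unit impulse returns the selected coefficient set, which is a convenient way to check that the band input picks the intended memory.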
Passband modulation: The RTL representation of the unit that performs the passband modulation is shown in Figure 10. This block uses a numerically controlled oscillator (NCO) IP [21] to generate the in-phase and quadrature sinusoidal signals. Two 18 × 18-bit multipliers together with one 36 × 36-bit adder are used to perform the digital passband modulation.
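The structure of two multipliers and one adder matches the textbook quadrature modulator. The sketch below assumes the convention $s[m] = \Re\{(x_r[m] + j x_i[m]) e^{j\omega m}\}$; the sign convention, the function names, and the floating-point arithmetic (instead of 18-bit fixed point) are assumptions made for illustration.

```python
import math


def nco(omega, length):
    """Return cosine and sine sequences, as an NCO would generate them."""
    cos_seq = [math.cos(omega * m) for m in range(length)]
    sin_seq = [math.sin(omega * m) for m in range(length)]
    return cos_seq, sin_seq


def passband_modulate(real, imag, omega):
    """Two multiplications and one addition per output sample."""
    cos_seq, sin_seq = nco(omega, len(real))
    return [r * c + i * (-s)
            for r, i, c, s in zip(real, imag, cos_seq, sin_seq)]


real = [1.0, 1.0, 1.0, 1.0]
imag = [0.0, 1.0, 0.0, 1.0]
passband = passband_modulate(real, imag, math.pi / 2)
```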
Passband demodulation: The RTL representation of the block that performs the digital passband demodulation is shown in Figure 11. This block uses an NCO IP to generate the in-phase and quadrature sinusoidal signals. Also, two 18 × 18-bit multipliers together with one 36 × 36-bit adder are used to perform the digital passband demodulation.
Cyclic prefix removal and downsampling: The RTL representation of the block responsible for performing the CP removal and downsampling functions is shown in Figure 12. Both functions are performed within the same block to maximize the operating speed of the FPGA device. To remove the cyclic prefix from a $(2N + L_{\rm CP})D$-length sequence, the first $D L_{\rm CP}$ samples are discarded. To perform the downsampling operation, the next sample is registered and the following $D - 1$ samples at the input are discarded, and this is repeated until a $2N$-length sequence is outputted.
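The length bookkeeping of this block can be verified with a short behavioural sketch. The function name and toy sizes are hypothetical; one sample is kept out of every D after the prefix has been dropped.

```python
def remove_cp_and_downsample(samples, two_n, l_cp, d):
    """Drop the first d * l_cp samples, then keep one sample out of every d."""
    if len(samples) != (two_n + l_cp) * d:
        raise ValueError("input length must be (2N + L_CP) * D")
    payload = samples[d * l_cp:]
    return payload[::d]


rx = list(range(50))                       # (2N + L_CP) * D with 2N = 8, L_CP = 2, D = 5
symbol = remove_cp_and_downsample(rx, 8, 2, 5)
```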
Frequency channel estimation and equalization: The RTL representation of the unit that performs channel estimation and channel equalization is shown in Figure 13. The estimation block receives a previously known signal and performs channel and noise estimation, while the equalization block performs frequency equalization based on the MMSE criterion.

FIGURE 13
The RTL representation for the implementation of the channel estimation and frequency domain equalization

T : The RTL representation of the unit that performs the T function is shown in Figure 14. This unit uses four N-length buffers (two for the real and two for the imaginary components of the data). Write and read controls ensure that, for each component of the data, one buffer is being written to while the other is being read from. The output control unit ensures that the 2N appropriate samples are outputted.
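To make the channel estimation and frequency domain equalization steps concrete, the sketch below uses a per-subcarrier least-squares channel estimate obtained from a known symbol and a one-tap MMSE equalizer. Both are textbook forms adopted here as assumptions, since the text does not detail the estimator or how the noise variance is obtained; the function names and the parameters noise_var and signal_var are hypothetical.

```python
from typing import List, Sequence


def estimate_channel(received_pilot: Sequence[complex],
                     known_pilot: Sequence[complex]) -> List[complex]:
    """Least-squares estimate per subcarrier: H[k] = Y_p[k] / X_p[k]."""
    return [y / x for y, x in zip(received_pilot, known_pilot)]


def mmse_equalize(received: Sequence[complex], channel: Sequence[complex],
                  noise_var: float, signal_var: float = 1.0) -> List[complex]:
    """One-tap MMSE equalizer: W[k] = conj(H[k]) / (|H[k]|^2 + noise/signal)."""
    out = []
    for y, h in zip(received, channel):
        w = h.conjugate() / (abs(h) ** 2 + noise_var / signal_var)
        out.append(w * y)
    return out
```

When noise_var is set to zero, the equalizer reduces to a division by the channel estimate; a positive noise_var attenuates subcarriers in which the channel gain is small.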

DSB-OFDM transmitter  (⋅) DSB and receiver (⋅) DSB
The DSB-OFDM is a DSB modulation scheme for passband data transmission. The frequency bandwidths of the signal in the baseband and passband are π/2U and π/U, respectively. N is  Figure 15. The vector at the output of the IDFT block is given by After cyclic prefix insertion, the resulting signal is upsampled by 2U, which results in the sequence x_e[m]. By low-pass filtering x_e[m], considering the in-phase and quadrature components, the following signal is obtained: as discussed in [14]. The DSB-OFDM scheme was implemented in Verilog language. The block diagrams associated with the RTL representations are similar to those presented in Section 4.1, except for the following details:
Upsampling: The implementation of the upsampling unit is similar to the one shown in Figure 8. The only difference is that the upsampling factor is equal to 2U.
Downsampling: The implementation of the downsampling unit is similar to the one shown in Figure 12. The difference is that the downsampling factor is equal to 2D.

HS-OFDM-transmitter  (⋅) HS and receiver (⋅) HS
The HS-OFDM, or DMT scheme, is the OFDM scheme version for baseband data communication [70,73], which is supposed to transmit/receive information using only one-channel DAC and ADC devices, respectively. Figure 16 shows a block diagram of an HS-OFDM scheme applied to baseband data communication. For this scheme, 2N is the length of the OFDM symbol. The output of the IDFT is given by where x ∈ ℝ^{2N×1} and (⋅) is a mapping function, as discussed in [73]. After CP insertion, the signal is upsampled by U and filtered by a low-pass digital filter (with maximum passband frequency equal to π/U). The signal in which x_e[m] is the signal after upsampling x[n] by U, is sent to the one-channel DAC for transmission through the PLC channel.
FIGURE 17
Modified HS-OFDM transmitter  (⋅)-I block diagram

Assuming perfect synchronization, after low-pass (LP) filtering, downsampling by D, and CP removal, the estimated OFDM symbol is X in which  −1 (⋅) denotes the inverse of (⋅) [73]. The HS-OFDM scheme was implemented using the Verilog language. The block diagrams associated with the RTL representation are similar to those presented in Section 4.1, except for the following details:
• : It is responsible for performing the Hermitian symmetric mapping. The operation is denoted by (⋅). The RTL representation for this unit is similar to the one depicted in Figure 7, except for two main differences. The first one is that the read control unit reads the samples following the sequence {0, 1, … , N − 2, N − 1, N − 1, N − 2, … , 1, 0} to output 2N samples. The second is that, to compose the 2N imaginary outputs, the first N read imaginary samples are followed by N read imaginary samples with their signs inverted. To simplify the implementation, we considered that no data is transmitted in the first subcarrier, that is, its real and imaginary samples are equal to zero.
•  −1 : It is responsible for applying the Hermitian symmetric demapping function, as shown in Figure 14. The operation is denoted by  −1 (⋅).
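The read order described for the mapping unit can be mimicked with a short Python sketch. It follows the read sequence and the sign inversion of the imaginary part literally, and it zeroes the first subcarrier; it is not a model of the RTL, and the indexing convention of the subsequent IDFT is outside its scope. The function names hs_map_read_order and hs_map are hypothetical.

```python
from typing import List, Sequence


def hs_map_read_order(n: int) -> List[int]:
    """Read addresses {0, 1, ..., N-1, N-1, ..., 1, 0} producing 2N reads."""
    return list(range(n)) + list(range(n - 1, -1, -1))


def hs_map(symbols: Sequence[complex]) -> List[complex]:
    """Expand N complex samples into 2N samples following the read order."""
    n = len(symbols)
    # No data is transmitted in the first subcarrier (assumed index 0).
    data = [0j] + list(symbols[1:])
    out = []
    for position, address in enumerate(hs_map_read_order(n)):
        sample = data[address]
        # Second half: same real part, imaginary part with inverted sign.
        out.append(sample if position < n else sample.conjugate())
    return out
```

For N = 3, the read addresses are 0, 1, 2, 2, 1, 0, so the second half of the output is the complex conjugate of the first half in reversed order.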

IMPLEMENTATION OF CLUSTERED-OFDM TRANSCEIVERS: THE ONE-CHANNEL ADC AND DAC APPROACH
This section briefly addresses the HS-OFDM schemes ( (⋅) and (⋅)) adapted for both baseband and passband data communications by applying only one OFDM-based scheme, see [14], because they allow us to transmit and receive OFDM symbols through the baseband or passband channel using only one-channel ADC and DAC devices. Therefore, they are suitable for the clustered-OFDM scheme. Sections 5.1 and 5.2 address two versions of  (⋅) for the transmitter, whereas Sections 5.3, 5.4, and 5.5 focus on three different versions of (⋅) for the receiver. Two  (⋅) and three (⋅) allow six modified versions of the HS-OFDM scheme that are capable of transmitting data through baseband and passband channels. The schemes discussed in Sections 5.2 and 5.5 allow both passband and baseband data communications by changing the digital filter after the upsampling and before the downsampling at the transmitter and receiver, respectively. As a result, these changes eliminate the need for using DSB- or SSB-OFDM together with HS-OFDM for providing passband and baseband communications in the same transceiver, see [14]. Note that the length of the OFDM symbol is 2N and the baseband and passband bandwidths are equal to π/U.

Modified HS-OFDM transmitter  (⋅)-I
Figure 17 shows the block diagram of  (⋅)-I. The length of the OFDM symbol is 2N. It uses DSB modulation in the baseband and SSB modulation in the passband. After CP insertion, the signal is upsampled by U, low-pass filtered (with maximum passband frequency equal to π/U), modulated for in-phase in which x_e[m] is obtained by upsampling x_I[n] by the upsampling factor U. The LP filter eliminates the images of the upsampled sequence x_e[m]. A PB filter centered at the central frequency of the passband, which is the one used to generate the SSB version of the signal in the passband, is then applied. It is remarkable that only the filtering and modulation need to run at f_s = 2B Hz at the transmitter and at the receiver. The  (⋅)-I transmitter was implemented in Verilog language. The RTL representation is similar to that presented in previous sections, except for the following details:
Passband modulation: The implementation of the passband modulation unit is the same as the one shown in Figure 10; however, only the in-phase carrier is considered. The PB filter has center frequency equal to ω_p, p = 1, … , P, and bandwidth equal to π/U.

Modified HS-OFDM transmitter  (⋅)-II
Figure 18 shows the block diagram of  (⋅)-II for OFDM, which is a simple alternative approach for baseband and passband data communications based on HS-OFDM. Let the OFDM symbol be given by X  = (X), and x be defined as in Section 4.3. Then, the baseband signal x_I[n] consists of samples of the vector x. It is upsampled by U and results in the signal x_e[m] so that its Fourier transform is expressed as which corresponds to periodically repeated images of the baseband signal X_e(e^{jω′U}) at the harmonic frequencies ω′ = 2πl/U, −∞ < l < +∞. Then, a bandpass filter h_PB[m] is applied to select the upper sideband (USB) or the lower sideband (LSB) of one of the images of the signal to generate the signal to be transmitted through the channel [74]. For baseband data communication, the PB filter turns out to be an LP filter. As a result, the output of the transmitter is expressed as  The  (⋅)-II transmitter was implemented using the Verilog language. The RTL representation is similar to that presented in previous sections, except for the following details:
Baseband and passband modulation: The passband and baseband modulation is composed of two units: upsampling, which is the same as the one shown in Figure 8, and PB filter, which is the same as the one portrayed in Figure 9, with center frequency equal to ω_p and bandwidth equal to π/U. This digital filter is the key unit to implement baseband and passband modulations in the same transmitter.
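The statement that upsampling by U produces periodically repeated images of the baseband spectrum can be checked numerically. In the sketch below, upsampling is modeled as the insertion of U − 1 zeros after each sample, and a direct DFT is used so that the example depends only on the standard library; the function names dft and upsample are illustrative.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT, sufficient for a small numerical check."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def upsample(x: Sequence[complex], u: int) -> List[complex]:
    """Insert u - 1 zeros after every sample (no filtering)."""
    out: List[complex] = []
    for sample in x:
        out.append(sample)
        out.extend([0.0] * (u - 1))
    return out


def images_match(x: Sequence[complex], u: int, tol: float = 1e-9) -> bool:
    """True if the DFT of the upsampled signal repeats the DFT of x u times."""
    base = dft(x)
    expanded = dft(upsample(x, u))
    return all(abs(expanded[k] - base[k % len(x)]) < tol
               for k in range(len(expanded)))
```

For a length-M input, the M · U-point DFT of the upsampled sequence consists of U copies of the M-point DFT of the input, which are the images among which the PB filter selects one.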

Modified HS-OFDM receiver (⋅)-I
The HS-OFDM scheme adopted for passband data communication uses SSB modulation. Therefore, the SSB signal can be recovered by using the conventional demodulation scheme applied to OFDM and SSB-OFDM when x ∈ ℂ^{2N×1}. If x ∈ ℝ^{2N×1}, only in-phase demodulation is needed. The block diagram of this receiver, designated as (⋅)-I, is illustrated in Figure 19.
Assuming perfect synchronization, after demodulation, LP filtering, downsampling by D, CP removal, and DFT usage, the estimated OFDM symbol is obtained as in (12). Note that (⋅)-I makes use of the in-phase and quadrature demodulations of the SSB-OFDM scheme, see details in Section 4.1.
The (⋅)-I receiver was implemented in Verilog language. The RTL representation is similar to that presented in previous sections.

Modified HS-OFDM receiver (⋅)-II
Disregarding in-phase and quadrature demodulations, the demodulation of SSB signals can be performed by using (⋅)-II, whose block diagram is shown in Figure 20. For baseband data communication, the (⋅)-II reduces to the (⋅) of the HS-OFDM scheme. Note that the bandwidth of the bandpass filter  [m] is π/U. After the application of FEQ, the estimated OFDM symbol is obtained as in (12).
The (⋅)-II receiver was implemented in Verilog language. The RTL representation is similar to that presented in previous sections, except for the following details:
Passband demodulation: The passband demodulation of this modified HS-OFDM receiver, (⋅)-II, is almost the same as the one highlighted in Figure 11, disregarding the quadrature carrier. The PB filter has center frequency equal to ω_p and bandwidth equal to π/U.

Modified HS-OFDM receiver (⋅)-III
The (⋅)-III receiver exploits filter bank theory to replace the modulation/demodulation by a simple combination of upsampling/downsampling with filtering, see [74]. A block diagram for (⋅)-III for a clustered-OFDM system is shown in Figure  filter is to remove all aliasing components except those associated with the desired band of interest. After the CP removal, the DFT usage, the estimated OFDM symbol is given as in (12). The (⋅)-III receiver was implemented in Verilog language. The RTL representation is similar to that presented in previous sections, except for the following detail: Baseband and passband demodulation: The baseband and passband demodulation of this modified HS-OFDM receiver, (⋅)-III, is composed of PB filter, that is the same as the one showed in Figure 9, and downsampling, as described in Section 4.2.

IMPLEMENTATION RESULTS
This section presents comparative analyses of the implementations of the OFDM and HS-OFDM-based schemes, which were briefly discussed in the previous sections, for a clustered-OFDM system on an FPGA device. The hardware resource utilization and power consumption demanded by the  Table 1 lists the parameters adopted by the aforementioned schemes. All schemes were implemented in an EP4CE115 In order to estimate the power consumption of the FPGA device, the power analyzer tool [21] was used. The estimates were obtained by taking into account the following considerations: (i) commercial temperature grade; (ii) core voltage of the FPGA device equal to V_ccint = 1.20 V; (iii) ambient temperature T_A = 25 °C; (iv) 23 mm medium profile heat sink; (v) 1.0 m/s airflow; (vi) toggle rate equal to 1/8; and (vii) 120 MHz for the clock frequency of the FPGA device.
Table 2 lists the demands in terms of LC, LR, Mem, and Mult, in percentage of the total resources available, and the total power consumption for the implementation of the five transmitters. According to this table,  (⋅) HS and  (⋅)-II demand the lowest hardware resource utilization and power consumption of the device. The results related to resource utilization agree with the theoretical ones presented in [14]. The  (⋅)-II is the only one that can operate in both baseband and passband and, at the same time, achieve the lowest hardware complexity and power consumption.
The demands for resource utilization and power consumption by the six receivers are highlighted in Table 3. This table shows that the (⋅) HS and (⋅)-III demand the lowest amount of hardware resource utilization of the FPGA device in terms of LC, LR, Mem, and Mult. They also require the lowest power consumption; however, only the latter may perform baseband and passband data communications. This confirms the theoretical results presented in [14].
By analyzing Tables 2 and 3, it is clear that a transceiver composed of  (⋅)-II and (⋅)-III demands the lowest hardware resource utilization and power consumption of an FPGA device. Since only a transceiver resulting from the combination of  (⋅)-I or  (⋅)-II together with (⋅)-I, (⋅)-II, or (⋅)-III can perform both baseband and passband data communications, the transceiver based on the  (⋅)-II and (⋅)-III schemes should be favored because it yields the lowest hardware resource utilization and the lowest power consumption. It means that if baseband and passband data communications capability is demanded in the same transceiver, such as in the clustered-OFDM scheme, then power consumption and hardware resource utilization at the PHY layer can be reduced by a factor of approximately three if  (⋅)-II and (⋅)-III are implemented, instead of a combination of HS-OFDM together with DSB-OFDM and/or SSB-OFDM, which is the standard approach.
Table 4 summarizes the complexity of the full duplex transceiver, which is based on  (⋅)-II and (⋅)-III, when the controllers are taken into account. Note that if a half duplex transceiver is considered, several functions could be shared between transmitter and receiver, which reduces the total hardware resource utilization as well as power consumption. Also, the Ethernet controller, the memory controller, and the softcore are responsible for the largest utilization of the available memory. This very important information could not be noticed if the complexity of the controllers were not taken into account. Table 4 also shows that no resource utilization is demanded by the PLL because the chosen FPGA device has five PLLs. Last but not least, the power consumption for generating and distributing the clock signal is relevant because it represents 17.11% of the total power. Figure 22 shows the relative power consumption of the  (⋅)-II and (⋅)-III transceiver when the controllers are taken into account.
As pointed out in Section 1, this result is a very important contribution because it verifies that, in real-time applications, the controllers must be taken into account when analyzing the power consumption. Note that the controllers account for 36% of the total power consumption, whereas  (⋅)-II and (⋅)-III account for 19% each.
Regarding the digital filter implementation, Table 5 lists the multipliers used when the order of a linear-phase FIR filter assumes four different values. This table shows that the FIR digital filter used by Tx-PHY and Rx-PHY demands a large number of multipliers. The results shown in this table indicate that this FIR digital filter can use almost all available multipliers if its order is over 300. Therefore, the design of reduced-order linear-phase FIR or IIR digital filters is of utmost importance for this kind of scheme. Further investigation about it was carried out in [18].
A floorplanning analysis was carried out to identify structures that should be placed inside the FPGA device to maximize the operation frequency. It is an important task to be accomplished in order to ensure that the design meets the performance constraints and target hardware utilization. The floorplan of the FPGA device for a full duplex HS-OFDM transceiver based on  (⋅)-II and (⋅)-III, which is capable of providing 100 Mbps with a 120 MHz clock, is shown in Figure 23. This picture shows that the majority of the implemented units of the transceiver are close to each other, which allows the hardware to operate at a high clock frequency.
Finally, Figure 24 shows a photo of the prototypes of two transceivers based on  (⋅)-II and (⋅)-III schemes on FPGA-based development kits. The oscilloscope's screen shows the OFDM symbols transmitted by both prototypes.

FIGURE 24
The prototype of two transceivers based on  (⋅)-II and (⋅)-III schemes

CONCLUSION
This paper has discussed and analyzed the implementation, in an RC platform based on an FPGA device, of the main components of the clustered-OFDM scheme. In this regard, it detailed and discussed the implementation of several OFDM schemes (SSB-OFDM, DSB-OFDM, HS-OFDM, and the modified versions of HS-OFDM, which were introduced in [14]). The obtained results showed that the implementation of clustered-OFDM based on  (⋅)-II and (⋅)-III results in the lowest hardware resource utilization and lowest power consumption in comparison to other OFDM-based transceivers. Moreover, it has paid particular attention to practical concerns for implementing a clustered-OFDM-based transceiver, such as hardware resource utilization and power consumption associated with the controllers used to accomplish an operational implementation. The attained results showed that the controllers could demand relevant hardware resource usage and, as a consequence, they deserve special attention in order to devise low-cost and low-power consumption clustered-OFDM-based transceivers and the like. Furthermore, the whole investigation offers useful results for reducing hardware resource usage in the next generation of multiuser and flexible data communication technologies because clustered-OFDM is flexible and suitable for implementing multiuser PLC systems.
Future work includes the integration with a programmable gain amplifier (PGA) circuit to evaluate the performance in outdoor and indoor low-voltage electric power systems [10]. Also, its extension for dealing with hybrid PLC/wireless media [1] can potentially reduce costs associated with hybrid PLC/wireless transceivers. Last but not least, the prototype constitutes a platform for evaluating different techniques in real-time conditions and, as a consequence, it will be used to evaluate dynamic resource allocation techniques.