Deep learning-based massive multiple-input multiple-output channel state information feedback with data normalisation using clipping

Massive multiple-input multiple-output (MIMO) can provide real-time high-capacity data transmission service to the user equipment (UE). However, the use of massive MIMO exponentially increases the channel state information (CSI) feedback overhead. Deep learning-based approaches have been proposed to reduce the CSI feedback overhead while maintaining high CSI accuracy. Data normalisation is a very important part of deep learning, because the performance of deep learning depends heavily on it. In this letter, an efficient data normalisation method for deep learning-based CSI feedback in a massive MIMO system is proposed, where the proposed method uses a clipping technique based on the received signal strength. Simulation results show that the proposed normalisation with clipping decreases the CSI feedback error and increases the accuracy of the beamforming vector in deep learning-based CSI feedback.

Introduction: Massive multiple-input multiple-output (MIMO) is a key technique in future wireless communications. Wireless communications can take advantage of excellent spectral efficiency and superior energy efficiency by using massive MIMO, which uses large antenna arrays at the base station (BS) [1]. To maximize the data rate in the massive MIMO system, the BS adapts its transmission to the CSI reported by each user equipment (UE). However, large antenna arrays exponentially increase the feedback overhead; techniques such as codebook-based quantization [2] and spatial correlation adjustment of CSI [3] have been proposed to reduce it. Moreover, recent techniques based on deep learning, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) models, have demonstrated both reduced feedback overhead and high CSI feedback accuracy [4,5]. The authors of [6] proposed a neural network architecture combining a CNN, an LSTM, and an attention mechanism to achieve a compromise between performance and complexity, where they utilized a single-stage network to reduce the number of training parameters. The authors of [7] designed a separate deep learning-based denoising network to overcome the influence of various interference and non-linear effects in practical feedback channels. The authors of [8] proposed a multiple-rate compressive sensing neural network to compress and quantise the CSI. They improved the original CsiNet of [4] and developed a new quantization framework and training strategy, considering that the feedback takes the form of a bitstream in practical feedback systems. Most previous studies on deep learning-based CSI feedback have focused on the network architectures. However, data preprocessing such as normalisation is also a very important factor in deep learning. In this letter, we propose a normalisation method with clipping for CSI feedback using CNN-based deep learning.
System model: We consider a massive multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) system with N subcarriers. The base station (BS) is equipped with M antennas and the user equipment (UE) is equipped with a single antenna. The downlink received signal on the ith subcarrier at the UE can be expressed as

y_i = h_i^H x_i + n_i,

where x_i ∈ C^{M×1} is the transmit signal, h_i ∈ C^{M×1} is the frequency-domain channel vector, and n_i is the additive white Gaussian noise (AWGN). Stacking the channel vectors of all subcarriers gives the spatial-frequency domain channel matrix H = [h_1, ..., h_N]^H ∈ C^{N×M}, and the channel matrix in the angular-delay domain is

H̄ = W_d H W_a^H,

where W_d and W_a are N × N and M × M DFT matrices, respectively. To reduce feedback overhead, we retain only the first N_r rows of H̄ due to the limited multipath time delay in the delay domain [4]; the truncated matrix is again denoted H̄. Finally, the number of feedback parameters is reduced from 2NM to 2N_r M (H̄ is separated into its real and imaginary parts, which are used as inputs). After generating H̄, normalisation is performed before training the CNN. The normalisation proceeds in two stages: first the clipping N_c(·) is applied, and then the scaling N_s(·) is applied. The normalised channel matrix H̄' is given by

H̄' = N_s(N_c(H̄)).

The encoder of the CNN is trained to produce the compressed representation L (codewords) from the CSI data; the trained encoder T_enc(·) can be expressed as

L = T_enc(H̄'),

and the decoder T_dec(·) is trained to reconstruct the original CSI information as follows:

Ĥ = T_dec(L).

The overall CSI feedback architecture including data normalisation is shown in Figure 1.
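The angular-delay transform, truncation, and real/imaginary split above can be sketched in a few lines of NumPy. The dimensions (N = 256 subcarriers, M = 32 antennas, N_r = 32 retained delay rows) and the random toy channel are illustrative assumptions, not the letter's simulation setup:

```python
import numpy as np

# Hypothetical dimensions: N subcarriers, M BS antennas, keep first Nr delay rows.
N, M, Nr = 256, 32, 32

rng = np.random.default_rng(0)
# Toy spatial-frequency channel matrix H (N x M), complex Gaussian entries.
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)

# 2-D DFT into the angular-delay domain: H_bar = W_d @ H @ W_a^H.
W_d = np.fft.fft(np.eye(N)) / np.sqrt(N)   # N x N unitary DFT matrix
W_a = np.fft.fft(np.eye(M)) / np.sqrt(M)   # M x M unitary DFT matrix
H_bar = W_d @ H @ W_a.conj().T

# Truncate to the first Nr delay rows (most multipath energy lies there),
# then stack real and imaginary parts as the two CNN input channels.
H_trunc = H_bar[:Nr, :]
cnn_input = np.stack([H_trunc.real, H_trunc.imag])
print(cnn_input.shape)  # (2, 32, 32)
```

This yields the 2 × N_r × M real-valued tensor that the letter feeds to the encoder.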
Proposed normalisation with clipping: Data normalisation before training the network is crucial to improving accuracy and reducing time complexity [9]. Normalisation also speeds up the convergence of gradient descent [10]. In many machine learning algorithms, feature scaling is needed so that some values (outliers) do not dominate the model simply because of their large magnitude. In particular, feature scaling is essential for deep learning models that compute distances between data points: if the features are not scaled, a feature with large values dominates the distance calculation. Moreover, outliers can artificially skew the data distribution, and gradient descent methods can have trouble converging because of them [10].
Geometry-based stochastic channel models (GSCMs) are widely used to generate the massive MIMO channel matrix. The channel elements of GSCMs are generated by summing rays with delays, powers, and angles. Therefore, some channel elements can have very high power, and these elements strongly influence the statistical operation of deep learning. One approach to handling outliers for linear models is to keep values between upper and lower bounds. There are several common data preprocessing techniques, such as scaling to a range, clipping, log scaling, and z-score normalisation. However, due to the dynamic channel variations in practical environments, it is difficult to use a scaling technique based on the mean and standard deviation of the data. Moreover, because the fading of the wireless channel can produce outliers with abnormally high power, some outliers can be regarded as exceptional anomalies. Hence, in this letter, we propose a simplified normalisation with clipping, where we regard the channel elements with abnormally high power as outliers.

Clipping: To reduce the negative impact of very high power elements during the normalisation process, we limit the maximum magnitude on the basis of a specific threshold. First, we choose a clipping threshold A, where A is the magnitude of the channel element at the top ξ% of the entire dataset. Next, the clipping is performed based on the threshold as follows:

h̄'_{s,l,m} = min(|h̄_{s,l,m}|, A) e^{jθ},

where h̄_{s,l,m} is the sth sample in the channel dataset, representing the channel element at the lth delay and mth angle, and θ is the phase of the channel element h̄_{s,l,m}.
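A minimal sketch of the clipping step, assuming the threshold A is taken as the magnitude below which (100 − ξ)% of all elements fall, so that the top ξ% are treated as outliers (the function name is ours):

```python
import numpy as np

def clip_channel(H_bar, xi=3.0):
    """Clip element magnitudes above the threshold A while keeping each
    element's phase theta unchanged: h' = min(|h|, A) * e^{j*theta}."""
    mag = np.abs(H_bar)
    A = np.percentile(mag, 100.0 - xi)  # top xi% of magnitudes are outliers
    clipped = np.minimum(mag, A) * np.exp(1j * np.angle(H_bar))
    return clipped, A
```

Because only the magnitude is bounded and the phase factor e^{jθ} is re-applied, the angular information of every element survives the clipping, which matters later for the beamforming-accuracy (cosine similarity) metric.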
Scaling: Most of the real and imaginary parts of the elements are distributed around zero. The output of the CNN has a range of [0, 1]; hence, for accurate CSI feedback, the input values need to be scaled to [0, 1]. A general min-max scaling method could be applied, but min-max scaling is sensitive to high power elements and also increases the complexity of the inverse scaling. Therefore, we first scale the data to [−0.5, 0.5] and then shift it to [0, 1], using a simple multiplication and shift. We present the scaling procedure with clipping in Algorithm 1. Figure 2 compares the probability density functions (PDFs) of the scaling without clipping and the scaling with clipping.
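Since Algorithm 1 itself is not reproduced here, the following is one reading of the scaling step: after clipping, the real and imaginary parts lie in [−A, A], so dividing by 2A maps them to [−0.5, 0.5] and adding 0.5 shifts them to [0, 1]; the inverse scaling undoes both operations (function names are ours):

```python
import numpy as np

def scale(x, A):
    """Map clipped real/imag parts from [-A, A] to [0, 1]
    via the intermediate range [-0.5, 0.5]."""
    return x / (2.0 * A) + 0.5

def inverse_scale(y, A):
    """Recover the original values from the [0, 1] CNN output."""
    return (y - 0.5) * 2.0 * A
```

Unlike min-max scaling, this needs only the single clipping threshold A for the inverse transform, and A is fixed by the dataset rather than by per-batch extremes.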

Fig. 3 NMSE according to the size of codewords
We use a CNN-based model called CsiNet for CSI feedback [4]. When ω = {ω_enc, ω_dec} is the set of parameters after CNN training, the reconstructed channel matrix is given by

Ĥ = T_dec(T_enc(H̄'; ω_enc); ω_dec).

Simulation results: We generate two types of channel matrices at a centre frequency of 3.65 GHz and a bandwidth of 50 MHz. First, we assume the Berlin Urban Macro (UMa) non-line-of-sight (NLOS) scenario in the QuaDRiGa channel model [11], where the number of transmit antennas is M = 32 and the UE moves along a circle. Second, we assume the semi-urban NLOS scenario in the COST 2100 channel model [12], where the number of transmit antennas is M = 32 and the UE moves in several directions. We retain the first 32 rows of the channel matrix in the angular-delay domain by limiting the time period of multipath arrivals. Therefore, the size of H̄ is 2 × 32 × 32. The clipping threshold ξ is set to 3.0%. The sizes of the channel vector datasets for training, validation, and testing are 60,000, 20,000, and 20,000, respectively. The other parameters are as follows: the learning rate is 0.001, the number of epochs is 1000, and the batch size is 100. Additionally, the Adam optimizer is used in the simulation.
The difference between the reconstructed channel Ĥ and the input H̄ is defined as the normalised MSE (NMSE):

NMSE = E{ ||H̄ − Ĥ||_2^2 / ||H̄||_2^2 }.

The cosine similarity can be used to evaluate the accuracy of the beamforming vector and is given by

ρ = E{ (1/N) Σ_{i=1}^{N} |ĥ_i^H h_i| / (||ĥ_i||_2 ||h_i||_2) },

where ĥ_i is the reconstructed channel vector of the ith subcarrier. Figures 3 and 4 show the simulation results. Figure 3 shows the NMSE as the size of codewords increases from 64 to 512. The proposed data normalisation with clipping has a lower NMSE than the normalisation without clipping for all codeword sizes, and the reduction is especially significant for large codewords. If we clip the outliers (that is, we transform the outlier channel elements to a specific upper bound on the magnitude without losing the phase information), the learning can focus more on the distribution of the normal elements, which improves its accuracy. However, if we regard too many elements as outliers, the original distribution of all elements may be distorted and the performance no longer improves. Figure 4 shows the accuracy of ρ as the size of codewords increases from 64 to 512. The proposed data normalisation using clipping improves the cosine similarity ρ in the QuaDRiGa channel model. Note that because we clip only the magnitude of the channel information, the phase of all the channel information is maintained. Small codewords result in poor performance regardless of whether the data is normalised with or without clipping, due to limitations of the CNN structure.
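The two metrics can be computed as sketched below (the expectation is taken over the dataset; here a single sample is shown, with the rows of the matrices taken as the per-subcarrier vectors h_i):

```python
import numpy as np

def nmse_db(H_true, H_hat):
    """NMSE = ||H - H_hat||^2 / ||H||^2 for one sample, reported in dB."""
    err = np.sum(np.abs(H_true - H_hat) ** 2)
    ref = np.sum(np.abs(H_true) ** 2)
    return 10.0 * np.log10(err / ref)

def cosine_similarity(H_true, H_hat):
    """rho = mean over subcarriers of |h_hat^H h| / (||h_hat|| ||h||)."""
    num = np.abs(np.sum(np.conj(H_hat) * H_true, axis=1))
    den = np.linalg.norm(H_hat, axis=1) * np.linalg.norm(H_true, axis=1)
    return np.mean(num / den)
```

Note that the cosine similarity is invariant to per-subcarrier scaling of the reconstruction, which is why preserving the phase through the clipping step is what matters for the beamforming accuracy.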
In the QuaDRiGa channel model, however, as the size of codewords increases, the proposed data normalisation with clipping significantly improves the cosine similarity. Because the COST 2100 channel model does not fade clusters in and out over time, the cosine similarity is higher in the COST 2100 channel than in the QuaDRiGa channel. Moreover, in the COST 2100 channel the cosine similarity is almost the same with and without clipping. Figure 5 shows the validation loss in the QuaDRiGa channel as the number of epochs increases, when the size of codewords is 512. The data normalisation without clipping shows large fluctuations as the number of epochs increases, which makes it difficult for the CNN to learn the CSI feedback behaviour. In contrast, the proposed data normalisation with clipping converges quickly and without fluctuations. In massive MIMO systems, the channel state may change rapidly and significantly due to fast fading; hence, fast convergence is an important factor in learning-based CSI feedback. The proposed data normalisation using clipping reaches the desired MSE very quickly. Because the complexity of a neural network depends on its architecture (e.g. the number of parameters and layers), network pruning has been widely used to reduce network complexity and over-fitting by reducing the number of parameters or connections [13]. In that sense, data preprocessing may seem less relevant to network complexity. However, proper data preprocessing can accelerate training and may also reduce the complexity of the neural network, because the reduced training time makes it possible to compress the network.

Conclusion: The proposed data normalisation using clipping improves the performance of deep learning-based CSI feedback in massive MIMO channels. The data normalisation with clipping reduces the NMSE and increases the cosine similarity as the size of codewords increases. It also significantly reduces the learning time, making it suitable for practical massive MIMO systems where the channel environment changes dynamically. Additionally, the performance of the proposed data normalisation using clipping is robust across channel models, both QuaDRiGa and COST 2100. In a practical system, the feedback takes the form of a bitstream; hence, the UE quantises the feedback information, where a non-uniform quantiser can be used to reduce the quantization error [8]. The proposed data normalisation using clipping can simplify the quantization process by reducing the negative influence of high power channel elements, so the proposed normalisation would also be useful for digital CSI feedback.