Millimetre wave channel modeling based on grey genetic optimization model

In this paper, a grey genetic optimization model (GGOM) is proposed for predicting insufficient channel parameters without increasing the amount of measurement data. Based on millimetre wave 28 GHz indoor measurement data for both LOS and NLOS scenarios, the GGOM is compared with the traditional back propagation (BP) method and the grey model (GM) in analysing channel parameters such as delay spread, excess delay and azimuth spread. Results show that the fitness of GGOM is better than that of the grey model, which improves the stability of the system. GGOM works well with insufficient data (size less than 30) in most cases because its settings are independent of the specific scene and measurement data. This is verified on the QuaDRiGa platform by generating uniformly distributed data and data interpolated between the experimental measurement points. GGOM fits the measurement data best among the compared prediction methods in channel characterization, and its mean absolute percentage error (MAPE) is the lowest compared with the GM and BP methods. The proposed GGOM therefore performs well in practice when modeling insufficient data of the propagation channel.


INTRODUCTION
The ever-growing demand in wireless communications is promoting the exploitation of millimetre wave (mm-wave) frequencies, which also deeply influences 5G communication development [1]. Accurate channel characterization is important for system design and performance analysis [2]. Research at mm-wave frequency bands is therefore an essential basis for the design and engineering realization of 5G wireless communication systems [3]. Due to the complexity and variability of the radio propagation environment [4], channel characteristics are complex and changeable [5][6][7], which brings difficulties to measurement and applications. For this reason, the modeling of channel parameters has been extensively used in channel analysis, mainly to obtain the characteristics of the wireless channel more effectively [8][9][10].
With the development of machine learning (ML), artificial intelligence (AI) algorithms have gradually become an effective method in the field of channel modeling [11]. ML modeling methods usually require large sample sets. The back propagation (BP) algorithm is the most traditional neural network method. However, sample data for channel modeling are sometimes insufficient because of limitations in time and space conditions, and modeling small sample data is a challenge in data processing. Therefore, a channel parameter modeling method that is suitable for small sample data is needed.
Grey theory is a methodology that focuses on insufficient data and information uncertainty [12]. Grey theory defines partially known information as a grey quantity, which can be obtained via generating formulas and by extraction from known information [13]. It has been widely used in many fields, for example, in predictions of population, funds, and system performance. Reasonable predictions have also been made by the grey model in photovoltaic power transformation and electricity consumption [13,14], and the feasibility and advantages of applying the grey model in the socio-economic area are analyzed in [15,16]. The grey neural network is a prediction algorithm that introduces grey theory into machine learning and can effectively adjust the weights of the network through the feedback of the characteristics. Through this adaptive mechanism, the grey neural network describes both system behaviour and evolution law accurately, which makes it an attractive solution in system modeling [17].
For verifying the accuracy of prediction methods when measurement data are limited, the Quasi-Deterministic Radio Channel Generator (QuaDRiGa) can be used to generate additional simulation data. The QuaDRiGa model is an enhancement of the WINNER model and uses random statistical distributions to generate scatterers [18].
In this paper, a grey genetic optimization model (GGOM) is proposed for predicting insufficient data of channel parameters, based on mm-wave 28 GHz indoor measurements performed in the waiting hall of Qingdao High-speed Railway Station for both LOS and NLOS scenarios. Specifically, 28 GHz channel characteristics such as delay spread, excess delay and azimuth spread are investigated by several prediction methods: traditional back propagation (BP), the grey model (GM) and the grey genetic optimization model proposed in this work. Results show that the GGOM fits well with the measurement data and improves stability and performance for insufficient data in mm-wave channel modeling.

MM-WAVE CHANNEL MEASUREMENTS
The 28 GHz indoor channel measurements were performed in the waiting hall of Qingdao North Railway Station for both LOS and NLOS scenarios. Figure 1(a) and (b) show the LOS and NLOS measurement locations. The antenna used at the transmitter (TX) is an omni-directional biconical horn, and the antenna at the receiver (RX) is a uniform linear array with 8 array units (ULA8). The ULA8 is moved 8 times on the horizontal plane to form a 1 × 64 virtual single-input multiple-output (SIMO) system [7]. In the measurements, the TX was fixed at one position, and the RX was moved to different positions in the LOS and NLOS scenarios. In the LOS case, a total of 21 points are selected and marked in Figure 1 (Table 1).

GREY THEORY
Grey theory is important in mining relationships among insufficient data: it makes full use of the known, insufficient sample data to reveal the rules of a system and to transform the irregular sequence of system data into a regular sequence [12]. It defines the grey derivative and the grey time-series differential equation based on the concepts of correlation space and smoothing discrete function [13], and it can establish a dynamic differential-equation model from discrete data series [17]. By solving the grey time-series differential equation, the system output can be predicted without knowing the system characteristics.

Grey model (GM)
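Grey model (GM) takes the form of a differential equation established from grey theory [14]. GM weakens the randomness of the original data by transforming them into a more regular generated sequence; this transformation is called the data whitening process. The whitened data can be used to predict the changing process and to build the differential equation. In GM, the original data sequence is first accumulated. Assume a sequence of time-series data

$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right). \tag{1}$$

By accumulating the preceding terms of $x^{(0)}$, the sequence $x^{(1)}$ is obtained:

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n. \tag{2}$$

The basic equation of GM is formed as [19]:

$$x^{(0)}(k) + a z^{(1)}(k) = u, \tag{3}$$

where $z^{(1)}(k) = \tfrac{1}{2}\left[x^{(1)}(k) + x^{(1)}(k-1)\right]$ is the average of adjacent elements in the sequence $x^{(1)}$, and $a$ and $u$ are the development grey number and the endogenous control grey number, respectively. They are the undetermined coefficients and are crucial in establishing the differential equation.

In order to express the undetermined coefficients clearly, the original sequence is rewritten as the vector $Y_n$:

$$Y_n = \left[x^{(0)}(2), x^{(0)}(3), \ldots, x^{(0)}(n)\right]^{T}. \tag{4}$$

The coefficients $a$ and $u$ are collected in the vector $\hat{a}$:

$$\hat{a} = [a, u]^{T}. \tag{5}$$

The matrix $B$ is defined as:

$$B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}. \tag{6}$$

The grey model of Equation (3) can then be expressed as:

$$Y_n = B\hat{a}. \tag{7}$$

By matrix operation and the least squares method, the vector $\hat{a}$ is calculated as [17]:

$$\hat{a} = \left(B^{T}B\right)^{-1}B^{T}Y_n. \tag{8}$$

With $\hat{a}$ at hand, the whitening differential equation is established as:

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = u. \tag{9}$$

By solving this differential equation, we obtain:

$$\hat{x}^{(1)}(k+1) = \left[x^{(0)}(1) - \frac{u}{a}\right]e^{-ak} + \frac{u}{a}, \tag{10}$$

where $k = 1, 2, \ldots, n-1$. After the above steps, the predicted cumulative sequence $\hat{x}^{(1)}$ is available. The prediction data sequence is then calculated by the following subtraction:

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k). \tag{11}$$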
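The GM procedure described above (accumulation, adjacent means, least-squares estimation of $a$ and $u$, exponential time response, and restoring subtraction) can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gm11_fit` and `gm11_predict` are hypothetical, and the sketch assumes the fitted $a$ is non-zero.

```python
import numpy as np

def gm11_fit(x0):
    """Estimate the grey coefficients a and u by least squares."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                         # accumulated sequence x^(1)
    z1 = 0.5 * (x1[1:] + x1[:-1])              # adjacent means z^(1)(k), k = 2..n
    B = np.column_stack((-z1, np.ones_like(z1)))
    Y = x0[1:]                                 # x^(0)(2), ..., x^(0)(n)
    (a, u), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return a, u

def gm11_predict(x0, steps=0):
    """Return fitted values for x0 followed by `steps` extrapolated values."""
    x0 = np.asarray(x0, dtype=float)
    a, u = gm11_fit(x0)                        # assumes a != 0
    k = np.arange(len(x0) + steps)             # k = 0 is the first sample
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)               # restore by subtraction
    return x0_hat
```

For a near-exponential input such as `10 * 1.1 ** np.arange(8)`, the fitted and extrapolated values stay close to the true sequence, which is the regime in which this kind of grey model is usually applied.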

Multi-variable grey neural network model
A multi-variable model can be used to predict the desired variable of the training data [20]. Consider $n$ input variables consisting of 1 desired variable and $n - 1$ related factor sequences. When defining $x_1^{(0)}$ as the desired variable, the other related factor sequences can be established as follows:

$$x_i^{(0)} = \left(x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(m)\right), \quad i = 2, 3, \ldots, n, \tag{12}$$

where $m$ denotes the sequence length.
The above formula can be applied in channel parameter analysis; for example, $x_2^{(0)}$ and $x_3^{(0)}$ can be the excess delay and azimuth spread. These characteristics can be obtained from measurement data. Refer to [21] and [22] for detailed formulas.
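Using the accumulated sequences $x_i^{(1)}$ and the sequence of adjacent mean values $z_1^{(1)}(k) = \tfrac{1}{2}\left[x_1^{(1)}(k) + x_1^{(1)}(k-1)\right]$, the GM can be expressed as [23]:

$$x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{n} b_i x_i^{(1)}(k), \tag{13}$$

where $a$ is the development grey number as introduced, and $b_i$ and $b_i x_i^{(1)}(k)$ are the driving factor and driving term. The undetermined coefficients of the model can be obtained as:

$$\hat{a} = [a, b_2, \ldots, b_n]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y, \tag{14}$$

where the rows of $B$ are $\left[-z_1^{(1)}(k), x_2^{(1)}(k), \ldots, x_n^{(1)}(k)\right]$ and $Y$ collects the values $x_1^{(0)}(k)$ for $k \geq 2$. After cumulative reduction, the desired values are:

$$\hat{x}_1^{(0)}(k+1) = \hat{x}_1^{(1)}(k+1) - \hat{x}_1^{(1)}(k). \tag{15}$$

In order to map the desired values to an extended back propagation network, it is necessary to transform formula (15) into the form of Equation (16). This transformed formula satisfies a network structure of N input parameters and 1 output parameter. The traditional back propagation (BP) method can adjust the weights in the whole network through backward feedback. Given the input of the measured parameter sequence $[x_1(k), x_2(k), \ldots, x_n(k)]$, the multi-variable grey neural network model produces the output $x^{*}(k)$ by adjusting its weights.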
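As a rough illustration of how the undetermined coefficients of the multi-variable grey model could be estimated, the sketch below applies ordinary least squares to the relation "desired variable plus $a$ times the adjacent mean of its accumulated sequence equals the sum of driving terms". It is a sketch under assumptions rather than the authors' code: the function name `gm1n_fit` and the array layout are hypothetical, and the mapping onto a back propagation network is not covered.

```python
import numpy as np

def gm1n_fit(x_target, x_factors):
    """Least-squares estimate of a and (b_2, ..., b_n) for a multi-variable grey model.

    x_target  : 1-D sequence, the desired variable x_1^(0)
    x_factors : 2-D array of shape (n - 1, m), the related factor sequences
    """
    x_target = np.asarray(x_target, dtype=float)
    x_factors = np.atleast_2d(np.asarray(x_factors, dtype=float))
    x1 = np.cumsum(x_target)                   # accumulated desired variable
    xf1 = np.cumsum(x_factors, axis=1)         # accumulated factor sequences
    z1 = 0.5 * (x1[1:] + x1[:-1])              # adjacent means, k = 2..m
    # Each row of B: [-z_1(k), x_2^(1)(k), ..., x_n^(1)(k)]
    B = np.column_stack((-z1, xf1[:, 1:].T))
    Y = x_target[1:]
    coef, *_ = np.linalg.lstsq(B, Y, rcond=None)
    return coef[0], coef[1:]                   # a, array of b_i
```

In channel analysis the target could be, for instance, delay spread, with excess delay and azimuth spread as related factors; that assignment is only an example.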

GREY GENETIC OPTIMIZED MODEL
Genetic Algorithm (GA) is an evolutionary method that simulates the natural selection and genetic mechanism of Darwin's biological evolution theory [24]. Relying on natural evolutionary mechanisms such as selection, crossover and mutation, the GA adapts its candidate solutions to improve the fitness of the prediction. The GA is often used to generate high-quality solutions because it can remove redundant variables and keep the variables that best reflect the relationship between samples when building the model.

Genetic algorithm based grey model process
The evolution of GA usually begins with a set of original data, called the first generation in the iteration process. In each generation, the fitness of each individual in the population is assessed, and the fitness level determines whether genetic manipulation is carried out. The fitness can be defined as the reciprocal of an error measure, such as the mean absolute percentage error (MAPE) [25]. The more suitable individuals in the current population are then selected for inheritance, each individual's genes are modified by crossover or mutation, and a new generation is formed for the next iteration [26].
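The fitness definition mentioned above, the reciprocal of MAPE, can be written down in a few lines. The sketch below uses the common definition of MAPE in percent; the small constant `eps`, which avoids division by zero for a perfect prediction, and the function names are assumptions for illustration and are not taken from the paper.

```python
def mape(measured, predicted):
    """Mean absolute percentage error in percent (measured values must be non-zero)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)

def fitness(measured, predicted, eps=1e-12):
    """Fitness as the reciprocal of MAPE; eps is an assumed guard against zero error."""
    return 1.0 / (mape(measured, predicted) + eps)
```

With this definition a lower prediction error always yields a higher fitness, so selection favours individuals whose predictions track the measurements more closely.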
The process of optimizing the weights and thresholds of grey model based on GA is as follows:
1. Calculate the fitness. After initialization, the fitness of all individuals in a population is calculated and used as feedback to adjust the weights in the grey neural network.
2. Genetic manipulation. A new generation of the population is formed by selection, crossover and mutation, and the next iteration and fitness calculation are carried out.
3. Judgment of termination. The iteration is terminated when the specified evolution is completed or the fitness value is reached. The obtained individuals with the greatest fitness are selected as the optimal output.
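The three steps above can be sketched as a generic genetic search with fixed crossover and mutation rates. This is an illustrative sketch only: the function name `genetic_search`, the tournament selection, the single-point crossover, the Gaussian mutation, the elitism step and all default parameter values are assumptions chosen for brevity, not details given in the paper.

```python
import random

def tournament(population, scores, rng):
    """Binary tournament: pick two individuals at random, keep the fitter one."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]

def genetic_search(fitness_fn, n_genes, pop_size=20, generations=50,
                   crossover_rate=0.7, mutation_rate=0.1,
                   target_fitness=None, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness_fn)
    history = [fitness_fn(best)]
    for _ in range(generations):
        # Step 1: calculate the fitness of every individual.
        scores = [fitness_fn(ind) for ind in population]
        # Step 3: terminate early once the requested fitness is reached.
        if target_fitness is not None and max(scores) >= target_fitness:
            break
        # Step 2: selection, crossover and mutation form the next generation.
        children = [best[:]]                   # elitism (assumed): keep the best
        while len(children) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            child = p1[:]
            # A uniform random number in [0, 1) is compared with the fixed rate.
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            if rng.random() < mutation_rate:
                g = rng.randrange(n_genes)
                child[g] += rng.gauss(0.0, 0.1)
            children.append(child)
        population = children
        best = max(population, key=fitness_fn)
        history.append(fitness_fn(best))
    return best, history
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness never decreases; without that assumed elitism step a fixed-rate GA gives no such guarantee.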
Genetic manipulations are important to provide self-adaptation, achieve fast convergence and avoid local optima. However, genetic manipulation usually depends on fixed factors, which means that the crossover or mutation rate does not vary during the process [25]. When deciding whether to perform genetic manipulation with fixed factors, a pseudo-random number R with uniform distribution in [0, 1] is produced and compared with the specific crossover or mutation factor. Note that, with a fixed genetic manipulation rate and constant adjustment over the iterations, the convergence of the grey neural network slows down and the results are prone to local optimal solutions, which affects the prediction accuracy [27].

The grey genetic optimization model
Considering the sample and individual data diversities, the grey genetic optimization model (GGOM) is proposed; it is a nonlinear model that adjusts the crossover and mutation factors globally. In the GGOM, the probability of crossover or mutation is expressed as a function of two evaluation dimensions, $E_1$ and $E_2$, which adjust the genetic operations globally. $E_1$ represents the sample diversity: the smaller its value, the more likely the population is to converge to a local optimum. $E_2$ represents the individual data diversity (its average value is $E_2 = 0$): a positive value means that the sample has higher fitness, and vice versa.
The above GGOM can smooth the influence of individual fitness in different iteration stages; it has more adaptive evolution and more reasonable weights, and thus it improves fitness and avoids local optimal solutions by means of non-linear exponential functions. Figure 2 shows the fitness of the channel parameters delay spread, excess delay and azimuth spread in the LOS and NLOS measurements for different prediction methods over 50 generations. It is seen that the fitness of the GGOM generally converges better than that of the GM. With a reasonable genetic manipulation rate, the GGOM can improve the convergence speed and performance. It also improves the stability and prediction accuracy by preventing local optimal solutions and premature convergence.

FIGURE 2 Fitness curves of channel parameter predictions

Model verification by QuaDRiGa
For verifying GGOM in the application of insufficient data, the Quasi-Deterministic Radio Channel Generator (QuaDRiGa) is used to generate simulation data because the measurement data are limited. The QuaDRiGa model is an enhancement of the WINNER model and uses random statistical distributions to generate scatterers. Detailed information on the QuaDRiGa platform can be found in the previous work. Combining the characteristics of the statistical random model and the deterministic model, the scenario is set as the 3GPP-38.901 model at 28 GHz in the QuaDRiGa model [22]. In this work, a uniform distribution and an interpolation sequence are chosen for generating statistical data. Figure 3(a) shows the uniform distribution, where the TX is set at a height of 1.5 m at the centre of the indoor environment, and the RX moves around the TX at a radius of 200 m in uniform distribution. Figure 3(b) shows the interpolation distribution, which simulates the Qingdao High Speed Railway Station scenario: the TX position is fixed, the RX positions are the same as the LOS measurement positions (marked as red circles), and some interpolated positions (marked as blue circles) are generated by QuaDRiGa between the original measured data. The number of simulation points generated by both methods can be adjusted, so simulation data of different sizes can be obtained.
The mean absolute percentage error (MAPE) curves of delay spread for different data sizes and prediction methods with uniform and interpolation distributions are shown in Figure 4(a) and (b); the data sizes chosen are 200 for the uniform distribution and 100 for the interpolation distribution. Note that in the interpolation distribution scenario, both experimental and interpolated data are considered. It is seen that the MAPEs of GGOM and GM are smaller than that of BP when the data size is under 30 for the uniform distribution and under 36 for the interpolation distribution, and that the GGOM has the smallest error rate. This means that both the GM and the GGOM can describe the channel delay spread properly when the data size is less than 30 and 36 for the uniform and interpolation distributions, respectively. Moreover, the simulation results by uniform distributions are based on the assumption of random scatterers which are independent of

PREDICTION RESULTS AND COMPARISON
Channel parameters such as delay spread, excess delay and azimuth spread are important and directly affect the performance of wireless communication systems. Based on SAGE (Space-Alternating Generalized Expectation-maximization) results, a set of corresponding channel parameters can be obtained for each measuring point. For better prediction performance, the first 70% of the data are extracted as the training set and the remaining 30% as the test set; these are the experimental data used in our previous study [25]. This means that in the LOS scenario the first 15 data points are used as the training set and the last 6 as the test set, and in the NLOS scenario the first 8 data points are used as the training set and the last 4 as the test set. To predict the earlier data points as well, the measured data are circularly shifted. For example, in the LOS case, the last 6 data points are moved in front of the first data point to form a new sequence of 21 positions, so that a new group of measurement points falls into the test set and can be predicted by GGOM. To obtain predictions for all data, the measurement data are circularly shifted 4 times (LOS scenario) or 3 times (NLOS scenario).
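The circular-shift split described above can be illustrated with `numpy.roll`. The sketch assumes that the unshifted ordering counts as the first arrangement and that each further arrangement moves another block of test-set size to the front; the function name `circular_splits` is hypothetical.

```python
import numpy as np

def circular_splits(data, n_test, n_rounds):
    """Return (train, test) pairs obtained by circularly shifting the data.

    Round r moves the last r * n_test samples to the front, then takes the
    leading samples as the training set and the last n_test as the test set.
    """
    data = np.asarray(data)
    n_train = len(data) - n_test
    splits = []
    for r in range(n_rounds):
        shifted = np.roll(data, r * n_test)
        splits.append((shifted[:n_train], shifted[n_train:]))
    return splits

# LOS: 21 points, 15 for training and 6 for testing, 4 arrangements.
los_splits = circular_splits(np.arange(21), n_test=6, n_rounds=4)
# NLOS: 12 points, 8 for training and 4 for testing, 3 arrangements.
nlos_splits = circular_splits(np.arange(12), n_test=4, n_rounds=3)
```

Under these assumptions every measurement position appears in at least one test set: the three NLOS test sets are disjoint, while the fourth LOS arrangement overlaps partly with the first because 21 is not a multiple of 6.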

Delay spread
Delay spread (DS) is an important channel parameter; it describes the dispersion caused by multiple paths in the time delay domain. Figure 5 shows the delay spreads in the LOS and NLOS channels for several prediction methods. It is seen that both the GGOM and the grey model perform better than the BP model, and that the GGOM generally fits the measurement data better. However, in the NLOS case, the predictions of all models fluctuate greatly because fewer measurement data are available and because of the characteristics of the NLOS channel.
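For readers unfamiliar with the quantity, the sketch below computes the commonly used root-mean-square delay spread, that is, the power-weighted standard deviation of the path delays. It is given only as background: the paper obtains its parameters from SAGE results and does not state the exact formula, so this definition and the function name `rms_delay_spread` are assumptions.

```python
import math

def rms_delay_spread(delays, powers):
    """Power-weighted RMS delay spread; powers are linear (not dB) and non-negative."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("total power must be positive")
    mean_delay = sum(p * t for p, t in zip(powers, delays)) / total
    second_moment = sum(p * t * t for p, t in zip(powers, delays)) / total
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(second_moment - mean_delay ** 2, 0.0))
```

For two equal-power paths at 0 ns and 10 ns the mean delay is 5 ns and the RMS delay spread is also 5 ns; a single path gives a delay spread of zero.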

Excess delay
Excess delay (ED) depends on the effect of environmental reflection and scattering; it is an important parameter for localization in complex environments [22]. Figure 6 shows the excess delays in the LOS and NLOS channels for several prediction methods. Compared with the GM and BP models, the GGOM performs best in the prediction of excess delay in both LOS and NLOS channels. GGOM can accurately predict the trend of the excess delay curve, even in the NLOS scenario where the excess delay fluctuates strongly.

Azimuth spread
Azimuth spread (AS) is caused by the fading of multipath signals. In the spatial domain, multipath reflection and scattering broaden the angle of arrival of signals at the receiving antenna, so that the angle spread can describe spatial correlation. Figure 7 shows the azimuth spread in the LOS and NLOS channels for different prediction methods. Again, it is seen that GGOM is better than GM in fitting the measurement data, and that it generally fits the measurement data best. Table 2 shows the mean absolute percentage errors (MAPE) for the GGOM, GM and BP methods in the LOS and NLOS scenarios. The improvement by GGOM is most visible in the NLOS scenario, since the NLOS data are more insufficient to predict. However, in predicting the azimuth spread (AS) in the LOS case, the MAPEs of both the GGOM and GM predictions are slightly larger than that of the BP method. A similar result was found in a previous study, where it was attributed to the (1 × 64) SIMO antennas used in measuring azimuth spread [21]. Overall, it can be concluded that GGOM is more suitable for mm-wave wireless channel modeling because it adjusts the intensity of the genetic operations and the convergence rate. Thus, GGOM works better in exploring the regularities in insufficient measurement data of the wireless channel.

CONCLUSIONS
In this paper, GGOM is proposed for predicting insufficient data of channel parameters. The measurements were carried out at mm-wave 28 GHz in an indoor environment, the waiting hall of Qingdao High-speed Railway Station, for both LOS and NLOS scenarios. Channel parameters such as delay spread, excess delay and azimuth spread are investigated with several prediction methods: back propagation (BP), the grey model (GM) and GGOM. The fitness of GGOM benefits from its adjustable convergence: it climbs faster and ends closer to the best fitness, which means that GGOM is a better way of predicting insufficient data in channel modeling. It works well for insufficient data (size less than 30) in most cases because its settings are independent of the specific scene and measurement data. This is verified on the QuaDRiGa platform by generating uniformly distributed data and data interpolated between the experimental measurement points. The GGOM generally fits the measurement data better than the other prediction methods. Moreover, the mean absolute percentage errors (MAPE) of the different prediction methods are compared. Results show that GGOM can improve the prediction performance for insufficient data significantly in most scenarios, especially in the NLOS scenario, where the data are more insufficient to predict. The provided results can be used for the design of mm-wave systems and for performance evaluations.