SVM-based online learning for interference-aware multi-cell mmWave vehicular communications

Funding: National Key Research and Development Program of China, Grant/Award Numbers: 2019YFB2103004, 2020YFB1806608; National Natural Science Foundation of China, Grant/Award Numbers: 61771252, 61772287, 61801240; Key University Science Research Project of Jiangsu Province, Grant/Award Numbers: 18KJA510004, 18KJA510005; Jiangsu Province Special Fund Project for Transformation of Scientific and Technological Achievements, Grant/Award Number: BA2019058; Open Research Fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education

This paper proposes a data-driven method of mmWave beam selection in multi-cell systems to achieve near-optimal, fast beam allocation with low complexity. In particular, an online learning algorithm based on the support vector machine (SVM) equipped with the radial basis function kernel, termed the SVM-based online beam selection (SBOS) algorithm, is proposed. The proposed algorithm starts with an adaptive beam selection process for a given traffic pattern, which uses an SVM learning model to adaptively refine the beam selection strategy. Specifically, the SVM-based model labels the feedback (the average information rate) from the cellular system, learns from the samples, and shrinks the scheme space by selecting new samples that maximise the minimum distance to all labelled samples within the region constrained by the newly learned boundaries. Then, according to the aggregated data about the traffic patterns and the performance of the corresponding beam selection strategies, the SBOS algorithm either exploits beam selection schemes recorded in the database or explores new schemes for unknown situations; how to tune the hyperparameters of the SBOS algorithm is also discussed. Furthermore, extensive simulation results show that the proposed algorithm outperforms the upper confidence bound and Random methods.


INTRODUCTION
MmWave cellular systems are expected to be densely deployed to provide multi-Gbps links supporting high data rates by using small-size antennas [1][2][3]. In particular, multi-Gbps links will be necessary for future vehicle-to-everything (V2X) communications to support the intensive sensory data transmission for (semi-)autonomous driving or the augmented reality (AR) and virtual reality (VR) services for passengers [4,5]. However, mmWave signals are prone to environmental blockage and suffer severe path loss. Such losses can be compensated with elaborate beamforming, enabled by large antenna arrays that provide extra transmit gain. One promising solution is to let base stations (BSs) or transmission points (TPs) in different geographical cells coordinate their transmissions [6]. Choosing suitable beams for users is a high-complexity procedure, especially for high-speed vehicle communications, since the time-varying channel characteristics lead to extra channel estimation overhead and access delay, which may cause serious problems for time-sensitive vehicle services [3,[7][8][9]. This motivates the development of a low-complexity method that mitigates interference by adaptively adjusting spatial beam patterns.
Machine learning (ML) technologies, which extract hidden insights from training data, have been regarded as a basic discipline for cellular networks. Deep neural network (DNN) learning, as a model-free and data-driven approach, has been studied extensively to reduce computational complexity when training data are available. For example, autoencoders are used in [10] to model an end-to-end communication system comprising the encoding, channel and decoding blocks for MIMO interference channels. The adoption of feedforward deep neural networks [11] and convolutional neural networks (CNN) [12] is studied for joint channel estimation and sensing, respectively.
IET Commun. 2021;15:1015-1027. wileyonlinelibrary.com/iet-com
However, the DNN models generally require generating optimal labelled data for offline training, which limits their application in real-time systems. On the other hand, reinforcement learning (RL), as another main branch of ML, can be a feasible option for real-time decision-making tasks, such as resource allocation in heterogeneous networks [13] and IoT systems [14]. Machine learning algorithms have also been investigated to enhance beamforming techniques. In [15], the authors leveraged offline learning to identify the vehicle's optimal beam pair index based on past beam training data by using support vector machine (SVM) classification. The selection of codewords for analogue beamforming is modelled as an offline SVM multiclass-classification problem in [16], where the training dataset consists of a large number of samples of the millimetre-wave channel. In [17], the authors examine the interaction between codebook design and the performance exhibited by DNN-based analogue beam selection methods using offline angle-of-arrival (AoA) data. The authors in [18] propose a deep learning scheme for downlink beamforming optimization to solve the SINR balancing and power minimisation problems. By assuming the channel impulse response (CIR) model, the work [19] proposes an adaptive recursive least squares (RLS) CIR prediction approach to predict future orthogonal frequency division multiplexing (OFDM) block CIR coefficients. In [20], the authors utilize reinforcement learning to achieve a mean-field equilibrium intended to optimize the hybrid precoding matrix for unmanned aerial vehicle (UAV) to ground user links. In [21], a deep reinforcement learning (DRL) based method called PrecoderNet is proposed to design the digital precoder and analogue combiner for beamforming.
Although the precoding techniques have been well studied in the above works, significant beam misalignments will occur frequently due to the mobility. This problem becomes particularly challenging for mmWave vehicle communications, especially with multiple users and inter-cell interference.
Multi-armed bandit (MAB) online learning algorithms are used in [22,23] for beam alignment of a single mmWave link between a transmitter and a receiver. Beam switching architectures for mmWave vehicle communications have been studied in [24] to reduce the amount of repointing required by leveraging position prediction. The authors in [25] proposed a MAB online learning scheme that allows the base station/access point to explore, learn from, and adapt to its surroundings autonomously, selecting multiple beams for multiple users simultaneously based on the available contextual information. The above works focus on the single-cell situation. For multi-cell mmWave vehicle communications, additional complexity stems from the extra learning required for inter-cell interference management. The supervised SVM can be employed to efficiently handle interference classification, but offline learning may not adapt to environmental changes.
The main contribution of this paper is to introduce SVM-based active learning for interference-aware multi-cell mmWave vehicle communications. In particular, the proposed method starts by aggregating the traffic pattern data, followed by an adaptive beam selection process that uses a fuzzy support vector machine active learning algorithm to adaptively refine the selection scheme in each cell. Our empirical and synthetic studies show that the proposed method performs better than the existing MAB learning algorithm, in which the beams are greedily picked with the highest upper confidence bound. In addition, we demonstrate that the proposed method provides robust performance with respect to the initialization conditions in adaptive learning. Note that the desirable property of the proposed SVM-based active learning algorithm comes from the fact that the adaptive beam selection problem in each step can be transformed into a dual convex optimization problem whose complexity scales primarily with the size of the training data rather than the dimensionality of the data vectors. Therefore, the proposed algorithm not only uses the traffic pattern information but also offers an explicitly defined unique optimum that is easily solvable by most software. Consequently, the SVM-based active learning method is particularly suitable for multi-cell mmWave beam selection with interference awareness.
In order to use the aggregated data from the historic traffic patterns, we introduce an exploration-or-exploitation learning strategy to accelerate the beam selection process. In particular, if two traffic patterns share sufficient commonality, they can be assumed to use identical beam selection strategies. In this paper, the similarity is measured by the cosine similarity of the vectorized traffic pattern matrices. The basic intuition is that the SVM-based active learning algorithm need only be performed once for each group of similar traffic patterns. We label the resulting schemes according to their relative performance and store the data in the database. Consequently, the exploration-or-exploitation learning scheme can exploit recorded beam selection schemes when a certain similarity is met, or explore new schemes for unrecorded patterns, respectively. We evaluate several hyperparameter tuning (HT) methods for the proposed model and employ the cross-validation (CV) technique to select the optimal parameters for the proposed SVM-based active learning model. Finally, numerical simulations are performed to evaluate the performance of the proposed algorithm and the impact of the hyperparameters. We find that the proposed algorithm with Bayesian hyperparameter tuning can achieve near-optimal performance with low computational complexity.
The rest of the paper is organized as follows: In Section 2, the system model is described. In Section 3, we introduce the SVM-based adaptive beam selection method. In Section 4, we present the online learning algorithm, which accounts for the similarity of dynamic traffic patterns. Section 5 discusses the hyperparameter tuning methods. In Section 6, simulations are conducted to evaluate the performance and the impact of the hyperparameters. Finally, we conclude the paper in Section 7.

MmWave vehicle communications networks
Consider a mmWave vehicle communications network where a passing vehicle wants to communicate with a millimetre wave base station (mmBS). MmWave communication requires beamforming to compensate for the high path loss. As illustrated in Figure 1, the mmBS is equipped with a steerable directional antenna array and can use a finite set of distinct, orthogonal beams to cover the road in the cell [26]. For instance, similar beam switching architectures for mmWave vehicle communications have been studied in [24] to reduce the amount of repointing required by leveraging position prediction. Also, the MAC layer of IEEE 802.11ad introduces virtual antenna sectors, discretizing the azimuth plane based on the antenna beamwidth [27]. The mobile vehicles are equipped with both LTE and mmWave interfaces [28] to guarantee the connection to the mmBSs and to obtain high-speed transmission services in an mmBS cell; for instance, the intermittent download of large files is considered. GPS information can be gathered through the LTE interface to obtain the direction of arrival (DoA) information [28]. Inter-cell interference is considered for the mmWave vehicle communications.
Assume there are C adjacent mmWave cells in a certain region, with the c-th cell denoted as c ∈ C = {1, …, C}, and that each mmBS can use a limited set ℬ of beams. The transmit power of the mmBS is denoted as P. At any given time, the mmBS can use only a subset of K beams, where K ∈ ℕ, K < |ℬ| is a fixed value related to hardware constraints. This limitation could be imposed by the mmWave beamforming technique, the hardware characteristics, cost constraints etc. Hence, a mmBS can serve at most K vehicles/user equipments (UEs) simultaneously. The mmBS updates its beam selection at regular time periods. We use U_c^t to represent the current number of UEs in cell c at time t.
Hence, at time t, there are in total U^t = Σ_{c=1}^{C} U_c^t UEs/vehicles in the C cells. Let u_{c,m}^t, c ∈ C, m ∈ 𝒰^t, where 𝒰^t = {1, …, U^t}, represent the m-th UE in cell c at time t. Define M^t = max_{c∈C} U_c^t as the largest number of users over all cells in time period t.
We construct a traffic status/pattern matrix S^t = [s_1^t, …, s_C^t]′ whose columns are s_c^t = [s_{c,1}^t, …, s_{c,M^t}^t]′, c ∈ C, at time t. We use s_{c,m}^t to represent the location information of UE u_{c,m}^t. At time t, for cells whose number of UEs is smaller than M^t, the corresponding elements of S^t are set to 0.
In each period t ∈ {1, …, T}, the purpose of all mmBSs is to maximise the amount of expected received data at the vehicles over all periods. We construct a beam selection indication (BSI) matrix B^t = [b_1^t, …, b_C^t]′, where b_c^t, c ∈ C, represents the beam selection scheme for cell c at time t. Each element of b_c^t takes the value 1 or 0, indicating whether the corresponding user is allocated a beam or not. The noise power is denoted as N = n_0 W, where W represents the bandwidth and n_0 is the power spectral density of the noise. Then, the SINR of UE u_{c,m}^t can be written as

SINR_{c,m}^t = P g_{c,m}^t / (N + I_{c,m}^t),

where g_{c,m}^t is the beamforming gain of the link and I_{c,m}^t collects the inter- and intra-cell interference power. In general, the SINR of a vehicle using a beam during one period is a random variable that depends on the environment of the mmBS (e.g. blockages) and the inter- and intra-cell interference. The average data rate of the vehicle in period t is then

r_{c,m}^{B^t} = E[ W log_2(1 + SINR_{c,m}^t) ],

where the expectation is taken with respect to the randomness of the blockages and the interference. We call the random variable r_{c,m}^{B^t} the beam performance. The main notations of this paper are summarized in Table 1.
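As a numeric sanity check, the SINR and rate computation described above can be sketched in Python. The function names are ours, and the Shannon form W·log2(1 + SINR) is a standard assumption for the per-period rate, since the paper's exact expression is not reproduced here:

```python
import math

def sinr(p_rx_w, interference_w, n0_w_per_hz, bandwidth_hz):
    """SINR of a UE: received power over noise plus interference.

    Noise power N = n0 * W as in the system model; the interference
    term aggregates the inter- and intra-cell contributions."""
    noise_w = n0_w_per_hz * bandwidth_hz
    return p_rx_w / (noise_w + interference_w)

def avg_rate(p_rx_w, interference_w, n0_w_per_hz, bandwidth_hz):
    """Shannon rate W * log2(1 + SINR), a standard stand-in for the
    per-period beam performance."""
    s = sinr(p_rx_w, interference_w, n0_w_per_hz, bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + s)
```

With 1 GHz of bandwidth, n0 = 1e-9 W/Hz and no interference, a 1 W received signal gives SINR = 1 and a rate of W·log2(2) = 1 Gbps, which matches the multi-Gbps regime the paper targets.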

Problem formulation
Given an arbitrary sequence of vehicle arrivals with beam selection strategies B^t, t ∈ T, the total expected data rate of the vehicles in period t can be calculated as

R^t = Σ_{c=1}^{C} Σ_{m=1}^{U_c^t} r_{c,m}^{B^t}.

The mmBSs aim at selecting a subset of beams which maximises the expected total data rate for all vehicles, that is, maximises the sum of the expected beam performances. We assume that the mmBSs are unaware of their surroundings and the interference. Hence, the mmBSs should learn over time the best subset of beams through aggregated data. This will significantly reduce the channel estimation overhead and access delay for highly time-varying traffic, and will thus be better suited to future time-sensitive vehicle services.
In the following, we propose a data-driven beam selection method for mmWave base stations that allows them to perform beam selection autonomously by maintaining a dynamic balance between exploration and exploitation through SVM-based adaptive learning; that is, in each round, the algorithm picks its action based on the observed information rate and the learned model of the possible interference and blockages. The logic used to solve the problem is illustrated in Figure 2.

ADAPTIVE SVM-BASED MULTI-CELL INTERFERENCE CLASSIFIER
In this section, we propose an SVM-based learning model with multi-cell interference awareness to gradually reduce the beam selection space for a given traffic pattern. Specifically, each subsequent beam selection strategy is customized to a certain traffic pattern while accounting for possible interference. The resulting cost function turns out to be equivalent to an SVM for regression. The performance of the beam selection schemes for all traffic patterns is stored in the database for the subsequent exploration-or-exploitation learning phase.

The data gathering and samples labelling

Given a BSI matrix or a beam selection strategy, a vehicle is able to measure the beam performance and then report it back to the associated mmBS. Also, the LTE BSs periodically record the locations of the vehicles/users in the database. Recall that we have denoted the beam selection scheme for a cell c at time t as a vector b_c^t. For example, b_c^t = [1, 1, 0, 1]′ means that there are four vehicles/UEs in cell c at time t and the mmBS selects beams to serve UEs 1, 2 and 4.
We define vec(·) as a zero-complementation and vectorization function: vec(B^t) complements the matrix B^t of size C × M^t with zeroes to a C × M̄ matrix and then vectorizes it, where M̄ is the greatest value of M^t in the database. Further, let B^t be a BSI matrix and let 𝒳^t be the beam allocation scheme space of all vec(B^t). Then, any x_i^t ∈ 𝒳^t represents a beam selection scheme sample. In particular, in the first iteration, the algorithm generates l samples to form the initial scheme set 𝒳_0^t and labels these l schemes according to their individual performance, that is, "1" for those schemes whose performance is above the average performance and "-1" otherwise. One major advantage of labelling the sampled strategy schemes is that this information can be leveraged to construct the pool of candidate strategy profiles to be evaluated for exploitation or exploration in future stages. The rationale behind the "1"/"-1" labelling criterion is that it allows adaptive learning of the multi-cell interference directly from the observed information rate.
We represent such a labelling process as a function that maps each sampled scheme, given its observed average information rate, to a label in {−1, 1}.
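The zero-complementation/vectorization step and the above-/below-average labelling rule described here can be sketched as follows; the function names are illustrative, not from the paper:

```python
import numpy as np

def vec(B, C, M_bar):
    """Zero-complement a C x M^t scheme matrix to C x M_bar,
    then vectorize it row by row."""
    B = np.asarray(B, dtype=float)
    padded = np.zeros((C, M_bar))
    padded[:, :B.shape[1]] = B
    return padded.reshape(-1)

def label_schemes(rates):
    """Label each sampled scheme +1 if its average information rate
    is above the mean over the l samples, and -1 otherwise."""
    rates = np.asarray(rates, dtype=float)
    return np.where(rates > rates.mean(), 1, -1)
```

For example, a 2 x 2 scheme matrix padded to M̄ = 3 columns becomes a length-6 vector, and three schemes with rates (1, 2, 3) are labelled (-1, -1, +1).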

SVM-based learning model and its solution
The SVM model can be viewed as a classifier derived from Vapnik-Chervonenkis (VC) theory, which states that to minimise the prediction error, a balance should be struck between minimising the training error and the model complexity [29]. Based on the beam allocation schemes {x_i^t}_l and the corresponding labels {y_i^t}_l, y_i^t ∈ {−1, 1}, the formulation of an SVM classifier is

min_{w,b,ξ} (1/2)‖w‖² + C Σ_{i=1}^{l} ξ_i
s.t. y_i^t (w′x_i^t + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, l,    (2)

where w is the weight vector, b is the bias value and ξ is the slack variable vector. C is a hyperparameter that controls the relative weight between maximising the margin and ensuring that most samples have at least a unit margin to the hyperplane. Generally, the training data may not be totally separable; thus, the slack variables ξ_i ≥ 0, ∀i = 1, 2, …, l are introduced in the formulation. The training samples nearest to the decision boundary are called support vectors, and their distance to it is 1/‖w‖. The idea behind the SVM formulation is to maximise the margin, which equals 2/‖w‖; such an optimization problem is equivalent to minimising ‖w‖. Generally, the solution to problem (2) constructs a hyperplane which separates the data into two half-spaces with the widest margin. The classification rule induced by h(x) for a sample x_i^t is sign(h(x_i^t)). However, when the dimension of the vector x is high, the classifier problem (2) may become complicated to solve. Toward this issue, we can introduce the dual problem of (2),

max_α Σ_{i=1}^{l} α_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i^t y_j^t κ(x_i^t, x_j^t)
s.t. Σ_{i=1}^{l} α_i y_i^t = 0, 0 ≤ α_i ≤ C, i = 1, …, l,    (3)

and by applying the kernel trick, the original feature space can be mapped into a higher-dimensional feature space where the training set may become more separable [30]. Various types of kernel functions κ(·,·) can be adopted, for instance, the polynomial kernel function and the Gaussian kernel function (radial basis function), given respectively by

κ(x_i^t, x_j^t) = (x_i^t · x_j^t + 1)^p,   κ(x_i^t, x_j^t) = exp(−‖x_i^t − x_j^t‖² / (2σ²)),    (4)

where p ≥ 2 is the polynomial power and σ² is the variance.
In this paper, we consider the Gaussian kernel (Equation (4)); then, the hyperplane obtained by solving the dual problem (3) is

h(x) = Σ_{i=1}^{l} α_i y_i^t κ(x_i^t, x) + b,    (5)

where α_i is the solution to problem (3). Based on the boundaries identified by the hyperplane, we propose the beam selection scheme pick algorithm. For simplicity, it is named the Picker algorithm and is presented in Algorithm 1. Let the sample set X^t denote the set containing all the labelled samples and h_set^t denote the boundary set. Then, the sample space 𝒳^t can be reduced by the Picker algorithm because it picks out the samples with the maximum minimal distance to X^t in 𝒳^t constrained by the boundary set h_set^t. In other words, the observed information rate provides constraint(s) that make the feasible space of beam selection schemes smaller.
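The RBF decision function (5) and the max-min-distance pick step can be sketched as follows. The function names, the plain Euclidean distance, and the representation of each boundary as a callable h(x) are our assumptions for illustration, not the paper's exact interface:

```python
import numpy as np

def rbf(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def decision(x, support, labels, alphas, b, gamma):
    """h(x) = sum_i alpha_i * y_i * k(x_i, x) + b, as in Equation (5)."""
    return sum(a * y * rbf(s, x, gamma)
               for a, y, s in zip(alphas, labels, support)) + b

def picker(candidates, labelled, boundaries):
    """Among candidates satisfying every recorded boundary (h(x) >= 0),
    return the one maximising the minimum distance to the labelled
    samples, mirroring the Picker's max-min rule."""
    best, best_score = None, -np.inf
    for x in candidates:
        if any(h(x) < 0 for h in boundaries):
            continue  # outside the region allowed by the boundary set
        d = min(np.linalg.norm(x - s) for s in labelled)
        if d > best_score:
            best, best_score = x, d
    return best
```

Adding a boundary shrinks the feasible set, so the picked sample moves toward the region the learned classifier still prefers.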
We denote the process of the SVM-based learning as a function SVM({x_j^t}, {y_j^t}, parameters), where {x_j^t}, {y_j^t} are the beam selection schemes and the corresponding labels. In particular, parameters are the well-tuned hyperparameters, which will be discussed in detail in Section 5. The classical SVM can be solved easily by many software packages. For example, the libsvm package [31] for MATLAB and MATLAB's built-in fitcsvm function can be used to solve the dual problem (3) to obtain the boundaries or hyperplanes.

SVM-BASED BEAM ONLINE SELECTION ALGORITHM WITH MULTI-CELL INTERFERENCES
Conditional on the candidate pool of beam selection strategy profiles obtained for a given traffic pattern, this section proposes the SVM-based beam online selection (SBOS) algorithm, which accounts for all possible traffic patterns. We first define the similarity of traffic patterns S^t and S^{t′} as s(t, t′):

s(t, t′) = vec(S^t) · vec(S^{t′}) / (‖vec(S^t)‖ ‖vec(S^{t′})‖),    (6)

where S^t and S^{t′} are status matrices as defined in Section 2.2 and vec(S^t) represents the complemented and vectorized S^t. Equation (6) measures the degree of the cosine similarity between vec(S^t) and vec(S^{t′}). We set a threshold on s(t, t′) to judge traffic pattern similarity.
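The cosine similarity of Equation (6), computed over zero-complemented and vectorized pattern matrices, can be written as follows (a sketch; the helper name is ours):

```python
import numpy as np

def traffic_similarity(S_t, S_tp, M_bar):
    """Cosine similarity between zero-complemented, vectorized
    traffic pattern matrices, as in Equation (6)."""
    def vec(S):
        S = np.asarray(S, dtype=float)
        out = np.zeros((S.shape[0], M_bar))
        out[:, :S.shape[1]] = S  # zero-complementation to M_bar columns
        return out.reshape(-1)
    a, b = vec(S_t), vec(S_tp)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical patterns score 1.0 and orthogonal patterns score 0.0, so a threshold close to 1 admits only near-duplicate traffic patterns for strategy reuse.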
Recall that we need to gather l samples and their labels in the SVM-based learning model. Before going into the feedback gathering stage, we set a flag gath to True and a flag gath_f to False. Every time we encounter a similar traffic pattern, identified by the similarity threshold, we apply one of the l beam selection schemes generated by the Picker algorithm and then record the average information rate. Once all of these schemes have been applied, the algorithm sets gath to False and gath_f to True. Given a large number of samples and corresponding labels associated with different traffic patterns, the algorithm runs more efficiently online by using the aggregated data in the database. In the following, we present the main steps of the proposed SBOS algorithm. The algorithm repeats these steps iteratively until the termination criteria are satisfied: the Picker (Algorithm 1) cannot pick out new samples, or the maximum iteration count ITER is reached. The process of the algorithm is summarized in Algorithm 2.
1. For the traffic pattern generated at time t, we record the traffic pattern information in S^t. Next, we check whether there is a record for a similar traffic pattern in the database. If there is, we reload the last learning data (lines 2-3 in Algorithm 2). If no matching traffic pattern is found in the database, we initialize the learning (lines 4-8 in Algorithm 2) and enter the exploration or exploitation stage depending on the learning records (lines 11-15 in Algorithm 2). In either of these two stages, the algorithm generates l samples to form the initial scheme set 𝒳_0^t. 2. Gather the performance of the schemes and label them in order to enable the SVM model. For example, in the second iteration, after the feedback gathering stage (lines 16-17 in Algorithm 2), the labelling function is called to label the samples (lines 18-19 in Algorithm 2) in 𝒳_0^t based on the cellular system feedback, as introduced in Section 3.1. Then, after hyperparameter tuning, cross-validation and training, the SVM model learns from these samples, and a decision boundary h_1^t(x) = 0 is generated from the data set.

FIGURE 3 SBOS algorithm, pick and classification (legend: "good" vs. "bad" beam selection schemes; sample size 3 at iteration 1 and 4 at iteration 2; decision boundaries at iterations p and q, p < q < 2p; the next sample is drawn only from the intersection of the two preferred areas)

We illustrate the boundary generation process in Figure 3.

3. Generate the scheme set for the next iteration (line 31 in Algorithm 2) and gather the performance (line 35 in Algorithm 2). For example, suppose that in the first iteration, m out of l samples are labelled "1", which means l − m schemes are eliminated. Then, in the second iteration, the Picker algorithm picks out l − m schemes in the scheme space constrained by the decision boundary set. Each scheme is selected so that its minimum distance to all of the other labelled schemes is maximised (line 2 in Algorithm 1). To speed up the learning process, the decision boundary is recorded after REC iterations, where REC is an adjustable parameter that balances exploitation (small REC) and exploration (large REC) (lines 24-28 in Algorithm 2). Note that the recorded boundaries are used when rec = REC in order to determine the next sampling space. The space reduction procedure is illustrated in Figure 4. 4. Label the schemes labelled "1" and the newly generated schemes (line 38 in Algorithm 2). For example, in the second iteration, after the feedback gathering stage, instead of labelling all 2l − m schemes, we label only the newly generated samples and re-label the schemes that had been labelled "1". 5. Train and learn the updated scheme set (lines 29-30 in Algorithm 2). For example, in the third iteration, by learning these newly labelled l schemes and the l − m schemes labelled "-1" from the second iteration, a new decision boundary is generated upon the current dataset {x_i^t, y_i^t}_{2l−m} and then used to make the scheme space smaller.
The SBOS algorithm is mainly based on the SVM algorithm; therefore, its computational complexity is dominated by that of the SVM algorithm. In this paper, the SVM algorithm is implemented with libsvm, which consists of two parts (svm_train and svm_predict) and procedures including svm_save_model and svm_load_model etc.
According to [32], both svm_train and svm_predict have a worst-case complexity of O(l³), so the complexity of the SBOS algorithm is O(l³), where l is the number of input data in problem (2), that is, pairs of beam allocation schemes and corresponding labels. Recall that the termination criteria of the SBOS algorithm are that the Picker (Algorithm 1) cannot pick out new samples or that the maximum iteration count ITER is reached. So, in the worst case, the SBOS algorithm executes ITER times for a specific traffic pattern. Because ITER is a constant, the computational complexity of the SBOS algorithm is O(l³). Note that the SVM algorithm is performed only for distinct traffic patterns, and l need not be large; it is set in the range of 6 to 12 in our setting.
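The outer loop of the SBOS procedure (label, fit, pick, repeat) can be sketched in a simplified form. The hook functions (measure_rate, svm_fit, picker) are hypothetical stand-ins for the system feedback, the SVM training of Section 3 and Algorithm 1, respectively:

```python
def sbos(initial_schemes, measure_rate, svm_fit, picker, ITER):
    """Skeletal SBOS loop: label the current sample set by its measured
    rates, fit an SVM boundary, and let the picker propose new schemes
    until it finds none or ITER iterations have run."""
    samples = list(initial_schemes)
    boundaries = []
    for _ in range(ITER):
        rates = [measure_rate(x) for x in samples]
        mean = sum(rates) / len(rates)
        labels = [1 if r > mean else -1 for r in rates]   # Section 3.1 rule
        boundaries.append(svm_fit(samples, labels))        # learn boundary
        new = picker(samples, boundaries)                  # max-min pick
        if new is None:  # termination: Picker cannot pick a new sample
            break
        samples.append(new)
    return samples
```

With stub hooks, the loop stops either at ITER or as soon as the picker returns None, matching the two termination criteria stated above.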

HYPERPARAMETER ADJUSTMENT FOR SBOS ALGORITHM
Note that there are two hyperparameters that need to be tuned in the algorithm, namely C and γ, which significantly impact its performance. In machine learning, hyperparameter optimization or hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm [33]. Generally, the same kind of machine learning model can produce different results when adopting different hyperparameters. In this section, we discuss how to tune the hyperparameters for the algorithm. In particular, we first introduce the widely used hyperparameter adjustment methods (Grid search, Random search and Bayesian optimization) and then discuss how to use cross-validation to tune the parameters.

Parameter tuning methods
According to problem (3) and Equation (4) in the SVM model, let θ = (C, γ). Define the hyperparameter optimization problem as maximising A(θ), where A(θ) is the cross-validation accuracy of the SVM model when applying the hyperparameters θ, w is the weight vector introduced in problem (2) and {(x_i^t, y_i^t)}_L is the cross-validation beam selection scheme sample set, that is, X^t in Algorithm 2. Each specific pair of these two hyperparameters is denoted as θ_i = (C_i, γ_i), where i denotes the hyperparameter adjustment step.
The objective of the iterative global optimization of a function f : ℒ → ℝ is to find the sequence of points that converges to the optimum θ̂, f(θ̂) = sup_{θ∈ℒ} f(θ). In other words, we can approximate the value of f from previous steps [34].

Grid search
Grid search is simply an exhaustive search through a manually specified subset of the hyperparameter space. Generally, a grid search algorithm must be evaluated according to certain performance metrics, for example, cross-validation on the training dataset [35] or evaluation on a held-out validation dataset.
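As a concrete sketch, grid search over the exponent set {2^-10, …, 2^10} for (C, γ) can be written as follows; score_fn is a hypothetical stand-in for the cross-validation accuracy A(θ):

```python
from itertools import product

def grid_search(score_fn, exponents=range(-10, 11)):
    """Exhaustive search over (C, gamma) in {2^-10, ..., 2^10}^2,
    returning the pair with the best score and that score."""
    best, best_score = None, float("-inf")
    for ec, eg in product(exponents, exponents):
        C, gamma = 2.0 ** ec, 2.0 ** eg
        s = score_fn(C, gamma)
        if s > best_score:
            best, best_score = (C, gamma), s
    return best, best_score
```

With a 21-point exponent range per axis, this evaluates 441 configurations; the Random and Bayesian methods below aim to reach a comparable optimum with far fewer evaluations.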

Random search
Random search selects hyperparameters randomly from the parameter space instead of exhaustively enumerating all possible values. It can sometimes outperform Grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm; in this case, the optimization problem is said to have a low intrinsic dimensionality. Additionally, Random search can incorporate prior knowledge by specifying the distribution from which the samples are drawn [36].

Bayesian optimization
Bayesian optimization builds a probabilistic model of the function mapping hyperparameter values to the objective evaluated on a validation set. It tries to gather observations revealing as much information as possible about this function, and in particular about the location of the optimum, by iteratively evaluating a promising hyperparameter configuration based on the current model and then updating the model. The main idea behind Bayesian optimization is to use all of the information gathered in previous iterations when choosing the next step, in order to minimise the number of classifier trainings. In other words, we solve the optimization problem via an adaptive process that, on the one hand, tries to maximise the objective function and, on the other hand, samples the possible hyperparameter space.
For Bayesian optimization, if the exact form of f were known, the optimization procedure would be much simpler. Unfortunately, f is a black-box function with a very complex structure, expensive even to evaluate. However, some simplifying assumptions about f can make the problem solvable. Assume that f can be represented as a sample from a probability distribution over a family of functions, f ∼ P(f), f ∈ ℱ. The expectation of the loss at step n can then be expressed as E[ε_n(f)], where ε_n(f) = f(θ̂) − f(θ_n), and θ̂ is the optimal hyperparameter pair {C*, γ*}.
Given additional assumptions about the prior distribution P, very efficient solutions for the entire process can be provided, for example, by assuming that the target function f is a sample of a Gaussian stochastic process. A crucial advantage of this simplification is that the distribution over such functions is relatively easy to manipulate; in particular, the posterior distribution is also a Gaussian process. Consequently, in each iteration the calculated posterior can be used as an informative prior for the next iteration, creating a relatively simple iterative procedure.

Implementation of hyperparameter tuning
For the SVM-based learning, by adopting different kernel functions, we can apply the "kernel trick" to map the original feature space to a higher-dimensional feature space in which the training set becomes more separable. Here, we choose the Gaussian kernel κ(x_i^t, x_j^t) of Equation (4). A soft-margin SVM classifier with an RBF kernel has at least two hyperparameters that need to be tuned: the regularization constant C and the kernel hyperparameter γ. The kernel hyperparameter γ is critical: a small γ (equivalently, a large kernel variance σ²) leads to a high-bias, low-variance model, and vice versa. The parameter C appears in the soft-margin cost function and controls the influence of each individual support vector: a large C may cause overfitting and a small C underfitting. In a word, tuning the hyperparameters γ and C is important for achieving good performance of the SBOS algorithm.
According to empirical studies, both C and σ can be found within the set {2⁻¹⁰, …, 2¹⁰}, and we find the best (C, σ) pair by cross-validation. Cross-validation is one way to ensure the robustness of the model. The basic idea is that a portion of the data (called a holdout sample) is held back; the model is trained on the bulk of the data and tested on the holdout sample. This differs from the classical method of model testing, which uses all of the data to fit the model. By using cross-validation, we can make full use of the limited training data set. After performing the parameter search on the training data set, we finally determine the best pair of parameters, denoted (σ*, C*).
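The cross-validated search over the exponential grid can be sketched as follows. Here `evaluate` is a hypothetical callback standing in for training an SVM on the training folds and returning a validation score; it is not an API from libsvm or fitcsvm:

```python
import itertools
import random

def k_fold_indices(n_samples, k=5, seed=0):
    # Partition sample indices into k disjoint validation folds.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(train, evaluate, k=5):
    # Exhaustive search over the exponential grid {2^-10, ..., 2^10} for both
    # C and sigma, scoring each pair by k-fold cross-validation.
    # evaluate(C, sigma, train_idx, val_idx) is a placeholder for fitting an
    # SVM on the training folds and scoring it on the validation fold.
    grid = [2.0 ** e for e in range(-10, 11)]
    folds = k_fold_indices(len(train), k)
    best, best_score = None, float("-inf")
    for C, sigma in itertools.product(grid, grid):
        scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(evaluate(C, sigma, train_idx, val_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = (sigma, C), mean_score
    return best  # (sigma*, C*)
```

The Random Search and Bayesian variants discussed later differ only in how the candidate (C, σ) pairs are drawn from this same range.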

SIMULATION AND RESULTS
In this section, we evaluate the performance of the proposed SVM-based beam online selection algorithm under multi-cell interference. First, the system parameters and channel model for the simulation settings are given. Then, we consider various factors that may affect the performance of the proposed algorithm and compare it with some typical schemes.

Simulation setup
This study uses the SVM tool package libsvm [31] and the fitcsvm package of MATLAB. The platform is an HP Laptop 14s-dp0xxx (AMD 3700U, 4 cores/8 threads, 16 GB RAM) running Windows 10 Home x64 v2004 with MATLAB R2020a. In this simulation, we deploy 4 mmWave cells and set l = 5 and ITER = 14. The channel parameters used in this simulation are listed in Table 2. We consider two kinds of blockages: temporary blockage and permanent blockage.

Performance analysis
In the simulations, we provide a thorough performance analysis by comparing the proposed SBOS algorithm to the following methods:
• Optimal. The optimal method is assumed to have full prior knowledge of the multi-cell interference for all traffic patterns. Thus, it can choose the best subset of beams and provides an upper bound on the beam performance for the current traffic pattern.
• UCB. The MAB model has been widely adopted for studying many network problems with unknown parameters, for example, resource allocation and crowdsourcing. The classical multi-armed bandit problem is formulated as a system of n arms (or actions), each having an unknown reward distribution. One purpose of the MAB is to maximize the reward accumulated by the agent while playing the arms. This objective highlights the balance between staying with the arm that gave the highest reward in the past and exploring new arms that might give higher rewards in the future. UCB is one of the popular algorithms that can balance exploitation and exploration in such bandit problems [37,38]. Specifically, in our simulation, UCB selects the arms/beams with the highest estimated upper confidence bounds based on the history data. However, when UCB has no experience data, that is, no knowledge of the environment or of the arm settings, it chooses an arm randomly.
• Random. This algorithm selects beams uniformly at random.
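For reference, the UCB baseline can be sketched as a generic UCB1-style index rule over simulated Bernoulli arms. The reward model, horizon and arm means below are illustrative, not the paper's simulation settings:

```python
import math
import random

def ucb_select(counts, rewards, t):
    # Pick the arm with the highest upper confidence bound
    #   UCB_i = mean_reward_i + sqrt(2 ln t / n_i).
    # An arm never played yet is tried first; with no history at all this
    # amounts to an arbitrary choice, mirroring UCB's initial random phase.
    best_arm, best_bound = None, float("-inf")
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
        bound = rewards[arm] / n + math.sqrt(2.0 * math.log(t) / n)
        if bound > best_bound:
            best_arm, best_bound = arm, bound
    return best_arm

def run_ucb(arm_means, horizon, seed=0):
    # Simulate Bernoulli-reward arms; returns how often each arm was played.
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    rewards = [0.0] * len(arm_means)
    for t in range(1, horizon + 1):
        arm = ucb_select(counts, rewards, t)
        counts[arm] += 1
        rewards[arm] += 1.0 if rng.random() < arm_means[arm] else 0.0
    return counts

# Over a long horizon, the best arm (mean 0.8) attracts most of the plays.
counts = run_ucb([0.2, 0.5, 0.8], horizon=2000)
```

In the beam selection setting, each arm corresponds to a candidate beam and the reward to the observed information rate.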

6.2.1 Performance comparison with a certain traffic pattern

Figure 5 shows the average information rate at each iteration under different settings of the parameter REC, which is set to 3 and 5, respectively.

FIGURE 6 Performance of the proposed beam allocation method for a specific cellular status

We can see from the figure that, by using the proposed algorithm, we obtain a near-optimal average rate for the whole system after 12 iterations. Additionally, it can be observed that when REC = 3 the algorithm is more stable but achieves a lower performance than when REC = 5. The reason is that when REC = 3 the Picker algorithm reuses the historical boundary set every 3 iterations and is more likely to exploit the existing information to generate the new scheme set, whereas for REC = 5 the Picker algorithm tends to explore the newly generated boundary set when generating the new scheme set.

Figure 6 illustrates the performance of different beam allocation methods, with REC set to 3. It can be observed that the proposed algorithm consistently obtains a better average information rate than the Random and UCB methods. Specifically, the gap between the proposed algorithm and the UCB method is around 0.3 Gbit/s/vehicle; for instance, at iter = 11, the gap is about 0.35 Gbit/s/vehicle. Also, the performance of the Random algorithm fluctuates rapidly and that of UCB fluctuates slowly, so the proposed algorithm is more stable than the other two methods. Here, we present an intuitive idea to enhance the fairness among different users by using a more complex labelling strategy: for example, when labelling beam selection schemes, the ratio of the worst and best information rates can be used as an extra label. However, a theoretical study of a learning scheme that balances fairness and system performance is an interesting issue that goes beyond the scope of this article and is left for future work. Table 3 gives an example showing the change in the information rate of a user/vehicle.
We select the user with the lowest initial information rate and present its rates over several iterations. It can be observed from the table that although the user has the lowest rate at the beginning, its rate rises quickly due to its movement.
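The fairness-labelling idea mentioned above, using the worst-to-best rate ratio as an extra label, could be sketched as follows; the threshold separating "fair" from "unfair" schemes is a made-up value for illustration, not one proposed in the paper:

```python
def fairness_label(rates, threshold=0.5):
    # Illustrative extra label for a beam selection scheme: the ratio of the
    # worst user rate to the best user rate (1.0 means perfectly even rates).
    # The threshold is a hypothetical cut-off, not a value from the paper.
    ratio = min(rates) / max(rates)
    return ratio, (+1 if ratio >= threshold else -1)

ratio, label = fairness_label([0.6, 1.1, 1.2])   # rates in Gbit/s/vehicle
```

Such a label could be fed to the SVM alongside the average-rate label so that the learned boundaries also discriminate against highly unequal schemes.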

6.2.2 Impact of the hyperparameter tuning methods

Figure 7 shows the performance of the proposed algorithm with different HT methods. REC is set to 3, and we apply three HT methods using two different SVM-related software packages, that is, libsvm and fitcsvm.

FIGURE 8 Performance of hyperparameter optimization methods of the proposed SVM-based beam allocation method for a specific cellular status using the fitcsvm package

The figure shows that the proposed algorithm using the libsvm package achieves the highest performance after the 12th iteration, while the proposed algorithm using the other three HT schemes terminates at the 8th iteration but obtains a lower performance. It also reveals that the proposed algorithm with the Bayesian optimization HT obtains relatively good performance after just 8 iterations. Table 4 lists the results of the proposed algorithm with different HT methods: the number of iterations needed for the algorithm to converge, the average and maximum information rates of the schemes achieved by the algorithm, and the running time of the algorithm. We can observe from the table that the Grid Search outperforms the Random Search but runs slower. The reason is that the Grid Search tries every possible pair of hyperparameters, while the Random Search just chooses them randomly. The table also illustrates that the Bayesian HT method outperforms both the Random Search and the Grid Search, since the Bayesian HT method can make use of the information gathered in previous steps. Figure 8 shows the hyperparameter adjustment processes of the Grid Search, the Random Search and the Bayesian optimization method. The figure shows that the Bayesian method outperforms the other two in minimising the objective loss function in Equation (8), achieving the minimum loss value of 0.048. As a result, the Bayesian HT method is more efficient than the other two methods.
Corresponding to Figure 8, Figure 9 plots the hyperparameter adjustment process of the Bayesian optimization HT method from step 5 to step 11 in detail. We can observe from the figure that, by adjusting the hyperparameters and using cross-validation, the Bayesian optimization HT method finally converges to the optimal hyperparameters θ̂ = (σ*, C*), for which the minimum loss value is 0.048.

6.2.3 Multi-cell interference analysis

Figure 10 shows the average interference power of the system, defined as the total interference of the system divided by the number of cells. We can see from the figure that at the beginning of the algorithm the average multi-cell interference is around −55 dBm. After 14 iterations, the interference is reduced to around −70 dBm. This is because the proposed algorithm finally selects a suitable beam set for the traffic pattern based on the corresponding samples and labels.
The figure also reveals that the Bayesian HT method achieves performance similar to that of the Grid Search HT method implemented with libsvm, but the Bayesian HT method is more stable and faster.

FIGURE 12 The performance of the proposed SVM-based beam allocation method as the knowledge of the cellular status builds up

Figure 11 plots the average interference power of each cell, defined as the total interference in a cell divided by the number of UEs in it. The figure shows that the proposed algorithm reduces the average interference power gradually. For example, in cell 4, the average interference is around −55 dBm at the beginning of the algorithm and is reduced to around −74 dBm after 14 iterations. The average interference power of each cell decreases gradually and then converges to a stable value for a given traffic pattern.
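Since the interference figures above are reported in dBm, averaging across cells must be done in the linear (milliwatt) domain rather than on the dBm values directly. A small helper sketch (function names and sample values are ours, not from the paper):

```python
import math

def dbm_to_mw(p_dbm):
    # Convert a power level in dBm to milliwatts.
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw):
    # Convert a power level in milliwatts back to dBm.
    return 10.0 * math.log10(p_mw)

def average_interference_dbm(cell_dbm):
    # Average interference power across cells: convert each per-cell dBm
    # value to linear milliwatts, take the arithmetic mean, convert back.
    mean_mw = sum(dbm_to_mw(p) for p in cell_dbm) / len(cell_dbm)
    return mw_to_dbm(mean_mw)

avg = average_interference_dbm([-55.0, -55.0, -55.0, -55.0])  # ~= -55.0 dBm
```

Note that the linear-domain mean is dominated by the strongest interferer, so one cell at −55 dBm pulls the average well above cells sitting at −70 dBm.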

6.2.4 Performance analysis of the SBOS algorithm with dynamic traffic patterns

Figure 12 analyses the average information rate under dynamic traffic patterns when using the proposed SVM-based beam online selection algorithm. At the beginning, the UCB algorithm has no history to draw on, so it also chooses beams randomly. We can see from the figure that the initial randomness of the UCB algorithm may lead to better performance at the beginning. However, after t = 40, the performance of the SBOS algorithm exceeds that of UCB. Also, after t = 170 iterations, the proposed algorithm approaches the optimal algorithm, while there is still an obvious gap between the UCB method and the optimal algorithm; this is because UCB may take a long time to explore all the arms. Moreover, by employing the Picker algorithm, the SBOS algorithm does not have to try every selection scheme, so it converges to the optimal algorithm rapidly.
This also indicates that the SBOS algorithm learns the beam selection strategy under multi-cell interference more efficiently than the other two methods. The reason is that the SBOS algorithm is based on an SVM classifier, which can achieve high classification accuracy even with a very limited number of samples. Consequently, when each beam selection strategy is chosen based on such samples, the negative impact of response errors in the adaptive strategy selection process is alleviated rapidly.

CONCLUSION AND FUTURE WORK
This paper has introduced an SVM-based active learning scheme for interference-aware multi-cell mmWave vehicular communications. In particular, we have used a fuzzy support vector machine active learning algorithm to adaptively refine the selection scheme in each cell. Specifically, if two traffic patterns share some commonality in terms of the cosine similarity of the vectorized traffic pattern matrices, they are treated as using identical beam selection strategies; the corresponding results have been labelled according to their relative performance and stored in the database. Consequently, the exploration-or-exploitation learning strategy can exploit recorded beam selection schemes when a certain similarity exists or explore new schemes for unrecorded patterns, respectively. Our empirical and synthetic studies have shown that the proposed method performs better than the existing MAB learning algorithm in which the beams are greedily picked with the highest upper confidence bound. Moreover, the results have also illustrated that the proposed algorithm with Bayesian hyperparameter tuning can achieve near-optimal performance with low computational complexity. In a nutshell, the proposed scheme has demonstrated the capability of SVM-based online learning for data-driven mmWave vehicular communications with multi-cell interference coordination in future 5G/B5G vehicle applications. An interesting extension of this work is to study the fairness issue among different users, both from an algorithmic design perspective and from a mathematical standpoint. It is also a challenging and open problem to coordinate the multi-cell beam allocation in a distributed fashion in order to speed up the learning process.
Security attacks on future collaborative-intelligence-enabled networks are another important topic to consider as 5G/B5G networks are deployed, especially for vehicular communications.