On temperature-dependent small-signal modelling of GaN HEMTs using artificial neural networks and support vector regression

Machine learning-based, efficient temperature-dependent small-signal modelling approaches for GaN high electron mobility transistors (HEMTs) are presented by the authors. The first method is artificial neural network (ANN)-based and uses the well-known multilayer perceptron (MLP) architecture, whereas the second is developed using support vector regression (SVR). The models are trained on a large set of measurement data obtained from a 2-mm GaN-on-silicon device under varying operating conditions (bias voltages and ambient temperatures) over a wide frequency range of 0.1 to 20 GHz. Excellent agreement is found between the measured and simulated S-parameters for both models over the entire frequency range. The training process and prediction capability of the ANN are found to be superior to those of the SVR. However, the SVR is more robust than the ANN in terms of its sensitivity to local minima and the uniqueness of the final solution. Subsequently, the performance of the proposed ANN- and SVR-based models is improved by incorporating particle swarm optimization (PSO) into the model development process. The PSO improves the uniqueness of the ANN model, whereas it enhances the performance of the SVR by optimising its control parameters. The proposed models exhibit very good accuracy and scalability.


| INTRODUCTION
The GaN high electron mobility transistor (HEMT) possesses excellent features such as high electron saturation velocity, electron mobility, breakdown voltage and operating temperature [1,2]. These features make it an optimal device for the design of advanced communication circuits such as power amplifiers (PAs) and low noise amplifiers (LNAs) [3][4][5]. However, GaN devices face multiple technical challenges, such as reduction in output power, gain and power efficiency due to thermal effects and DC-RF dispersion [6,7]. Furthermore, the lower resistivity and thermal conductivity of silicon (Si) with respect to other substrates, such as SiC and diamond, result in relatively inferior RF characteristics for the HEMTs due to self-heating and substrate-loading effects [8]. This self-heating, in addition to the ambient temperature, has a strong impact on the small- and large-signal characteristics of GaN HEMT devices. It is also important to note that the reliability of circuits designed without consideration of these heating effects is questionable.
Keeping the above aspects in perspective, a number of electrothermal modelling techniques for GaN devices subjected to large- and small-signal induced thermal effects have been reported [9][10][11][12][13][14][15]. In recent times, artificial neural network (ANN)-based modelling techniques have become popular for both large-signal [9][10][11][12] and small-signal modelling [13][14][15]. One of the most important features of the ANN is its ability to learn complex nonlinear relationships and adapt well to new data [16]. In addition, the ANN can learn and mimic the behaviour of devices without looking into the device physics, and hence is useful for modelling GaN HEMT devices with strong nonlinearities. Support vector regression (SVR) is also being employed for solving RF and microwave device behavioural modelling problems [17][18][19][20][21][22][23]. The SVR is often preferred because its structural risk minimization (SRM) formulation seeks a global optimum, in contrast to conventional empirical risk minimization training, which can converge to local minima as in the case of the ANN. ANN-based modelling for a GaN-on-SiC HEMT using S-parameter measurements over a limited temperature range of 20 to 80 °C was reported in [13]. That model consisted of two hidden layers with 20 to 28 neurons in each layer, which complicates the model implementation in computer-aided-design (CAD) tools. The other ANN-based models either considered only the voltage and frequency dependence or used a simple linear approach [14,15] to predict the temperature dependence, which could be acceptable at low temperatures. However, at high temperatures, some S-parameters, such as S22, show nonlinear behaviour [11] and the reported techniques may not be scalable for such scenarios.
This paper proposes small-signal modelling techniques based on the ANN and SVR for a GaN-on-Si HEMT device strongly impacted by thermal effects. The ANN-based technique utilises an enhanced procedure to implement two hidden layers of five neurons each. This drastically reduces the model complexity, simplifies its implementation and enhances the simulation speed of the CAD-implemented model. This technique, developed using a wide temperature range of 25 to 175 °C, provides more insight than the earlier reported ANN-based small-signal modelling techniques [13,15]. Here, the temperature is the third input, in addition to the voltage and frequency, achieving excellent agreement between the measured and modelled performance. During the model development process, a back propagation (BP)-based procedure is used to train the ANN model. One of the main limitations of BP is its dependence on the initial guess: the final solution may not converge to the best fit if the initial values of the model parameters (weights and biases) are far from the optimal ones. To overcome this limitation, particle swarm optimization (PSO) [24,25] is used here in conjunction with the ANN. PSO, as a global technique, initiates the training process by exploring the search space to find the region of the optimal solution. In the next phase, BP, as a local technique, provides stronger exploitation with a higher rate of convergence to find the final optimal solution. This PSO-BP technique improves the efficiency and accuracy of the ANN small-signal modelling, which can be considered an additional contribution with respect to previously reported works. Furthermore, this paper also develops, for the first time, an SVR-based temperature-dependent small-signal modelling technique for the GaN HEMT. It has been reported that an SVR-based model with properly selected parameters can provide almost the same performance as an ANN-based model [26].
However, inappropriate selection of the parameters could result in over- or under-fitting, which may deteriorate the SVR-based model's performance. Careful user intervention is therefore needed to set the SVR parameters. This paper makes use of PSO with the SVR model to address this problem.
The next section of this paper succinctly describes the device and the relevant measurement. Sections 3 and 4 provide the model development process using ANN and SVR respectively. Results and discussion are provided in Sections 5 and 6, whereas Section 7 concludes the paper.

| DEVICE PHYSICS AND CHARACTERISATION
The considered device has a total gate width of 2 mm, composed of 10 fingers of 200-µm gate width each. A photograph of the coplanar waveguide (CPW) on-wafer device and its general structure is shown in Figure 1 [27]. This device has been grown on a Si substrate and fabricated by Nitronex Corporation using the NRF1 process. Its performance has been improved by adopting a source field plate technique. The device is characterised using a vector network analyser (VNA) and the data are represented in terms of the real and imaginary parts of the S-parameters.
The device is mounted on a temperature-controlled thermal chuck to carry out the S-parameter measurements at different external temperatures; its representative performance in terms of the real and imaginary parts of S22 is shown in Figure 2. The internal temperature of the device under an active bias condition is the sum of the temperature defined by the thermal chuck and the device self-heating. The measurements were carried out over the grid of bias conditions listed in Table 1. The same grid of measurements was conducted at external temperatures from 25 to 175 °C in steps of 25 °C. All measurements were taken over frequencies from 100 MHz to 20 GHz.

| Theoretical preliminaries
The ANN consists of interconnected layers of neurons as its fundamental constituents. These neurons can learn simple and sophisticated patterns associated with any data if they are fed with balanced and sufficient training data. Once learnt properly, the stored information can easily be exploited for prediction on new sets of data [28]. The ANN has been widely used for solving modelling problems and has many advantages with respect to other techniques; in particular, its ability to capture nonlinearity stands out. In its basic embodiment, the structure of a unit neuron is shown in Figure 3. Here, X1, X2, …, Xn are the inputs, W1, W2, …, Wn are the synaptic weights and b is the bias term.
In particular, a unit neuron multiplies each input by its corresponding weight, sums the products, and then adds a bias term to shift the weighted sum. The weighted sum can take any value in the range (−∞, ∞). To force it into a specified range, it passes through a linear or nonlinear activation function. The output y can be calculated using (1), which describes the learning equation of one neuron. Each neuron uses the same learning equation to learn its weights and bias; a neuron is then fully identified by its layer number and its position within that layer. The ANN is formed by interconnecting such neurons in stacked layers.
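As a hedged illustration (the helper below is not from the paper), the learning equation (1) for a single neuron, y = f(Σ wi·xi + b), can be sketched as:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """Single-neuron learning equation (cf. Equation (1)):
    weighted sum of inputs plus bias, passed through an activation."""
    return activation(np.dot(w, x) + b)

# Example: two inputs with a tanh activation
y = neuron(np.array([0.5, -0.2]), np.array([1.0, 2.0]), 0.1)
```

With tanh as the activation, the output is forced into (−1, 1), which matches the range discussed for the S-parameter data later in the paper.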

| Temperature-dependent model using ANN
The implemented multilayer perceptron (MLP) ANN architecture shown in Figure 4 consists of an input layer, intermediate layers often known as hidden layers, and an output layer [28]. The model topology (number of layers and neurons) is directly related to the nonlinearity of the considered problem. The number of hidden layers and their sizes cannot be set a priori; they are determined during training of the model and depend on the degree of nonlinearity. The learning equation for each ANN model is formed by vectorization of the individual neurons; it is formulated in (2) for the considered topology. The architecture used consists of four input nodes, two hidden layers with five neurons in each layer, and an output layer for each S-parameter. The model is then used to simulate the voltage, temperature and frequency dependence of the real or imaginary part of an S-parameter. The complete model consists of eight ANN models to simulate the four complex S-parameters, as shown in Figure 5. Here, VGS, VDS, T and f are the extrinsic gate voltage, extrinsic drain voltage, ambient temperature and frequency, respectively; w1j, w2j, w3j and w4j are the input weights and wkj are the intermediate weights (between the two hidden layers). Furthermore, w1bj, w2kb and w3b are the input-layer, hidden-layer and output-layer biases, respectively. The tanh nonlinear activation function is used to build the model. The weights and biases of the model are later optimised using the PSO algorithm to find their optimal values.
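A minimal sketch of this forward pass, assuming the paper's topology (four inputs, two tanh hidden layers of five neurons, one linear output); the random weights stand in for the trained values and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes from the paper's topology:
# 4 inputs (V_GS, V_DS, T, f) -> 5 -> 5 -> 1 output
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((5, 5)), rng.standard_normal(5)
W3, b3 = rng.standard_normal((1, 5)), rng.standard_normal(1)

def mlp(x):
    h1 = np.tanh(W1 @ x + b1)   # first hidden layer, tanh activation
    h2 = np.tanh(W2 @ h1 + b2)  # second hidden layer, tanh activation
    return W3 @ h2 + b3         # linear output: Re or Im of one S-parameter

# One evaluation at an illustrative operating point (V_GS, V_DS, T, f)
s_out = mlp(np.array([-1.5, 28.0, 100.0, 5.0]))
```

Eight such networks, one per real or imaginary part, together make up the complete model of Figure 5.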
One of the most important steps before training the model is feature scaling, also known as pre-processing of the dataset. If there is a significant difference in the ranges of the feature vectors, the contours of the cost function take very skewed elliptical shapes (very tall and skinny ellipses). If the training algorithm is run on such skewed contours, the gradient steps take a very long time to reach the global optimum and can easily get stuck in a local optimum. The model is built from the S-parameter measurements under the bias conditions listed in Table 1 and over the frequency range of 0.1-20 GHz. The same measurements at ambient temperatures of 25, 75 and 150 °C are used to train and test the ANN models. To check the model's generalisation capability, it is validated using two independent datasets (not used for building the model): the first consists of the same bias conditions and frequencies at T = 50 °C and T = 125 °C, while the other is at T = 175 °C. The proposed work makes use of normalisation. One of the main reasons for using the real and imaginary parts of the S-parameters, as opposed to their magnitude and phase, is the distribution of the data points: except for S21, all parameters are by default within the range [−1, 1] when expressed as real and imaginary parts.
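A minimal feature-scaling sketch (the helper name and example values are illustrative, not from the paper), mapping each input feature to [−1, 1] so that no single input dominates the cost surface:

```python
import numpy as np

def minmax_scale(X):
    """Scale each feature (column) of X linearly to [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0

# Illustrative raw inputs with very different ranges:
# columns could be V_GS (V), V_DS (V), f (GHz)
X = np.array([[-3.0, 10.0,  0.1],
              [-1.0, 28.0, 10.0],
              [ 0.0, 40.0, 20.0]])
Xs = minmax_scale(X)
```

After scaling, every column spans exactly [−1, 1], giving the cost function much rounder contours for gradient-based training.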
In principle, the training algorithm initialises random sets of weights and biases, then updates and reassigns them after each epoch until the algorithm converges. The cost function used for this problem is given in (3), where yi is the measured S-parameter and ỹi is the simulated S-parameter. The Levenberg-Marquardt (LM) BP algorithm is used to train and test the model [29], whereas the Nguyen-Widrow method [30] is adopted for the initialisation of weights and biases. This initialisation method significantly expedites the training, and almost all the neurons are utilised in the input space. Keeping in mind that LM-BP is a local optimisation technique, the ANN model has been trained multiple times until satisfactory results were obtained; the outcomes resulting in the minimum error, computed using (3), are taken into consideration.

FIGURE 6 Flow chart of the proposed PSO-ANN-based model
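The cost in (3) is a mean squared error over the measured and simulated S-parameters; a hedged one-function sketch (the helper name is illustrative):

```python
import numpy as np

def mse(y_meas, y_sim):
    """Mean squared error between measured and simulated values,
    in the spirit of the paper's cost function (3)."""
    y_meas, y_sim = np.asarray(y_meas), np.asarray(y_sim)
    return float(np.mean((y_meas - y_sim) ** 2))
```

This is the quantity minimised during LM-BP training and reported later in the evaluation tables.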

| ANN-based improved model
The ANN-based model suffers from convergence to local minima. The BP algorithm utilises random initial weights to update and train the model, but this sometimes produces overfitted results that do not perform well on the test sets. In principle, the model therefore needs to be trained multiple times until it gives satisfactory results, which requires more effort and may not be practical for strongly nonlinear models [15]. Here, this major drawback of the ANN model is addressed by utilising the global optimisation method of PSO. PSO is a multiple-initial-guess based technique and therefore reduces the chance of convergence to local minima; this attribute is vital for improving the effectiveness of the model. The model is developed utilising the same conditions listed in Table 1, over a frequency range of 0.1-20 GHz. In this technique, PSO starts by creating a set of initial particles. For each particle in the swarm, it evaluates the fitness function given in (3) and sets the local best and global best positions. It then calculates and updates the velocity of the particle based on the current velocity, the particle's individual best position and the global best. Having updated the velocity, it updates the particle position, moves on to the next particle, and repeats the same steps until the termination criterion is reached. The proposed procedure is illustrated by the flow chart in Figure 6 and can be briefly described as follows:
• First, the training set is inserted with the inputs and the corresponding S-parameters. The optimisation algorithm starts by creating a population of 100 particles. The 61 weights/biases are initialised using a symmetric random weight initialisation function; for each variable, the lower and upper bounds are set to −1 and 1, respectively.
• The objective function is evaluated. The objective is to minimise the error between the predicted and measured values. Using the fitness value, the algorithm sets the local best position and the global best position of the corresponding particle.
• Once the local best and global best positions are known, the next step is to update the velocity using the memorised values of the velocity, the current best position and the global best position. The PSO uses (4) and (5) to update the velocity and position, respectively.
where r1 and r2 are generated randomly between 0 and 1, v_{t+1} is the new velocity and v_t is the previous velocity; similarly, x_{t+1} is the new position and x_t is the previous position.
The p_best and g_best are the local best and global best positions, respectively. The terms w, c1 and c2 are the inertia weight factor, the self-confidence factor and the swarm confidence factor, respectively. The inertia factor is calculated using (6), where w_max and w_min are the maximum and minimum values of the inertia factor, which strongly influence the velocity update.
• The new positions are again passed through the same fitness function, and the steps are repeated until the termination criterion is reached. To build the ANN model with PSO initialisation, the algorithm runs for 500 iterations.
• After successful completion of all the above steps, the algorithm returns the optimal set of weights/biases.
• These optimal values are then used as the initial values for the BP algorithm, and the ANN model is trained starting from them to give the best results.
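The PSO steps above can be sketched as follows; the toy objective, swarm size and confidence factors c1 = c2 = 2 are illustrative assumptions, while the inertia schedule follows the linearly decreasing form of (6):

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(fitness, dim, n_particles=30, iters=200,
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, lb=-1.0, ub=1.0):
    """Minimal global-best PSO following update rules (4)-(6)."""
    x = rng.uniform(lb, ub, (n_particles, dim))          # positions
    v = np.zeros_like(x)                                 # velocities
    pbest = x.copy()                                     # personal bests
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                   # global best
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters          # inertia (6)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # (4)
        x = np.clip(x + v, lb, ub)                       # position update (5)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Toy objective: sphere function, optimum at the origin
best, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=5)
```

In the actual flow, the fitness would be the MSE of (3) evaluated over the 61 weights/biases, and the returned global best seeds the LM-BP training.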

| Theoretical preliminaries
SVRs are mainly characterised by the use of kernels, the absence of local minima, support vectors and solution determination in a feature space [31]. The effectiveness of the SVR depends mainly on parameters such as the type of kernel function, the kernel parameters, the support-vector decision boundary and the imposed penalty factor, that is, the box constraint (C). The kernel trick maps the low-dimensional dataset into a high-dimensional one so that the algorithm can learn the nonlinear behaviour of parameters such as S22. In principle, the SVR algorithm tries to find the regression function that deviates from the true values by at most ε. Suppose the training examples are described as {xi, yi}, where i varies from 1 to n, xi ∈ ℝd and yi ∈ ℝ; then the fitting function is given by (7). Here, f(x) denotes the predicted value based on the optimal sets of weights and biases, ω is the weight vector and b is the bias term.
The input vectors xi can be mapped into a higher-dimensional space via φ(xi), where the kernel trick facilitates the solution of nonlinear problems by performing simple dot products in that space. Equation (7) is referred to as the learning equation for the SVR, and it is modified accordingly in (8). The model uses the ε-insensitive loss function, meaning it incurs no penalty as long as the error lies between −ε and ε; this insensitive zone supports better generalisation. The loss function calculates the distance between the measured yi and the ε boundary using (9). The parameter C controls the trade-off between the twin goals of keeping ‖ω‖² small (to make the margin large) and keeping the training errors small. To address the training examples that lie beyond the ε-insensitive zone, slack variables ξi and ξi* at each point are introduced as a soft-margin regression. The error equation is therefore modified to (10) [32], subject to the constraints in (11). This primal optimisation problem is solved by exploiting the Lagrange multipliers, reproduced in (12) [32], where αm, αm*, ηm and ηm* are non-negative Lagrange multipliers. Setting the partial derivatives of (12) with respect to the primal variables to zero and substituting back yields the dual problem in (14)-(15) [32]. The optimal solution can then be found using the sequential minimal optimization (SMO) algorithm, and the predicted value can finally be expressed by (16).
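For reference, the standard ε-SVR soft-margin primal, consistent with the roles of C, ε, ξi and ξi* described above (a textbook reconstruction, since the numbered equations are not reproduced in this text):

```latex
\min_{\omega,\, b,\, \xi,\, \xi^{*}} \;
  \frac{1}{2}\lVert \omega \rVert^{2}
  + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
\quad \text{subject to} \quad
\begin{cases}
  y_i - \omega^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \\[2pt]
  \omega^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\[2pt]
  \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n .
\end{cases}
```

The Lagrangian dual of this problem, with multipliers αi and αi*, is what the SMO solver optimises.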

FIGURE 8 Flow chart of the proposed particle swarm optimization-based support vector regression model
So that:

where xj is a support vector, n is the number of support vectors, the bracketed terms αj* and αj are the weight coefficients of the support vectors, b is the bias and K(⋅) is the kernel function.
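A hedged sketch of the prediction equation (16) with a Gaussian kernel (the function names and example coefficients are illustrative, not from the paper):

```python
import numpy as np

def gaussian_kernel(x, xj, sigma=1.0):
    """Gaussian (RBF) kernel: K(x, x_j) = exp(-||x - x_j||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xj) ** 2) / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, alpha_diff, b, sigma=1.0):
    """Prediction equation (16): f(x) = sum_j (alpha_j* - alpha_j) K(x, x_j) + b.
    alpha_diff[j] holds the weight coefficient (alpha_j* - alpha_j)."""
    return sum(a * gaussian_kernel(x, xj, sigma)
               for a, xj in zip(alpha_diff, support_vectors)) + b

# Illustrative evaluation with two support vectors
pred = svr_predict(np.zeros(2),
                   [np.zeros(2), np.ones(2)],
                   [1.5, -0.5], b=0.2)
```

Only the support vectors (points with nonzero αj* − αj) contribute to the sum, which is why SVR predictions stay cheap even for large training sets.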

| Temperature-dependent modelling using SVR
The SVR model is built for the measured S-parameters (real and imaginary parts), as shown in Figure 7, under the bias conditions listed in Table 1 and over the frequency range of 0.1-20 GHz. Once again, the same measurements at ambient temperatures of 25, 75, 100 and 150 °C are used to train and test the model. To check the model's generalisation capability, it is validated using two independent datasets (not used for building the model): the first consists of the same bias conditions and frequencies at T = 50 °C and T = 125 °C, while the other is at T = 175 °C. The training data are standardised with respect to the mean and standard deviation. The model utilises the Gaussian kernel function and the SMO solver. The kernels convert the data into a high dimension, which is typically more representative; however, the higher dimension implies a higher computational cost and needs more training data. The process is repeated many times until the optimal results are obtained. The flow chart of the proposed model is summarised in Figure 8. The same PSO parameters are chosen as discussed in Section 3; all steps are the same, except that here only three parameters are tuned, with defined upper and lower boundaries. The optimised particle values obtained using PSO are then utilised to obtain the optimal model.
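The three-parameter hyperparameter search can be sketched as follows; the bounds, the stand-in fitness and the PSO constants are illustrative assumptions, with the velocity/position updates mirroring (4)-(6):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed search bounds for the tuned hyperparameters:
# box constraint C, epsilon, and Gaussian-kernel sigma.
LB = np.array([1e-2, 1e-3, 1e-2])
UB = np.array([1e3, 1.0, 10.0])

def validation_mse(params):
    """Stand-in fitness. In the real flow this would train an SVR with
    (C, epsilon, sigma) = params and return the validation-set MSE."""
    target = np.array([10.0, 0.05, 1.5])  # hypothetical optimum
    return float(np.sum(((params - target) / (UB - LB)) ** 2))

def pso_tune(fitness, n=20, iters=100, c1=2.0, c2=2.0,
             w_max=0.9, w_min=0.4):
    """Same PSO updates as before, restricted to the 3 hyperparameters."""
    x = rng.uniform(LB, UB, (n, 3))
    v = np.zeros_like(x)
    pbest, pf = x.copy(), np.array([fitness(p) for p in x])
    g = pbest[pf.argmin()].copy()
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, LB, UB)
        f = np.array([fitness(p) for p in x])
        better = f < pf
        pbest[better], pf[better] = x[better], f[better]
        g = pbest[pf.argmin()].copy()
    return g

C_opt, eps_opt, sigma_opt = pso_tune(validation_mse)
```

Each fitness evaluation would normally retrain the SVR, so keeping the swarm small matters here much more than in the 61-dimensional weight-initialisation case.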

| Evaluation of simple ANN model
The proposed ANN model is implemented using MATLAB. Subsequently, the mean squared error (MSE) is calculated using (3) for the training/validation and testing sets (interpolation and extrapolation sets). The minimum errors obtained with the LM-BP algorithm for the two sets, at randomly selected bias and temperature values, are listed in Table 2. Figures 9 and 10 compare the modelled and measured S-parameters, randomly selected from the training/validation and testing sets, at the two extreme ambient temperatures under distinct bias conditions. Excellent agreement between the measurements and simulations is obtained over all considered bias voltages and temperatures. To reiterate, the prediction capability of the model has been evaluated using measured data sets independent of those used for model development (see Table 3). It can thus be inferred from the validation that the developed ANN model provides very good accuracy for both the interpolation and extrapolation cases.

| Evaluation of improved ANN model
The improved model makes use of PSO for the initialisation of optimal weights and biases corresponding to the global minimum. Once again, the same conditions are used to check the model's ability and robustness over the entire frequency range. It can be inferred from the results in Tables 4 and 5 that the MSE for randomly selected ambient temperatures and bias conditions of the ANN model with PSO initialisation is significantly better than the accuracy achieved by the simple ANN model. This enhanced accuracy can be attributed to the improved initial values. The visual depiction in Figures 11 and 12 for the extreme ambient temperatures shows excellent agreement between the modelled and measured S-parameters, demonstrating the effectiveness of PSO in the ANN-based model development process. Figure 13 also shows the model simulation at a fixed active bias condition and different ambient temperatures of 25, 100 and 175 °C, which further validates the extrapolation capability of the model.

| Evaluation of simple SVR model
The proposed SVR model is implemented using MATLAB. The predicted MSE is calculated for the training/validation and testing sets (interpolation and extrapolation sets). The MSEs for these scenarios, for randomly selected temperatures and bias conditions, are given in Tables 6 and 7. Once again, the modelled and measured S-parameters at different bias conditions and extreme ambient temperatures are compared in Figures 14 and 15. Excellent agreement between the measured and modelled values exists for both the imaginary and real parts of the S-parameters, which proves the accuracy, effectiveness and robustness of the developed SVR model.

| Evaluation of improved SVR model
Parameter tuning plays an important role in improving the performance. The improved model utilises PSO to tune the box constraint, epsilon and sigma values of the Gaussian kernel; the final optimised values for these parameters are listed in Table 8. Subsequently, the predicted MSE is calculated for the training/validation and testing sets (interpolation and extrapolation sets). The MSEs for some randomly selected bias and temperature conditions are given in Tables 9 and 10. The corresponding Figures 16 and 17 compare the modelled and measured S-parameters at randomly selected bias conditions and extreme ambient temperatures. Excellent agreement between the modelled and measured values can be observed for the considered bias voltages and temperatures. The careful selection of the hyperparameters by PSO significantly improves the SVR-based model's accuracy, as can be inferred from all the achieved MSEs; the extrapolation capability of the SVR model is also improved by optimising the model parameters using PSO. Figure 18 shows the model simulation at a fixed active bias condition and different ambient temperatures of 25, 100 and 175 °C, which further validates the extrapolation capability of the model.
FIGURE 18 Modelled and measured S-parameters at VGS = −1.5 V, VDS = 48 V, and T = 25, 100 and 175 °C using the particle swarm optimization-support vector regression model

Overall, it can be inferred that the training and interpolation capabilities of the ANN and SVR are almost the same, but the ANN has shown better extrapolation capability than the SVR for this particular problem. In brief, the ANN achieves better prediction capability for this problem. It is well known that the performance of the SVR decreases with increasing training-data size, as more memory is required to store the kernel (Gram) matrix. It is also worth noting that the training time of the SVR is longer than that of the ANN. On the other hand, the ANN model needs to be trained many times until satisfactory results are obtained, which can be attributed to the initial-guess dependency of the LM-BP and its higher chance of getting stuck in a local minimum. The introduction of PSO in the ANN model development, however, effectively solves the initial-guess dependency of the LM-BP. The SVR does not face such a problem; instead, its performance can be greatly improved by careful tuning and optimisation of its hyperparameters using PSO, which substantially improves the overall performance and prediction capability of the SVR-based model. Finally, it is safe to say that the ANN model with PSO-based initialisation performs best among the models proposed in this paper, owing to its shorter training time and improved accuracy.

| CONCLUSION
Here, the development of temperature-dependent ANN- and SVR-based models for a GaN-on-Si HEMT has been reported. The models have been trained on a large set of operating conditions and later tested on two independent sets. First, a simple ANN topology has been used to model the device efficiently and then tested using two independent sets to validate its interpolation and extrapolation capabilities. Subsequently, SVR has been used to develop an alternative modelling technique for the same device; it makes use of the SMO solver to train the model for a speedy model development process. It was identified that the SVR-based technique is more robust, and its training need not be repeated, in contrast to the ANN-based technique. It is also worth noting that the SVR takes more training time than the ANN in the case of a large dataset, which can be attributed to the extra computation and the additional memory required to store the kernel matrix. Most importantly, the extrapolation capability of the SVR model is inferior, whereas the ANN model shows very good prediction capability for both in-range and out-of-range data. The use of the PSO algorithm in the model development process improves the performance of both the ANN- and SVR-based models. In summary, the ANN model with PSO-based initialisation outperforms the other models in terms of prediction capability, training time and accuracy.