Neural Compact Modeling Framework for Flexible Model Parameter Selection with High Accuracy and Fast SPICE Simulation

Neural compact models are proposed to simplify device-modeling processes without requiring domain expertise. However, existing models have certain limitations. Specifically, some models are not parameterized, while others compromise accuracy and speed, which limits their usefulness in multi-device applications and reduces the quality of circuit simulations. To address these drawbacks, a neural compact modeling framework with a flexible selection of technology-based model parameters using a two-stage neural network (NN) architecture is proposed. The proposed neural compact model comprises two NN components: one utilizes model parameters to program the other, which can then describe the current–voltage (I–V) characteristics of the device. Unlike previous neural compact models, this two-stage network structure enables high accuracy and fast simulation program with integrated circuit emphasis (SPICE) simulation without any trade-off. The I–V characteristics of 1000 amorphous indium–gallium–zinc-oxide thin-film transistor devices with different properties, obtained through fully calibrated technology computer-aided design simulations, are used to train and test the model, and a highly precise neural compact model with an average I DS error of 0.27% and R 2 values above 0.995 for the DC characteristics is obtained. Moreover, the proposed framework outperforms previous neural compact modeling methods in terms of SPICE simulation speed, training speed, and accuracy.


Introduction
An essential component for connecting device technology development and circuit design is a compact model that enables the circuit simulation of a device by capturing its behavior. [3][4][5] Commonly used industry-standard compact models, such as the Berkeley short-channel IGFET model (BSIM), [6,7] employ physical equations and parameters to reproduce device operations. [10,11] However, new device technologies are not well reflected in these models, so circuit designers cannot provide effective feedback on the device technology. Moreover, extracting physical model parameters from devices is time consuming and requires domain expertise for each model type, delaying device and circuit development. [12] As an alternative to physics-based models, neural compact models that utilize neural networks (NNs) to reproduce device behaviors without requiring domain expertise have been proposed. [13] However, most research on neural compact models has focused on modeling a single device without parameters to adjust model behavior, [14] limiting the applicability of these methods to design-technology co-optimization (DTCO) because they cannot capture the variations in different devices under different conditions. Previous studies have suggested neural compact models that can describe multiple devices using model parameters, [15] but they require a trade-off between simulation program with integrated circuit emphasis (SPICE) simulation speed and accuracy, considering that both metrics depend on network size.
Therefore, we introduce a novel framework for neural compact modeling that allows a flexible selection of model parameters related to device technology while achieving both high speed and high accuracy. The neural compact model of the framework comprises two NN components, each playing a distinct role. The main network predicts the drain current (I DS ) based on the gate-source voltage (V GS ) and drain-source voltage (V DS ),
whereas the parameter generation network (PGN) adjusts the main network according to the model parameter values. This two-stage network structure allows for high accuracy and high SPICE simulation speed without sacrificing one for the other. We applied this framework to model the current-voltage (I-V ) characteristics of multiple amorphous indium-gallium-zinc-oxide (a-IGZO) thin-film transistor (TFT) devices, with the results accurately matching the original technology computer-aided design (TCAD) simulation data for the DC characteristics and drain current values. Moreover, compared to conventional methods, the proposed framework achieved superior accuracy, SPICE simulation speed, and training speed.

Neural Compact Modeling Framework

Dataset Generation Using TCAD Simulation
To test the neural compact modeling framework, we used a top-gate coplanar a-IGZO TFT, with the device simulated using Silvaco TCAD. Figure 1a,b shows the device structure and sub-gap density of states (DOS) characteristics of a-IGZO. The device has an a-IGZO channel on top of a SiO 2 buffer oxide layer, with a SiO 2 gate oxide and a molybdenum (Mo) gate electrode, and has a gate length and width of 8 and 5 μm, respectively. We used previously published findings [16] to establish the sub-gap DOS model for a-IGZO. The simulated device was calibrated to the I-V characteristics provided by the industry, with the results shown in Figure 2.
The training and test datasets for the neural compact model were generated using TCAD simulation. First, the device properties that affect the I-V characteristics were selected as model parameters: the gate metal work function, the device structure (L g , T ox , T IGZO ), the a-IGZO doping concentrations (N ch , N s/d ), and the a-IGZO sub-gap DOS (N ta , W ta , E gd ). Next, we randomly generated 1000 model parameter sets using a uniform distribution. Table 1 lists the boundaries of the uniform distribution of each model parameter; these boundaries delimit the region of device properties that the model aims to describe. The range of model parameters was constrained so that the electrical characteristics of the simulated devices do not deviate too much from those of the calibrated device, preventing the learning range of the NN from becoming excessively wide. Finally, we simulated each device corresponding to the model parameter sets to collect the I-V data, sweeping V GS from −2 to 5 V and V DS from 0 to 5 V. Figure 3 shows the transfer curves of the 1000 simulated devices along with the curve of the calibrated device, which was chosen as the baseline device. The neural compact model was trained and tested using the generated dataset and the I-V data from the baseline device.
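For illustration, the random sampling of parameter sets can be sketched in Python as follows. The parameter names and bounds shown are placeholders, since the actual model parameters and ranges are those listed in Table 1.

```python
import random

# Hypothetical parameter names and bounds for illustration only; the
# actual model parameters and ranges are listed in Table 1.
PARAM_BOUNDS = {
    "workfunction_eV": (4.5, 5.1),
    "Lg_um": (6.0, 10.0),
    "Tox_nm": (80.0, 120.0),
    "Nch_cm3": (1e15, 1e17),
}

def sample_parameter_sets(n, bounds, seed=0):
    """Draw n model-parameter sets from independent uniform distributions."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(n)
    ]

# 1000 parameter sets, as in the dataset described above.
parameter_sets = sample_parameter_sets(1000, PARAM_BOUNDS)
```

Each sampled set would then drive one TCAD simulation to produce the corresponding I-V data.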

Neural Network Structure and Training Algorithm
The NN structure of the proposed model, as shown in Figure 4a, comprises two components: the PGN and the main network. The main network consists of two networks connected in series: a bias modification network (BMN) and a current prediction network (CPN). The PGN, a key feature that differentiates this model from conventional models (see Figure 4b), takes a model parameter set as input and uses it to modify or create the network parameters (e.g., weights and offsets) of the main network. The PGN is composed of two parts: PGN1 and PGN2. PGN1 generates the network parameters of the BMN, while PGN2 generates the network parameters of the CPN. The outputs of the PGN are arrays with the same shapes as the network parameters of the BMN and CPN, respectively, and these arrays are used directly as those network parameters. Because the PGN uses the model parameters as input to generate the network parameters, the main network can match the I-V characteristic of each device. Once the network parameters are generated and loaded into the main network, as shown in Figure 4c, the PGN is no longer needed for I DS inference and can be removed to reduce the computational cost, as shown in Figure 4d. The PGN structure of the proposed model allows the NN to receive the TCAD parameters and the bias input in separate stages, which improves the model's accuracy and simulation speed compared to those of conventional methods. The BMN takes the initial bias (V GS , V DS ) as input and outputs bias modification values (ΔV GS , ΔV DS ). The modified bias (V GS *, V DS *), obtained by adding the bias modification values to the initial bias, is used as input to the CPN, which maps the bias to the corresponding I DS value. We use the same conversion function (1) as in ref. [12], where x is the input value of the conversion function. The conversion function limits the output range of the NN and fixes I DS to 0 when V DS is 0.
The CPN and BMN form the main network and work together to improve the accuracy in matching the I-V characteristic of a device. The CPN adjusts the overall shape of the I-V curve, while the BMN adjusts fine-grained details of the input bias, providing the CPN with additional flexibility.
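The two-stage inference of the main network can be sketched as follows. The toy BMN and CPN used here are placeholders for the actual MLPs whose parameters are generated by the PGN; only the data flow (bias modification followed by current prediction) matches the description above.

```python
def main_network_forward(v_gs, v_ds, bmn, cpn):
    """Two-stage main-network inference: the BMN refines the input bias,
    and the CPN maps the modified bias to a drain current."""
    d_gs, d_ds = bmn(v_gs, v_ds)      # bias modification values (dV_GS, dV_DS)
    v_gs_star = v_gs + d_gs           # modified bias V_GS*
    v_ds_star = v_ds + d_ds           # modified bias V_DS*
    return cpn(v_gs_star, v_ds_star)  # predicted I_DS

# Placeholder networks for illustration only.
toy_bmn = lambda vg, vd: (0.1, 0.0)
toy_cpn = lambda vg, vd: 1e-6 * max(vg, 0.0) * vd
```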
The main network components and the PGN use a multilayer perceptron (MLP) structure. The PGN has a larger network size than the main network and consists of multiple MLPs that generate the network parameters for each main network layer. Specifically, the BMN has two hidden layers with four nodes each, while the CPN has two hidden layers with six nodes each. PGN1 and PGN2 are each composed of three MLPs with two hidden layers and 30 nodes per layer. The activation function used in the BMN and CPN, which comprise the main network, is the sigmoid function, and the activation function used in the PGN is the ReLU function.
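Given the layer widths above, the sizes of the parameter arrays that the PGN must generate can be checked with a short helper. This assumes the BMN maps two bias inputs to two modification values (2-4-4-2) and the CPN maps two inputs to a single I DS output (2-6-6-1), as described in the text.

```python
def mlp_param_count(layer_sizes):
    """Total number of weights and offsets in a fully connected MLP:
    each layer contributes in_dim * out_dim weights plus out_dim offsets."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

bmn_size = mlp_param_count([2, 4, 4, 2])  # parameter array generated by PGN1
cpn_size = mlp_param_count([2, 6, 6, 1])  # parameter array generated by PGN2
```

These small totals (tens of parameters) illustrate why the main network evaluates quickly once the much larger PGN has been detached.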
Figure 5 shows the training algorithm of the proposed framework, which consists of two steps: extraction of the base parameters and training of the PGN. We used the PyTorch framework and Adam optimization for both steps to train the model. In the first step, the CPN is trained using the I-V data of the baseline device, that is, the calibrated device. After the CPN is trained, its network parameters describe the I-V characteristic of the baseline device; these parameters are then extracted and used as base parameters. In the second step, the main network (BMN and CPN) is connected to the PGN in the following configuration: the extracted base parameters are adjusted by the output of PGN2 to form the network parameters of the CPN, while the output of PGN1 is used directly as the network parameters of the BMN. With PGN1 and PGN2 connected to the BMN and CPN in this way, the main network predicts the I DS value from the V GS and V DS inputs using the provided network parameters, and the loss is calculated by comparing the predicted value to the ground-truth value. This loss is backpropagated through the main network to the PGN, and the network parameters of the PGN are updated so that the PGN generates the network parameters of the main network more accurately. The data of the aforementioned 1000 parameter sets were divided evenly for training and testing, resulting in training and testing datasets representing 500 devices each.
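The second-step wiring can be sketched as follows. An additive adjustment of the base parameters by the PGN2 output is assumed here for illustration, as the exact combination rule is not specified in the text.

```python
def assemble_main_network_params(base_cpn_params, pgn1_out, pgn2_out):
    """Step-two wiring: PGN1's output is used directly as the BMN
    parameters, while PGN2's output adjusts the extracted base parameters
    to form the CPN parameters (additive adjustment assumed)."""
    bmn_params = list(pgn1_out)
    cpn_params = [base + delta for base, delta in zip(base_cpn_params, pgn2_out)]
    return bmn_params, cpn_params
```

Because the base parameters already describe the baseline device, the PGN2 adjustments can stay small, which is what makes training converge faster than learning from scratch.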
We combine the losses of I DS , log 10 I DS , g m , and g DS using hyperparameters, as shown in Equation (2), to construct the loss function for training. Each component of the loss function was calculated using the symmetric mean absolute percentage error (SMAPE) (Equation (3)), a measure of accuracy based on the percentage of errors. We used the SMAPE loss because it is a ratio-based error metric that can accurately model not only the high I DS values in the strong inversion region but also the low I DS values in the subthreshold region.
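Since Equations (2) and (3) are not reproduced here, the following sketch assumes the standard SMAPE definition and a simple weighted sum of the four loss terms.

```python
def smape(pred, true):
    """Symmetric mean absolute percentage error in percent; the standard
    definition is assumed here in place of the paper's Equation (3)."""
    terms = [
        abs(p - t) / ((abs(p) + abs(t)) / 2.0) if (p != 0.0 or t != 0.0) else 0.0
        for p, t in zip(pred, true)
    ]
    return 100.0 * sum(terms) / len(terms)

def total_loss(pred, true, a=1.0, b=1.0, c=1.0, d=1.0):
    """Weighted combination of the four SMAPE terms, in the spirit of
    Equation (2); pred and true each hold the (I_DS, log10 I_DS, g_m,
    g_DS) curves, and a-d are the balancing hyperparameters."""
    ids_p, log_p, gm_p, gds_p = pred
    ids_t, log_t, gm_t, gds_t = true
    return (a * smape(ids_p, ids_t) + b * smape(log_p, log_t)
            + c * smape(gm_p, gm_t) + d * smape(gds_p, gds_t))
```

Because SMAPE normalizes each error by the local magnitude, subthreshold currents of order 1e-12 A contribute on the same scale as on-currents of order 1e-4 A.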
To balance the loss terms, we iteratively trained the model and adjusted the hyperparameters a–d. We lowered the weight of a loss term if it was large compared to the other terms, preventing the learning process from focusing only on reducing that term. By adjusting the hyperparameters, we prevented any individual loss component from becoming overly dominant in the total loss. The loss function was then used for the two training steps illustrated in Figure 5.
During the training process, the training loss is backpropagated through the main network to the PGN because the main network parameters are completely determined by the PGN.As a result, the network parameters of the PGN are updated based on the propagated loss, as illustrated in Figure 4e.

Accuracy of the Proposed Neural Compact Model
To validate the accuracy of the proposed neural compact model, we compared DC characteristics, including threshold voltage (V t ), subthreshold swing (SS), on current (I on ), and off current (I off ), obtained from the neural compact model and TCAD simulation data; a comparison of the results obtained is provided in Figure 6.
The histograms confirm that the DC characteristics of the model closely track those of the actual data. Furthermore, both the scatter plot and its contour plot exhibit concurrence in the distributions, indicating that the model's results align well with the TCAD simulation data in the correlation and density of the DC characteristics. Figure 7 displays the R 2 plots and error histograms for the DC characteristics of the neural compact model and the TCAD simulation data. R 2 values exceeding 0.995 were obtained from the model, and the error histograms exhibit narrow distributions and low mean absolute error (MAE) values; for example, V t has a low MAE of only 1.83 mV. These results suggest that the model achieves high precision and closely matches the actual DC characteristic data. Table 2 lists the mean absolute percentage errors (MAPEs) of the I-V characteristics predicted by the neural compact model on the test dataset. The model accurately predicted I DS , with an error of only 0.27% on the linear scale and 0.035% on the log scale, while also capturing the derivatives of the I DS curve, g m and g DS , with errors of less than 1%.
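The reported error metrics can be reproduced in principle with the standard MAPE and R 2 definitions, sketched below.

```python
def mape(pred, true):
    """Mean absolute percentage error in percent
    (reference values must be nonzero)."""
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)

def r_squared(pred, true):
    """Coefficient of determination between predicted and reference values:
    1 - (residual sum of squares) / (total sum of squares)."""
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot
```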
Figure 8 compares the I-V characteristics from the neural compact model to the TCAD simulation data of a sample device. The dotted lines represent the CPN model with the base parameters, which reproduces the baseline device behavior. During training, the PGN and BMN modify this CPN model's network parameters and bias to align with the TCAD data, resulting in the trained model, represented by the solid lines, matching the TCAD data well. Figure 8a shows the accurate matching of the transfer curve in both the subthreshold and strong inversion regions because the loss function includes the log and linear losses of I DS , which trains the PGN to fit the curve in both regions. The g m curve shown in Figure 8b is smooth and consistent with the TCAD simulation data, achieved by incorporating the g m loss component in the loss function in addition to the I DS loss. Figure 8c,d shows the predicted I DS and g DS versus V DS curves of the same device sample; these curves also match the TCAD simulation data well.

Effects of BMN and CPN on Model Performance and SPICE Simulation Speed
This section examines the influence of the BMN and CPN on model performance and SPICE simulation speed. First, to evaluate the effect of the BMN on accuracy, we compared the MAEs of the DC characteristics between models trained with and without the BMN. As shown in Figure 9, the model with the BMN effectively lowered the errors for all DC characteristics, with the largest improvements observed in SS and I off (52% and 19%, respectively).
We then compared the SPICE simulation speeds of the neural compact model with and without the BMN. We implemented each model in Verilog-A [17] and ran SPICE simulations to predict the I DS value one million times for each of the two cases on a PC with an Intel Core i7-9700 CPU (3 GHz). The model with the BMN showed a 12% longer simulation time than the model without it, taking 31 and 28 s, respectively. This result was expected because the BMN requires additional NN layers. Although the simulation time increases when using the BMN, the difference can be considered insignificant because the proposed method, with or without the BMN, consumes relatively little time compared to simulations using conventional NN-based models, as will be discussed in Section 3.3. Consequently, we conclude that incorporating the BMN enhances the model's accuracy with only a minor compromise in simulation speed.
Next, we analyzed how the BMN and CPN act to match the transfer curve to the actual data. Figure 10 shows the transfer curves of three types of models (Base CPN, PGN-Base, and PGN-PGN) and the TCAD data. We first compared Base CPN to PGN-Base to determine the effects of the BMN on the transfer curve. The output of the BMN, ΔV GS , shifted the Base CPN curve by 0.19 V in the subthreshold region, resulting in the PGN-Base curve, which fits the TCAD data well on the log scale. In addition, the BMN connected the subthreshold and strong inversion regions in alignment with the TCAD data by reducing the value of ΔV GS in the transition region. Therefore, the BMN mainly matches the transfer curve in the subthreshold and transition regions. Next, to determine the effects of the CPN on the transfer curve, we compared PGN-Base to PGN-PGN. As shown in Figure 10, PGN-Base has some mismatch with the TCAD data, especially in the strong inversion region and the subthreshold slope. In contrast, PGN-PGN, using the CPN parameters from PGN2, matches the TCAD data in all regions, implying that the PGN controls the CPN to adjust the curve in the strong inversion region and fine-tune the subthreshold slope. Consequently, the PGN programs the BMN and CPN to capture the characteristics of the transfer curve in different regions and achieve good agreement with the TCAD data.
To further investigate the roles of the BMN and CPN, we conducted another experiment using the PGN-PGN model without the BMN and used the model to extract the DC characteristics, as shown in Figure 11. The I on distribution agrees well with the TCAD data, even without the BMN. In contrast, V t , I off , and SS deviate from the TCAD data, and V t has a particularly narrow spread compared with the TCAD data distribution, indicating that the CPN mainly matches the strong inversion region, whereas the BMN primarily matches V t by shifting the subthreshold region. This result also suggests that the CPN and BMN jointly improve the matching of SS and I off by fine-tuning the SS and the V GS modifications.

Comparison with the Conventional Method
This section compares the accuracy, simulation speed, and training speed of the proposed and conventional methods for neural compact modeling.
Figure 12 shows the training losses of the proposed and conventional methods. Compared to the conventional method, the proposed method achieves faster loss convergence and lower loss because it uses the base parameters from the baseline device as a reference for the I-V characteristics. The proposed model separates the processing stages for the model parameters and the bias, ensuring that the base parameters can be utilized. As a result, the PGN only needs to make small corrections to the base parameters, which improves the training efficiency. In contrast, the conventional method must train the model from scratch without this reference because it inputs both the model parameters and the bias to the network simultaneously.
Figure 13 compares the DC characteristic errors and SPICE simulation times of the proposed and conventional methods. The conventional method (Figure 4b) was implemented using different hidden layer sizes and numbers. The models were trained for the same duration with the optimal learning rate, and we used the same SPICE simulation test shown in Figure 9 to measure the simulation times. Figure 13 shows that the (50, 50) network has smaller errors but a longer simulation time than the (20, 20) network, demonstrating the speed-accuracy trade-off in conventional methods, where larger and deeper networks improve accuracy but decrease simulation speed. In contrast, the proposed method achieved both the highest speed and the lowest error, avoiding this trade-off because the model has a constant and short simulation time regardless of its accuracy. Specifically, using a PGN to handle the time-consuming operations that determine the accuracy, together with a small-sized main network, reduces the simulation time. As a result, the accuracy of the model can be optimized independently of the simulation time because the PGN can be detached from the main network.

Conclusion
In this article, we propose a neural compact modeling framework that can employ model parameters related to device technology based on a two-stage network structure. The framework was applied to model the I-V characteristics of a-IGZO TFT devices. A comparison of the trained model with the original TCAD data showed high R 2 scores for the DC characteristics, exceeding 0.995, with only a 0.27% MAPE for the I DS value. Furthermore, a comparative study with conventional methods demonstrated that the proposed framework outperforms them in terms of accuracy, SPICE simulation speed, and training speed. The proposed framework has the potential to be extended to model other aspects of device operation, such as AC and radio frequency (RF) characteristics, and can be applied to emerging devices with device-specific model parameters.

Figure 1. a) Structure of the a-IGZO TFT simulated with TCAD. b) Calibrated a-IGZO sub-gap DOS.

Figure 2. I DS -V GS calibration results of the a-IGZO TFT against data provided by LG Display.

Figure 3. Transfer curves of the 1000 simulated devices. Each red line corresponds to one of the simulated devices.

Figure 4. a) NN structure of the proposed model. b) NN structure in previous studies. [12,15] c) Main network parameter generation using the PGN. d) PGN removal after network parameter generation. e) Loss backpropagation during training.

Figure 5. Training algorithm of the proposed framework.

Figure 6. Comparison of DC characteristics between the proposed neural compact model and TCAD simulation data. The test dataset of 500 devices was used. Red markers and lines correspond to the NN prediction results, and black markers and lines correspond to the TCAD simulation data.

Figure 7. R 2 plots and error histograms of the DC characteristics. MAE was calculated using the test dataset.

Figure 8. Current-voltage (I-V ) characteristics of the trained model (solid lines) and the CPN with base parameters (dotted lines) versus TCAD data of a sample device (symbols): a) transfer curve, b) g m curve, c) output curve, and d) g DS curve.

Figure 9. Comparison of DC characteristic errors and simulation times between models trained with and without the BMN. The left side of the figure illustrates the structure of each model.

Figure 10. Transfer curves of the three types of models and the V GS modification curve. V DS is 1 V for all curves. The left side of the figure shows the network structure of the models.

Figure 11. Comparison of DC characteristics between the PGN-PGN model without the BMN and TCAD simulation data. The test dataset of 500 devices was used.

Figure 12. Training speed comparison of the proposed method and conventional methods with different hidden layer configurations. The size and number of hidden layers for each conventional implementation are indicated in parentheses.

Table 1. Model parameter types and ranges of the uniform distributions used for data generation.

Table 2. Symmetric mean absolute percentage errors of the I-V characteristics.