Recycled integrated circuit detection using reliability analysis and machine learning algorithms

Udaya Shankar S., Department of Electronics & Communication Engineering, PSG College of Technology, Coimbatore, India. Email: rajkrish18392@gmail.com

Abstract

The use of counterfeit integrated circuits (ICs) in electronic products decreases their quality and lifetime. Recycled ICs can be detected by aging analysis. Aging is modelled through reliability analysis that includes the effects of hot carrier injection (HCI) and bias temperature instability (BTI). In this work, three machine learning methods are used to detect recycled ICs aged for a short period (1 day) with a minimal data size: K-means clustering, back propagation neural network (BPNN) and support vector machine (SVM). This work also distinguishes degradation due to process variations from degradation due to reliability effects. Reliability and Monte Carlo simulations are performed with the Cadence Virtuoso tool on the benchmark circuits c17, s27 and b02 and on a fully differential folded-cascode amplifier. Parameters such as minimum voltage, delay, supply current, gain, phase margin and bandwidth are measured. The machine learning methods are implemented in MATLAB to train on and classify these parameters. The results show a classification rate of 100% for the benchmark circuits. Using the proposed method with BPNN, K-means clustering and SVM, a recycled (used) IC is detected even if it has been used for only 1 day.


| INTRODUCTION
Counterfeit integrated circuits (ICs) are a major problem in the electronic supply chain because of their reliability and security issues. They affect many devices used in application fields such as biomedical, telecommunications, automotive, consumer electronics and defence. Counterfeit parts take various forms, such as recycled, overproduced, remarked, cloned, out-of-spec and defective parts. Recycled ICs have degraded performance due to aging, and reliability effects further degrade device performance over time.
Degradation in the performance of the IC is impacted by a variety of working conditions such as bias voltage, temperature and workload. There are various reliability issues such as bias temperature instability (BTI), hot carrier injection (HCI), electromigration and time-dependent dielectric breakdown (TDDB) in ICs. These conditions change over a period of time due to the aging of devices. Several methods have been discussed in the literature to detect a recycled IC based on different operating conditions and parameters.
Parameters are measured from ICs provided by trusted and untrusted vendors [12]. A one-class SVM [12,13] uses its decision function to classify ICs as brand new or recycled. The SVM is trained to detect counterfeit ICs using the degradation caused by NBTI in devices at different time periods. This method is not suited to detecting recycled devices that have been aged for only a short period.
Early failure rate analysis is used to collect parameters from fresh and recycled devices [16]. A one-class SVM classifier (OCC) and degradation curve sensitivity analysis (DCSA) are used to classify the parameters. OCC is sensitive to process parametric variations. If the process variation is large, the recycled IC falls within the boundary of the fresh class, and stress has to be applied to the IC for a long time to push it beyond that boundary. In DCSA, parametric measurements with a similar degradation rate for fresh and recycled ICs are selected to improve the detection of recycled ICs. This ML classification is verified with industrial benchmark circuits and an analog folded-cascode amplifier circuit. In [16], the accuracy of OCC is affected by process variations in the parameters or features. In DCSA the parameters are independent of process variations, but DCSA could only detect recycled ICs aged more than 1 month.
BPNN and the Elman NN [27] are used to predict the aging of a CMOS low-noise amplifier (LNA) due to HCI and NBTI. The prediction model is a multistep-ahead prediction model. The parameters are obtained from the LNA in the measurement and stress stages, and the S-parameters are used for degradation analysis. Noise figure (NF), supply voltage and third-order intercept point (IIP3) are not suitable for degradation analysis because they vary very little with aging. Three prediction models are developed for training the NN: 6-step-ahead, 9-step-ahead and 12-step-ahead. The Elman NN consumes more time than BPNN [27]. Parameters with only minor variation due to aging are not preferred for prediction. As a result, only certain parameters can be used to predict aging in the circuit.
In [24], an aging prediction model based on multivariate adaptive regression splines (MARS) is trained on time-variant working-condition parameters such as temperature and workload activity. A process parametric variation compensation factor is determined in two ways: by testing newly manufactured devices, and from corner simulations for the worst and best cases. This factor compensates for the impact of aging due to HCI and NBTI on process variation for a certain period of time. The delay on the critical paths indicates the aging. The accuracy of the prediction model is evaluated with the root-mean-square error (RMSE). MARS has a lower RMSE than other prediction models such as SVM and RNN [24].
In the proposed method, reliability analysis (RA) with NBTI and HCI is used to predict the age of the device at the initial design stage. Recycled ICs are then detected by classifying the resulting data with machine learning (ML) techniques. The proposed work primarily considers the effects of NBTI and HCI. Three ML methods, namely K-means clustering, BPNN and the SVM classifier, use parametric measurements to distinguish recycled or aged ICs from new ICs. ML approaches provide a recycled-IC detection method with low area overhead, low cost and low power, and they require only input data samples for training. NBTI and HCI make the parameters of an aged IC differ from those of a fresh IC. The ML methods are trained on parameters measured at different time instants and provide good classification between fresh and aged ICs. Section 2 describes the background of reliability effects in transistors, such as BTI and HCI. Section 3 discusses the ML methods: K-means clustering, BPNN and SVM. Section 4 explains the proposed method, covering simulation, data sample collection and classification with the K-means clustering, BPNN and two-class SVM models. Section 5 compares the performance of the three models.

| AGING EFFECTS IN TRANSISTORS
With the scaling of CMOS technology into the nanometre regime, reliability has become a significant issue. Parameter variability, faults and soft errors make devices unreliable at different technology nodes. Parameter variability causes fabricated devices to have characteristics that differ from those of the intended design. As a result, the transistors or gates of the designed circuit operate with parametric variations after fabrication.
During chip operation, the characteristics of the transistors or gates degrade over time. These parameter variations arise from sources such as temperature and voltage variations, which depend on the input workload activity, frequency and operating time. Hence, different gates of the IC develop different properties at different time instants during its operational lifetime.
Wear-out effects cause transistor aging due to runtime variations. These effects increase the threshold voltage ($V_{th}$) of the transistors and the switching delays of the gates, which can lead to timing failures of the IC. In this study, aging analysis is performed through two reliability issues, namely BTI and HCI.

| Bias temperature instability
Under DC stress, BTI [28,29] gradually shifts several transistor properties: the threshold voltage ($V_{th}$), the carrier mobility ($\mu$), the transconductance ($g_m$) and the linear and saturation drain currents. This leads to an increase in circuit delay. BTI is classified into negative BTI (NBTI) [30,31] and positive BTI (PBTI) [29]. The main mechanisms of BTI are hole trapping and interface trap generation.
The TD model in [29] physically captures the features of the stress and recovery phases. In this model, the threshold voltage ($V_{th}$) of the transistor increases logarithmically with stress time. The overall dynamic BTI behaviour is shown in Figure 1. Suppose a single PMOS or NMOS transistor is turned on at time $t = 0$, when the stress period starts, and that no DC voltage stress has been applied before. The model in [29] gives two quantities: the increase in threshold voltage $\Delta V_{th}$ up to the stress time $t_{st}$, and the total shift $\Delta V_{th}(t_{st} + t_{rec})$ after a subsequent recovery interval $t_{rec}$. In these expressions, $A$, $B$ and $C$ are constants for a particular technology node, $K$ is a fitting parameter, $T$ is the temperature, $k$ is Boltzmann's constant, $t_{ox}$ is the oxide thickness, $E$ is the activation energy, and $V_{dds}$ and $V_{ddr}$ are the supply voltages under stress and recovery.
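The logarithmic growth described above can be illustrated numerically. The sketch below is not the TD model of [29], whose equations are not reproduced in this text. It assumes a generic form $\Delta V_{th}(t_{st}) = K \ln(1 + C\,t_{st})$ with made-up constants `K_VOLTS` and `C_PER_SECOND`, and it ignores the recovery phase.

```python
import math

# Hypothetical illustration constants, NOT fitted values from [29].
K_VOLTS = 1e-3        # assumed fitting parameter K (V)
C_PER_SECOND = 1e4    # assumed rate constant C (1/s)

def delta_vth_stress(t_st_seconds):
    """Assumed logarithmic BTI shift under continuous DC stress (no recovery)."""
    if t_st_seconds < 0:
        raise ValueError("stress time must be non-negative")
    return K_VOLTS * math.log1p(C_PER_SECOND * t_st_seconds)

DAY = 24 * 3600.0
for label, t in [("1 day", DAY), ("1 year", 365 * DAY), ("10 years", 3650 * DAY)]:
    print(f"{label:>8}: dVth = {delta_vth_stress(t) * 1e3:.2f} mV")
```

Under this assumed form, most of the shift accumulates early, so the 1-day shift is already a sizeable fraction of the 10-year shift. The real ratio depends on the actual model and its constants.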

| Hot carrier injection
When the transistor is in saturation mode, some carriers in the high electric field gain enough energy to overcome the barrier between the channel and the gate oxide. They also collide with atoms in the pinch-off region and produce electron-hole pairs through impact ionization. These carriers are injected into and trapped in the gate oxide. The resulting interface traps degrade device characteristics such as threshold voltage and drain current. The HCI effect [32] occurs mainly in NMOS transistors; its effect in PMOS transistors is negligible. Hot carriers are generated during signal transitions, so the threshold voltage degradation due to HCI depends on the switching activity of the input signal. This dependence is described by the model in [32], in which $A_{HCI}$ is a technology-dependent constant, $f$ is the clock frequency, $SW$ is the switching activity factor, $t_{ox}$ is the oxide thickness, $V_{th}$ is the threshold voltage, $V_{GS}$ is the gate-source voltage of the transistor, $E_1$ is a constant equal to 0.8 V/nm [32] and $t$ is the total time. The HCI effect also depends on the temperature of the device, so the expression for the change in threshold voltage is modified to account for temperature.
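The symbols listed above are consistent with a compact HCI form of the type $\Delta V_{th} = A_{HCI} \cdot SW \cdot f \cdot \exp\!\big((V_{GS} - V_{th})/(t_{ox} E_1)\big) \cdot t^{0.5}$. The sketch below assumes this form for illustration only; it may differ from the exact expression in [32]. The default `a_hci` is a hypothetical constant, and the temperature correction is omitted.

```python
import math

E1_V_PER_NM = 0.8   # E_1 = 0.8 V/nm, as stated in the text

def delta_vth_hci(t_s, sw, f_hz, vgs, vth, tox_nm, a_hci=1e-14):
    """Assumed compact HCI threshold-voltage shift (illustrative only).

    Form assumed here: A_HCI * SW * f * exp((VGS - Vth) / (tox * E1)) * sqrt(t).
    a_hci is a hypothetical technology constant; no temperature term is applied.
    """
    oxide_field = (vgs - vth) / tox_nm          # field estimate in V/nm
    return a_hci * sw * f_hz * math.exp(oxide_field / E1_V_PER_NM) * math.sqrt(t_s)

DAY = 24 * 3600.0
base = dict(sw=0.2, f_hz=1e9, vgs=1.0, vth=0.3, tox_nm=2.0)   # made-up operating point
print("1 day:   ", delta_vth_hci(DAY, **base), "V")
print("10 years:", delta_vth_hci(3650 * DAY, **base), "V")
```

Under this assumed form, the shift is proportional to the switching activity and clock frequency and grows with the square root of time. This is why the text ties HCI degradation to the workload of the input signal.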

| MACHINE LEARNING METHODS
ML systematically applies different algorithms to synthesize the essential relationships in the data. ML systems can be trained to classify changing process conditions in order to model variations in operating behaviour. The main forms of ML are supervised, unsupervised, semi-supervised and reinforcement learning. The ML process involves steps such as input data collection, data preparation, training, evaluation of the results and tuning. In the proposed work, three ML methods are used, namely K-means clustering, BPNN and SVM.

| K-means clustering
K-means clustering [33,34] is one of the most widely used clustering algorithms. It starts by randomly initializing the centroids. Each data point is assigned to the closest centroid based on the squared Euclidean distance. After the K clusters are formed, the centroid of each cluster is updated. The centroids keep moving until every data point is assigned to the closest centroid. For a given dataset $D = \{X_1, X_2, \ldots, X_n\}$ of $n$ data points, the K-means clustering algorithm yields $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$. The objective function, or sum of squared errors (SSE), for K-means clustering is given in Equation (7),

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{X_j \in C_i} \lVert X_j - c_i \rVert^2 \tag{7}$$

where $c_i$ is the centroid of cluster $C_i$. The main goal of K-means clustering is to form the K clusters with the minimum SSE [35]. The steps of the K-means clustering algorithm [36,37] are shown in Figure 2. The K-means algorithm is validated by measuring the silhouette value (SV). The SV measures how close each data point is to the other points of its own cluster compared with the points of other clusters; a large SV indicates good clustering.

| Back propagation neural network

The BPNN model is shown in Figure 3. BPNN has two stages: a forward pass (propagation) and a backward pass (propagation). In the forward propagation, the outputs are computed and compared with the desired output values, and the errors are calculated from the desired and actual outputs. In the backward propagation, the error is used to adjust the weights of the NN so as to minimize the error. Let $X_i$ be the input to the input layer, $y_j$ the output of the hidden layer and $Z_l$ the output of the output layer. Let $w_{ji}$ be the weight between the input and hidden layers and $v_{lj}$ the weight between the hidden and output layers. The expected output value is $t_l$ and $f(\cdot)$ is the activation function.
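The computational formula of the model [40] is as follows. In the forward propagation, the output of the hidden layer is

$$y_j = f\Big(\sum_i w_{ji} X_i\Big)$$

and the output of the output layer is

$$Z_l = f\Big(\sum_j v_{lj} y_j\Big)$$

In the back propagation, the gradient descent method is used to update the weights of all layers so as to reduce the error. The changes in the weights $v_{lj}$ and $w_{ji}$ are

$$\Delta v_{lj} = \eta' \delta_l y_j, \qquad \delta_l = (t_l - Z_l)\, f'\Big(\sum_j v_{lj} y_j\Big)$$

$$\Delta w_{ji} = \eta' \delta'_j X_i, \qquad \delta'_j = f'\Big(\sum_i w_{ji} X_i\Big) \sum_l \delta_l v_{lj}$$

where $\eta'$ is the learning rate of the BPNN and $\sum_l \delta_l v_{lj}$ is the error in the hidden layer. The output error $\delta_l$ of $Z_l$ is propagated back through the weights $v_{lj}$ to $y_j$, where it becomes the hidden-layer error $\delta'_j$.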
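The forward pass and weight updates above can be sketched in pure Python for a single output neuron. The sketch makes several assumptions that the text does not specify: a sigmoid activation (so $f'(a) = f(a)(1 - f(a))$), online updates, a single learning rate $\eta'$ for both layers, and a constant bias input of 1 appended to the input and hidden layers. It uses one hidden layer, as in the equations, rather than the two hidden layers used later in the paper. The training points are made up.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class TinyBPNN:
    """One-hidden-layer BPNN with one output neuron (illustrative sketch)."""

    def __init__(self, n_in, n_hidden, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr  # learning rate eta'
        # w[j][i]: weight from input i to hidden neuron j (last i is the bias input)
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # v[j]: weight from hidden neuron j to the output (last j is the bias unit)
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]                                   # X_i plus bias input
        y = [sigmoid(sum(wji * xi for wji, xi in zip(wj, xb))) for wj in self.w]
        yb = y + [1.0]                                         # y_j plus bias unit
        z = sigmoid(sum(vj * yj for vj, yj in zip(self.v, yb)))
        return xb, yb, z

    def train_step(self, x, t):
        xb, yb, z = self.forward(x)
        delta_l = (t - z) * z * (1.0 - z)                      # output error delta_l
        # hidden errors delta'_j, computed with v before it is updated
        delta_h = [yb[j] * (1.0 - yb[j]) * delta_l * self.v[j] for j in range(len(self.w))]
        for j in range(len(self.v)):
            self.v[j] += self.lr * delta_l * yb[j]              # delta v_lj = eta' delta_l y_j
        for j, wj in enumerate(self.w):
            for i in range(len(wj)):
                wj[i] += self.lr * delta_h[j] * xb[i]           # delta w_ji = eta' delta'_j X_i

    def mse(self, data):
        return sum((t - self.forward(x)[2]) ** 2 for x, t in data) / len(data)

# Made-up normalized features: target 0 = fresh, 1 = aged
data = [((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.15), 0), ((0.20, 0.25), 0),
        ((0.80, 0.90), 1), ((0.90, 0.80), 1), ((0.85, 0.85), 1), ((0.75, 0.80), 1)]
net = TinyBPNN(n_in=2, n_hidden=2)
mse_before = net.mse(data)
for epoch in range(2000):
    for x, t in data:
        net.train_step(x, t)
mse_after = net.mse(data)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.6f}")
```

The hidden errors are computed before the output weights are updated, so both updates use the same forward pass. This matches the order implied by the equations.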

FIGURE 2 Steps of K-means clustering [36,37]
FIGURE 3 BPNN model
SANTHANA KRISHNAN AND PALANISAMY

| Support vector machine classifier
SVM [41][42][43] is a classifier that separates data points by constructing hyperplanes in a multidimensional space that divide the different class labels, as shown in Figure 4. A hyperplane function $g(y) = W^T y + b$ separates the two classes with margins.
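For a given set of data points $X_i$ belonging to two separate classes $\omega_1$ and $\omega_2$, the distance of any data point from the hyperplane is $|g(X)|/\lVert W \rVert$. SVM [43] determines the values of $W$ and $b$ such that $g(X) = 1$ for the closest data belonging to $\omega_1$ and $g(X) = -1$ for the closest data belonging to $\omega_2$. The margin between the two classes is then $2/\lVert W \rVert$. Let the class label be $z_i = +1$ for $X_i \in \omega_1$ and $z_i = -1$ for $X_i \in \omega_2$. SVM training then minimizes the objective function

$$J(W, b) = \frac{1}{2}\lVert W \rVert^2$$

subject to the constraints

$$z_i\,(W^T X_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$

The objective function is augmented with the weighted sum of the constraints, using the Lagrange multipliers as weights:

$$\mathcal{L}(W, b, \lambda) = \frac{1}{2}\lVert W \rVert^2 - \sum_{i=1}^{N} \lambda_i \left[ z_i (W^T X_i + b) - 1 \right]$$

where $W$ and $b$ are the primal variables and $\lambda_i \ge 0$ are the Lagrange multipliers.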
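The sketch below shows a linear two-class SVM on toy data. It does not solve the Lagrangian above with a quadratic-programming solver. Instead, it substitutes a simpler primal method: batch subgradient descent on the soft-margin hinge-loss objective. With a large penalty `C` on separable data, this approximates the hard-margin solution. `C`, `lr`, `epochs` and the data are assumptions chosen for illustration.

```python
import math

def g(W, b, x):
    """Hyperplane function g(X) = W^T X + b."""
    return sum(wk * xk for wk, xk in zip(W, x)) + b

def train_linear_svm(X, z, C=10.0, lr=0.01, epochs=2000):
    """Soft-margin linear SVM trained by batch subgradient descent (sketch).

    Minimizes 0.5*||W||^2 + C * sum_i max(0, 1 - z_i * g(X_i)).
    """
    dim = len(X[0])
    W, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(W), 0.0          # gradient of 0.5*||W||^2
        for x, zi in zip(X, z):
            if zi * g(W, b, x) < 1.0:         # hinge term active: margin violated
                for k in range(dim):
                    grad_w[k] -= C * zi * x[k]
                grad_b -= C * zi
        W = [wk - lr * gk for wk, gk in zip(W, grad_w)]
        b -= lr * grad_b
    return W, b

def distance_to_hyperplane(W, b, x):
    """Distance |g(X)| / ||W|| of a point from the hyperplane."""
    return abs(g(W, b, x)) / math.sqrt(sum(wk * wk for wk in W))

# Made-up normalized features; z = +1 for omega_1 (aged), -1 for omega_2 (fresh)
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15), (0.20, 0.25),
     (0.80, 0.90), (0.90, 0.80), (0.85, 0.85), (0.75, 0.80)]
z = [-1, -1, -1, -1, 1, 1, 1, 1]
W, b = train_linear_svm(X, z)
print("W =", W, "b =", b, "margin 2/||W|| =", 2 / math.sqrt(sum(w * w for w in W)))
```

A QP solver working on the dual would give the exact hard-margin hyperplane. The subgradient loop trades that exactness for a dependency-free example.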

| PROPOSED METHOD TO DETECT AGED INTEGRATED CIRCUITS
Manufactured ICs undergo process parametric variations, so the parameter variations in recycled ICs must be distinguished from the process variations of newly manufactured ICs. In the proposed method, three classification algorithms distinguish aged ICs from new ICs: K-means clustering, BPNN and two-class SVM (TCSVM). The following sections give an overview of how recycled and new devices are classified with these three ML algorithms. The overall flow of the proposed work is shown in Figure 5. As the figure shows, the proposed method uses two-level detection. First, Monte Carlo (MC) simulation $m(t)$ is performed on the circuits using a fresh statistical device model, and reliability simulation is performed using degraded device models. The parameter data samples are then gathered from the MC and reliability simulations.
The MC simulation $m(t)$ with the statistical device model yields a response $O_i(t)$ that reflects the effect of process variations on a fresh IC. A reliability simulation $m'(t)$ is performed with a degraded device model containing reliability parameters. It yields the response $O'_i(t)$ under reliability effects such as HCI and NBTI in the device under test (DUT). Let $C_{kl}$ be the DUT with $k$ inputs and $l$ outputs. Then

$$O_i(t) = m\{C_{kl}(t)\}, \qquad O'_i(t) = m'\{C_{kl}(t)\}$$

The collected data samples are the parameter outputs of both simulations, which change with aging time. The collected parameters are then used to train and test the three ML algorithms: K-means clustering, BPNN and SVM.

FIGURE 4 Two-class linear SVM classification of data separated by a hyperplane [43]

| Simulation and data sample collection
MC simulation is carried out by varying the process parameters defined in the Process Development Kit. Reliability simulation [44] with the effects of BTI and HCI is carried out for aging time periods ranging from 1 day to 10 years. The fresh device is affected by process variations.
The parameter data samples are obtained from measurements in the MC and aging simulations. For the ISCAS and ITC benchmark circuits, three parameters are measured: minimum voltage ($v_p$ and $v'_p$), delay ($d_p$ and $d'_p$) and supply current ($I_p$ and $I'_p$). The benchmark circuits used are c17 from ISCAS 85 (a six-NAND-gate circuit), s27 from ISCAS 89 and b02 from ITC 99 (a finite state machine that recognizes binary-coded decimal numbers). The numbers of primary inputs (PI), primary outputs (PO), gates and transistors in the benchmark circuits are shown in Table 1. For the benchmark amplifier from Texas Instruments, the parameters obtained are gain, phase margin, bandwidth and supply current (Iddq). All the necessary parameters are measured from the corresponding simulations. These parameters vary with aging time and reliability effects. The classifiers described above can therefore effectively separate the parameters from the fresh and reliability simulations.
For the MC simulation of the circuit, $m\{C_{kl}(t)\}$, the parameters measured are $v_p$, $I_p$ and $d_p$.
For the reliability simulation of the circuit, $m'\{C_{kl}(t)\}$, the parameters measured are $v'_p$, $I'_p$ and $d'_p$. The delay values from the MC simulation are the difference between the responses of the simulation without process variation and the simulation with process variation. The delay values from the reliability simulation are the difference between the responses of the MC and reliability simulations.
In total, 50 MC simulations are performed. For the reliability simulation, data are collected for different aging periods. The reliability DC stress simulations are performed for 50 uniformly sampled aging time periods $t_v$, where $t_v$ = 0.003 years (1 day), 0.04 years (15 days), 0.25, 0.5, 1, …, 10 years. The parameters are then obtained from the simulation at each aging time $t_v$, for example 0.5 years.
As discussed in the previous sections, a single minimum voltage, a single supply current and a single delay value are calculated for each of the 50 MC simulations. This gives a total of 150 (50 × 3) data sample values for the MC simulation. Likewise, a single minimum voltage, supply current and delay value are calculated for each aging time period in the reliability simulation, giving another 150 data sample values.
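The data layout above (50 fresh rows from MC runs and 50 aged rows from reliability runs, each with three features) can be built as a small labelled feature matrix. In the sketch below, every numeric value is an invented placeholder standing in for Cadence simulation outputs: the nominal means, the spreads and the single constant aging shift, which replaces the paper's $t_v$-dependent degradation.

```python
import random

N_SIM = 50                                   # 50 MC runs and 50 aging periods
FEATURES = ("v_min", "i_supply", "delay")    # v_p, I_p, d_p (and primed versions)

def make_toy_dataset(seed=0):
    """Toy feature matrix with the layout described in the text.

    Label 0 = fresh (MC simulation), label 1 = aged (reliability simulation).
    All numbers are hypothetical placeholders, not simulation results.
    """
    rng = random.Random(seed)
    fresh_mean = (0.050, 1.00e-4, 50e-12)    # hypothetical nominal values
    spread = (0.002, 2e-6, 1e-12)            # hypothetical process spread
    aged_shift = (0.010, -5e-6, 4e-12)       # hypothetical aging shift
    rows = []
    for label in (0, 1):
        for _ in range(N_SIM):
            features = [rng.gauss(m + label * shift, sd)
                        for m, shift, sd in zip(fresh_mean, aged_shift, spread)]
            rows.append((features, label))
    return rows

rows = make_toy_dataset()
n_values = sum(len(features) for features, _ in rows)
print(len(rows), "rows,", n_values, "feature values")   # 100 rows, 300 feature values
```

The 300 feature values match the "total size of data is 300" used later for the three-feature benchmark circuits.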
Because the classification uses the output parameters of the simulations, only a small number of samples needs to be trained. This greatly reduces storage space and time.
An industrial benchmark, the fully differential folded-cascode amplifier (FDFCA) from Texas Instruments shown in Figure 6, has been analysed with the proposed recycled IC detection method. The amplifier is designed in 90-nm technology. The parameters considered for this circuit are gain, phase margin, bandwidth and supply current (Iddq). MC and aging simulations are also performed on the amplifier to obtain the parameter outputs, which are given as input to the ML algorithms for classification. The degradation of the gain, phase margin and Iddq of the FDFCA over 10 years of aging is shown in Figures 7-9.

| Classification using K-means clustering
First, the parametric samples are collected and given as input to the K-means clustering algorithm. The algorithm groups the data samples into K clusters and then computes the SV, which evaluates the clustering performance. Two clusters are formed, one for fresh devices and the other for degraded devices, depending on the parameter values. The difference between the parameters obtained from the MC simulation and those from the reliability simulation is due to the reliability effects, HCI and NBTI, in the devices during the reliability simulation.

FIGURE 6 Schematic of the fully differential folded-cascode amplifier from Texas Instruments
FIGURE 7 Fully differential folded-cascode amplifier gain degradation curve for 10 years
K-means clustering [33,34,36] is a fast and efficient way of clustering data, and here it is used to detect counterfeit ICs. The K-means clustering algorithm groups the parameters obtained from the MC and reliability simulations into clusters and determines the centroid of each cluster.
The steps of the K-means clustering algorithm are shown in Figure 10. The input parameter data samples are applied to the clustering in the proposed method. The number of clusters K to be formed is supplied to the algorithm. K centroids are randomly initialized for the data.
The squared Euclidean distance is computed between each of the K initialized centroids and each parameter data sample. Each data sample is assigned to the cluster whose centroid is nearest. Each centroid is then moved to the mean of the data in its cluster, the distances to the new centroids are recomputed, and each data point is placed in the cluster at minimum distance. The centroids stop moving once every data point stays in the same cluster. At that point, the distance from a data point $(A_1, B_1)$ to the centroid $(C_1, C_2)$ of its own cluster is smaller than its distance to the centroid $(C_3, C_4)$ of the other cluster. In this way, the data are grouped into two clusters, cluster 1 and cluster 2. Test data are then provided to the cluster model to evaluate the clustering.
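The assignment and update steps above correspond to Lloyd's K-means algorithm, sketched below in pure Python. One deliberate deviation from the text: the text initializes the centroids randomly, but this sketch uses farthest-point seeding. The first centroid is a random data point, and each further centroid is the point farthest from those already chosen. This avoids a poor split from an unlucky random start on the toy data. The data points are made up.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k=2, max_iter=100, seed=0):
    """Lloyd's K-means with farthest-point seeding. Returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break                              # no point changed cluster: converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))  # Equation (7)
    return labels, centroids, sse

# Made-up normalized features: first four fresh, last four aged
points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15), (0.20, 0.25),
          (0.80, 0.90), (0.90, 0.80), (0.85, 0.85), (0.75, 0.80)]
labels, centroids, sse = kmeans(points, k=2)
print(labels, centroids, round(sse, 4))
```

Because K-means is unsupervised, which cluster index ends up meaning "fresh" is arbitrary. It has to be mapped afterwards, for example by checking which cluster contains the MC (fresh) samples.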
Clustering the parameters of the new and degraded devices with the K-means algorithm achieves high accuracy. After the clusters are formed, the SVs are estimated from the distances between data in the same cluster and between data in different clusters. Let $D_i$ be the distance between data in the same cluster and $D_j$ the distance between data in different clusters. The SV for K clusters ($S_K$) is calculated as

$$S_K = \frac{D_j - D_i}{\max(D_i, D_j)}$$

FIGURE 8 Fully differential folded-cascode amplifier phase degradation curve for 10 years
FIGURE 9 Fully differential folded-cascode amplifier Iddq degradation curve for 10 years

The accuracy of K-means clustering is determined by the SV, and a high SV confirms the validity of the clustering. The K-means clustering method is tested on the output parameters of the benchmark circuits s27, c17, b02 and FDFCA. K-means clusters are generated from the output parameters, and the SVs are estimated for K clusters. In this study, two clusters are formed, namely aged/degraded and fresh. Figure 11 shows the two clusters of output parameters of benchmark b02 obtained with K-means clustering after 10 years of aging. Cluster 1 and cluster 2 correspond to the fresh and degraded device responses, respectively. Table 1 shows the K-means clustering performance for the different benchmark circuits at aging time instants $t_v$, using the output parameters as input. Figure 12 shows the maximum average SVs for the K clusters of b02 after 10 years of aging. The aging periods considered are $t_1$ = 1 day, $t_2$ = 15 days, $t_3$ = 0.25 years, $t_4$ = 0.5 years, $t_5$ = 1 year, $t_6$ = 5 years and $t_7$ = 10 years. The detection rate (DR) of K-means clustering is measured from the silhouette value $S_K$. Let $Z_i$ be the number of input data in the K clusters whose SV is closer to −1, and let $K_i$ denote the total number of data in the K clusters. The $Z_i$ data are misclassified.
The DR is then given by

$$\mathrm{DR} = \frac{K_i - Z_i}{K_i} \times 100\%$$

where $0 < S_K < 1$ indicates that the data samples are correctly clustered and $-1 < S_K < 0$ indicates that they are wrongly clustered. The maximum average SV of the K clusters is calculated and shown in Table 1. Since the average SV exceeds 0.7 at all aging time instants, the clustering has high accuracy. The first column of Table 1 lists the benchmark circuits, and the second column gives the number of gates (G) and transistors (T) in each circuit. The remaining columns give the DR of aged ICs and the maximum average SV for K clusters at the various aging time instants. Using K-means clustering, new and degraded ICs of the benchmark circuits c17, s27, b02 and FDFCA can be classified with high accuracy from 1 day to 10 years, and the SV is high at all aging time instants. The formation of two clusters produces high SVs. Data samples with negative silhouette values would reduce the clustering performance, but no negative silhouette values occur at any aging time instant.
As the aging time increases, the SV decreases slightly, indicating that the parametric features become harder to cluster with K-means. Thus, beyond a certain aging limit, it becomes difficult to tell whether the degradation in a circuit parameter is due to process variation or to reliability effects.
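The silhouette analysis and detection rate above can be sketched as follows. The per-point silhouette is the standard $s = (b - a)/\max(a, b)$. Here $a$ is the mean distance to the other points of the same cluster, corresponding to $D_i$, and $b$ is the smallest mean distance to the points of another cluster, corresponding to $D_j$. The DR assumes that $Z_i$ counts the points with a negative silhouette. The data are the same made-up toy points used for illustration elsewhere.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs at least two clusters."""
    values = []
    for idx, (p, lab) in enumerate(zip(points, labels)):
        same = [euclid(p, q) for j, (q, l2) in enumerate(zip(points, labels))
                if l2 == lab and j != idx]
        if not same:
            values.append(0.0)                 # singleton cluster: silhouette taken as 0
            continue
        a = sum(same) / len(same)              # mean intra-cluster distance (D_i)
        b = min(                               # nearest other cluster's mean distance (D_j)
            sum(euclid(p, q) for q, l2 in zip(points, labels) if l2 == other)
            / sum(1 for l2 in labels if l2 == other)
            for other in set(labels) if other != lab
        )
        values.append((b - a) / max(a, b))
    return values

def detection_rate(sil_values):
    """DR = (K_i - Z_i) / K_i * 100, with Z_i = points whose silhouette is negative."""
    k_i = len(sil_values)
    z_i = sum(1 for s in sil_values if s < 0)
    return 100.0 * (k_i - z_i) / k_i

points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15), (0.20, 0.25),
          (0.80, 0.90), (0.90, 0.80), (0.85, 0.85), (0.75, 0.80)]
good = silhouette_values(points, [0, 0, 0, 0, 1, 1, 1, 1])
bad = silhouette_values(points, [1, 0, 0, 0, 1, 1, 1, 1])   # first fresh point mislabelled
print(sum(good) / len(good), detection_rate(good), detection_rate(bad))
```

In the mislabelled case, only the wrongly assigned point gets a negative silhouette, so one of eight points counts as misclassified.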

| Back propagation neural network (BPNN) classification
The performance of the BPNN is analysed using the mean square error (MSE) and the receiver operating characteristic (ROC) curve. The number of neurons and the number of hidden layers are fixed at the initial stage. Training on a large amount of data requires a large number of hidden neurons, which increases the cost of the model. Here, the amount of data used for training and testing the BPNN model is small. The dataset contains three features for the c17, s27 and b02 benchmark circuits and four features for the FDFCA benchmark circuit. The total data size is 300 for all benchmark circuits, so the BPNN model uses 2 hidden layers with 2 hidden neurons. The BPNN model classifies the parameter data. Training and testing are carried out for three different mixtures of training and test samples, each with 80% of the data used for training and 20% for testing. The classification with the highest DR is extracted and tabulated.
The BPNN classification model shown in Figure 13 is applied to the benchmark circuits c17, s27, b02 and FDFCA. The MSE and accuracy of the prediction model are computed for training (TR) and testing (TE) at aging time instants $t_1$, $t_2$, $t_4$, $t_5$, $t_6$ and $t_7$ (Tables 2 and 3). The MSE of the BPNN model is higher during testing than during training. The area under the ROC curve (AUC) obtained with BPNN classification is high. Table 4 shows the AUC for the different benchmark circuits used in the BPNN classification of new and degraded devices. The BPNN classification model performs well when the MSE is close to zero and the AUC is close to one. The MSE and AUC values obtained show that BPNN is suitable for classifying recycled ICs from new ICs when the output responses are used as the input data samples.
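The two BPNN metrics above can be computed as shown below. The AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. It is the fraction of (aged, fresh) pairs in which the aged sample receives the higher score, with ties counted as one half. The labels and scores are made-up values, not results from the paper.

```python
def mse(targets, outputs):
    """Mean square error between desired targets t_l and network outputs Z_l."""
    return sum((t - z) ** 2 for t, z in zip(targets, outputs)) / len(targets)

def roc_auc(labels, scores):
    """AUC as the probability that an aged sample (label 1) outscores a fresh one (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.1, 0.85]   # made-up BPNN outputs
print(roc_auc(labels, scores))   # 0.75
print(mse(labels, scores))
```

An AUC of 1 means every aged sample scores above every fresh sample, which is the "AUC close to one" condition stated above.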
To evaluate the best performance of the BPNN model, the MSE is plotted against the number of epochs in Figure 14. The data used for training, validation and testing are 70%, 10% and 20%, respectively. For the BPNN classification of circuit b02 after 10 years of aging, the best validation performance is an MSE of $4.4616 \times 10^{-9}$ at epoch 30. The MSE decreases as the number of epochs increases, but the validation MSE may start to increase once the NN begins to overfit the training data.
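The 70/10/20 split and the selection of the best validation epoch can be sketched as follows. The shuffling seed and the example validation curve are made up. Rounding the split sizes, rather than truncating them, is a choice made here to avoid floating-point off-by-one counts.

```python
import random

def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices into train/validation/test subsets (70/10/20 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def best_epoch(val_mse):
    """Return (1-based epoch, MSE) with the lowest validation MSE.

    A rising validation MSE after this epoch indicates overfitting.
    """
    i = min(range(len(val_mse)), key=val_mse.__getitem__)
    return i + 1, val_mse[i]

train, val, test = split_indices(300)
print(len(train), len(val), len(test))           # 210 30 60
print(best_epoch([0.5, 0.2, 0.1, 0.15, 0.3]))    # (3, 0.1)
```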

| Two-class support vector machine (TCSVM) classification
The parameter data is applied to the TCSVM model to classify the fresh and degraded data.

Two-class SVM is used as the classification algorithm because it achieves higher accuracy than one-class SVM [45]. The disadvantages of TCSVM are its higher cost and training time [45]. In this classification, however, the input sample size is small, so the training time (TT) and model size remain reasonable. The main goal of TCSVM is to classify the input data with high accuracy within a short TT. The kernel used in TCSVM is linear. The TCSVM-classified data for the s27 benchmark circuit after 10 years of aging are shown in Figure 15. The x-axis and y-axis represent the parameters of fresh and degraded devices. The TCSVM model is evaluated by the AUC of the ROC and the TT, and a high AUC indicates good classification efficiency. The benchmark circuits s27, c17, b02 and FDFCA are tested with the TCSVM classification.

TABLE 4 AUC of ROC for BPNN classification of benchmarks at different aging time instants

Table 5 lists the metrics obtained from TCSVM on the MC and reliability simulation data: the AUC of the ROC and the model TT for the benchmark circuits at aging times $t_1$ to $t_7$. The confusion matrix and ROC curve of the benchmark circuit s27 for an aging period of 10 years are shown in Figure 16. The TT of the TCSVM is relatively high, on the order of several seconds.
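The confusion-matrix counts behind a figure such as Figure 16, and a generic way to measure a model's TT, can be sketched with the standard library. `timed` wraps any training callable. In practice it would wrap the TCSVM training routine. Because that routine is not reproduced here, the usage line simply times the confusion computation itself, and the labels and predictions are made up.

```python
import time

def confusion_counts(y_true, y_pred, positive=1):
    """Two-class confusion-matrix counts (positive = aged/recycled)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds), e.g. to measure TT."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

counts, seconds = timed(confusion_counts, [1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
print(counts, f"accuracy={accuracy:.2f}", f"time={seconds:.6f}s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.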

| COMPARISON OF USED MACHINE LEARNING METHODS IN THE DETECTION OF COUNTERFEIT INTEGRATED CIRCUITS
The ML methods are compared by how well they detect aged or degraded ICs among new ICs, measured as the DR. For K-means clustering, the DR measures how well the method identifies aged devices. The AUC in ROC obtained in

TABLE 5 Performance evaluation of TCSVM for benchmarks at different aging time