Correlation feature and instance weights transfer learning for cross project software defect prediction



| INTRODUCTION
Software Defect Prediction (SDP) can conveniently assist the quality assurance team to reasonably prioritize and allocate testing resources by identifying the modules or classes that are potentially defective [1]. A majority of existing defect prediction methodologies leverage machine learning and data mining approaches to build a predictor [2][3][4][5]. These approaches can successfully identify defective modules or classes when the training data and test data are from the same project; thus, they are called Within Project Defect Prediction (WPDP) models. However, in practice, most software companies or organizations seldom collect data to train defect predictors, either because labelling defective modules and classes is too expensive or because there are insufficient training data at an early phase. Fortunately, sufficient labelled data exist in public data repositories provided by different software companies or organizations, and many researchers have presented approaches to study SDP using data from these repositories. Different from WPDP, in Cross-Project Defect Prediction (CPDP) scenarios the source and target data are not necessarily from the same project. Traditional machine learning methods may not work well in CPDP scenarios because the source and target projects can differ substantially at both the feature and the instance level. To address this issue, transfer learning provides feasible techniques: related auxiliary information from the source project can be exploited for the target project [6,7]. The specific methods of transfer learning can be broadly categorized into instance-transfer learning and feature-transfer learning [8,9].
Instance-transfer learning can be subdivided into instance-filter transfer and instance-weighting transfer. In instance-filter transfer, instances related to the target are selected from the source project to train a CPDP model. Turhan et al. [10] propose the Nearest Neighbour filter (NN filter) model, which filters instances based on the nearest neighbour method. Kawata et al. [11] and Peters et al. [12] utilize density clustering and k-means clustering to filter instances, respectively. In instance-weighting transfer, training instances from the source project are weighted according to their impact on the target project. Ma et al. [13] propose an instance-weighting transfer CPDP model called the Transfer Naive Bayes (TNB) classifier, which weights source instances via the data gravitation between instances of the two projects.
Feature-transfer learning can be subdivided into feature-representation transfer and feature-selection transfer. In feature-representation transfer learning, the two projects are mapped into a common feature representation space in which the distance between their distributions is minimized. Transfer Component Analysis plus (TCA+) [14,15] is the state-of-the-art feature-representation transfer learning approach for CPDP. In feature-selection transfer learning, appropriate features are selected to train a CPDP model. Yu et al. [16] proposed a feature-selection transfer learning approach that evaluates the individual predictive ability of each feature and the redundancy between different features. Menzies et al. [17] reported the sub-features Naive Bayes (sub + NB) for SDP and obtained the best results by using log filtering to select sub-features.
Instance-filter and feature-selection transfer learning focus on filtering the source project's information to weaken the impact of irrelevant cross-project data, which wastes part of the data information. Instead, this work focuses on the importance of instances and features to the predictive model, and expresses that importance as weights. Appropriate feature and instance weights are more flexible than feature selection (FS) and instance filtering. For example, setting the weights of some features or instances to 1.0 and the others to 0.0 is equivalent to FS and instance filtering; feature and instance weighting methods can assign continuous positive weights, while FS and instance filter methods merely specify discrete values. Therefore, the authors propose a novel CPDP model named Correlation Feature and Instance Weighting Transfer Naive Bayes (CFIW-TNB), which weights every feature and instance rather than applying a single instance filter or FS. Despite its simple structure, the Naive Bayes Classifier (NBC) has consistently been rated as one of the top 10 classic algorithms [18]. Thus, using NBC to predict software defects is also advocated by some scholars; its predictive performance and comprehensibility are cited as its major advantages [19,20]. Approaches to improving NBC are summarized into six main categories by Jiang et al. [21]: (1) structure extension; (2) feature selection; (3) feature weighting; (4) instance selection; (5) instance weighting; and (6) fine tuning. Among the numerous NBCs that alleviate the conditional independence assumption, feature weighting (instance weighting) places more focus on highly predictive features (instances) than on those that are less predictive. The approaches proposed in Refs. [23][24][25][26] are feature weighting NBC models for predicting software defects. Intuitively, highly predictive features should be assigned larger weights when training a prediction model.
Both TNB proposed by Ma et al. [13] and the Credibility theory-based Naive Bayes (CNB) proposed by Poon et al. [27] are instance weighting NBC models for CPDP. These approaches do not directly filter instances but mitigate the impact of irrelevant data by reducing instance weights. Our proposed CFIW-TNB is an NBC that combines instance weighting and feature weighting.
We validate the proposed CFIW-TNB against five existing state-of-the-art CPDP approaches and single-weighting NBCs (TNB, CFW-NB [21]) using 25 real-world datasets from the PROMISE data repository. AUC, F-measure, pd and G-measure are adopted as performance measures. To statistically analyse the performance of our method against the baseline methods, we perform the non-parametric Friedman test with the post-hoc Nemenyi test. The authors address the following research questions (RQs): RQ1: How effective is the CFIW-TNB method compared with other CPDP methods? RQ2: Is the dual weighting mechanism helpful for CPDP? RQ3: Is CFIW-TNB superior to its variants?
The main contributions of this work are: 1. To better reflect the importance of features and instances, we propose a novel CPDP model called CFIW-TNB, which considers both instance-transfer and feature-transfer learning rather than a single transfer strategy. It effectively improves on the performance of single transfer learning methods. 2. CFIW-TNB enhances the local data gravitation method by considering the density of the data distribution: the denser the data distribution, the greater the gravitation, and vice versa. It is therefore more flexible in allocating instance weights than TNB. 3. We validate the effectiveness of our proposed method, CFIW-TNB, by conducting extensive experiments on 25 real-world datasets.
The remainder of the article is organized as follows: Section 2 reviews relevant works. Section 3 proposes our method. Section 4 introduces the experimental setup. Section 5 shows the experimental results and analysis. Section 6 explains threats to validity. Section 7 concludes this work and proposes future work.
A feature weighting NBC (FWNB) classifies a test instance $x = [a_1, a_2, \ldots, a_m]$ as

$$c(x) = \arg\max_{c} P(c) \prod_{i=1}^{m} P(a_i \mid c)^{w_i}, \qquad (1)$$

where $w_i \in \mathbb{R}^{+}$ is the weight of the ith feature $A_i$, $a_i$ is the value of the ith feature $A_i$, $P(c)$ is the prior probability and $P(a_i \mid c)$ is the conditional probability. $P(c)$ and $P(a_i \mid c)$ can be obtained using Equations (2) and (3), respectively:

$$P(c) = \frac{\sum_{j=1}^{n} \delta(c_j, c) + 1}{n + q}, \qquad (2)$$

$$P(a_i \mid c) = \frac{\sum_{j=1}^{n} \delta(a_{ji}, a_i)\,\delta(c_j, c) + 1}{\sum_{j=1}^{n} \delta(c_j, c) + n_i}, \qquad (3)$$

where n is the number of training instances, $n_i$ is the number of values of the ith feature, q is the number of classes, $c_j$ is the class label of the jth training instance, $a_{ji}$ is the ith feature value of the jth training instance, and $\delta(\cdot,\cdot)$ is the indicator function, which is one if its two arguments are identical and zero otherwise. How to define the weight of each feature is crucial and has attracted increasing attention from researchers. An FWNB can be briefly depicted as Algorithm 1.
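As an illustration, the Laplace-smoothed estimates of Equations (2) and (3) and the weighted decision rule of Equation (1) can be sketched as follows. This is a minimal sketch assuming integer-coded discrete features; the function names are our own, not from the paper.

```python
import numpy as np

def train_fwnb(X, y):
    """Laplace-smoothed estimates of Equations (2) and (3)
    for discrete, integer-coded features."""
    n, m = X.shape
    classes = np.unique(y)
    q = len(classes)
    # Eq. (2): smoothed prior for each class
    priors = {c: (np.sum(y == c) + 1) / (n + q) for c in classes}
    # Eq. (3): smoothed conditional P(a_i | c), keyed by (feature, value, class)
    cond = {}
    for i in range(m):
        values = np.unique(X[:, i])
        n_i = len(values)
        for c in classes:
            in_class = (y == c)
            for a in values:
                cond[(i, a, c)] = (np.sum(in_class & (X[:, i] == a)) + 1) \
                                  / (np.sum(in_class) + n_i)
    return classes, priors, cond

def predict_fwnb(x, classes, priors, cond, w):
    """Eq. (1) in log space: argmax_c log P(c) + sum_i w_i * log P(a_i|c)."""
    best, best_score = None, -np.inf
    for c in classes:
        score = np.log(priors[c])
        for i, a in enumerate(x):
            score += w[i] * np.log(cond.get((i, a, c), 1e-9))
        if score > best_score:
            best, best_score = c, score
    return best
```

With all feature weights equal to 1.0 this reduces to a plain naive Bayes classifier, which is a useful sanity check.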

| Instance weighting NBC
An instance weighting NBC places emphasis on assigning a weight to each instance in the training stage to reflect its predictive capability. These approaches are commonly called Instance Weighting Naive Bayes (IWNB). IWNB incorporates instance weights into the estimation formulas to give

$$c(x) = \arg\max_{c} P(c) \prod_{i=1}^{m} P(a_i \mid c), \qquad (4)$$

where $P(c)$ and $P(a_i \mid c)$ can be computed using Equations (5) and (6), respectively:

$$P(c) = \frac{\sum_{j=1}^{n} w_j\,\delta(c_j, c) + 1}{\sum_{j=1}^{n} w_j + q}, \qquad (5)$$

$$P(a_i \mid c) = \frac{\sum_{j=1}^{n} w_j\,\delta(a_{ji}, a_i)\,\delta(c_j, c) + 1}{\sum_{j=1}^{n} w_j\,\delta(c_j, c) + n_i}, \qquad (6)$$

where $w_j$ is the weight of the jth instance. An IWNB can be briefly depicted as Algorithm 2.

Algorithm 2 IWNB algorithm
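The weighted estimates at the heart of IWNB (Equations 5 and 6) can be sketched as follows: each training instance contributes its weight instead of a unit count. The helper name and argument layout are illustrative assumptions, not from the paper.

```python
import numpy as np

def iwnb_estimates(X, y, w, target_class, feature, value, n_values, q):
    """Instance-weighted Laplace estimates: Eq. (5) for the prior and
    Eq. (6) for the conditional, with weights w replacing unit counts."""
    in_class = (y == target_class)
    # Eq. (5): weighted, smoothed prior
    prior = (np.sum(w[in_class]) + 1) / (np.sum(w) + q)
    # Eq. (6): weighted, smoothed conditional P(value | target_class)
    match = in_class & (X[:, feature] == value)
    cond = (np.sum(w[match]) + 1) / (np.sum(w[in_class]) + n_values)
    return prior, cond
```

Setting every weight to 1 recovers the unweighted estimates of Equations (2) and (3), which makes the relationship between FWNB and IWNB easy to verify.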
| Transfer Naive Bayes (TNB)

TNB weights each training instance by the data gravitation so as to transfer information about the test set into the classifier. Newton's law of universal gravitation provides a unified description of the gravitation between any two objects [22]: the gravitation is proportional to the product of the two masses and inversely proportional to the square of the distance between the objects:

$$F = G \frac{m_1 m_2}{r^2}, \qquad (8)$$

where G is the gravitational constant, $m_1$ and $m_2$ are the masses of the two objects, r is the distance between them and F is the gravitation between them. TNB applies universal gravitation to simulate data gravitation between the training set and the test set, and the data gravitation is employed as the instance weight. TNB supposes that one feature has mass M, the mass of the test set is mM and the mass of each training instance is $s_j M$. Thus, the weight of instance $x_j$ is proportional to $m s_j M^2$ and inversely proportional to $r^2 = (m - s_j + 1)^2$, so (dropping the constant factors) the instance weight can be written as follows [13]:

$$w_j = \frac{s_j}{(m - s_j + 1)^2}. \qquad (9)$$

If all the feature values of an instance lie between the maximum and minimum values, then $s_j = m$ and $r^2 = (m - s_j + 1)^2 = 1$, and the instance is assigned the greatest weight. The TNB classifier is briefly depicted as Algorithm 3.

Algorithm 3 TNB algorithm
Input: training dataset D, an unlabelled test dataset L.
Output: predicted labels c(x) for the test cases.
Compute the Max and Min vectors;
for each training instance x_j ∈ D do
  Compute s_j by Equation (7);
  Compute w_j by Equation (9);
end for
Train an NBC on D with the instance weights;
Predict the label c(x) of each x ∈ L using the NBC;
Return c(x).
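The instance-weighting step of Algorithm 3 can be sketched as below. This is a hedged reconstruction: s_j counts how many of the m features of a training instance fall inside the test set's [Min, Max] range, and the weight follows the proportionality argument above with constant factors dropped. The function name is ours.

```python
import numpy as np

def tnb_instance_weights(train_X, test_X):
    """Data-gravitation instance weights in the spirit of TNB.
    s_j = number of features of training instance x_j whose values
    lie inside the test set's [Min, Max] range (h(a_ji) in {0, 1});
    w_j = s_j / (m - s_j + 1)^2, up to a constant factor."""
    mx = test_X.max(axis=0)          # Max vector of the test set
    mn = test_X.min(axis=0)          # Min vector of the test set
    m = train_X.shape[1]
    inside = (train_X >= mn) & (train_X <= mx)   # h(a_ji)
    s = inside.sum(axis=1)
    return s / (m - s + 1) ** 2
```

An instance whose every feature value falls inside the test ranges gets the maximum weight m; an instance entirely outside the ranges gets weight 0.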

| NBC for software defect prediction
Malhotra et al. [28] report that NBC accounts for 47.7% of all SDP studies. In 2019, Hosseini et al. [29] reviewed previous studies and concluded that Naive Bayes is the most commonly used modelling approach in SDP. These surveys reveal that there is room to improve NBC for SDP by alleviating the assumption that all features are fully independent. For FS NBC, removing redundant and irrelevant features from the raw features reduces the dimensionality of the data and improves the effectiveness of the learning algorithm. Menzies et al. [17] reported the sub-features Naive Bayes (sub + NB), belonging to the FS NBC model, and obtained the best results by using log filtering to select sub-features. Hosseini et al. [30] adopted information gain for FS. He et al. [31] used FS techniques to identify and remove irrelevant features to improve the effectiveness of predictors for SDP.
For feature-weighting NBC, a highly predictive feature deserves a larger weight, and vice versa. Turhan et al. [23,24] presented a series of feature weighting NBC models that alleviate the independence assumption via feature weighting. Asmono et al. [25] proposed a feature weighting NBC for software defect prediction that employs the absolute value of the correlation coefficient as the weighting technique. Recently, Huang et al. [26] utilized information diffusion to define the feature weights of an NBC.
Many studies have proposed effective NBCs for CPDP. For example, in Ref. [32], a universal defect predictor applying NBC as the modelling technique provides a context-aware rank transformation to overcome the difference in distribution between the source and target projects. In TNB, proposed by Ma et al. [13], instances in the source project are weighted according to the data gravitation between instances of the source and target projects. CNB [27] employs credibility theory to define the instance weights. TNB and CNB belong to instance weighting NBC. Ryu et al. [33] proposed a multi-objective naive Bayes (MONB) for CPDP that considers the class imbalance problem.
In addition, there exist structure extension NBCs that increase the dependence between features to predict software defects. For example, Arar et al. [34] proposed the feature dependent naive Bayes (FDNB) to alleviate the independence assumption between features.

| PROPOSED METHOD
In this section, we present the proposed CFIW-TNB model in detail. At the end of the section, we give the complete pseudo-code of the CFIW-TNB model.

| Instance weight
In TNB, h(a_ji) is applied to determine the degree of similarity s_j between instances in the training set and the test set. The value of h(a_ji) equals 1 whenever a_ji lies anywhere between the maximum and the minimum value, which implicitly assumes that a_ji is evenly scattered between the maximum and minimum; in other words, that instances from the test set and the training set are objects of uniform mass. In fact, however, a_ji is not evenly scattered between the maximum and the minimum. For example, four features (amc, lcom, lcom3, ic) from the ant-1.4 software project are shown in Figure 1; they are not evenly spread between the maximum and the minimum. As Figure 1 shows, the distribution is denser between 0 and 40 than between 100 and 200. Thus, the gravitation generated by different parts also varies.
The denser the data distribution, the greater the gravitation; the sparser the data distribution, the smaller the gravitation. Therefore, we introduce local data gravitation to ameliorate the data gravitation scheme. For an instance x_j in the training set, the degree of similarity s*_j between it and the test set can be rewritten as follows: where h*(a_ji) is computed over the k nearest neighbours of a_ji among the ith feature values of the test set. The closer a_ji is to a dense region of the test set, the larger h*(a_ji) is. Thus, it is more reasonable to employ h*(a_ji) to evaluate the data gravitation; h*(a_ji) takes values in the more flexible range [0,1] rather than the binary values of h(a_ji). The local data gravitation of instance x_j can then be rewritten as follows:
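Since the paper's exact definition of h*(a_ji) (Equation 10) is not reproduced in this chunk, the following sketch shows one plausible density-aware similarity in the same spirit: values lying close to their k nearest neighbours in the test set's feature column score near 1, and values in sparse regions score lower. Everything in this block, including the specific formula, is an illustrative assumption.

```python
import numpy as np

def h_star(a, test_col, k, eps=1e-12):
    """One plausible density-aware similarity h*(a) in [0, 1].
    a: a training feature value; test_col: the same feature's values
    in the test set; k: number of nearest neighbours considered.
    Smaller mean distance to the k nearest test values (a denser
    neighbourhood) yields a score closer to 1."""
    d = np.sort(np.abs(test_col - a))[:k]        # k nearest test values
    spread = test_col.max() - test_col.min() + eps
    return float(np.clip(1.0 - d.mean() / spread, 0.0, 1.0))
```

Summing h* over all m features of an instance then yields a continuous s*_j in [0, m], replacing TNB's integer count s_j.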

| Feature weight
In CFIW-TNB, local data gravitation is regarded as the weight of each training instance. Besides weighting instances, features are also weighted. In CFWNB [21], the authors argue that the features with maximum mutual relevance and minimum average mutual redundancy are highly predictive and should therefore receive greater feature weights. On the basis of CFWNB, our proposed CFIW-TNB additionally considers the distribution distance of the features between the two domains. We argue that a highly predictive feature should be correlated with the task, uncorrelated with the other features, and have a small distribution distance between the source and target projects. We employ mutual information to measure the correlation between each pair of features, and Kullback-Leibler (KL) divergence to measure the distribution distance of features between the source and target projects. The relevant correlations and the distribution distance can be written as follows: where C is the class label, A_i and A_j are two different feature variables in the training set, A_i′ is the feature corresponding to A_i in the test set, and c, a_i, a_j, a_i′ denote the values they take, respectively. The combined score D_i is given in Equation (16). The mutual information and KL terms are normalized to keep their ranges consistent; the normalizations are given in Equations (17)-(19).
Because D_i may be negative while feature weights are required to be positive, we finally apply the sigmoid function to map its value into the range (0, 1). The final feature weight is written as follows:

$$w_i^{feature} = \frac{1}{1 + e^{-D_i}}. \qquad (20)$$

The classification formula is the same as Equation (1); we repeat the feature weighting NBC here for convenience (Equation 21), where the prior probability P(c) and the conditional probability P(a_i|c) are likewise repeated as Equations (22) and (23). The detailed procedure of our proposed CFIW-TNB can now be briefly depicted as Algorithm 4, which includes four parts: (i) computing instance weights; (ii) computing feature weights; (iii) training the NBC; and (iv) predicting labels for the target dataset.
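The ingredients of the feature weight can be sketched as follows: mutual information for the correlations, a histogram-based KL divergence for the cross-project distribution distance, and the sigmoid squashing of Equation (20). The histogram discretization and all function names are our own assumptions; the paper's exact normalizations (Equations 17-19) are not reproduced.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """I(X; Y) for two discrete sequences, in nats."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def kl_divergence(p_col, q_col, bins=10):
    """KL(P || Q) between Laplace-smoothed histograms of one feature
    in the source (p_col) and target (q_col) projects."""
    lo = min(p_col.min(), q_col.min())
    hi = max(p_col.max(), q_col.max())
    p, _ = np.histogram(p_col, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_col, bins=bins, range=(lo, hi))
    p = (p + 1) / (p.sum() + bins)   # smoothing avoids log(0)
    q = (q + 1) / (q.sum() + bins)
    return float(np.sum(p * np.log(p / q)))

def feature_weight(d_i):
    """Eq. (20): squash the score D_i into (0, 1) with the sigmoid."""
    return 1.0 / (1.0 + np.exp(-d_i))
```

A feature perfectly correlated with the class yields a large mutual information term; identical source and target distributions yield zero KL divergence; and a score D_i of zero maps to the neutral weight 0.5.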

Algorithm 4 CFIW-TNB
Input: training dataset D, an unlabelled test dataset L.
Output: predicted labels c(x) for the test cases.
// Part 1: determine instance weights
Compute the Max and Min vectors;
for each instance x_j ∈ D do
  Compute s*_j by Equation (10);
  Compute the instance weight w_j^instance by Equation (12);
end for
// Part 2: determine feature weights
for each feature A_i do
  Compute I(A_i, C) and NI(A_i, C) by Equations (13) and (18);
end for
for each pair of features A_i and A_j (i ≠ j) do
  Compute I(A_i, A_j) by Equation (14);
  Compute NI(A_i, A_j) by Equation (19);
end for
for each feature A_i do
  Compute D_i by Equation (16);
  Compute w_i^feature by Equation (20);
end for
// Part 3: train the NBC
Compute the prior probability P(c) by Equation (22);
Compute the conditional probability P(a_i|c) by Equation (23);
Build the NBC by Equation (21);
// Part 4: predict labels of the target
Predict the label c(x) of each x ∈ L using the NBC;
Return c(x).

| EXPERIMENTAL SETUP
In this section, we describe the experiments in detail to evaluate the CFIW-TNB, including benchmark datasets, evaluation measures and parameter setting.

| Benchmark dataset
To evaluate the proposed CFIW-TNB, we selected 25 different open-source projects from the PROMISE data repository collected by Jureczko [35]; they have been extensively adopted in previous empirical studies [3,4,36]. Each module in these projects contains 20 static code features and a defect label (defective or clean). An instance is labelled 1 if it includes one or more defects and 0 if it is clean. Table 1 summarizes the essential information of these projects, including project name, number of instances and defective ratio. Table 2 describes the features, including category, feature name and description.

| Performance measures
In each classification task, the predictive result is one of four outcomes [37]: true positive (TP), false positive (FP), true negative (TN) or false negative (FN). In our study, we employed four common measures to evaluate the predictors: probability of detection (pd), AUC, G-measure and F-measure. These measures are defined as follows.
Recall is the rate at which truly defective instances are correctly identified; it is hoped that the predictor identifies as many defective modules or classes as possible. It is defined as $pd = Recall = \frac{TP}{TP + FN}$. Precision is the proportion of instances predicted as defective that are truly defective, defined as $Precision = \frac{TP}{TP + FP}$. F-measure is a trade-off between precision and recall; it balances the two when an increase in precision (recall) comes at the cost of a reduction in recall (precision). It is defined as $F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$. The probability of false alarm (pf) is the proportion of false positive instances among all clean instances, defined as $pf = \frac{FP}{FP + TN}$. G-measure, combining recall and pf, is defined as $G\text{-}measure = \frac{2 \times pd \times (1 - pf)}{pd + (1 - pf)}$. The receiver operating characteristic (ROC) curve is a two-dimensional graph whose x-axis and y-axis are pf and recall, respectively; it is suitable for evaluating a classifier's overall performance. AUC is the area under the ROC curve and is independent of the decision threshold (0.5 by default), since it in some sense averages performance over all decision thresholds. AUC is also an interesting statistic: it is the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance [20].
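The threshold-based measures above follow directly from the confusion-matrix counts. A minimal sketch (AUC is omitted, since it requires ranking scores rather than counts; the function name is ours):

```python
def defect_measures(tp, fp, tn, fn):
    """Compute pd, pf, precision, F-measure and G-measure
    from the four confusion-matrix counts."""
    pd = tp / (tp + fn)                      # recall / probability of detection
    pf = fp / (fp + tn)                      # probability of false alarm
    precision = tp / (tp + fp)
    f_measure = 2 * precision * pd / (precision + pd)
    g_measure = 2 * pd * (1 - pf) / (pd + (1 - pf))
    return pd, pf, precision, f_measure, g_measure
```

For example, a predictor that finds 8 of 10 defective modules while falsely flagging 5 of 20 clean modules scores pd = 0.8 and pf = 0.25.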

| Statistical test method
To further analyse the significant differences between the proposed CFIW-TNB and the other predictors, we conducted the non-parametric Friedman test with the post-hoc Nemenyi test at the 95% confidence level. The Friedman test assesses whether there are statistically significant differences among different predictors. The Friedman statistic is written as follows:

$$\tau_{\chi^2} = \frac{12n}{m(m+1)} \left[ \sum_{i=1}^{m} \bar{R}_i^2 - \frac{m(m+1)^2}{4} \right],$$

where n and m denote the total number of datasets and methods, respectively, and $\bar{R}_i$ denotes the average rank of the ith method. The derived statistic $\tau_F = \frac{(n-1)\,\tau_{\chi^2}}{n(m-1) - \tau_{\chi^2}}$ follows the F-distribution with (m − 1) and (m − 1)(n − 1) degrees of freedom. We compare the corresponding p-value of the Friedman test with the significance level and then decide whether to accept the null hypothesis that there are no statistically significant differences among the models. If the null hypothesis is rejected, the post-hoc Nemenyi test is performed to check whether there is a significant difference between each specific pair of models. The Nemenyi test provides a critical value for the difference of average ranks (the Critical Difference, CD), calculated with the following formula:

$$CD = q_{\alpha,m} \sqrt{\frac{m(m+1)}{6n}},$$

where $q_{\alpha,m}$ is a tabulated value depending on m and the significance level α, available online.¹ If the difference of average ranks between two predictors exceeds the CD, the null hypothesis that there is no significant difference between them is rejected. As shown in Figure 2, the two dotted blue lines are shorter than the full red line, which means that predictor 2 is not significantly different from either predictor 1 or predictor 3. However, the dotted green line is longer than the full red line, which means that predictor 1 and predictor 3 are significantly different. This means the post-hoc Nemenyi test does not create distinct ranks for the compared models, but overlapping ranks [38,39]. In this article, we employed the strategy in Ref. [39] to address this issue, which divides the models into non-overlapping clusters according to the distance between the best and the worst average rank (the green broken line in Figure 2). If this distance is less than one CD, all models belong to the same group; if it is between one and two times the CD, they are divided into two non-overlapping groups, a top rank group and a bottom rank group; if it is at least two times the CD, they are divided into three groups: a top rank group, a middle rank group and a bottom rank group. A model belongs to the top rank group (or bottom rank group) if its average rank is closer to the best (or the worst) average rank.
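The Friedman statistic, its F-distributed refinement and the Nemenyi critical difference can be sketched as follows, assuming the average ranks have already been computed over the n datasets. The function name is ours.

```python
import numpy as np

def friedman_nemenyi(avg_ranks, n, q_alpha):
    """Friedman chi-square over m methods' average ranks on n datasets,
    the F-distributed statistic tau_F, and the Nemenyi critical
    difference CD for the given tabulated value q_alpha."""
    m = len(avg_ranks)
    chi2 = 12 * n / (m * (m + 1)) * (
        np.sum(np.square(np.asarray(avg_ranks))) - m * (m + 1) ** 2 / 4)
    tau_f = (n - 1) * chi2 / (n * (m - 1) - chi2)
    cd = q_alpha * np.sqrt(m * (m + 1) / (6 * n))
    return chi2, tau_f, cd
```

When all methods share the same average rank (m + 1)/2, the statistic is zero and the null hypothesis cannot be rejected; larger spreads of average ranks push the statistic up.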

| Parameter setting
In CFIW-TNB, calculating the local data gravitation of each training instance requires selecting its k nearest neighbours from the test set. To investigate the influence of k (the number of nearest neighbours), we vary k as different percentages of the number of test instances (e.g. k = n_test × 20%, where n_test is the number of test instances). We chose ant-1.4, ivy-1.1, jedit-4.0 and xerces-1.4 to investigate k values using 10-fold cross-validation. Figure 3 shows the average results of CFIW-TNB for k values at different percentages of the number of test instances. Figure 3 indicates that the predictive results are affected by k, but not very sensitively. Therefore, we set k to 2% of the number of test instances.

| EXPERIMENTAL RESULTS AND ANALYSIS
In this section, we report the experimental results for CFIW-TNB and the comparison methods. Based on the results, the research questions are answered as follows.

RQ 1 How effective is the CFIW-TNB method compared with other CPDP methods?
To explore whether the CFIW-TNB approach could surpass other CPDP approaches, we adopted five existing state-of-the-art CPDP approaches as comparisons: a CPDP approach designed with a limited amount of labelled within-project data (TrAdaboost); three CPDP approaches using only cross-project data to train the predictor (NN filter, TCA+ and TNB); and a baseline in which an underlying classifier trained on the source project directly predicts the defects of the target project. In this work, we selected logistic regression as the underlying classifier. Before each experiment, we have to select the source domain for the defect prediction task. Specifically, we picked one release of a project as the target project (e.g. ant-1.3) and used the releases of the other projects as the candidate source projects (i.e. the 20 releases not belonging to ant in the example).
The notation w/t/l means that CFIW-TNB wins on w datasets, ties on t datasets and loses on l datasets. From the fourth line from the end of Table 3, compared with TNB, CFIW-TNB wins 16 times, ties 5 times and loses 4 times in terms of AUC; compared with TCA+, the AUC of CFIW-TNB wins 23 times and ties 2 times; compared with NN filter, TrAdaboost and Baseline, the AUC of CFIW-TNB is significantly better on all 25 datasets and significantly worse on none. In terms of F-measure, pd and G-measure, the CFIW-TNB approach wins more often than the other five CPDP approaches on the 25 real-world datasets.
For the four measures (AUC, F-measure, pd and G-measure), we obtained the ranking of each approach on each dataset. Figure 4 shows the rankings of the four measures for the six CPDP approaches as boxplots. Across the 25 datasets, the most frequent rank of CFIW-TNB is 1 (ranked first) and its worst rank is 2 in terms of AUC; in terms of F-measure, pd and G-measure, its most frequent rank is 1 and its worst rank is 4. Overall, the ranks of CFIW-TNB are better than those of the other CPDP approaches on the four measures in most cases.
For a further fair comparison of the six CPDP approaches, we resorted to the well-known Friedman test with the Nemenyi test. We obtained q_{0.05,6} = 2.850 by querying the relevant table. In the following, d(a)−(b) denotes the difference between the average ranks (AR) of two approaches (a) and (b). We conclude that CFIW-TNB performs significantly better than Baseline, NN filter and TCA+. The distances between the ARs of CFIW-TNB and TNB, and between the ARs of TCA+ and TNB, are both less than the CD, which indicates that there is no significant difference between TNB and TCA+, or between TNB and CFIW-TNB. The Nemenyi test thus generated overlapping groups of approaches. In this work, we utilized the strategy in Ref. [40] to address this issue. The distance between CFIW-TNB and TCA+ is less than two times the CD and more than one time the CD, and the AR of TNB is closer to the best average rank, so TNB and CFIW-TNB belong to the same rank group. Similarly, we obtain the same conclusions for the other measures based on the average ranks and relevant values reported in Table 5. Figure 5 shows the results of the Friedman test with the Nemenyi test for the six CPDP approaches in terms of the four measures; groups of approaches that are significantly different are shown in different colours. The results of the post-hoc test indicate that CFIW-TNB always belongs to the top rank group in terms of all four measures.

RQ 2 Is dual weighting mechanism helpful for CPDP?
To explore whether the dual weighting mechanism can improve CPDP performance, we adopted three existing NBCs as comparisons: TNB [13], CFW-NB [21] and Naive Bayes (NB). TNB and CFW-NB are a single instance weighting NBC and a single feature weighting NBC, respectively. The CFW-NB and NB classifiers are trained on the source domain to directly predict the defects of the target project. For TNB, we repeated the experiment described above.
The experimental results are described in Tables 6 and 7. From the penultimate lines of these tables, CFIW-TNB improves on TNB, CFW-NB and NB by 4.42%, 9.38% and 15.0%, respectively, in terms of AUC; by 13.36%, 16.49% and 32.72% in terms of F-measure; by 3.36%, 14.42% and 26.83% in terms of pd; and by 13.97%, 23.23% and 32.62% in terms of G-measure.
Compared with TNB, CFW-NB and NB, CFIW-TNB is significantly better on 16, 24 and 25 datasets, respectively, in terms of AUC; on 18, 20 and 22 datasets in terms of F-measure; on 14, 22 and 28 datasets in terms of pd; and on 20, 24 and 25 datasets in terms of G-measure. Figure 6 depicts box-plots of the four measures for the four NBCs across all 25 datasets and shows the superiority of our CFIW-TNB over the three other NBCs in terms of the median values of all four measures. For a further fair comparison with these approaches, we resorted to the well-known Friedman test with the post-hoc Nemenyi test. We obtained q_{0.05,4} = 2.569 by querying the relevant table. The differences of AR between CFIW-TNB and the other NBCs, together with the statistical variables, are displayed in Table 8. For all four measures, the statistics τ_F are greater than F_α(3, 72) = 2.7318, which indicates that the null hypothesis is rejected. Since the differences between CFIW-TNB and CFW-NB and between CFIW-TNB and NB are greater than the CD, the proposed CFIW-TNB performs significantly better than CFW-NB and NB. Figure 7 shows the results of the Friedman test with the Nemenyi test in terms of the four measures; CFIW-TNB belongs to the top group for all four measures. These observations show that CFIW-TNB with the dual weighting mechanism performs better than TNB with instance weights and CFW-NB with feature weights. In summary, we conclude that the dual weighting mechanism performs better than a single weighting mechanism, because CFIW-TNB pays separate attention to the predictive differences of both features and instances rather than considering a single kind of predictive difference.

RQ 3 Is CFIW-TNB superior to its variants?
Since two weights (feature weight and instance weight) are used in our method, this question investigates whether our method is more effective than single-weight or non-weight versions. To answer it, we designed three variants: CFW-TNB (retaining only the feature weight), CIW-TNB (retaining only the instance weight) and NB (no weight). They allow us to compare the performance of our method against downgraded versions that do not consider both weights. All these baseline methods are treated as variants of CFIW-TNB.
The experimental results are described in Tables 9 and 10. From the penultimate lines of these tables, CFIW-TNB improves on CIW-TNB, CFW-TNB and NB by 4.55%, 5.88% and 15.0%, respectively, in terms of AUC; by 8.62%, 11.66% and 32.72% in terms of F-measure; by 3.627%, 5.92% and 26.83% in terms of pd; and by 8.62%, 10.82% and 32.62% in terms of G-measure. Compared with CIW-TNB, CFW-TNB and NB, CFIW-TNB is better on 24, 25 and 25 datasets, respectively, in terms of AUC; on 24, 25 and 22 datasets in terms of F-measure; on 23, 25 and 25 datasets in terms of pd; and on 24, 24 and 24 datasets in terms of G-measure. Figure 8 depicts box-plots of the four measures for the four NBCs across all 25 datasets and shows the superiority of our CFIW-TNB over its variants in terms of the median values of all four measures. For a further fair comparison with these approaches, we resorted to the well-known Friedman test with the post-hoc Nemenyi test. The differences of AR between CFIW-TNB and its variants, together with the statistical variables, are displayed in Table 11. For all four measures, the statistics τ_F are greater than F_α(3, 72) = 2.7318, which indicates that the null hypothesis is rejected. Since the differences between CFIW-TNB and CFW-TNB, between CFIW-TNB and CIW-TNB and between CFIW-TNB and NB are all greater than the CD, the proposed CFIW-TNB performs significantly better than its variants. Figure 9 visualizes the results of the Friedman test with the Nemenyi test for CFIW-TNB and its variants in terms of the four measures; CFIW-TNB belongs to the top group for all four measures.
These observations show that CFIW-TNB, with its dual weighting mechanism, performs better than the single-weight (only instance weight or only feature weight) and non-weight variants. In summary, we conclude that weighting both instances and features is more effective than weighting only instances, weighting only features, or applying no weighting.

| THREATS TO VALIDITY
Threats to external validity concern the degree to which our research results can be generalized. In this work, a related threat might be triggered by the use of a single data repository (i.e., the PROMISE repository). Although CFIW-TNB obtains good performance on 25 datasets, it is not certain that CFIW-TNB will be completely suitable for other datasets. In addition, since the features of the employed datasets are all static product metrics and the modules are extracted at the class level, we cannot guarantee that our approach generalises to defect datasets with process metrics or modules extracted at the file level. Our approach should encourage more researchers to develop more practical prediction models. Threats to construct validity mainly refer to the selection of baselines and evaluation measures. In total, five CPDP approaches and three NBCs are chosen for comparison. We cannot compare CFIW-TNB with all existing CPDP approaches and all NBCs with instance or feature weighting, owing to space limits. We employ four extensively used measures to evaluate the predictive performance of our CFIW-TNB and the comparison methods for defect prediction. These measures do not take classification costs into consideration (different classification errors incur different costs), which may be a potential threat to the validity of the results.

TABLE 7: The pd and G-measure of our CFIW-TNB compared with TNB, CFW-NB and NB, respectively

| CONCLUSIONS AND FUTURE WORKS
In this work, we propose a new CPDP model with a dual weighting mechanism, called CFIW-TNB, based on NBC. Rather than performing prior instance filtering or feature-selection-based transfer learning, it assigns weights to all instances and features. CFIW-TNB defines a novel local data gravitation as the instance weight to emphasise the impact of source instances on the target data.
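The gravitation intuition can be sketched as follows. This is an illustration of the m/r² idea only, not the paper's local data gravitation formula; the function name and the choice of k are hypothetical:

```python
import numpy as np

def gravitation_instance_weights(X_src, X_tgt, k=5):
    """Density-aware gravitation weights for source instances (sketch).

    Illustrates the intuition only; this is not the paper's local data
    gravitation formula. Each source instance is weighted like a
    gravitational force m / r^2, with the "mass" fixed at k and r taken
    as the mean distance to its k nearest target instances. A source
    instance sitting in a dense target region has a small r and hence
    a large weight; one far from the target data is down-weighted.
    """
    w = np.empty(len(X_src))
    for i, x in enumerate(X_src):
        d = np.sort(np.linalg.norm(X_tgt - x, axis=1))[:k]   # k nearest target distances
        w[i] = k / (d.mean() ** 2 + 1e-9)                    # gravitation ~ m / r^2
    return w / w.max()                                       # scale into (0, 1]
```

Under this sketch, a source instance that lies inside a dense cluster of target instances receives a weight near 1, while an outlying source instance contributes almost nothing to the weighted NB parameter estimates.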

TABLE 9: The AUC and F-measure of our CFIW-TNB compared with CIW-TNB, CFW-TNB and NB, respectively

Compared with TNB, CFIW-TNB considers the impact of the density of the data distribution on data gravitation: the denser the data distribution, the greater the gravitation; the sparser the data distribution, the less the gravitation. It is therefore more flexible than TNB in allocating instance weights. CFIW-TNB employs feature weights to place more emphasis on highly predictive features than on less predictive ones. We argue that, for a predictor, highly predictive features should be highly correlated with the prediction task, uncorrelated with the other features, and have a minimal distribution distance between source and target; these criteria are incorporated into the feature weights. We validate the proposed CFIW-TNB against five existing state-of-the-art CPDP approaches using 25 datasets from the PROMISE data repository, with AUC, F-measure, pd and G-measure as performance measures and the non-parametric Friedman test with the post-hoc Nemenyi test as the statistical method. Extensive experimental results demonstrate that our CFIW-TNB achieves better results than existing CPDP approaches in most cases, and that the dual weighting mechanism improves predictor performance more than single instance weighting or feature weighting alone. In the future, the authors plan to evaluate CFIW-TNB on datasets from more software projects collected from different repositories. Furthermore, considering more practical scenarios, the authors will extend the proposed CFIW-TNB to imbalanced learning and cost-sensitive learning in CPDP tasks.
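The three feature-weight criteria described above can be sketched in a simple scoring function. This is an illustration of the criteria only, not the paper's exact formula; the function name and the specific penalty form are hypothetical:

```python
import numpy as np

def correlation_feature_weights(X_src, y_src, X_tgt):
    """Correlation-based feature weights (sketch of the three criteria).

    Not the paper's exact formula: each feature is rewarded for (i) its
    absolute Pearson correlation with the label, and penalised for
    (ii) its mean absolute correlation with the other features and
    (iii) a crude source/target distribution distance (difference of
    means scaled by the source standard deviation).
    """
    d = X_src.shape[1]
    w = np.empty(d)
    for j in range(d):
        rel = abs(np.corrcoef(X_src[:, j], y_src)[0, 1])     # relevance to the label
        red = np.mean([abs(np.corrcoef(X_src[:, j], X_src[:, k])[0, 1])
                       for k in range(d) if k != j])         # redundancy with other features
        shift = abs(X_src[:, j].mean() - X_tgt[:, j].mean()) / (X_src[:, j].std() + 1e-9)
        w[j] = rel / (1.0 + red + shift)
    return w / w.sum()                                       # normalise to sum to 1
```

A feature that tracks the defect label closely, is not redundant with other metrics, and has similar source and target distributions receives a large weight, matching the argument above.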

TABLE 10: The pd and G-measure of our CFIW-TNB compared with CIW-TNB, CFW-TNB and NB, respectively