Cross‐project defect prediction method based on genetic algorithm feature selection

With the continuous development of Internet technology, software plays an ever larger role in daily life, and software defect prediction (SDP) is a key means of ensuring software reliability. SDP predicts in advance, based on the historical data of a software project, which modules are likely to contain defects, with the goal of making the best use of limited testing resources. In practice, however, the project to be predicted is often a new project with little or no historical data. How to exploit the abundant data of other, related projects to build a cross‐project software defect prediction (CPDP) model has therefore received extensive attention from scholars. CPDP performance, in turn, suffers greatly from differences in data distribution between projects and from class imbalance. Building on CPDP, this article proposes a feature selection method based on the genetic algorithm (genetic algorithm feature selection, GAFS). GAFS comprises two stages: feature selection and ensemble training. In the feature selection stage, a global‐search adaptive feature selection method based on the genetic algorithm uses the ensemble training results of candidate feature subsets on the target data to find the optimal subset of transferable features. In the ensemble training stage, the EasyEnsemble method is used to alleviate the class imbalance problem, multiple naive Bayes classifiers are constructed, and the final model is obtained through ensemble learning. F1‐score and MCC are used as evaluation metrics, and comparative experiments are carried out on the AEEEM and Promise datasets. The results show that GAFS improves the average F1‐score and MCC substantially over the five comparison methods; for example, GAFS improves the average F1‐score by 38.9%, 31.6%, 35.1%, 22.0%, and 31.6%, respectively. In most cases it effectively improves model performance and achieves better prediction results.


INTRODUCTION
With the rapid development of the Internet and the wide application of information technology, software has become a staple of everyday life, and even small software defects can lead to major security incidents. In 2014, the Japanese bitcoin exchange Mt. Gox lost more than 850,000 bitcoins after attackers exploited a serious security flaw in its trading software. In 2020, thousands of close contacts failed to receive self-isolation notices because the COVID-19 app of the UK's National Health Service (NHS) was not updated in time, allowing the virus to spread unchecked. In the same year, work-from-home and online-class policies made online learning software popular overnight: the average number of daily Zoom users in the United States skyrocketed from roughly 10 million to more than 100 million, straining its security systems and leading to tens of thousands of user privacy leaks. All of this shows that software defects affect a wide range of fields, including finance, national defense, transportation, privacy, people's livelihoods, and politics, and their importance is evident.
A software defect is a bug in software that causes some system functions to fail, so that the software does not meet its functional requirements. Software testing is the process in which specialized technical personnel, guided by standardized test documents, test and evaluate the software in all respects and ultimately find the modules that may contain defects. Testing every potentially defective module one by one consumes a great deal of manpower. If a list of potentially defective source code could be predicted in advance, the quality assurance team could focus on the defective modules early, thereby narrowing the test scope and improving testing efficiency.
Software defect prediction (SDP) 1 helps software developers make the most of testing resources and deliver products with high quality by predicting likely defects in software. Many classic SDP models have emerged in this research field, [2][3] but they still suffer from problems such as differences in data distribution, the curse of dimensionality, and class imbalance. These problems limit the application scenarios of SDP models, and their prediction performance is not ideal. The research prospects in the field of SDP therefore remain broad, and its research value and significance remain substantial.
Research on SDP initially focused on within-project defect prediction (WPDP), building models from the perspective of statistical analysis of variables. For example, from the perspective of univariate statistical analysis, Akiyama et al. 4 proposed the SDP formula Defect = 4.86 + 0.018 LOC (LOC: number of lines of code). From the perspective of multivariate statistical analysis, Halstead et al. 5 held that the likelihood of bugs in a software module is proportional to the numbers of operators and operands in the program, while McCabe et al. 6 held that it is proportional to the complexity of the program's control flow. With the popularity of machine learning in industry, more and more researchers have applied machine learning to SDP, and the results show that it performs better than traditional statistical methods. 7 As the software development process accelerates and new applications emerge in an endless stream, the projects that need to be predicted in real development environments are often newly developed projects with an extreme lack of training data, which makes model performance poor. Labeling existing data one by one to expand the training data would require a great deal of time and manpower and would hinder the progress of SDP research. As researchers' understanding of transfer learning deepened, cross-project software defect prediction (CPDP) was proposed by combining transfer learning with SDP.
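As a small illustration, Akiyama's univariate formula above can be sketched directly; the function name here is hypothetical, and the constants are the ones reported in the text.

```python
def akiyama_defects(loc: int) -> float:
    """Estimate the expected defect count of a module from its
    lines of code, using Akiyama's univariate model."""
    return 4.86 + 0.018 * loc

# a 1000-LOC module yields an estimate of about 22.86 defects
print(akiyama_defects(1000))
```

Such single-variable models are easy to apply but, as the later sections argue, cannot capture distribution differences between projects.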
8 By migrating data from related projects into the target project, CPDP expands the training data available to the model and alleviates the impact of data scarcity on model performance. However, owing to problems such as data disorder caused by data update errors, CPDP based on transfer learning has attracted much attention. 9 CPDP effectively addresses the data shortage of new projects and has had a profound impact on SDP research. 10 This article targets the situation in which the software project to be predicted is a new project with little or no historical data, which leads to poor prediction performance. Building on CPDP, a genetic-algorithm-based feature selection method, GAFS, is proposed, with improvements in the following two aspects: 1. It is difficult to find a suitable formula to measure the similarity of feature distributions across projects. This article therefore proposes a wrapper feature selection algorithm, GAFS, which adaptively finds the optimal feature subset through global search to obtain the best training dataset.
2. At the same time, this article uses the EasyEnsemble method to alleviate the class imbalance problem and combines it with ensemble learning techniques to improve the experiments.
The rest of the article is organized as follows: Section 2 introduces related work on SDP. Section 3 presents the details of the proposed method. Section 4 describes the experimental design and analyzes the results. Finally, Section 5 summarizes the article and looks ahead to future work.

RELATED WORK
Software defect prediction is currently an active research field in the software engineering community. This section introduces the relevant background and methods in the field of SDP.

Software defect prediction
Recently, intelligent software development methods have received extensive attention from researchers in the software engineering community. SDP research is an important part of such methods and an important application scenario of AI4SE. SDP combines mainstream machine learning methods with defect datasets to train a defect prediction model, which then predicts whether a target software module contains defects. Across the whole software engineering life cycle, SDP research has two important goals. The first is to build SDP models that provide practitioners with deep insights, including an understanding of the characteristics associated with past software defects and of why the model made a particular prediction, thereby helping software development teams design development activities scientifically. The second is to improve the performance of the defect prediction model so that, before the software testing phase begins, testers can allocate limited testing resources effectively and formulate efficient, resource-saving SQA plans. To improve model performance, researchers have conducted large-scale studies in two application scenarios of defect prediction.

Within-project SDP
WPDP trains a model on part of the data of the target project itself and then uses the model to predict whether the remaining parts contain defects.
In existing research, Turhan et al. 11 showed that the naive Bayes algorithm performs better than other machine learning algorithms (such as Logistic, DT, SVM, and C4.5). Ma et al. 12 proposed an improved semi-supervised SDP model for class imbalance and limited labeled data. The method resamples the original training set with random undersampling to construct the dataset, and the results show that the method performs well. Catal and Diri 13 proposed the semi-supervised model YATSI. The algorithm can use different classifiers for model building (such as J48, Random Forest, NB, and KStar). The results show that on large datasets the performance of the NB classifier is significantly improved by YATSI, and on the other datasets the performance of all classifiers except NB is also significantly improved. Zhong et al. 14 proposed an unsupervised model based on expert evaluation. The model first clusters unlabeled instances with the K-means algorithm, then computes the average metric vector (i.e., the centroid) of each cluster, and finally has software quality experts judge each cluster as defective or not. Experimental results show that the method is effective in detecting potential noise in the software. Nam and Kim 15 proposed the automatic defect prediction methods CLA and CLAMI, which remove the manual labeling step. CLA consists of two steps, clustering and labeling; CLAMI additionally includes metric selection and instance selection. Experimental results show that CLA and CLAMI achieve better prediction results than supervised learning, threshold-based methods, and expert-based methods.

Cross-project SDP
CPDP trains a model on existing projects from related fields and then uses the model to predict whether projects in the target domain contain defects.
At first, scholars directly used existing project data as source data to train a predictive model. In existing CPDP research, Ghotra et al., 16 Bowes et al., 17 and Malhotra and Raje 18 conducted comparative experiments on various machine learning algorithms (such as Logistic, DT, SVM, C4.5, and NB); the experiments show that the performance of the resulting models differs, and NB achieves better results in most cases. How to combine the advantages of different classifiers thus became a new research direction, and ensemble learning attracted attention. Panichella et al. 19 proposed the combined defect predictor (CODEP), which integrates multiple base classifiers by stacking. The results show that CODEP outperforms single prediction models in terms of AUC and cost-effectiveness. Subsequently, to further improve prediction performance, more and more scholars used transfer learning to build CPDP models from the perspective of the dataset. According to what is transferred, related research can be roughly divided into two categories: feature transfer and instance transfer.
Feature transfer processes the features of the dataset to make the distributions of the source and target projects similar and then builds a prediction model with machine learning algorithms. Depending on whether the two projects share the same metric set, it can be divided into feature selection and feature mapping. Feature selection finds the optimal feature subset based on training results. Wang et al. 20 proposed a CPDP method based on a simplified metric set. The method first selects the top k features of the feature set according to a coverage index to form a top-k feature subset, and then removes redundant features according to thresholds on recall, precision, and F-measure to form a minimum feature subset, thereby simplifying the metric set. The experimental results show that a prediction model built on the simplified metric set performs better when resources are limited. Ni et al. 21 proposed FecTrA, a CPDP method based on a clustering algorithm. The method selects the top k features from the clusters according to feature similarity to form a transfer feature set, then selects the top n instances according to instance similarity to form a training set, and finally uses TrAda to obtain the final integrated CPDP model. Experiments show that FecTrA has better prediction performance than other classical CPDP algorithms (TCA+, Burak, Peters, etc.). Later, because the attribute metric sets of the source and target project data sometimes differ, scholars proposed feature mapping methods. Nam et al. 22 proposed TCA+ for CPDP based on transfer component analysis. 23 The method first establishes a mapping function f and transforms the source and target projects through f so that they are similarly distributed in the new feature space. Experiments show that the performance of TCA+ is comparable to that of WPDP.
Instance transfer processes the instances of the dataset so that the data distributions of the source and target projects become similar at the instance level, and then builds the CPDP model with machine learning algorithms. Turhan et al. 24 proposed the instance filtering method Burak based on instance selection. For each instance Ti in the target project instance set T = {T1, T2, T3, … , Tm}, the method selects the k most similar instances from the source project instance set S = {S1, S2, S3, … , Sn}; the union of these selections forms a dataset of up to m·k instances, and duplicates are then removed to form the final training set. Peters et al. 25 proposed the instance filtering method Peters. For each instance Si in the source project instance set S = {S1, S2, S3, … , Sn}, the method first selects the most similar instance Ti from the target project instance set T = {T1, T2, T3, … , Tm}, forming a labeled sample set L; according to L, the most similar samples are then further selected from S to form the final dataset D. He et al. 26 proposed the Two-Stage-CPDP method: first, the samples closest to the target project instance set T in terms of overall feature distribution are selected from the source project instance set S; second, the Burak filter is applied locally to filter instances and form the training dataset. However, although instance screening improves the correlation between S and T, some data information is also lost. Ma et al. 27 proposed transfer naive Bayes (TNB) based on re-weighting instances. When building the NB model, the algorithm assigns larger weights to source instances that are similar to the target instances. The results show that the prediction performance of TNB is better than that of the NN-filter (nearest-neighbor filtering) based method.
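The Burak-style instance filtering described above can be sketched as follows. This is a minimal sketch, assuming plain Euclidean distance over numeric metric vectors; the function name and the choice of k are illustrative, not taken from the original paper.

```python
import math

def burak_filter(source, target, k=2):
    """For each target instance, select its k nearest source instances
    (Euclidean distance), then deduplicate the union to form the
    training set -- the core idea of the Burak filter."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = []
    for t in target:
        nearest = sorted(source, key=lambda s: dist(s, t))[:k]
        selected.extend(nearest)

    # remove duplicates while preserving selection order
    seen, training = set(), []
    for s in selected:
        if tuple(s) not in seen:
            seen.add(tuple(s))
            training.append(s)
    return training
```

For example, with two well-separated clusters in the source data, each target instance pulls in only its own neighborhood, so the filtered training set stays close to the target distribution.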

GENETIC ALGORITHM-BASED FEATURE SELECTION METHOD (GAFS)
At present, to select features, most scholars use filtering methods: a similarity formula measures the similarity between the features of the source project and the target project so that the features with the most similar distributions can be selected. Most such similarity formulas are defined empirically and cannot accurately measure the similarity of feature distributions, which greatly affects prediction performance. This article therefore exploits the global search capability of the GA to adaptively find the optimal feature subset. At the same time, to reduce the impact of the class imbalance problem on training results and model performance, the EasyEnsemble method is adopted in the model building stage, and all NB base classifiers are trained and integrated to obtain the final ensemble model.

Research framework
The overall research framework of GAFS is shown in Figure 1.
1. In the feature selection stage, GAFS uses the meta-heuristic genetic algorithm to search the feature space of the source and target project datasets for the optimal feature subset, guided by the results of the ensemble training stage. First, the labeled source project data serve as the training set, and the unlabeled target project data serve as the validation set. Next, data preprocessing makes the data approximately normally distributed. Finally, the F1-score of the validation set on the model is used as the fitness, and the optimal feature subset is searched for through the GA.
2. In the ensemble training stage, GAFS mainly uses EasyEnsemble, an undersampling method, to address the class imbalance problem. First, random undersampling is performed for a total of n iterations. Each sampled subset is then used to train a base classifier. Finally, ensemble learning builds the final ensemble model.

Data structure and algorithm description
Next, we describe the experimental design of this article in detail, including data preprocessing, the proposed method and algorithms, and the model construction process.

Feature selection stage based on genetic algorithm
This article proposes a feature selection method based on the meta-heuristic search of the genetic algorithm. The fitness function is computed from the validation results fed back by the validation set on the model, and the optimal feature subset is then searched for by the genetic algorithm according to this fitness. The GA-based feature selection stage is shown in Figure 2. Data preprocessing: read the source project data S as the training set and the target project data T as the validation set. First, the class label column is recoded to 1 and 0 (1 means defective, 0 means non-defective). Then a sigmoid transformation maps the feature data into the interval between 0 and 1, and a z-score transformation standardizes the data so that it conforms to the standard normal distribution, reducing dimensional differences.
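The two-step preprocessing above (sigmoid squashing followed by z-score standardization) can be sketched per metric column. This is a minimal sketch with illustrative names; it assumes the column has nonzero variance after the sigmoid.

```python
import math
import statistics

def sigmoid(x):
    """Map a raw metric value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def preprocess(column):
    """Squash a metric column with the sigmoid, then standardize it
    to zero mean and unit variance (z-score)."""
    squashed = [sigmoid(v) for v in column]
    mu = statistics.fmean(squashed)
    sd = statistics.pstdev(squashed)  # assumed nonzero for real metrics
    return [(v - mu) / sd for v in squashed]
```

After this transform, every metric column contributes on the same scale, which keeps large-range metrics such as LOC from dominating the distance-free NB training.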
Chromosome definition: each chromosome pop is a bit vector (each element is 1 or 0) of length genes_num, representing a transfer feature subset F = {F1, F2, … , Fn}. An element value of 1 means the corresponding feature Fi belongs to F; a value of 0 means it does not.
Population definition: the population is a two-dimensional array of pop_size chromosomes, of size pop_size × genes_num. To reduce the running time of the algorithm and the range of feature space the genetic algorithm must search, the chromosomes in the population are randomly generated bit strings, with pop_size = 100.
Fitness function: GAFS feeds back the F1-score of the ensemble stage as the fitness. The purpose is to reduce the uncertainty and error introduced by artificially defined correlation formulas.
Selection operator: select parent chromosomes according to their fitness values.
Crossover operator: swap genes between the selected parents with a certain probability.
Mutation operator: flip genes of the selected parent with a certain probability.
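The crossover and mutation operators can be sketched on bit strings as follows. This is an illustrative sketch using the rates stated below (crossover 0.3, mutation 0.1, single-point mutation); the function names are not from the original paper.

```python
import random

def crossover(parent_a, parent_b, rate=0.3):
    """Single-point crossover: with probability `rate`, swap the tails
    of the two parent bit strings at a random cut point."""
    a, b = list(parent_a), list(parent_b)
    if random.random() < rate and len(a) > 1:
        cut = random.randrange(1, len(a))
        a[cut:], b[cut:] = b[cut:], a[cut:]
    return a, b

def mutate(chromosome, rate=0.1):
    """Single-point mutation: with probability `rate`, flip one
    randomly chosen bit of the chromosome."""
    c = list(chromosome)
    if random.random() < rate:
        i = random.randrange(len(c))
        c[i] = 1 - c[i]
    return c
```

Note that crossover only exchanges material between parents, so the total number of selected features across the pair is preserved; mutation is what introduces genuinely new feature choices.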
We set the number of parents to the number of features in the dataset, the crossover rate to 0.3, and the mutation rate to 0.1, using the single-point mutation method. Algorithm 1 describes the feature selection stage. It follows the global search idea of the GA and uses the F1-score fed back from the ensemble stage as the fitness function to guide the search for the optimal feature subset. The input is the source project data S, the target project data T, the population size pop_size, the number of genes per chromosome genes_num, the maximum number of generations max_gen, the crossover rate crossover_rate, the mutation rate mutation_rate, and the number of base classifiers n; the final output is the optimal feature subset best_features and the best fitness value best_f1.
Set population: initialize the population with pop_size random members, each represented by an array of genes_num bits, e.g., (1, 0, 0, 1, … , 0)
For each iteration in max_gen do
  For each member in population do
    Set S, T: traverse the member; if the i-th bit is 1, the i-th feature is selected, otherwise it is not selected
    Filter S and T with the selected features
    Get F1-score: get the current F1-score from the ensemble learning phase
    Add the current F1-score to pop_f1_list
  End For
  Get best_pop: choose the best_pop in the current population, whose F1-score is the largest in pop_f1_list
  Set best_pop: best_pop = crossover(best_pop); best_pop = mutation(best_pop)
End For
Return: best_features and best_f1
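The overall search loop can be sketched as below. This is a simplified, self-contained sketch: the `fitness` callback stands in for the ensemble training phase (which in GAFS returns the F1-score on the target data), and the way new candidates are generated around the best member is an illustrative simplification of the selection step, not a faithful reproduction of the full algorithm.

```python
import random

def ga_feature_search(fitness, genes_num, pop_size=100, max_gen=20,
                      crossover_rate=0.3, mutation_rate=0.1, seed=0):
    """Global search for a feature mask: score every chromosome with
    the externally supplied `fitness` function, keep the best one seen,
    and generate the next population by crossover/mutation around it."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genes_num)]
                  for _ in range(pop_size)]
    best, best_f1 = None, -1.0
    for _ in range(max_gen):
        for member in population:
            f1 = fitness(member)
            if f1 > best_f1:
                best, best_f1 = member[:], f1
        # next generation: perturb copies of the current best member
        population = []
        for _ in range(pop_size):
            child = best[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, genes_num)
                mate = [rng.randint(0, 1) for _ in range(genes_num)]
                child = child[:cut] + mate[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(genes_num)
                child[i] = 1 - child[i]
            population.append(child)
    return best, best_f1
```

With a toy fitness that simply rewards masks containing more features, the loop quickly converges toward the all-ones mask, illustrating how the feedback signal steers the search.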

Model building stage based on ensemble learning
In the genetic algorithm stage, the selection of the best individuals is driven by the fitness function, namely the F1-score of the ensemble learning stage. To counter the effect on training of the large gap between the numbers of defective and non-defective samples in the training set, that is, the class imbalance problem, we use the EasyEnsemble method in the model building stage.
Ensemble learning: the outputs of the classifiers in the base classifier set N = {NB1, NB2, … , NBn} are combined by a weighted average; if the final prediction result is greater than or equal to 0.5, the module is considered defective, otherwise non-defective.
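The ensemble decision rule above reduces to a simple average-and-threshold step. A minimal sketch (the function name is illustrative, and equal weights are assumed for the base classifiers):

```python
def ensemble_predict(probabilities):
    """Average the defect probabilities produced by the base
    classifiers and threshold at 0.5: >= 0.5 -> defective (1),
    otherwise non-defective (0)."""
    avg = sum(probabilities) / len(probabilities)
    return 1 if avg >= 0.5 else 0
```

For instance, base outputs of 0.6, 0.4, and 0.7 average to about 0.57 and yield a "defective" prediction, while 0.1, 0.2, and 0.3 yield "non-defective".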
Algorithm 2 describes the ensemble learning phase. It uses EasyEnsemble to alleviate the class imbalance problem and obtains the final model through ensemble training. The input of Algorithm 2 is the training set S and the validation set T (after feature selection), and the final output is the prediction result y_predict of the validation set in the ensemble stage, together with the fitness value F1-score.
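The EasyEnsemble sampling step at the heart of Algorithm 2 can be sketched as follows: each of the n rounds keeps every minority (defective) instance and draws an equally sized random sample of the majority (non-defective) instances. The function name and seed are illustrative.

```python
import random

def easy_ensemble_subsets(majority, minority, n=5, seed=42):
    """Draw n balanced training subsets in the EasyEnsemble style:
    each subset combines all minority instances with a random,
    equally sized undersample of the majority class."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sampled + list(minority))
    return subsets
```

Each subset then trains one NB base classifier; because a different majority sample is drawn per round, the ensemble sees far more of the majority class than any single undersampled model would.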

EXPERIMENTAL VERIFICATION

Experimental dataset
We used two dataset collections frequently used by researchers, AEEEM 28 and Promise. 29 The description of the datasets is shown in Table 1, where the first five projects belong to the AEEEM dataset and the last five to the Promise dataset.

Evaluation indicator
Whether a software module contains a defect can be regarded as a binary classification problem: defective or non-defective. With one-hot coding, a defective module is recorded as 1 (i.e., a positive prediction) and a non-defective module as 0 (i.e., a negative prediction). [30][31] However, Accuracy cannot accurately and comprehensively reflect model performance on every dataset. When the numbers of positive samples (T) and negative samples (F) differ greatly, for example T:F = 99:1, a model that predicts every sample as T still reaches an Accuracy of 99%, yet the most critical negative samples are never correctly predicted. When the samples are imbalanced, evaluating a model by Accuracy is therefore inaccurate. For Precision and Recall, higher Precision generally means lower Recall and vice versa, so choosing either one alone as the evaluation index is one-sided. The F1-score is the harmonic mean of Precision and Recall; by combining the two, it can fully reflect model performance. Recently, MCC has also been widely used by researchers. MCC considers all four outcomes of a prediction (TP, TN, FN, and FP) and is a more balanced indicator. Therefore, we use F1-score and MCC as the main evaluation indicators.
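The two metrics can be computed directly from the confusion matrix; a minimal sketch (the function name is illustrative, and the standard definitions of F1 and MCC are assumed, with nonzero denominators):

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """Compute the F1-score (harmonic mean of precision and recall)
    and the Matthews correlation coefficient from confusion-matrix
    counts. Denominators are assumed nonzero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return f1, mcc
```

Unlike Accuracy, both values collapse when the minority class is mispredicted: a model that labels everything positive in the 99:1 example above has recall 1 but poor precision, and its MCC is undefined or near zero rather than 0.99.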

Experimental design
GAFS is a method for obtaining better prediction results when data are lacking for the SDP problem, using CPDP to supplement the training data. To reduce the difference in data distribution and the class imbalance in the dataset, GAFS selects the optimal feature subset with the genetic algorithm, uses the EasyEnsemble method to build multiple base classifiers, and finally integrates them by weighting into the final model. To verify the performance of GAFS, this article poses the following questions:

Experimental results and analysis
For question 1: Is the prediction performance of GAFS better than that of other classical CPDP methods? Tables 2 and 3 compare the performance of the benchmark algorithm (NB), the classic algorithms (Burak, 24 Peters, 25 CODEP, 19 and TCA+ 22), and the proposed GAFS. For generality, we take F1-score and MCC as the evaluation indices, train GAFS three times, and report the average. The experiments show that: (1) Across the 40 CPDP scenarios of AEEEM and Promise, taking the mean F1-score and MCC as the evaluation index, GAFS achieves the best results compared with the other algorithms. (2) Compared with the mean F1-score and MCC of the benchmark method on AEEEM and Promise, GAFS improves by 38.89% and 106.67%, and by 54.84% and 150%, respectively; the effect is significant. Compared with the other classic models (CODEP, TCA+, Peters, Burak), GAFS improves F1-score by 35.14%, 20%, 35.14%, and 31.58% and MCC by 93.75%, 72.22%, 24%, and 63.16% on the AEEEM dataset, and improves F1-score by 20%, 23.08%, 4.35%, and 11.63% and MCC by 66.67%, 57.89%, 20%, and 25% on the Promise dataset, respectively; the effect is also very significant.
For question 2: Is the set of transferred features obtained by GAFS better than those from filter-based feature selection methods? Tables 4 and 5 compare GAFS with other filter-based feature selection methods and wrapper methods on the AEEEM and Promise datasets. The experiments show that: (1) In the 20 CPDP scenarios of AEEEM, taking the mean F1-score and MCC as the evaluation index, GAFS achieves the best results compared with the other algorithms in 18 and 17 scenarios, respectively. In the 20 CPDP scenarios of Promise, GAFS achieves the best results in all 20 scenarios on both F1-score and MCC. (2) Compared with feature selection methods based on feature filtering (the chi-square test, F-test, and mRMR test) and wrapper methods, GAFS obtains a better set of transfer features.

THREATS TO VALIDITY
This section briefly discusses some threats to the validity of the conclusions.The first is the experimental dataset used in this article.We conducted experimental research on the method proposed in this article on some projects of the AEEEM and Promise datasets.Although these software projects are often used in past research, the research conclusions based on these datasets may not be generalizable to all datasets, especially some non-open source proprietary software projects.Therefore, the conclusions of this article should be re-validated on more datasets, especially on imbalanced datasets.
The second is the classification model used in this article.We use the Naive Bayes model often used by researchers as the base classifier.Although good experimental results have been obtained on the Naive Bayes model, there are many models and types of classifiers, including composite models with stronger capabilities.Therefore, the generalization of the conclusions of this article may be limited by this.
Next are the evaluation metrics used in this article. For performance evaluation we selected F1-score and MCC, both of which have been widely used in previous defect prediction research. The F1-score is a comprehensive evaluation of precision and recall, and MCC performs well under class imbalance. However, other types of metrics, such as G-measure and AUC, are not used in this article, so the validity of the conclusions may also be affected by the choice of evaluation metrics. The last threat is the implementation process. For the comparison methods, we reproduced the descriptions in the original articles; the implementation may be affected by programming skill and thus affect the final experimental results. For the method proposed in this article, many parameters may affect the experimental conclusions, and we will investigate them more deeply in future work.

CONCLUSION AND FUTURE WORK
At present, in CPDP research based on feature selection, most scholars use filtering methods, that is, similarity formulas that measure the similarity between features so as to select the features whose distributions are most similar in the source and target projects. However, these similarity formulas are mostly defined empirically and cannot accurately measure the similarity of feature distributions, which greatly affects prediction performance. Building on existing CPDP research, this article therefore proposes a feature selection method based on the genetic algorithm, which adaptively finds the optimal feature subset through guided search. At the same time, to reduce the impact of the class imbalance problem on training results and model performance, the EasyEnsemble method is adopted in the model building stage, and all base classifiers are trained and integrated into the final ensemble model. The experimental results show that, compared with filter-based feature selection, the GA-based global search obtains a better set of transfer features, and compared with existing classical CPDP methods, GAFS achieves superior prediction results. Many aspects remain worthy of in-depth study. In this work, the search time of the genetic algorithm grows with the population size and the number of genes, so it is necessary to consider how to optimize the genetic algorithm to reduce running time. In addition, most current defect predictors perform well only at the file level and cannot give finer-grained predictions, and existing defect prediction methods cannot explain why a particular prediction is made. Our future research will focus on solving these problems.

FIGURE 2 Feature selection stage based on genetic algorithm.

ALGORITHM 1. Feature Selection Phase
Input:
  S: source project
  T: target project
  pop_size: number of chromosomes in a population
  genes_num: number of genes in a chromosome
  max_gen: maximum number of generations
  crossover_rate: probability of crossover for the genetic algorithm
  mutation_rate: probability of mutation for the genetic algorithm
Output:
  best_features: the selected feature subset
  best_f1: F1-score of the best solution
Method:
  Set S, T: read the source project dataset (S) and the target project dataset (T); remove the class label from both S and T; apply the sigmoid and z-score transformations to the data of S and T

In the model building stage, we use the EasyEnsemble method to generate n naive Bayes classifiers and then ensemble these n base classifiers with linear weighting; the F1-score of the training result is fed back to the genetic algorithm stage as the fitness function. The model building stage based on ensemble learning is shown in Figure 3.
EasyEnsemble: randomly undersample the feature-selected sample set S = {I1, I2, … , Imore, C1, C2, … , Cless} a total of n times. In each iteration, a sample P = {I1, I2, … , Iless} is drawn from the majority class of the current sample set S; P is combined with the minority class Sless to form a balanced training set Strain = {C1, C2, … , Cless, I1, I2, … , Iless}, and Strain is used to train NB, yielding the base classifier set N = {NB1, NB2, … , NBn}. Finally, N is ensembled into the final model.
Base classifier: an NB classifier is trained on each training set Strain = {C1, C2, … , Cless, I1, I2, … , Iless}.

ALGORITHM 2. Ensemble Learning Phase
Input:
  S: new source project dataset after the feature selection phase
  T: new target project dataset after the feature selection phase
  n: number of base classifiers
Output:
  y_predict: prediction results of the current validation set
  F1-score: F1-score of the current solution
Method:
  Initialize: set clf_list = empty; set x_train_list = empty
  For each iteration in n do
    Get S and T from the feature selection phase
    Set x_train_list, x_test_list: use EasyEnsemble to undersample, returning the current lists of training and validation data
    Set clf_list: traverse x_train_list, build a base classifier for each entry, and add the current classifier to clf_list
  End For
  For each clf in clf_list do
    Get y_predict: ŷ = (1/n) Σ_{i=1}^{n} y_i, where y_i is the result of the current clf; if ŷ ≥ 0.5 then y_predict = 1, else y_predict = 0
    Get F1-score: calculate the current F1-score from y_predict and feed it back to the feature selection phase
  End For
  Return: y_predict and F1-score

FIGURE 3 Model building stage based on ensemble learning.
TABLE 1 Experimental datasets.
TABLE 2 Comparison of F1-score and MCC values between GAFS and other classic cross-project SDP models on AEEEM. Note: bold values are the optimal values for each corresponding indicator in each row.
TABLE 3 Comparison of F1-score and MCC values between GAFS and other classic cross-project SDP models on Promise. Note: bold values are the optimal values for each corresponding indicator in each row.
TABLE 4 Comparison of F1-score and MCC values between GAFS and other feature-based methods on AEEEM. Note: bold values are the optimal values for each corresponding indicator in each row.
TABLE 5 Comparison of F1-score and MCC values between GAFS and other feature selection methods on Promise. Note: bold values are the optimal values for each corresponding indicator in each row.