Joint loan risk prediction based on deep learning‐optimized stacking model

In recent years, China's automobile industry has undergone rapid development, creating new opportunities for the auto loan industry. Currently, auto financing companies are actively seeking to expand their cooperation with banks, so improving the approval rate and scale of the joint loan business is of significant practical importance. In this paper, we propose a stacking-based financial institution risk approval model and select the optimal stacking model by comparing its performance with other models. Additionally, we construct a bank approval model using deep learning techniques on a biased data set, with feature extraction performed by convolutional neural networks (CNNs) and feature-based counterfactual augmentation used for balanced sampling. Finally, we optimize the auto finance company's prediction model by selecting the optimal loss function coefficients based on the features and results of the bank approval model. Experimental results show that the proposed approach increases the joint loan approval rate on the actual data set by approximately 6%.


INTRODUCTION
With rising consumption levels and changing attitudes toward consumption, young people have become accustomed to spending in advance with credit cards, and loans ease the financial pressure when few people can afford to purchase a house or a car outright. Figure 1 shows that the number of credit activities in China has risen rapidly in the 21st century, especially in the last decade.
Although commercial banks are well-capitalized, their limited consumption scenarios make it hard for them to reach long-tail customers in the market, and identifying and managing the risks associated with these customers is also difficult. As a result, the consumer loan business of commercial banks has been slow to develop.
In contrast, consumer finance companies have a significant advantage in terms of market presence and data. They have cultivated a large base of long-tail customers and accumulated substantial data on consumer spending habits and credit status. This advantage enables them to conduct inclusive finance business at lower cost and higher efficiency. Through collaboration with consumer finance companies, traditional commercial banks can leverage this expertise to collect data on small consumer loans and develop risk management models, thereby enhancing their capabilities in the small loan business. In turn, consumer finance companies can achieve rapid business growth with minimal capital investment.
The business pattern of a joint loan is as follows:
1. The consumer finance company conducts the initial review of borrowers and recommends eligible borrowers to a commercial bank; only borrowers with high scores are recommended.
2. The commercial bank conducts a second review using its own credit scoring model to make the final credit decision.
3. The bank and the consumer finance company share the loan interest and assume their respective risks in proportion to the loan funds each provides.
Taking an auto consumption scenario as an example, consumers apply for car loans from an auto finance company, which has fewer requirements and a shorter application process than a bank. To alleviate its financial pressure, the company collaborates with banks to offer joint loans. If an application receives a high score from the company, it is recommended to the bank for a second review. If the application passes the second review, the loan is granted jointly by the company and the bank; otherwise, the company has to grant the loan alone. The company therefore aims to have all recommended applications accepted by the bank, maximizing profit while minimizing capital investment.
The company wishes to obtain a higher approval rate in order to maximize leverage. Generally, the company and the bank issue joint loans at a ratio of 1:4: if a customer borrows five dollars, the company lends one dollar and the bank lends four. Because the company provides loan guarantees to the bank, it receives a larger share of the profit than the bank. From the perspective of economic profit, a higher approval rate means the company can issue a larger total amount of loans and earn higher potential profits.
However, not every recommended customer will be accepted by the bank. Therefore, the company needs to train a credit scoring model to classify whether a loan application can meet the bank's requirements and to pick out the applications classified as "yes".
Most existing credit score prediction methods are based on labeled datasets, mining customer features to establish models. In the specific application scenario of joint loans, however, most customers do not have a definite label indicating whether they would be approved or rejected by the joint loan review. For such a semi-supervised learning problem, existing methods cannot establish a suitable model.
Our main contributions and innovations are as follows: 2. We enhanced the performance of semi-supervised learning in the context of joint loans by leveraging a deep learning model to assign predicted labels to data with missing labels. 3. We designed a new loss function to optimize the stacking model, improving its classification performance beyond that of the original stacking model.
The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the methods used at different stages. Section 4 gives the experimental results and analysis. Section 5 concludes the paper.

Loan risk prediction
Loan customer risk prediction has attracted wide attention from scholars at home and abroad and has become a frontier subject in the fields of finance and Internet credit. Using data mining technology to establish loan risk prediction models from customer loan application data has huge demand and broad application space.2-6 In recent years, significant progress has been made in using data mining techniques for loan risk prediction. For instance, researchers have utilized techniques such as logistic regression and radial basis functions to analyze loan application data for loan decision-making.7 The cluster support vector machine model (CSVM) has been employed to assess individual credit risk and to address the computational overhead of nonlinear classification methods on large-scale data.8 To tackle imbalanced distributions, high levels of noise, and oversampling of balanced sample sets in high-dimensional spaces, researchers have combined the synthetic minority over-sampling technique (SMOTE) with other SMOTE variants and utilized the XGBoost algorithm for loan risk prediction.9 Furthermore, an XGBoost classification algorithm integrating kernel non-negative matrix decomposition and Bayesian parameter optimization has been applied to assess the creditworthiness of P2P loan borrowers; by using the Bayesian approach to identify the parameter combinations that yield the highest classification accuracy, it enhances the precision of customer credit evaluation.10 Researchers have also leveraged gradient boosting with categorical features (CatBoost) to predict loan risk using both classifiers and narrative data.11
A method based on the bagging supervised autoencoder classifier (BSAC) has been proposed, leveraging the advantages of supervised autoencoders and representation learning in classification.12 Another study introduced a focal-aware cost-sensitive light gradient boosting machine model (LightGBM-focal) specifically designed for credit scoring; its effectiveness is investigated with two interpretation methods, feature importance scoring and partial dependence plots.13 The aforementioned approaches have demonstrated superior performance over benchmark methods on real-world datasets, making valuable contributions to the research and practical application of loan risk prediction.

Stacking algorithm models
Compared to traditional machine learning methods, stacking models exhibit superior generalization and accuracy relative to single models or simple combined models.14-16 In recent years, the application of stacking algorithms has expanded across various domains, yielding promising outcomes. For instance, a stacking model has been employed to create user profiles based on gender, grade, and age using behavior log data from a campus network.17 Parameter estimation has been enhanced by combining support vector machine (SVM) and stacking models with diverse expert opinions.18 The efficacy of these approaches has been validated on diverse datasets from different fields.19-21 Researchers have found that stacking ensemble methods outperform commonly used machine learning methods in bankruptcy prediction models.22 By employing stepwise discriminant analysis to select features, a stacking model for bankruptcy prediction has been developed that surpasses baseline models.23 To enhance the accuracy of default risk prediction, researchers have combined data balancing with stacking ensemble models and meta-learners.24 Moreover, a novel ensemble model based on logistic regression has been proposed, demonstrating statistically significant performance advantages.25 Selective ensemble algorithms within the stacking framework have also been introduced, improving the performance and stability of multi-class prediction models.26
Furthermore, the stacking algorithm has shown its versatility and effectiveness in addressing complex prediction tasks across a wide range of industries. Its ability to leverage multiple models and expert opinions enables it to capture and integrate diverse perspectives, resulting in improved predictive performance. The stacking algorithm's robustness and adaptability make it a valuable tool for tackling challenging problems, contributing to advances in predictive modeling and decision-making.

Deep learning model
The CNN algorithm is widely employed in deep learning as a method for feature extraction from input data.27-29 It utilizes convolution and pooling operations at different layers to automatically learn and capture the underlying feature representations of the input. In recent years, numerous scholars have made significant advances using CNNs in various domains.
For instance, the multi-scale integrated feature fusion convolutional neural network (MCFF-CNN) based on residual learning has been utilized to extract vehicle color features, after which a support vector machine (SVM) classifier produces the final color recognition results.30 CNN has been applied to extract features from raw landslide data, which are then fed into SVM, logistic regression (LR), and random forest (RF) models to evaluate landslide susceptibility.31 In fake news analysis, CNN, long short-term memory (LSTM), and residual networks (ResNet) have been employed.32 CNN and LSTM have been used to detect structural defects on concrete surfaces and predict potential damage.33 CNN has also been utilized for remote sensing image classification, where the effectiveness of CNN-based classification has been demonstrated against alternative methods.34 Researchers have developed a stock investment management system using technical indicators and deep neural network models; the system outperforms the market on metrics such as maximum drawdown, Sharpe ratio, and Sortino ratio.35 An enhanced Score-CAM model has been proposed to provide interpretability to deep learning models, increasing their transparency and explainability.36 A risk prediction system for online lending platforms has been introduced that leverages convolutional neural networks and stacking ensemble models and outperforms single models and other ensemble models in prediction accuracy and recall.37 For forecasting leasing opportunity costs, deep learning models have shown smaller deviations between predicted and actual values, surpassing existing advanced models.38 Additionally, researchers have conducted a comprehensive analysis of different graph neural network models for stock prediction, paving the way for future advances in this domain.39
These research efforts demonstrate significant achievements in applying machine learning algorithms, stacking models, and deep learning methods to classification tasks across various domains. However, all of the aforementioned methods are based on supervised learning with complete datasets. In the case of joint loans, the bank's approval results are only partially labeled, while the majority of the data remains unlabeled. This poses a semi-supervised learning challenge for modeling the joint loan approval status. To address it, we employ deep learning techniques to establish models on the labeled portion of the data, enabling the labeling of previously unlabeled data, and we use the extracted features and loss functions to optimize the stacking model, thereby enhancing classification accuracy in the semi-supervised setting.

DEEP LEARNING-OPTIMIZED STACKING MODEL
The construction process of the proposed deep learning-optimized stacking model is depicted in Figure 2. In the following, we describe in detail how the risk prediction model for joint loans is constructed using deep learning techniques.

The stacking model of auto finance companies
The stacking ensemble learning approach trains the first layer of classifiers (the primary learning layer) on the initial dataset and uses their outputs as features to generate a new dataset for training the second layer of classifiers (the meta-learning layer). In this approach, the original labels are still used as the labels of the new dataset, enabling an overall improvement in prediction accuracy.
Assume that the first-layer classifier contains T1 learning algorithms L1, L2, …, LT1, and that the second-layer classifier algorithm is L. The training and prediction process based on the stacking model is described as follows.
For the data set D = {(xn, yn), n = 1, …, N}, xn is the feature vector of the n-th sample and yn ∈ {0, 1} is the corresponding label; m is the feature dimension of each sample, that is, each feature vector satisfies xn ∈ R^m. Dk and Dk′ are defined as the k-th fold test set and the corresponding training set in K-fold cross-validation, respectively.
Training phase: For each of the T1 learning algorithms of the first-layer classifier, the prediction result for each sample is obtained by K-fold cross-validation: Lt(j) denotes the classifier obtained by training algorithm t on the j-th fold training set, and znt denotes its predicted class probability vector for sample xn. The first-layer predictions and the original features xn are used as the attribute features of the new training set, where the predicted values znt of each base learning algorithm are combined with the samples xn and their corresponding labels yn; the feature size of the new training set is m × T1.
Prediction phase: The test data are passed through the classifiers obtained in the first-layer training, and the results of all classifiers for a test sample x are obtained as the average over the K folds, Σk Lt(k)(x)/K, that is, the mean of the predicted class probability vectors of the t-th algorithm for x. These averaged results for sample x are then fed to the already-trained second-layer classifier to produce the final prediction.
The newly generated data set is the input data of Stacking layer 2. Stacking is configured in such a way that the training results of the layer 1 algorithm can be fully used in the generalization process of the layer 2 algorithm, and the layer 2 algorithm can find and correct the prediction errors in the layer 1 learning algorithm to improve the accuracy of the model.
The pseudo-code for stacking-based ensemble learning is as follows. The stacking method aims to leverage the strengths of different algorithms by fitting the data from multiple angles, allowing the individual models to complement each other. In this study, we compared the performance of different stacking configurations, including single models, ensemble learning models, and stacking models with various base-learner combinations. We ultimately selected the best-performing stacking model, with the RF, ensemble of regression trees (ERT), and XGBoost algorithms as the base learning algorithms. The second-layer classifier was chosen to be the SVM algorithm due to its good performance on each research data set. The specific training process is illustrated in Figure 3.
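As a concrete illustration, the two-layer setup described above can be sketched with scikit-learn's `StackingClassifier`. This is a hedged sketch, not the authors' implementation: the synthetic data, hyperparameters, and the use of `GradientBoostingClassifier` as a stand-in for XGBoost (to avoid an extra dependency) are all assumptions for demonstration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the loan application data (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.75],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# First layer: base learners trained via internal K-fold cross-validation;
# their out-of-fold class probabilities become the meta-learner's features.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("ert", ExtraTreesClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=SVC(probability=True),   # second-layer (meta) classifier
    cv=5, stack_method="predict_proba")
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The `cv=5` argument realizes the K-fold procedure of the training phase: each base learner's predictions on held-out folds, rather than on data it was fitted on, feed the SVM meta-learner, which is what lets the second layer correct the first layer's errors.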

Bank approval model based on deep learning
After the auto finance company conducts its review, customers are categorized as either approved or rejected. The information of approved customers is then sent to the bank for secondary approval. If the bank also approves the loan, the auto finance company and the bank jointly lend to the customer. However, since we do not know the criteria or methods the bank uses for approval, we aim to build a model that simulates the bank's approval process and aligns with the bank's decisions as closely as possible. Interpretability is not a requirement for this model, as we use it only to optimize the stacking model and increase the joint approval rate. We employ a CNN to extract features for subsequent model training, and the counterfactual augmentation method is combined with these features to generate new feature-value samples that balance the dataset. Finally, we use the SVM method for binary classification to categorize customers as approved or rejected.

CNN feature extraction
The convolution operation is given in Equation (1): x_j^l = f(Σ_i x_i^{l−1} * k_ij^l + b_j^l), where x_j^l is the input of the j-th neuron in layer l, f is the activation function, x_i^{l−1} is the i-th neuron in layer l−1, k_ij^l is the convolution kernel parameter connecting neurons i and j in the feature map of layer l, and b_j^l is the bias parameter of the j-th neuron in the feature map of layer l. The CNN model generates features from the data through the convolution layer and then passes them to the pooling layer for reprocessing, producing features with spatial invariance and multi-scale characteristics. The feature extraction process of the CNN is shown in Figure 4.
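The layer formula above can be made concrete with a minimal NumPy sketch. This is an illustrative single convolution layer over 1-D feature maps, assuming ReLU as the activation f; the function name and array shapes are our own choices, not from the paper.

```python
import numpy as np

def conv_layer(x_prev, kernels, biases, f=lambda z: np.maximum(z, 0.0)):
    """One convolution layer: x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l).

    x_prev:  (n_in, length) feature maps from layer l-1
    kernels: (n_in, n_out, ksize) convolution kernels k_ij^l
    biases:  (n_out,) bias terms b_j^l
    """
    n_in, n_out, _ = kernels.shape
    out = []
    for j in range(n_out):
        # Sum the convolutions of every input map i with its kernel k_ij.
        s = sum(np.convolve(x_prev[i], kernels[i, j], mode="valid")
                for i in range(n_in))
        out.append(f(s + biases[j]))   # add bias, apply activation f
    return np.array(out)
```

Each output feature map j is the activated sum over input maps, matching the equation term by term; pooling would then downsample these maps.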

3.2.2
Balanced sampling based on CFA
Customers rejected by the bank account for approximately 25% of all customers sent for secondary approval, so the ratio of positive to negative samples is about 4:1, with approved customers serving as positive samples. Although negative samples are relatively rare, they require the same attention as positive samples during model training to enhance the model's performance. Algorithms for dealing with class imbalance include threshold shift (TM),40 repeated edited nearest neighbors (RENN),41 one-sided selection (OSS),42 cost-sensitive learning (CSL),43 the synthetic minority oversampling technique (SMOTE),44 and SVM-SMOTE.45 Methods that generate new synthetic data points for the class imbalance problem have proven effective for data augmentation in deep learning models, creating labeled datasets with better performance.46 The feature-based counterfactual augmentation (CFA) method generates synthetic data points in the minority class, adaptively constructing minority-class samples with new feature values from matching and difference features, and has demonstrated superior performance in binary classification models.47 To meet the feature requirements extracted by the CNN in the initial stage and to establish the final binary classification model, it is reasonable to adopt the CFA method for sample balancing. The process of the CFA method is illustrated in Figure 5.
Counterfactual pairs cf(x, p) are initially computed over the whole dataset T, identifying combinations of counterfactually paired instances, with x from the majority class and p from the minority class.
There are three main steps for CFA to construct minority samples:
1. Calculate the counterfactual set cf(x, p) of the data set, pairing instances x of the majority class with instances p of the minority class.
2. For each unpaired majority-class instance x′, use KNN to find its nearest matched instance x and the local counterfactual pair cf(x, p) it participates in.
3. Transfer the feature values from p and x′ to p′, synthesizing a new counterfactual instance p′ based on the matching features and difference features.
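The three steps above can be sketched in a few lines of NumPy. This is a loose, hedged approximation of feature-based CFA: the nearest-neighbor pairing and the median split between "matching" and "difference" features are our simplifications, not the exact procedure of reference 47.

```python
import numpy as np

def cfa_oversample(X_maj, X_min, n_new, seed=0):
    """Sketch of counterfactual augmentation: pair a majority instance x'
    with its nearest minority instance p, then synthesize p' by keeping p's
    values on the features where the pair differs most ("difference"
    features) and x''s values elsewhere ("matching" features)."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        x = X_maj[rng.integers(len(X_maj))]      # majority instance x'
        d = np.linalg.norm(X_min - x, axis=1)
        p = X_min[np.argmin(d)]                  # nearest minority match p
        diff = np.abs(x - p)
        mask = diff > np.median(diff)            # crude "difference" features
        p_new = np.where(mask, p, x)             # transfer values to p'
        new_points.append(p_new)
    return np.array(new_points)
```

The synthesized points sit between the classes along the difference features, which is what lets the balanced set sharpen the decision boundary rather than merely duplicating minority samples.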

SVM classification
A convolutional neural network ordinarily uses a BP neural network as the classifier in its fully connected layer, but BP networks suffer from poor generalization ability.48 The SVM method instead constructs a classification hyperplane and has the advantages of global optimization, fast convergence, and strong generalization.49 Therefore, after feature extraction by the CNN and balanced sampling by CFA, the SVM method is used to predict the binary classification results on the labeled data set. The training flow of our simulated bank approval prediction model is shown in Figure 6. After the model is trained, we obtain the classification results for the customers sent to the bank as well as the features extracted by the CNN. These classification results and features can then be fed into the stacking model as the direction and basis for its optimization.

Optimizing the stacking model
The performance ceiling of machine learning is determined by the data and features, while models and algorithms can only approach this limit. At the data and feature level, we integrate the features extracted by the CNN with the features selected from the original data into a new feature subset, and then use a feature selection method to choose appropriate input features for the model. Customers not approved by the auto finance company, when fed into the simulated bank approval model, generate a new batch of approval results. We utilize these results by adding coefficients to the model's loss function.

Feature combination and selection
To address the issues of poor generalization and robustness of traditional models, we augment the feature set of the original data with features extracted by CNN.By combining and selecting features, we construct various feature subsets for training the stacking model to improve its generalization and robustness.
In this paper, we propose a simple voting selector that integrates three different feature selection methods: a filter based on Pearson correlation, an unsupervised method based on the variance inflation factor, and a wrapper based on recursive feature elimination. Each method assigns a value of 1 to keep a feature or 0 to discard it. If the average of the three votes exceeds the 0.5 threshold, we keep the feature; that is, a feature is retained only if at least two of the three methods choose to keep it. Different feature subsets yield varying performance under the same algorithm, which serves as an evaluation criterion for feature selection.50 The grid search method is employed to adjust the parameter thresholds of the three feature selection methods, generating various candidate feature subsets in the voting selector. The classification performance of the logistic regression algorithm is used as the evaluation standard, leading to the final feature selection. The specific process is illustrated in Figure 7.
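The voting selector can be sketched as follows. The thresholds (0.05 for correlation, 10 for VIF, 5 features for RFE) are illustrative assumptions standing in for the grid-searched values; the VIF helper is our own minimal least-squares implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)

# Vote 1: Pearson-correlation filter against the target.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
v_pearson = corr > 0.05

# Vote 2: variance inflation factor filter (keep low-multicollinearity
# features); VIF_j = 1 / (1 - R^2) of regressing feature j on the others.
def vif(X, j):
    others = np.c_[np.ones(len(X)), np.delete(X, j, axis=1)]
    beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ beta
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / max(1 - r2, 1e-12)
v_vif = np.array([vif(X, j) < 10.0 for j in range(X.shape[1])])

# Vote 3: recursive feature elimination wrapped around logistic regression.
v_rfe = RFE(LogisticRegression(max_iter=1000),
            n_features_to_select=5).fit(X, y).support_

votes = v_pearson.astype(int) + v_vif.astype(int) + v_rfe.astype(int)
keep = votes >= 2        # mean vote > 0.5  <=>  at least two of three agree
```

The majority-vote rule makes the selector robust to any single method's bias: a feature survives only if two independent criteria agree it carries signal.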

Loss function optimization
The deep learning-based simulated bank approval model predicts loan applications for all customers, and the result is denoted predict1. The trained stacking model produces a prediction result denoted predict2. By comparing the two predictions for a given customer, we can determine whether the classification is consistent. If the predictions are consistent, we reward the model by multiplying the corresponding loss function coefficient by α. If the predictions are inconsistent, indicating a classification error, we punish the model by multiplying the loss function coefficient by β.
FIGURE 7 Selection of features.
FIGURE 8 Support vector machine (SVM) distance.
The loss function of the original model is denoted loss, and α and β are scaling factors. To construct a new loss function, a coefficient is added to the original loss function based on the comparison of each customer's prediction results between the stacking model and the bank approval model. The goal is to select the α and β values that maximize the joint loan approval rate; the specific form is as in Equation (2). We conducted some experiments on the α and β values. First, we fixed the relationship between the two by letting α = 1/β, with the value range of β being (1, 2) based on performance on the dataset. However, a single fixed value of α and β may not suit the actual situation. The stacking model and the bank approval model are both ultimately built on an SVM hyperplane for binary classification of the data. Therefore, in both models every data point has a distance to the constructed hyperplane, as shown in Figure 8.
For each data point, if the two models classify it on the same side of the SVM hyperplane, we calculate the distances d1 and d2. A large distance sum indicates high classification confidence, and the corresponding loss can be reduced substantially. Conversely, if the two models classify the data point on different sides of the hyperplane, we calculate the distances d1 and d3. A large distance sum indicates that the data point behaves very differently in the two models, and the corresponding loss needs to be adjusted substantially to reflect the difference. We set the specific values as in Equations (3) and (4), where d is the sum of the distances of a data point in the two models, dmax and dmin are the maximum and minimum values of the distance sums over all points, and ε is an error parameter. Results on the subsequent experimental data sets show that this adaptive setting outperforms the fixed single-value setting in improving model performance.
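The distance-adaptive behavior described above can be illustrated with a small sketch. Because the exact forms of Equations (3) and (4) are not reproduced in the text, the min-max scaling and coefficient ranges below are assumptions chosen only to reproduce the stated behavior: consistent predictions with a large distance sum shrink the loss, while inconsistent ones with a large distance sum enlarge it.

```python
def loss_coefficient(d, d_min, d_max, consistent, eps=1e-3):
    """Illustrative per-point loss coefficient (not the paper's exact
    Equations (3)-(4)): d is the summed hyperplane distance of the point
    in the two models, scaled to [0, 1] via min-max normalization."""
    scale = (d - d_min) / max(d_max - d_min, eps)
    if consistent:
        return 1.0 - 0.5 * scale   # reward: coefficient falls toward 0.5
    return 1.0 + scale             # punish: coefficient grows toward 2.0
```

A confidently agreed-on point (large d, consistent) gets its loss halved, while a confidently contradicted point doubles its loss, mirroring the α < 1 reward and β ∈ (1, 2) punishment ranges discussed above.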
To sum up, we first established a stacking model and then used CNN + CFA + SVM to build a deep learning model on the unbalanced data set. The deep learning model provides newly mined features for the stacking model, labels the unlabeled data, and adds different coefficients to the loss function. We optimized the stacking model in these three ways and finally obtained the expected stacking prediction model with a high joint loan approval rate. Four evaluation indexes were used to assess the classification performance of the models: accuracy, precision, recall, and AUC.

4.2
The stacking model of auto finance companies

Data set
We created an experimental dataset from actual loans issued by Chery Huiyin Auto Finance Co., Ltd. As part of the joint loan process, customers first apply for auto loans from the company. The company then assesses the customers' multidimensional information, including credit information, application information, and channel information, to decide whether to approve or reject the loan application. If the application is approved, the company pushes it to the bank; if the bank also approves, the company and the bank jointly issue the loan to the customer. In the second quarter of 2021, a total of 120,672 customers applied for loans from the company, and 163 multidimensional information variables were obtained from the company's database. Partial data are shown in Table 1.

Feature selection
First, we removed irrelevant or redundant variables such as customer ID, customer name, and repayment bank card number. Then, we set a threshold to handle missing data: columns with more than 20% missing values were deleted, and customers with remaining missing values were removed. If the vast majority of a variable's observations were identical, the variable could not differentiate the target and was also removed. After this simple preprocessing, 100,146 customers and 82 variables remained. Among the remaining variables, non-numerical categorical variables were analyzed and transformed into appropriate numerical variables. The data are shown in Table 2.
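The cleaning rules above translate directly into a short pandas sketch. The 95% dominance cutoff for near-constant columns is an assumed concrete value for the paper's qualitative "majority of observations were the same" criterion.

```python
import pandas as pd

def preprocess(df, missing_thresh=0.2, dominance_thresh=0.95):
    """Apply the three cleaning rules: drop columns with >20% missing
    values, drop customers (rows) with any remaining missing value, and
    drop near-constant columns that cannot differentiate the target."""
    df = df.loc[:, df.isna().mean() <= missing_thresh]   # sparse columns out
    df = df.dropna()                                     # incomplete rows out
    keep = [c for c in df.columns
            if df[c].value_counts(normalize=True).iloc[0] < dominance_thresh]
    return df[keep]
```

Running the column filters before `dropna()` matters: deleting a mostly-missing column first prevents it from wiping out rows that are otherwise complete.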
The preliminary approval results for customers reviewed by the auto finance company fall into two categories: approved and rejected. We used approved samples as positive samples and rejected samples as negative samples, obtaining 25,383 positive samples and 74,763 negative samples. To prevent model overfitting, the original data were divided into training, validation, and testing sets in a ratio of 6:2:2. The training set (60% of the data) was used to fit the model parameters, the validation set (20%) was used for manual hyperparameter tuning, and the testing set (20%) was used to evaluate model performance. The specific data are shown in Table 3.
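The 6:2:2 split can be realized with two calls to scikit-learn's `train_test_split`: first hold out 40% of the data, then halve that holdout into validation and test sets. The synthetic labels below (roughly the 1:3 positive-to-negative ratio of this dataset) are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = (np.arange(1000) % 4 == 0).astype(int)   # ~25% positives, illustrative

# Step 1: 60% train vs. 40% holdout, stratified to preserve the class ratio.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
# Step 2: split the holdout evenly into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)
```

Stratifying both splits keeps the 1:3 class ratio intact in all three subsets, so validation metrics remain comparable to test metrics.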

Experimental results and analysis
We use the validation set to adjust the model parameters and make a preliminary evaluation of model capability, and the test set to evaluate the generalization ability of the final model. Table 4 compares the stacking models with single models (SVM, KNN, logistic regression) and ensemble models (RF, ERT, XGBoost), as well as stacking models built from different base learners. A detailed introduction to each model is provided in Appendix A. The Stacking1 model is composed of the RF, ERT, and XGBoost ensemble models; the Stacking2 model is composed of the SVM, KNN, and LR single models; and the Stacking3 model consists of RF, XGBoost, and LR. The final meta-learner of each stacking model is an SVM classifier, which has shown excellent performance and generalization capability on our high-dimensional, nonlinear application data. In our application scenario, real-time requirements are not critical; we prioritize improving the joint loan approval rate over computational complexity, and therefore chose SVM as the meta-learner for the last layer of our stacking model.
According to Table 4, the ensemble learning models outperform the single learning models on all four indicators. Among the stacking models, Stacking1, which combines the three ensemble learning methods, demonstrates the best accuracy, recall, and AUC, while Stacking3, which combines RF, XGBoost, and LR, exhibits the highest precision. In summary, the RF, ERT, and XGBoost methods were chosen as the base learners to construct the stacking model on our dataset, consistent with the methodology discussed in Section 3.1.

4.3
Bank approval model

Data set
In the second quarter of 2021, 25,383 customers were approved by Chery Huiyin Auto Finance Co., Ltd. and sent to the bank for secondary review. The data are shown in Table 5. Customers approved by the bank were treated as positive samples and those rejected as negative samples, yielding 19,125 positive samples and 6258 negative samples. Samples were divided into training, validation, and testing sets in a ratio of 6:2:2. The specific data are shown in Table 6.

Model prediction result
The CNN extracted 30 feature variables for subsequent training, including total loan amount (loan_total), total bank loan amount (bank_loan_total), annual income (year_income), job type (job_type), education level (edu_level), and marital status (mar_status), among others. The data are shown in Table 7.
As shown in Table 8, after applying the CFA method for class balancing, the classification metrics on the test set are the best among the sampling methods compared, demonstrating the superiority of CFA in balancing this dataset. It is therefore reasonable and effective to adopt the CNN + CFA + SVM combination for constructing the simulated bank approval model, in line with the methodology discussed in Section 3.2.
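The balancing step can be sketched in a few lines of NumPy: the minority class is oversampled with slightly perturbed copies until the classes are equal in size. This is only an illustrative stand-in for the paper's feature-based counterfactual augmentation; the perturbation rule (`noise`) and all names here are assumptions, not the actual CFA procedure.

```python
# Minimal sketch of counterfactual-style balancing: duplicate minority
# samples with small feature perturbations until classes are balanced.
import numpy as np

def balance_by_augmentation(X, y, minority=0, noise=0.05, seed=0):
    """Oversample the minority class with perturbed copies (illustrative)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_extra = np.sum(y != minority) - len(X_min)   # copies needed for parity
    idx = rng.integers(0, len(X_min), n_extra)
    # Perturb the selected samples slightly to create counterfactual-style copies.
    extra = X_min[idx] + noise * rng.standard_normal((n_extra, X.shape[1]))
    X_bal = np.vstack([X, extra])
    y_bal = np.concatenate([y, np.full(n_extra, minority)])
    return X_bal, y_bal

X = np.random.default_rng(1).standard_normal((100, 5))
y = np.r_[np.zeros(20), np.ones(80)].astype(int)   # 20 vs 80: imbalanced
X_bal, y_bal = balance_by_augmentation(X, y, minority=0)
print(np.bincount(y_bal))  # classes now balanced
```

The balanced set would then be fed to the SVM classifier downstream of the CNN feature extractor.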

4.4
Optimized joint loan approval model

Combination and selection of features
In the second quarter of 2021, 74,763 samples were not submitted to the bank for approval. These customers were input into the simulated bank approval model to obtain approval or rejection results, which serve as new approval labels for each customer in the stacking model. Our stacking model was then retrained on these new features and newly generated labels.
Here we label a customer approved by both the automotive finance company and the bank as 1 (a positive sample) and all others as negative samples. This follows from the target of our application scenario: approving more joint loans brings more profit, as discussed in the Introduction. At the default level, customers approved in two independent reviews have a relatively lower probability of default and are worth lending to.
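Assuming 0/1 approval flags from each reviewer, the labelling rule above is a logical AND of the two decisions:

```python
# Joint label: positive only when both the finance company and the
# (simulated) bank approve. Flag values here are illustrative.
import numpy as np

finance_ok = np.array([1, 1, 0, 1, 0])  # auto finance company decision
bank_ok    = np.array([1, 0, 0, 1, 1])  # simulated bank decision
joint = finance_ok & bank_ok            # 1 only if both approve
print(joint.tolist())  # [1, 0, 0, 1, 0]
```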
A convolutional neural network was used to extract 30 features for model construction, including 6 composite features, and 32 features were selected to construct the stacked model; after removing duplicate features, a set of 38 features was retained. Three methods were combined with a voting selector over these 38 features, and based on the performance of logistic regression, 31 features were kept: 26 selected basic features and 5 composite features extracted by the CNN. This approach aligns with the methodology discussed in Section 3.3. The data are shown in Table 9.
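One way to realize a three-method voting selector is to keep a feature when a majority of selectors retain it. The sketch below is an assumption about the configuration, not the paper's exact selector: it votes with a univariate F-test, recursive feature elimination, and random-forest importances on synthetic data.

```python
# Majority-vote feature selection across three selectors (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=0)
k = 6
votes = np.zeros(X.shape[1], dtype=int)

# Method 1: univariate F-test.
votes += SelectKBest(f_classif, k=k).fit(X, y).get_support().astype(int)
# Method 2: recursive feature elimination with logistic regression.
votes += RFE(LogisticRegression(max_iter=1000),
             n_features_to_select=k).fit(X, y).get_support().astype(int)
# Method 3: top-k random-forest importances.
imp = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_
rf_mask = np.zeros(X.shape[1], dtype=int)
rf_mask[np.argsort(imp)[-k:]] = 1
votes += rf_mask

selected = np.where(votes >= 2)[0]  # keep features with at least 2 of 3 votes
print(len(selected))
```

A final logistic-regression evaluation, as in the text, would then prune this voted set down to the retained features.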

Coefficient selection of loss function
We run 5-fold cross-validation with different loss function coefficients and determine the final coefficients by comparing the average final approval rate of joint loans. The specific data are shown in Table 10.
From the table, it can be observed that, by utilizing the distances from the data points to the SVM hyperplane we defined, the model constructed using coefficients  and  performs best in terms of the joint loan approval rate. This demonstrates the effectiveness of the method discussed in Section 3.3, which increases the coefficients of the loss function.
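The idea of re-weighting the loss according to a sample's distance from the SVM hyperplane can be sketched with scikit-learn's `sample_weight`. The weighting rule, threshold, and coefficient values (`c_near`, `c_far`) below are illustrative assumptions, not the coefficients selected in Table 10.

```python
# Sketch: up-weight samples near the SVM decision boundary and refit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

base = SVC(kernel="linear").fit(X, y)
dist = np.abs(base.decision_function(X))  # distance to the hyperplane

# c_near/c_far play the role of the loss-function coefficients that the
# text tunes by 5-fold cross-validation.
c_near, c_far = 2.0, 1.0
weights = np.where(dist < 1.0, c_near, c_far)

tuned = SVC(kernel="linear").fit(X, y, sample_weight=weights)
print(round(tuned.score(X, y), 3))
```

In the paper's setting, candidate coefficient pairs would be compared by the resulting average joint loan approval rate rather than raw accuracy.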

Results of optimization model
To demonstrate the effectiveness of the proposed deep-learning-optimized joint loan stacking model, we compared the simulated joint loan approval rates on the Q2 2021 dataset using the non-optimized and optimized stacking models. Furthermore, we evaluated the actual joint loan approval rates on the Q2 2021 and Q3 2022 datasets after applying the optimized model. Specific data can be found in Table 11.
As can be seen from the table, in the simulated approval on the Q2 2021 data, the optimized stacking model raises the joint loan approval rate by 8.35% compared with the non-optimized stacking model. When the optimized stacking model was applied to joint loan approval in Q3 2022, 34,386 of 133,160 customers were approved by the auto finance company, an approval rate of 25.82%; after submission to the bank, 28,134 customers were approved, a joint loan approval rate of 81.82%. Compared with the actual joint loan approval rate of 75.56% in Q2 2021 without the model, the optimized stacking model increased the joint loan approval rate by 6.26 percentage points, yielding good economic benefits.
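The reported rates follow directly from the counts above, as a quick arithmetic check confirms:

```python
# Sanity check of the Q3 2022 approval rates reported in the text.
approved_fin, total = 34_386, 133_160   # finance-company approvals
approved_bank = 28_134                  # subsequent bank approvals

print(round(100 * approved_fin / total, 2))          # 25.82: finance approval rate
print(round(100 * approved_bank / approved_fin, 2))  # 81.82: joint approval rate
print(round(81.82 - 75.56, 2))                       # 6.26: gain over Q2 2021
```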

F I G U R E 1
Credit in China since 2000.

F I G U R E 2
Model construction flow chart.
F I G U R E 4
Convolution neural networks extraction features.

F I G U R E 5
Counterfactual augmentation method.

F I G U R E 6
Bank model training process.
Experimental environment: This paper uses Keras 2.3.0, a deep learning framework based on CUDA 10.0, to build the network model. The experiments were conducted on a machine with 32 GB DDR4 memory, a 3.6 GHz Intel(R) Core(TM) i7-7700 CPU, and an NVIDIA GeForce GTX 1080 Ti GPU, running Ubuntu 18.04 LTS.
Stacking algorithm. Input: training set D = {(x_i, y_i), i = 1, …, N}; layer-1 learning algorithms L_1, L_2, …, L_T1; layer-2 learning algorithm L.
Step 1: Divide the data into K mutually exclusive subsets of approximately equal size: D_1, D_2, …, D_K.
Step 2: Train the first layer of base learners over the K folds; the outputs of the T base classifiers for all training samples, h_t(x), form the input to the layer-2 learner.

A convolutional neural network is a powerful deep learning technique that has been widely used in pattern recognition and image processing with proven excellent results. Furthermore, CNN has demonstrated remarkable efficacy in feature extraction: by utilizing various convolution kernels, it can extract local features from data and fuse them into higher-level global features. The exceptional feature-extraction performance of CNN has been demonstrated on various datasets reported in the literature. The convolution operation formula of CNN is as in Equation (1).
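Steps 1 and 2 of the stacking procedure can be sketched with scikit-learn's `cross_val_predict`, which handles the K-fold partition internally and returns the out-of-fold base-learner outputs h_t(x) that feed the layer-2 learner. Models and data here are illustrative placeholders.

```python
# Sketch of K-fold meta-feature generation for a two-layer stacking model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=15, random_state=0)
K = 5  # Step 1: K mutually exclusive subsets D_1..D_K (handled by cv=K)

# Step 2: out-of-fold predictions of each base learner become meta-features,
# so no training sample is scored by a model that saw it during fitting.
bases = [RandomForestClassifier(n_estimators=50, random_state=0),
         ExtraTreesClassifier(n_estimators=50, random_state=0)]
meta_X = np.column_stack([
    cross_val_predict(b, X, y, cv=K, method="predict_proba")[:, 1]
    for b in bases])

meta = SVC().fit(meta_X, y)  # layer-2 learner trained on h_t(x)
print(meta_X.shape)
```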
Description of the dataset (a part).
Data set characteristics.
Auto finance company data set.
Classification model performance. Bold values represent the best performance of the model in performance metrics.
Data set characteristics.
Classification model performance.
Note: Balanced sampling method performance.
Joint loan approval rate.

This paper proposes a novel deep learning optimized stacking model for joint lending, applied to loan application data from auto finance companies. After data cleaning and feature selection, the selected data sets are input into the stacking model constructed from RF, ERT, and XGBoost for training; the final stacking model, obtained with an SVM meta-classifier, shows the best performance. Furthermore, the approved data of auto finance companies are input into CNN + CFA + SVM for training, and the bank approval model with the best classification performance is obtained by extracting features with CNN and balancing samples with CFA. The features selected by CNN are retained in the stacking model, while the unapproved data are placed in the simulated bank approval model. The loss function is corrected for correctly and incorrectly classified samples to retrain the stacking model. Verification on the dataset proves that the features extracted from the simulated bank approval model and the loss function correction improve the joint approval rate of loans, leading to significant economic benefits.

Feature descriptions:
Payload capacity of the vehicle purchased with the loan.
consultant_360_orate_std: The standard deviation of the default rate for the customers handled by the sales advisor in the last 360 days.
dealer_90_prepay_max: The maximum down payment amount from dealers in the past 90 days.
dealer_60_loan_std: The standard deviation of the loan amount from dealers in the past 60 days.
The annual income of the applicant divided by the number of loan terms.
address_dealer_address: Consistency between the customer's address and the dealer's address.
affiliated_unit_vehicle_type: Intersection characteristics of vehicle affiliation units and vehicle types.
vehicle_type_dealer: Intersection characteristics of vehicle type and dealer.
consultant_vehicle_price: Intersection characteristics of consultant and vehicle prices.
job_type_vehicle_price: Intersection characteristics of job types and vehicle prices.
dealer_15_prepay_std: The standard deviation of the down payment from dealers in the past 15 days.
affiliated_unit: The affiliated unit applying for the vehicle.