Comprehensive evaluation of distribution network project cost management level based on optimized random forest

In this article, the content and characteristics of cost management in distribution network projects are analyzed, and an evaluation index system for the cost management level is formulated. The Gini index of the random forest algorithm and the analytic hierarchy process (AHP) are then used to analyze the factors influencing the cost management level from quantitative and qualitative perspectives. The results show that design change and on-site visa management have a large impact on the management level, whereas project advance payment management has a relatively small impact. Based on these results, the indicator set is simplified in order to optimize the random forest model. Verification shows that the optimized model improves both the efficiency and the accuracy of evaluation, keeps the evaluation scientifically grounded, and provides a reference for improving cost management in distribution network projects.

by using the optimized random forest algorithm for comprehensive evaluation, so as to provide guidance for the cost management.
At present, project management is mainly evaluated with the analytic hierarchy process and fuzzy comprehensive evaluation. Ling et al. built an evaluation index system for cost control management based on the content of cost management for power transmission and transformation projects. They then used the traditional fuzzy analytic hierarchy process to evaluate the management level.[1] Zhihong et al. analyzed the factors influencing the cost of power transmission and transformation projects in the early stage and constructed a three-level index system. They applied a multilevel fuzzy comprehensive evaluation method to a case study.[2] Xinmin reviewed the current cost management process of power grid projects and compared the deviation rate and balance rate between the investment plan and the budget estimate of completed power grid projects. He also analyzed the main factors behind cost deviation from the perspective of itemized costs. He argued that the evaluation system for cost management effectiveness contains many qualitative factors as well as quantitative ones, and he likewise applied a multilevel fuzzy comprehensive evaluation method to a case study.[3] These studies are mainly based on fuzzy theory, and none of them addresses the cost management of distribution network projects. In addition, their index systems contain many qualitative indicators that are difficult to quantify, so the scientific rigor of the evaluation leaves room for improvement. Fuzzy comprehensive evaluation builds on the analytic hierarchy process and therefore relies on expert judgment. When there are many indicators, evaluation errors occur easily.[4] Some scholars have improved the traditional evaluation methods. Li et al. argued that traditional evaluation methods are weak at quantitative evaluation and at handling uncertainty. They presented a method that combines fuzzy logic theory with a Bayesian network to evaluate the risk of underground gas explosions in coal mines.[5] Lu et al. started from the current state of capital transfer in power transmission and transformation projects and identified the key points of capital transfer in each construction stage. On that basis, they proposed a risk index system for capital transfer and determined the index weights by AHP. They then built an evaluation model with the fuzzy TOPSIS method to assess capital transfer risk.[6] However, these improved methods involve more complex calculations and are less efficient to apply.
In recent years, intelligent algorithms such as SVM, KNN, neural networks, and random forest have been applied increasingly widely. Random forest is a data mining method developed by Breiman.[7] It is a modern classification and regression technique with good robustness to noise, high efficiency, and high accuracy. It has been applied successfully in remote sensing, genetics, ecology, and other fields.[8,9] In this article, indicators that can be fully quantified are proposed first. The indicators are then analyzed from both quantitative and qualitative perspectives using the Gini index of the random forest algorithm and the analytic hierarchy process. As a result, indicator selection does not rely solely on expert judgment. Finally, the indicator set is simplified according to each indicator's degree of influence, and the feature set of the traditional random forest model is optimized accordingly. This improves the efficiency and accuracy of evaluation, addresses the shortcomings of existing methods, and provides an effective method for evaluating cost management in distribution network projects.

Content of cost management
The construction process of a distribution network project consists of four stages: decision-making, design, construction implementation, and completion. The decision-making stage mainly covers project preparation and feasibility study approval. The design stage mainly covers producing the construction drawings and preparing the construction drawing budget. The construction implementation stage mainly covers building the project and paying the related funds. The completion stage covers completion acceptance.
The total investment in distribution network projects is large, but it is spread over many individual projects with short construction periods. This makes construction management more difficult and poses great challenges for cost management. The main work of cost management in distribution network projects includes the feasibility estimate, the preliminary design budget, the construction drawing budget, the bidding price limit, completion settlement, and financial final accounts.[10,11]

2.1.2 Index system of cost management level
An evaluation index system for the cost management level is formulated based on the principles of the analytic hierarchy process and on the characteristics and content of whole-process cost management in distribution network projects. A corresponding quantitative indicator is proposed for each secondary index, as shown in Table 1. The primary indices correspond to the cost management stages. The secondary indices correspond to the specific cost management content of each stage, such as the feasibility estimate and the preliminary design budget.[12] In the decision-making stage, management focuses on integrating the feasibility study with the preliminary design and on the feasibility estimate, to ensure that decisions are correct. The design stage covers budget management for the preliminary design and the construction drawing design. It also covers preparation of the bill of quantities for bidding and management of the bidding price limit. The construction implementation stage covers the following:
- management of the advance payment, progress payments, and other construction funds;
- management of on-site construction quantities;
- management of design changes and on-site visas.

The completion stage covers completion settlement management and financial final account management.
TABLE 1 Index system of cost management level

The indicators are defined as follows:
- The application rate of initial integration is the ratio of the number of projects that apply integrated feasibility study and preliminary design to the total number of projects.
- The accuracy of the feasibility estimate is the relative change of the feasibility estimate with respect to the completion settlement.
- The accuracy of the preliminary design estimate is the relative change of the preliminary design estimate with respect to the completion settlement.
- The accuracy of the construction drawing budget is the relative change of the construction drawing budget with respect to the completion settlement.
- The compilation accuracy rate is the accuracy rate of the evaluated bill of quantities.
- The standard rate of advance payment is the proportion of works for which the advance payment is made in full and on time in accordance with the contract.
- The standard rate of progress payment is the proportion of projects for which progress payments are made in full and on time in accordance with the contract.
- The change rate of project quantity is the mean, over all sub-projects, of the relative change between the settled quantity and the bidding quantity.
- The adjustment ratio of change visa fee is the ratio of the change visa fee to the total completion settlement cost.
- The completion settlement accuracy is the relative change of the completion settlement with respect to the financial final accounts.
- The accuracy of financial final accounts is the relative change of the financial final accounts with respect to the final audited amount.
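The indicator definitions above are ratios that can be computed directly from project records. The following is a minimal Python sketch. It assumes that "proportion of change" means the signed relative deviation (value - reference) / reference. It also assumes that the quantity change rate averages absolute per-sub-project deviations. The function names and figures are hypothetical and do not come from the paper.

```python
from statistics import mean
from typing import Sequence


def deviation_ratio(value: float, reference: float) -> float:
    """Signed relative deviation of `value` from `reference`.

    Assumption: "proportion of change" is read as (value - reference) / reference,
    e.g. feasibility estimate vs. completion settlement.
    """
    return (value - reference) / reference


def rate(flags: Sequence[bool]) -> float:
    """Share of True entries, e.g. projects whose progress payment was made in full and on time."""
    return sum(flags) / len(flags)


def quantity_change_rate(settled: Sequence[float], bid: Sequence[float]) -> float:
    """Mean relative change between settled and bid quantities over sub-projects.

    Assumption: absolute deviations are averaged so over- and under-runs do not cancel.
    """
    return mean(abs(s - b) / b for s, b in zip(settled, bid))


def change_visa_fee_ratio(change_visa_fee: float, completion_settlement: float) -> float:
    """Change visa fee as a share of the total completion settlement cost."""
    return change_visa_fee / completion_settlement


# Hypothetical figures (not data from the paper).
print(deviation_ratio(105.0, 100.0))  # feasibility estimate vs. completion settlement
print(rate([True, True, False, True]))  # 0.75
print(quantity_change_rate([110.0, 90.0], [100.0, 100.0]))
print(change_visa_fee_ratio(4.0, 100.0))
```

Signed deviations keep the direction of an over- or under-estimate. Taking `abs()` of them gives a pure error magnitude if that reading of "accuracy" is preferred.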

2.2 Research methods

Analytic hierarchy process
The analytic hierarchy process is especially suitable for complex problems that are difficult to analyze with quantitative indicators alone. The method works in three parts. First, the factors of a complex problem are organized into ordered, interrelated levels. Next, the relative importance of the elements at each level is quantified from judgments of the objective situation. Finally, mathematical methods are used to determine the relative importance coefficients of all elements.[13,14] The calculation proceeds as follows.
(1) Determine the evaluation factors:
$$U = \{u_1, u_2, \ldots, u_p\}$$
where $u_1, \ldots, u_p$ are the indicators and $p$ is the number of indicators.
(2) Construct the judgment matrix. The 1-9 scale is generally adopted. Pairwise comparison yields the judgment matrix $S = (u_{ij})_{p \times p}$, where $u_{ij}$ is the importance of evaluation index $i$ relative to evaluation index $j$. The entries of the judgment matrix were assigned by an expert panel of 10 technical experts and 10 cost management experts.
(3) Calculate the weights and check consistency. Compute the maximum eigenvalue $\lambda_{\max}$ of the judgment matrix and its eigenvector $W$, which satisfy $SW = \lambda_{\max} W$. After normalization, the components of $W$ are the weights of the evaluation factors.
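In Saaty's AHP, step (3) takes the weights from the principal eigenvector of the judgment matrix. Consistency is checked with CR = CI / RI, where CI = (lambda_max - p) / (p - 1), and a CR below 0.1 is conventionally accepted. The sketch below finds the principal eigenvector by power iteration, which converges for a positive matrix, and uses Saaty's random index table. Two things are assumptions: whether the paper used exactly this consistency check, and the example matrix, which is hypothetical.

```python
# Saaty's random consistency index RI, indexed by matrix order p (1..10).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def mat_vec(S, w):
    """Multiply a square matrix S (list of rows) by a vector w."""
    return [sum(s_ij * w_j for s_ij, w_j in zip(row, w)) for row in S]


def ahp_weights(S, iterations=100):
    """Return (W, lambda_max) for a pairwise judgment matrix S.

    W is the principal eigenvector normalized to sum to 1, found by power
    iteration; lambda_max is estimated as the mean of (S W)_i / W_i.
    """
    p = len(S)
    w = [1.0 / p] * p
    for _ in range(iterations):
        v = mat_vec(S, w)
        total = sum(v)
        w = [x / total for x in v]
    sw = mat_vec(S, w)
    lambda_max = sum(sw_i / w_i for sw_i, w_i in zip(sw, w)) / p
    return w, lambda_max


def consistency_ratio(lambda_max, p):
    """CR = CI / RI with CI = (lambda_max - p) / (p - 1); orders <= 2 are always consistent."""
    if p <= 2:
        return 0.0
    ci = (lambda_max - p) / (p - 1)
    return ci / RANDOM_INDEX[p]


# Hypothetical 3x3 judgment matrix on the 1-9 scale (not the paper's data).
S = [[1.0, 2.0, 4.0],
     [1 / 2, 1.0, 2.0],
     [1 / 4, 1 / 2, 1.0]]
W, lam = ahp_weights(S)
print(W, lam, consistency_ratio(lam, len(S)))
```

This example matrix is perfectly consistent, so the weights are 4/7, 2/7, and 1/7, lambda_max equals the order 3, and CR is zero. Real expert matrices usually give a small positive CR.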

Random forest algorithm
Breiman proposed the random forest algorithm in 2001. Its base classifier is the decision tree. Many decision trees are generated from bootstrap samples (bagging), and the result is determined jointly by all of them.[15] The method reduces overfitting and is robust to noise. Its trees can also be trained in parallel, which gives it high efficiency and improved accuracy. The main steps of the algorithm are drawing the sampling sets, feature selection, building the decision trees, and making the final decision. They proceed as follows.

Sampling set
Assume that the original sample set contains N samples. In each round, N samples are drawn from it with replacement (bootstrap sampling) to obtain a training set of size N. With k rounds of sampling, the training sets are $T_1, T_2, \ldots, T_k$.
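The sampling step can be sketched as follows. Each of the k training sets has size N and is drawn with replacement. As a result, on average only about 63.2% of the distinct samples appear in a given set, and the rest are "out-of-bag" for that tree. The tree count k = 100 and the helper names are hypothetical. N = 120 matches the training set size reported later in the text.

```python
import random


def bootstrap_indices(n, k, seed=None):
    """Draw k index lists T_1..T_k, each of size n, sampled with replacement from range(n)."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(k)]


def out_of_bag(n, indices):
    """Indices never drawn for one tree (its out-of-bag samples)."""
    drawn = set(indices)
    return [i for i in range(n) if i not in drawn]


# n = 120 training projects; k = 100 trees is a hypothetical setting.
sets = bootstrap_indices(n=120, k=100, seed=0)
oob_share = sum(len(out_of_bag(120, t)) for t in sets) / (120 * len(sets))
# The expected value is (1 - 1/120) ** 120, about 0.366.
print(round(oob_share, 3))
```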

Feature selection
If the feature space contains D features, then each time a decision tree is generated, d of the D features are selected at random to form a new feature set.
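A minimal sketch of the random feature selection follows. The text draws one subset per tree. Breiman's original formulation instead draws a fresh subset at every node split, and the helper below can be called at either point. Two points are assumptions. First, the default d = floor(sqrt(D)) is a common choice for classification forests, but the value used in the paper is not stated here. Second, the indicator names are hypothetical placeholders.

```python
import math
import random


def random_feature_subset(features, d=None, rng=None):
    """Select d of the D features uniformly at random, without replacement.

    Assumption: if d is not given, use floor(sqrt(D)), a common default for
    classification forests.
    """
    rng = rng or random.Random()
    if d is None:
        d = max(1, math.isqrt(len(features)))
    return rng.sample(list(features), d)


# Hypothetical short names for the 11 quantitative indicators.
indicators = [f"x{j}" for j in range(1, 12)]
subset = random_feature_subset(indicators, rng=random.Random(42))
print(subset)  # 3 distinct indicator names, since floor(sqrt(11)) = 3
```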

Establish decision tree
Each decision tree is grown from its own training set using its new feature set, and the trees are independent of one another.

Final decision
The random forest algorithm does not prune its decision trees, and the final classification is determined by a vote over all decision trees. The main parameters and their explanations are shown in Table 2.
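The final voting step can be sketched as a plurality vote over the per-tree predictions for each sample. Here the grades are 1 to 4, as in the evaluation levels used later. The tie-breaking rule, which picks the smallest label, is an arbitrary assumption for this sketch and is not taken from the paper. The votes are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Plurality vote over one sample's per-tree class labels.

    Ties are broken toward the smallest label (an assumed rule).
    """
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def forest_predict(predictions_by_tree):
    """predictions_by_tree[t][s] is tree t's grade (1-4) for sample s."""
    return [majority_vote(column) for column in zip(*predictions_by_tree)]


# Three hypothetical trees voting on three projects.
votes = [[1, 2, 4],
         [1, 3, 4],
         [2, 3, 3]]
print(forest_predict(votes))  # [1, 3, 4]
```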

Importance assessment
Based on the established evaluation index system, the 11 quantitative indicators and the evaluation grades are taken as the research object. The value of each indicator and the evaluated cost management level of each project are obtained from statistical information on previous projects and existing evaluation results. Data on 150 projects were collected for the study. The evaluation level is divided into four grades: excellent, fair, poor, and very poor, denoted by 1, 2, 3, and 4, respectively. The model was trained on 120 records consisting of the quantitative indicators and evaluation levels. The remaining 30 records were used for evaluation and analysis.

The Gini index is the probability that a randomly selected sample from the sample set is misclassified when it is labeled at random according to the class distribution of the set.[16] It is the splitting criterion used to build the trees of a random forest. Using the Gini index as the criterion not only ensures classification accuracy but also speeds up computation. The Gini index was therefore used to evaluate the importance of the 11 features in the data set, in order to determine how much each indicator contributes to the evaluation result. The calculation is as follows.[17,18] Denote the variable importance measure by VIM. Suppose the data set D has J features $X_1, X_2, \ldots, X_J$, I decision trees, and C categories.
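The Gini index of node q of the ith tree is
$$\mathrm{Gini}_q^{(i)} = \sum_{c=1}^{C} \sum_{c' \neq c} p_{qc}^{(i)} p_{qc'}^{(i)} = 1 - \sum_{c=1}^{C} \left(p_{qc}^{(i)}\right)^2$$
where $p_{qc}^{(i)}$ is the proportion of category c in node q. The importance of feature $X_j$ at node q is the change in the Gini index before and after node q is split:
$$\mathrm{VIM}_{jq}^{(\mathrm{Gini})(i)} = \mathrm{Gini}_q^{(i)} - \mathrm{Gini}_l^{(i)} - \mathrm{Gini}_r^{(i)}$$
where $\mathrm{Gini}_l^{(i)}$ and $\mathrm{Gini}_r^{(i)}$ are the Gini indices of the two new child nodes.
Let Q be the set of nodes at which feature $X_j$ appears in the ith tree. The importance of $X_j$ in the ith tree is
$$\mathrm{VIM}_{j}^{(\mathrm{Gini})(i)} = \sum_{q \in Q} \mathrm{VIM}_{jq}^{(\mathrm{Gini})(i)}$$
and the total importance of $X_j$ over the I decision trees is
$$\mathrm{VIM}_{j}^{(\mathrm{Gini})} = \sum_{i=1}^{I} \mathrm{VIM}_{j}^{(\mathrm{Gini})(i)}$$
Finally, all importance scores are normalized to obtain the importance of each feature:
$$\mathrm{VIM}_j = \frac{\mathrm{VIM}_{j}^{(\mathrm{Gini})}}{\sum_{j'=1}^{J} \mathrm{VIM}_{j'}^{(\mathrm{Gini})}}$$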
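The importance calculation described above can be sketched in plain Python. The calculation has four parts:
1. Compute the Gini index of each node.
2. For each split, take the parent's Gini index minus those of the two children.
3. Sum these changes over the nodes where a feature is used and over all trees.
4. Normalize the totals to sum to 1.

The sketch follows the unweighted difference given in the text. Common implementations such as scikit-learn's `feature_importances_` instead weight the parent and child impurities by the number of samples in each node. The tuple layout and the two one-split trees are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini index of a node: 1 - sum over classes c of p_c ** 2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_decrease(parent, left, right):
    """Change in Gini index at one split, as described in the text:
    Gini(parent) - Gini(left) - Gini(right), with no sample-size weighting."""
    return gini(parent) - gini(left) - gini(right)


def normalized_vim(forest_splits):
    """forest_splits holds one list per tree of (feature, parent, left, right)
    tuples, where parent/left/right are the class labels in those nodes.
    Sums split decreases over nodes (set Q) and trees, then normalizes to 1."""
    totals = defaultdict(float)
    for tree in forest_splits:
        for feature, parent, left, right in tree:
            totals[feature] += split_decrease(parent, left, right)
    grand_total = sum(totals.values())
    return {feature: value / grand_total for feature, value in totals.items()}


# Two hypothetical one-split trees over grades 1 and 2.
forest = [
    [("A", [1, 1, 2, 2], [1, 1], [2, 2])],
    [("B", [1, 1, 1, 2], [1, 1, 1], [2])],
]
print(normalized_vim(forest))  # A: 0.5 / 0.875, B: 0.375 / 0.875
```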
The evaluation results, shown in Table 3, indicate that the adjustment ratio of change visa fee is the most important indicator, followed by the application rate of integration. Two areas are therefore particularly important. The first is design change and on-site visa management in the construction stage. The second is the integration of the feasibility study and preliminary design, together with management of the feasibility estimate, in the decision-making stage. These aspects deserve attention when improving the cost management level.

Analytic hierarchy process analysis
In this article, the analytic hierarchy process is used to analyze the 11 indicators, and the results are shown in Table 4. Three indicators have the lowest comprehensive weights:
- integration of the feasibility study and preliminary design;
- project advance payment management;
- progress payment management.

Engineering quantity management and design change and site visa management have the highest weights. The three lowest-weighted indicators therefore have relatively little influence on the cost management level.
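The text does not spell out how the comprehensive weights are formed. In standard AHP synthesis, each secondary indicator's global weight is the product of its stage's weight and its local weight within that stage. The sketch below assumes that procedure. The stage names, indicator names, and weights are hypothetical and are not the values in Table 4.

```python
def comprehensive_weights(primary, local):
    """Global (comprehensive) weight of each indicator = weight of its stage x
    the indicator's local weight within that stage."""
    return {
        indicator: primary[stage] * w
        for stage, local_weights in local.items()
        for indicator, w in local_weights.items()
    }


# Hypothetical weights, not the values in Table 4.
primary = {"decision": 0.2, "construction": 0.8}
local = {
    "decision": {"integration": 0.5, "feasibility_estimate": 0.5},
    "construction": {"advance_payment": 0.25, "change_visa": 0.75},
}
weights = comprehensive_weights(primary, local)
print(sorted(weights, key=weights.get))  # indicators from lowest to highest weight
```

The global weights sum to 1 whenever the stage weights and each stage's local weights do.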

Evaluation results
In the importance assessment, the standard rate of advance payment scores only 1.84%, the lowest of all indicators. In the analytic hierarchy process, the comprehensive weight of project advance payment management is 0.03, also the lowest. It can therefore be concluded that project advance payment management has little impact on the overall cost management level, and this indicator can be removed. The number of random forest features was thus reduced from 11 to 10, and the random forest model was re-established to evaluate the 30 test records. The evaluation results are shown in Figure 1, and the evaluation accuracy reaches 100%. The random forest, SVM, and KNN algorithms were all used to evaluate the cost management level, and the training and evaluation results are shown in Table 5. Among the four algorithms compared, the optimized random forest achieves the highest training and evaluation accuracy, both 100%. This indicates that evaluating the cost management level with the optimized random forest algorithm is scientific and accurate.

In this article, quantifiable secondary evaluation indicators are proposed first, which addresses the difficulty of quantifying indicators in previous studies. The Gini index and the analytic hierarchy process are used to analyze the indicators from both quantitative and qualitative perspectives. This takes some time and cost at the outset, but it makes indicator selection more scientific. The evaluation process and results are based on quantitative data. This overcomes the subjectivity of traditional expert evaluation and avoids inconsistent evaluation results. The indicators are then simplified according to their degree of influence, reducing the random forest features from 11 to 10. This reduces the amount of data, so computation is relatively efficient. The random forest algorithm is also robust to interference and resistant to overfitting, so the accuracy of evaluation improves as well. Overall, the method offers certain advantages.
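The simplification step described above removes an indicator only when both methods rank it lowest. The sketch below shows that rule. It also shows the accuracy measure used to compare models on the evaluation records, assuming accuracy is the share of records whose predicted grade matches the actual grade exactly. The helper names and scores are hypothetical and are not the values in Tables 3 and 4.

```python
def lowest_in_both(gini_importance, ahp_weight):
    """Return the indicator ranked lowest by both the Gini importance and the
    AHP comprehensive weight, or None if the two methods disagree."""
    lowest_gini = min(gini_importance, key=gini_importance.get)
    lowest_ahp = min(ahp_weight, key=ahp_weight.get)
    return lowest_gini if lowest_gini == lowest_ahp else None


def accuracy(predicted, actual):
    """Share of evaluation records whose predicted grade matches the actual grade."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical scores for three indicators (not the paper's values).
gini_scores = {"advance_payment": 0.02, "change_visa": 0.30, "integration": 0.20}
ahp_scores = {"advance_payment": 0.04, "change_visa": 0.25, "integration": 0.05}
drop = lowest_in_both(gini_scores, ahp_scores)
kept = [name for name in gini_scores if name != drop]
print(drop, kept)  # advance_payment ['change_visa', 'integration']
print(accuracy([1, 2, 2, 4], [1, 2, 3, 4]))  # 0.75
```

When the two rankings disagree, the rule returns `None` and no indicator is removed. That mirrors the text's reliance on agreement between the quantitative and qualitative analyses.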

CONCLUSION
To evaluate the cost management level of distribution network projects, this article first formulates an evaluation index system according to the cost management work of each stage. The primary indices are the management levels of the decision-making, design, construction implementation, and completion stages. The secondary indices comprise 11 indicators, such as integration of the feasibility study and preliminary design and management of the feasibility estimate. A corresponding quantitative indicator is proposed for each. The analytic hierarchy process and the random forest algorithm are used to analyze the influencing factors. The results show that design change and on-site visa management have a large impact on the management level, whereas project advance payment management has a relatively small impact. To improve the management level, attention should therefore focus on design change and on-site visa management. Based on this analysis of influencing factors, the random forest model was optimized and re-established. After optimization, the evaluation accuracy reached 100%. This demonstrates the effectiveness of the research and provides a reference for evaluating and improving the cost management level.

CONFLICT OF INTEREST
The authors declare no potential competing interests.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.