Management and development performance assessment for electric distribution company based on data mining

In this study, the Statistical Product and Service Solutions (SPSS) software is applied to analyse the massive data of electric distribution companies. A comprehensive evaluation of the grid development and the production and operation of basic electric distribution companies is the key to a company's investment and development strategies. This study proposes a comprehensive evaluation index system for electric distribution companies. In the method, the weight of each index is calculated using an improved analytic hierarchy process based on the Delphi method. Then, according to the actual operating situation of each enterprise, differentiated weights are constructed for the indices, yielding a differentiated comprehensive evaluation and score for each electric distribution company. The result can be used to locate the weak links of each enterprise's power grid and to put forward a grid investment strategy. A demonstration application to 98 electric distribution companies in Shanxi Province, China, indicates that the method has practical value and improves the accuracy of comprehensive evaluation for electric distribution companies.


Introduction
Electric distribution companies are directly responsible for the operation and management of distribution networks. For a long time, influenced by factors such as management system, personnel structure, equipment level, operation principle, and regional differences, electric distribution companies have lagged in the construction, management, planning, and operation of distribution networks. Therefore, a comprehensive evaluation of electric distribution companies is urgently needed. With such an evaluation method, all aspects of production, operation, and power grid development can be assessed and the weaknesses located, which provides a foundation for distribution companies to formulate investment strategies and improve investment efficiency.
To achieve a comprehensive evaluation of the power grid, most traditional evaluation methods [1,2] are based on the actual operation and development indicators of the grid, using the entropy weight method and the dynamic comprehensive evaluation method to determine the weight of each index and carry out index scoring. A previous study [3] used the analytic hierarchy process (AHP) to carry out a comprehensive evaluation of the rural low-voltage power grid. In addition, a previous study [4] proposed a method and a procedure for grid planning evaluation by constructing a multi-dimensional grid evaluation index system. Other studies [5,6] conducted a comprehensive evaluation of the power grid through data mining methods such as principal component analysis (PCA) and system clustering. Most studies describe the methods and processes for the comprehensive evaluation of power grids, but they lack a multi-level evaluation that covers power grid development as well as production and operation, and in particular they do not account for the diversity of power grids.
This paper uses Statistical Product and Service Solutions (SPSS) [7,8] to apply PCA, K-means clustering, correlation analysis, and systematic clustering, in order to reduce the dimension of the massive data and indicators of electric distribution companies and to classify the enterprises. Then, the Specific, Measurable, Attainable, Relevant, Trackable (SMART) guidelines are used to construct the evaluation indicator system for basic electric distribution companies, and all indicators are normalised using different standardisation strategies. On this basis, an improved AHP method based on the Delphi method is used to calculate the index weights, and differentiated weights are proposed based on the actual situation of each company. Finally, a comprehensive evaluation of basic electric distribution companies is achieved. The differentiated weights better reflect the actual situation and characteristics of the power grid in each company, which is consistent with the rigour required of an evaluation and improves both its practical and its theoretical value.

Data mining
Data mining, also known as knowledge discovery in databases, is a complex process that extracts previously unknown, valuable patterns or laws from large amounts of data. Data mining mainly includes seven steps: data cleansing, data integration, data selection, data transformation, data mining, pattern assessment, and knowledge representation. Common data mining methods include mathematical statistics, neural networks, genetic algorithms, and decision tree induction [9-12].
This paper is based on the massive data of 98 basic electric distribution companies in Shanxi Province of China. Using SPSS software as the data analysis platform, PCA, K-means clustering, and correlation analysis were applied. In-depth data mining and data analysis were performed on 198 grid development, production, and operation indicators, and the number of indicators was ultimately reduced to 40. The specific mining process is shown in Fig. 1.
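As a rough illustration of how correlation analysis can shrink an indicator set, the sketch below drops any indicator whose absolute Pearson correlation with an already retained indicator exceeds a cut-off. The cut-off of 0.9, the greedy keep-first rule, the indicator names, and the data are all assumptions made for illustration; the paper performs this step in SPSS and does not state its selection rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def prune_correlated(indicators, cutoff=0.9):
    """Greedily keep indicators that are not strongly correlated with any
    indicator kept so far. `indicators` maps a name to a list of values,
    one value per company. The cutoff is an assumed value."""
    kept = []
    for name, values in indicators.items():
        if all(abs(pearson(values, indicators[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical data for five companies (not taken from the paper).
data = {
    "load_ratio": [1.7, 1.9, 2.0, 2.3, 2.1],
    # Nearly proportional to load_ratio, so it carries little extra information.
    "transformer_capacity": [17.0, 19.5, 20.0, 23.5, 21.0],
    "smart_meter_coverage": [95.0, 60.0, 88.0, 72.0, 99.0],
}
kept = prune_correlated(data)
print(kept)
```

With these toy values the second indicator is removed because it is almost a rescaled copy of the first, while the weakly correlated third indicator is retained.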
Through deep data mining and analysis, combining K-means clustering and PCA, the 98 electric distribution companies in Shanxi Province are divided into five categories. The clustering result takes full account of the economic development of each region and the development of its power grid. From the classification results, it is possible to trace back the principal component factors that determine each enterprise's category, thereby locating the weaknesses of each company's power grid development.
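The clustering step can be sketched with a plain K-means (Lloyd's algorithm) on standardised indicators. This is only a minimal sketch under stated assumptions: the paper runs K-means together with PCA in SPSS and obtains five categories, whereas the example below omits PCA, uses two clusters, a deterministic farthest-point initialisation, and made-up two-indicator data.

```python
def zscore_columns(rows):
    """Standardise every column to zero mean and unit variance
    (assumes no column is constant)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation
    (chosen here only to make the sketch reproducible)."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centres are stable
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical companies described by two indicators (not from the paper).
companies = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.5, 9.5], [0.9, 2.2], [7.8, 8.7]]
labels, centres = kmeans(zscore_columns(companies), k=2)
print(labels)
```

Standardising first keeps indicators with large numeric ranges from dominating the distance, which matters when indicators are measured in different units.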

Evaluation index system construction
The evaluation index system constructed in this study uses the SMART criteria to ensure that the indicator system covers all aspects and processes of the grid development and production operations of basic electric distribution companies. It can fully reflect the socioeconomic, equipment operation, power grid production, grid-connected power generation, operation and management, and energy saving and emission reduction properties of basic power supply grids. On the basis of an in-depth analysis of practical issues, the various factors that affect the development of the power grid and its production and operation are broken down into several levels. The factors on one level are subordinate to, or have an impact on, the factors on the level above, and at the same time they control, or are influenced by, the factors on the level below [13,14]. Finally, grid development, production, and operation are determined as the primary targets, and a hierarchical structure model of the electric distribution companies is established, as shown in Fig. 2.
J. Eng., 2018, Vol. 2018 Iss. 17, pp. 1909-1914. This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)

Data standardisation strategy
As there are many types of indicator data for basic electric distribution companies, the goals and requirements of different indicators in the power grid are inconsistent. Therefore, different standardised calculation methods need to be adopted according to the types of indicators. For the indicators involved in the evaluation index system of electric distribution companies, this paper mainly adopts three standardised processing strategies: trapezoidal processing strategy, semi-triangular processing strategy, and segmented processing strategy [15].
(i) Trapezoidal processing strategy: The trapezoidal processing strategy mainly involves 24 indicators such as the load ratio, the average power supply line radius, the N-1 pass rate, the insulation rate, and the number of power grid accidents. Taking the load ratio as an example, the 'Technical Guidelines for Planning and Designing Distribution Network' require the 110-35 kV load ratio to be kept between 1.8 and 2.2. First, the load ratio is converted into a normalised value on a percentage scale; then, according to the permitted interval of the load ratio, the trapezoidal processing function of the load ratio is established, as shown in Fig. 3. The specific mapping of the trapezoidal processing function for the load ratio is shown in Table 1.
(ii) Semi-triangular processing strategy: The semi-triangular processing strategy mainly involves 12 indicators such as the coverage of the electricity information acquisition system, distribution automation coverage, and smart meter coverage. Taking the coverage rate of the electricity information acquisition system as an example, the actual coverage value is first converted into a standardised value on a percentage scale. Because the coverage rate of the electricity information acquisition system is a positive indicator, a monotonically increasing semi-triangular processing function is established for it, as shown in Fig. 4. The specific mapping of the semi-triangular processing function for the coverage of the electricity information acquisition system is shown in Table 2.
(iii) Segmented processing strategy: The segmented processing strategy mainly involves four indicators, including the asset-liability ratio and personal injury and death. Taking personal accident casualties as an example, the actual value of personal accident casualties is first converted into a standardised value on a percentage scale. Personal accident casualties are a negative indicator, and a segmented processing function is established for them, as shown in Fig. 5. Table 3 shows the specific mapping of the segmented processing function for personal accident casualties.
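The three standardisation strategies above can be sketched as small scoring functions on a 0-100 scale. Only the full-score interval of 1.8 to 2.2 for the load ratio comes from the text; the outer breakpoints (1.0 and 3.0), the coverage thresholds (60 and 100), and the casualty score steps are hypothetical values chosen for illustration, since the mapping tables themselves are not reproduced here.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal score: 0 at or outside a and d, 100 on [b, c],
    linear on the rising ramp (a, b) and the falling ramp (c, d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return 100.0 * (x - a) / (b - a)
    if x <= c:
        return 100.0
    return 100.0 * (d - x) / (d - c)


def semi_triangle_increasing(x, low, high):
    """Monotonically increasing score for positive indicators:
    0 at or below `low`, 100 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 100.0
    return 100.0 * (x - low) / (high - low)


def segmented(x, segments, default=0.0):
    """Piecewise-constant score. `segments` is a list of (upper_bound, score)
    pairs in ascending order; the first bound that x does not exceed wins."""
    for upper, score in segments:
        if x <= upper:
            return score
    return default


# Load ratio: full marks inside [1.8, 2.2]; outer limits are assumed.
load_score = trapezoid(2.0, 1.0, 1.8, 2.2, 3.0)
# Coverage of the acquisition system in percent; thresholds are assumed.
coverage_score = semi_triangle_increasing(80.0, 60.0, 100.0)
# Casualties as a negative indicator; the score steps are assumed.
casualty_score = segmented(1, [(0, 100.0), (1, 60.0)])
print(load_score, coverage_score, casualty_score)
```

Keeping the breakpoints as parameters means one function per strategy can serve every indicator of that type, with the indicator-specific limits supplied from the relevant guideline or table.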

Benchmark weight calculation
In this study, the Delphi method is used to improve the AHP [16,17]: the opinions of several experts are synthesised to calculate the index weights in the index system. The main process is as follows.
(i) Each of the n experts is asked to score the indicators at each level, forming n judgment matrices.
(ii) Outlying judgment matrices (those that differ greatly from the others) are removed. Assuming that the kth expert establishes a judgment matrix with elements $x_{ij}^{k}$, the average $\bar{x}_{ij}$ of the corresponding elements over all experts' matrices is computed, and each $x_{ij}^{k}$ is checked for whether it deviates from $\bar{x}_{ij}$ by more than a predetermined threshold (usually set to 50%). The judgment matrices of experts with elements beyond the threshold are removed, leaving n′ judgment matrices.
(iii) Calculate the n′ expert weights. The basic steps of the AHP method are used to calculate the index weight coefficients recommended by each remaining expert.
(iv) Calculate the comprehensive weight coefficient of the improved AHP. To summarise the experience of the n′ experts, the degree of consistency $S_{ij}$ between the index weight vectors $w_i$ and $w_j$ of two experts, the average degree of consistency $S_i$, and the relative degree of consistency $S_i'$ of each expert are calculated, where $\theta_{ij}$ is the angle between the vectors $w_i$ and $w_j$. The corrected index weight vector of the improved AHP is then obtained from the experts' weight vectors $w_i$ and their relative degrees of consistency $S_i'$. The greater $S_{ij}$ is, the higher is the consistency between the index weight vectors $w_i$ and $w_j$ of the two experts. The relative degree of consistency $S_i'$ reflects how well an expert's weights agree with those of the other experts, i.e. whether the expert represents the opinions of most experts.
Taking the power grid development indicators of 98 county-level power supply enterprises in Shanxi Province as an example, the improved AHP method based on the Delphi method yields the weights of the 5 secondary and 20 tertiary indicators under the power grid development index, as shown in Table 4.
Table 1 Calculation standards for capacity/ratio specifications. Columns: function type, actual value range, normalised value. Function type: trapezoidal processing function. Here x is the actual value and f(x) is the normalised value.
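The four steps above can be sketched as follows. Several details are assumptions rather than the paper's stated formulas: the AHP weights are approximated with the row geometric-mean method, only the upper triangle of each reciprocal matrix is checked against the 50% threshold, and the consistency measures are read as $S_{ij} = \cos\theta_{ij}$, $S_i$ as the mean of $S_{ij}$ over the other experts, $S_i'$ as $S_i$ divided by the sum of all $S_i$, and the final weights as the $S_i'$-weighted sum of the experts' vectors. The judgment matrices are made up.

```python
from math import sqrt


def ahp_weights(matrix):
    """Approximate AHP priorities by the row geometric-mean method
    (an assumed choice; the principal eigenvector is an alternative)."""
    m = len(matrix)
    gms = []
    for row in matrix:
        prod = 1.0
        for v in row:
            prod *= v
        gms.append(prod ** (1.0 / m))
    total = sum(gms)
    return [g / total for g in gms]


def filter_experts(matrices, threshold=0.5):
    """Drop every matrix with an upper-triangle element whose relative
    deviation from the cross-expert mean exceeds `threshold`."""
    n, m = len(matrices), len(matrices[0])
    kept = []
    for mat in matrices:
        ok = True
        for i in range(m):
            for j in range(i + 1, m):
                mean = sum(other[i][j] for other in matrices) / n
                if abs(mat[i][j] - mean) / mean > threshold:
                    ok = False
        if ok:
            kept.append(mat)
    return kept


def cosine(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def aggregate(vectors):
    """Combine expert weight vectors with consistency-based coefficients
    (assumed reading of S_ij, S_i and S_i' described in the lead-in)."""
    n = len(vectors)
    if n == 1:
        return list(vectors[0])
    s = [sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
         for i in range(n)]
    rel = [si / sum(s) for si in s]
    return [sum(rel[i] * vectors[i][k] for i in range(n))
            for k in range(len(vectors[0]))]


# Four hypothetical experts comparing three indicators; the last one disagrees.
experts = [
    [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]],
    [[1, 2, 3], [1 / 2, 1, 2], [1 / 3, 1 / 2, 1]],
    [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]],
    [[1, 1 / 4, 1 / 6], [4, 1, 2], [6, 1 / 2, 1]],
]
kept = filter_experts(experts)
weights = aggregate([ahp_weights(mat) for mat in kept])
print(len(kept), [round(w, 3) for w in weights])
```

Because the relative consistencies sum to one and every expert vector sums to one, the aggregated weights also sum to one without a further normalisation step.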

Differentiated weight calculation
There are many basic electric distribution companies in Shanxi Province. Because the level of economic development, the environment, and the climate differ between regions, the enterprises are at different levels of power grid development. To improve the accuracy of the evaluation method, this paper modifies the benchmark weights, takes full account of the development characteristics of each region, and presents the calculated differentiated weights for the various regions. Taking the correction of the weights of the secondary indicators of power grid development as an example, the specific calculation process is as follows:
(i) Build a differentiated scoring matrix: According to the recommendations of the expert group, the same indicator is scored for each category on a scale of 1-5. When the indicator carries a large weight in a certain category, 5 points are assigned, as shown in Table 5.
(ii) Calculate the index contribution matrix: According to the differentiated scoring matrix, the scores within the same category of regions are normalised and the contribution of each index to the region is calculated. The contribution is defined as $\theta_{ij}$, the degree of contribution of the ith index to the jth region, as shown in Table 6.
(iii) Calculate the weight correction matrix: According to the index contribution matrix, the correction matrix of each index in the different categories is calculated by combining the reference weight $\varphi_i$ of each index. The formula is $\omega_{ij} = \varphi_i \times 0.5 + \theta_{ij}$. The details are shown in Table 7.
(iv) Calculate the differentiated weights: The weight correction matrix is normalised to obtain the differentiated weights applicable to the different types of regions, as shown in Table 8.
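A minimal sketch of steps (i) to (iv) is given below, applying the correction formula exactly as written in the text. The benchmark weights 0.276 (grid structure) and 0.251 (power supply capacity) are quoted from the paper; the remaining three benchmark weights, the two illustrative categories, and every expert score are hypothetical.

```python
def differentiated_weights(benchmark, scores):
    """benchmark[i] is the reference weight phi_i of indicator i.
    scores[i][j] is the expert score (1-5) of indicator i in category j.
    Returns weights[i][j]; each category column sums to 1."""
    n_ind, n_cat = len(scores), len(scores[0])
    result = [[0.0] * n_cat for _ in range(n_ind)]
    for j in range(n_cat):
        col_total = sum(scores[i][j] for i in range(n_ind))
        # Contribution theta_ij: score normalised within category j.
        theta = [scores[i][j] / col_total for i in range(n_ind)]
        # Correction omega_ij = phi_i * 0.5 + theta_ij, as given in the text.
        omega = [benchmark[i] * 0.5 + theta[i] for i in range(n_ind)]
        omega_total = sum(omega)
        # Final normalisation so the weights of a category sum to 1.
        for i in range(n_ind):
            result[i][j] = omega[i] / omega_total
    return result


# Row order: grid structure, power supply capacity, then three
# placeholder indicators whose benchmark weights are assumed.
benchmark = [0.276, 0.251, 0.180, 0.158, 0.135]
# Hypothetical scores; column 0 stresses grid structure,
# column 1 stresses power supply capacity.
scores = [
    [5, 3],
    [3, 5],
    [3, 3],
    [3, 3],
    [2, 2],
]
weights = differentiated_weights(benchmark, scores)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy setting the indicator that receives 5 points in a category ends up with a weight above its benchmark value in that category, which is the intended effect of the correction.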

Applications
Using a comprehensive evaluation method for basic electric distribution companies based on differentiated weights, 98 basic electric distribution companies in Shanxi were evaluated and scored. According to the scores of various indicators, we can intuitively locate the weaknesses of each company's power grid development.
Taking Yangqu County in the first category and Pingshun County in the fourth category as examples, a retrospective analysis of the clustering results shows that the power supply enterprises of the fourth category are mostly located in poor areas, where the development of power grids is slow and the power supply capacity still needs to be increased significantly. Therefore, the power supply capacity has the highest weight, i.e. 0.295, which is higher than the reference weight of 0.251 and the differentiated weights of the other categories. The enterprises of the first category, in contrast, are mostly in areas where the power grid develops rapidly: the power supply capacity has basically met the power supply demand, but the grid structure still needs to be strengthened. Therefore, the grid structure has the highest weight, i.e. 0.278, which is higher than the benchmark weight of 0.276 and the differentiated weights of the other categories.
Table 3 Calculation standards for personal accident casualties. Columns: function type, actual value range, normalised value. Function type: single increase in triangle processing function. Here x is the actual value and f(x) is the normalised value.
The comprehensive scores of Yangqu County and Pingshun County are shown in Table 9. Fig. 6 is a radar chart of the scores of Yangqu County and Pingshun County. The chart clearly locates the weaknesses of each company, which can then be traced back to the specific third-level indicators to find the root cause and to guide the future investment focus and investment direction of the distribution companies. The scores of Yangqu County and Pingshun County therefore support the accuracy and rationality of the differentiated-weight method, which reflects the differences between electric distribution companies in various regions to a certain extent and makes the evaluation results more accurate.
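How a comprehensive score and a weakness can be read off the weighted indicators may be sketched as follows. Only the weight 0.295 for power supply capacity in the fourth category is quoted from the text; the other weights, the placeholder indicator names, and all scores are hypothetical and do not reproduce Table 9.

```python
def composite_score(scores, weights):
    """Weighted sum of normalised indicator scores (0-100 scale)."""
    return sum(scores[name] * weights[name] for name in weights)


def weakest_indicator(scores):
    """Name of the indicator with the lowest normalised score."""
    return min(scores, key=scores.get)


# Differentiated weights of one category; all but the first are assumed.
weights = {
    "power_supply_capacity": 0.295,
    "grid_structure": 0.250,
    "indicator_3": 0.170,
    "indicator_4": 0.150,
    "indicator_5": 0.135,
}
# Hypothetical normalised scores of one company.
scores = {
    "power_supply_capacity": 60.0,
    "grid_structure": 80.0,
    "indicator_3": 90.0,
    "indicator_4": 70.0,
    "indicator_5": 85.0,
}
total = composite_score(scores, weights)
weak = weakest_indicator(scores)
print(round(total, 3), weak)
```

The same two functions can be applied one level down, to the third-level indicators under the weakest secondary indicator, to trace a low score back to its cause.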

Conclusion
A comprehensive evaluation of the grid development and the production and operation of basic electric distribution companies is the key to a company's investment and development strategies. The comprehensive evaluation method for electric distribution companies proposed in this paper can clearly locate the problems in the operation of each enterprise and in the development of its power grid, identify the direction of the company's next investment focus, improve the effectiveness and efficiency of power grid investment, and support more accurately targeted investment.