Identification of the characteristic parameters for gas pipe sections using the maximum likelihood method and least square method

The pipeline efficiency factor (E‐value) and the overall heat transfer coefficient (K‐value) are the characteristic parameters that describe, respectively, the flow and the heat transfer behavior of gas pipe sections. These parameters significantly affect the accuracy of thermo‐hydraulic calculations of gas flow and should therefore be investigated thoroughly. Because various complex factors affect the E‐value and the K‐value of in‐service gas pipe sections, it is often difficult to calculate the two characteristic parameters accurately with physics‐based formulae. Based on quasi‐steady‐state historical operational data sets of an in‐service gas pipe section, the characteristic parameters are identified with the maximum likelihood method and with the least square method. In both methods, the characteristic parameters are derived by solving the inverse problem of the steady‐state thermo‐hydraulic calculation. Both methods are applied to an in‐service pipeline, and the two parameters are obtained from 144 historical operational data sets. When the identified parameters are used in thermo‐hydraulic calculations for certain quasi‐steady‐state historical conditions of the same gas pipe section, the maximum relative errors between the calculated results and the measured data are 2.91% and 2.66%, respectively. The difference between the two methods is small: 0.0037 W/(m² K) in the K‐value and 0.0206 in the E‐value. The proposed identification methods can be extended to provide technical support for large‐scale gas pipeline networks.


INTRODUCTION
Determining the pipeline efficiency factor and the overall heat transfer coefficient is an important task before the design or simulation of a gas transmission pipeline, as it ensures that the results of process calculations match actual operation. The pipeline efficiency factor is defined as the ratio of the actual flow to the theoretical flow of a gas pipe section under specified inlet and outlet pressures, where the theoretical flow is calculated from the general flow equation and the actual flow is measured onsite with a flowmeter. In the theoretical flow calculation, the inner diameter and the equivalent inner-wall roughness of the gas pipe section are both taken as their design values. During operation, however, the inner-wall roughness of the pipe changes, liquids or solids deposit, and the cross-section at some positions may even be partially blocked for nonsedimentary reasons; all of these cause a discrepancy between the theoretical and the actual flow. The pipeline efficiency factor is introduced to correct the errors caused by all of these factors. From the perspective of the physical mechanism of gas flow, these factors affect the flow by changing the inner-wall roughness of the pipeline. Deposition, however, also affects the actual flow by reducing the equivalent inner diameter of the gas pipe section. Where deposition significantly reduces the cross-sectional area of the pipe at some positions, its effect on the flow should be reflected through the inner diameter rather than the roughness. Likewise, a partial cross-section blockage caused by nonsedimentary reasons affects the flow through a reduction of the equivalent inner diameter.
When the actual pressure drop of a gas pipe section is much larger than the theoretical value, an increase in equivalent roughness may not be the only cause; the decrease in the equivalent inner diameter of the pipe should also be considered. The above-mentioned errors in the inner diameter or roughness affect the actual hydraulic friction factor and are reflected by the pipeline efficiency factor, which can therefore be applied to correct the theoretical flow. In addition, an error in the theoretical friction factor causes an error in the theoretical flow through the general flow equation, and this error should also be corrected by the pipeline efficiency factor, although the literature rarely mentions that the pipeline efficiency factor is related to the error of the theoretical friction factor. It follows from the general flow equation and the definition of the pipeline efficiency factor that the pipeline efficiency factor equals the square root of the ratio of the theoretical friction factor to the actual friction factor of the gas pipe section, which shows directly that the pipeline efficiency factor corrects the theoretical friction factor. The general flow equation also shows that the effects of the inner-wall roughness and of deposits in the pipe on the actual flow are reflected by the actual friction factor of the gas pipe section. The equivalent roughness can be estimated from the Colebrook formula, as shown in Figure 1, which also reveals the relationship between the hydraulic friction factor λ and the equivalent roughness k_e: k_e increases monotonically with λ. (The data of Figure 1 are taken from the case study in Section 4.) In other words, the actual friction factor reflects the effect of variations in the inner-wall roughness and in the equivalent inner diameter on the actual flow.
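To make the two relationships above concrete, the sketch below inverts the Colebrook formula 1/√λ = −2 log10[k_e/(3.7d) + 2.51/(Re√λ)] for k_e in closed form, and evaluates E = √(λ_L/λ_S), which follows from the general flow equation when the pressures and geometry are identical. The diameter, Reynolds number, and friction factors are hypothetical illustrations, not the Figure 1 data, and the function names are illustrative only.

```python
import math


def colebrook_residual(lam: float, k_e: float, d: float, re: float) -> float:
    """Residual of 1/sqrt(lam) = -2 log10(k_e/(3.7 d) + 2.51/(Re sqrt(lam))); zero at a solution."""
    s = math.sqrt(lam)
    return 1.0 / s + 2.0 * math.log10(k_e / (3.7 * d) + 2.51 / (re * s))


def roughness_from_friction_factor(lam: float, d: float, re: float) -> float:
    """Closed-form inversion of the Colebrook formula for the equivalent roughness k_e (m)."""
    s = math.sqrt(lam)
    k_e = 3.7 * d * (10.0 ** (-1.0 / (2.0 * s)) - 2.51 / (re * s))
    if k_e < 0.0:
        raise ValueError("lam is below the smooth-pipe Colebrook value for this Re")
    return k_e


def efficiency_factor(lam_theory: float, lam_actual: float) -> float:
    """E = Q_actual / Q_theory = sqrt(lam_theory / lam_actual) at equal pressures and geometry."""
    return math.sqrt(lam_theory / lam_actual)


if __name__ == "__main__":
    d, re = 0.5, 1.0e7  # hypothetical inner diameter (m) and Reynolds number
    for lam in (0.009, 0.010, 0.011):
        k_e = roughness_from_friction_factor(lam, d, re)
        print(f"lambda = {lam:.3f} -> k_e = {k_e * 1e6:.1f} um")
    print(f"E = {efficiency_factor(0.010, 0.011):.4f}")
```

Because 10^(−1/(2√λ)) grows and 2.51/(Re√λ) shrinks as λ increases, the recovered k_e rises monotonically with λ, which is the trend described for Figure 1.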
The overall heat transfer coefficient of the pipeline is the quantity of heat transferred from the natural gas to the surroundings through the pipe wall per unit heat transfer area, per unit time, and per unit temperature difference. 1 The K-value comprehensively reflects the heat transfer between the gas in the pipe and the surroundings and is a critical parameter for the thermal calculation of gas pipelines. For buried natural gas pipelines, the heat transfer consists of three parts: (1) convection from the natural gas to the inner wall of the pipe, (2) conduction across the pipe wall, and (3) heat transfer from the outer wall of the pipe to the surroundings. 2,3 The third part, conduction and convection from the outer wall to the surroundings, is the most important. For a buried pipeline without insulation and with turbulent gas flow, the K-value is approximately equal to the heat transfer coefficient from the outer wall of the pipe to the soil, so the K-value of a buried pipe depends mainly on the thermal conductivity of the soil surrounding the gas pipe section. The thermal conductivity of the soil varies with the burial depth and with the axial position along the gas pipe section, and groundwater movement in the soil also affects the heat transfer between the gas and the surroundings. Thus, the K-value is difficult to calculate accurately from the physical mechanism.
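The three-part heat transfer path can be illustrated with a series-resistance sketch. It assumes the standard cylindrical form 1/K = D/(α_1 d) + D ln(D/d)/(2λ_w) + 1/α_2, with K referred to the outer diameter D as in the variable list of Equation (5), and, for the soil side, the classical shape factor of a cylinder buried in homogeneous soil with an isothermal ground surface, α_2 = 2λ_s/[D arcosh(2h/D)], where h is the depth of the pipe axis. All numerical values are hypothetical, not data from this study.

```python
import math


def soil_side_coefficient(lambda_soil: float, depth: float, outer_d: float) -> float:
    """Outer-wall-to-soil coefficient alpha_2 (W/(m2 K)) from the buried-cylinder shape factor.

    Assumes homogeneous soil and an isothermal ground surface; depth is the pipe-axis depth (m).
    """
    return 2.0 * lambda_soil / (outer_d * math.acosh(2.0 * depth / outer_d))


def overall_k(alpha_in: float, lambda_wall: float, alpha_out: float,
              inner_d: float, outer_d: float) -> float:
    """Overall coefficient K (W/(m2 K)) referred to the outer diameter, three resistances in series."""
    r_conv_in = outer_d / (alpha_in * inner_d)                             # gas -> inner wall
    r_wall = outer_d * math.log(outer_d / inner_d) / (2.0 * lambda_wall)   # conduction in the wall
    r_out = 1.0 / alpha_out                                                # outer wall -> soil
    return 1.0 / (r_conv_in + r_wall + r_out)


if __name__ == "__main__":
    # Hypothetical values: soil 1.2 W/(m K), axis depth 1.5 m, steel wall 45 W/(m K)
    alpha2 = soil_side_coefficient(lambda_soil=1.2, depth=1.5, outer_d=0.508)
    k = overall_k(alpha_in=200.0, lambda_wall=45.0, alpha_out=alpha2,
                  inner_d=0.4826, outer_d=0.508)
    print(f"alpha_2 = {alpha2:.3f} W/(m2 K), K = {k:.3f} W/(m2 K)")
```

With these illustrative values K lands within a few percent of α_2, consistent with the statement that the outer-wall-to-soil term dominates for an uninsulated buried pipe, and α_2 in turn scales directly with the soil conductivity λ_s.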
The above analyses show that complex factors affect the E-value and K-value of in-service gas pipe sections, making it difficult to calculate these two characteristic parameters accurately with formulae based on physical mechanisms. Back-calculation methods are therefore commonly used to determine the K-value and E-value, as they can provide accurate values without analyzing the complex physical mechanisms. Based on back-calculation, two mathematical models are established in this article for accurately identifying the two characteristic parameters, using the maximum likelihood method (MLM) and the least square method (LSM), which are classic, fundamental, and among the most widely used system identification methods. Bayesian estimation was tried before these two methods were adopted, but it could not be carried out because the prior probability was not available. Both the MLM and the LSM are statistical methods that use large samples and can partially eliminate the effect of random errors in the field data.
Historical operational data sets of in-service gas pipe sections under steady-state or quasi-steady-state operating conditions are selected in order to avoid transient calculations and make the problem easier to address: steady thermo-hydraulic calculations require much less computation than transient ones. A strictly steady state is hard to reach because of various internal and external factors. Thus, a small variation of the operating parameters over time is acceptable for identifying the characteristic parameters of gas pipe sections, and such a state is called the quasi-steady-state. Although the steady general flow equation is applied to quasi-steady-state flow in the inverse calculation, the results of this approximation meet the accuracy requirements of engineering calculations as long as the quasi-steady-state is sufficiently close to the steady-state.
In addition, although the characteristic parameters of gas pipe sections are obtained from steady-state or quasi-steady-state operating data, they can also be applied in transient simulations, because the friction terms in the governing equations and the heat transfer terms between the gas and the surroundings take the same form in transient operation as in steady operation. It follows from their definitions that the E-value and K-value are widely applicable: values obtained by inverse calculation from historical operational data remain valid as long as there are no significant changes in the equivalent roughness, in the deposits in the pipe, or in the heat transfer conditions with the surroundings. Therefore, the identification method can be used to calculate the E- and K-values of every gas pipe section, and the values obtained can be applied in steady-state and transient simulations, in retrospective simulations of historical conditions and predictive simulations for future needs, and as initial values for tuning and calibrating the simulation model of the gas pipe section.
In 1985, Changchun Wu 4 proposed a method to back-calculate the K-value of hot oil pipelines from historical operational data and applied it to the optimization of hot oil pipeline operations, which improved the accuracy of thermo-hydraulic calculations and the reliability of the optimized schemes. Hua Li et al. 5,6 calculated the equivalent inner-wall roughness of gas pipelines with the least square method and applied it to various sections of the Shaanxi-Beijing pipeline, obtaining results that reflected the actual condition of the pipe sections. However, the method is not applicable to gas pipe sections whose cross-section is significantly reduced, since the calculated roughness may then be off by orders of magnitude. Colebrook's formula 7-9 is a classic formula for the hydraulic friction factor of gas pipelines; it is the most widely used formula in the hydraulic simulation of gas pipelines and is often used as a benchmark for evaluating the accuracy of other formulae. On the other hand, researchers in the UK and Germany 10,11 found inconsistencies between experimental data and the values calculated with the Colebrook formula and proposed the GERG friction factor formula based on it; however, the coefficients in this formula must be determined experimentally. All of the above suggests that the pipeline efficiency factor also corrects the error of the theoretical friction factor. This article is structured as follows: Sections 2.1 and 3.1 present the essential concepts of the MLM and the LSM, respectively. The MLM and LSM mathematical models, together with the specific solution methods, are established in Sections 2.2 and 3.2. In Section 4, a case study demonstrates the performance of the proposed approach and compares the two methods. Finally, conclusions and suggestions are drawn in Section 5.

IDENTIFICATION OF THE CHARACTERISTIC PARAMETERS BY THE MLM
The key idea of the MLM is that an event that has already occurred in a random experiment is regarded as the event with the greatest probability. 12 Given the probability distribution model to be analyzed, the parameters of the model can then be determined from sample observations.

Basic idea of the MLM
The general idea of the MLM is to treat the data to be analyzed as possible values of a random variable. A probability distribution function is selected according to the distribution of the sample points, and the unknown parameters are then determined by maximizing the probability that all the observed events occur simultaneously.
The random variable X corresponds to the sample points x_i (i = 1, 2, …, n) to be analyzed. X is assumed to approximately follow a normal distribution f(x; μ, σ²), whose parameters μ and σ are to be determined, as given by Equation (1).
where μ and σ² are the expectation and the variance. According to the basic idea of the MLM, the probability that all x_i (i = 1, 2, …, n) occur simultaneously is to be maximized; this is expressed by the likelihood function in Equation (2).
Because the natural logarithm is monotonically increasing, taking the logarithm does not change the maximum point of Equation (2). Equation (2) is therefore transformed into the equivalent log-likelihood function, Equation (3), which is simpler to solve.
Setting the derivative of the log-likelihood function with respect to the expectation to zero, that is, ∂ln L(μ, σ²)/∂μ = 0, yields the estimate of the expectation given by Equation (4).
In the identification problem, the sample points are the calculation errors, so Equation (4) estimates the expected calculation error; the identification is thus transformed into finding the parameter value that minimizes this estimated expectation.
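For reference, the standard normal-distribution forms behind Equations (1)-(4) are written out below; we assume they match the paper's equations, whose typeset forms are not reproduced in this text. With the calculation errors ε_i as the sample points x_i, the estimate μ̂ is simply the mean error.

```latex
f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \\
L(\mu,\sigma^2) = \prod_{i=1}^{n} f(x_i;\mu,\sigma^2) \\
\ln L(\mu,\sigma^2) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{\partial \ln L(\mu,\sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
```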

Detailed identification procedures of the MLM
The end temperature T_Z' and end pressure p_Z' of a gas pipe section are calculated with the steady-flow axial temperature distribution formula and the nonhorizontal general flow equation, respectively. The errors ε_i = (T_Z' − T_Z)_i or ε_i = (p_Z' − p_Z)_i between the calculated and measured values are assumed to follow a normal distribution, and the overall heat transfer coefficient and the pipeline efficiency factor are then determined.

2.2.1 The overall heat transfer coefficient
Supposing that the inlet temperature of the gas pipe section is T_Q (K), the outlet temperature T_Z' (K) can be calculated with Equation (5).
where T_0 is the temperature of the soil at the burial depth of the gas pipe section, K; L is the length of the gas pipe section, m; J is the throttling (Joule-Thomson) coefficient, 3.5 °C/MPa; p_Q and p_Z are the gas pressures at the inlet and outlet of the gas pipe section, respectively, MPa; K is the overall heat transfer coefficient from the gas to the surroundings of the gas pipe section, W/(m² °C); D is the outer diameter of the gas pipe section, m; M is the mass flow rate of gas in the gas pipe section, kg/s; and c_p is the isobaric specific heat capacity of the gas, J/(kg K).
Assuming that the error ε approximately follows a normal distribution, a one-dimensional optimization problem in K is obtained, as shown in Equation (6).
where N is the number of measured data sets and the subscript i is the index of a data set. Quadratic interpolation 13,14, whose basic idea is function approximation, is adopted to solve Equation (6). In the neighborhood of the minimum of G(K), the objective function G(K) is approximated by a quadratic polynomial φ(K) that takes the same values as G(K) at the interpolation points K_1 < K_2 < K_3, where G(K_1) > G(K_2) and G(K_2) < G(K_3). The minimum point K_n of φ(K) is taken as the estimate of the minimum point of G(K). Then K_n and its two adjacent points are used as the new interpolation points to construct a new φ(K). The sequence {K_n} is generated in the same manner until |G(K_{n+1}) − G(K_n)| < δ or |K_{n+1} − K_n| < δ is satisfied, where δ is the given accuracy.
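The quadratic interpolation procedure above can be sketched as a generic successive parabolic interpolation. This is an illustration, not the authors' code: the function name and the separate tolerances xtol and ftol (standing in for δ) are hypothetical. In the identification, g would be the objective G(K) of Equation (6).

```python
from typing import Callable


def quadratic_interpolation_min(g: Callable[[float], float], k1: float, k2: float, k3: float,
                                xtol: float = 1e-10, ftol: float = 1e-14,
                                max_iter: int = 200) -> float:
    """Minimise a one-dimensional function g by successive quadratic interpolation.

    Needs a bracket k1 < k2 < k3 with g(k1) > g(k2) < g(k3). Stops when
    |K_{n+1} - K_n| < xtol or |G(K_{n+1}) - G(K_n)| < ftol.
    """
    g1, g2, g3 = g(k1), g(k2), g(k3)
    if not (k1 < k2 < k3 and g1 > g2 < g3):
        raise ValueError("need k1 < k2 < k3 with g(k1) > g(k2) < g(k3)")
    for _ in range(max_iter):
        # Vertex of the parabola phi(K) through (k1, g1), (k2, g2), (k3, g3)
        num = (k2 - k1) ** 2 * (g2 - g3) - (k2 - k3) ** 2 * (g2 - g1)
        den = (k2 - k1) * (g2 - g3) - (k2 - k3) * (g2 - g1)
        if den == 0.0:
            break
        kn = k2 - 0.5 * num / den
        gn = g(kn)
        if abs(kn - k2) < xtol or abs(gn - g2) < ftol:
            return kn if gn < g2 else k2
        # Keep the new point and its two neighbours as the next bracket
        if kn < k2:
            if gn < g2:
                k2, g2, k3, g3 = kn, gn, k2, g2
            else:
                k1, g1 = kn, gn
        else:
            if gn < g2:
                k1, g1, k2, g2 = k2, g2, kn, gn
            else:
                k3, g3 = kn, gn
    return k2


if __name__ == "__main__":
    # Toy objective standing in for G(K); its minimum is at K = 1.7
    print(quadratic_interpolation_min(lambda k: (k - 1.7) ** 2 + 3.0, 0.5, 1.0, 3.0))
```

For an exactly quadratic objective the first parabola already hits the minimum; for a general G(K) the bracket shrinks around the minimum over successive iterations.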

2.2.2 The pipeline efficiency factor
Using the K-value obtained in Section 2.2.1, the average temperature T_pj (K) of the gas pipe section can be calculated. Given the inlet pressure p_Q (Pa) of the gas pipe section, the outlet pressure p_Z' (Pa) is calculated by Equations (7)-(9).
where α = 2gΔ*/(Z R_a T_pj), 1/m; Q is the volumetric flow rate of the gas pipe section at standard conditions (p_0 = 101,325 Pa and T_0 = 293.15 K), m³/s; C_0 is a constant, 0.03848 (m² s K^(1/2))/kg; d is the inner diameter of the gas pipe section, m; λ is the hydraulic friction factor; n is the number of calculation intervals into which the pipe is divided according to the variation of elevation; Δh is the height difference between the start and the end of the calculated interval, m; h_i is the elevation of the calculated interval, m; L is the length of the gas pipe section, m; L' is the equivalent length accounting for the variation of elevation along the pipeline, m; Δ* is the relative density of natural gas; Z is the compressibility factor of natural gas; R_a is the gas constant of air, 287.1 J/(kg K); μ is the dynamic viscosity of natural gas, 1.09 × 10⁻⁵ Pa s; and g is the gravitational acceleration, 9.81 m/s².
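Equations (7)-(9) are not reproduced in this text. The sketch below assumes the common textbook form of the nonhorizontal general flow equation, consistent with the variables defined above: Q = C_0 √{[p_Q² − p_Z²(1 + αΔh)] d⁵ / (λ Z Δ* T_pj L')}, with L' = L[1 + α/(2L) Σ (h_i + h_{i−1}) L_i]. All function names and inputs are hypothetical; pressures are in Pa.

```python
import math

C0 = 0.03848   # (m^2 s K^0.5)/kg, for p in Pa, d in m, Q in m^3/s at standard conditions
R_AIR = 287.1  # J/(kg K)
G = 9.81       # m/s^2


def alpha_coeff(rel_density: float, z: float, t_avg: float) -> float:
    """alpha = 2 g Delta* / (Z R_a T_pj), in 1/m."""
    return 2.0 * G * rel_density / (z * R_AIR * t_avg)


def equivalent_length(lengths, elevations, alpha: float) -> float:
    """L' = L * [1 + alpha/(2L) * sum((h_i + h_{i-1}) * L_i)].

    lengths:    lengths L_1..L_n of the n intervals (m)
    elevations: n+1 elevations h_0..h_n relative to the inlet (h_0 = 0), m
    """
    total = sum(lengths)
    s = sum((elevations[i] + elevations[i - 1]) * lengths[i - 1]
            for i in range(1, len(elevations)))
    return total * (1.0 + alpha / (2.0 * total) * s)


def flow_rate(p_in, p_out, d, lam, z, rel_density, t_avg, l_eq, dh):
    """Standard-condition flow Q (m^3/s) from the assumed nonhorizontal general flow equation."""
    alpha = alpha_coeff(rel_density, z, t_avg)
    num = (p_in ** 2 - p_out ** 2 * (1.0 + alpha * dh)) * d ** 5
    return C0 * math.sqrt(num / (lam * z * rel_density * t_avg * l_eq))


def outlet_pressure(p_in, q, d, lam, z, rel_density, t_avg, l_eq, dh):
    """Invert the same equation for the outlet pressure p_Z' (Pa)."""
    alpha = alpha_coeff(rel_density, z, t_avg)
    loss = (q / C0) ** 2 * lam * z * rel_density * t_avg * l_eq / d ** 5
    p_out_sq = (p_in ** 2 - loss) / (1.0 + alpha * dh)
    if p_out_sq <= 0.0:
        raise ValueError("flow too large for this inlet pressure")
    return math.sqrt(p_out_sq)


if __name__ == "__main__":
    a = alpha_coeff(0.6, 0.92, 288.15)
    l_eq = equivalent_length([24000.0, 24000.0, 24350.0], [0.0, 20.0, -10.0, 15.0], a)
    print(outlet_pressure(6.0e6, 50.0, 0.5, 0.01, 0.92, 0.6, 288.15, l_eq, 15.0))
```

The outlet_pressure function is the quantity whose errors ε_i = (p_Z' − p_Z)_i enter Equation (10); scanning λ in it is what the one-dimensional optimization over λ_S does.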
Assuming that ε approximately follows a normal distribution, with ε_i = (p_Z' − p_Z)_i, an optimization problem in terms of the actual hydraulic friction factor λ_S is obtained, as shown in Equation (10).
The quadratic interpolation method is applied to solve Equation (10). The equivalent roughness is then assumed to be 30 μm, 15 and Equations (11)-(13) are applied to calculate the pipeline efficiency factor.
where Q is the average volumetric flow rate of the gas pipe section at standard conditions, m³/s; λ_L is the theoretical hydraulic friction factor; Re is the Reynolds number; and E is the pipeline efficiency factor. Based on the MLM, the identification of K and E is thus transformed into the two one-dimensional optimization problems of Equations (6) and (10). These objective functions serve two purposes: first, for a given value of K or λ_S, they evaluate the mathematical expectation of the calculation error of the outlet temperature or pressure; second, K or λ_S can be optimized to minimize this expected error.
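Equations (11)-(13) are not reproduced in this text. A minimal sketch of the last step, under stated assumptions: Re = 4ρ_std Q/(πdμ) with ρ_std = Δ* p_0/(R_a T_0) computed from first principles (the source's numerical constant may differ slightly), the theoretical λ_L from the Colebrook formula at k_e = 30 μm solved by fixed-point iteration, and E = √(λ_L/λ_S). Function names and inputs are hypothetical.

```python
import math

P_STD, T_STD, R_AIR = 101325.0, 293.15, 287.1
RHO_AIR_STD = P_STD / (R_AIR * T_STD)   # kg/m^3, air at the standard conditions of the text


def reynolds(q_std: float, d: float, rel_density: float, mu: float = 1.09e-5) -> float:
    """Re = 4 rho_std Q / (pi d mu), with rho_std = Delta* * rho_air,std."""
    return 4.0 * rel_density * RHO_AIR_STD * q_std / (math.pi * d * mu)


def colebrook_lambda(re: float, d: float, k_e: float = 30e-6,
                     tol: float = 1e-12, max_iter: int = 100) -> float:
    """Theoretical friction factor from Colebrook by fixed-point iteration on x = 1/sqrt(lam)."""
    x = 8.0  # initial guess for 1/sqrt(lam)
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(k_e / (3.7 * d) + 2.51 * x / re)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / x ** 2


def efficiency_factor(lam_actual: float, q_std: float, d: float, rel_density: float,
                      k_e: float = 30e-6) -> float:
    """E = sqrt(lam_L / lam_S), with lam_L from Colebrook at the assumed roughness k_e."""
    lam_theory = colebrook_lambda(reynolds(q_std, d, rel_density), d, k_e)
    return math.sqrt(lam_theory / lam_actual)


if __name__ == "__main__":
    # Hypothetical identified lambda_S and flow conditions
    print(efficiency_factor(lam_actual=0.0105, q_std=50.0, d=0.5, rel_density=0.6))
```

An identified λ_S above the Colebrook value gives E < 1, i.e. the section passes less gas than the design-roughness calculation predicts.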

IDENTIFICATION OF THE CHARACTERISTIC PARAMETERS BY THE LSM
The LSM seeks the function that best fits the actual data. Based on the given data and a given observation model, the optimal model parameters are determined by minimizing the sum of squared errors between the calculated and measured values. 16,17

Basic idea of the LSM
The basic idea of the LSM is to minimize the sum of squared errors between the measured data and the results of a given regression model, so as to obtain the optimal values of the model parameters. Assuming that the regression model is f and the measured values are y_i, the least square estimation model in terms of the parameter θ is shown in Equation (14).
The minimizer θ̂ of Y(θ) is the least square estimate of θ. The numerical method for solving Equation (15) depends on the specific functional form of Y(θ).
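A generic statement of the least squares problem is given below, together with the forms we assume for the two specialized objectives of Section 3.2. The source's Equations (14)-(18) are not reproduced in this text, so the second line is an assumption based on the error definitions of Section 2.2.

```latex
Y(\theta) = \sum_{i=1}^{N}\left[y_i - f(x_i;\theta)\right]^2, \qquad
Y(\hat{\theta}) = \min_{\theta} Y(\theta) \\
G(K) = \sum_{i=1}^{N}\left[T'_{Z,i}(K) - T_{Z,i}\right]^2, \qquad
F(\lambda_S) = \sum_{i=1}^{N}\left[p'_{Z,i}(\lambda_S) - p_{Z,i}\right]^2
```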

3.2 Detailed identification procedures of the LSM

3.2.1 The overall heat transfer coefficient
The regression model for the overall heat transfer coefficient, Equation (16), is built from the steady-state axial temperature distribution equation of the gas pipe section.
where K̂ is the least square estimate of the parameter K, which satisfies G(K̂) = min_K G(K).

The pipeline efficiency factor
The regression model for the pipeline efficiency factor, Equation (17), is built from the general flow equation of the nonhorizontal gas pipe section.
where λ̂_S, the least square estimate of the parameter λ_S, satisfies F(λ̂_S) = min F(λ_S). Equation (18) can be solved by the quadratic interpolation method, and the subsequent steps to identify the pipeline efficiency factor are the same as those of the MLM in Section 2.2.
Both the LSM and the MLM identify the parameter by solving a one-dimensional optimization model that minimizes the discrepancy between the field-measured data and the calculated values. The difference is that the LSM sums the squared errors of the data sets, whereas the MLM sums the errors directly. Therefore, positive and negative deviations between the calculated and measured values can cancel each other in the MLM, but not in the LSM.
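The difference between the two criteria can be made concrete with a small sketch. The source does not give the exact form of the MLM objective; here we assume it is the square of the estimated mean error (Equation (4) applied to the errors ε_i), which has the same minimizer as the absolute mean error but is smooth for quadratic interpolation. The function names and error values are hypothetical.

```python
from typing import Sequence


def mlm_objective(errors: Sequence[float]) -> float:
    """Assumed MLM criterion: squared estimated mean error; signed errors can cancel."""
    mean = sum(errors) / len(errors)
    return mean ** 2


def lsm_objective(errors: Sequence[float]) -> float:
    """LSM criterion: sum of squared errors; every deviation adds a non-negative term."""
    return sum(e * e for e in errors)


if __name__ == "__main__":
    balanced = [0.4, -0.4, 0.1, -0.1]   # hypothetical errors that cancel on average
    biased = [0.1, 0.1, 0.1, 0.1]       # hypothetical errors with a small constant bias
    for name, errs in (("balanced", balanced), ("biased", biased)):
        print(name, mlm_objective(errs), lsm_objective(errs))
```

With the balanced errors the MLM criterion is zero while the LSM criterion is not; with a small constant bias the ranking reverses. Two criteria that rank error patterns differently can select slightly different parameter values from the same data.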

CASE STUDY
An actual gas pipe section is taken as a case study to verify the feasibility and reasonableness of the proposed method. The overall heat transfer coefficient and the pipeline efficiency factor of the gas pipe section are identified with the MLM and the LSM.

Data sources
The quasi-steady-state should be close to the steady-state. 18,19 Data from days on which the daily inlet and outlet flows differ by less than 5% are therefore selected, and the gas pipeline is assumed to operate in a quasi-steady state on those days. The back-calculated characteristic parameters of the gas pipe section are relatively stable over the selected period and can be regarded as constants.

Physical parameter of the gas pipe section
The case is calculated using 144 data sets of the gas pipe section measured in the field on April 1, 2020. The parameters of the gas pipe section and surroundings are shown in Table 1. The gas components in the gas pipe section are shown in Table 2.

Identification results
Based on the identification methods described in Sections 2.2 and 3.2, the overall heat transfer coefficient and pipeline efficiency factor of the gas pipe section are identified and shown in Table 3.
The overall heat transfer coefficients identified with the MLM and the LSM differ by 0.0037 W/(m² K), and the pipeline efficiency factors differ by 0.0206, which shows that the two methods give only slightly different results. At first glance, this case study might suggest that the E-value is better identified with the LSM and the K-value with the MLM. However, such a conclusion cannot be drawn, because each method's result is, by construction, optimal only with respect to its own objective function. The objective function of the MLM is the sum of the errors, which may be positive or negative, whereas the objective function of the LSM is the sum of squared errors, which is always positive. Therefore, the accuracy of the two models cannot be judged by comparing their objective function values, and it is better to use both methods when identifying the characteristic parameters of a gas pipe section. In addition, the result identified by one method can be cross-checked with the other.

Verification of the result
To verify the accuracy of the identified results, the outlet pressure and temperature of the gas pipe section are calculated with the forward calculation method, using the field-measured inlet pressure, inlet temperature, and flow rate, and are then compared with the field data. The forward calculation method is based on steady-state operation of the pipeline: the gas pipe section is divided into multiple intervals, and the thermo-hydraulic equations are applied to each interval in turn. The detailed steps are as follows.
1. Divide the gas pipe section into several intervals, each of length Δx.
2. Set the temperature and pressure at the inlet of the first interval equal to the field values.
3. Assuming that the average temperature of the interval equals its inlet temperature, and using the given compressibility factor and isobaric specific heat of the gas, calculate the outlet pressure and temperature of the interval with the thermo-hydraulic equations of the gas pipeline, Equations (5) and (7)-(9).
4. Set the inlet temperature and pressure of the next interval equal to the outlet values of the previous interval.
5. Repeat steps 3 and 4 until the outlet pressure and temperature of the last interval are calculated; these are the outlet pressure and temperature at the end of the gas pipe section.
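Steps 1-5 can be sketched as a simple marching loop. This is a minimal illustration under stated assumptions: the section is treated as horizontal (the elevation terms of Equations (7)-(9) are omitted), the per-interval pressure drop uses the horizontal general flow equation, the per-interval temperature uses the Shukhov-type form with Joule-Thomson cooling assumed for Equation (5), and the mass flow is derived as M = Q Δ* ρ_air,std. The function name and all inputs are hypothetical.

```python
import math

C0, R_AIR = 0.03848, 287.1
RHO_AIR_STD = 101325.0 / (R_AIR * 293.15)   # kg/m^3, air at standard conditions


def forward_march(p_in, t_in, q_std, length, n, d, outer_d, lam, k, z,
                  rel_density, t_soil, c_p, j_coeff=3.5e-6):
    """March a horizontal gas pipe section interval by interval (steps 1-5).

    Units: p in Pa, T in K, q_std in m^3/s at standard conditions, lengths in m,
    j_coeff in K/Pa (3.5 K/MPa). Returns (p_out, t_out) at the end of the section.
    """
    dx = length / n                                  # step 1: equal intervals
    m_dot = q_std * rel_density * RHO_AIR_STD        # mass flow rate, kg/s
    a = k * math.pi * outer_d / (m_dot * c_p)        # 1/m
    p, t = p_in, t_in                                # step 2: field inlet values
    for _ in range(n):                               # step 5: repeat steps 3 and 4
        t_avg = t                                    # step 3: T_avg taken as inlet T
        loss = (q_std / C0) ** 2 * lam * z * rel_density * t_avg * dx / d ** 5
        if p * p <= loss:
            raise ValueError("flow too large for this inlet pressure")
        p_next = math.sqrt(p * p - loss)
        decay = math.exp(-a * dx)
        t_next = (t_soil + (t - t_soil) * decay
                  - j_coeff * (p - p_next) * (1.0 - decay) / (a * dx))
        p, t = p_next, t_next                        # step 4: outlet -> next inlet
    return p, t


if __name__ == "__main__":
    # 50 intervals over 72,350 m as in the verification; other values hypothetical
    print(forward_march(6.0e6, 288.15, q_std=50.0, length=72350.0, n=50, d=0.4826,
                        outer_d=0.508, lam=0.01, k=1.5, z=0.92, rel_density=0.6,
                        t_soil=288.15, c_p=2500.0))
```

In this sketch, refining the grid matters mainly when the temperature, and hence T_pj, changes appreciably along the section; for a nearly isothermal section the one-interval and 50-interval pressures are almost identical.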
The 90 data sets measured in the gas pipe section on April 7, 2020 are used for verification. The gas pipe section is divided into 50 intervals, each 1447 m long, and the compressibility factor of the gas is taken as 0.92. This value is the average compressibility factor of the pipeline on April 1, calculated with the BWRS equation at the average temperature and pressure of the pipeline, and it is assumed to apply equally on April 7. The forward calculation method is then applied to verify the accuracy of the parameters identified with the MLM and the LSM.
As shown in Figure 2 and Table 4, the relative errors of the calculated outlet pressure and temperature of the gas pipe section fluctuate considerably but all remain below 3%, and the relative errors of the two methods follow the same trend. These two observations indicate that the MLM and the LSM are both feasible and accurate for identifying the parameters of gas pipe sections. The maximum relative error of the outlet pressure is slightly larger with the MLM than with the LSM, whereas the opposite holds for the outlet temperature. The E-value and the K-value identified by the same method must be used as a pair, since the K-value, which is identified first, is used to calculate the E-value in the next step. Because the results of the two methods differ only slightly and both give small relative errors, each method can also be used to verify the results of the other.
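A maximum relative error of the kind reported in Table 4 can be computed with the hypothetical helper below. One caveat when comparing temperature errors: a relative temperature error depends on the scale, so a 1 K deviation at 15 °C is about 6.7% when computed in °C but about 0.35% in K. The text does not state which scale is used, so the helper leaves that choice to the caller.

```python
from typing import Sequence


def max_relative_error_percent(calculated: Sequence[float], measured: Sequence[float]) -> float:
    """Maximum of |calc - meas| / |meas| * 100 over paired values (hypothetical helper)."""
    if len(calculated) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return max(abs(c - m) / abs(m) * 100.0 for c, m in zip(calculated, measured))


if __name__ == "__main__":
    # Hypothetical outlet temperatures: the same 1 K deviation in degC and in K
    print(max_relative_error_percent([16.0], [15.0]))
    print(max_relative_error_percent([289.15], [288.15]))
```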

CONCLUSIONS AND SUGGESTIONS
The MLM and the LSM are each applied to identify the overall heat transfer coefficient and the pipeline efficiency factor, which are the characteristic parameters of the gas pipe section. The conclusions and suggestions of this research are as follows.
1. It is feasible to use the MLM and the LSM to identify the characteristic parameters of a gas pipe section. Based on the principles of maximum likelihood and least squares, mathematical models for determining the overall heat transfer coefficient and the pipeline efficiency factor are established.
2. The accuracy of the parameters identified by both the MLM and the LSM meets engineering requirements. The characteristic parameters are identified from 144 field data sets with each method, and the outlet temperature and pressure of the gas pipe section are then calculated with a forward calculation method using the identified parameters. The results of the two methods differ only slightly, and the maximum relative errors between the calculated values and the field data are below 3% for both methods.
3. The overall heat transfer coefficient and the pipeline efficiency factor identified from quasi-steady-state data can be applied to both steady and transient simulations of gas transmission pipelines, including as initial values. They can be used to tune and test the transient simulation model of a gas network to obtain more accurate simulated pressure, temperature, line pack, and gas transmission capacity.
4. When the actual operating parameters of a gas pipeline network are known, the proposed identification methods can be extended to large-scale networks. The two methods are expected to provide technical support for the future research, development, and application of intelligent control of gas pipeline networks.

ACKNOWLEDGMENT
This work is supported by the project of Academician (Expert) Workstation of PetroChina Southwest Oil and Gasfield Company (Application of pipeline network simulation technology in natural gas energy measurement).

DATA AVAILABILITY STATEMENT
Research data are not shared.

CONFLICT OF INTEREST
The authors have no conflict of interest relevant to this article.