Research on prediction methods of formation pore pressure based on machine learning

Formation pressure is among the most fundamental data in oil and gas drilling and production and plays an important role throughout the oil and gas extraction cycle. However, most current prediction methods are limited to parametric methods with fixed model forms, and their accuracy often fails to meet requirements. This is especially true for the deeper layers of marine sedimentary basins, where the safe density window is extremely narrow. In this study, we propose a novel method to predict pore pressure using machine learning techniques. For the first time, the effective stress (the direct output variable) was accurately predicted from a combination of four input variables (2900 sets of data, of which 90% formed the training subset and 10% the testing subset): longitudinal wave velocity, porosity, mud content, and density. An accurate prediction of the formation pressure was then achieved based on the effective stress theorem. The performance of the machine learning techniques was verified by comparing the prediction results with those of traditional parametric single-variable and multivariate models, and the best algorithm was chosen through structural optimization and comparative analysis of five algorithms (multilayer perceptron neural network, radial basis function neural network, support vector machine, random forest, and gradient boosting machine). Compared with the methods based on parametric single-variable and multivariate models, the machine learning-based method was found to possess high accuracy, adequate self-adaptation, and high fault tolerance (D² = 0.9981, RMSE = 0.00718 g/cm³). Moreover, the multilayer perceptron neural network algorithm outperformed the other machine learning algorithms in terms of goodness of fit, generalization, and prediction accuracy, with D² = 0.9981 and RMSE = 0.00709 g/cm³.
The formation pressure prediction model developed in this study is not affected by the mechanical depositional environment and is applicable to sandy mudstone formations, such that it can be a useful and highly accurate alternative to the traditional formation pressure prediction methods with fixed parameter forms.


| INTRODUCTION
The formation pore pressure (Pp) refers to the fluid pressure in the formation pore and fracture spaces. Pp is applied in every vital link of oil and gas exploration and development. Accurate prediction of Pp can strengthen the understanding of the sedimentary evolution history of sedimentary basins and provide information for the study of hydrocarbon migration and trapping, which is of great significance to the exploration of large oil and gas fields. 1,2 During the drilling process, accurate identification of high-pressure causes and prediction of Pp can effectively reduce complex downhole accidents, such as blowouts and drilling fluid loss, and indirectly reduce drilling costs. 3,4 Pp less than hydrostatic pressure is called abnormally low pressure, whereas Pp greater than hydrostatic pressure is called abnormally high pressure. Abnormally high pressure can be caused by many factors, such as undercompaction, hydrothermal pressurization, clay mineral diagenesis, permeability effects, hydrocarbon generation, hydrocarbon cracking, pressure transfer, and so forth. Among these factors, undercompaction is one of the essential loading mechanisms, while fluid expansion (hydrothermal pressurization, hydrocarbon generation, and hydrocarbon cracking) is one of the most significant unloading mechanisms. 5

| The prediction methods for Pp based on logging data
Geophysical logging information is ideal for determining Pp because it can accurately reflect the formation compaction law and the mechanical and elastic characteristics of subsurface rocks; an accurate Pp profile can be obtained from logging data. 6 Pp prediction methods based on geophysical logging data fall into two main categories. The first category, represented by the Eaton method, is based on the normal compaction trend line. [7][8][9][10] The second category, described by Bowers, is based on the effective stress theorem. [11][12][13][14][15] However, because the specific form of the normal compaction function is uncertain, Pp predictions from the first category are nonunique; moreover, being limited by lithology (suitable only for pure mudstone and shale formations), its practicability is clearly reduced. Therefore, the Pp prediction method based on the effective stress theorem is currently more applicable.
The method based on the effective stress theorem is based on the empirical transformation from physical properties and lithological parameters to effective stress. Numerous scholars have proposed many different transformation methods, including single-variable and multivariate models.
A single variable (typically acoustic velocity) is associated with effective stress. The simplest and most classical single-variable relation between effective stress and acoustic velocity is Bowers' model, 16 which establishes different relations for the mechanical characteristics of different deposition processes:

Loading curve: V = V0 + A·σ^B

Unloading curve: V = V0 + A·[σmax·(σ/σmax)^(1/U)]^B

where V is the acoustic velocity, σ is the vertical effective stress, σmax is the maximum effective stress experienced by the sediment, and A, B, and U are fitted coefficients. There are few multivariate relationships between physical properties and effective stress, but some references can still be made. Han's extensive experiments have shown that the most critical factors affecting acoustic velocity are lithology, porosity, and effective stress; that is, acoustic velocity is a function of lithology, porosity, and effective stress. Eberhart-Phillips performed a detailed analysis of Han's test data and proposed a multivariate transformation model between acoustic velocity, argillaceous content, porosity, and effective stress 17:

Vp = 5.77 − 6.94φ − 1.73√C + 0.446(σ − e^(−16.7σ))

where Vp is the compressional velocity (km/s), φ is the porosity, C is the clay content, and σ is the effective stress (kbar). Sayers proposed another multivariate transformation model relating these four variables based on Han's test data and Bowers' loading curve. 18
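As an illustration only, Bowers' loading and unloading relations can be evaluated numerically as follows; the coefficient values (v0, a, b, u) are placeholders for this sketch, not fitted values from this study:

```python
import numpy as np

def bowers_loading(sigma, v0=1500.0, a=10.0, b=0.8):
    """Bowers loading curve: V = V0 + A * sigma**B.
    sigma: effective stress; coefficients are illustrative placeholders."""
    return v0 + a * np.power(sigma, b)

def bowers_unloading(sigma, sigma_max, u=3.0, v0=1500.0, a=10.0, b=0.8):
    """Bowers unloading curve: V = V0 + A * (sigma_max * (sigma/sigma_max)**(1/U))**B.
    Coincides with the loading curve when sigma == sigma_max."""
    return v0 + a * np.power(sigma_max * np.power(sigma / sigma_max, 1.0 / u), b)
```

Note that for any effective stress below sigma_max, the unloading curve predicts a higher velocity than the loading curve, which is the velocity "hysteresis" that makes the loading/unloading distinction necessary.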

| Problem statement
The single-variable model relies on lithology. To reduce the negative influence of lithological changes, pure mudstone and shale data are considered first. However, the overall error of the Pp prediction results becomes larger when predicting Pp over the full well section, owing to the influence of other lithologies (e.g., sandstone). The multivariate models consider the effects of multiple rock properties and lithologies and fully account for the response of rock conduction properties (acoustic velocity, resistivity, etc.) and volume properties (porosity, density) to effective stress under different lithological conditions (generally sand and mudstone formations). However, the two multivariate models above represent the influence of lithology only through the argillaceous content. Sandy mudstones with the same mud content do not necessarily have the same mineral composition, grain size, and cementation; these factors all influence the compressional wave velocity, and the density can reflect their influence to some extent. 6 According to Bowers et al., 16 during sedimentary loading processes such as undercompaction, the acoustic travel time, porosity, density, and other rock properties all deviate from the normal compaction trend line. During unloading processes such as fluid expansion, the storage pores of the formation rock are insensitive to unloading, while the connecting pores reopen: the rock acoustic velocity changes markedly, but the porosity and density remain essentially unchanged. 16 The density of the rock thus has an evident effect on the acoustic velocity. Therefore, a density term can be added to create a new multivariate model applicable to different lithologies and mechanical depositional environments.
It should be noted that the aforementioned models are all parametric regression models, in which the variables have fixed functional forms. This simplification of the complex relationship between effective stress and the physical parameters of the rock significantly degrades the regression quality. In addition, using these multivariate transformations requires tedious correction of the coefficients, which is strongly influenced by subjective factors.
In this study, five machine learning models, namely, multilayer perceptron neural network (MLP), radial basis function network (RBF), support vector machine (SVM), random forest (RF), and gradient boosting machine (GBM), were constructed to overcome the problems associated with single and multivariate models. These five models used four input variables to predict the formation Pp accurately. The input variables were the acoustic velocity, mud content, density, and porosity. A data set including 2900 data points provided by the China National Offshore Oil Corporation (CNOOC) was used to train and test the developed models.

| Literature review
Unlike classical parametric regression models, machine learning techniques do not rely on strong assumptions about the relationships between variables. Therefore, to overcome this limitation of parametric models, some scholars have introduced machine learning techniques into the petroleum field. For example, machine learning has been successfully applied in research areas such as fluid flow rate prediction, [19][20][21][22][23][24][25] crude oil physical property prediction, [26][27][28] rock lithology and physical property prediction, [29][30][31][32] and geophysical data processing. [33][34][35][36] Some studies have also addressed formation pressure prediction. Keshavarzi et al. 37 developed a backpropagation artificial neural network (BPANN) to predict the Pp gradient in Oligocene (Asmari) and Cretaceous reservoirs. The results showed that the prediction accuracy of the BPANN-based Pp prediction model was much higher than that of the conventional Eaton method. Ahmed et al. 38 developed a Pp prediction model based on artificial neural networks with drilling parameters and logging data (including weight on bit, rotational speed, rate of penetration, mud density, bulk density, porosity, and acoustic travel time). Their study demonstrated that, to ensure highly accurate prediction of Pp, it is essential to combine drilling parameters with logging data, select preferred feature variables, and optimize the neural network structure. The model was validated to have high Pp prediction performance, with a correlation coefficient of 0.998 and an absolute percentage error of only 0.17%. Ahmed, Elkatatny et al. 38,39 used logging data (porosity, density, and compressional travel time) and real-time surface drilling parameters (weight on bit, rotary speed, rate of penetration, and mud weight) as base data to predict the Pp using an SVM.
The results showed that the SVM predicted the pore and fracture pressures with high accuracy, with a coefficient of determination greater than 0.995. Hutomo et al. 40 and Andrian et al. 41 predicted the Pp using 3D seismic data and neural network methods and conducted a distribution characterization study with a prediction accuracy of 90%. Yu et al. 42 developed four machine learning methods (multilayer perceptron neural network, SVM, RF, and gradient boosting) to predict unloading-type abnormally high Pp, combining Bowers' unloading curve with logging data (porosity, mud content, and compressional travel time) from the normal-pressure section. Their results showed that their RF model outperformed the other algorithms in terms of fit, generalization, and prediction accuracy. Farsi et al. 43 used a hybrid machine learning optimization algorithm to predict the Pp in carbonate-dominated formations based on logging data. The analysis showed that a multilayer extreme learning machine model hybridized with particle swarm optimization, applied to seven input variables chosen by feature selection, provided the most accurate Pp prediction for the entire data set. The trained model can be reliably applied to multiple locations throughout the Marun field for Pp prediction.
In this study, for the first time, we relate rock properties, lithological parameters, and effective stress; integrate the complex transformation relationships among these properties; and combine them with the effective stress theorem to propose a new Pp prediction method, based on machine learning techniques, that is applicable to sandy mudstone formations and is not affected by the overpressure genesis. The method needs no trend lines or loading and unloading curves and does not require the tedious process of manual interaction. As a result, the coefficient correction problem of parametric models can be effectively avoided. By comparing the prediction results with those of the parametric models, adequate performance was demonstrated. The prediction accuracy of the five machine learning methods (i.e., MLP, RBF, SVM, RF, and GBM) was analyzed and compared to determine the best algorithm for Pp prediction. Figure 1 illustrates the workflow sequence employed to develop and compare the machine learning models for predicting Pp from logging data collected from an exploration well drilled by CNOOC.
Five characteristic variables (acoustic velocity, porosity, density, mud content, and effective stress) were used in this study. The acoustic velocity, porosity, density, and mud content were the input variables, and the effective stress was the output variable. The Pp was then converted from the effective stress via the effective stress theorem. To avoid interference between the four input variables, the data for the different characteristic variables were obtained from different logging curves. 44 The acoustic velocity was calculated as the inverse of the compressional travel time log. The density was derived directly from the density log. The porosity was derived directly from the compensated neutron log. The mud content was estimated from the gamma-ray log. The effective stress was converted from the overburden pressure and the Pp. Subsequently, we smoothed the characteristic variables using low-pass filtering to eliminate and correct anomalies.
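The derivation of the five characteristic variables can be sketched as follows; the argument names, units, and the linear gamma-ray index used here for mud content are illustrative assumptions, not the exact processing applied to the CNOOC data:

```python
import numpy as np

def build_features(dt_us_per_m, rhob, nphi, gr, p_overburden, p_pore):
    """Derive the five characteristic variables from log curves (a sketch;
    array names and units are assumptions, not from the original data set)."""
    velocity = 1.0e6 / np.asarray(dt_us_per_m)   # velocity (m/s) as inverse of travel time (us/m)
    density = np.asarray(rhob)                   # bulk density from the density log (g/cm3)
    porosity = np.asarray(nphi)                  # porosity from the compensated neutron log (fraction)
    gr = np.asarray(gr, dtype=float)
    # Simple linear gamma-ray index as a mud (shale) content estimate
    vsh = (gr - gr.min()) / (gr.max() - gr.min())
    # Effective stress (the output variable) from overburden pressure and Pp
    sigma_eff = np.asarray(p_overburden) - np.asarray(p_pore)
    return velocity, porosity, vsh, density, sigma_eff
```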
Machine learning is a data-driven approach that requires a relatively large training data set. Before training a machine learning model, the features in the training data must be transformed into the same numerical interval. Data normalization prevents features with larger values (acoustic velocity and density) from dominating training and obscuring features with smaller values (porosity and mud content). Normalization also accelerates the convergence of the gradient descent type algorithms employed in this study (e.g., MLP, RBF, SVM, and GBM). The mean normalization method better preserves the distribution characteristics of the data and exhibits adequate robustness.

[Figure 1: Schematic diagram of the workflow employed for comparing the prediction performance of the machine learning algorithms developed to predict pore pressure. AAPD, average absolute percentage deviation; RMSE, root mean square error.]
Mean normalization:

x′ = (x − mean(x)) / (max(x) − min(x))

Then, we divided the normalized data set into two subsets: a training subset used to train the machine learning models (90% of the data) and a test subset used to independently test the trained models (10% of the data). Subsequently, the training subset was used to train the five machine learning algorithms, and the trained models were used to predict the Pp of the training, test, and overall data sets. Finally, the prediction performance and accuracy of the five models were evaluated and compared by applying a set of statistical measures of prediction accuracy, as defined in Section 4.
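A minimal sketch of mean normalization and the 90/10 split, using synthetic stand-in data in place of the study's 2900 records:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def mean_normalize(x):
    """Mean normalization: x' = (x - mean(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())

# Hypothetical 4-feature matrix X and effective-stress target y (synthetic)
rng = np.random.default_rng(0)
X = rng.normal(size=(2900, 4))
y = rng.normal(size=2900)

# Normalize each feature column independently
Xn = np.apply_along_axis(mean_normalize, 0, X)

# 90% training / 10% testing split, as in the study
X_train, X_test, y_train, y_test = train_test_split(Xn, y, test_size=0.1, random_state=42)
```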

| Machine learning algorithms
In this study, MLP, RBF, SVM, RF, and GBM machine learning models were built to predict Pp. Fivefold cross-validation was used to train the five algorithms, and the average cross-validation score (R²) was used as the criterion for optimizing each algorithm's structure.
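The fivefold cross-validation scoring used for structure optimization can be sketched as follows, with a synthetic data set and a simple ridge regressor standing in for the five algorithms:

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.datasets import make_regression

# Synthetic stand-in for the 4-feature data set (velocity, porosity, mud content, density)
X, y = make_regression(n_samples=500, n_features=4, noise=0.1, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring="r2")
mean_r2 = scores.mean()  # average CV R^2 used as the structure-selection criterion
```

In practice, each candidate structure (hidden size, kernel parameter, tree count, etc.) gets its own `mean_r2`, and the structure with the highest average score is retained.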

| MLP
An MLP is a type of feedforward artificial neural network. A feedforward neural network is an efficient nonlinear function approximator that uses a gradient descent algorithm to minimize the function approximation error. An MLP organizes neurons into a layered structure: typically one input layer, one or more hidden layers, and one output layer. The universal approximation theorem, the mathematical foundation of neural networks, states that under mild assumptions on the activation function, a single-hidden-layer feedforward network containing finitely many neurons can approximate any continuous function on a compact subset. This demonstrates that, with proper parameter control, simple neural networks can represent complex functions. The loss function of the MLP is the mean square error. The optimization method is essentially stochastic gradient descent and its variants; the gradient at each neuron is obtained by applying the chain rule through the error backpropagation algorithm. [44][45][46] For the MLP, the structure of the network (the number of hidden layers and the hidden size) and the type of activation function are the main factors affecting performance. An MLP with many hidden layers is structurally complex, difficult to train and optimize, requires a large number of training samples, and is very costly. Therefore, the MLP structure was limited to a single hidden layer, so the hyperparameters to optimize were the hidden size and the type of activation function. The R² values for different combinations of hidden size and activation function are illustrated in Figure 2. Initially, R² increases sharply with the hidden size; as the hidden size increases beyond 14, R² fluctuates slightly without much further change.
Overall, relu as the activation function yields a higher R², and the maximum R² (0.994) was achieved at a hidden size of 944. Detailed information about the configuration of the MLP model is given in Table 1.
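A hedged sketch of the hidden-size and activation-function search using scikit-learn's MLPRegressor; the small grid and synthetic data here are illustrative, not the grid from Figure 2:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=4, noise=0.1, random_state=0)
X = StandardScaler().fit_transform(X)

# Grid over the two hyperparameters tuned in the study: hidden size and activation
best = (-1e9, None, None)
for activation in ("relu", "tanh"):
    for hidden in (4, 14):
        mlp = MLPRegressor(hidden_layer_sizes=(hidden,), activation=activation,
                           max_iter=3000, random_state=0)
        r2 = cross_val_score(mlp, X, y, cv=5, scoring="r2").mean()
        if r2 > best[0]:
            best = (r2, activation, hidden)
best_r2, best_activation, best_hidden = best
```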

| RBF
The RBF network is also a three-layer feedforward neural network with excellent performance; it has global approximation ability, a compact topology, separate learning of structural parameters, and fast convergence. A hidden node of the RBF network takes the distance between the input pattern and its center vector as the independent variable of a function and uses a radial basis function as the activation function. The farther the input is from the RBF center, the lower the neuron's activation. An RBF network can approximate arbitrary nonlinear functions with a simple network structure, has adequate generalization ability, and can achieve nonlinear control. Radial basis functions serve as the "basis" of the hidden units to construct the hidden layer space, and the input vector is mapped directly to the hidden space without weight connections. The output weights of the network can be solved directly from linear equations, which greatly expedites learning and avoids local minimum problems.
The activation function of the RBF network is a radial basis function with a linear output layer. 47,48 Owing to the limited training samples, the same single-hidden-layer structure was used, so performance is controlled by the hidden size, and hyperparameter optimization of the RBF network targets only this single factor. As can be seen in Figure 3, as with the MLP, R² initially increases sharply with the hidden size; once the hidden size increases beyond 14, R² changes very little. The maximum R² (0.994) was achieved at a hidden size of 654. Detailed information about the configuration of the RBF model is given in Table 1.
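Because scikit-learn has no built-in RBF network, the description above (distance-based Gaussian hidden units around fixed centers, output weights from a direct linear solve rather than backpropagation) can be sketched as follows; the k-means choice of centers and the gamma value are our assumptions:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

class RBFNetwork:
    """Minimal RBF network sketch: k-means centers, Gaussian hidden units,
    output weights solved directly by linear least squares (no backprop)."""
    def __init__(self, hidden_size=14, gamma=1.0):
        self.hidden_size = hidden_size
        self.gamma = gamma

    def _hidden(self, X):
        d = cdist(X, self.centers_)           # distance of inputs to RBF centers
        return np.exp(-self.gamma * d ** 2)   # farther from a center -> lower activation

    def fit(self, X, y):
        self.centers_ = KMeans(n_clusters=self.hidden_size, n_init=10,
                               random_state=0).fit(X).cluster_centers_
        H = self._hidden(X)
        self.w_, *_ = np.linalg.lstsq(H, y, rcond=None)  # direct linear solve
        return self

    def predict(self, X):
        return self._hidden(X) @ self.w_

# Example usage on toy data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = np.sin(X_demo[:, 0])
pred = RBFNetwork(hidden_size=14, gamma=1.0).fit(X_demo, y_demo).predict(X_demo)
```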

| SVM
The SVM is the most typical kernel-based algorithm. An SVM solves a quadratic optimization problem whose objective is a convex function composed of an interval-based loss function plus a regularization term. By introducing the kernel trick, input samples are mapped from a low-dimensional to a high-dimensional space, where the optimal regression or classification hyperplane is found; the method therefore extends to nonlinear classification and regression problems. Commonly used kernel functions are the linear, polynomial, and radial basis functions. [47][48][49][50][51] For support vector regression, choosing the kernel function is the most important step. The kernel parameter γ and the penalty parameter C are the hyperparameters requiring further optimization, and both affect model complexity. However, a C value that is too large reduces the generalization ability of the model, that is, the accuracy on the test data decreases. Therefore, we set C to its default value of 1 and optimized the γ value. Figure 4 shows the R² values for different combinations of kernel function and γ. When the poly and sigmoid functions are used as kernels, R² < 0, indicating extremely poor model performance; when the kernel is the RBF function, R² > 0.7, and the maximum R² (0.992) is obtained at γ = 10.21. Detailed information about the configuration of the SVM model is given in Table 1.
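A sketch of the γ sweep with C fixed at its default of 1 and an RBF kernel, on synthetic stand-in data (the grid of γ values here is illustrative):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=4, noise=0.1, random_state=0)
X = StandardScaler().fit_transform(X)  # SVMs are sensitive to feature scale

# C fixed at its default of 1, gamma swept on a logarithmic grid, RBF kernel
results = {}
for gamma in np.logspace(-3, 2, 6):
    svr = SVR(kernel="rbf", C=1.0, gamma=gamma)
    results[gamma] = cross_val_score(svr, X, y, cv=5, scoring="r2").mean()
best_gamma = max(results, key=results.get)
```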

| RF
RF is a type of bagging algorithm that constructs many decorrelated decision trees and outputs predictions through simple voting or averaging. RF further reduces the correlation among decision trees by modifying the decision tree learning algorithm: when splitting a node during tree generation, the RF algorithm randomly selects a subset of the features rather than all of them. In addition, RF can estimate the model's generalization error without relying on an external data set, enabling model fitting and validation to proceed simultaneously with training. 51,52 By introducing random row sampling and random column sampling, the RF model is more robust to outliers and generalizes better. Usually, only two hyperparameters need to be tuned: the maximum number of decision trees in the forest and the maximum depth of the trees. As the number of trees increases, the accuracy of the model increases, but too many trees counteract the introduced randomness and can yield an overfitted model with reduced generalization performance. Figure 5 illustrates the R² for different combinations of the maximum number and maximum depth of the decision trees. To improve the model's performance and reduce overfitting, we set the maximum depth of the decision trees to 10. The larger the maximum number of trees, the more complex the model and the more resources are consumed; we therefore set the maximum number of decision trees to 15. Detailed information about the configuration of the RF model is given in Table 1.
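The two-hyperparameter search over tree count and depth can be sketched with scikit-learn's RandomForestRegressor (illustrative grid, synthetic data):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=400, n_features=4, noise=0.1, random_state=0)

# Grid over the two RF hyperparameters tuned in the study
grid = {}
for n_trees in (5, 15, 30):
    for depth in (3, 10):
        rf = RandomForestRegressor(n_estimators=n_trees, max_depth=depth, random_state=0)
        grid[(n_trees, depth)] = cross_val_score(rf, X, y, cv=5, scoring="r2").mean()
best_trees, best_depth = max(grid, key=grid.get)
```

The study's chosen configuration corresponds to `n_estimators=15, max_depth=10`.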

| GBM
Boosting is an effective method for forming a strong learner by combining multiple weak learners; it is another meta-algorithm for ensemble learning. It sequentially adds new weak learners fitted to the training data, continuously focusing on the observations on which the existing ensemble performs poorly, so as to improve the ensemble's fit. The GBM extends the boosting idea to any differentiable loss function, becoming an optimization algorithm that minimizes the loss by sequentially adding the decision tree that most effectively reduces it. [53][54][55] As with RF, the maximum depth of the decision trees controls the degree of fit, and the maximum number of decision trees is the other hyperparameter we optimized (regularized GBM implementations such as XGBoost, extreme gradient boosting, tune the same quantities). Figure 6 illustrates the R² for different combinations of the maximum number and maximum depth of the decision trees. Overall, R² tends to increase as the depth and number of trees increase; however, to avoid overfitting, we limited the maximum depth of the decision trees. Detailed information about the configuration of the GBM model is given in Table 1.
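A hedged sketch of a GBM fit with limited tree depth, using scikit-learn's GradientBoostingRegressor rather than the study's exact implementation (the hyperparameter values are illustrative):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=4, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Sequentially adds shallow trees, each fitted to reduce the squared-error loss;
# max_depth is kept small to control overfitting
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1,
                                random_state=0).fit(X_tr, y_tr)
score = gbm.score(X_te, y_te)  # R^2 on the held-out 10%
```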

| Data collection
LD10-A is an exploration well drilled by CNOOC in the South China Sea. Figure 7 presents, as a composite histogram, the logging and testing data collected for well LD10-A over the 3726-4015.9 m section, comprising 2900 data records. The logging data include density, gamma ray, compressional travel time, and neutron porosity; the test data are Pp coefficients.

| Data analysis
The water depth in this area is approximately 56-87 m, and the density of seawater is 1.03 g/cm³. The lithologies of the layers involved are mudstone and sandstone; among them, the 3971-4005 m sandstone section is a gas-rich layer. After entering the gas-bearing sandstone section, the rock properties and Pp change abruptly. The high formation temperature and low permeability in the study section far exceed the effective confinement conditions required for hydrothermal pressurization to take effect. 56 Moreover, deep fluid is transported vertically through bottom-opening channels or fracture systems, such that deep overpressure is transferred to shallower depths, leading to the development of strong overpressure in the shallow permeable strata. 57,58 Figure 8 illustrates the crossplot of density and acoustic velocity for the mudstone. The figure shows that the acoustic velocity varies with depth while the density remains approximately constant at 2.6 g/cm³ along a vertical trend, indicating that unloading is the main cause of overpressure in this section. When a formation is unloaded, elastic rebound has a greater effect on the conduction properties than on the volume properties, and the compressional velocity and density respond to unloading in characteristically different ways. 16 This also indicates that the density term is essential for predicting the pressure.
We conducted further correlation analysis. The Pearson correlation coefficient was used for linear correlation analysis, and the maximum information coefficient was used for nonlinear correlation analysis; Figure 9 illustrates a heat map of the correlation matrices for the two methods. The linear correlation coefficients of acoustic velocity with argillaceous content, density, porosity, and effective stress were all lower than 0.46, whereas the nonlinear correlation coefficients were all higher than 0.59, indicating that a simple linear combination cannot accurately describe the compressional velocity. From the perspective of effective stress, the nonlinear correlations between effective stress and the other factors are also stronger than the linear ones. In addition, the correlation coefficient between effective stress and density is greater than 0.65, whether linear or nonlinear, indicating that the introduction of the density term is justified.
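The linear correlation matrix can be reproduced with pandas; since the maximum information coefficient requires an external package (e.g., minepy), Spearman rank correlation is used below as a stand-in nonlinear-sensitive measure, and the data frame is synthetic, not the study's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data frame with the study's five characteristic variables
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "velocity": rng.normal(3000, 300, 500),
    "porosity": rng.uniform(0.05, 0.3, 500),
    "mud_content": rng.uniform(0.0, 1.0, 500),
    "density": rng.normal(2.5, 0.1, 500),
})
# Induce a strong density/effective-stress correlation for illustration
df["eff_stress"] = 10.0 * df["density"] + rng.normal(0.0, 0.2, 500)

pearson = df.corr(method="pearson")    # linear correlation matrix
spearman = df.corr(method="spearman")  # rank-based (monotonic) correlation matrix
```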

| Statistical measures of prediction accuracy
In the present study, a rigorous approach was employed to make a reliable comparison of the prediction performance of the machine learning models developed for Pp prediction. The statistical indicators estimated for comparing the predictive models' accuracy, as given by Equations (6)-(8), were the relative error (RE) or percentage deviation (PDi), the average absolute percentage deviation (AAPD), and the root mean square error (RMSE).
RE or percentage deviation (PD):

PDi = (xi,pred − xi,meas) / xi,meas × 100%

AAPD:

AAPD = (1/n) Σ |PDi|

RMSE:

RMSE = √[(1/n) Σ (xi,pred − xi,meas)²]

Coefficient of determination (D²):

D² = 1 − Σ (xi,meas − xi,pred)² / Σ (xi,meas − x̄meas)²

These statistical indicators are among the most common indicators used to estimate and compare the prediction accuracy of machine learning models. Among them, the RMSE is regarded as the most efficient statistical indicator of model accuracy. Because the developed algorithms are configured to minimize the RMSE, this indicator carries more weight than the other statistical parameters applied.
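The four indicators follow the standard definitions and can be implemented directly (function names are ours):

```python
import numpy as np

def pd_i(y_meas, y_pred):
    """Relative error / percentage deviation per sample (%)."""
    y_meas, y_pred = np.asarray(y_meas, float), np.asarray(y_pred, float)
    return (y_pred - y_meas) / y_meas * 100.0

def aapd(y_meas, y_pred):
    """Average absolute percentage deviation (%)."""
    return float(np.mean(np.abs(pd_i(y_meas, y_pred))))

def rmse(y_meas, y_pred):
    """Root mean square error."""
    y_meas, y_pred = np.asarray(y_meas, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_meas) ** 2)))

def d2(y_meas, y_pred):
    """Coefficient of determination (same form as R^2)."""
    y_meas, y_pred = np.asarray(y_meas, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_meas - y_pred) ** 2)
    ss_tot = np.sum((y_meas - y_meas.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```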

| Prediction performance of the ML methods
By applying the statistical indicators defined above, the prediction accuracies of the machine learning models developed for the prediction of Pp were compared, and the best predictive model that presented the lowest error was identified.
In the present study, the data set evaluated was divided into two parts, training and testing subsets. Tables 2-4 display the comparison between the prediction accuracy achieved by each machine learning model developed to predict Pp for the training (2610 data records; 90%), testing (290 data records; 10%), and overall data set (2900 data records; 100%), respectively.
The results of the statistical error analysis performed (see Tables 2-4) demonstrate that all five machine learning models evaluated, MLP, RBF, SVM, RF, and GBM, provided reliable and accurate predictions of Pp.
The lowest prediction accuracy for Pp was achieved by the SVM model, while the highest accuracy was attained by the MLP model, with D² = 0.9976 and RMSE = 0.00788 g/cm³ for the testing subset. Figure 10 compares the measured and predicted Pp values of the five machine learning models for the testing subset, training subset, and overall data set. The comparison in Figure 10 also reveals that the MLP model provided the best prediction accuracy among the five models evaluated in the present study. Based on the prediction accuracy in terms of the correlation coefficient (Figure 10), the machine learning models can be ranked as MLP > RBF > RF > GBM > SVM.
The relative error of the Pp predictions for each of the 2900 data points is displayed in Figure 11, with the training subset records shown first, followed by the testing subset records. Figure 11 reveals that the developed MLP model exhibited the lowest relative error for both the testing and training subsets compared with the other four models. In addition, as shown in Figure 11, outlying records were observed for the RF and GBM models, whereas no outlying records were observed for the MLP model, which had the highest prediction accuracy (RE < 4%).

| Comparison with parametric models
To demonstrate the advantages of machine learning in Pp prediction more clearly, the prediction results of the parameterized models and the machine learning algorithms (the average of the five algorithms) were compared. The density term was added to the multivariate transformation models, which perform multivariate nonlinear regression against the measured pressure values to determine the best set of coefficients. The Pp coefficient curves predicted by these two parameterized multivariate transformations were calculated using Equations (10) and (11).
Equations (10) and (11) are the Eberhart-Phillips and Sayers models, respectively, with the density term added. Figure 12 illustrates the prediction results of the parametric models and the machine learning algorithm, while Figure 13 illustrates the analysis of prediction accuracy. The same smoothing process (a 20 Hz low-pass filter) was applied to the predicted Pp. The Pp coefficient profiles predicted by the machine learning algorithm largely overlap the measured pressure coefficient profiles, with significantly higher accuracy than the other models. The Bowers unloading model, which considers only the relationship between acoustic velocity and effective stress, predicts the Pp coefficient profile with the highest deviation from the measured profile, that is, the lowest prediction accuracy. Although the multivariate models consider other parameters, their predictions are close to those of the Bowers unloading model, indicating that acoustic velocity responds more strongly to effective stress and that relatively large velocity values attenuate or override the effects of the other factors. Compared with the machine learning algorithm, the prediction results of the parametric models deviated significantly from the measured values; the minimum AAPD and RMSE among the three parametric methods were 11.20% and 0.34053 g/cm³, respectively, much higher than those of the machine learning algorithm.
This demonstrates the evident deficiency of the parametric models in correctly describing the complex relationship between the rock physical parameters and effective stress. Moreover, the machine learning algorithm can effectively identify abrupt changes in Pp in sections where the formation physical parameters change drastically, providing early-warning information for the safe drilling of neighboring wells.
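The 20 Hz low-pass smoothing applied to the predicted Pp curve can be sketched as below. The paper states only the cutoff frequency; the sampling rate, filter order, and zero-phase (forward-backward) filtering are assumptions for illustration.

```python
# Hedged sketch of the 20 Hz low-pass smoothing step. Sampling rate and
# filter order are assumed; the text specifies only the 20 Hz cutoff.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0    # assumed sampling rate, Hz
cutoff = 20.0  # cutoff frequency from the text, Hz
b, a = butter(4, cutoff / (fs / 2), btype="low")  # 4th-order Butterworth

# Synthetic predicted-Pp trace: a slow trend plus high-frequency jitter.
t = np.arange(0, 1, 1 / fs)
trend = 1.2 + 0.1 * np.sin(2 * np.pi * 2 * t)   # slow Pp trend (2 Hz)
noise = 0.05 * np.sin(2 * np.pi * 150 * t)      # high-frequency jitter

# filtfilt applies the filter forward and backward (zero phase), so the
# smoothed curve stays aligned with the measured profile in depth/time.
smoothed = filtfilt(b, a, trend + noise)
```

The 150 Hz jitter is strongly attenuated while the slow trend passes almost unchanged, which is the behavior needed before comparing predicted and measured Pp coefficient profiles.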

| Limitations
This method provides a framework for applying new ML algorithms; however, it also has some limitations. First, a large training data set is required. Offshore well testing is more expensive than onshore well testing. In general, only exploration wells in new development areas are logged for the full suite of geophysical parameters, such as acoustic, compensated density, and compensated neutron logs; otherwise, only partial logging is performed (e.g., an acoustic log in one section and a compensated density log in another). Drillpipe tests, wireline tests, and so forth are performed only in abnormally high-pressure or oil- and gas-bearing zones. Second, the measurement data from different well sections and wells are strongly affected by the formation environment (temperature, pressure, etc.) and by instrument performance, which degrades the accuracy of the training model and thus weakens its applicability. Therefore, improving measurement instruments, correcting measurement data, and expanding the training samples (e.g., by establishing a regional database) will enhance the application of ML technology in Pp prediction.

| CONCLUSION
In this study, 2900 data points collected from CNOOC were utilized to construct five machine learning models (MLP, RBF, SVM, RF, and GBM), which were combined with the effective stress theorem to accurately predict the formation Pp. Four input variables were used in the developed models: acoustic velocity, density, mud content, and porosity. The compiled data records were divided into two subsets, with 90% of the records used to train the machine learning models and the remaining 10% used as a test subset to validate model performance. The results demonstrate that all the developed machine learning models provide reliable and accurate predictions of the formation pressure. Among them, the multilayer perceptron neural network model presented the highest prediction performance and accuracy (D2 = 0.9981 and RMSE = 0.00709 g/cm3). When applied to the overall data set, the developed machine learning model obtained a prediction precision of D2 = 0.9981 and RMSE = 0.00718 g/cm3, compared with the conventional single and multivariate parametric models, which presented a prediction error of AAPD = 11.20% and RMSE = 0.34053 g/cm3. The developed model avoids the tedious process of manual interaction, effectively circumvents the coefficient-correction problem of parametric models, and is not affected by the mechanical depositional environment. As such, it is applicable to sandy mudstone formations and can be a useful and highly accurate alternative to traditional formation pressure prediction methods with fixed parameter forms.
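The overall workflow summarized above (90/10 split, MLP regression from four logs, D2/RMSE/AAPD evaluation) can be sketched as follows. The data here are synthetic, and the network architecture and hyperparameters are illustrative assumptions; the paper's actual CNOOC data, optimized MLP structure, and reported scores are not reproduced.

```python
# Hedged sketch of the paper's workflow on synthetic data: 90/10 split,
# MLP regression from four log-derived inputs, and the evaluation
# metrics used in the study. Architecture and target function assumed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(42)
n = 2900  # matches the data-set size reported in the study

# Columns: acoustic velocity, porosity, mud content, density (synthetic).
X = rng.uniform([2.0, 0.05, 0.0, 2.1], [5.0, 0.35, 0.6, 2.6], (n, 4))
# Hypothetical effective-stress target, for illustration only.
y = 0.8 * X[:, 0] - 1.5 * X[:, 1] - 0.4 * X[:, 2] + 0.3 * X[:, 3]

# 90% training / 10% testing split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10,
                                          random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                 random_state=0),
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

r2 = r2_score(y_te, pred)                       # goodness of fit
rmse = mean_squared_error(y_te, pred) ** 0.5    # root-mean-square error
aapd = np.mean(np.abs(pred - y_te) / np.abs(y_te)) * 100  # AAPD, %
```

In the study, the predicted effective stress would then be converted to Pp (and to an equivalent density coefficient) via the effective stress theorem before scoring against measured pressure.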

| NOMENCLATURE
Pp	pore pressure (generally converted to equivalent density), kbar (g/cm3)
σev	effective stress, kbar
Po	overburden pressure (generally converted to equivalent density), kbar (g/cm3)
σmax	maximum vertical effective stress at the beginning of formation unloading, kbar
A, B	Bowers model coefficients
U	elastoplastic coefficient of mudstone, decimal
V	acoustic velocity, km/s
ϕ	porosity, decimal
Vsh	argillaceous content, decimal