Production prediction and main controlling factors in a highly heterogeneous sandstone reservoir: Analysis on the basis of machine learning

Owing to the long oil‐bearing interval, strong anisotropy, and significant differences in fluid properties of the sandstone oil reservoir in P Oilfield, it is quite challenging to accurately predict the productivity of an oil well at the initial stage. In this study, a deep neural network model is established and combined with a gradient boosting algorithm, XGBoost, to forecast the initial productivity of oil wells, followed by an evaluation of the main controlling factors of productivity. One hundred oil wells in the study area were divided into training and verification groups. With the specific productivity index of each oil well over a stable period of approximately 6 months at the initial production stage as the target data, and geological, engineering, and oil reservoir parameters as input data, hyper‐parameters were selected for adjustment and optimization, and a deep‐learning‐based unconsolidated sandstone productivity forecast model was established to forecast the initial productivity of oil wells in the target area. The root mean square error of the forecast result was <0.15, indicating that the forecast is highly consistent with actual productivity. Finally, by adopting the XGBoost algorithm, the weight ranking of the controlling factors of productivity was clarified as follows: microscopic pore structure parameter > crude oil viscosity > median grain size > lithology index > well completion method > flow zone indicator. Machine learning can effectively forecast oil well productivity and identify the main controlling factors from multidimensional, large-volume data.


| INTRODUCTION
The P oilfield is a continental multilayer sandstone reservoir. 1 It is a large offshore oilfield with oil reserves greater than 6 × 10⁸ t. 2 The oil-bearing intervals of this oilfield are located in the lower member of the Neogene Minghuazhen Formation and the Guantao Formation, and the field is developed by multilayer commingled production from a single set of directional wells. 3 However, the oil-bearing interval is long (100-650 m) and is divided vertically into 13 oil groups and 47 sublayers. 4 The reservoirs are highly heterogeneous, with significantly varying fluid properties. 4 The crude oil is overall a medium-quality heavy crude whose properties vary considerably both vertically and horizontally, which makes predicting the initial production of oil wells very difficult. The initial production capacity of an oil well directly affects the subsequent drilling work and the investment scale. Therefore, identifying the main factors that affect oil well production capacity and predicting the initial production capacity are key to efficient oilfield development. 5
Currently, log interpretation models, formula calculations, numerical simulations, and neural networks are among the commonly used production forecasting methods. The log interpretation model has the advantage that the response characteristics of logging curves can be calibrated against data such as cores, thin sections, and scanning electron microscopy, allowing the interpretation to be extended to uncored wells and the reservoir characteristics of uncored well sections to be determined. 6,7 The interpretation results are then substituted into productivity formulas to forecast production capacity. However, log interpretation results are generally static geological parameters, such as porosity, permeability, and shale content. Meanwhile, oil well productivity is influenced by multiple factors, including engineering and reservoir parameters, and the relationship between well production and these factors is complex and nonlinear. Therefore, log interpretation alone is insufficient for predicting well production.
The advantage of numerical simulation is that it can solve the production prediction problem without actual production data 8 ; however, it requires a large amount of computation and high-precision data sources. 9 Moreover, the physical mechanism of fluid motion is very complex. Currently, the most widely used numerical methods include the finite difference method (FDM), 10 finite volume method (FVM), 11-13 finite element method (FEM), 14 and spectral method. 15,16 Numerical methods are extensively used in various fields; for example, FEM and FVM are used in computational fluid dynamics. 17,18 Reservoir numerical simulation has always been an integral part of reservoir management and is used to estimate production status and check reservoir parameters. 19 However, running a numerical simulation model can take days or even months, and sometimes hundreds of simulations are required to achieve optimal results.
The neural network method is also commonly used in production forecasting. Shallow neural networks are widely used and include artificial neural networks, 20 radial-basis neural networks, 21 and backpropagation neural networks. 22 However, a shallow model has at most one hidden layer. It does not scale well and gradually becomes incapable of meeting application needs as the number of samples and the feature dimension increase. In 2006, Hinton and Salakhutdinov conducted multilayer neural network training and made a breakthrough in deep neural network (DNN) research. 23 DNNs have since been applied to oil and gas exploration and development. For example, Liu and Liu 24 used deep learning to predict the lithofacies classification of reservoirs. Tewari and Dwivedi 25 integrated big data analytics to automate reservoir development. Wang et al. 26 predicted the production of a Bakken shale reservoir using a DNN.
The initial well production is a comprehensive reflection of the geological, reservoir, and engineering parameters. The factors affecting oil well production cover a wide range of aspects, and it is difficult to meet production forecasting needs by relying only on simple linear models. DNNs contain more hidden layers and are well suited to processing large amounts of multisource, multidimensional data. Therefore, using a DNN for production prediction has advantages over using traditional methods.
In this study, the micro-characteristic parameters of the reservoir were selected from aspects of rock structure and pore-throat micro-structure characteristics, and the reservoir characteristics and engineering parameters were fully considered. Finally, the XGBoost algorithm was used to analyze the importance of each parameter and to clarify the main controlling factors of unconsolidated sandstone production.

| GEOLOGICAL BACKGROUND
The P oilfield, located east of the Bohai Sea, developed on the Tanlu fault zone and is a faulted anticline structure that formed against the background of a paleo-uplift 27,28 ( Figure 1A). The main oil layers developed in the Neogene Guantao Formation and the lower member of the Minghuazhen Formation 4 ( Figure 1B). The lithology of the reservoir is fluvial terrigenous clastic rock, and that of the target interval is mainly feldspar sandstone. The average porosity of the reservoir is 27% and the average permeability is 1321 mD, making this a high-porosity, high-permeability reservoir. The thickness of the oil layer is 63-151 m, and the oil-bearing area is greater than 50 km². 2,4 The oil-bearing layer is thick and the reservoir is highly heterogeneous. The crude oil is characterized by high density, high viscosity, high gum content, low asphaltene content, low wax content, low sulfur content, and a low freezing point. The properties of the crude oil gradually improve as the depth of the reservoir increases. Areally, the fluid properties are better in the central part of the structure and relatively poorer on the flanks 4 ( Figure 1C).
The initial production (first 6 months of production) of the 100 oil wells is listed in Table 1.

| MATERIALS AND METHODS
Two main problems must be solved in this study: (1) the optimization of the representative reservoir parameters, and (2) the optimization of the fitting accuracy of the DNN model. The specific steps are illustrated in Figure 2. The solution is further explained through a case analysis of the study area.
Step 1: Parameter selection. Representative reservoir parameters were selected using data mining methods, such as correlation analysis and principal component analysis (PCA).
Step 2: Data preprocessing. The data are preprocessed to improve computational efficiency.
Step 3: DNN hyperparameter setting. The hyperparameters of the model are optimized, and the best parameters are selected to train the model.
Step 4: Oil well production forecast and evaluation of key controlling elements. The initial production of actual production wells is predicted to verify the reliability of the DNN. The DNN is combined with the XGBoost algorithm to clarify the main controlling factors.

| Parameter optimization and data processing
Data from a total of 100 production wells were collected for this study, including the fluid properties and geological, engineering, and production performance parameters ( Table 2). The parameters were porosity, permeability, pore throat radius at 35% mercury saturation (R35), 29 flow zone indicator (FZI), 30 reservoir quality index (RQI), 30 median grain diameter (Md), shale content (Vsh), compositional maturity, crude oil viscosity, well completion method, and specific oil recovery index. The geological parameters were obtained from logging interpretation results and core tests as thickness-weighted averages over the perforated sections. The crude oil viscosity was obtained from laboratory analysis data, the engineering parameter was the completion method, and the dynamic parameters were obtained from the initial stage of production. The specific oil recovery index was taken over the stable period of approximately 6 months.

| Production impact parameter
Reservoir microscopic characteristic parameters
The microscopic characteristics of a reservoir directly affect its storage and seepage capacities, which are of great significance in reservoir management. 31 Currently, there is no unified definition of the microscopic characteristics of reservoirs. In this study, considering the basic characteristics of unconsolidated sandstone, the microscopic characteristics of the reservoir are defined as the comprehensive attributes of the heterogeneity of the reservoir rock structure and composition and of the pore-throat structure. Based on this definition, representative reservoir micro-characteristic parameters were selected.
1. Pore structure characteristics
(1) Porosity and permeability
Porosity plays a significant role in the reservoir evaluation system 32,33 by representing the storage capacity of the reservoir, and it can be related to the logging curves. In the porosity formulas, ϕ t is the total porosity (decimal fraction), ϕ e is the effective porosity (decimal fraction), ρ is the logging density value (g/cm³), ρ ma is the pure sandstone density value (2.65 g/cm³), ρ fluid is the formation water density value (1.0 g/cm³), V cl is the clay content (decimal fraction), and V cbw is the bound water content of the clay (decimal fraction). Permeability has always been difficult to obtain from logging interpretation; a previous study 1 established eight lithofacies-constrained standards in the P oilfield to interpret permeability ( Table 3). The log permeability correlates well with the core permeability, with a relative standard deviation of 1.76%.
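The porosity equations themselves did not survive in this text. As a minimal sketch, the standard density-porosity relation is consistent with the symbols defined above (log density, matrix density of 2.65 g/cm³, fluid density of 1.0 g/cm³); that the study used exactly this form is an assumption. The function name `density_porosity` is hypothetical, and the effective-porosity correction with V cl and V cbw is deliberately not reconstructed.

```python
def density_porosity(rho_log, rho_ma=2.65, rho_fluid=1.0):
    """Total porosity (decimal fraction) from a bulk density log value (g/cm3).

    Standard density-porosity relation (assumed, not confirmed by the source):
        phi_t = (rho_ma - rho_log) / (rho_ma - rho_fluid)
    """
    if rho_ma <= rho_fluid:
        raise ValueError("matrix density must exceed fluid density")
    phi = (rho_ma - rho_log) / (rho_ma - rho_fluid)
    # Clip to the physically meaningful range [0, 1].
    return min(max(phi, 0.0), 1.0)


# Illustrative log reading of 2.20 g/cm3 (hypothetical value).
phi_t = density_porosity(2.20)
```

With the defaults, a log density of 2.20 g/cm³ gives a total porosity close to the 27% field average quoted in the geological background.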
(2) R35
Coring is difficult in offshore fields, and pore-throat parameters cannot be obtained from mercury injection tests in all wells. Geophysicists have therefore introduced empirical formulas that relate the pore-throat radius to porosity and permeability (which are readily available and inexpensive to measure). 29,34-36 The R35 so defined can be used to characterize the effective pore-throat radius of the reservoir, but it must be calibrated to the reservoir characteristics. Porosity, permeability, and capillary pressure experiments were carried out on samples from the study area, and Pittman's method 36 was applied to fit the relationship among pore-throat radius, porosity, and permeability, yielding the pore-throat radii corresponding to mercury saturations of 10%-75% ( Table 4). The correlation at 35% is the best, and R35 correlates well with other mercury injection characteristic parameters ( Figure 3A) and represents the mainstream pore-throat radius better than R50 ( Figure 3B). Therefore, R35 was selected as a comprehensive parameter to represent the pore-throat structure of the reservoir.

TABLE 2 Initial production capacity data of oil wells
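The empirical relations mentioned above share a log-linear form in permeability and porosity. The sketch below uses that form with the commonly quoted Winland coefficients as defaults; these are an assumption for illustration only and are not the coefficients calibrated for the P oilfield in Table 4. The function name `r35_winland_type` is hypothetical.

```python
import math


def r35_winland_type(k_md, phi_pct, a=0.732, b=0.588, c=0.864):
    """Pore-throat radius at 35% mercury saturation (micrometres).

    Log-linear form: log10(R35) = a + b*log10(k) - c*log10(phi),
    with k in mD and phi in percent. The default a, b, c are the commonly
    quoted Winland values (assumed); a field study would refit them.
    """
    if k_md <= 0 or phi_pct <= 0:
        raise ValueError("permeability and porosity must be positive")
    return 10 ** (a + b * math.log10(k_md) - c * math.log10(phi_pct))


# Field-average values from the geological background: 1321 mD, 27%.
r35_avg = r35_winland_type(1321.0, 27.0)
```

Passing refitted `a`, `b`, and `c` reproduces the calibration step described in the text without changing the function.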
(3) FZI and RQI
Amaefule et al. 30 first introduced the FZI and RQI concepts to characterize the pore structure of clastic rocks and divide the flow units. Subsequently, these parameters were widely used in reservoir research. 37 The FZI and RQI values can be used to differentiate pore structure types with different seepage characteristics. 38-42 In their definitions, k is the permeability (mD), ϕ e is the effective porosity (%), and ϕ z is the ratio of pore volume to grain volume.
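The defining equations are missing from this text. The sketch below uses the widely cited definitions attributed to Amaefule et al., RQI = 0.0314·sqrt(k/ϕe), ϕz = ϕe/(1 − ϕe), and FZI = RQI/ϕz, and assumes that porosity is supplied as a decimal fraction (a value in percent would first be divided by 100). The function names are hypothetical.

```python
import math


def reservoir_quality_index(k_md, phi_e):
    """RQI in micrometres; k in mD, phi_e as a decimal fraction (assumed)."""
    return 0.0314 * math.sqrt(k_md / phi_e)


def normalized_porosity(phi_e):
    """phi_z: ratio of pore volume to grain volume."""
    return phi_e / (1.0 - phi_e)


def flow_zone_indicator(k_md, phi_e):
    """FZI = RQI / phi_z."""
    return reservoir_quality_index(k_md, phi_e) / normalized_porosity(phi_e)


# Field-average values from the geological background: 1321 mD, 27%.
rqi_avg = reservoir_quality_index(1321.0, 0.27)
fzi_avg = flow_zone_indicator(1321.0, 0.27)
```

Because RQI and FZI are both derived from porosity and permeability, they are expected to be correlated with R35, which is why the later correlation analysis prunes this group.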

Rock structure and composition parameters
(1) Shale content and the median grain diameter
The median grain diameter (Md) corresponds to the point where the cumulative particle content is 50% on the cumulative probability curve drawn from particle size sieve analysis data and is generally expressed in millimeters (or as the φ value). The coarser the sediment, the stronger the hydrodynamic conditions under which it was deposited. Because unconsolidated sandstone is loosely cemented, shale particles easily detach and migrate during development, blocking the pore throats and adversely affecting development. Therefore, shale content (Vsh) was selected as an evaluation parameter. Generally, a linear correlation exists between the logarithms of Md and Vsh. In this study, laser particle size analysis data were regressed against the shale content from logging to establish a regression formula for the median grain diameter ( Figure 4A), in which Md is the median grain diameter (mm) obtained from the laser particle size analysis experiment and Vsh is the argillaceous content (%) obtained by logging interpretation.
(2) Mineral composition maturity
Mineral composition maturity refers to the relative content of the most stable components in clastic rocks, which marks the maturity of the rock composition: composition maturity = (quartz)/(feldspar + rock fragments). The greater the compositional maturity index, the purer the lithology of the reservoir and the better its general physical properties. The composition maturity index was calculated from cast thin-section data. However, cored wells are few and the coring intervals are widely spaced, so thin-section data are sparse. To obtain a vertically continuous composition maturity index, logging data are required. Logging curves that are highly sensitive to reservoir lithology include gamma ray (GR), spontaneous potential (SP), and photoelectric index (Pe). Lai et al. 43 defined the lithology index LI = GR/Pe, the ratio of GR (in American Petroleum Institute units, API) to Pe (in barns per electron, b/e), which is a suitable parameter for characterizing the maturity of the reservoir composition ( Figure 4B). Generally, the smaller the LI value, the higher the content of quartz and feldspar and the lower the content of rock fragments, which means a purer lithology, a higher composition maturity index, and better physical properties of the reservoir. Therefore, the LI was used to characterize the composition maturity index of the reservoir.

Engineering parameters
Completion methods also strongly influence the development capability of oil wells. Commonly used completion methods in the study area are open-hole screens, fracturing and packing, high-speed water and frac pack, and casing perforation. Sand is produced during the production process. Fracturing and packing are commonly used to stimulate and control sand. The different types of completion methods have specific impacts on the output of the oil well at the initial stage of production. In this study, a completion method was selected as an engineering parameter (Table 2).

Fluid property
In the study area, the oil interval is long, the reservoir is highly heterogeneous, and the fluid properties vary greatly in the vertical direction (Table 5). Fluid properties have a strong impact on oil-well production. In this study, DNN models were established with and without crude oil viscosity ( Figure 5). The model without crude oil viscosity had low prediction accuracy. Therefore, the crude oil viscosity was selected to characterize the properties of the reservoir fluid. The viscosity data were obtained from laboratory assays of fluid samples.

| Production parameter
The fluid production index refers to the daily liquid production per unit production pressure difference. The calculation formula is as follows 44 :

$$J = \frac{Q}{P_e - P_{wf}}$$

where $J$ is the fluid production index (t/(day·MPa)), $Q$ is the daily fluid production (t/day), $P_e$ is the reservoir static pressure (MPa), and $P_{wf}$ is the reservoir flow pressure (MPa). The specific oil recovery index refers to the oil recovery index per meter of oil-layer thickness under a unit production pressure difference. The calculation formula is as follows:

$$J_h = \frac{J}{h}$$

where $J_h$ is the specific oil recovery index (t/(day·MPa·m)), and $h$ is the oil layer thickness (m).
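The two definitions above amount to dividing daily production by the pressure drawdown and then by the layer thickness. A minimal sketch with made-up well values (all numbers and function names are hypothetical):

```python
def fluid_production_index(q_t_per_day, p_static_mpa, p_flow_mpa):
    """J = Q / (static pressure - flowing pressure), in t/(day*MPa)."""
    drawdown = p_static_mpa - p_flow_mpa
    if drawdown <= 0:
        raise ValueError("static pressure must exceed flowing pressure")
    return q_t_per_day / drawdown


def specific_index(j, thickness_m):
    """J_h = J / h, the index per metre of oil-layer thickness."""
    if thickness_m <= 0:
        raise ValueError("thickness must be positive")
    return j / thickness_m


# Hypothetical well: 120 t/day, 12 MPa static, 9 MPa flowing, 80 m of pay.
j = fluid_production_index(120.0, 12.0, 9.0)
j_h = specific_index(j, 80.0)
```

Normalizing by thickness is what makes wells with very different perforated intervals (63-151 m in this field) comparable as training targets.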

| Data preprocessing
1. Data normalization
Data preprocessing was used to normalize the input data. The parameters come from multiple sources and have different dimensions and formats, so using the raw data directly would cause problems in the calculation. Therefore, the data must be normalized through preprocessing. In data preprocessing, all features are mapped to the interval between 0 and 1, thereby improving computational efficiency. The formula used in this study is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original value, $x'$ is the normalized value, $x_{\max}$ is the maximum value, and $x_{\min}$ is the minimum value of the feature.
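As a minimal sketch of this min-max scaling, the function below maps one feature column to [0, 1]. The permeability values are illustrative, and the handling of a constant column (returning zeros) is an assumption not discussed in the text.

```python
def min_max_normalize(values):
    """Map a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to scale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative permeability column in mD (hypothetical values).
permeability = [250.0, 1321.0, 4000.0]
scaled = min_max_normalize(permeability)
```

In practice the minimum and maximum would be taken from the training wells only and reused for the test wells, so that no information leaks from the test set.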

One-Hot Encoding
The engineering parameters are not continuous numerical variables but discrete categories, whereas the artificial neural network (ANN) training process is based on numerical values. Therefore, discrete categorical variables must be converted into numerical values to train neural network models. One-hot encoding solves this problem by mapping each discrete category to a binary vector in which exactly one element is 1. 45 One-hot encoding treats each category of a feature as a new feature. Taking the completion method as an example, after encoding, the completion method data were converted into the following format (Table 6):
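A minimal sketch of one-hot encoding for the completion method follows. The category labels are shortened, illustrative versions of the completion types named earlier; the column order (alphabetical) is an arbitrary choice and need not match Table 6.

```python
def one_hot(categories):
    """Return (levels, rows): one binary column per distinct category."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for category in categories:
        row = [0] * len(levels)
        row[index[category]] = 1  # exactly one element set per sample
        rows.append(row)
    return levels, rows


# Illustrative completion methods for four wells (hypothetical data).
completions = ["open-hole screen", "frac pack", "casing perforation", "frac pack"]
levels, encoded = one_hot(completions)
```

Each well then contributes as many binary inputs to the network as there are completion types, instead of a single arbitrary integer code that would imply an ordering.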

| Deep neural network
An ANN is a machine learning technique modeled on the neurons of the human brain. In the neural network, outputs are obtained as weighted sums of the inputs to each neuron, and the model is modified by changing the weights and thresholds of the neurons according to a defined loss function. Figure 6 shows a schematic of two neural networks. No unified criterion distinguishes shallow from deep neural networks, but most scholars consider the number of hidden layers to be the difference between the two types. 45 An activation function enables a neural network to represent nonlinear relationships. Provided that activation functions are used, the more layers a neural network has, the more complex the problems it can solve. The principle of the ANN is to apply the following formula:

$$z_i = \sum_{j} \omega_{ij} x_j + b_i$$

where $\omega_{ij}$ is the weight, $b_i$ is the threshold, $x_j$ is the $j$th component of the input feature vector, and $z_i$ is the predicted result.
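The weighted-sum rule described above can be sketched for a single fully connected layer as follows. The weights, thresholds, and inputs are arbitrary illustrative numbers, and `dense_forward` is a hypothetical helper name.

```python
def dense_forward(x, weights, biases):
    """Compute z_i = sum_j w_ij * x_j + b_i for every neuron i in a layer."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, biases)
    ]


# Two inputs feeding two neurons (illustrative numbers).
x = [0.5, 1.0]
weights = [[1.0, -2.0],
           [0.5, 0.5]]
biases = [0.1, -0.2]
z = dense_forward(x, weights, biases)
```

Stacking such layers without an activation function in between would still yield one overall linear map, which is why the activation function discussed next is essential.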

| Activation functions
Without an activation function, the ANN model is purely a linear model, regardless of its number of hidden layers; it can only solve linear problems. Given the limitation of linear representation, many features of the initial input cannot be represented. Table 7 lists commonly used activation functions and their expressions. The sigmoid 46,47 and tanh 48,49 functions were common activation functions in the early days. Their function curves are shown in Figure 7. When the input of the sigmoid function tends toward positive or negative infinity, the gradient approaches zero; that is, the vanishing gradient phenomenon occurs. As shown in Figure 7, the shapes of the tanh and sigmoid functions are similar. The output of the sigmoid function lies between 0 and 1, whereas the output of tanh lies between −1 and 1 and is centered on zero. Therefore, tanh is more efficient than the sigmoid function. However, the disadvantage of these two activation functions is that the amount of input data significantly affects the learning speed of the neural network. To solve this problem, a new activation function, the rectified linear unit (ReLU), was proposed. 50,51 ReLU is the activation function selected in this study. When the input of the ReLU function is positive, the output is linear in the input, the derivative is always 1, and the gradient does not vanish, thereby addressing the problems of the sigmoid and tanh functions. Because its calculation is linear, its operation speed is also significantly higher than that of the exponential operations of the sigmoid and tanh functions.
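The saturation behaviour described above can be checked numerically. The sketch below implements the activation functions named in this section and their derivatives in plain Python; the evaluation point of 10 is an arbitrary illustrative input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative is 1 for positive inputs and 0 otherwise.
    return 1.0 if x > 0 else 0.0


def softplus(x):
    return math.log1p(math.exp(x))


# For a large input the sigmoid and tanh gradients are nearly zero,
# whereas the ReLU gradient stays at 1.
large_input = 10.0
grads = (sigmoid_grad(large_input), tanh_grad(large_input), relu_grad(large_input))
```

During backpropagation these derivatives are multiplied layer by layer, so near-zero factors from saturated sigmoid or tanh units shrink the update signal reaching the early layers.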

| Adam optimizer
Many optimization algorithms have been developed in the history of artificial intelligence; however, many of them can only be applied to certain types of neural networks. Homik et al. proposed the gradient descent with momentum (GDM) algorithm, 52 which can be applied to various neural network structures. Csáji 53 proposed the root-mean-square propagation (RMSprop) method, which applies an adaptive learning rate to avoid overshooting the minimum. Adam is a learning rate-adaptive optimization algorithm that combines RMSprop and GDM and is currently the most commonly used optimization algorithm. 54 In its iterative formula, η is the initial learning rate, g t is the gradient of the parameters in the tth iteration, and ∆x t is the change in the parameters in the tth iteration. The denominator is the L 2 norm of all gradients in each dimension. 54
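The iterative formula is not recoverable from this text. As a hedged illustration of how Adam combines a momentum term with an RMSprop-style scaling, the sketch below applies the commonly published Adam update to a one-dimensional quadratic. The hyperparameter defaults and the test function are assumptions for illustration and are not the settings used in the study.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function given its gradient, using the Adam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # momentum: running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # RMSprop part: running mean of squares
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# f(x) = (x - 3)^2 has gradient 2*(x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the step is divided by the root of the running squared gradient, its size stays near `lr` regardless of the raw gradient scale, which is the adaptive behaviour the paragraph refers to.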

| Principal component analysis
TABLE 7 Commonly used activation functions and their expressions (rows recoverable here: Linear; Softplus, $f(x) = \log(1 + e^{x})$)

Principal component analysis is a data analysis method used for dimensionality reduction, 55,56 which can convert multiple variables into a small number of principal components. Each principal component is a linear combination of the original variables, and the principal components are mutually uncorrelated. The advantage of this method is that the data dimension can be reduced, and the data can also be classified. This study analyzed the meaning of each type of parameter and classified the parameters using PCA. The calculation steps are as follows:
Step 1. Normalize the raw data.
Step 2. Build the covariance matrix from the normalized data matrix.
Step 3. Calculate the eigenvalues according to the covariance matrix and obtain the principal component and cumulative variance contribution rates.
Step 4. Build the factor loading matrix and calculate the principal components.
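The four steps above can be sketched in plain Python for the leading principal component. This is a minimal illustration under stated assumptions: z-score standardization is used for Step 1, power iteration stands in for a full eigendecomposition (it recovers only the first component), and the three-column data set is made up.

```python
import math


def standardize(rows):
    """Step 1: z-score each column."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Step 2: covariance matrix of the standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_component(cov, iters=500):
    """Step 3: largest eigenvalue and its eigenvector by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v


# Hypothetical wells: porosity (fraction), permeability (mD), viscosity (mPa*s).
data = [[0.20, 300.0, 12.0], [0.25, 900.0, 9.0], [0.28, 1500.0, 15.0],
        [0.31, 2600.0, 8.0], [0.33, 3900.0, 11.0]]
cov = covariance(standardize(data))
eigenvalue, eigenvector = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
contribution = eigenvalue / total_variance                   # variance contribution rate
loadings = [math.sqrt(eigenvalue) * x for x in eigenvector]  # Step 4: factor loadings
```

In this toy data set porosity and permeability rise together, so they load on the first component with the same sign, mirroring the pore-structure component described later in the paper.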

| XGBoost
The XGBoost algorithm 57 is an improved ensemble learning algorithm based on gradient boosting decision trees (GBDTs). It adopts the gradient boosting idea of the GBDT and the regression tree generation algorithm and improves on them in many aspects (Figure 8). The advantages of XGBoost are as follows:
1. The XGBoost algorithm performs a second-order Taylor expansion of the loss function and uses both the first- and second-order derivatives to improve prediction accuracy.
2. The XGBoost algorithm also supports linear classifiers.
3. The XGBoost algorithm stores the data in a block structure so that the gain calculation of each feature can be performed in multiple threads, improving the efficiency and accuracy of the operation.
4. The XGBoost algorithm can obtain the importance ranking of feature variables through statistics. The importance is generally expressed as a score. Within each decision tree, the importance of an attribute is calculated from the amount by which each split on that attribute improves the performance measure. Then, the feature importance is averaged over all decision trees in the model. This provides a new approach for dimensionality reduction and main controlling factor analysis.
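To make the role of the first- and second-order derivatives concrete, the sketch below evaluates the split gain and optimal leaf weight as given in the published XGBoost formulation, for a squared-error loss. It is a hand-rolled illustration and does not call the XGBoost library; the regularization values `lam` and `gamma` and the toy data are assumptions.

```python
def leaf_weight(g_sum, h_sum, lam=1.0):
    """Optimal leaf value from summed gradients G and Hessians H: -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of a split: half the increase in the structure score, minus gamma."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Squared-error loss 0.5*(y - pred)^2 gives gradient g = pred - y and Hessian h = 1.
y = [1.0, 1.2, 3.0, 3.4]
pred = [2.0, 2.0, 2.0, 2.0]
g = [p - t for p, t in zip(pred, y)]
h = [1.0] * len(y)

# Candidate split: first two samples go left, last two go right.
gain = split_gain(sum(g[:2]), sum(h[:2]), sum(g[2:]), sum(h[2:]))
left_value = leaf_weight(sum(g[:2]), sum(h[:2]))
```

Summing such gains over every split made on a given feature, and averaging across trees, yields a gain-based importance score of the kind described in item 4.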

| Parameter optimization
According to the reservoir characteristics, many parameters affecting productivity were selected from the aspects of the reservoir's microscopic characteristics, fluid properties, and engineering parameters. However, the calculation methods for some of these parameters are quite similar; for example, R35, RQI, and FZI are obtained indirectly through porosity and permeability owing to the lack of cores. Therefore, through correlation and principal component analyses, representative parameters were simplified and selected for the quantitative evaluation of productivity factors.

| Principal component analysis
Generally, principal components are retained until their cumulative contribution rate exceeds 85%. In this study, through PCA, the large number of selected characteristic parameters was divided into mutually independent types (Figure 9). The cumulative contribution rate of the first three principal components in the study area reached 85.97%, which provides a good summary of the original variables. The first principal component had high positive loadings on porosity, permeability, R35, RQI, and FZI, where R35 represents the effective pore-throat radius of the reservoir, and FZI and RQI characterize the seepage capacity of the reservoir. Thus, the first principal component is called the pore-structure component. The second principal component had high positive loadings on the median grain size and lithological structure parameters and a large loading on the argillaceous content; it is called the rock structure and composition component. The third principal component had a large positive loading on oil viscosity and is called the fluid property component. The PCA thus also verified the rationality of the selected parameters.

| Correlation analysis
The parameters were divided into three categories through PCA; however, there were strong correlations between the various parameters.
1. Reservoir pore structure characteristic parameters
Figure 9 shows that porosity, permeability, R35, and RQI are highly correlated with one another, whereas FZI is only weakly correlated with these parameters. Therefore, FZI and R35 were selected to characterize the pore structure.

Rock texture and fabric parameters
The correlation between shale content and median particle size was strong, and the lithology index was weakly correlated with the other parameters. Therefore, the median grain size and lithology index were selected to characterize the rock structure and composition ( Figure 10).

Fluid properties
The viscosity of crude oil was selected to characterize the properties of the reservoir fluid ( Figure 10).

Engineering parameters
The well completion method was selected as the engineering parameter ( Figure 10).
Figure 11 shows the characterization results of the optimized evaluation parameters in well P-9. The matching relationship between the core tests and the logging curves was established, which intuitively characterizes the distribution of the various parameters. The parameters of each well were then calculated as sandstone-thickness-weighted averages.

| DNN hyperparameter optimization
Before the final establishment of the DNN model, hyperparameters, such as the optimizer, learning rate, dropout of the DNN, and number of hidden layers were optimized using the specific oil recovery index of 100 oil production wells. 58 The hyperparameters are listed in Table 8.
The optimizer determines how the parameters are updated during the neural network backpropagation process, which has a significant impact on the model's predictive performance. The optimization was carried out using commonly used optimizers, such as the stochastic gradient descent (SGD), 59 Adam optimizer, 54 and root mean squared propagation (RMSprop) 60 (Figure 12). The number of neurons and layers determines the scale of the network ( Figure 13). Different scales of deep neural networks have different learning capabilities, which has a decisive impact on the prediction accuracy of the DNN. In the parameter table (i.e., Table 8), for the number of neurons row, each bracket represents the network size, and the elements in the brackets represent the number of neurons in the DNN. The learning rate indicates the size of the parameter update during the backpropagation process. If the learning rate is extremely large or small, the network will experience difficulty in updating the parameters and converging ( Figure 14). Consequently, the global minimum point cannot be determined, and the network cannot achieve an improved prediction performance. Dropout 61 was used to solve the issue of neural network overfitting, which is a common problem in machine learning and should be considered ( Figure 15).
The mean square error (MSE), a common regression evaluation index, was used to measure the prediction performance for the specific oil recovery index of deep neural networks of different depths. The MSE and RMSE are defined as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted specific oil recovery indices, respectively, and $n$ is the number of samples.
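A minimal sketch of the two error measures, evaluated on three made-up pairs of actual and predicted values (the numbers are illustrative and are not results from the study):

```python
import math


def mse(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Illustrative specific oil recovery indices (hypothetical values).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
err_mse = mse(y_true, y_pred)
err_rmse = rmse(y_true, y_pred)
```

Because the RMSE carries the units of the target, the reported value of about 0.14 can be read directly against the range of the specific oil recovery index.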
Based on the above analysis, the optimized hyperparameters are listed in Table 9.
FIGURE 10 Plot of the parameter correlation coefficient matrix

| Prediction results of the DNN model
In this study, data from 100 wells were collected at the initial stage of production (6 months), and the samples were divided into 10 parts for cross-validation. One part was selected as the test set, and the remaining 9 were used as the training set. The root mean square error (RMSE) is 0.14, indicating that the predicted values are consistent with the actual values ( Figure 16). The deep neural network model has significant advantages over traditional methods (such as numerical simulation and logging interpretation). First, compared with traditional linear fitting methods, a neural network can handle complex nonlinear relationships in the data, thereby making the prediction results highly accurate. Second, compared with the numerical simulation method, a reservoir model need not be constructed. Rather, only data mining is required, which is easy to implement. Third, the model is easy to update; the data of old wells can be updated, and those of new wells can be added, indicating good adaptability and easy implementation.

FIGURE 11 Characterization results of evaluation parameters in well P-9
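The 10-part split described above can be sketched as follows. Rotating the test role through all ten parts is the usual k-fold convention and is an assumption here, since the text only states that one part was held out; the shuffling seed is also arbitrary.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 100 wells split into 10 folds: 90 training and 10 test wells per fold.
splits = list(k_fold_indices(100, k=10))
```

Averaging the RMSE over all ten folds gives a less split-dependent estimate than a single hold-out, which matters with only 100 wells.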

| Analysis of main control factors
First, the parameters of the XGBoost algorithm were optimized to determine the optimal parameter combination. The optimal number of iterations and the tree depth of the XGBoost algorithm were mainly tuned using the grid search (GridSearchCV) method. The other parameters were adjusted in sequence, step by step, within the set adjustment ranges. The final tuned parameters are listed in Table 10.
The feature importance ranking was obtained using XGBoost, and the importance scores of the feature variables followed the order R35, crude oil viscosity, Md, LI, well completion method, and FZI. Figure 10 shows that the pore structure characteristics, crude oil viscosity, and rock structure parameters had the greatest impact on production. The pore structure parameters directly control the seepage capacity and storage space of the reservoir. 62,63 Crude oil viscosity has a significant influence on oil well productivity. The rock structure parameters reflect changes in the sedimentary facies of the reservoir and differences in rock and mineral types. Because the cementation is loose, shale particles migrate easily during development and block the pore throats; therefore, the shale content and median particle size also have a significant influence on oil well productivity. The importance of the evaluation parameters is consistent with the previous geological understanding (Figure 17).

Unconsolidated sandstone heavy oil reservoirs are distributed in major oil-producing regions worldwide (Table 11), mainly in Tertiary strata, and the sedimentary type is mainly a fluvial-delta sedimentary system. Such reservoirs are characterized by a shallow burial depth (<1800 m), high porosity and permeability, loose cementation, and large variations in crude oil properties. The research methods and optimal evaluation parameters adopted in this study therefore also provide a reference for other unconsolidated sandstone reservoirs.

| CONCLUSIONS
Based on the definition of the microscopic characteristics of the subject reservoir, this study proposes a deep neural network production prediction method and combines it with the XGBoost algorithm to identify the main controlling factors affecting oil well production. The data set includes pore structure characteristics, rock structure and fabric characteristics, fluid properties, and engineering parameters. The results of this study are summarized as follows.
1. The microscopic characteristics of the unconsolidated sandstone reservoir, that is, the comprehensive attribute parameters, quantitatively describe the heterogeneity of the reservoir rock structure and composition and the pore-throat structure parameters. According to this definition, the reservoir micro-characteristic parameters were selected from the aspects of rock structure and pore-throat micro-structure characteristics and combined with the reservoir and engineering parameters to obtain good prediction results.
2. Through sensitivity analysis of hyperparameters, a deep neural network model with improved predictive performance was established. The hyperparameter optimization results show that dropout has little impact on this model. The absolute error is smallest with two hidden layers; the model uses the ReLU activation function, a learning rate of 0.01, and the Adam optimizer. Hyperparameters have a significant impact on the accuracy of neural networks; therefore, deep neural network models should be tuned before they are used to produce predictions.
3. Identifying independent variables that have a significant influence on the initial production of oil wells based on the XGBoost algorithm and calculating the feature importance ranking of variables has practical benefits. The results show that the microscopic pore structure, crude oil viscosity, and median particle size are key parameters, further indicating that the microscopic characteristics of the reservoir have a significant impact on the production of oil wells. The viscosity of crude oil in heavy oil reservoirs varies significantly, and both static and fluid property evaluation parameters must be fully considered to reflect actual reservoir performance.