Elastic impedance based facies classification using support vector machine and deep learning

Machine learning methods including support-vector-machine and deep learning are applied to facies classification problems using elastic impedances acquired from a Paleocene oil discovery in the UK Central North Sea. Both of the supervised learning approaches showed similar accuracy when predicting facies after the optimization of hyperparameters derived from well data. However, the results obtained by deep learning provided better correlation with available wells and more precise decision boundaries in cross-plot space when compared to the support-vector-machine approach. Results from the support-vector-machine and deep learning classifications are compared against a simplified linear projection based classification and a Bayes-based approach. Differences between the various facies classification methods are attributable not only to their methodological differences but also to human interactions in the selection of machine learning parameters. Despite the observed differences, machine learning applications, such as deep learning, have the potential to become standardized in the industry for the interpretation of amplitude versus offset cross-plot problems, thus providing an automated facies classification approach.

Machine learning applications are already becoming increasingly widespread in a variety of data-driven industries ranging from financial services to life sciences. The data-rich environment of the oil and gas industry is also becoming an advocate of this emerging technology. De-risking of exploration and development opportunities, using quantitative interpretation driven by amplitude versus offset (AVO) workflows, is regarded as a crucial step in developing hydrocarbon resources. The quantitative nature of this work often involves the classification of facies using multiple elastic impedances and is therefore ideally suited to machine learning algorithms.
Machine learning algorithms, such as the support vector machine (SVM; Vapnik and Learner 1963; Vapnik 1995) method, have been used extensively in fields such as pattern recognition. SVM has two main problem-solving capabilities: classification problems (Vapnik 1995) and regression problems (Smola and Schölkopf 2004). In this regard, SVM basically consists of three main elements: linear neural networks (NN), kernel-tricks (Schölkopf et al. 1999) and regularizations. The linear NN part consists of two layers (input and output) and broadly imitates the neuron to synapse model of a biological brain. The NN works with the maximal-margin approach that attempts to maximize the margin (most often the Euclidean distance) between data groups and their decision boundaries in the feature domain. With the kernel-tricks, non-linear decisions can be handled in addition to linear ones. Finally, the regularization helps avoid overfitting problems, which when combined with the NN and kernel-tricks provides the general procedure for most SVM applications. SVM's simplicity of implementation is the main reason why it is easily adapted to many applications.
Deep learning (DL; Hinton, Osindero and Teh 2006; Hinton and Salakhutdinov 2006), on the other hand, provides a more recent advancement over SVM methods. In the broad definition, DL consists of a multi-layer perceptron, input layer(s), hidden layer(s) and output layer(s), which collectively solve problems (e.g. classification) by automated feature extraction. The perceptron is a self-training algorithm for classifying data (Rosenblatt 1958). The simple (single) perceptron solves linear problems while the multi-layer perceptron has the ability to solve non-linear problems. DL is considered here as a deep neural network rather than the already established systems such as convolutional neural networks (CNN; used mainly for image-recognition problems) or recurrent neural networks (used mainly for time-series problems such as speech recognition).
Past studies, specifically focusing on AVO, have predominantly used only SVM methods. Kuzma (2003) and Li, You and Liu (2015) presented an AVO regression problem using synthetic wells, whereas Li and Castagna (2004) introduced AVO cross-plot classification using SVM but again only using a synthetic dataset. In the following case study, we show the application of SVM and DL to a field dataset, obtained from a discovery in the UK Central North Sea, for the purpose of lithological classification using elastic impedance cross-plot products. Furthermore, we compare the results from SVM and DL with a more simplified linear projection and a Bayes-based classification approach (Zabihi Naeini and Exley 2017).

FIELD DATA
The case study shown in this paper centres on a Paleocene discovery in Block 21/6b of the UK Central North Sea, located at the north-western edge of the Central Graben, just south of the Buchan Field (Figure 1(a)). The discovery was initially identified using conventional simultaneous pre-stack inversion, followed by an exploration well that successfully encountered a 26-m oil column in good quality sands. The reservoir sands lie within the proximal part of the prolific northwest to southeast, late Paleocene, Forties and Cromarty depositional trends, which include the giant Forties Field. Post discovery, the seismic data were re-inverted but on this occasion using a joint impedance-facies inversion (Kemper and Gunning 2014) to provide updated seismically derived elastic properties calibrated to the discovery well (Well #1 in Figure 1(b)).
The initial discovery well (Well #1) was used to provide the input rock physics data in order to train the support vector machine (SVM) and deep learning (DL) methods detailed in this paper. After training, the equivalent elastic data (acoustic impedance, AI, and V_P/V_S) output from the joint impedance-facies inversion were characterized in terms of potential facies using both SVM and DL. A further test was provided by the drilling of an appraisal well (Well #2 in Figure 1(b)), which along with an older well drilled outside of the discovery (Well #3 in Figure 1(b)) provided two 'blind tests' for the SVM and DL facies classification.

Structural risk minimization
Support vector machine (SVM) learning is based on two fundamental theories termed statistical learning theory and structural risk minimization (SRM; Vapnik 1995). Although statistical learning theory aids in finding the maximal-margin that corresponds to the solution of an optimization problem, SRM provides us with a trade-off between hypothesis and model complexity, which is called the Vapnik-Chervonenkis dimension (VC-dimension; Vapnik and Chervonenkis 1974). Consider a training dataset which contains n vectorized samples x_i and their associated labels y_i. Such a dataset can be written as:
We would like to find a learning function that classifies the unlabelled or unknown dataset. The number of feature dimensions in the input domain is m. In this case, the empirical risk R_emp(f), which we would like to optimize in order to obtain the learning function, can be described as:
The expected risk R_exp(f) is bounded by a function of both R_emp(f) and the number of data samples. Following Vapnik (1995), this can be written as:
where h is the VC-dimension of the model used to solve the problem, and (1 − δ) is the probability with which equation (3) holds. The second term of the right-hand side of equation (3) is called the VC-confidence. From equation (3), one can expect R_exp(f) and R_emp(f) to converge as the VC-confidence gets smaller, that is, as N increases and h decreases.
In other words, the best model occurs when the sum of R_emp(f) and the VC-confidence is minimized and should therefore be automatically chosen by SRM.
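The dataset, the empirical risk and the risk bound discussed in this section (equations (1) to (3)) can be sketched in a standard form following Vapnik (1995). This is a reconstruction under assumptions rather than the authors' exact notation: the binary label set {−1, +1}, the generic loss function L and the use of N for the number of samples (n where the dataset is introduced) are all assumed.

```latex
\begin{align}
D &= \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\},
  \qquad \mathbf{x}_i \in \mathbb{R}^{m},\; y_i \in \{-1, +1\}, \tag{1}\\
R_{\mathrm{emp}}(f) &= \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(\mathbf{x}_i)\bigr), \tag{2}\\
R_{\mathrm{exp}}(f) &\le R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\delta}{4}}{N}}. \tag{3}
\end{align}
```

In this form the square-root term is the VC-confidence; it shrinks as N grows and as h falls, which is the behaviour described in the text.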

Optimization problem
Support-vector-machine (SVM) theory, with respect to the optimal hyperplanes that classify the dataset non-linearly in the feature dimension via the so-called kernel-trick, has been presented by Li and Castagna (2004) and Li et al. (2015).
Let us here directly start with the Lagrangian dual-problem regarding the maximal-margin of the classifiers as shown in the following:
with the constraints:
where α_i are the Lagrange multipliers (weights), K(x_i, x_j) is the kernel function that represents a dot (inner) product of feature vectors and c is the cost (penalization) parameter of the hinge-loss function that controls how many misclassifications we can allow during the optimization run. c originally appeared in the primal problem (not shown here) of the dual-problem equation (4). Together with the Kuhn-Tucker conditions, the learning (identification) function can be obtained by solving equations (4) and (5) as follows:
where x_i are the support vectors, for which α_i ≠ 0, and b is the bias parameter that shifts the classifiers from the origin of the hyperplane solution. In Figure 2, a sketch of the neural network (NN) architecture for SVM is depicted. Since our actual problem is a multi-labelled classification, rather than a binary one, we used the one-against-one classification strategy.
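For reference, the soft-margin dual problem, its constraints and the resulting decision function (equations (4) to (6)) are usually written as below. The block is a sketch of that textbook form using the symbols defined in the text (α_i, K, c, b); the summation limit n and the sign function are assumptions about notation.

```latex
\begin{align}
\max_{\boldsymbol{\alpha}} \quad
  & \sum_{i=1}^{n} \alpha_i
    - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      \alpha_i \alpha_j\, y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j), \tag{4}\\
\text{subject to} \quad
  & 0 \le \alpha_i \le c, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0, \tag{5}\\
f(\mathbf{x}) &= \operatorname{sgn}\!\left(
    \sum_{i=1}^{n} \alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b \right). \tag{6}
\end{align}
```

Only samples with non-zero α_i contribute to equation (6), which is why they are called support vectors. With four facies, a one-against-one strategy trains 4 × 3 / 2 = 6 such binary classifiers and assigns each sample to the facies that wins the most pairwise votes.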

Kernel and hyperparameters
Some parameters and functions must be chosen in advance in order to train the support vector machine (SVM).
The key parameters are those associated with the kernel function and the hyperparameters. For the kernel function, following Schölkopf et al. (1997) who concluded that the radial basis function (RBF or Gaussian) generally performed better than other kernels, RBF is also selected in this study. The RBF can be expressed as:
where σ is the kernel width (variance) that determines how far the influence of a training sample reaches. Intuitively speaking, we get smoother (linear) classifiers when smaller values of σ are used and vice versa. σ and the c parameter in equation (5) are often referred to as the hyperparameters of SVM.
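The RBF kernel is commonly written in one of two equivalent ways: with a width σ, as exp(−‖x_i − x_j‖² / (2σ²)), or, as in LIBSVM, with a parameter γ, as exp(−γ‖x_i − x_j‖²), where γ = 1/(2σ²). Which convention equation (7) follows cannot be confirmed here, so the sketch below simply implements both; the function names are illustrative assumptions.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf_kernel_sigma(x, y, sigma):
    """RBF kernel in the width form: exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))


def rbf_kernel_gamma(x, y, gamma):
    """RBF kernel in the LIBSVM-style form: exp(-gamma ||x - y||^2)."""
    return math.exp(-gamma * squared_distance(x, y))


def gamma_from_sigma(sigma):
    """Convert a kernel width sigma to the equivalent gamma."""
    return 1.0 / (2.0 * sigma ** 2)


# Two points in a normalized (AI, Vp/Vs) cross-plot (illustrative values only).
a, b = [0.2, 0.3], [0.4, 0.6]
k_width = rbf_kernel_sigma(a, b, sigma=0.5)
k_gamma = rbf_kernel_gamma(a, b, gamma=gamma_from_sigma(0.5))
```

Under the width form a larger σ spreads each sample's influence further and gives smoother boundaries, whereas under the γ form a smaller γ has that effect, so any statement about "smaller values" depends on which form is meant.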
In order to determine specific values of these hyperparameters, an exhaustive grid search was carried out followed by a twofold cross-validation (Hsu, Chang and Lin 2003) over the training dataset. Since there are only two major hyperparameters (σ and c), we saw no need to apply a random grid search (Bergstra and Bengio 2012) or Bayesian optimization (Mockus, Tiesis and Zilinskas 1978). After finding the optimized hyperparameters, via the exhaustive grid search, they were then applied to the rest of the data.
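The search procedure described above can be sketched as an exhaustive loop over every (σ, c) pair, scoring each pair by k-fold cross-validation. The code below is a minimal illustration, not the LIBSVM workflow used in the study: `evaluate` is a hypothetical user-supplied callback that would train a classifier on the training indices and return its accuracy on the held-out indices, and `toy_score` is only a stand-in so that the example runs.

```python
import itertools


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(param_grid, n_samples, evaluate, k=2):
    """Return (best_params, best_mean_score) over all combinations in param_grid.

    evaluate(train_idx, test_idx, params) must return an accuracy-like score.
    """
    folds = k_fold_indices(n_samples, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, test_idx in enumerate(folds):
            # Train on every fold except the i-th, test on the i-th.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(train_idx, test_idx, params))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(train_idx, test_idx, params):
    """Hypothetical stand-in for 'train an SVM and return test accuracy'."""
    return 1.0 / (1.0 + abs(params["c"] - 10.0) + abs(params["sigma"] - 0.1))


grid = {"c": [1.0, 10.0, 100.0], "sigma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search_cv(grid, n_samples=100, evaluate=toy_score, k=2)
```

In practice the sample indices would normally be shuffled before being split into folds, and the grid is often spaced logarithmically.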

TensorFlow
For the deep learning (DL) approach in this study, we used TensorFlow developed and provided by Google (https://www.tensorflow.org). In TensorFlow, the architecture for DL is expressed as a graph that is basically a much simpler representation of a conventional neural network (NN). Figure 3 shows the graph specifically designed for our problem. The architecture of Figure 3 can be translated into an algebraic equation as:
where q denotes the vector of outputs (the output layer in Figure 3), a denotes the activation function (activator) that introduces non-linearity to the output, p denotes the vector of inputs (the input layer in Figure 3), w denotes the matrix of weights (w_i) whose values can be varied depending on the strength of the synapse (analogous to the neuron connectors of a biological brain) and b denotes the vector of bias (b_i) that acts as a threshold of the neuron's excitation. The number of neurons and the number of hidden layers are part of DL's hyperparameters. Note that, in the nomenclature of TensorFlow, these are all different dimensions such that vectors and matrices are all termed tensors with different ranks. Following Heaton (2008), the number of neurons is set to be roughly two-thirds the size of the input layer plus the size of the output layer, and the number of hidden layers is set to two in order to handle any non-linear decision boundaries.
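With the symbols defined above, a single layer of such a network is commonly written as q = a(wp + b). The sketch below implements that single-layer form in plain Python rather than TensorFlow. Treating equation (8) as this expression, the helper names, and rounding the Heaton (2008) rule of thumb to the nearest integer are all assumptions made for illustration.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(w, p, b, activation):
    """One fully connected layer: q = a(w p + b).

    w is a list of rows (one row per neuron), p the input vector,
    b the bias vector and activation a function acting on a list.
    """
    z = [sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i
         for row, b_i in zip(w, b)]
    return activation(z)


def hidden_layer_size(n_inputs, n_outputs):
    """Rule of thumb: about two-thirds of the input size plus the output size."""
    return round(2.0 * n_inputs / 3.0 + n_outputs)


# Two inputs (AI and Vp/Vs) and four output facies, as in this study.
n_hidden = hidden_layer_size(n_inputs=2, n_outputs=4)

# A tiny layer with two neurons and illustrative weights.
q = dense(w=[[1.0, -1.0], [0.5, 0.5]], p=[2.0, 3.0], b=[0.0, -1.0], activation=relu)
```

Stacking two such hidden layers followed by an output layer gives the kind of network depth described in the text.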

Optimization problem and hyperparameters
Similar to other learning algorithms, deep learning (DL) solves maximization/minimization problems using cost (objective, error or loss) functions. During the training process of the supervised classification in this study, the cost function C, between the expected labels and known labels, is backpropagated (Rumelhart, Hinton and Williams 1986) to the nearest and shallower layer to be updated. Our cost function is therefore a cross entropy between two probability functions; however, this process does not happen without the hidden layers. The updating procedure is commonly carried out by minimizing the cost function using gradient descent (Svetlana and Solov'ev 1997), which can be written as:
with the constraint:
where d is the direction of the gradient-descent solver. Generally speaking, gradient descent (which uses the first derivatives of the objective function) is easily trapped by local minima. An alternative approach, to avoid being trapped by local minima, is based on the calculation of Hessian matrices (the second derivatives of the objective function) that directly indicate a minimum, maximum or saddle point, thereby providing an escape route and more accurate solutions. However, its computational cost is much higher than that of the gradient descent approach alone. Therefore, we used a mini-batch stochastic gradient descent algorithm (Metel 2017) to increase computation speed. The algorithm is a combination of both batch gradient descent and stochastic gradient descent. Batch gradient descent tries to solve the cost function using the whole training dataset, which can lead to local minima for non-convex surfaces of the cost function.
The stochastic approach, on the other hand, provides improvement in computational cost and more chances to escape from local minima due to the method's ability to search out more solutions from multiple directions. In order to solve the cost function in equation (9), the weights and bias in equation (8) are updated as follows:
where η is the learning rate (step size). However, one of the drawbacks of stochastic gradient descent is that the average trend of data redundancy is not measured because the algorithm is always based on one random data point. Therefore, mini-batch stochastic gradient descent combines the advantages from both algorithms, providing both an average trend of data redundancy and the ability to escape from local minima. The mini-batch size can therefore be regarded as one of the key hyperparameters for DL. It is very difficult to optimize all of the hyperparameters since there are many of them (e.g. the number of layers, the optimizer, the activator, the learning rate, the mini-batch size, the regularization and so on) and the trade-off effects among them must also be accommodated, which can be an added complication (Li et al. 2015). This issue explains why this optimization is one of the most pertinent research topics in DL. However, there seems to be consensus that the learning rate is one of the most critical hyperparameters (Bergstra and Bengio 2012).
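In its usual form, the update referred to above moves each parameter against the gradient of the cost, w ← w − η ∂C/∂w and b ← b − η ∂C/∂b, with the gradient averaged over one mini-batch at a time. The sketch below illustrates this on a deliberately simple problem: it fits a straight line with a mean-squared-error cost instead of the cross-entropy network of this study, and the toy data, function names and default values are illustrative assumptions.

```python
import random


def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index lists that together cover all samples once."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


def sgd_fit_line(xs, ys, learning_rate=0.2, batch_size=8, epochs=500, seed=0):
    """Fit y ~ w * x + b by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(len(xs), batch_size, rng):
            # Gradients of the mean squared error, averaged over the batch.
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Parameter update with learning rate (step size) eta.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Noise-free toy data drawn from y = 2x + 1.
xs = [i / 20.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w_fit, b_fit = sgd_fit_line(xs, ys)
```

Setting `batch_size` to 1 gives plain stochastic gradient descent and setting it to the full dataset size gives batch gradient descent, which is why the mini-batch size acts as a hyperparameter between the two extremes.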
The hyperparameters used in the DL workflow were determined as follows. The activators for the hidden layers and the output layer are the rectified linear unit (ReLU, also called the ramp-function) and the softmax, respectively. The ReLU in the hidden layers is known to mitigate the vanishing-gradient problem, whereas the softmax in the output layer provides the probability of each output, making the DL approach probabilistic in application. There is no substantial difference between the sigmoid and the softmax in the output layer if the classification is a binary problem. The iteration number is set as 40,000. The choice of the solver and the depth of the neural networks (NN) are already described above. The remaining two main hyperparameters (the learning rate and the mini-batch size) are optimized by the exhaustive grid search, just as we applied to the support vector machine (SVM), thereby providing a fair comparison. Note that there are many reports about the optimization of the hyperparameters. For instance, while Bergstra and Bengio (2012) suggested that the random grid search is a good choice, Snoek et al. (2015) argued that Bayesian optimization is better. Such comparisons are beyond the scope of this study.
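The activators and the cost named in this section can be illustrated with a few lines of plain Python. This is a sketch of the standard definitions, not the TensorFlow code used in the study; the function names and the example scores are assumptions. The last lines show why the sigmoid and the softmax are interchangeable for a binary problem: a two-class softmax over the scores (z, 0) returns the same probability as the sigmoid of z.

```python
import math


def relu(values):
    """Rectified linear unit (ramp function), applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Logistic sigmoid of a single score."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_index):
    """Cross entropy between a one-hot label and predicted probabilities."""
    return -math.log(probabilities[true_index])


# Illustrative output-layer scores for four facies (S, B, H, T).
facies_probabilities = softmax([2.0, 1.0, 0.1, -1.0])
cost_if_true_is_shale = cross_entropy(facies_probabilities, true_index=0)

# Binary case: softmax over (z, 0) equals sigmoid(z).
binary_softmax = softmax([1.3, 0.0])[0]
binary_sigmoid = sigmoid(1.3)
```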

Synthetic test results
The field dataset we intend to classify has four facies (shale, brine sand, hydrocarbon-bearing sand and tuff) that are determined by elastic cross-plot products principally using AI and V_P/V_S. Before performing the actual application of support vector machine (SVM) with LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm), we tested characterization feasibility with a synthetic dataset in order to check whether the elastic parameters provided sufficient input to identify the four facies. The synthetic dataset has been generated by a randomizer to realize the four-labelled facies. The number of training data (supervisor) and test (classified) data per label is 1000 and 50, respectively. Figure 4(a) shows the cross-validation value obtained as a function of σ and c using the synthetic dataset. The cross-validation was done by the exhaustive grid search with the global-maximum value being found successfully (Figure 4(a)). The test accuracy of the classification was found to be 76.5%. With these given hyperparameters, SVM was performed as shown in Figure 4(b). The classifiers in Figure 4(b) appeared to be neither too hard (linear) nor too soft (non-linear), which gave us some confidence that the SVM worked in terms of chosen hyperparameters.
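A four-label synthetic dataset of the size described above could be generated along the following lines. The study does not state how its randomizer was set up, so this is only a sketch: the Gaussian clusters, the cluster centres, the spread and the seeds are hypothetical values chosen for illustration and are not taken from the paper.

```python
import random

# Hypothetical cluster centres in normalized (AI, Vp/Vs) space.
# These numbers are illustrative assumptions, not values from the study.
FACIES_CENTRES = {
    "S": (0.60, 0.70),  # shale
    "B": (0.50, 0.40),  # brine sand
    "H": (0.25, 0.25),  # hydrocarbon-bearing sand
    "T": (0.85, 0.45),  # tuff
}


def make_synthetic_facies(n_per_label, spread=0.08, seed=0):
    """Draw n_per_label Gaussian samples around each assumed facies centre."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, (ai_mean, vpvs_mean) in FACIES_CENTRES.items():
        for _ in range(n_per_label):
            samples.append((rng.gauss(ai_mean, spread), rng.gauss(vpvs_mean, spread)))
            labels.append(label)
    return samples, labels


# 1000 training and 50 test samples per label, as stated in the text.
train_x, train_y = make_synthetic_facies(1000, seed=1)
test_x, test_y = make_synthetic_facies(50, seed=2)
```

Increasing `spread` makes the clusters overlap more, which lowers the best achievable test accuracy regardless of the classifier used.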
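Using an identical dataset, as input to the SVM modelling in Figure 4, we employed the same synthetic test for the deep learning (DL) approach (Figure 5). Figure 5(a) shows the cost function per iteration when the optimized hyperparameters from the grid search are used. In Figure 5(b), we show the cost value associated with the learning rate and the mini-batch size. The results indicate that the global minimum was found (Figure 5(b)). Finally, the classifier results are shown in Figure 5(c). The test accuracy of the classification is found to be 77.0%, which is nearly the same as that achieved with the SVM approach. Both the results in Figures 4 (SVM) and 5 (DL) appear to be visually similar. The DL approach is, however, 1.6 times faster than the SVM approach in computation time when all the hyperparameters are fixed in our environment.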
To objectively compare the performance between the SVM and DL approaches, when the above synthetic data are used, we plot confusion matrices for the training and test accuracies in Figure 6. The percentages for each bin are also shown in Figure 6 and provide the prediction accuracy per target class based on the number of samples in that bin.
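A confusion matrix of the kind described above can be built by counting, for every target class, how many samples were assigned to each predicted class. The sketch below uses rows for the target class and columns for the predicted class, which is an assumed orientation and may differ from the figures; the short label lists are made up for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count samples per (target class, predicted class) pair."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """Fraction of each target class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


classes = ["S", "B", "H", "T"]   # shale, brine sand, hydrocarbon sand, tuff
truth = list("SSBBHHTT")         # illustrative target labels
predicted = list("SSBHHHTS")     # illustrative classifier output
cm = confusion_matrix(truth, predicted, classes)
acc = overall_accuracy(cm)
acc_per_class = per_class_accuracy(cm)
```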

Field data results
In Figures 7 and 8, we show the classification results of the elastic impedance cross-plot products applied to support vector machine (SVM) and deep learning (DL) using the Avalon discovery well (Well #1 in Figure 1). Wells #2 and #3 were not used for the training process, but instead acted as blind tests, in order to objectively test the seismic upscaling of the SVM and DL facies classification outputs. The ratio of the training to test data was fixed at 0.7. Although the hyperparameters of the SVM approach seem to be sufficiently optimized via cross-validation (Figure 7(a)), the classifications in Figure 7(b) appeared to be over-fitted (exhibiting strict decision boundaries) with a classification accuracy of 84.8%. As pointed out by Kuzma (2003), higher penalization values can be numerically unstable, and for this reason we limited their range in Figure 9. This range limitation yielded softer decision boundaries (Figure 9(b)).
As we made confusion matrices with the synthetic data (Figure 6), we also show equivalent matrices in Figure 10 when SVM and DL are applied to the field data. However, based on the matrices, we do not observe any major differences between the approaches.

Seismic upscaling
For the purpose of demonstrating practical application and validation of the obtained results against the blind test wells (Wells #2 and #3), we apply the SVM and deep learning (DL) trained classifiers, as shown in Figures 9(b) and 8(c), to actual seismic data. Figure 11 shows the amplitude versus offset (AVO) inversion derived AI and V_P/V_S used as input to allow the upscaling away from Well #1. It is noted that areas with low AI and low V_P/V_S (labelled in Figure 11) correspond to proven hydrocarbon bearing facies. In order to determine any substantial differences between more conventional facies classification methods and the SVM or DL approaches, we compare four different classifications: linear projection, Bayes, SVM and the DL solution, as shown in Figure 12. We normalize the seismic input in the same fashion as the elastic impedances from the well data to ensure a consistent scale. The linear-regression line, Projection = (u AI + v) − V_P/V_S, where u denotes the gradient and v denotes the intercept of the regression line, is calculated by gradient descent. The Bayes result is calculated by the facies-based seismic inversion (Zabihi Naeini and Exley 2017) and uses probability distribution functions to determine facies in elastic impedance cross-plot space. Note that the result of the Bayes approach (Figure 12(b)) is representatively taken from where we have strongest confidence from the corresponding probability density functions (the maximum a posteriori). The main formation tops around the reservoir and Vshale (the volume of shale) at the three wells are also plotted in Figure 12.
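The normalization and the linear projection described above can be sketched as follows. Two points are assumptions: the inputs are scaled to the range 0 to 1 with a min-max rule, which the text does not specify, and the regression line is obtained with closed-form least squares rather than the gradient descent used in the study (both give the same line for this cost). The function names and the sample values are illustrative.

```python
def min_max_normalize(values, lo=None, hi=None):
    """Scale values to [0, 1]; pass lo/hi from the well data to reuse its scale."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(ai, vpvs):
    """Least-squares gradient u and intercept v of Vp/Vs against AI."""
    n = len(ai)
    mean_x = sum(ai) / n
    mean_y = sum(vpvs) / n
    sxx = sum((x - mean_x) ** 2 for x in ai)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ai, vpvs))
    u = sxy / sxx
    v = mean_y - u * mean_x
    return u, v


def projection(ai, vpvs, u, v):
    """Projection = (u * AI + v) - Vp/Vs for each sample."""
    return [(u * x + v) - y for x, y in zip(ai, vpvs)]


# Illustrative points lying exactly on the line Vp/Vs = 2 * AI + 1.
ai_samples = [0.0, 1.0, 2.0, 3.0]
vpvs_samples = [1.0, 3.0, 5.0, 7.0]
u, v = fit_line(ai_samples, vpvs_samples)
residuals = projection(ai_samples, vpvs_samples, u, v)
```

Samples with a positive projection plot below the regression line in V_P/V_S and samples with a negative projection plot above it; how those signs map to facies depends on thresholds that are not given here.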

DISCUSSION
One of the most obvious observations from Figure 12 is that the four different methods generally exhibit similar results. In particular, the results of the linear projection (Figure 12(a)) and the deep learning (DL) approach (Figure 12(d)) appear to be the most similar pair. However, it can be noted that the linear approach wrongly identified hydrocarbons within the Base Bittern Sand at Well #1 (Figure 12(a)); conversely this error does not present itself in the DL result (Figure 12(d)). The Bayes result (Figure 12(b)) has better signal-to-noise ratio compared to the other methods, with good lateral continuity of potential sedimentary layers, but had the poorest correlation with the wells, particularly the Cromarty Sand interval in Well #3. The support vector machine (SVM) result (Figure 12(c)) shows better correlation within the deeper section but a reduced hydrocarbon column compared to what was actually encountered in the wells, which was due to the harder decision boundaries in Figure 9(b) (SVM) compared to Figure 8(c) (DL).
To further check the performance of the SVM and DL approaches, we applied AI and V_P/V_S derived from blind Wells #2 and #3 to the predicted classification in Figure 13. Although the test accuracy in Figure 10 indicated similar performance, the classification of the brine sand in Figure 13(a) seems to be overestimated, around the normalized AI of 0.1 and V_P/V_S of 0.5, where hydrocarbons should have been classified (labelled in Figure 13). This would explain why the SVM result in Figure 12(c) shows a decreased hydrocarbon column compared to what was actually encountered in the wells or the DL result in Figure 12(d). In general, the SVM result suffers from an overly strict classification, with marginally more constrained decision boundaries, compared to the DL result. However, both the SVM and DL results could converge with additional training using a wider range of input training data, more representative of general rock physics trends.
It is natural to expect that better results can be obtained when more training wells, for instance Wells #2 and #3, are used.However, such improvements are not necessarily guaranteed for the linear approach because it cannot handle nonlinear (or complex) decision boundaries.On the other hand, while the Bayes approach is more suited to more complex decision boundaries, that is where different facies overlap in elastic cross-plot space, it requires good prior information to determine appropriate probability distribution functions.Whilst the SVM and DL approaches do not require this prior knowledge, they do not take into account any rock physics trends that may be known outside of the immediate input data range provided by the learning data.This is in contrast to both the linear and Bayes based methods that are orientated to capture general compaction/porosity/depth trends.
One potential way to increase DL's performance further is to use data augmentation, especially when the available data are limited. In the case of convolutional neural networks (CNN), augmenting image samples by rotating, stretching, shrinking, flipping, adding noise and so on is common (Okafor et al. 2017). These processes can be applied either in the original or feature domain. However, since DL automatically and implicitly extracts relevant features from the input data, one might, for instance, try to use Vshale, density, V_P, V_S or other rock property trends such as the Castagna equation (Castagna, Batzle and Eastwood 1985). These inputs could be directly used in the first layer of DL, or in pre-training, to extract features via an auto-encoder (Baldi and Hornik 1989).
Although all of the details regarding the well correlations are not disclosed here, the overall correlation of sand, shale, tuff and oil sand facies compared with the seismically derived classifications is qualitatively reasonable, as shown in Figure 12 despite differences in vertical resolution.It is, however, difficult to decisively choose which one of the classification methods might be superior because each method is inherently subject to different weaknesses and ultimately classification errors.For example, as briefly described above, the linear regression can be too simplistic when there is significant overlap of different facies types in elastic cross-plot space.Whilst the Bayes-based method can treat such overlapping scenarios better it does, however, require good prior information that may not be known.Also, as discussed previously with the SVM and DL approaches, appropriate optimization of the hyperparameters is essential and also subject to error.Nonetheless, we find great potential in the application of the DL method with respect to elastic impedance cross-plot classification as given sufficient training one can expect to achieve a high level of automation within inversion workflows, whilst also reducing reliance on human interactions and rock physics assumptions.

CONCLUSIONS
This paper presents support vector machine (SVM) and deep learning (DL) facies classification examples using well-derived elastic impedances from the UK North Sea. Additionally, the SVM and DL methods were also upscaled and applied to equivalent elastic outputs of an amplitude versus offset (AVO) inversion applied to a seismic field dataset covering the well data. Although the SVM and DL approaches provided similar results with a simple synthetic input, there were obvious differences when upscaled to the seismic data. Such differences seem to be connected to the variation in optimized hyperparameters between the SVM and the DL approaches. Even though the SVM approach provided similar training accuracy, compared to DL, the DL approach showed visually more realistic results and better correlation with the well data. The similarities of the 'automated' SVM and DL results when compared to established 'manual' classification methods such as linear projection or Bayes-based classification are encouraging and suggest that machine learning approaches such as DL have the future potential to guide us towards automated quantitative interpretation, whilst also mitigating subjective human interactions.

© 2018 The Authors. Geophysical Prospecting published by John Wiley & Sons Ltd on behalf of European Association of Geoscientists & Engineers. Geophysical Prospecting, 67, 1040-1054. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Figure 1 (a) Regional map showing study area and the deposition of Dornoch, Cromarty and basin wide Paleocene Forties sandstones. The depositional extents are modified from Mudge (2014). The field data used in this study are centred on the Avalon discovery in Block 21/6b. Avalon's reservoir is contained within the Cromarty Member that is predominantly a late stage depositional system restricted to the basin margins. (b) RMS amplitude through the reservoir interval of the Avalon discovery. The line of section, as depicted in Figures 11 and 12, is shown along with the locations of Wells #1, #2 and #3.

Figure 2 Schematic of the NN architecture used by the SVM approach in this study. The original data x in the input layer are classified with the indicator i, via the kernel K, with the weight w, and bias b.

Figure 4 Synthetic elastic impedance cross-plot (AI and V_P/V_S) results from the SVM where (a) is the cross-validation (CV) of the hyperparameters and (b) is the classification. The filled circles are the supervisors (training data), the hollow circles are the classified (test data) and the crosses are the ground truth. The hollow orange circles are the support vectors that were used to establish the boundaries in this realization. S, B, H and T in the legend box stand for shale, brine-bearing sand, hydrocarbon-bearing sand and tuff facies. The four different background colours correspond with the classified facies labels. Confusion matrices of the training and test accuracies are shown in Figure 6(a) and (c).

Figure 5 Synthetic elastic impedance cross-plot results from the DL approach where (a) is the cost function when the optimized hyperparameters are used, (b) is the lowest cost between the learning and modelled outputs when different hyperparameter values are used and (c) is the classification. The legend in (c) is the same as is described in Figure 4 with omission of the support vectors, which are not applicable to DL. Equivalent confusion matrices of training and test accuracies are shown in Figure 6(b) and (d).

Figure 6 Confusion matrices from the synthetic data classification where (a) is the SVM and (b) is the DL training accuracies, whilst (c) is the SVM and (d) is the DL test accuracies. The percentage in each bin is the prediction accuracy between the target and result classes. The integer in each bin is the number of counts/samples used.

Figure 7 Elastic impedance cross-plot results from the SVM approach using the field data where (a) is the CV of the hyperparameters and (b) is the classification. The legend in (b) is the same as is described in Figure 4. Confusion matrices of the training and test accuracies are shown in Figure 10(a) and (d).

Figure 8 Elastic impedance cross-plot results from the DL approach using the field data where (a) is the cost function when the optimized hyperparameters are used, (b) is the lowest cost between the learning and modelled outputs when different hyperparameter values are used and (c) is the classification. The legend is the same as is described in Figure 5. Equivalent confusion matrices of the training and test accuracies are shown in Figure 10(c) and (f).

Figure 9 Elastic impedance cross-plot results from the SVM approach using the field data where (a) is the CV of the hyperparameters and (b) is the classification. The legend is the same as is described in Figure 4. The range of the hyperparameters is constrained when compared to Figure 7. Confusion matrices of the training and test accuracies are shown in Figure 10(b) and (e).

Figure 10 Confusion matrices from the field data classification, where (a) is the SVM training accuracy as in Figure 7, (b) is the SVM training accuracy with the constrained hyperparameters as in Figure 9, (c) is the DL training accuracy as in Figure 8, (d) is the SVM test accuracy as in Figure 7, (e) is the SVM test accuracy with the constrained hyperparameters as in Figure 9 and (f) is the DL test accuracy as in Figure 8.

Figure 11 Cross-sections (location shown in Figure 1(b)) depicting the output seismically derived elastic impedances, where (a) is AI and (b) is V_P/V_S. These data were derived from a joint impedance-facies inversion and used as the input to the various classification methods as shown in Figure 12. An anomalous response, in terms of decreased AI and V_P/V_S, is visible at the top of the section and corresponds to a hydrocarbon accumulation encountered within Wells #1 and #2.

Figure 12 Cross-sections (location shown in Figure 1(b)) depicting the seismic upscaling output from the four main facies classification methods where (a) is the linear, (b) is the Bayes, (c) is the SVM and (d) is the DL approach. The Vshale logs are also plotted for each of the wells (yellow for sand and brown for shale). Note the Vshale logs are shown without being filtered back to the seismic's vertical resolution in order to illustrate the marked difference in vertical resolution between the seismic and well data.

Figure 13 Blind test of elastic impedance cross-plot products from Wells #2 and #3, whose locations are given in Figure 1(b). The classification is based on Well #1 by (a) the SVM and (b) the DL approaches, which are identical to the results already shown in Figures 9(b) and 8(c). The arrow marks the area, corresponding to Well #2, where hydrocarbons should have been classified instead. This SVM misclassification resulted in a decreased hydrocarbon column around Well #2, Figure 12(c), compared to what was actually encountered.