Acquisition design for direct reflectivity and velocity estimation from blended and irregularly sampled data

Blended acquisition along with efficient spatial sampling is capable of providing high‐quality seismic data in a cost‐effective and productive manner. While deblending and data reconstruction conventionally accompany this way of data acquisition, the recorded data can be processed directly to estimate subsurface properties. We establish a workflow to design survey parameters that account for the source blending as well as the spatial sampling of sources and detectors. The proposed method involves an iterative scheme to derive the survey design leading to optimum reflectivity and velocity estimation via joint migration inversion. In the workflow, we extend the standard implementation of joint migration inversion to cope with the data acquired in a blended fashion along with irregular detector and source geometries. This makes a direct estimation of reflectivity and velocity models feasible without the need of deblending or data reconstruction. During the iterations, the errors in reflectivity and velocity estimates are used to update the survey parameters by integrating a genetic algorithm and a convolutional neural network. Bio‐inspired operators enable the simultaneous update of the blending and sampling operators. To relate the choice of survey parameters to the performance of joint migration inversion, we utilize a convolutional neural network. The applied network architecture discards suboptimal solutions among newly generated ones. Conversely, it carries optimal ones to the subsequent step, which improves the efficiency of the proposed approach. The resultant acquisition scenario yields a notable enhancement in both reflectivity and velocity estimation attributable to the choice of survey parameters.


I N T R O D U C T I O N
During the last decade, blended acquisition has realized the industry's ambition towards efficient and cost-effective seismic operations that still attain the required data quality (Beasley, Ronald and Jiang 1998;Berkhout 2008;Bouska 2010;Abma et al. 2012;Nakayama et al. 2015). Furthermore, the enhancement in the survey productivity contributes to minimizing health, safety and environment (HSE) exposure in the field. * E-mail: s. nakayama@tudelft.nl; shotaro.nakayama@inpex.co.jp In general and where applicable, separation of overlapping source-wavefields comes to the initial step to process blended data rather than processing them directly (Moore et al. 2008;Lin and Herrmann 2009;Mahdad, Doulgeris and Blacquière 2011;Kontakis and Verschuur 2015). Deblended data subsequently enable us to use standard processing and interpretation practices.
While conventional seismic surveys aim ideally at regular and dense sampling to satisfy the Shannon-Nyquist theorem, further easing the spatial-sampling requirements contributes to the business aspect, and reduces the environmental This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. footprint. For example, compressive sensing has recently been successfully implemented in the industry (Hennenfent and Herrmann 2008;Herrmann 2010;Mosher, Kaplan and Janiszewski 2012). This technique allows for a non-uniform spatial sampling along with fewer measurements in the field. Subsequent signal recovery from compressive measurements is generally posed as an underdetermined inversion problem. To solve it, compressive sensing exploits the sparsity of signals in a proper transform domain, allowing a small number of coefficients to sufficiently represent the signals that one attempts to obtain. The improvement in spatial-sampling efficiency, coupled with optimal data recovery, potentially enables us to design seismic surveys that satisfy demanding geophysical objectives at an affordable cost.
An alternative approach to handle blended data is to apply imaging without deblending (Tang and Biondi 2009;Verschuur and Berkhout 2011;Berkhout, Blacquière and Verschuur 2012;Chen et al. 2015;Caporal, Blacquière and Davydenko 2018). Instead of directly migrating blended data, these studies utilized a least-squares migration (LSM) scheme. The formulation of the imaging problem as a least-squares problem enables LSM to iteratively minimize the misfit between the real and the modelled blended data, which consequently produces the subsurface reflectivity without the separation of blended wavefields. Insufficient spatial sampling in the acquisition often induces aliasing noise in migration. A common practice to address this issue is to apply data reconstruction prior to imaging, yet alternatively, LSM can be used for this purpose. Nemeth, Wu and Schuster (1999) showed that the technique is capable of suppressing migration artefacts and producing images with balanced amplitudes and high resolution even when the input data suffer from coarse and/or irregular spatial sampling. Full-waveform inversion (FWI) is capable of using acquired seismic data to retrieve the subsurface properties that determine the seismic wavefield in an iterative manner (Tarantola 1984), that is, blended records can be directly used in FWI, which attempts to minimize the misfit between observed data and forward-modelled blended data (Florez, Mantilla and Ramirez 2016). Due to the heavy computational burden in FWI, the concept of blending is considered as one of the crucial strategies to improve its efficiency, even when the acquired data were not blended. It combines individual shot records into supershot records which are then simulated during waveform forward modelling and the residuals are back-propagated for gradient calculation. This implementation ideally accelerates the inversion process with a factor of the number of shots assembled in the supershot record.
A fundamental drawback in estimating reflectivity or velocity directly from blended data is the crosstalk noise that arises from the interference of overlapping wavefields from multiple sources. One way to minimize this issue is to apply encoding to the sources to be blended both for LSM and FWI, see, for example, the studies by Krebs et al. (2009), Boonyasiriwat andSchuster (2010), Ben-Hadj-Ali, Operto and Virieux (2011), Jeong et al. (2013) and Huang and Schuster (2012). In these studies, different encoding schemes were implemented, such as the use of random time shifts, frequency scheduling, amplitude encoding, selection of source locations used for the inversion or combinations thereof. However, the inadequacy in spatial sampling still deteriorates inversion results (Ben-Hadj-Ali et al. 2011;Aldawood, Hoteit and Alkhalifah 2014). Boonyasiriwat and Schuster (2010) applied a random distribution of sources to minimize the crosstalk noise in FWI. Wang et al. (2014) confirmed the improvement of FWI results with irregularly decimated data compared to regularly decimated data. These studies infer that designing a survey incorporating irregularity potentially leads to effective reflectivity and/or velocity estimation when one aims to directly process blended and irregularly sampled data. This paper, hence, introduces a survey-design workflow that iteratively optimizes the survey parameters related to both blending and spatial sampling of detectors and sources, leading to satisfactory reflectivity and velocity estimation via joint migration inversion (JMI). JMI is a method that iteratively estimates a high-resolution subsurface reflectivity model as well as a migration velocity model by estimating two independent operators responsible for reflection and propagation (Staal and Verschuur 2013;Berkhout 2014b). We extend the standard JMI scheme to handle blended and irregularly sampled data, which is then incorporated into the proposed surveydesign scheme. The workflow utilizes errors in reflectivity and velocity estimates from the JMI process for given survey design. They are assigned to its objective function and are subsequently input into a survey-parameter update system based on the integration of a genetic algorithm (GA) and a convolutional neural network (CNN). Stochastic operators in the GA that imitate the theory of natural evolution allow for simultaneous update of the blending and sampling operators towards optimum JMI results. The implementation of the CNN aids the population management in our GA by selecting optimum designs while discarding suboptimal ones among newly generated solutions from genetic operators. Only classified designs in the CNN are fed into the subsequent step that involves the evaluation of the objective function through JMI. Since JMI is computationally expensive, this use of an antecedent classifier effectively prevents wasteful computation incurred to suboptimal solutions, which consequently enhances the performance of the overall survey-design scheme. We also incorporate the dispersed source array (DSA) concept (Berkhout 2012a) in the proposed survey-design scheme. This technique utilizes a plurality of source types, each having a dedicated narrow bandwidth. This permits each narrowband source to be independently deployed to satisfy its own spatialsampling criteria determined by its frequency range. Caporal et al. (2018) extensively discussed its advantages in terms of geophysical, operational and environmental perspectives.
In our acquisition design, the aim is to find the optimum blending and spatial-sampling schemes, contributing to efficiency, economics and HSE, that satisfy the geophysical objectives without relying on an expensive survey acquired in an unblended and regularly well-sampled fashion. Our main objective in this paper is twofold: (1) to explore the effect of the choice of survey parameters on the performance of JMI and (2) to illustrate the proposed survey-design workflow. Numerical examples provide the results of our approach that aims to enhance the JMI results by simultaneously updating different survey parameters involved in DSA acquisition.

S U R V E Y -D E S I G N F R A M E W O R K
Using the matrix notation proposed by Berkhout (1982), the monochromatic expression of seismic data is given by (1) Each column and row of the data matrix, P(z d ; z s ), correspond to a common shot record and a common detector gather, respectively, each being measured by detectors at depth z d and sources at depth z s . The entire seismic data volume can be stored by the collection of P(z d ; z s ) for each frequency component. Matrices D(z d ) and S(z s ) are the detector and source matrix, respectively. The columns in D(z d ) represent the spatial detector locations whereas the rows in S(z s ) are the spatial source locations. Matrix X(z d , z s ) is the Earth transfer operator containing the subsurface response including propagation effects and (multiple) reflections. The elements of each matrix contain the amplitude and phase information for one frequency component. In the following, we assume the detectors and sources to be located at the same depth, that is, z s = z d = z 0 . According to Berkhout (2014a), X(z 0 , z 0 ) can be approximated by where W − (z m−1 ; z m ) and W + (z m ; z m−1 ) account for the upward and downward propagation, respectively, between depth levels z m−1 and z m .  Berkhout (2014a), the assumption of angle-and frequency-independent reflectivity along with negligible mode conversion allows for the following approximations to be used: where R ∩ (z m ) is the up-down reflectivity operator. For notational simplicity, the depth index and the superscripts indicating the directions of wave propagation and reflection are hereinafter omitted. As for the design of D and S, our primary focus is on the effect of spatial sampling. We, therefore, assume the application of a point detector and a point source, each having a delta-functioned response. This simply makes all the non-zero elements of D and S one. In addition, the distributions of zero and non-zero elements in D and S depict the spatial locations of the detectors and sources, respectively. In an ideal acquisition geometry, that is, carpet detectors and sources, D and S become identity matrices. In this case, P equals X, and, consequently, optimum retrievals of R and W from P are expectantly attainable. However, in practice, operational and economic constraints lead to imperfections in the spatial sampling, which means that D and S are no longer identity matrices. This eventually results in the erroneous estimation of R and W. Therefore, we need to properly design survey parameters to obtain reliable estimates of R and W, which is targeted in our acquisition design strategy. Berkhout (2008) proposed the theoretical framework of source blending by introducing a blending matrix, . Based The survey-design workflow. The iterative scheme aims to output the blending and sampling operators that can provide the optimum deblending and reconstruction quality. The workflow starts with R and W. The forward process in the blue-filled step derives P using FWMod. The inverse process in the red box then estimates R and W from P using JMI. The procedure stops if the misfits between R and R as well as between W and W are sufficiently small or the maximum number of iterations is reached. If not, D, S and are updated in the green box and subsequently used in the next iteration. on equation (1), the forward model of blended data, P , can be expressed as Each column and row of represent a blended experiment and a source location, respectively. One column of contains the blending code of the sources involved in one blended experiment, and element i, j for frequency ω is given by where a i, j and φ i, j are the amplitude term and phase term corresponding to the i th source of the j th blended experiment. Any type of source signature can be encoded in equation (6). In the case of simple time delays τ i, j , the corresponding phase shift can be expressed as φ i, j = ωτ i, j . Our forward-modelling scheme is also applicable to the dispersed source array (DSA) concept, where a plurality of narrowband sources with different centre frequencies accounts for the total blended wavefield. Being formulated in the frequency domain, equations (2) and (5) imply that starting with R and W, various blending and spatial-sampling schemes can be modelled by designing D, S and . Figure 1 illustrates the proposed survey-design workflow to find optimum D, S and , while we assume R and W, which contain all relevant information of the subsurface, to be available as prior information. Such information is available, for example, at the development or production stage, or when data from a previous acquisition and appropriate well logs are available. It means that the approach is subsurface dependent. The blue-filled step in Fig. 1 corresponds to the forward process to generate blended data P from R and W along with the current survey parameters, D, S and . This forwardmodelling engine is called full-wavefield modelling (FWMod) (Berkhout 2014a). In FWMod, scattering is assumed to take place at each depth level via the reflection operator R, whereas in between each level the wavefields propagate via the propagation operator W.
The red-filled step is the inverse process, that is, joint migration inversion (JMI), to obtain R and W from P , where the angled brackets and indicate estimation. JMI iteratively derives both reflectivity and velocity models using FWMod as its forward engine (Berkhout 2014b).Therefore, in this study, the same FWMod engine is used for both blue-filled and redfilled steps in Fig. 1. It should be noted that proving the viability of the JMI algorithm is not the objective of this study. Our main focus is on exploring and understanding the effect of acquisition design on the performance of JMI. Hence, we utilize this idealized condition in which the choice of survey parameters can be a major factor determining the quality of estimated subsurface properties. The implementation of FWMod and JMI in our study is based on Staal and Verschuur (2013) and Staal (2015), which updates and estimates one single reflectivity parameter per grid point in R and one single slowness parameter per grid point in W. We consider the underlying assumptions of acoustic wavefield behaviour along with angle-and frequency-independent reflectivity made in this implementation to be valid in this study. Staal (2015) suggested the JMI framework can be extended to blended acquisition. In this study, we formulate the objective function in JMI as Here, we feed P directly into the JMI process. We perform JMI on the spatially regular and well-sampled detector grid determined by X. Each blended and irregularly sampled shot record can be stored on this grid, yet the information is only present at active detector locations defined by D. Note that this choice means that our solution is spatially irregular in the sense that not all grid points contain detectors. However, all detectors coincide with grid points. By taking the grid size in X sufficiently small, the consequences of this choice can be largely mitigated. Together with the current reflectivity and slowness estimates, FWMod simulates each blended shot record based on the given S and at the regular, well-sampled detector grid. By selecting traces from the modelled shot records according to D, P can be obtained. This subsequently allows us to compute the residual between P and P as described in equation (7). The JMI procedure iteratively minimizes the data misfit via a gradient descent scheme. At each iteration, reflectivity and propagation operators are updated by cross-correlating wavefields, that is, the back-propagated residual wavefield and the forwardmodelled wavefield, in the opposite directions and in the same direction, respectively (Verschuur, Staal and Berkhout 2016). In our case, empty traces corresponding to detector locations that are not present in D are also used in the back-propagation in JMI along with the forward-propagation of blended sourcewavefield determined by S and .
The overall acquisition-design scheme attempts to minimize the residue between R and R as well as W and W : where j is the objective-function vector containing errors in R andŴ. The termŴ represents a scalar velocity field in the space-depth domain derived from a propagation operator, andR is a scalar reflectivity field converted to the space-time domain such that any undesired effects from errors in W on J R can be avoided. The green-filled step in Fig. 1 updates the blending and sampling operators that are subsequently carried into the next iteration. The procedure stops once the objective function is sufficiently small, or the maximum number of iterations is exceeded. Therefore, our approach iteratively computes the acquisition design parameters D, S and that minimize the objective-function vector, meaning that optimum reflectivity and velocity estimates can be obtained. With the proposed survey-design scheme, the variation of the JMI results is assumed to be attributable fundamentally to the choice of survey parameters.

S U R V E Y -P A R A M E T E R U P D A T E
Designing survey parameters in an irregular fashion requires extensive efforts as it inherently makes the parameter selection problem huge, unlike a conventional survey that generally employs regularly positioned detectors and sources along with a spatially uniform source signature. To manage the large problem space, we presented a survey-design scheme utilizing a genetic algorithm (GA) (Nakayama et al. 2019). Together with a constraint on the parameter space, a so-called repeated encoding sequence, this scheme aimed at deriving survey parameters allowing for optimum deblending and data reconstruction quality. In this study, we extend the approach to obtain blending and sampling operators that optimize joint migration inversion (JMI) results. Owing to its ability to manage optimization problems with non-convexity, non-differentiability, the existence of many local minima and large problem space, GAs have successfully been implemented in various application domains (Monteagudo and Santos 2015;Perez-Liebana et al. 2015;Bak, Rask and Risi 2016;Davies et al. 2016;Scirea et al. 2016). However, a standard GA inherently and inevitably needs to evaluate all the solutions to obtain their objective-function values. The evaluation of suboptimal solutions, that is, those that do not contribute to the next generation, potentially makes GAs computationally expensive.
A number of studies have shown the integration of metaheuristics and neural networks. In spite of recent successes in neural networks (Hinton et al. 2012;Leung et al. 2014;Schmidhuber 2015), advanced architectures have become deeper and more complex. Designing network architectures is still a challenging task due to the existence of numerous parameters. In this respect, a typical approach is to explore and design efficient network architectures using GAs (Ahmadizar et al. 2015;Xie and Yuille 2017). The other way around, Kramer (2016) outlined ways to support and accelerate evolutionary algorithms with the learning process. In various engineering domains, determining the objective function of a single candidate takes a long time to execute due to the use of finite element analysis and computational fluid dynamics simulations. To deal with such computationally expensive optimization problems, a viable option is to use a model that can approximate the outcomes from the fitness evaluations at relatively low computation cost and to limit the number of real evaluations (Tenne and Goh 2010). Kramer (2017) illustrated a workflow to embed a supervised learning model into a GA where the real fitness evaluations are performed only to the solutions that fulfil criterion in the predictive model. This way of integration is easily recognizable in different engineering applications (Marcelin 2004;Sreekanth and Datta 2011;Ibaraki, Tomita and Sugimoto 2015;Sato and Fujita 2016;Sakaguchi et al. 2018). Exploiting a strategy from these engineering problems is certainly valuable to our survey-design problem because a large computation effort to calculate the objective function inherently restricts the number of fitness evaluations in our case. Hence, this study implements a convolutional neural network (CNN), which aims primarily to aid the population management in our GA and subsequently reach a satisfactory solution with affordable turn-around time.
As described in Algorithm 1, the proposed approach updates a set of parameter vectors representing D, S and . The vectors for the pth individual solution in the gth generation are given by Here, d g, p and s g, p are binary vectors indicating detector sampling and source sampling operators for the pth individual solution in the gth generation, each of which can be expressed as and Their dimensions, l d and l s , are equal to the number of detectors and sources in X. As described previously, the assumption of point detectors and point sources allows distributions of zero and non-zero elements to explicitly indicate their spatial coordinates. With the use of delta-functioned detector and source responses, D and S can be simply expressed by binary vectors using equations (11) and (12). Vector γ is a binary vector representing the blending operator for the pth individual solution in the gth generation. Parameter l g equals the required bit length to parametrize a given blending code per single source. In our case, a binary string for a given source with a length of l g accounts for the amplitude term and the phase term given by equation (6). Since the spectral properties of the dispersed source array (DSA) sources are predetermined, the information on the source type is included in the frequency-dependent amplitude term.
Let n g and n p represent the numbers of generations and populations. The population in the gth generation, c g , consists of a set of parameter vectors, having n p individuals as The first step is to randomly generate the initial population, c 0 , across a given problem space: Each parameter vector in the initial population, c 0, p , then goes to the forward process (going to P 0, p from R and W along with D 0, p , S 0, p and 0, p ). The inverse process subsequently follows (going to R 0, p and W 0, p from P 0, p ). Using equation (8), a set of objective functions for the initial population, j 0 can be obtained by Using c 0 and j 0 , we train the CNN. As shown in Fig. 2, the neural network architecture in this study consists of layers that are commonly used in CNN architectures: three pairs of 2D convolutional layers (LeCun et al. 1998) and ReLU (rectified linear unit) layers (Hahnloser et al. 2000), followed by a fully connected and softmax layer (Bishop 2006). In a convolutional layer, we apply zero padding and a stride of one along the height and width dimensions. The channel size and kernel size of each layer are given in Fig. 2. In this study, we select square shape kernels. Instead of down-sampling feature maps within the CNN via a pooling layer, we first arrange Compute selection probability of each solution in c g−1 8: while stopping criterion not met do 10: Select two parental solutions from c 0:g−1 using roulette wheel selection 11: Perform crossover operation to generate c g,2p−1 and c g,2p

12:
Evaluate c g,2p−1 and c g,2p with the CNN trained from g − 1 th generation Generate D g,p , S g,p and Γ g,p from c g,p (= d g,p , s g,p , γ g,p T )

25:
Generate P g,p from R and W along with D g,p , S g,p and Γ g,p via FWMod

26:
Estimate R g,p and W g,p from P g,p via JMI

27:
Compute reflectivity and velocity errors using equation 8 The scheme to update survey parameters. Genetic operators update D, S and simultaneously. Newly generated parameters then go to the CNN. Survey parameters are regenerated till the classification result meets the criterion (the expected residue between R and R as well as between W and W ). Only selected solutions go to the actual evaluation that involves JMI. Designs having smaller objectivefunction values form a new generation. Additionally, the evaluation results are used to train the CNN applied to the classification step in the next iteration. The diagram at the right depicts the applied network architecture. Conv means a convolutional layer with two numbers indicating the channel size and the kernel size, respectively. ReLU indicates a rectified linear unit layer and FC a fully connected layer.
blending and sampling operators into a 3D matrix, Y, its size being c w × c h × N, where c w and c h are the width and the height of Y. They equal the numbers of columns and rows in X, respectively. N equates to the number of DSA source types. For the pth individual in the initial population, the nth feature map of Y is given by Matrix S n contains the spatial locations of the nth source type of the DSA acquisition, λ n is a user-defined weight parameter and vector τ contains information on the activation time applied to each source. Matrix L is the forward transform operator to the wavenumber domain. Since we assume delta-functioned detector and source responses, Y becomes frequency independent, leading to a reduction in the data size. Although a deeper and more complex network presumably enables us to directly input D, S and into the CNN, for now this mathematical operation helps to achieve reasonable classification results with an affordable computation time. After the initial iteration, the CNN is embedded into our GA. As mentioned, its task is to exclude individual solutions that are likely to not satisfy the predetermined threshold criterion for proceeding to the next generation.
To 'minimize the vectorized objective function in equation (8), we utilize non-dominated sorting and crowding distance approaches (Deb et al. 2002). Algorithm 2 illustrates the procedure to assess the individual solution. It starts with finding the first-rank solutions. This involves a calculation of two entities: (1) domination count q g, p (the number of solutions dominating a given solution, c g, p ) and (2) a set of individuals, q 0 g, p , being dominated by c g, p . For all solutions with rank one, their domination counts become zero. For each member dominated by these solutions, its domination count is reduced by one. Any members having a zero domination count obtain rank two. This procedure is repeated until the ranks of all solutions are identified.
We then derive the selection probability of each solution according to its rank as with r g, p = exp(−βRank g, p /max where β is a scalar controlling the diversity of the selection. The new population is formed by elitism replacement based primarily on the domination rank. Additionally, to discriminate solutions with the same rank, we analyse their crowding distances. An infinite crowding distance is given to two solutions within the tth rank having the maximum and the minimum value of a given objective function (in our case either J R or J W ). Crowding distance values for other solutions are then calculated by the sum of individual distance values corresponding to each objective function as where f t is a set of solutions belonging to the tth rank and n t is the number of solutions in f t . This ensures the diversity of if c k,h ≺ c i,j then 5: f t = q 29: end while the new generation by assigning a higher priority to a more isolated solution in the objective-function space. According to the selection probabilities of the individuals, parental solutions that feed into crossover and mutation operations are selected. Let n c and n m be the number of parental solutions to apply crossover and mutation, respectively. The sum of n c and n m is then equal to n p . Crossover mixes the portions of two parental solutions to generate new solutions. Mutation then adds a random refinement to a single parental solution. Our parameter sequence consists of different parameter vectors related to the blending and sampling operators, each having a different length and constraint. Thus, we apply crossover and mutation to each binary vector separately. Newly generated solutions are evaluated using the CNN trained in the previous iteration to classify solutions according to their likelihood to pass the predetermined threshold based on the non-domination rank. Genetic operators repeatedly generate new d, s and γ until the CNN classifies all candidates in a given generation as likely to pass the threshold.
All the new candidates c g of generation g obtained in this way go to the JMI process to derive a set of objective-function vectors: We combine c g with the solutions obtained from all previous generations, c 0:g−1 , and j g with the objective functions from all previous generations, j 0:g−1 , as and j 0:g = j 0:g−1 , j g T = j 0 , . . . , j g T .
Since the new individuals do not necessarily attain better results, the n p best solutions in c 0:g are selected on the basis of elitism. They are subsequently replaced to c g which is then used for the next generation. In this way, the best parental solutions are carried into the new generation.
Finally, we train the CNN using c 0:g and j 0:g . We apply a fivefold cross-validation to evaluate the predictive classifier. The procedure randomly splits all the samples into five subsets with equal size. One of the five subsets is used as a testing set, whereas the remaining four subsets are assembled to form a training set. This process is repeated five times by alternately using every subset for testing and the remaining subsets for training. This interchange enables all the samples to be used for training as well as for testing. The mean of prediction errors along with the variance among the five trials indicates the predictive performances of the classifiers. The best of the five models is applied to the next iteration. At each generation, the threshold criterion applied in the classification is updated, starting from a mild value and moving to a harsh one. To improve the stability of the CNN, we modify the number of epochs and the strength of the L2-regularization term applied to every weight in the network through the course of the iterations. The former parameter is increased while the latter is decreased with the number of iterations. Controlling them is considered as an effective and simple means to prevent overfitting (Murphy 2012). It is worth noting that the update of survey parameters is based primarily on the actual performance of JMI, while the CNN classifier has a supporting role in the proposed scheme.
The iterative procedure stops when the objective function is sufficiently small, or when the number of generations reaches the pre-defined number of generations n g . The potential risk of our approach is that we may end up in some local minimum. Fortunately, since we are dealing with survey design, this is still acceptable as long as the resulting design is efficient and leads to the required image quality. It should be noted that various constraints on the blending and sampling operators can be imposed within the genetic operators in order to prevent the generation of any undesired solutions, such as operationally infeasible designs. It is also noteworthy that the proposed optimization scheme is not a mandated choice while our survey-design workflow can accommodate different methods. Both metaheuristics and neural networks can be flexibly modified and designed, enabling us to adapt them for a problem-specific task. Therefore, a user can freely select a preferred framework for a given survey-design problem.

N U M E R I C A L E X A M P L E S
We numerically simulated four acquisition scenarios: one representing standard blended acquisition and the other three incorporating the dispersed source array (DSA) concept. Table 1 summarizes the spatial-sampling schemes used in this study. In the standard scenario, detectors and sources are regularly deployed, while in the other three the geometries are irregular. Additionally, the DSA scenarios use fewer sources than the standard one. Figure 3 and Table 2 compare the source properties of the standard and DSA scenarios. The standard scenario uses a spatially uniform source signature with a wide frequency range. On the contrary, the DSA scenarios employ three source types, each having a dedicated narrow frequency bandwidth and a spatial-sampling scheme according to its frequency range. This illustrates that the DSA scenarios emit significantly less energy in both space and frequency. Note   that we applied dedicated low-frequency sources, called DSA source type 1, whose frequency range is not well covered by the source used in the standard scenario ( Fig. 3 and Table 2). The DSA concept allows a lower frequency source to be more coarsely sampled and a higher frequency source to be more densely sampled. This effectively prevents both oversampling of the lower frequencies and undersampling of the higher frequencies (Berkhout 2012a;Caporal et al. 2018). Figure 4 shows a blended shot record from the standard scenario and one from a DSA scenario that exemplify our blending and spatial-sampling schemes. Unlike the standard scenario shown in Fig. 4(a), sources with different frequency responses are blended in the DSA scenario shown in Fig. 4(b). Our synthetic data contain both primaries and internal mul-tiples. These data are directly fed into the joint migration inversion (JMI) process to obtain reflectivity and velocity estimates, that is, without deblending and reconstruction of missing traces.
For comparison purposes, we provide three DSA scenarios, two of which are obtained from blending and sampling operators created by 800 realizations of uniformly distributed, random variables. We derived the probability density function (PDF) by kernel density estimation using the 800 realizations. We show one result, called 'P50', having the mode value in the estimated PDF, which we assume to be representative for the situation where we rely on a single random realization to embed irregularity into blending and sampling operators. In addition, we provide the best result, 'P1', which is assumed to represent the outcome of a Monte Carlo approach. The previously mentioned three scenarios (standard, P50 and P1) are compared with the result from our optimized design. Figure 5 shows the acquisition configurations applied to the standard and three DSA scenarios, respectively. In our examples, the detectors and sources are placed at the surface (z d = z s = 0 m). The detectors are stationary, making the maximum offset in the data 2000 m. All scenarios use the same number of detectors yet they are deployed differently. In each record of this study, two active sources are blended with a 1000 m distance separation having different activation  In the standard scenario, all sources employ the same signature with irregular activation times and with a regular detector interval. In three DSA scenarios, blue, green and red circle markers with different marker sizes correspond to DSA source types 1, 2 and 3, respectively. Activation times of these DSA sources are irregular with an irregular detector interval.
times ranging from 0 to 256 ms. In the DSA scenarios, three types of source units are irregularly distributed according to the requirements defined in Table 2. Figure 6(a,b) shows the true subsurface properties used in this study in terms of reflectivity and velocity, respectively. The model contains a lens-shaped high-velocity body above three weak horizontal reflectors. Figure 6(c,d) shows the initial reflectivity model, which involves no contrast, and the initial velocity model, which contains a gradient only. The estimation by JMI starts with these models having (almost) no indication of true geological features. Figure 7(a,b) shows the JMI results from the standard acquisition design. Although some oblique lineaments are still recognizable below the lens body, both reflectivity and velocity models are estimated reasonably well. Figure 7(c,d) shows the JMI results from P50. This scenario apparently accentuates linear artefacts, leading to some jitter on the three horizontal reflectors. The lateral velocity variation, particularly beneath the high-velocity lens, adversely affects the kinematics of wave propagation. This explains the undesired structural undulations on the three reflectors. Compared to P50, some improvements are observable in P1 (Fig. 7e,f). However, it still is hard to find a justifiable rationale for the applied DSA scheme compared to the standard one in terms of the JMI quality. The optimized design, however, attains notable enhancement in the JMI results (Fig. 7g,h). The lens-shaped body can be easily delineated in both reflectivity and velocity estimates. Reduction of artefacts improves the coherency of the reflectors. The optimized design also achieves a robust estimation of the velocity model, which enables all the reflectors to be recovered close to their actual locations. Even edge effects are well reduced in the optimized design compared to the other three scenarios. In addition to the observations made from reflectivity and velocity estimates, Figure 8 quantitatively differentiates the overall data quality of each scenario. Figure 8(a) shows a cross-plot of J R against J W from 800 random realizations. The colours of the circle markers indicate the number of realizations. The blue, green, red and cyan squares represent the standard, P50, P1 and optimized design, respectively. The three dashed lines in Fig. 8(a) are constant probability density contours each of which represents the boundary of the area that contains a certain percentage (25%, 50% and 75% from the inner to the outer contour) of the estimated PDF. It shows a close to unimodal distribution indicating an increase in data points towards a single peak in the J R − J W space. This implies that P50 obtained from the mode value in the PDF is expected to reasonably represent the anticipated data quality in the case where we use a single random realization to design blending and sampling operators. If it is assumed that the PDF based on the outcome of our Monte Carlo optimization procedure is correct, we observe that the cumulative probability of the objective-function values from our optimized design turns out to be smaller than 10 −13 . Statistically, this suggests that an enormous number of random realizations are needed to reach a result that is equivalent to our optimized design. On the other hand, our workflow is capable of obtaining it with 800 realizations. Figure 8(b) shows a cross-plot of J R against J W from our approach. This clearly demonstrates that the proposed workflow effectively and efficiently minimizes both J R and J W through the course of iterations. The optimized DSA acquisition scenario consequently leads to proper reflectivity and velocity estimates even compared to the standard design.
As described previously, we performed a fivefold crossvalidation at every 50 realizations to evaluate classification accuracies while training the convolutional neural network (CNN) (Fig. 9). A vertical error bar indicates the minimum and the maximum accuracies obtained from each validation. Blue and red circle markers represent the mean value from five cross-validations for training and testing, respectively. Although we adjusted the regularization strength applied on each weight in the network and the number of epochs to obtain a reliable model, due to the insufficient number of samples the testing results still exhibit some indication of overfitting at the early stage of the iterative procedure. Since we altered the threshold criterion and some parameters within the CNN as mentioned previously, the validation results among different stages are not directly comparable. Nevertheless, the classification performance evidently improves through the course of iterations. Additionally, the difference in accuracies between training sets and testing sets becomes insignificant after a couple of hundred realizations, where the classification achieves accuracies well above 90% for both training and testing sets with a small discrepancy between the minimum and the maximum values. This indicates that our network architecture, along with the chosen hyperparameters, manages the bias-variance trade-off reasonably well and successfully relates the survey parameters to the resultant JMI quality.

D I S C U S S I O N
Insufficiencies in quality and quantity of available subsurface information potentially cause uncertainty in R and W used to optimize survey parameters in the proposed workflow although its degree varies with the situations. In this respect, we investigated the effect of this uncertainty on the joint migration inversion (JMI) results. Figure 10 The blue, green, red and cyan squares are the results from the standard, P50, P1 and optimized design, respectively. In (a), the three dashed lines are constant probability density contours from kernel density estimation each of which represents the boundary of the area that contains a certain percentage of data points (25%, 50% and 75% from the inner to the outer contour).
subsurface model to employ different R and W. For each arbitrary derived subsurface model, we performed full-wavefield modelling (FWMod) to simulate four different P using the four survey designs discussed earlier (standard, P50, P1 and optimized). We then applied JMI to these four datasets to obtain different reflectivity and velocity estimates. As shown in Fig. 10(b), this procedure was applied to all the subsurface models. Figure 11 shows a comparison among the JMI results from the four different acquisition scenarios for each subsurface model. The vertical axes of Fig. 11(a,b) indicates relative differences in J R and J W , respectively, with respect to the corresponding objective-function values from the optimized design. A negative value indicates that the optimized design attains the smaller misfit. The figure clearly shows that our optimized design achieves proper reflectivity and velocity estimates for all the models. Although certain variations can be observed among different subsurface models, the relative relationship among the four designs in terms of JMI quality still holds. For the reflectivity estimation, the standard and optimized designs attain comparable results. Next, P1 produces a somewhat less optimal outcome, followed by P50. For the velocity estimation, a distinct improvement can be seen in the optimized design. It clearly outperforms the standard design as well as the P1 design, the two of which provide almost comparable results. Then, the P50 result is suboptimal for all the subsurface models. Hence, this suggests that blending and sampling operators derived from our approach can still provide optimum JMI results even when prior subsurface properties used for the survey-design scheme are not precisely known. One of the unique properties of JMI is its ability to explain and utilize multiple reflections. They can provide subsurface illumination at different angles from the same shot and may carry information on areas that primaries cannot reach (Berkhout 2012b;Verschuur and Berkhout 2015).  Internal multiples are also capable of illuminating the subsurface in downward and upward directions (Davydenko and Verschuur 2013). By treating multiples as signal, they can also assist primaries in the subsurface illumination, provided that proper processing algorithms are implemented. For instance, Berkhout and Verschuur (2016) illustrated a contribution of multiples by comparing JMI results with and without the use of multiples. Seismic surveys are often designed to sample primary reflections. On the contrary, less emphasis is given to multiples as they are to be attenuated in the subsequent processing. Exploiting features of multiples potentially allows us to ease the sampling requirements in data acquisition (Verschuur and Berkhout 2015). Alternatively, it can enhance the subsurface illumination with the same acquisition footprint (Kumar et al. 2016). The proposed workflow, aimed at improving the JMI results, then enables us to derive acquisition scenarios that can account for the effect of both primary and multiple reflections in the area of interest.
In this study, we utilize the standard implementation of JMI based on Staal and Verschuur (2013) and Staal (2015), which disregards certain subsurface characteristics as mentioned previously. In this respect, some recent studies have been carried out towards exploring phenomena which are not explained in this standard framework such as mode conversion (Berkhout 2014b), angle-dependent reflectivity , horizontal wave propagation (Masaya and Verschuur 2017) and anisotropy (Alshuhail and Verschuur 2019). The research has been also extended to other perspectives such as applications to 3D data (El-Marhfoul and Verschuur 2016), time-lapse data (Qu and Verschuur 2017) and integration of JMI and full-waveform inversion (FWI) . Furthermore, the presented numerical examples were designed fundamentally to demonstrate the proposed survey-design scheme. To this end, a simple 2D subsurface model was chosen which does not reflect the geological complexity encountered in actual fields. Hence, our future work is aligned with the recent developments in JMI coupled with the use of more complex and realistic 3D subsurface scenarios. To examine the effect of acquisition design on the JMI result, our study employed the idealized situation comprising FWMod as a forward-modelling tool and JMI as a property estimation tool. It is worth noting that the proposed approach is not necessarily limited to this implementation as the contents within the overall scheme can be freely and flexibly altered depending on the needs of users. A possible option would be to replace them with different methodologies, for example, different forward-modelling schemes such as a finite-difference modelling as well as different velocity estimation and/or imaging algorithms such as FWI and least-squares migration (LSM). Our approach would then be able to design a seismic survey that can meet the quality requirements of the applied process(es).
This study demonstrates that optimally designed survey parameters lead to JMI results that are better than the results obtained with the standard scenario. Alternatively, the same results could have been obtained in a more efficient and cost-effective manner. Furthermore, the implementation of dispersed source array (DSA) is of help in contributing to a health, safety and environment (HSE) perspective (Caporal et al. 2018). The emission of acoustic energy may incur a potential environmental risk, particularly in marine surveys. Sound pressure level (SPL) and sound exposure level (SEL) are of primary concern to determine the effects of an acoustic source on the marine environments, in particular on marine mammals. Airgun clusters that generate an impulsive signal are widely used in the industry. As a broad frequency range of acoustic energy is instantaneously generated, the technique inevitably accentuates the peak pressure. Furthermore, these conventional marine sources inevitably emit highfrequency components, for example, above 100 Hz, which are normally discarded in seismic imaging yet significantly overlap with the hearing ranges of odontocetes and pinnipeds (Southall et al. 2007). As illustrated in Fig. 3, the use of dedicated narrowband sources decreases both peak amplitude and total energy for each shot. It is also capable of preventing a seismic survey from the emission of unnecessary frequency components, yet of acquiring the information needed to characterize the subsurface. Hence, the technique contributes to reducing both SPL and SEL. The optimized DSA scenario uses reduced source locations without adversely affecting the data quality. These elements lower the total energy accumulated over the survey duration, known as cumulative SEL. Our optimization scheme is therefore capable of designing an efficient, cost-effective and environmentally favourable seismic survey that also enhances the performance of JMI.

C O N C L U S I O N
The survey-design workflow introduced in this study simultaneously optimizes parameters that determine source blending and the spatial sampling of detectors and sources. The numerical examples in this study demonstrate that the joint migration inversion (JMI) results can vary with the design of survey parameters. Optimally designed parameters lead to the enhancement of both reflectivity and velocity models estimated directly from blended and irregularly sampled data. The proposed approach integrates a genetic algorithm (GA) and a convolutional neural network (CNN) to make the optimization of survey designs feasible within an affordable computation time. Genetic operators coupled with non-dominated sorting are capable of minimizing errors in reflectivity and velocity models by simultaneously updating the survey parameters. The workflow can effectively deal with complex blending and spatial-sampling schemes that employ the dispersed source array (DSA) concept. The neural network architecture applied to this study successfully relates the performance of JMI to the choice of survey parameters. The classification through a CNN helps us to minimize the computation of suboptimal designs while keeping the optimal ones into the iterative scheme. The proposed approach provides survey parameters to enhance the results of JMI that have been obtained by directly processing blended and irregularly sampled data without the need for deblending or data reconstruction. The resultant acquisition scenarios allow us to optimally estimate subsurface properties at an affordable cost and with a low environmental footprint.

A C K N O W L E D G E M E N T S
The authors would like to express their sincere appreciation and gratitude to all the sponsors of DELPHI consortium and INPEX Corporation for their continuous support.