A new method of smoothness‐constrained magnetotelluric modelling with the utility of Pareto‐optimal multi‐objective particle swarm optimization

Particle swarm optimization, one of the modern global optimization methods, is attracting widespread interest because it overcomes the difficulties of conventional inversion techniques, such as trapping in a local minimum and/or dependence on an initial model. The main characteristic of particle swarm optimization is its large parameter search space, which in principle allows the entire objective function space to be explored if the input parameters are properly chosen. However, in a high-dimensional model space, the numerical instability of the solution may increase and lead to unrealistic models and misinterpretations because of the sampling problem of particle swarm optimization. Therefore, smoothness-constrained regularization of the objective function or model reduction techniques are required to stabilize the solution. However, weighting and combining objective function terms is partly a subjective process, as the regularization parameter is generally chosen according to some criterion of how the smoothing constraints affect the data misfits. This means that it cannot be completely predefined but needs to be adjusted during the inversion process, which begins with the response of an initial model. In this paper, a new modelling approach is proposed to obtain a smoothness-constrained model from magnetotelluric data utilizing multi-objective particle swarm optimization based on the Pareto optimality approach, without using a regularization parameter or combining several objective function terms. The presented approach was verified on synthetic models and on a field data set from the Çanakkale-Tuzla geothermal field in Turkey. Findings from these analyses confirm the usefulness of the method as a new approach for constrained inversion of geophysical data without the need to combine objective function terms weighted by a regularization parameter.


INTRODUCTION
A good way to avoid conventional inversion techniques that depend on an initial model and partial derivatives is to use global optimization methods such as particle swarm optimization (PSO), which is widely considered an alternative capable of overcoming these drawbacks without being trapped in a local minimum (e.g. Fernández Martínez et al., 2010; Godio & Santilano, 2018; Pallero et al., 2015, 2018, 2021; Peksen et al., 2014; Shaw & Srivastava, 2007; Yuan et al., 2009). The overall success of PSO, as of other global optimization methods such as the genetic algorithm (Goldberg & Holland, 1988), the neighbourhood algorithm (Sambridge, 1999) and simulated annealing (Kirkpatrick et al., 1983), relies on an extensive search in the parameter space, such that, theoretically, almost the entire objective space is explored if the input arguments are properly chosen. On the other hand, a high-dimensional model space may have an ill-posed character, reduce the uniqueness and numerical stability of the solution and, if deterministic inversion techniques are used, lead to structural complexity in the final models that is not physically meaningful (Fernández Martínez et al., 2012). In this case, the solution sampling of the PSO is also limited by the curse of dimensionality, and not all dimensions of the model space are generally represented by the observed data (Curtis & Lomax, 2001; Fernández Martínez et al., 2012). Therefore, some type of regularization in the objective function (Godio & Santilano, 2018; Pace et al., 2019b; Pace et al., 2021; Schnaidt et al., 2018) or model reduction in the parameter space (Fernández Martínez et al., 2012) is required to guide the solution sampling.
There are numerous works in which a weighted regularization term, generally expressed by the spatial gradient of the model parameters, is used to increase the stability of the solution for ill-posed inversion problems (Aster et al., 2018; Giraud et al., 2021; Godio & Santilano, 2018; Xiang et al., 2017). The smoothness regularization term, which does not allow abrupt changes in the parameter model, limits the enormous possible variations in the estimated parameters (Aster et al., 2018; Zhdanov, 2018). Two conventional methods, namely (1) generalized cross-validation, proposed by Haber and Oldenburg (2000), and (2) the L-curve criterion, proposed by Hansen (1998), are generally used to find an optimal regularization parameter that stabilizes the solution and are often implemented for spatial smoothness regularization. They use the trade-off that generally occurs between resolution in data fitting and model variance to define the regularization parameter (Guo et al., 2011). However, a regularization parameter determined by such a criterion is to some extent a subjective choice (Wagner & Uhlemann, 2021). For example, a starting model and its response that are used to initiate the inversion process can significantly affect the convergence behaviour, and thus the most appropriate regularization parameter can change considerably during iteration, especially when the data are noisy. In particular, if the noise in the observations is not well known, automatically determining an appropriate value for the regularization parameter can be problematic. Furthermore, the use of subjective and inappropriate weights for the objective function terms can lead to misleading results and misinterpretation (Büyük et al., 2020; Kozlovskaya et al., 2007).
The Pareto optimality approach, introduced by Moore (1897) for economic problems, provides an optimal solution set among trade-off solutions in multi-objective optimization (Baumgartner et al., 2004). If this approach is integrated into global optimization algorithms, joint inversion of geophysical data sets can be carried out to obtain an optimal solution set free of the aforementioned drawbacks of conventional inversion techniques, without weighting and combining data misfit terms obtained from different data sets. A number of studies, such as Akca et al. (2014), Büyük et al. (2020), Dal Moro (2010), Kozlovskaya et al. (2007), Paasche and Tronicke (2007), Pace et al. (2019a), Schnaidt et al. (2018) and Tronicke et al. (2011), have utilized the Pareto optimality approach integrated with global optimization methods in various joint inversions of different geophysical data sets. Despite this interest, to the author's knowledge, no one has investigated the practicability of a Pareto-integrated global optimization method for constrained inversion without the need for a regularization parameter.
This paper describes a new approach to smoothness-constrained one-dimensional modelling of magnetotelluric data by taking advantage of the integration of PSO and the Pareto optimality approach. In contrast to Cui et al. (2020) and Godio and Santilano (2018), who successfully performed one-dimensional modelling of magnetotelluric data with combined single-objective PSO, this paper proposes modelling magnetotelluric data without a regularization term, and hence without weighting a combination of objective function terms, by utilizing multi-objective PSO (MOPSO) based on the Pareto optimality approach (henceforth referred to as Pareto-MOPSO). As a first step, synthetic analyses were performed with noise-free and noisy data to confirm the applicability of the approach. These tests highlighted that the Pareto-MOPSO approach reproduces the synthetic models in good agreement with their responses, even when a high number of parameters and a large search space are used for the electrical conductivity, which varies by many orders of magnitude. The second step was to compare these outcomes with the results obtained using single-objective PSO with the same parameterization and search space. Although the single-objective PSO successfully fitted the data, the model obtained was slightly different and contained a noticeable artefact layer. These findings highlight the importance and necessity of using Pareto-MOPSO to retrieve the produced model with the defined parameterization and search space when a swarm optimization technique is used. The results of these analyses strengthened my confidence in MOPSO, and I modelled the magnetotelluric field data set obtained in the Çanakkale-Tuzla geothermal field in Turkey. The resistivity-depth sections obtained from modelling the field data set were also compatible with the well logs close to the field. The analyses of the field data confirmed that the presented approach can be used not only for joint inversion of different data sets but also for constrained inversions, by minimizing the objective function terms without estimating any regularization parameter.

OBJECTIVE FUNCTION AND SMOOTHNESS REGULARIZATION CONSTRAINT
In geophysical inversion, the goal is to estimate the best set of model parameters that represents the data. As it is generally difficult to estimate the true model in the set of possible solutions, the fundamental concept is to minimize the objective function, which corresponds to finding the maximum likelihood of a probability density function. These concepts are summarized below; more details can be found in handbooks such as Aster et al. (2018) and Tarantola (2005). Let $P_d(\mathbf{d}|\mathbf{m})$ be the probability density function of the data misfit of the data vector $\mathbf{d} = (d_1, d_2, d_3, \ldots, d_N)$ and the model vector $\mathbf{m} = (m_1, m_2, m_3, \ldots, m_M)$ that holds the physical property values in the $M$-dimensional model space. Let also $P_j(\mathbf{m})$ be the probability density function of the $j$th type of constraint, which is a measure of a property of the model. Thus, the probability density function is defined as follows:

$$P(\mathbf{m}|\mathbf{d}) \propto P_d(\mathbf{d}|\mathbf{m}) \prod_j P_j(\mathbf{m}). \quad (1)$$

Assuming that a Gaussian probability density best approximates the solution, the probability density function of the data misfit can be expressed as follows:

$$P_d(\mathbf{d}|\mathbf{m}) \propto \exp\left(-\frac{\mu_d}{2}\, \big\| \mathbf{W}_d \left(\mathbf{d} - g(\mathbf{m})\right) \big\|_2^2 \right), \quad (2)$$

where $g(\mathbf{m})$ is the vector of model responses obtained from a forward modelling operator $g$, $\mathbf{W}_d$ is the weighting matrix representing the standard deviations of the noise in the data and $\mu_d$ is a parameter describing the relative importance of the data misfit term, which can be derived from the probability density function defined by the uncertainties of the measurements (Tarantola, 2005). Similarly, the constraint terms can be expressed as follows:

$$P_j(\mathbf{m}) \propto \exp\left(-\frac{\mu_j}{2}\, \big\| \mathbf{W}_j f_j(\mathbf{m}) \big\|_2^2 \right), \quad (3)$$

where $f_j(\mathbf{m})$ is a function given by the model parameters or prior knowledge, $\mathbf{W}_j$ is the weighting matrix of the model parameters of the discretized Earth model and $\mu_j$ describes the relative importance of the $j$th type of constraint, which can be inferred from the prior knowledge described by the probability density of the model parameters or set according to the purpose of the study (Giraud et al., 2021; Tarantola, 2005). Equations (2) and (3), corresponding to the maximization of the probability density function $P(\mathbf{m}|\mathbf{d})$, are equivalent to the minimization of its negative logarithm, called the objective function $\Phi(\mathbf{m}|\mathbf{d})$, in the following way:

$$\Phi(\mathbf{m}|\mathbf{d}) = \frac{\mu_d}{2}\, \big\| \mathbf{W}_d \left(\mathbf{d} - g(\mathbf{m})\right) \big\|_2^2 + \sum_j \frac{\mu_j}{2}\, \big\| \mathbf{W}_j f_j(\mathbf{m}) \big\|_2^2. \quad (4)$$

If the $j$th type of constraint is considered as a unit-weighted smoothness term, the second term on the right-hand side of Equation (4) is generally given by a gradient of the model parameters and can be expressed as follows:

$$\Phi_m(\mathbf{m}) = \big\| \mathbf{C}_m \nabla \mathbf{m} \big\|_2^2, \quad (5)$$

where $\mathbf{C}_m$ is the model covariance vector, which modulates the contribution of each layer according to its importance (Giraud et al., 2021). Smoothness, one type of regularization, is the constraint most commonly used in the inversion of geophysical data. The smoothness term, in the form originally proposed by Rudin et al. (1992), is generally used to reduce structural complexity and approximate realistic geological features, similar to total-variation regularization.
If the unit-weighted data misfit term is expressed as $\| \mathbf{W}_d (\mathbf{d} - g(\mathbf{m})) \|_2^2 = \Phi_d$, Equation (4) can be simplified as follows:

$$\Phi = \Phi_d + \lambda\, \Phi_m, \quad (6)$$

where $\lambda$ on the right-hand side of the equation is the regularization parameter that balances the data misfit and constraint terms (Farquharson & Oldenburg, 2004). Equation (6), a weighted sum of objective function terms with the regularization parameter as the relative weight, is the objective function commonly used in traditional constrained inversion.
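To make the role of the regularization parameter concrete, the weighted-sum objective of Equation (6) can be sketched in a few lines of Python. This is an illustrative sketch, not the author's implementation: the function name and the unit-weighted first-difference smoothness are assumptions based on Equations (5) and (6).

```python
import numpy as np

def combined_objective(d_obs, g_m, W_d, m, lam):
    """Weighted-sum objective of Equation (6): Phi = Phi_d + lambda * Phi_m.

    d_obs: observed data; g_m: forward response g(m);
    W_d: data weights (inverse standard deviations);
    m: model vector (e.g. layer log-resistivities);
    lam: the regularization parameter (trade-off weight)."""
    phi_d = np.sum((W_d * (d_obs - g_m)) ** 2)   # ||W_d (d - g(m))||_2^2
    phi_m = np.sum(np.diff(m) ** 2)              # unit-weighted smoothness of m
    return phi_d + lam * phi_m
```

The subjectivity discussed above lives entirely in `lam`; the Pareto-MOPSO approach proposed in this paper minimizes the two terms as separate objectives instead.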

PARTICLE SWARM OPTIMIZATION
Particle swarm optimization (PSO) has received much attention in the last decade for inversion problems of geophysical data, such as DC resistivity data (Fernández Martínez et al., 2010; Peksen et al., 2014; Shaw & Srivastava, 2007), self-potential data (Monteiro Santos, 2010; Peksen et al., 2011), gravity data (Essa et al., 2021; Pallero et al., 2015), time-domain electromagnetic data (Amato et al., 2021; Pace et al., 2021), magnetotelluric data (Godio & Santilano, 2018; Karcıoğlu & Gürer, 2019; Pace et al., 2019b), magnetic data (Essa & Elhussein, 2018; Liu et al., 2018) and seismological data (Song et al., 2012). Pace et al. (2021) also published a recent review indicating the growing body of literature on this topic. PSO, introduced by Kennedy and Eberhart (1995), is a stochastic global optimization technique inspired by the simulation of a swarm of birds or fish. Each individual in a swarm, called a particle, is represented as a vector of model parameters in a feasible model space and takes a position projected onto the objective space according to this vector. Particles take positions in the model space and learn their next positions from their relations with all other particles, so that knowledge is continuously shared within the swarm. Along the swarm trajectory, each particle changes its position with a velocity vector in the model space at each iteration and converges to a global minimum without being trapped in a local minimum in the objective space. If the vector of model parameters is expressed as $\mathbf{x}$ and the objective function as $\Phi(\mathbf{x})$, the particles change their positions in the model space according to $\Phi(\mathbf{x})$ at each iterative solution and converge to the global minimum in the objective space, as illustrated in Figure 1.
In the implementation of the PSO algorithm, the initial particle velocities are usually set to zero and the model parameters are randomized within the parameter search space. The particle that has the best fit among all particles, that is, the lowest root mean square misfit between the observed data and the model responses, is set as the global leader. The next position of each particle is calculated from its velocity vector

$$v_i^{k+1} = \omega\, v_i^k + c_1 r_1 \left( x_{\mathrm{pbest},i} - x_i^k \right) + c_2 r_2 \left( x_{\mathrm{gbest}} - x_i^k \right) \quad (7)$$

and position vector

$$x_i^{k+1} = x_i^k + v_i^{k+1}, \quad (8)$$

where subscript $i$ is the particle number, superscript $k$ is the iteration number, $x_i^k$ is the position of the $i$th particle at the $k$th iteration, $v_i^k$ is the velocity vector of the particle, $\omega$ is the inertia weight term, $c_1$ and $c_2$ are usually referred to as 'acceleration factors' (e.g. Engelbrecht, 2007; Kennedy, 2006), $r_1$ and $r_2$ are uniform random numbers in the range [0, 1], $x_{\mathrm{pbest},i}$ is the best past position of particle $i$ and $x_{\mathrm{gbest}}$ is the best past position of all particles.
The new position of each particle is obtained by adding the new velocity vector $v_i^{k+1}$ to the previous position of the particle $x_i^k$, as expressed in Equation (8). If the updated position of a particle produces a better fit than in the previous iteration, the new position is stored as $x_{\mathrm{pbest}}$. The particle is set as $x_{\mathrm{gbest}}$ if its updated position is the best among all particles. This process is repeated (1) until the objective function $\Phi$ becomes less than or equal to a threshold set by the user depending on the optimization problem or (2) until the maximum number of iterations defined by the user is reached (Engelbrecht, 2007). Ozcan and Mohan (1999) expressed concerns that PSO may get trapped in a local minimum due to premature convergence and that the PSO algorithm has performance problems. In premature convergence, some particles converge to a locally best solution that is not the globally best solution in the swarm, and other particles then mistakenly focus on this local solution without exploring other parts of the search space. Clerc (1999) and Clerc and Kennedy (2002) proposed a constriction factor $\chi$ applied to the velocity vector in Equation (7), which yields rapid convergence without premature convergence by balancing exploration (global search) and exploitation (local search). Accordingly, the constriction factor is expressed as

$$\chi = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \quad (9)$$

under the condition that

$$\varphi = c_1 + c_2, \qquad \varphi > 4, \quad (10)$$

where $c_1$ and $c_2$ are the acceleration factors. If the inertia weight $\omega$ is then replaced by $\chi$, the velocity vector becomes

$$v_i^{k+1} = \chi \left[ v_i^k + c_1 r_1 \left( x_{\mathrm{pbest},i} - x_i^k \right) + c_2 r_2 \left( x_{\mathrm{gbest}} - x_i^k \right) \right]. \quad (11)$$

In the PSO method, the change of the velocity vector may cause a particle to explode to large values if the particle is far from the global and local best positions. Therefore, I used the velocity-limiting approach to particle motion proposed by Fan and Shi (2001) to avoid divergence of the particles. This approach limits the velocities as follows: $v_{\max} = +\delta$ and $v_{\min} = -\delta$, where $\delta = (x_{\max} - x_{\min})/\kappa$. Here $x_{\max}$ and $x_{\min}$ are user-defined scalar values that indicate the upper and lower limits of a particle, and $\kappa$ is an interval number defined by the user considering $x_{\max}$ and $x_{\min}$.
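A minimal sketch of the constriction-factor update of Equation (11), combined with the velocity-limiting rule above, might look as follows (function and variable names, and the array shapes, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_velocity_update(x, v, pbest, gbest, x_min, x_max,
                        c1=2.05, c2=2.05, kappa=10):
    """Constriction-factor velocity update (Equation 11) with the
    velocity-limiting rule of Fan and Shi (2001).

    x, v: positions and velocities, shape (n_particles, n_params);
    pbest: per-particle best positions; gbest: swarm best position."""
    phi = c1 + c2                                            # phi = 4.1 > 4
    chi = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4.0*phi))   # ~0.7298
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = chi * (v + c1*r1*(pbest - x) + c2*r2*(gbest - x))
    delta = (x_max - x_min) / kappa                          # velocity bound
    return np.clip(v_new, -delta, delta)                     # limit velocities
```

With c1 = c2 = 2.05 this yields φ = 4.1 and χ ≈ 0.7298, the values adopted later in the 'Particle swarm optimization parameters' section.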

MULTI-OBJECTIVE PARTICLE SWARM OPTIMIZATION BASED ON THE PARETO OPTIMALITY APPROACH
In particle swarm optimization (PSO), geophysical data can be modelled with an objective function consisting of a single misfit axis. In joint modelling, that is, modelling of more than one data set, however, the optimization problems of the different data sets are expressed as separate objective functions, such that a trade-off must be made between these objective functions. It is well known that Pareto optimality (Baumgartner et al., 2004; Pareto, 1896) is one of the most successful criteria for analysing the variation between the objectives in such a trade-off, and Figure 2 schematically illustrates the Pareto front representing a trade-off between objective functions. According to the Pareto optimality approach, the vector of model parameters $\mathbf{m}$ in the $M$-dimensional model space is changed such that the misfit axes of the $N$-dimensional objective function are minimized simultaneously. In general, it is hard to find a single vector that minimizes all misfit axes simultaneously. Therefore, a set of solutions satisfying Pareto optimality can be determined instead.
Given two model vectors $\mathbf{m}_a$ and $\mathbf{m}_b$ in a minimization problem, $\Phi(\mathbf{m}_a)$ yields a better solution than $\Phi(\mathbf{m}_b)$, meaning that $\mathbf{m}_a$ dominates $\mathbf{m}_b$, if $\Phi_i(\mathbf{m}_a) \leq \Phi_i(\mathbf{m}_b)$ for all $i = 1, \ldots, N$ in the $N$-dimensional objective function and $\Phi_i(\mathbf{m}_a) < \Phi_i(\mathbf{m}_b)$ for at least one $i$. If there is no other model vector that dominates $\mathbf{m}_a$, then $\mathbf{m}_a$ is a non-dominated (Pareto-optimal) solution, and the set of all such solutions, $P^*$, forms the Pareto front illustrated in Figure 2. The solution closest to the origin (0, 0) may be considered the preferred Pareto-optimal solution within $P^*$ (Baumgartner et al., 2004; Reyes-Sierra & Coello Coello, 2006; Schnaidt et al., 2018).
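The dominance relation defined above is simple to implement; a minimal illustrative sketch for minimization problems (names are not from the paper):

```python
def dominates(phi_a, phi_b):
    """True if objective vector phi_a Pareto-dominates phi_b:
    no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(phi_a, phi_b))
            and any(a < b for a, b in zip(phi_a, phi_b)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

In MOPSO, such a front would be extracted from the misfit pairs (Φ_d, Φ_m) of the archived solutions at each iteration.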
Several considerations need to be taken into account during the multi-objective PSO (MOPSO) process, namely promoting diversity, maintaining non-dominated solutions and selecting a leader (Reyes-Sierra & Coello Coello, 2006). In contrast to a single global best position leading the swarm in PSO, in MOPSO one of the non-dominated solutions, indicating the presence of a set of optimal solutions, is selected as the leader. The selection of the leader from the non-dominated solutions is thus one of the most important steps of the method. The two other steps are promoting diversity and maintaining the non-dominated solutions. Promoting diversity properly is important to prevent premature convergence (i.e. convergence to a local minimum), which is one of the characteristics of the PSO algorithm; it increases the exploration capabilities of the algorithm and thus controls the convergence. In the maintenance phase, the goal is to keep a limited number of non-dominated solutions, stored in an external archive, with respect to previous swarms during the search process. Proper maintenance of the non-dominated solutions is therefore another important phase for achieving comparable diversity and convergence of the solutions to the Pareto front (Reyes-Sierra & Coello Coello, 2006). The implementation of the MOPSO algorithm in pseudocode notation is shown in Figure A1. At the end of the optimization, criteria such as the generational distance, the spacing and the deviation angle between the obtained non-dominated solutions (see the 'Evaluation of the solutions' section) may be considered to quantitatively evaluate the performance of the algorithm and the solutions. In addition, the deviation angle criterion is used to evaluate the compatibility of the objective function terms. For more details on the methods used to improve the quality of the solutions during the MOPSO process, to evaluate the compatibility of the objectives and to assess the performance of the algorithm in the final iteration, see previous studies (Büyük, 2021; Coello Coello et al., 2004; Dal Moro, 2010; Pace et al., 2019a; Pace et al., 2021; Reyes-Sierra & Coello Coello, 2006; Schnaidt et al., 2018).

Synthetic data example
Figure 3 shows synthetic apparent resistivity and phase angle data for twelve periods from $10^{-3}$ to $10^{2}$ s obtained from a synthetic, isotropic 1D electric resistivity model (henceforth referred to as SM-1). The synthetic apparent resistivity and phase angle curves were calculated from the effective impedance tensor proposed by Berdichevsky et al. (1989) and were generated using Wait's (1954) recursion formula for one-dimensional magnetotelluric responses, which was implemented as the forward modelling in the particle swarm optimization (PSO) and multi-objective PSO (MOPSO). Coen et al. (1983), Parker (1983), Vozoff (1990) and Weaver and Agarwal (1993) used one-dimensional magnetotelluric inversions in which the resulting model structures have a continuous resistivity-depth function with many layers, as opposed to a few layers that can suppress important structures. In this case, however, the problem can be underdetermined, with more unknown model parameters than data points, requiring prior knowledge or regularization to control the excessive variance of the parameters and keep the solutions stable (Constable et al., 1987; Godio & Santilano, 2018; Guo et al., 2011; Pace et al., 2021). Therefore, 16 layers were used in SM-1, with thicknesses increasing logarithmically from 100 m to 7 km with depth (Figure 3).

Figure 4 caption: Magnetotelluric stations and borehole logs over a simplified geological base map, reconstructed from Akal (2013) and Karacık and Yılmaz (1998).
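Wait's recursion for the one-dimensional magnetotelluric response can be sketched as below. This is the standard textbook recursion, not the author's code; for a uniform half-space it returns an apparent resistivity equal to the half-space resistivity and an impedance phase of 45°.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)

def mt1d_forward(rho, h, periods):
    """1D magnetotelluric forward response via Wait's impedance recursion.

    rho: layer resistivities in ohm-m (length N, last = basal half-space);
    h: layer thicknesses in m (length N - 1); periods: list of periods (s).
    Returns apparent resistivity (ohm-m) and impedance phase (degrees)."""
    rho = np.asarray(rho, float)
    h = np.asarray(h, float)
    rho_a, phase = [], []
    for T in periods:
        w = 2.0 * np.pi / T
        k = np.sqrt(1j * w * MU0 / rho)   # propagation constant per layer
        z = rho * k                       # intrinsic impedance per layer
        Z = z[-1]                         # start from the basal half-space
        for j in range(len(rho) - 2, -1, -1):
            t = np.tanh(k[j] * h[j])      # recurse upward through the stack
            Z = z[j] * (Z + z[j] * t) / (z[j] + Z * t)
        rho_a.append(abs(Z) ** 2 / (w * MU0))
        phase.append(np.degrees(np.angle(Z)))
    return np.array(rho_a), np.array(phase)
```

For a conductive layer over a resistive basement, the short-period response recovers the top-layer resistivity and the long-period response is controlled by the basement, as expected for this type of curve.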

Field data set and site description
The magnetotelluric field data set marked in Figure 4, with a total of five stations (FD-1 to FD-5) spaced at approximately 1 km intervals, was collected using a Metronix ADU-07e receiver unit. The measurements were acquired close to two wells in the Çanakkale-Tuzla geothermal field in Turkey (Figure 4). Sampling rates of 65,536, 16,384, 4096, 1024, 512 and 128 Hz were used to record electromagnetic time series with durations of 2, 4, 8, 16, 32 and 1440 min (corresponding to 24 h) at each magnetotelluric station. The ProcMT software package (Metronix, Germany, 2015) was used to perform most of the data processing steps, namely time windowing, fast Fourier transform (FFT), power and cross spectra calculation, stacking and remote referencing, to obtain apparent resistivities and phase angles from the frequency-dependent impedance tensors at each station. To obtain the spectral ratios of the electric and magnetic fields, the FFT was repeatedly applied to portions of the data set defined by a constantly moving time window, yielding stacked power and cross spectra. To obtain a robust estimate of the impedance tensor, bias effects were removed using the remote reference method (Gamble et al., 1979), with data obtained from a station located about 15 km from the measurement points.
In Figure 4, the simplified geological map based on well-documented production wells with depths of 814 and 1020 m (Akkuş et al., 2005, pp. 97-103) shows highly altered volcanic rocks, such as andesite-dacite lavas and ignimbrites, indicative of a hydrothermal zone. The hydrothermal system is associated with late Miocene volcanism. A granitic-granodioritic pluton, partially intruded into metamorphic rocks of schist and in places marble and crystallized limestone (Karacık & Yılmaz, 1998; Okay & Satir, 2000), is considered the heat source of the system (Şener & Gevrek, 2000).

Misfit functions
The objective function to be optimized in this study comprises two components, the data misfit term $\Phi_d(\mathbf{d}, \mathbf{m})$ and the model regularization term $\Phi_m(\mathbf{m})$, as shown in Equation (6). These terms were minimized simultaneously using MOPSO, and a misfit function for each term was used to evaluate the performance of the models during the optimization. The misfit function of the data term was calculated using

$$\Phi_d = \sqrt{ \frac{1}{2N} \sum_{i=1}^{N} \left[ \left( \frac{\rho_i^{\mathrm{obs}} - \rho_i^{\mathrm{cal}}}{\Delta \rho_i} \right)^2 + \left( \frac{\varphi_i^{\mathrm{obs}} - \varphi_i^{\mathrm{cal}}}{\Delta \varphi_i} \right)^2 \right] }, \quad (14)$$

where $N$ is the number of observations; $\rho_i^{\mathrm{obs}}$ and $\rho_i^{\mathrm{cal}}$ are the observed and calculated apparent resistivities ($\log_{10} \Omega\,\mathrm{m}$), respectively; $\varphi_i^{\mathrm{obs}}$ and $\varphi_i^{\mathrm{cal}}$ are the observed and calculated phase angles (radians), respectively; and $\Delta \rho_i$ and $\Delta \varphi_i$ are the standard deviations of the observed apparent resistivities and phase angles, respectively. Using resistivity values on a logarithmic scale has the advantage of obtaining strictly positive electrical conductivity values that vary by many orders of magnitude. A normalized root mean square error (NRMSE) (Jones, 2019), which is appropriate for a log-transformed misfit function (Otto, 2019), was used for evaluation.
Equation (14) was used for both the synthetic tests and the field data to ensure consistency when comparing solutions. Although combining the misfits of apparent resistivity and phase angle does not appear to provide an advantage in synthetic tests, it may be important for field data, which can be affected by galvanic distortion, such that systematically biased data can be identified more easily. The impedance phase angle is generally unaffected by this type of distortion, whereas the apparent resistivity data can be affected quite severely (Jones, 1988).
The misfit of the second objective function, $\Phi_m(\mathbf{m})$, was calculated as the NRMSE of the unit-weighted numerical gradients of the model parameters, symbolically represented in Equation (5). The output of this term corresponds to $\partial \rho / \partial z$, the differences of $\rho$ ($\log_{10} \Omega\,\mathrm{m}$) in the $z$ (depth) direction. Thus, a smooth model is obtained in which the variation of the resistivity values between layers is minimized.
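The two misfit terms might be implemented as follows; the exact NRMSE normalization is an assumption inferred from the description in the text, and the function names are illustrative, not the author's code.

```python
import numpy as np

def data_misfit(rho_obs, rho_cal, phi_obs, phi_cal, d_rho, d_phi):
    """NRMSE data misfit of Equation (14): apparent resistivities in
    log10(ohm-m) and phases in radians, each normalized by its
    standard deviation."""
    n = len(rho_obs)
    res = (((rho_obs - rho_cal) / d_rho) ** 2
           + ((phi_obs - phi_cal) / d_phi) ** 2)
    return np.sqrt(res.sum() / (2 * n))

def smoothness_misfit(m):
    """NRMSE of the unit-weighted vertical gradient of the model
    parameters m (log10 resistivities): the second objective."""
    dm = np.diff(m)   # first differences of log-resistivity with depth
    return np.sqrt(np.sum(dm ** 2) / len(dm))
```

In Pareto-MOPSO these two values form the two axes of the objective space; no weighted sum of them is ever computed.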

Model parameters
As estimating the layer thicknesses could lead to a complicated solution (Siripunvaraporn et al., 2005), only the resistivities of each layer with a predefined thickness were estimated in the modelling phase. Both PSO and MOPSO require a parameter search space, which should be limited to ensure a feasible and optimal solution, unlike deterministic optimization methods that produce a solution depending on an initial model. The parameter search space defined by the lower ($x_{\min}$) and upper ($x_{\max}$) bounds of the particles was chosen in the range from 1 to 1000 Ω m. Layer thicknesses increasing logarithmically from 100 m to 7 km depth were used, as in SM-1.

Particle swarm optimization parameters
Equation (11) was used to determine the velocity updates of the particles in PSO and MOPSO, with $c_1$ and $c_2$ set to 2.05, that is, $\varphi = 4.1$ and $\chi = 0.7298$, as proposed by Eberhart and Shi (2000) for rapid convergence. Recent experiments on the number of particles by Pace et al. (2019a) indicate that nine times the number of model parameters is appropriate for PSO. Therefore, this was adopted for the number of particles (e.g. 144 particles for the 16 layer resistivities in SM-1). Equation (14) was also used for the misfit function of the single-objective PSO. The interval number ($\kappa$) was set to 10 for the velocity-limiting approach.

Termination criteria
An exact iteration number was not used for PSO, because obtaining a final and appropriate solution depends on the numerical problem (Engelbrecht, 2007; Pace et al., 2021). In this study, the PSO process was terminated if the change in the misfit function of the global best particle ($\Delta$) continuously satisfied $\Delta < 0.0001$ over the last 100 iterations. If this condition was not met, the process was terminated after 1000 iterations, regardless of the change in the misfit function. The termination criterion for MOPSO has generally been implemented in two ways: (1) by limiting the maximum number of iterations and (2) by ensuring that the non-dominated solutions remain non-dominated for several successive iterations, resulting in a monotonic condition (Feng et al., 2019; Reyes-Sierra & Coello Coello, 2006). In this study, the MOPSO iteration was terminated if the number of non-dominated solutions did not change in the last 200 iterations. As in Peksen et al. (2014), 10 experimental solutions were repeatedly generated and the final model was selected from the solutions with the lowest data misfit.
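The PSO stopping rule described above (stagnation of the global-best misfit, capped at 1000 iterations) can be sketched as a simple check on the misfit history; this is an illustrative implementation with assumed parameter names:

```python
def should_terminate(misfit_history, tol=1e-4, patience=100, max_iter=1000):
    """Stop if the global-best misfit changed by less than tol at each of
    the last `patience` iterations, or if max_iter is reached."""
    if len(misfit_history) >= max_iter:
        return True                       # hard cap on iterations
    if len(misfit_history) <= patience:
        return False                      # not enough history yet
    recent = misfit_history[-(patience + 1):]
    return all(abs(recent[i] - recent[i + 1]) < tol
               for i in range(patience))  # stagnation over the window
```

The analogous MOPSO rule would watch the size of the non-dominated archive over a 200-iteration window instead of a scalar misfit.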

Selected methods and parameters for Pareto optimality criteria
In the Pareto optimality approach, the $N$-dimensional objective function space is divided into hyper-rectangles of length $\varepsilon > 0$, as shown in Figure 2 (Coello Coello et al., 2004; Coxeter, 1973). The number of hyper-rectangles for each objective function was set to one-tenth of the number of particles used, and the $\varepsilon$-dominance approach proposed by Laumanns et al. (2002) was adopted to maintain the non-dominated solutions. Assuming a minimization problem, $\Phi(\mathbf{m}_a)$ is considered $\varepsilon$-dominant to $\Phi(\mathbf{m}_b)$ for a given $\varepsilon$ (0.01 in this study) if and only if $(1 - \varepsilon)\,\Phi_i(\mathbf{m}_a) \leq \Phi_i(\mathbf{m}_b)$, $i = 1, \ldots, N$, in the $N$-dimensional objective function (Helbig & Pateva, 1994; Laumanns et al., 2002). If a new solution dominates an existing solution in its $\varepsilon$-box, the old solution is replaced by the new one. This approach controls which solutions are added to the external archive and prevents the size of the archive from growing too large (Reyes-Sierra & Coello Coello, 2006).
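A minimal sketch of the ε-dominance test following the inequality above, assuming non-negative misfit values as in this study (function name is illustrative):

```python
def eps_dominates(phi_a, phi_b, eps=0.01):
    """phi_a epsilon-dominates phi_b if (1 - eps) * phi_a_i <= phi_b_i
    for every objective i (minimization, non-negative misfits)."""
    return all((1.0 - eps) * a <= b for a, b in zip(phi_a, phi_b))
```

Relaxing the comparison by the factor (1 - ε) is what thins the archive: solutions that improve on an archived solution by less than the ε tolerance do not displace it.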
It is generally accepted that the schema theorem given by Holland (1992) is well suited to selecting a proper leader for MOPSO. Therefore, I used the roulette wheel selection scheme proposed by Coello Coello et al. (2004), which is based on dividing the segments of the wheel according to the number of non-dominated solutions in each of the hyper-rectangles. This ratio determines the probability of selection ($P_s$) of an individual hyper-rectangle as follows:

$$P_s = \frac{1/n_i}{\sum_{j=1}^{H} 1/n_j},$$

where $H$ is the number of hyper-rectangles containing non-dominated solutions and $n_i$ is the number of non-dominated solutions in the $i$th hyper-rectangle (Rao, 2009). After a hyper-rectangle is determined, a leader is randomly selected from that hyper-rectangle.
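The leader selection can be sketched as a roulette wheel over the occupied hyper-rectangles. The inverse-count weighting used here is an interpretation of the Coello Coello et al. (2004) scheme, in which sparsely populated hyper-rectangles are favoured to promote diversity; the function name and signature are illustrative:

```python
import random

def select_leader_hyperrectangle(counts, rng=random.Random(42)):
    """Roulette-wheel choice over hyper-rectangles: selection probability
    is inversely proportional to the number of non-dominated solutions
    each hyper-rectangle holds, so sparse regions are favoured.

    counts: list of n_i > 0 for the H occupied hyper-rectangles."""
    weights = [1.0 / n for n in counts]   # sparse boxes get larger segments
    total = sum(weights)
    pick = rng.random() * total           # spin the wheel
    acc = 0.0
    for idx, w in enumerate(weights):
        acc += w
        if pick <= acc:
            return idx
    return len(counts) - 1                # numerical safety fallback
```

A leader is then drawn uniformly at random from the non-dominated solutions stored in the chosen hyper-rectangle.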
The power mutation approach introduced by Deep et al. (2009) was also used to increase the exploration capabilities of the Pareto-MOPSO algorithm. Mutation not only provides a greater diversity of solutions but also prevents solutions from being trapped in a local minimum, even if only small perturbations are applied (Coello Coello et al., 2004; Deep et al., 2009; Melanie, 1996). In this approach, a random number $s$ is generated from a power distribution as $s = s_1^{p}$, where $s_1$ is a uniform random number in the range [0, 1] and $p$ is the mutation index, set to 0.5. Then, the model vector $x$ is perturbed and the new vector $x_{\mathrm{new}}$ is obtained as follows: $x_{\mathrm{new}} = x - s\,(x - x_{\min})$ if $r < t$, or $x_{\mathrm{new}} = x + s\,(x_{\max} - x)$ if $r \geq t$, where $t = (x - x_{\min})/(x_{\max} - x)$ and $r$ is a uniformly distributed random number between 0 and 1.
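A sketch of the power mutation step as described above; this is illustrative, and since behaviour exactly at the bounds is not specified in the text, x is assumed to lie strictly inside the search interval:

```python
import random

def power_mutation(x, x_min, x_max, p=0.5, rng=random.Random(7)):
    """Power mutation of Deep et al. (2009): perturb x toward x_min or
    x_max depending on its relative position t in the search interval.
    p is the mutation index (0.5, as in the text)."""
    s = rng.random() ** p              # sample from a power distribution
    t = (x - x_min) / (x_max - x)      # relative position in the interval
    r = rng.random()
    if r < t:
        return x - s * (x - x_min)     # move toward the lower bound
    return x + s * (x_max - x)         # move toward the upper bound
```

Because s never exceeds 1, the mutated value always stays within [x_min, x_max], so no repair step is needed after mutation.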

Evaluation of the solutions
In this study, the solutions of the multi-objective optimization problems were evaluated using some criteria such as generational distance (GD), spacing and deviation angle, as mentioned in 'Multi-objective particle swarm optimization based on the Pareto optimality approach' section.For the GD and spacing methods, a Pareto-optimal reference set is required, which is obtained by considering the entire feasible decision search space (Deb & Sinha, 2010).Therefore, an optimal reference set was determined by selecting model parameters in the range from 0.1 to 10,000 Ω m based on the possible range of rock resistivities according to Palacky (1987).In addition, the iterative process was terminated when the size of the external archive included 100 solutions, as recommended by Coello Coello et al. (2004).The GD approach introduced by Van Veldhuizen (1999) was used to determine the distances of each non-dominated solution to the Pareto-optimal set, defined as follows: GD = ∕, where  is the number of non-dominated solutions,   is the Euclidean distance between each non-dominated solution and the nearest member of the Pareto-optimal set in the objective space.If the value of GD is zero, this is an indication that all identified solutions are in the Pareto-optimal set.Lower values of GD indicate good performance of the solution, whereas higher values indicate how far the solution is from the Pareto-front (Coello Coello et al., 2004;Van Veldhuizen, 1999).The spacing (SP) criteria proposed by Schott (1995) was used to measure the distribution of solutions corresponding to a range variance of adjacent solutions defined as follows: SP = 2 , where is the mean of all   and  is the number of non-dominated solutions, ,  = 1, 2, … , .A value of zero for this criteria means that the members of the solution set are equidistantly distributed, giving the most smooth and uniform distribution of solutions (Coello et al., 2004;Pace et al., 2019a).The final approach to evaluating the solutions 
used in this study was to analyse the degree of compatibility between the different objective function terms. I used the deviation angle ($\alpha$) method successfully applied by Pace et al. (2019a) and Schnaidt et al. (2018), which gives the deviation of a linear regression representing the non-dominated solutions from the ideal line with a slope of 1 in the two-dimensional objective space, using the Theil-Sen estimator (Theil, 1950; Sen, 1968). In the objective space $\{(\Phi_{1,i}, \Phi_{2,i}) \mid i = 1, \ldots, n\}$, where $n$ is the number of non-dominated solutions, the estimator is calculated as the median of the slopes,

$$m = \mathrm{median}\{m_{i,j}\}, \qquad m_{i,j} = \frac{\Phi_{2,j} - \Phi_{2,i}}{\Phi_{1,j} - \Phi_{1,i}}, \quad i, j = 1, \ldots, n, \; \forall i \neq j,$$

taken over all possible pairs of points. The deviation angle is then calculated by

$$\alpha = \tan^{-1}\left(\left|\frac{m - 1}{1 + m}\right|\right).$$

If the data sets are perfectly compatible, the solutions align along the line with a slope of 1 associated with $\alpha = 0°$, satisfying the ideal condition, and the objective function values converge to the same value. An incompatible case generally exists at $\alpha > 45°$, whereas $\alpha < 45°$ indicates compatible objectives (Pace et al., 2019a; Schnaidt et al., 2018).
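The three evaluation criteria above can be sketched compactly. The following Python/NumPy functions are my illustration, not code from the study; the nearest-neighbour Manhattan distance used in `spacing` follows Schott's original definition, which the text does not spell out.

```python
import numpy as np

def generational_distance(solutions, reference):
    """GD (Van Veldhuizen, 1999): root of the summed squared Euclidean
    distances from each non-dominated solution to its nearest reference
    point, divided by the number of solutions."""
    d = np.min(np.linalg.norm(solutions[:, None, :] - reference[None, :, :],
                              axis=2), axis=1)
    return np.sqrt(np.sum(d ** 2)) / len(solutions)

def spacing(solutions):
    """SP (Schott, 1995): spread of nearest-neighbour Manhattan distances;
    zero for an equidistantly distributed solution set."""
    dist = np.sum(np.abs(solutions[:, None, :] - solutions[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)          # exclude self-distances
    d = dist.min(axis=1)
    return np.sqrt(np.sum((d.mean() - d) ** 2) / (len(solutions) - 1))

def deviation_angle(phi1, phi2):
    """Theil-Sen median slope of the non-dominated cloud and its angular
    deviation (in degrees) from the ideal line of slope 1."""
    slopes = [(phi2[j] - phi2[i]) / (phi1[j] - phi1[i])
              for i in range(len(phi1)) for j in range(i + 1, len(phi1))
              if phi1[j] != phi1[i]]        # skip pairs with a vertical slope
    m = np.median(slopes)
    return np.degrees(np.arctan(abs((m - 1) / (1 + m))))
```

A solution cloud identical to its reference set gives GD = 0, three equidistant points give SP = 0, and solutions lying exactly on the 1:1 line give a deviation angle of 0°, matching the ideal conditions described in the text.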

Synthetic models
Figures 5 and 6 illustrate the multi-objective particle swarm optimization (MOPSO) results of jointly inverting the MT data misfit and the regularization constraint term for SM-1 (noise-free) and SM-2 (15% Gaussian noise added), respectively. Figures 5a,b and 6a,b show that the model outputs match the synthetically generated observations very well. In both cases (noisy and noise-free data), the optimal model obtained in the last iteration fits the observed data well and yields a very similar model. The results successfully reproduce the true model, and the models from the 10 experimental solutions are arranged around a common model for both the noise-free and the noise-added case, as shown in Figures 5c and 6c, respectively. Although the experimental models have some uncertainty in retrieving the high-resistivity structure at shallow depths, these analyses provide good estimates representing the low resistivity in the middle part of the model and the increasing resistivities at larger depths. Figures 5d and 6d show the individual best misfit of each objective function term versus iteration. Both terms converge rapidly to lower misfits at the beginning of the optimization process and then stabilize at low misfits. Thereafter, these terms exhibit an approximately monotonic behaviour for the solution set.
As can be seen in Figures 5e and 6e and from the numerical values of the deviation angle in Table 1, the Theil-Sen regression lines deviate from the ideal line by more than 45°. One reason for the incompatibility is probably that the two objective function terms act in different ways, minimizing the data misfit and the model roughness, respectively. This is partly because models with resistivities that change sharply across interfaces can also reduce the data misfit, quite apart from the smooth models; full compatibility may therefore not arise between objective function terms of this type. Schnaidt et al. (2018) pointed out that incompatibilities can often arise in real-world problems, as different methods may have different sensitivities, resolutions and depths of investigation, or data sets may have different data errors. However, the synthetic analyses show that the incompatibility is not only related to real-world multi-objective optimization but also to the characteristics of objective function terms that are minimized in different ways. The deviation angle of SM-2 is slightly higher than that of SM-1, suggesting that the incompatibility increases when noise is added to the data, as noted by Schnaidt et al. (2018).

TABLE 1 Estimates obtained from generational distance (GD), spacing (SP) and deviation angle (α) for SM-1, SM-2 and the field data set.
Table 1 also lists the other criteria (generational distance [GD] and spacing [SP]) used to quantitatively evaluate solution quality and algorithm performance for these synthetic tests. The GD values, which are very close to zero in both tests, mean that almost all solutions lie in the Pareto-optimal set. The fact that the SP value of the SM-2 solution is higher than that of SM-1 can be interpreted as a result of the noisy data. As SP can take a low or high value regardless of whether the solutions are in the Pareto-optimal set (Coello Coello et al., 2004; Van Veldhuizen, 1999), this criterion can be misleading in judging solution quality. Nevertheless, the SP values close to zero indicate that the distributions of the solutions are approximately smooth and nearly uniform.
As can be seen in Figure 5e, the Pareto-front of SM-1 has a symmetric and concave shape, which means that one objective function cannot be further minimized without increasing the other. On the other hand, Figure 6e shows that the solutions cluster towards the non-dominated solution set, which deviates slightly in the direction of the misfit axis of the smoothness term. This deviation reveals the noise condition leading to a non-unique solution (Kozlovskaya et al., 2007). In contrast to the results for SM-1, a distribution of this type clearly exhibits non-uniqueness, as noted by Dal Moro (2010), suggesting that more solutions satisfy the data misfit similarly well when noise is present in the MT curves. Obviously, several alternative models could be considered the final model if a single-objective function based on the MT misfit, with its different local minima, were used.
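To make the notion of a non-dominated set underlying these Pareto-fronts concrete, the following minimal Python/NumPy sketch (my illustration, not code from the study) filters a cloud of bi-objective solutions down to its non-dominated members under minimization:

```python
import numpy as np

def non_dominated(points):
    """Return a boolean mask marking the Pareto non-dominated points
    (minimization). A point is dominated if some other point is no worse
    in every objective and strictly better in at least one."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(pts[j] <= pts[i]) and np.any(pts[j] < pts[i]):
                mask[i] = False             # point i is dominated by point j
                break
    return mask
```

For example, in a cloud containing (1, 3), (2, 2), (3, 1) and (2.5, 2.5), the first three points trade off against each other and form the front, while (2.5, 2.5) is dominated by (2, 2).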

Single-objective particle swarm optimization
FIGURE 5 Results for SM-1. The fit between the observed and calculated (a) apparent resistivities and (b) phase angles; (c) resistivity-depth model obtained from the optimal solution indicating the lowest data misfit (green), and 10 experimental solutions (grey); (d) misfits of each objective function versus iteration; (e) two-dimensional objective function space. The optimal solution is marked as a green star, together with the non-dominated solutions (dark dots), the dominated solutions (light dots), the ideal line (dashed) and the Theil-Sen regression line (red).

Figure 7 shows the results of the model with the lowest misfit obtained from repeated experimental solutions using SM-2 data with a single-objective particle swarm optimization (PSO). In this figure, although the apparent resistivities and phase angles fit well and the reproduced model successfully identifies the resistive and conductive structures at shallow depth, a noticeably conductive and wrongly determined layer is also observed in the deeper part. Considering the reciprocal relationship between layer thickness and resistivity, this discrepancy is due to an excessively wide allowed variation of resistivity in each layer. Although these results yield a model compatible with the average resistivity, the artefact layer could misrepresent the geological structure. No such difference was observed in the constrained modelling with Pareto-MOPSO, where the model represented by the solution with the lowest data misfit in the optimal solution set was successfully reproduced. These results illustrate the non-uniqueness of single-objective modelling for the defined parameterization and search space of this ill-posed problem.

Field data
Figure 8 shows the results for measurement location FD-2 in the Çanakkale-Tuzla geothermal field as an example for further discussion of the solution. Figure 8a,b shows that the apparent resistivity and phase angle curves obtained from the MOPSO fit the observed data well. The caprock, which consists of andesitic Ayvacık volcanics according to well-log observations (see Figure 4), appears to be associated with low resistivities (see Figure 8c). Although this type of volcanic rock generally has high resistivity values, a small increase in clay content can dramatically decrease resistivities to less than 5 Ω m (Stanley et al., 1977).
Relatively high resistivities beneath the caprock indicate units composed of schist and limestone metamorphosed by a granitic-granodioritic pluton partially intruded into the basement. The models of the 10 experimental tests present similar resistivity-depth sections, demonstrating that the proposed method provides accurate results arranged around one common model. Although the model parameters show slightly higher variability in determining the resistivities of the metamorphic rocks, the model parameters for the caprock structure appear to be consistently resolved. Figure 8d also shows that both objective function terms converge rapidly to lower misfits at the beginning of the optimization process and then stabilize at low misfits, as in the synthetic tests. The non-dominated solution set and the dominated solutions, with the ideal and Theil-Sen regression lines, are shown in Figure 8e. As reported by Dal Moro (2010), the optimal solution set, which has a nearly symmetric distribution, indicates that the solutions of the field data at location FD-2 were obtained with a well-determined parameterization. Schnaidt et al. (2018) pointed out that asymmetry can occur when the objectives do not fully constrain each other, for example when well-log information constrains only part of the model. It can be concluded that the distribution of the non-dominated solution set shown in Figure 8e has approximately symmetric properties, indicating that the used regularization constrains the model almost completely for the given model parameterization. However, the deviation angle for FD-2 (Table 1) of 73° > 45° also reveals an incompatibility, as in the synthetic analyses. The observation emerging from the analysis of the synthetic and the presented field data is that objective function terms minimized in different ways can lead to incompatibilities. Although the objective function terms in these analyses are incompatible, the obtained solutions appear to be complementary, as they are optimized simultaneously.
Figure 9 shows the final models of the resistivity cross sections from all measurement points, selected as the lowest-misfit models of all experimental solutions. The shallow zone with resistivities of less than 5 Ω m (see dashed lines in Figure 9) can be attributed to the caprock observed in the nearby wells. This indicates that clay minerals are abundant in the caprock zone at this site, whose depth increases from north to south. The layers referred to as metamorphic rocks also appear to be consistent with the associated depth section in the borehole and with the resistivity range of these rock types according to Telford et al. (1990) and Ward (1990). The relatively high resistivities of the metamorphic rocks at FD-4 and FD-5 may indicate a zone more strongly affected by the granitic-granodioritic plutons intruded into the metamorphic rocks. The variable low resistivities beneath the metamorphic zone can be neglected, because the resistivity of the lowest layer, a homogeneous half-space, has the least effect on the observed data (Zhdanov, 2018) and hence on the selected models used as solutions. The presented results provide only a rough overview of the shallow geothermally overprinted structure. Since this study focuses on the practicality of Pareto-MOPSO on real data, a detailed interpretation for geothermal exploration is outside its scope.
Figure 10a,b shows the apparent resistivity and phase angle data of the five stations and the model responses obtained from the resistivity-depth sections shown in Figure 9. The agreement between the observed data and the model responses within the standard deviation appears reasonably accurate and satisfactory. Figure 10c shows the non-dominated and dominated solutions of all five stations with the ideal and Theil-Sen regression lines. As can be seen in these subfigures, deviation angles between 70° and 80° (Table 1) again indicate incompatibility, as already described for FD-2. This supports the previous finding that the incompatibilities observed in the synthetic analysis and at FD-2 are very likely due to the objective function terms being minimized in different ways. Although the parameter space is limited to the feasible range of rock resistivities and the equation expressed for the smoothness is the commonly used regularization approach, the incompatibility may change if a different constraint or search space is used. As the focus of the study was the applicability of Pareto-MOPSO to constrained inversion without considering the aforementioned generalizations, different evaluations of objective compatibility were not attempted in this study.
On the other hand, a nearly symmetric distribution was observed for all points except FD-3, as shown in Figure 10c. Despite the change in the parameterization of the depth models due to possible effects of an inexact model parameterization, the solution set showed similar distributions in the experimental tests for FD-3. This highlights a high trade-off between data fit and model variance compared with the other points. Unlike at the other points, a high noise component and the presence of a resistive structure above the caprock structure at FD-3 could be responsible for both the high data misfit and the high model variance. Table 1 shows the evaluation criteria GD and SP for all field data. The average GD values near zero for all data points indicate that almost all non-dominated solutions are close to the Pareto-optimal set. The SP values close to zero also indicate smooth and nearly uniformly distributed solutions. The relatively high SP value of FD-3 may be indicative of strongly trading-off solutions rather than a uniform solution set. However, as mentioned in the 'Evaluation of the solutions' section, it is reasonable to view this value with scepticism when evaluating solution quality.
A limitation of this study is that Pareto-MOPSO was investigated only for one-dimensional modelling of magnetotelluric data.

FIGURE 8 Results for FD-2. The fit between the observed and calculated (a) apparent resistivities and (b) phase angles; (c) resistivity-depth model obtained from the optimal solution indicating the lowest data misfit, and 10 experimental solutions (grey); (d) misfits of each objective function versus iteration; (e) two-dimensional objective function space. The optimal solution is marked as a green star, together with the non-dominated solutions (dark dots), the dominated solutions (light dots), the ideal line (dashed) and the Theil-Sen regression line (red).

FIGURE 9 Resistivity-depth models obtained from the optimal solution with the lowest misfit at each measurement point.

However, it is common knowledge that 2D and even 3D studies nowadays dominate the magnetotelluric community. Although high-dimensional modelling of magnetotelluric data using only PSO requires a large amount of computing power, Pace et al. (2019b) conducted experiments on 2D modelling of magnetotelluric data using regularized PSO inversion with the traditional L-curve method. On the other hand, the method presented in this study can provide additional support for 2D modelling of magnetotelluric data by adding a new dimension to the objective function space to regularize the spatial x-direction as well. Therefore, without the need for an additional regularization parameter for the x-axis, the Pareto-MOPSO solution set can be obtained from a three-dimensional objective space of one data misfit and two constraint terms in the spatial x- and z-directions.

FIGURE 10 The fit between the observed and calculated (a) apparent resistivities and (b) phase angles from the resistivity-depth model at each station represented in Figure 9. (c) The two-dimensional objective function spaces for the same stations with the optimal solution (green star), the non-dominated solutions (dark dots) together with the dominated solutions (light dots), the ideal line (dashed) and the Theil-Sen regression line (red).
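To illustrate the proposed 2D extension, a three-term objective vector for a gridded model might be assembled as in the Python/NumPy sketch below. The function name and the first-difference roughness measures are my assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def objective_vector(data_misfit, model2d):
    """Hypothetical three-term objective vector for 2D Pareto-MOPSO:
    the data misfit plus first-difference roughness penalties in the
    spatial x- (horizontal) and z- (vertical) directions."""
    model2d = np.asarray(model2d, dtype=float)
    rough_x = np.sum(np.diff(model2d, axis=1) ** 2)  # horizontal smoothness term
    rough_z = np.sum(np.diff(model2d, axis=0) ** 2)  # vertical smoothness term
    return np.array([data_misfit, rough_x, rough_z])
```

A MOPSO archive would then maintain non-dominance over this three-dimensional objective space, so neither roughness direction needs its own regularization parameter.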

CONCLUSIONS
The Pareto multi-objective particle swarm optimization (MOPSO) approach was applied to modelling magnetotelluric data without merging the data misfit and the smoothness regularization into a single-objective function. Magnetotelluric data obtained from synthetic models and from five station measurements in the Çanakkale-Tuzla geothermal field were modelled one-dimensionally. A number of important conclusions can be drawn from this study: (1) The modelling approach presented here successfully reproduces the resistivity distribution of a synthetic model even if the data are noisy. A number of observations demonstrated that Pareto-MOPSO provides a powerful methodology for simultaneously minimizing the data misfit and smoothness terms without combining them through a regularization parameter. (2) This new approach constrains the resistivity-depth model better than the particle swarm optimization (PSO) results obtained from a single-objective function. With a large search space of model parameters in the underdetermined case, the resistivity-depth model obtained with PSO may contain artificial layers that do not accurately represent the real model. No such difference was observed in the constrained modelling with Pareto-MOPSO, and the model represented by the solution with the lowest data misfit in the optimal solution set was consistent with the real model. This indicates a clear benefit of using Pareto-MOPSO to obtain more meaningful models.
(3) When using the swarm intelligence method for 2D and 3D modelling of magnetotelluric data, in which the number of model parameters is very large, Pareto-MOPSO is recommended instead of the single-objective PSO to find physically reasonable models. Although Pareto-MOPSO could be computationally intensive even in such dimensions, the number of repeated forward calculations could be reduced by guiding the particles in the solution space. Therefore, this study provides a new framework for the high-dimensional modelling that dominates the magnetotelluric community, given sufficient computational capacity. (4) One remarkable result of this modelling approach is that the presented methodology allows magnetotelluric modelling without depending on an initial model and derivatives, or on weighting and combining data misfit and regularization constraints. The findings of this study also suggest that Pareto-MOPSO is very attractive for incorporating other types of constraints used in geophysical modelling into the objective function.

FIGURE 1 A schematic representation of particles in the model space and a one-dimensional objective space: (a) randomly distributed particles in a feasible search space and their positions projected onto the objective space; (b) the particles move towards the global minimum in each iteration as their positions change with a velocity vector in the model space; (c) the particles converge to a unique model and cluster around the global minimum in the last iteration.

FIGURE 2 Conceptual representation of the Pareto-front and the dominated solutions in the two-dimensional objective space. Here $\Phi_1$ and $\Phi_2$ are the values of the objective functions. The set of hyper-rectangles defined by the $\epsilon$-value is used to maintain non-dominated solutions from each rectangle.

Here $\mathbf{m}$ is the model vector and $\Phi(\mathbf{m})$ is a non-dominated solution set. If $\mathbf{m}^* \in F$ (the feasible region) is non-dominated with respect to $F$, then $\mathbf{m}^*$ is called Pareto-optimal and $\Phi(\mathbf{m}^*)$ belongs to the Pareto-optimal set $P^*$. When the Pareto-optimal set is projected onto a surface, it describes the Pareto-front $PF^* = \{\Phi(\mathbf{m}) \in \mathbb{R}^k \mid \mathbf{m} \in P^*\}$, which indicates the trade-off solutions that compete with each other in the objective space, as illustrated in Figure 2.

FIGURE 3 Apparent resistivities (a) and phase angles (b) generated from the resistivity-depth function (c) of the SM-1 model.

FIGURE 6 Results for SM-2 with noise added to the synthetic data. The fit between the observed and calculated (a) apparent resistivities and (b) phase angles; (c) resistivity-depth model obtained from the optimal solution indicating the lowest data misfit, and 10 experimental solutions (grey); (d) misfits of each objective function versus iteration; (e) two-dimensional objective function space. The optimal solution is marked as a green star, together with the non-dominated solutions (dark dots), the dominated solutions (light dots), the ideal line (dashed) and the Theil-Sen regression line (red).

FIGURE 7 Results for SM-2 using single-objective particle swarm optimization (PSO). The fit between the observed and calculated (a) apparent resistivities and (b) phase angles; (c) data misfit versus iteration; (d) resistivity-depth model obtained from PSO.

ACKNOWLEDGEMENTS
This study was supported by the Scientific and Technological Research Council of Türkiye, Marmara Research Center (TÜBİTAK-MAM). I thank the TÜBİTAK-MAM Earth and Marine Sciences Institute for allowing me to use the Metronix 24-bit geophysical measurement system.

DATA AVAILABILITY STATEMENT
The data underlying this article were provided by the Earth and Marine Sciences Institute, Marmara Research Center (MRC), The Scientific and Technological Research Council of Türkiye (TÜBİTAK) under licence. Data will be shared on request to the corresponding author with the permission of the Earth and Marine Sciences Institute.

ORCID
Ersin Büyük https://orcid.org/0000-0001-9440-0918

APPENDIX A: PSEUDOCODE OF THE MOPSO
Figure A1 outlines the implementation of the MOPSO algorithm.

FIGURE A1 Implementation of the MOPSO algorithm.