Comparison of Different Coupling Methods for Joint Inversion of Geophysical Data: A Case Study for the Namibian Continental Margin

Integration of multiple geophysical data is a key practice to reduce model uncertainties and enhance geological interpretations. Electrical resistivity models resulting from inversion of marine magnetotelluric (MT) data, often lack depth resolution of lithological boundaries and distinct information for shallow model parts. This is due to the diffusive nature of electromagnetic fields, enhanced by deficient data sampling and model regularization during inversion. Thus, integrating data or models to constrain layer thicknesses or structural boundaries is an effective approach to derive better constrained and more detailed resistivity models. We investigate the different impacts of three cross‐gradient coupled constraints on 3D MT inversion of data from the Namibian passive continental margin. The three constraints are (a) coupling with a fixed structural density model; (b) coupling with satellite gravity data; (c) coupling with a fixed gradient velocity model. Here, we show that coupling with a fixed model (a and c) improves the resistivity model the most. Shallow conductors imaging sediment cover are confined to a thinner layer in the resulting resistivity models compared to the MT‐only model. Additionally, these constraints help to suppress vertical smearing of a conductive anomaly attributed to a fracture zone, and clearly show that the seismically imaged Moho is not accompanied by a change in electrical resistivity. All of these observations help to derive an Earth model, which will form the basis for future interpretation of the processes that lead to continental break‐up during the early Cretaceous.

A wide range of different geophysical surveys have been carried out along the Namibian Margin. They image the crustal and upper mantle structures both onshore and offshore to investigate the break-up-related features. Seismic studies revealed magmatic geological features such as the hot spot trail Walvis Ridge, seaward dipping reflectors caused during the initial subaerial stage of break-up volcanism (Gladczenko et al., 1998;Elliott et al., 2009), lower crustal high-velocity bodies interpreted as magmatic underplating (Bauer et al., 2000;Gladczenko et al., 1998;Planert et al., 2017), high v P /v S ratio (Heit et al., 2015), and thickened oceanic crust caused by the late stage of break-up volcanism . The gravity modeling study by Maystrenko et al. (2013) imaged high density lower crustal intrusions. Inversion of magnetotelluric (MT) data by Kapinos et al. (2016) and Jegen et al. (2016) shows high resistivities in the middle and deep crust indicative of magmatic processes. While all of these models show similarity concerning the lateral extent of imaged magmatic features, little coherence exists concerning the vertical extent and depth of the geological targets .
Of all these available data, the MT measurements are the only data set with both 3D coverage and depth resolution. They also have the largest depth penetration. However, there is a large range of resistivity models which fit the MT data, thus inversion requires input of additional information. Unfortunately, other geophysical data and parameter models available for joint inversion also have disadvantages. For example, the aforementioned seismic data are only available in 2D and only image down to the Moho. Furthermore, the models result from several different data analysis methods. For instance, Fromm et al. (2017) used forward velocity-and gravity modeling, while Planert et al. (2017) performed 2D tomographic inversion. Similarly, the gravity data yield 3D models, but have very little depth sensitivity. Therefore, a simple comparison of the resulting physical models is difficult, since they have different spatial resolutions and sensitivities. A combination of all these data should, nevertheless, improve the interpretation and lead to a more accurate Earth model that may partly overcome the disadvantages and ambiguities of the individual methods. In this study, we combine the MT and other acquired geophysical data 10.1029/2021JB022092 3 of 28 in various joint inversion approaches and test the feasibility of the features observed in the inverted models based on MT data only.
We implement four inversion experiments to evaluate the benefits of different constraints in 3D MT inversions. The reference is a single method inversion of the MT data (MT-only), and the three joint inversion approaches are: JI1: MT inversion constrained by the 3D structural density model derived from gravity modeling by Maystrenko et al. (2013), JI2: joint inversion of marine MT-and satellite gravity data, and JI3: MT inversion constrained by the gradient 2D velocity model by Fromm et al. (2017) with the 3D MT inversion.
Due to the complex geology in the continental margin regime, defining distinct and commonly valid parameter relationships to convert from velocity to density and resistivity or vice versa is extremely difficult. Therefore, we use the cross-gradient constraint by Gallardo and Meju (2003), which offers a weak structural coupling of the two models. Stronger coupling mechanisms based on a physical parameter relationship may be derived by the presented structural joint inversion, and enable fine-tuning of inversion results in the future.  Fromm et al. (2015) and Gladczenko et al. (1998), respectively, and the SDR extent is a combination of Elliott et al. (2009), Bauer and Schulze (1996) and Gladczenko et al. (1998). Orange color marks evidences for magmatic underplating. The light orange circular shapes onshore mark the high v P /v S structures by Heit et al. (2015), light orange profile lines mark the position of thickened crust and high velocity lower crustal bodies identified by Fromm et al. (2015) and Planert et al. (2017); and the dark orange profiles mark the position of high velocity lower crustal bodies determined by Gladczenko et al. (1998). Green lines are two seismic profiles interpreted by Goslin et al. (1974). Red stars with numbers show the positions of MT stations of this work: perpendicular to the coast is profile P100, parallel to the coast is profile P3.
Our study aims to achieve two main goals: (a) to improve the Namibian Margin Earth model regarding the geological features related to continental break-up, and (b) to investigate the impacts of different joint inversion coupling constraints on the resulting inversion models. Based on these goals, this study focuses on three main objectives: 1. How can we improve the 3D resistivity model from single method MT inversion to gain new insights and which inversion constraints can benefit geological interpretations? 2. What are the differences of using a fixed structural density model (JI1) or gravity data (JI2) to constrain the MT data inversion? 3. What is the impact of two different types of fixed models (structural density-, and gradient velocity model, that is, JI1 and JI3) as cross-gradient coupled model constraints on MT data inversion?
We first discuss the data sets considered in the analysis as well as the pre-existing models that we apply as model constraints in our joint inversion. Subsequently, we describe the joint inversion algorithm applied to the data. The results section is structured in three sections focused on (a) weighting parameters and misfit evolution, (b) data fit of the resulting inversion models, and (c) comparison of the models. For our discussion, we evaluate the model differences by comparing with reference models and other geophysical interpretations, also discussing data fits and sensitivity.

Data and Reference Models
In the following section, we introduce the MT and gravity data sets, and the processing applied to them before inversion. We also describe a 3D structural density model and a 2D local velocity model, which are used for our model-based constrained inversion of the MT data. Table 1 comprises a summary of the data and models used in the four inversion approaches.

MT Input Data
We use data from 32 marine MT stations collected during two cruises on RV Maria S. Merian (MSM 17-1 and MSM 17-2) from November 2010 to January 2011 (Jegen et al., 2021). The stations were deployed along two orthogonal seismic profiles to image the structure along (profile P100) and across (profile P3) Walvis Ridge ( Figure 1). Also, we have included data from eight land-based MT stations, deployed by GFZ Potsdam in October and November 2011 (see Kapinos et al., 2016) to expand the investigated area onshore and account for the electromagnetic coast effect, which arises due to the big resistivity contrast of seawater adjacent to continental crust (Ferguson et al., 1990;Worzewski et al., 2012).
Data processing consists of data rotation, impedance calculation, and dimensionality analysis. The marine MT data were corrected for the instrument's tilt and rotated to a common coordinate system pointing north . Impedance calculation and dimensionality analysis were carried out with the algorithms described by Chave and Thomson (2004), Egbert (1997), and Martí et al. (2009). The onshore stations' processing was conducted by Kapinos et al. (2016) with procedures described in Becken and Burkhardt (2004), Ritter et al. (1998), andWeckmann et al. (2005). Due to limited survey time, the stations of profile P100 include data only up to 10,000 s and the eight land stations up to 1,000 s. However, the MT stations along profile P3 cover periods up to 50,000 s. We interpolated all data (impedance matrix elements and errors) for 16 periods ranging from ∼30 to 5⋅10 4 s and replaced missing data, particularly large period data for land and profile P100 stations, by dummy values with large errors. Dimensionality analysis indicates a clear three-dimensional character of the marine data, that is, impedance matrix elements are independent of each other . Hence, all subsequent work had to concentrate on 3D processing, despite the original survey planning along two 2D profiles. The Supporting Information includes synthetic inversions with the same setup concerning grid size, station location, frequencies, and errors. These indicate that the three-dimensional extent of model features is only replicated to distances of circa 30 km around MT stations. Thus, interpretation of the 3D models is concentrated to the areas close to MT station locations.
MT data quality variations are displayed in Figure 2 for three marine stations. Station 34 (green) is an example of a good quality data set, exhibiting smooth apparent resistivity and phase curves and small errors. Station 31 data (blue) has larger errors and more rugged curves in the XX component but shows good data quality in the other components up to periods of ∼1,000 s. Impedance values at longer periods could not be derived at this station. Finally, station 6 (red) represents stations with poor data quality. All four components, resistivity, and phase curves are not smooth and we observe high errors in all components and various frequencies.

Gravity Input Data
We use satellite gravity data as input for the gravity inversion. The data set used is based on the high-resolution EIGEN-6C4 global field model, which is derived from a combination of the satellite gravity missions LAGEOS (Cohen & Smith, 1985), GRACE (Tapley et al., 2004), and GOCE (Drinkwater et al., 2007) as well as DTU ground data (Andersen et al., 2010;Förste et al., 2014). We use a spherical approximation of the anomaly at Apparent resistivity and phase for three marine MT stations with varying quality: Station 34 (green) as an example for high data quality, station 31 (blue) as an example for medium quality with lacking high period data, and station 6 (red) as an example for poor data quality.
10.1029/2021JB022092 6 of 28 3,000 m height above sea level on a 0.1° × 0.1° grid (Barthelmes & Köhler, 2016;Ince et al., 2019) as depicted in Figure 3a. This height is suitable, because we want to remove the effect of the onshore topography, and therefore need to be well above the highest elevation. The effect of topography on the gravity data is clearly visible in Figure 3a, as gravity highs (red colors) follow the bathymetry lines around Walvis Ridge, several seamounts, as well as the onshore topography around Brandberg Mountain (∼21.1°S, 14.5°E) and the mountain ranges of the Kaokoveld desert at the Angolan border. The data are also influenced significantly by different average densities of continental -(2,810 kg/m³), and oceanic crust (2,900 kg/m³) as well as deeper Moho depths for continents (∼40 km) compared to the oceanic regime (∼10 km).
To resolve underlying density variations of interest, the satellite data need to be corrected for the effect of topography/bathymetry (equivalent to Bouguer correction), as well as for the effect of variation in Moho depth 10.1029/2021JB022092 7 of 28 which corresponds to a simple form of an isostatic correction. For both corrections, we calculated the vertical gravitational component g z of a correction model at 3,000 m using the forward modeling code Tesseroids (Uieda et al., 2016) and subtracted the responses from the input satellite gravity data (following the method in Szwillus et al., 2016). The correction model applied to data is illustrated in Figure 4 (bottom right). Topographic heights and water depth for the first correction were extracted from a spherical harmonic expansion of the ETOPO1 model on a 0.1° × 0.1° grid. Topographic heights (above 0 m) were assigned a value of 2,810 kg/m³ (reference crustal density) and oceanic regions were corrected with a value of −1,780 kg/m³, that is, the difference between an assumed water density of 1,030 kg/m³ and reference crustal density of 2,810 kg/m³. The resulting topographic effect is shown in Figure 3b. After subtracting this effect from the input data (Figure 3c), the residual gravity anomaly is dominated by the difference of the continental and oceanic crust, controlled by lithostatic pressure variations between on-and offshore, that is, crustal thickness varies from ∼10 km in the oceanic domain to ∼40 km onshore.
To account for this continent-ocean difference, we corrected for the crustal thickness effect on gravity by creating a second correction model. Reducing the gravity field of previously known masses or regional effects to improve insight into the "unknown" density structures, is a common practice in gravity studies (cp. Braitenberg & Ebbing, 2009;Kende et al., 2017;Ekinci et al., 2021). For the second correction model, areas with a Moho shallower than reference (35 km) were compensated with a density of 412 kg/m³ (the difference between reference crustaland mantle density (2,810 and 3,222 kg/m³) and areas with a deeper Moho with a density of −412 kg/m³. By using these two constant values, we apply a simple form of isostatic correction based on Airy-Isostasy, only. We presume that due to the old age (∼130 Ma years) of the crust, lithospheric cooling has reached equilibrium in the survey area and no thermally induced lateral density variations exist anymore. Maystrenko et al. (2013) postulated this after their thermal modeling indicated that the temperature structure at depth (below 90 km) is mostly driven by the geometry of the lithosphere-asthenosphere boundary. The 3D Moho depths used in the correction were derived from a combination of a smoothed version of the structural density model presented in Maystrenko et al. (2013) and the global CRUST1 model (Laske et al., 2013). This combination allows us to take differences of continental-and oceanic crustal thickness as well as the thickened crust below the eastern part of Walvis Ridge into account. The gravity effect of the Moho depth compensation (isostatic effect) is shown in Figure 3d. The final gravity anomaly derived after subtracting the topographic and isostatic correction is shown in Figure 3e, and was used as the input gravity data for the subsequent joint inversions wherever gravity data are considered. The  Figure 3f). combined applied corrections ( Figure 3f) lead to gravity data specifically sensitive for density variations within the crust. The two most obvious features of the corrected gravity map are the strong positive gravity anomaly at the landfall of Walvis Ridge, and a clear negative gravity anomaly in the southern Walvis Basin.

Reference Earth Models
In addition to the corrected gravity data set as a constraining input to our joint inversion schemes for MT data (JI2), we use two different physical parameter models as inversion constraints: a 3D regional density model by Maystrenko et al. (2013) for JI1, and a local 2D seismic velocity model along profile P100 acquired as part of the SAMPLE project (Fromm et al., 2015 for JI3. The density model is based on a forward modeling study of 3D gravity satellite data and encompasses a much larger domain than considered in our study. We only use the northern part intersecting with our inversion model. The model has been derived by conducting 3D gravity and thermal modeling and inclusion of 2D seismic profiles that were available in the region at the time (Maystrenko et al., 2013). It comprises homogeneous regions with different density values representing, in addition to water and air, 10 different geological units including five sediments and four crustal units as well as the lithospheric mantle. We extrapolated the density model by Maystrenko et al. (2013) to the North-west and interpolated it onto our inversion grid so it can be used as a constraint in the MT inversion. The identified geological features as well as the geological inferences from the seismic data in our region of interest are described in more detail in the discussion section. As the regional 3D density model is characterized by discrete densities for the different geological units, it provides structural constraints for the inversion of the Earth model. Therefore, we refer to it as the 3D structural density model.
In contrast to the structural density model, we also examined a gradient velocity model as an inversion constraint.
Since no 3D velocity model is available in the region, we had to use a local 2D velocity model. We chose the 2D velocity model along profile P100 (and Walvis Ridge) derived by Fromm et al. (2017), which had been generated through forward modeling of refraction and reflection data. The model is characterized by layers or blocks with downward-increasing velocity gradients, leading to some strong gradients at layer boundaries and gentler gradients within the layers.
To account for the dimensionality difference of the described 2D seismic reference model and the 3D character of our MT data, we performed a 3D joint inversion (JI3) on a narrow cube around the profile (which we consider as "quasi-2D"). We chose a model width of 40 km in strike direction of profile P100 to limit the model to the expected extent of the anomalous crustal structure beneath Walvis Ridge. To build the constraining model, the reference velocity model was interpolated along the profile onto the main inversion grid and extended 20 km left and right of the profile. For the coast parallel profile P3, the strong resistivity variations caused by the coastal resistivity transition prevent the construction of a narrow model for a "quasi-2D" inversion and will therefore be excluded.

Joint Inversion Scheme
We used the jif3D framework for the joint inversion (Moorkamp et al., 2011). It includes a 3D MT integral equation forward engine developed by Avdeev et al. (1997), and an internally developed voxel-based full tensor gravity forward engine (Moorkamp et al., 2010). The framework allows the implementation of a parameter relationship or a structural cross-gradient coupling. The results presented in this study are based on a structural coupling approach because of the lack of commonly valid parameter relationships for the various lithologies found in the study area.
Equation 1 denotes the full objective function which is minimized iteratively using a limited memory quasi-Newton scheme (L-BFGS) (e.g., Avdeeva & Avdeev, 2006;Nocedal & Wright, 2006): The objective function includes the data misfit terms for the MT impedance matrix elements (Φ dMT ) and corrected gravity anomaly data (Φ dGrav ) and measures for the roughness of the physical parameter models (Φ RegMT and Φ RegGrav ). These regularization terms are included in the objective function to attain smooth models by minimizing parameter variations in neighboring cells, and stabilizing the iterative inversion procedure. Coupling between different physical parameter models is implemented through a structural coupling term Φ Cross , based on the cross-gradient of the two different physical parameter models under consideration (Gallardo & Meju, 2003). Cross-gradient coupling has proven to be a powerful tool for coupling in joint inversion in synthetic-as well as real data studies (e.g., Colombo & Rovetta, 2018;Um et al., 2014;Zhou et al., 2015), while it is usually not strong enough to prevent an adequate data fit. The cross-gradient coupling enforces spatial resemblance of the inverted models. This type of coupling is rather loose, as it does not constrain neither the actual parameters, nor their gradient values, including the case when either of the gradients is zero. In our inversion JI2, the cross-gradient couples the two models of the MT and gravity inversions, while in the model-constrained approaches JI1 and JI3, the MT inversion model is cross-gradient coupled to a fixed reference (density or velocity) model.
The data misfit, regularization-, and cross-gradient terms in the objective function are weighted with Lagrange multipliers (w MT , w Grav , λ MT , λ Grav , κ). We employ a cooling strategy for the two regularization weights λ MT and λ Grav , which recovers large-scale structures first and then allows for smaller-scale model variations by subsequently reducing the weights (Moorkamp et al., 2020). A setup with initial small regularization weights results in rough models with large parameter jumps and large regularization-gradient and cross-gradient terms. In the joint data inversion approach (JI2), the MT data misfit term is weighted stronger with a larger w MT than the gravity data misfit term, because its non-linear MT optimization requires many more iterations than the linear gravity data optimization (cp. Heincke et al., 2017). With smaller w MT the gravity data misfit term Φ dGrav quickly reaches a minimum, preventing the MT data to be fit adequately. For the model constrained approaches JI1 and JI3, w Grav is 0 (no second data set is included) and w MT equals 1. Cross-Gradient weight κ is kept high and constant to ensure a strong and steady coupling between the MT inversion and the gravity data or reference model constraints. The objective function's coupling term Φ Cross is calculated from the cross-product of two model's gradients (Gallardo & Meju, 2003;Moorkamp et al., 2011), and is, in parts, multiple orders of magnitude smaller than other objective function terms (e.g., regularization terms (Φ RegMT and Φ RegGrav ) are calculated from model parameter values and data terms (Φ dMT and Φ dGrav ) from input data values). Thus a large weight κ ensures sufficient impact of cross-gradient coupling. After testing many different parameter combinations, we finalized the specific Lagrange multipliers for JI1, JI2, and JI3 which are listed in Table 2. Those weights were chosen to reach the most satisfying results, that is, minimum values and a balanced regress of all objective function terms (Equation 1).
The inversion model grid consists of 96 × 96 × 34 cells. Horizontal cell sizes are uniformly 10 × 10 km 2 , vertical cell sizes increase from 300 m to 50 km to adapt to decreasing sensitivity with depth yet allow for a sufficiently close fit of the topography. We used the ETOPO1 Global Relief Model to constrain the water depth within the model area (Amante & Eakins, 2009). The integral equation MT modeling code requires a background model. We chose 4 layers overlying a half-space representing the air, ocean, sediment, crust, and mantle layers. The lower layer boundaries were set to −0.9, 3, 7, 65.5 km and the respective electrical resistivities to 100,000, 0.3, 1, 50 Ωm, and a 10 Ωm half-space below. These values are intended to represent the average surrounding resistivity structure. This cannot adequately account for the resistivity variations of adjacent crustal domains, that is, oceanic crust to the west and north of Walvis Ridge, continental crust to the east and thickened magmatic crust to the south. To reduce the influence of the background model and to minimize boundary effects, we increase the lateral extent of our model by 250 km beyond the region covered by the MT stations. 1 Ωm. This is underlain by a homogeneous half-space with an electrical resistivity of 50 Ωm. Salt water resistivities are set to 0.3 Ωm. We use a very simple starting model (i.e., halfspace including bathymetry and sediment layer) for our inversions, because we want to make as little assumptions as possible about the subsurface resistivity distribution. This way, we can investigate a large solution space and allow modifications of the resistivity distribution enforced by the cross-gradient coupled models without confining the model's resistivity distribution too much. To test whether a more evolved starting model improves our joint inversions, we performed separate inversions with a pre-fitted resistivity model. This pre-fitted starting model was the inversion model resulting from an MT-only inversion with high regularization after 75 iterations. At this point, the RMS MT data misfit was reduced from 6.55 to 3.92. The resulting model included increased mid-crustal resistivities and distinct shallow conductive basins, instead of a simple sediment layer. In the joint inversion with gravity data (JI2), this pre-fitted resistivity starting model seemed to have no significant effect on the final inversion results. However, for joint inversion approaches JI1 and JI3, the starting models seemed too specific causing large values in the cross coupling term that subsequent inversion iterations could not reduce. Hence, we computed the final inversions shown here with the half-space resistivity starting model described above, to allow the most flexible inversion convergence. Model parameters of the ocean layer and bathymetric variations were fixed during the inversion.
The density starting model for the full joint inversion of both data sets (JI2), requires more structure than the resistivity starting model, because gravity inversion alone has no depth sensitivity. We created an initial model with large-scale structures based on the density model by Maystrenko et al. (2013) (Figure 4, top left). In order to generate our inversion starting model, a correction model ( Figure 4, top center) was subtracted, which combines two corrections: First, conversion of absolute densities to density anomalies, by subtraction of a very simple 1D Earth reference model ( Figure 4, bottom left). This Earth reference model consists of three layers representing air (above 0 km; 0 kg/m³), crust (0-35 km; 2,810 kg/m³), and mantle (below 35 km; 3,222 kg/m³). Second, subtraction of the previously described data correction models for topography-and Moho depth variation ( Figure 4, bottom right) to account for the corrections applied to the gravity data (cp. Figure 3f). Working with cross-gradients, coupling is in theory only implemented at structural boundaries, that is, parameter contrasts. The second described correction corresponds to a removal of the model's two strongest density contrasts, that is, topography and Moho. As topography is retained during inversion, the removal, in practice, only applies to the Moho. Removal of the Moho density contrast has the following implications: first, inversion concentrates on anomalies within the crust, otherwise cross-gradient coupling would direct a main focus on the strongest gradient at the Moho making this contrast a primary target. Based on the MT-only and JI1 inversions, we do not expect a strong resistivity contrast at Moho depth, and therefore do not want this to be the main joint inversion coupling objective. Second, density anomalies normally do not cross the Moho, they are either anomalies to "normal crustal density" or anomalies to "normal mantle density" (which are defined by the 1D Earth reference model). Thus, no artificial boundaries are introduced by removing this main contrast. Third, the removal of a presumed Moho depends on the hypothesis, that its depth is correct and we cannot account for Moho depth uncertainties. Consequently, interpretation of density anomalies directly above or below the presumed Moho has to take this uncertainty into account, for example, increased upper mantle density could in fact indicate a thinner crust than assumed. Based on these corrections, the JI2 density anomaly model comprises deviations from the assumed corrected reference model ( Figure 4, top center). The starting model is characterized by the main features: (a) reduced density at shallow depths indicating sediments, (b) slightly increased density for oceanic crust, and (c) a density increase for the proposed lower crustal magmatic underplating.

Results
We compare the 3D MT data inversion to three different joint inversions: In JI1, we use the structural density model as an external constraint to MT data inversion, in JI2, we apply a joint inversion of the MT-, and corrected gravity data with the resistivity-, and density starting model described in the previous section and in JI3 we perform a "quasi-2D" inversion of the MT data along Walvis Ridge with a 2D gradient velocity model from the congruent seismic data set as a constraint. In the comparison and analysis of our inversion results, we consider three aspects. First, we examine the influence of the chosen weighting parameters and the development of the different objective function terms. Second, we compare the data misfits which are achieved by the inversions in detail. Third, we compare the resulting physical parameter models highlighting differences emerging from the different coupling strategies, and analyze these features further with a sensitivity analysis.

Weighting Parameters and Misfit Evolution
Inversion progress is heavily steered by the weighting parameters (w MT , w Grav , λ MT , λ Grav , and κ), which drive inversion convergence. For our analysis, we monitor the development of the different misfit terms (Equation 1): MT and gravity data misfits (Φ d(MT/Grav) ), MT and gravity regularization (Φ Reg(MT/Grav) ), and cross-gradient coupling (Φ Cross ). Due to the nature of the calculation of the misfit terms (see Moorkamp et al., 2011), the values differ by orders of magnitude, and require weights to be set to achieve a balance of the terms in the objective function.
The applied weights are summarized in Table 2, the evolution of each term is shown in Figure 5. Please note that values shown for data misfits in Figure 5a are RMS (Equation 2, averaged over all stations and frequencies) values. Therefore, values approaching a value of one indicate a fit of the modeled and observed response within the assumed data errors. The values shown for the regularization and coupling term (Figures 5b and 5c) are unweighted objective function terms, and therefore span different magnitudes which is regulated by the Lagrangian weights λ MT , λ Grav , and κ.
For the MT-only and model constrained approaches JI1 and JI3, only the MT data set is inverted, therefore w MT = 1 and w Grav = 0. In the joint data inversion approach (JI2), the two data sets are weighted differently to account for the much slower convergence of MT compared to gravity inversion (cp. Heincke et al., 2017). After testing different ways to balance both data sets, the best results were achieved by using weights w MT = 50 and w Grav = 1. Smaller w MT values lead to a fast fit of the gravity data, while MT data cannot be fitted well. The joint inversions were stopped when the MT data misfit had reached the single-method MT data misfit (3.01 for JI1 & JI2, 3.84 for JI3). All data misfits (Figure 5a) experienced an initial drop and decrease more moderately afterward. The two model-constrained approaches (JI1 and JI3 corresponding to the green and orange line in Figure 5) required distinctly more iterations than the MT-only and joint data inversion (JI2). This implies that using cross-gradient coupling with a fixed model strongly reduces the speed of inversion convergence.
For regularization weights λ MT and λ Grav , we implement a cooling scheme, starting with high values and successively reducing them (see Table 2) after each 75 iterations. The specific regularization weights are chosen to achieve an optimal balance between computation time and efficient convergence. However, optimization is stopped automatically before, if no suitable step size can be found to significantly minimize the objective function with the L-BFGS method (Avdeev & Avdeeva, 2009;Tarantola, 2005). Once the final weights λ MT and λ Grav are reached, the inversion continues as long as the value of the objective function can be further reduced or the target RMS is reached. The regularization terms of the objective function show, just like the data misfit, an initial decrease that we can attribute to the initial smoothing of sharp boundaries in the starting models ( Figure 5b). Afterward, the regularization terms increase as models develop more distinct and smaller-scale features resulting in increased model roughness. In (a), the overall model's RMS data misfit is depicted. The dotted lines indicate the target RMS misfit for the full 3D (MT-only (red), JI1 (green), JI2 (blue)) and the "quasi-2D" (JI3 (orange)) inversions. This is the final value reached by the single-method MT inversions. In (b), the development of the regularization terms Ф RegMT and Ф RegGrav without weighting parameters is depicted. In (c), the development of the coupling terms Ф Cross without weighting parameters is shown. Red is the MT-only inversion, green is JI1, both blue colors are JI2 (MT and gravity method), and orange is JI3.
The cross-gradient coupling weight κ was kept high during inversion to ensure a strong influence of cross-model coupling and balance the term Φ Cross with respect to the data misfit. The coupling terms Φ Cross show a fast initial drop and then smaller decreases for JI1 and JI2 (Figure 5c). While at larger iteration numbers, the inversion still alters the models to decrease the data misfit, the coupling term in the objective function does not change significantly any more. This can either indicate that model gradients are unified, and structural similarity is achieved (ideally), or that resistivity model changes are focused mainly inside constant features of the cross-model (i.e., for the density model constrained approach JI1). For the cross-gradient coupling with the 2D velocity-model (JI3), the cross-coupling term increased beyond iteration 250 (orange line in Figure 5c). At this point, the MT data misfit could only be improved by allowing for more structural dissimilarity between the resistivity model and the velocity model.
Increasing the coupling weight κ during the final iterations of JI2 (blue line, Figure 5c) does not change the MT data misfit but enabled us to decrease the regularization weights λ MT and λ Grav further without an otherwise strong increase in the coupling term Φ Cross .
In summary, we observe that all terms in the objective function show initial strong drops, caused by fitting of the inversion models to main data anomalies, and smoothing of the abrupt starting model boundaries. Also, the two model-constrained approaches (JI1 and JI3) converge much slower compared to the MT alone and joint data inversions (JI2). The reason for this reduced inversion speed is that in model-constrained joint inversion approaches, the structure-guided direction search departs from the optimal for MT data misfit reduction. Also, in joint data inversion (JI2), two models can be modified in order to satisfy the different objective function terms. Finally, we find that the regularization terms' behavior is strongly dependent on the corresponding weights. As we use a cooling strategy for λ MT and λ Grav , misfits Φ RegMT and Φ RegGrav increase (Figure 5b), however, the weighted misfits decrease.

Data Fit of the Resulting Inversion Models
Depicting MT data as pseudo-sections is an easy way to get a first overview of possible resistivity structures. Apparent resistivities ρ a can be calculated directly from the impedance matrix elements and plotted against period as an indirect measure for depth. Figure 6 shows pseudo-sections for the XY element of the observed apparent resistivity ρ a in direct comparison to the final model responses achieved by our four inversion approaches as a first evaluation of the achieved data fit. The observed data (second row, Figure 6) show low apparent resistivity anomalies in the short periods (especially stations 13-19 and 25-30), which are well represented in all modeled apparent resistivity data (rows 3-6 in Figure 6). The models also account for the high apparent resistivity anomalies at intermediate periods (especially stations 3-13 and 33-43), yet none of the four inversion approaches matches its full extent. For the onshore stations (right column, Figure 6), the observed apparent resistivities show large variations between neighboring stations 100 to 120, which are located along a coast-parallel profile with distances of around 20 km. The inversions do not result in similarly varying apparent resistivities.
To further evaluate the data fit of the final four resistivity models, we compare the root mean square (RMS) misfit of the impedance matrix Z (Equation 2, misfit calculated separately for each frequency and station). This misfit denotes the difference between a measured (Z obs ) and a calculated (Z syn ) parameter and is weighted by the parameter error Z err .
The final overall RMS data misfits (averaged over all frequencies and stations) are stated for all models in Table 2. In Figure 7, we show the RMS data misfits for each MT station and each frequency together with the mean impedance element's error Z err of Equation 2. This depiction allows to analyze where our resistivity model cannot explain the measured data well and hence the results should be interpreted cautiously. Several land stations show poor fits in all inversion approaches, while the corresponding impedance errors are high. These poor fits emphasize the difficulty to identify complex onshore resistivity structures (see Kapinos et al., 2016) with few MT stations and indicate the presence of noise in the onshore data measurements. The outer stations of offshore profile P3 (stations 1-3, 23) mostly have RMS misfits above 4, while having very low impedance errors. Stations 1-3 are located just north of the Florianopolis Fracture Zone that marks a change of thick crust below the Walvis Ridge to thin crust in the Angola Basin. The southern-most station 23 is isolated due to data loss on stations 20-22 and situated close to supposedly deep sedimentary basins (Maystrenko et al., 2013;Stewart et al., 2000). The high RMS data misfits are a first indication for complex resistivity structure at depth that may not be sufficiently resolved by the data coverage and model scope. Thus, the inversion results for the areas around the Florianopolis Fraction Zone, in the southern part of the study area, and onshore have to be interpreted with care.
The gravity data's final RMS misfit in the joint data inversion (JI2) is highly dependent on the coupling weight κ. A weak coupling constraint allows for a free fit of the density model. In the joint inversions, we vary κ between 10,000 and 500,000, reaching gravity data RMS between 1.4 and 3.7, while gravity inversion alone reaches an RMS of 1 (i.e., ideal fit) after ∼200 iterations. The spatial variation of gravity data misfit is shown in Figure 8. The largest differences occur at the abrupt bathymetry changes along the Florianopolis Fracture Zone and the landfall of Walvis Ridge. Residuals from gravity modeling by Maystrenko et al. (2013) show similar patterns, that is, high residuals are observed in the same region, and positive and negative residual anomalies occur next to each other. Rapidly changing crustal composition, small-scale seamounts, or sedimentary depocenters together with insufficiently resolved bathymetry variations can all be potential reasons for these increased data residuals. Neither a reduction of the gravity inversion cell size, nor an unconstrained gravity inversion of a half-space fully eliminates this pattern.
In summary, the inversions reach satisfactory data fits and are capable of resolving large-scale variations indicated by the apparent resistivity pseudo-sections. Larger MT data misfits are concentrated at the outer parts of profile P3 and the land stations, which indicates model uncertainties in these regions. The largest gravity data misfits are located at the landfall of Walvis Ridge and most likely correspond to short-wavelength gravity features generated by local density variations which are not resolved by our density model and the available bathymetry.

Model Comparison
To compare the four resulting inversion models, we present vertical slices along profiles P100 and P3 through all final physical models (Figures 9 and 10). The vertical slices are chosen to lie right underneath the MT stations, where the resistivity models are best constrained by the MT data. Figure 9 shows final resistivity models along profile P100 for all four approaches, that is, MT-only and JI1, JI2 and JI3 on the left column. In the right column, we depict the corresponding slice of the structural density model (JI1, Figure 9c) used to constrain our MT data inversion, the final density- (Figure 9e) and density anomaly (Figure 9f) model from the joint data inversion approach (JI2) and the velocity model (JI3, Figure 9h) along the profile that strikes along Walvis Ridge that constrained our MT inversion. Figure 10 shows the corresponding results for Profile P3 except for JI3 which could not be performed on this profile. For better comparability with the cross-model of JI1, we transform the density anomaly model (Figures 9f and 10f) of JI2 back to absolute densities (Figures 9e and 10e) by adding the correction model to the inversion result (reverse to what is described in Figure 4). To compare the three-dimensional extent of structural features, we show horizontal slices through the final 3D resistivity models at 25 km depth and the corresponding horizontal slices of the structural density model (JI1) and the density as well as density anomaly model derived from JI2 ( Figure 11). We focus our discussion on the most prominent three features in terms of size and resistivity contrast. These features are the shallow conductors corresponding to sediments (e.g., C2 and C3 in Figures 9 and 10), the large resistor at intermediate depths associated with magmatic underplating (R in Figures 9-11) and a deep, narrow conductor on profile P3 around profile kilometer 90 (C1 in Figures 10  and 11) in the vicinity of the Florianopolis Fracture Zone.
Prominent shallow, low-resistivity features associated with marine sedimentary basins occur in the same locations in all four inversion models, while some vary in thickness. Along profile P100 (Figure 9), two wide basins dominate from profile kilometers ∼20-100 and ∼220-380 (anomaly C2). On profile P3 (Figure 10), a more continuous, upper, conductive layer with several incisions (anomaly C3) is visible in the resistivity models. The low-resistivity sediments reach thicknesses of about 8-10 km in the MT-only resistivity models. In the joint inversion resistivity model (JI2) depicted in Figures 9d and 10d, these conductors are almost identical to the MT-only model with a slight conductivity increase (e.g., the basin at profile kilometer 150 on P100). In the corresponding density model (Figures 9e, 10e, 9f, and 10f), low-density anomalies are observed analogue to the conductivity anomalies. For the two model-constrained approaches JI1 and JI3 (Figures 9b, 9g, and 10b), the sediments have more shallow lower boundaries (e.g., anomaly C2 on profile P100 at ∼3-5 km depth) due to the cross-models' strong vertical gradients from sediments to upper crust.
We conduct sensitivity tests by varying the feature's resistivity values and extent to investigate the resolution capabilities of the sediment layers in our inversion models on one hand and to examine the nature of the described shallow boundaries introduced from the structural density and velocity constraint models on the other hand. The sensitivity test models were produced by altering the final JI2 inversion models based on the model differences indicated from our added constraints. Forward calculations of these models allow us to observe the influence of structural-or parameter changes on the MT-and gravity data fit. The final JI2 inversion models were modified, by (a) increasing sediment resistivities (the resistivity in every cell with a resistivity below 12.5 Ωm is doubled) to challenge the need for shallow conductors with an accompanying increase of the associated density anomaly; (b) decreasing sediment thickness (cells with resistivity below 12.5 Ωm and density anomaly smaller than −100 kg/ m³ are averaged over a vertical window of 5 cells, which increases those values but ensures smooth gradients) to test whether thinner sediments could fit the data; and c) reducing the sediment thickness by about a factor of 2 while simultaneously decreasing its resistivity to ensure a constant conductivity-thickness ratio (conductance).
Density anomaly values of test model (c) are multiplied by 1.4 to account for the decrease in thickness (we tested for factors of density increase between 1.2 and 2.0). Exemplary slices through the inversion-and test models along P100 are shown in Figure 12 alongside the response data for MT station 28 on this profile and the gravity anomaly data along the profile. Changing the sediments in the resistivity model changes the MT response in all four components emphasizing the 3D effect of the sediment distribution. The MT response curves of test (a) deviate significantly from the observations, which implies that generally shallow low resistivity values are needed to fit the data. While the MT curves of test (b) seem close to the response of the inversion result for this specific station, the overall MT data RMS (summed over all stations) of 17.8 shows an insufficient data fit. Test (c) shows that the test model's MT responses are very close to the inversion model response. It emphasizes the MT method's sensitivity to conductance and its incapability to resolve both conductivity and thickness, particularly for shallow structures. The gravity data fit along profile P100 ( Figure 12, left panel, top) shows a similar behavior. Tests (a) and (b) (cyan and blue curves) show a distinct increase in the gravity anomaly for the profile kilometers most affected by the sediment alteration (250-400 km). The green curve (test c), where sediment thickness is halved and density anomaly multiplied by 1.4, fits the gravity data along the profile almost as well as the original inversion response. For this test, we continue the inversion for 25 iterations after which data misfits are reduced to 3.04 for the MT data and 1.68 for the gravity data, while the altered sediments remain thinned and less resistive and less dense. In the Supporting Information, we also present synthetic inversions, with altered sediment thickness (Figures S1, S2, S3, Table S1, and Text S1). They show, that MT inversion is in fact sensitive for variations in sediment thickness, but not able to correctly reproduce the input forward models. Vertical smoothing usually leads to an overestimation in sediment thickness. The density anomaly models for JI2 of the synthetic inversions are insensitive to a change in the resistivity model's sediment thickness, which is reflected in an increased objective function's coupling term.
Most eminent in all inversion models is the large, high resistivity body R from about 16 km depth downward, stretching from the coast seawards along Walvis Ridge. While the resistor seems continuous with depth and also laterally apart from the decrease in resistivity around profile kilometer 220 in profile P100, it becomes discontinuous in the structural density model constrained joint inversion JI1. The boundaries of these discontinuities coincide with the strong parameter gradients in the cross-model at the top (∼19 km) and bottom (∼42 km) of the high density lower crustal body (3050 kg/m³) ascribed to magmatic underplating in the structural density model. The resistor's lateral extent along the off-profile-axis in this density-constrained resistivity model of JI1 is furthermore limited to areas close to MT stations, while it stretches over the adjoining areas in all other inversion models (Figure 11b compared to Figures 11a and 11d). While the strong resistivity contrast marking the transition from continental crust to magmatically imprinted oceanic crust directly at the ridge's landfall remains, the highest resistivities in the density-constrained model are focused more seawards around 200 km on profile P100. In contrast, the MT-only inversion resistivity model has the highest values at kilometers 300-500 (comparison of Figure 11. Horizontal slices through the physical inversion models at 25 km depth. Shown are the final resistivity models for the three 3D approaches (a) MT-only, (b) JI1, (d) JI2 on the left with the according cross-models (density-and density anomaly) on the right. Anomaly R presents the large mid-to-lower crustal resistor associated to magmatic underplating. Conductor C1 coincides with the Florianopolis fracture zone. Overall RMS data misfits as well as summed cross-gradient coupling terms are shown in the white boxes. Gray stars denote the positions of MT stations, gray lines are bathymetry. Figures 9a and 9b). The joint inversion JI2 resistivity model (Figures 9d and 10d) shows a smooth resistor, almost identical to the MT-only approach. In this inversion, the distinct initial high-density body just below Walvis Ridge is smoothed out vertically (Figures 9e, 10e, and 10f). Increased densities are still focused below the ridge, but cover a larger depth range reaching below the Moho. The MT-only, JI1, and JI2 models show a decrease of resistivity around depths of 80-100 km (visible at the very bottom of Figures 9a, 9b, 9d, 10a, 10b, and 10d). However, they show no marked decrease in resistivity at Moho depths. Finally, in the velocity-constrained JI3 resistivity model along Walvis Ridge (Figure 9g), the resistor is, compared to the MT-only and JI2 model, laterally smoother and the most resistive part is shifted seawards. This resistor matches the lower crustal high velocities stretching We conducted tests to examine how much the form of this high-resistivity, high-density body could be changed without influencing the response data and thus the data misfit. The tests we conducted are: (d) reducing the feature's thickness and simultaneously increasing its resistivity and density anomaly to check to which degree its thickness can be reduced to match the velocity and density models; e) removing the resistor anomaly north-east of the profile intersection by reducing the highest resistivities; and (f) removing the resistor anomaly south-east of the profile intersection. Compressing the feature to a thinner, more resistive and more dense anomaly (test d) has a strong impact on the MT and gravity responses. We test for different thicknesses (50 and 100 km) and increase the feature's resistivity step-wise up to 10 times the original value, and increase its density anomaly with a factor between 1.1 and 2. It is not possible to achieve similar data misfits for neither the MT nor gravity data. To decrease the misfits, inversion has to be continued for up to 100 iterations and the weighting parameters w MT , w Grav , and κ have to be adapted. This continuation and adaption lead to lower crustal resistivity and density values of up to ∼75,000 Ωm and 2,950 kg/cm³, respectively. While the density values are feasible, the extremely high resistivity values are very unlikely. Therefore, it appears that high-resistivity values need to reach deep (at least 80-100 km) into the Earth's mantle and we do not expect a drastic change of resistivity at Moho depth. Synthetic inversions which are shown in the Supporting Information, back up this observation ( Figure S3 and Text S1). Tests (e) and (f) challenge the reduced off-axis extent of the resistor R indicated by the JI1 resistivity model (Figure 11b). The tests indicate the limited horizontal resolution away from the MT stations. If values in close proximity (∼30 km away from the stations) are kept constant, changes of resistivity affect the response curves only marginally. Therefore, this test emphasizes that MT inversion results far from stations have to be interpreted with great care as the results may be driven mostly by smoothing or the influence of the 1D background model. Therefore, it is likely that strong additional constraints can alter the resistivity model in area's far away from MT stations relatively freely, meaning interpretation of such features would rely mostly on the credibility of the constraining cross-model or data. We did not perform tests on the density model in this context, because the resistivity model tests demonstrated, that gravity inversion runs quite freely far away from MT stations and is therefore mostly unconstrained. Additionally, to the described sensitivity tests (d)-(f), we also estimate the inversion's responsiveness to the distinction between the highest resistivity values (i.e., differentiation between 2,000 and 10,000 Ωm). It shows that values above 2,000 Ωm are only barely distinguishable, as the MT response curves do not change substantially, when confining resistivities to a maximum of 2,000 Ωm. Therefore, we do not interpret variations in resistivity that occur within the highly resistive regions of our model. Additionally, the synthetic inversions presented in the Supporting Information confirm, that the applied inversions may not distinguish between high upper mantle resistivities of for example, 1,000 and 2,000 Ωm ( Figure S3 and Text S1) but demonstrate that high values below Moho depth are likely.
The third prominent feature in the inverted resistivity models, is the deep, narrow low-resistivity anomaly on profile P3 at ∼90 km and below station 3 at ∼10.1°E, 18.5°S (Figures 10 and 11, anomaly C1). This anomaly could be related to the Florianopolis Fracture Zone marking the northern edge of Walvis Ridge (cp. Figure 1). Water intrusions or accumulation of conductive ore minerals resulting from hydrothermal circulation can significantly reduce electrical resistivity in shear-and fault zones (Biswas et al., 2014;Unsworth & Bedrosian, 2004). All inversion approaches result in this vertical conductor. In the density-constrained resistivity model of JI1, however, the feature is significantly less pronounced and interrupted at ∼20 km depth (Figure 10b, anomaly C1). The conductor has a very small lateral extent, as seen in the top view ( Figure 11). Therefore, it seems to be backed mostly by data of one station (station 3), although the apparent resistivity of that station shows no obvious resistivity decrease compared to neighboring stations (see Figure 6, top row).
To examine resolution capabilities for this feature, we performed similar tests as with the sediment's responses. Models were modified by (g) increasing the feature's resistivity (every cell with a resistivity below 3.3 Ωm is increased by half its value) and decreasing the density anomaly value in the same cells by half of its value, to test for the general need of low resistivities and high densities; (h) drastically decreasing the feature's depth (cells below 15 km with resistivity below 10 Ωm are set to half-space resistivity of 50 Ωm, while the positive density anomaly values in the feature's area below 15 km are set to 0 kg/m³) to challenge the extreme depth of this conductor/high-density anomaly; and (i) decreasing the feature's depth extent and significantly increasing the resistivity at depth (below 15 km, all cells in the area are set to at least 1,000 Ωm), while setting the entire density anomaly model in the area to 0 kg/m³ below 15 km (positive and negative anomalies). Additionally, in test (i) the low resistivity anomaly is extended south and south-west, to form a ridge-parallel elongated feature like a seaward-extending fracture zone. Figure 13 shows vertical slices through the inversion-and test models along P3 alongside the MT response data for station 3 on this profile and the gravity anomaly along the profile. Test (g) has the strongest MT data misfit increase (the single RMS misfit for station three is 4.5, compared to 3.0 in the original inversion model), while the gravity data misfit does not change. This demonstrates the general need for a low resistivity anomaly, while the density contrast may be less pronounced than indicated by the joint inversion. Tests (h) and (i) have similar MT response curves (see Figure 13, right, dark blue and green lines). The gravity data fit of test (i) is significantly reduced and the gravity anomaly along profile P3 (Figure 13, left column, top) Figure 13. Depiction of test models and responses concerning fracture zone resistivity, and density anomaly. The left panel shows gravity anomaly along profile P3 (top) and slices through the inversion-, and the three tests models along profile P3. Colors represent resistivity models, black contours represent density anomaly models (contours are drawn at 10, 20, and 30 kg/m³ deviation from correction model). Second: result of JI2 inversion. Third: Test (g) increased resistivity and decreased density of supposed fracture zone. Fourth: Test (h) resistivity/density anomaly is confined to top 15 km. Bottom: Test (i) shallow, elongated anomaly with increased resistivity and no density anomaly beneath. The right panel shows the MT response data of station 3 to the resistivity models depicted on the left. Gray triangles denote the positions of MT stations, station 3 is highlighted with a purple triangle on the model slices.
shows a divergent curve. As the calculations of test (h) show a better data fit than those of test (i), we believe that the northern edge of Walvis Ridge is in fact a frontier of a change in resistivity and density related to the change in crustal thickness between the Angola basin and Walvis Ridge. However, the adequate data fit of test model (h) also indicates that the exceptionally low resistivity values and increased densities at depths greater than 15 km, are not required to fit the data, but anomalous values may be confined to the top 15 km of the crust. Synthetic inversions of this survey setup support this hypothesis, because the C1 feature is replicated there as well. The attenuation of the artifact due to a different crust-mantle transition or inclusion of a conductor in the synthetic data, demonstrates the strength of the effect linked to a sudden change in crustal thickness and structure ( Figure S3, Text S1). The elongation of the conductor parallel to Walvis Ridge (and the MT stations of P100) applied in test (i), results in only small changes of the response curves of nearby stations, which emphasizes once more the lack of resolution away from MT stations.

Discussion
Our analysis focuses on four different strategies of inverting the same MT data set with and without additional constraints. The MT-only inversion is a re-evaluation of the results presented in Jegen et al. (2016), for which a sign-error in the data rotation matrix was identified after publication. The most important deviation of our new resistivity model compared to the previously published model is the lack of a conductor "C" (Figure 5 in Jegen et al., 2016) between the coast and profile P3, south of Walvis Ridge. Now, we observe these low resistivity zones only close to the station's profiles (conductors C2 and C3 in Figures 9 and 10). They are correlated to the low apparent resistivities on stations 25-31 and 14-17 ( Figure 6). The sensitivity tests described above show a limited resolution capability far away from MT stations. Thus, we conclude that the rotation error has led to an exaggerated distortion of these shallow low resistivities. To discuss the quality of the three integrated inversion approaches, we use the comparison of our three resulting resistivity models with the constraining models used as well as their interpretation. We compare the joint inversion models with independent seismic models and other data in the region where possible. Other important criteria in the inversion comparison are the MT data fit and model roughness as well as the data sensitivity tests described in the previous section. All these criteria will be discussed for the different inversions JI1, JI2, and JI3 focusing on the three main model features sediment cover (especially anomalies C2 and C3), magmatic underplating (anomaly R), and Florianopolis Fracture Zone (anomaly C1). The constraining models and independent data show the following features.
Seismic data along profile P100  show that there are up to 2-km-deep sediment basins on top of Walvis Ridge that are separated by seamounts at profile kilometer ∼110 and ∼180. This broadly agrees with Goslin et al. (1974)'s interpretation of at least 2.5-km-thick sediment cover along two ridge-crossing profiles. Further east (220-340 km), Fromm et al. (2017) identified reduced seismic velocities between 2 and 6 km depth, which correlate with the connection of the northern most part of the Walvis Basin with the southern edge of Namibe Basin (Stewart et al., 2000). Along the coast-parallel profile P3, Planert et al. (2017) imaged slightly increased (∼3 km) sediment thickness north of Walvis Ridge, and a sediment cover south of the Ridge increasing in thickness from ∼0.5 km at the northern edge of Walvis Ridge to thicknesses larger than 2 km in Walvis Basin. Coast-perpendicular seismic profiles along the Namibian margin south of Walvis Ridge (Bauer & Schulze, 1996;Elliott et al., 2009;Gladczenko et al., 1998) show clear evidence for seaward dipping reflections (SDR) indicating the presence of subaerial lava flows reaching thicknesses of ∼7 km.
In the lower crust below Walvis Ridge, seismic studies reveal high seismic velocities Planert et al., 2017) and Maystrenko et al. (2013) postulate a high-density lower crustal body. The studies interpret these features as magmatic underplating. This may coincide with an increase in electrical resistivity due to its low porosity and mafic nature, that is, depleted in incompatible elements and volatiles and rich in olivines and quartz (Eldholm et al., 2000;Gernigon et al., 2004;White & McKenzie, 1989). The seismic studies image thicknesses of this feature of ∼8-15 km directly beneath the ridge and ∼2-8 km south of it. In contrast, the structural density model by Maystrenko et al. (2013) predicts thicknesses of this layer to be in the range of ∼10-30 km. Gladczenko et al. (1998) and Bauer et al. (2000) also image high-velocity underplating along the Namibian Margin south of Walvis Ridge of similar thickness. The upper boundary of underplating is located between 18 and 23 km depth in all models.
10.1029/2021JB022092 23 of 28 The last feature we compare with reference studies is the Florianopolis Fracture Zone north of Walvis Ridge. Its existence has been confirmed by various seismic studies Gladczenko et al., 1998;Goslin et al., 1974;Planert et al., 2017;Sibuet et al., 1984) as an abrupt decrease in crustal thickness and northward-increasing water depth. However, except for the lithospheric and crustal thickness variation across the fracture zone, there are no additional velocity or density anomalies.
The resistivity models of MT-only and joint MT-gravity data inversion (JI2) show almost no difference and identical MT data-and regularization misfits along with a low gravity data misfit. We attribute this observation to the fact that the gravity data has a large solution space and can be fit very freely (e.g., lack of vertical resolution), thus fails to reduce the MT solution space significantly. In fact, it is the MT inversion model which constrains the density model. Concerning the sediment cover along Walvis Ridge, we observe a deepening of sedimentary basins in the density model (e.g., profile P100 Figures 9e and 9f, 220-570 km, and profile P3 Figures 10e and 10f, 200-600 km) compared to the structural density model (Figures 9c and 10c). The shallow low-resistivity anomalies C2 and C3 are imprinted onto the density model through the cross-gradient term. On profile P100 (Figure 9d), the shallow conductive structures are laterally varying. Increased resistivities coincide with seamounts interrupting sediment cover, and conductors (e.g., C2) coincide with sediment basins observed in seismic and gravity studies Goslin et al., 1974;Stewart et al., 2000;cp. Figure 9). The conductor's thickness of up to 10 km clearly exceeds the references' values of ∼2-6 km. Low resistivity anomalies C3 along profile P3 between 240-300 and 320-370 km correspond to seaward dipping reflectors imaged on transects 2 and 3 in Gladczenko et al. (1998). The conductive anomalies reach ∼10 km thickness, slightly exceeding the ∼7 km stated by references. For the second model feature, the large mid-crustal resistor R, lateral extent of high resistivities/densities along the profile is well matched by observations of Fromm et al. (2017), Gladczenko et al. (1998), Maystrenko et al. (2013), and Planert et al. (2017). In our JI2 resistivity model, the top of R is located slightly higher (at ∼15-20 km depth) than in the reference models. While underplating is bounded by the Moho in the seismic models, no distinct bottom, and therefore thickness, can be identified in the resistivity model. This observation is consistent with our sensitivity tests and synthetic inversions that support high resistivity values at depths below Moho. The lack of an electrical resistivity contrast across the seismic Moho is a common feature indicating that the electrical resistivity is governed more by the pore space volume than chemical differences in the rocks (Jones, 2013;Wang et al., 2013). The JI2 inversion of the gravity data results in a density anomaly model, that is, the difference between the resulting inversion and the reference model. Thus, interpretation of the final absolute density models (as shown in Figures 9-11e) has to take into account, that the sharp, constant model (correction model, see Figure 4, top center) is added to a smoothed anomaly model (Figures 9-11f). This may result in artificial features caused by the fixed block boundaries in the correction model. For instance, the observed high densities (>3,300 kg/m³) directly beneath the Moho (dark blue colors in Figures 9e and 10e), could be either an indication for increased upper mantle densities, or an indication of a slightly thinner crust. The seismic velocity models by Fromm et al. (2017) and Planert et al. (2017) (black lines in the mentioned figures) image indeed a shallower Moho, which points to the latter.
The observed resistivity decrease below 80-100 km is only poorly resolved, yet these depths are in good agreement with the proposed lithosphere-asthenosphere boundary in this region (Fishwick, 2010;Maystrenko et al., 2013). The location of the vertical conductor C1 along profile P3 in the joint data inversion (JI2) coincides with the identified change in crustal thickness and composition, however, the reference studies do not reveal velocity or density anomalies. This strong vertical resistivity feature is also imprinted through cross-gradient coupling onto the density model, although previous gravity studies have not observed a density anomaly. The large elongated residual along the Florianopolis Fracture Zone (Figure 8c) shows that this density contrast generates a rather poor gravity data fit, which is why we believe it is an artifact caused by the cross-gradient coupling. For this feature, the large coupling weight κ enforces a structural similarity in spite of a decreasing data fit. Generally, many models can fit the gravity data making gravity inversion with cross-gradient coupling a rather weak constraint for MT data in joint inversion. However, this example demonstrates the power of cross-gradient coupling, as the structural constraint may lead to a strong imprint of resistivity structures on the inverted density anomaly model and a backward enhancement in the resistivity model like the slightly enhanced outline of conductor C1 in Figure 10d compared to Figure 10a.
An alternative scheme of incorporating density data in MT inversion is coupling a fixed structural density model to the MT data inversion (JI1). The resistivity model resulting from this approach shows some significant model differences compared to the MT-only inversion model, while reaching an almost identical MT data fit. The increased roughness in the resulting resistivity model is clearly visible in the vertical slices in Figures 9b and 10b, and may be explained by the sharp boundaries in the blocky density cross-model (Figures 9 and 10c). Higher roughness is also evident in the increased regularization term of the objective function compared to MT-only or joint data inversion JI2 (see Figure 5). For the sedimentary layer, the interfaces introduced in the shallow conductors along profile P100 at ∼3-5 km depth (Figure 9b) are in accordance with the sediment thickness derived from seismic imaging by Fromm et al. (2017) and Goslin et al. (1974). Seismic imaging has a very good resolution of structural boundaries and it is unlikely that the seismic velocity used for depth conversion of the reflector is very far off. Thus, the shallower seismically derived depth to basement seems more realistic than the depth derived from the MT models. As the sensitivity tests described above showed, the MT data cannot resolve the thickness and conductivity of the shallow sediments independently, but only its conductance, that is, the thickness conductivity product. Thus, the resistivity model can be matched to seismic models by decreasing the sediment layer thickness while increasing its conductivity. However, the values required to match the sediment layer thickness imaged by seismic data would require some areas to have resistivities as low as 1 to 2 Ωm down to a depth of 5 km. Such average low resistivities are difficult to conceive for old thick sedimentary basins. In this structurally constrained inversion JI1, an additional conductor is placed within the basement underneath the sediments to account for the higher conductance required by the MT data. This could indicate the presence of thin, conductive layers in the upper crust, not resolved by gravity and seismic methods. Alternatively, initial smoothing followed by cross-gradient driven introduction of boundaries and the MT method's sensitivity of conductance could lead to artifacts below the actual sediment base. As the constraint-model has no spatial parameter gradient below the sediment base (constant crustal density of 2,800 kg/m³ down to the next gradient at the transition to high-density lower crust), MT inversion may easily introduce artifacts here to compensate for the changes made at the constraint-model's structural boundaries. Along profile P3, the areas of increased thickness in the conductive layer correlate with seismic observations of seaward dipping reflections (conductors C3, Figure 10b). Therefore, the conductive incisions C3 in the continuous sediment layer south of Walvis Ridge may be caused by inter-layered magmatic flows and sediments. The high resistivity values directly beneath those anomalies might indicate former pathways for magmatic material, which erupted episodically to form the alternating magmatic-sedimentary sequences (Elliott et al., 2009;Planke et al., 2000). However, synthetic inversions of the survey setup ( Figure S3, Text S1) indicate, that these conductive incisions may be an artifact introduced through smoothing in areas with larger station spacing. The highest resistivity values associated with magmatic underplating are further seaward (Figure 9b, resistor R). However, as a clear distinction of resistivities above ∼2,000 Ωm is difficult, this apparent shift may be artificial. The structural cross model also imprints the horizontal Moho boundary onto the resistivity model (anomaly R, Figures 9b and 10b). However, the variation in resistivity is rather small, and according to our sensitivity test g) a strong resistivity contrast at the seismic Moho is not compatible with the MT data. The smaller off-axis horizontal extent of this resistor, that is, high values are confined to areas close to MT stations, cp. Figure 11b, contradicts seismic observations of high-velocity magmatic underplating all along the Namibian margin (Bauer et al., 2000;Gladczenko et al., 1998). The sensitivity tests h) and i) indicate limited resolution off-profile axis. Therefore, we conclude that cross-gradient coupling of the fixed structural cross-model used in this approach, causes this restriction due to the model's large blocks with constant density. Within these blocks, no gradients are enforced by cross-gradient coupling, and the resistivity inversion model remains mostly at starting model conditions where MT sensitivity is reduced. The observed weakening of anomaly C1, associated with the Florianopolis Fracture Zone challenges the existence of the great vertical extent of this feature in the MT-only and JI2 models. The presence of the strong low-resistivity anomaly in this region and the absence of a matching velocity or density anomaly is enigmatic. One possible explanation may be a bias in the data of the MT stations sensitive to the anomaly or insufficient bathymetric resolution to account for the strong topography in the area. Key and Constable (2011) and Worzewski et al. (2012) describe the distortion of MT response by an adjacent coast. García et al. (2015) observe similar MT response behavior linked to sediment cover thickness. We propose that a similar effect, that is, the sudden change in water depth and sediment thickness, may be responsible for the magnitude of the conductive anomaly. This "coast effect" is manifested in the measured data in form of a strong apparent resistivity cusp and phase jumps. Forward model responses of our simple starting model (including bathymetry and sediment cover) cannot account for these strong effects, which forces the inversion to include strong anomalies in the model. The synthetic inversions shown in Figure S3 and Text S1 suggest a substantial influence of the sudden change in crustal thickness and structure. In the top view (Figure 11), similar small scale, deep conductors are also visible close to stations 10 (just south of the profile crossing), and 23 (southern most), where lithological structure or water depth also change abruptly. Both of these features are significantly less pronounced in the density model constrained resistivity model (Figure 11b), supporting the assumption of a "coast effect" artifact, which is effectively suppressed by the coupling with a fixed structural cross-model. An alternative explanation for conductor C1 could be that a geological feature in this region is only associated with a resistivity anomaly, but not with a seismic or gravity anomaly. Such a process could be a difference in thermal subsidence between Walvis Ridge and the adjoining oceanic crust in the North creating fractures and allowing seawater to enter deep into the crust at the fracture zone. The already described sensitivity tests demonstrate that shallow low resistivities are needed for a reasonable MT data fit, but that those low values do not necessarily have to reach deep into the mantle but may be confined to the top 15-20 kilometers only. Crustal alteration down to this depth would be conceivable (Prigent et al., 2020;Zhang & Sagiya, 2018).
The last inversion approach, JI3, in which the fixed velocity model by Fromm et al. (2017) was used as the cross-model for a cross-gradient constrained inversion, differs from the previous coupling strategies because we performed a "quasi-2D" inversion on a narrow cube along profile P100 that strikes perpendicular to the coast. We conducted this coupling on the one hand to test the influence of different types of cross-models (i.e., blocky density vs. smooth gradient velocity), and on the other hand, to explore whether we can find an MT model which fits those seismic observations. However, the different model scope and input data for the inversions make a direct comparison of the RMS data misfit impossible. To evaluate the misfit anyhow, we cut the central part around profile P100 from the other three 3D inversion models and calculated the misfit for these "quasi-2D" models and the data of these 23 stations only. The data misfits for these MT-only, density constrained (JI1), and joint data inversion (JI2) resistivity models are all approximately 4.4, while the MT-only inversion for the narrow model reaches a minimum misfit of 3.84. We therefore conclude that the RMS of 4.03 that was reached with JI3, is acceptable. The discrepancy of ∼0.2 to the "quasi-2D" MT-only inversion indicates that this form of coupling is the strongest constraint used in our study, because we could not reach the same low misfit attained by unconstrained inversion. The velocity model may not change during inversion and since it has fewer areas of constant value (compared to the density cross model in JI1), the MT solution space is confined most. Model roughness is significantly reduced compared to the density model-coupled inversion. This increased smoothness originates from the nature of the gradient velocity model, which has fewer strong boundaries. However, the cross-model's strong gradients at the sediment-crust interface still introduce some patchiness in the upper part of the resistivity model. Just as observed in the resistivity model of JI1, the introduced boundaries in the shallow conductors indicate thinner sediment cover compared to the MT-only resistivity model (anomaly C2 in Figure 9f). They are in good agreement with seismic observations and robust according to our sensitivity analysis. At the seismically imaged Moho depth, the resistivity model shows no variation, which may be explained by the smaller velocity gradient between crust and mantle, compared to the strong gradients in the upper part of the section which corresponds to a weaker cross-gradient coupling. Based on our sensitivity test d) (reducing the resistor's thickness), we conclude that the resistivity structure does not change a lot at Moho depth, while seismic reflections clearly show a change in acoustic impedance. The weak cross-gradient coupling at this interface therefore serves our inversion results, because it does not enforce a false resistivity contrast.

Conclusion
The cross-gradient coupled inversion with a structural density model constraint (JI1) works best for the Namibia and Walvis Ridge MT data inversion. This inversion draws on the benefits from gravity modeling and includes a significant amount of information from seismic results, both directly and indirectly because Maystrenko et al. (2013) used seismic models to build their initial density model. This joint inversion results in geologically valuable model modifications such as thinner conductive sediment cover or intra-basement conductors, and a less pronounced impact of the Florianopolis Fracture Zone on electrical resistivity. Importantly, the resulting model also has a higher confidence because it is constrained by several independently derived models, while reaching almost the same misfit between the forward modeled data and the observed data.
In our study, combining gravity and MT inversions with cross-gradient coupling (JI2) results in only small modifications of the final resistivity model that will not change the geological interpretation. The large solution space of gravity data inversion cannot sufficiently constrain our MT inversion. The density model shows modifications mainly on model features, which we mistrust (i.e., up to 10 km deep sedimentary basins and density variations along a fracture zone reaching 150 km deep into the mantle). Direct parameter coupling may improve the joint MT-gravity data inversion in the future, but defining precise resistivity-density relationships in this large, complex area appears extremely difficult.
The use of a gradual velocity model rather than a blocky density model for cross-gradient constrained inversions (JI3) is promising, because the inversion resistivity model is also based on smooth gradients, thus model roughness is decreased in the resulting resistivity model. This fixed-model joint inversion poses the strongest constraint in our study. However, the assumption of a "quasi-2D" inversion is not ideal and the low misfits of a full 3D inversion cannot be reached.
All three performed joint inversions will improve geological interpretations planned for the future, by providing correlations between electrical resistivity, density, and seismic velocity. Current investigations indicate, that the joint inversion results help to distinguish between different geological units linked to different continental breakup phases. Analysis of the compiled parameter relationships hints at a differentiation between rifted continental crust, stretched and periodically intruded transitional crust south of Walvis Ridge, massively underplated transitional crust along Walvis Ridge, and an approximately normal oceanic crust north of the ridge. These considerations will be the focus of a subsequent article.

Data Availability Statement
The marine magnetotelluric data set for this research is available in the PANGAEA data repository under https:// doi.org/10.1594/PANGAEA.931440 (Jegen et al., 2021). Satellite gravity data is available from International Centre for Global Earth Models (ICGEM) (Ince et al., 2019). The joint inversion software package jif3D (Moorkamp et al., 2011) is available with the subversion system (SVN) from http://svn.code.sf.net/p/jif3d/jif-3dsvn/trunk/jif3D/. For this research, revision 1059 was used.