SASSIER22: Full‐Waveform Tomography of the Eastern Indonesian Region That Includes Topography, Bathymetry, and the Fluid Ocean

We present a new 3‐D seismic structural model of the eastern Indonesian region and its surroundings from full‐waveform inversion (FWI) that exploits seismic data filtered at periods between 15–150 s. SASSY21—a recent 3‐D FWI tomographic model of Southeast Asia—is used as a starting model, and our study region is characterized by particularly good data coverage, which facilitates a more refined image. We use the spectral‐element solver Salvus to determine the full 3‐D wavefield, accounting for the fluid ocean explicitly by solving a coupled system of acoustic and elastic wave equations. This is computationally more expensive but allows seismic waves within the water layer to be simulated, which becomes important for periods ≤20 s. We investigate path‐dependent effects of surface elevation (topography and bathymetry) and the fluid ocean on synthetic waveforms, and compare our final model to the tomographic result obtained with the frequently used ocean loading approximation. Furthermore, we highlight some of the key features of our final model—SASSIER22—after 34 L‐BFGS iterations, which reveals detailed anomalies down to the mantle transition zone, including a convergent double‐subduction zone along the southern segment of the Philippine Trench, which was not evident in the starting model. A more detailed illumination of the slab beneath the North Sulawesi Trench reveals a pronounced positive wavespeed anomaly down to 200 km depth, consistent with the maximum depth of seismicity, and a more diffuse but aseismic positive wavespeed anomaly that continues to the 410 km discontinuity.

The northeastern part of our study region has long been a point of discussion due to the complex interplay between multiple subduction zones. Convergent double subduction around the Philippine Trench has previously been suggested (e.g., Lagmay et al., 2009;Rangin, 1991), but is neither evident in the subduction zone geometry model Slab2 (Hayes et al., 2018) nor the plate tectonic boundary model by Bird (2003). Furthermore, slab penetration beneath the Philippine Trench remains enigmatic; while seismicity is observed down to ∼200 km depth, various slab depths of extensions between 100 and 400 km have been proposed (e.g., Hall & Spakman, 2015;Wu et al., 2016). Further south, a divergent double subduction zone with an inverted U-shape in the Molucca Sea has been imaged in P-wave tomographic studies (Amaru, 2007;Obayashi et al., 2013) but its extension to the north remains unclear.
In such highly heterogeneous regions, full-waveform inversion (FWI) is a particularly suitable imaging method since it embraces the full complexity of seismic wave propagation, by accurately solving the 3-D seismic wave equation numerically. Thus, it can account for the effects of anisotropy, anelasticity and full-wave effects, which can hamper other seismological methods. Furthermore, crust and mantle structure are jointly imaged as a result of the computation of 3-D sensitivity kernels (e.g., Bozdağ & Trampert, 2008). While the mathematical underpinnings of FWI have been known since the 1980s (e.g., Tarantola, 1984), its application only became feasible in the last decade and a half due to its high computational cost, which is significantly greater than more traditional imaging methods.
Most commonly, FWI is implemented as a gradient-based iterative optimization method, where an initial model is updated based on the difference between synthetic and observed waveforms. Thus, its success is strongly dependent on being able to produce realistic synthetics from potentially complex models of the Earth's interior. In particular, topography, bathymetry, and the ocean can have a significant effect on ground motion; for example, surface topography can lead to seismic wave scattering (e.g., Lee et al., 2008) and the fluid ocean elongates the Rayleigh surface wave train (e.g., Fernando et al., 2020;Todoriki et al., 2016). However, these effects are not routinely accounted for in surface wave and waveform tomography, largely due to the need for sophisticated meshing techniques, plus the added computational burden associated with simulating waves in the low-velocity fluid ocean.
Instead, one common approach for dealing with this issue is to approximate the fluid ocean by the weight of its water column ("ocean load," e.g., Lei et al., 2020;Rodgers et al., 2022) but this is only a valid assumption when the minimum wavelength considered is long compared to the water layer thickness (Komatitsch & Tromp, 2002;Zhou et al., 2016). Several studies suggest that the effects of the fluid ocean layer become noticeable at minimum periods between 10 and 20 s (e.g., An et al., 2017;Fernando et al., 2020;Yue et al., 2017). Fernando et al. (2020) include a global ocean layer of constant thickness in AxiSEM3D (Leng et al., 2016(Leng et al., , 2019Nissen-Meyer et al., 2014) in order to simulate water-depth phases at shorter periods and show that this improves the waveform match for island stations. However, they emphasize that AxiSEM3D currently does not support irregularly patched oceans and thus, path-dependent effects remain unclear. Here, we use the spectral-element wave propagation solver Salvus  to explicitly define deep oceans with acoustic elements, and account for realistic variations in surface elevation (on-shore and off-shore) using the global relief model EARTH2014 (Hirt & Rexer, 2015).
This paper presents a tomographic study of the lithosphere and underlying mantle beneath eastern Indonesia and its surroundings, which compared to other regions of Southeast Asia like the Sunda Arc, is relatively Map showing the study region (∼2,800 × 2,500 km), including the source-receiver distribution, for the eastern Indonesian full-waveform tomography. Publicly available stations are denoted by red triangles, stations with restricted access by blue triangles, and events by white circles. Note that only stations used throughout the inversion are shown. The maximum source-receiver distance is ∼2,800 km. Orange symbols indicate volcanoes with eruptions during the Holocene (the past ∼12,000 years) as taken from the Global Volcanism Program (Venzke, 2013). Plate tectonic boundaries (red lines) are taken from Bird (2003). Topographic and bathymetric variations are extracted from ETOPO1 (Amante & Eakins, 2008). Further information on the mesh generation is provided in Section 5 and in Supporting Information S1. poorly understood. The final model, SASSIER22, is defined by both P-wave and S-wave velocity structure, and constrained by inversion of both body and surface waveforms. This is achieved through the application of FWI to a region with good data coverage, which facilitates a more refined image. This paper focuses on the process and outcomes of accommodating surface topography, bathymetry, and the fluid ocean explicitly at periods of 15-150 s, and we compare the effect of a fluid ocean layer and the commonly used ocean loading on the tomographic results. Furthermore, we discuss some of the key features of the tomographic model, including a detailed comparison with other recent tomographic models of the region.

Tectonic Setting
The active tectonic setting of Southeast Asia gives rise to significant natural hazards, as evidenced by the 2018 Palu earthquake, which unexpectedly triggered a destructive tsunami (e.g., Socquet et al., 2019). This is mainly a result of its location on the junction of three primary tectonic plates: the (Indo-)Australian, Eurasian, and Philippine Sea plates. The major subduction zones occur along the Philippines in the east as well as the Indonesian volcanic arc in the south, which feature slabs descending at rates between 5 and 10 cm per year (e.g., Simons et al., 2007). Our study region comprises the eastern part of Southeast Asia (see Figure 1), which is further complicated by a system of minor tectonic plates. The region comprises several subduction zones with a diverse range of lengths, ages, and geometries, including the spectacular 180° Banda Arc, which is driven by the northward movement of the Australian plate (e.g., Audley-Charles, 1968;Carter et al., 1976;Harris, 2011). Subduction in the Banda Sea region began in the Miocene (∼20 Myr, e.g., Spakman & Hall, 2010) and most recently, the associated slab was imaged as a continuous, single but deformed slab by Wehner et al. (2022).
The west of our study region is largely comprised of the promontory of the Eurasian Plate-often referred to as the Sundaland block-which includes Borneo. The submarine portion consists of a shallow, continental shelf that is not significantly elevated with depths considerably less than 200 m and includes a large number of thick Cenozoic sedimentary basins (Hall & Morley, 2004). To the east of the Sundaland block, the k-shaped island of Sulawesi in the center of our study area is believed to have formed from several fragments sourced from the continental promontory of the Eurasian Plate and Australia ∼20 Myr ago (Hall, 2011;Katili, 1978). The North Sulawesi Trench stretches along the northern arm of the island, where southward subduction of the Celebes Sea was initiated ∼9 Myr ago (Hall, 2011). The slab supports earthquakes to a depth of 200 km (Hayes et al., 2018), which, given their dip, suggests at least 300 km of subduction (e.g., Song et al., 2022).
Subduction along the northeastern arm of Sulawesi and around Halmahera is associated with a divergent double subduction zone in the Molucca Sea. Hall (2012) suggests that the Molucca Sea was formed ∼20 Myr ago and became narrower until the present-day arc-arc collision (e.g., Soesoo et al., 1997) formed around ∼5 Myr ago. Previous models suggest that the western limb penetrates into the lower mantle, while the eastern limb extends to ∼100-150 km depth (Amaru, 2007;Fan & Zhao, 2018;Obayashi et al., 2013;Wu et al., 2016), which is consistent with the seismicity distribution. However, it remains unclear how far the double subduction extends to the north.
Further north, the westward subduction of the Philippine Sea plate generates events down to ∼200 km depth beneath the north-south trending Philippine Trench. The age of the subducting slab was inferred as >45 Myr from geochemical measurements (Ishizuka et al., 2013;Wu et al., 2016), while Fan and Zhao (2018) suggest subduction was initiated ∼20-25 Myr ago based on the maximum depth of the slab in their P-wave tomography study. Based on seismicity and previous seismic tomography models, Wu et al. (2016) suggest that the slab extends to depths of ∼230-400 km beneath the Philippine Trench. Hall and Spakman (2015), however, state that there is little evidence of a deep slab in tomography and propose a shorter slab length of slightly over ∼100 km, which is consistent with an absence of volcanoes along the trench (see Figure 1). Convergent double subduction of eastward dipping slab(s) from the Celebes Sea and the westward subduction of the Philippine Sea plate have previously been suggested (e.g., Lagmay et al., 2009;Rangin, 1991), but is neither evident in the slab model Slab2 (Hayes et al., 2018) nor the plate tectonic boundary model by Bird (2003). A more detailed tectonic map for this part of the study region can be found in Supporting Information S1.
In summary, the study region provides a unique setting to investigate a variety of primary tectonic processes such as ongoing subduction and both arc-arc and arc-continental collision. In particular, we lack constraints in key regions such as the poorly imaged island of Sulawesi, the transition from a divergent to convergent double-subduction zone south of the Philippines and the oblique subduction around the Banda Arc.

Methodology
While most tomographic methods are based on seismic ray theory, FWIsometimes referred to as adjoint waveform tomography when the sensitivity kernels are computed using adjoint methods-embraces the full complexity of seismic wave propagation by numerically solving the partial differential equation that governs 3-D seismic wave propagation. This makes the method much more computationally expensive than ray-based imaging methods; the computational cost of adjoint waveform tomography is independent from the number of receivers but scales linearly with the number of events used and to the fourth power of the highest frequency in 3-D (e.g., Tromp, 2020). This has traditionally limited its usefulness, but ongoing access to more powerful computing resources has enabled its successful application in earthquake seismology across the scales (e.g., Blom et al., 2020;M. Chen et al., 2015;Chow et al., 2022;Fichtner et al., 2009;Gao et al., 2021;Lei et al., 2020;Rodgers et al., 2022;Tape et al., 2010).
FWI is a non-linear imaging method and most commonly, it is implemented as a gradient-based optimization problem, where a seismic model is updated iteratively in order to minimize the difference between observed and predicted waveforms (see Figure 2). First, synthetic seismograms are obtained from a suitable initial Earth model for a set of specified sources by solving the 3-D seismic wave equation numerically (e.g., Igel, 2017). The synthetic waveforms are compared to the observed data using a suitable misfit measure, which can be formulated in the time domain, frequency domain or both (e.g., Yuan et al., 2020). Then, the gradient of the misfit function is constructed using adjoint techniques (e.g., Tromp et al., 2005) and a gradient-based optimization scheme is used to update the initial model in order to reduce the waveform misfit. This process is iterated until the waveform match is deemed sufficient.
FWI makes use of the information contained in the amplitudes and phases of, in theory, entire seismograms to recover the seismic structure of the Earth. There is no need to identify seismic phases (such as P-or S-wave arrivals) since FWI effectively deals with interfering phases by using information contained in the full wavefield, coherently incorporating body and surface waveforms. The method can account for effects such as wavefront healing, scattering, interference, and (de)focusing, which are not accurately modeled with ray theory (e.g., Rickers et al., 2012). As a result, FWI promises high-resolution images and a more reliable quantification of anomalies. A more comprehensive introduction to FWI is provided in Tromp (2020).

Inversion Setup
In this study, we focus on a subregion of Southeast Asia with good data coverage, which encompasses the regions around the Banda Arc, Borneo, Sulawesi, and the southern segment of the Philippine Trench. The domain extension is shown in Figure 1 and comprises an area of approximately 2,800 km in the north-south direction, 2,500 km in the east-west direction and 650 km in the vertical direction. We use seismic data filtered at periods from 15 to 150 s and run two inversions: one accounting for the fluid ocean explicitly and one with the frequently used ocean loading approximation, which will be discussed in more detail in Section 5.
We use the Salvus software package  for the mesh generation, forward and adjoint simulations and non-linear optimization, within its integrated workflow. The full 3-D wavefield through an anelastic Earth is computed using Salvus' built-in spectral-element wave propagation solver. The inversion parameters are restricted to those well-constrained by the intermediate-period waveform data, that is, isotropic P-wave velocity (v P ) and radially anisotropic S-wave velocity (v SH and v SV ). We also invert for 3-D density (ρ) variations in order to avoid artifacts (Blom et al., 2017). Full-waveform inversion workflow schematic. This process is iterated until the waveform match is deemed sufficient according to some criteria. Most often, a multi-scale approach (Bunks et al., 1995) is applied in which longest periods are inverted for first and higher frequency content is successively added.
Throughout this study, we closely follow the inversion setup described in Wehner et al. (2022) as summarized in Table 1. Waveform differences are quantified using time-frequency phase misfits (Fichtner et al., 2008) that make use of phase and relative amplitude information. Before the model update is computed, the gradient is preconditioned by removing large sensitivities around source regions and applying an anisotropic smoothing operator. This avoids the appearance of artifacts in the tomography and can accelerate convergence (e.g., Liu et al., 2022;Modrak & Tromp, 2016). The model is updated using an L-BFGS optimization scheme (Nocedal & Wright, 2006), which means the descent direction is determined by the negative gradient, preconditioned by the approximation of the inverse Hessian. This allows us to obtain curvature information on the misfit landscape and is generally regarded as efficient for FWI applications (e.g., Modrak & Tromp, 2016). The step length is determined by solving the trust-region subproblem (Conn et al., 2000), where the misfit function is locally approximated and the step length is automatically adjusted based on the quality of the approximation that was observed in the previous iterations. Table 2 presents an overview of the technical parameters used in the multi-scale approach (Bunks et al., 1995) employed in this study, which involves a gradual increase in the shorter-period content as the iterations progress (from 20-150 s to 15-150 s). This avoids cycle skips and mitigates against the risk of entrapment in local minima-a well-known problem of all gradient-based optimization methods. We use 1.2 elements per minimum wavelength for all simulations and our mesh interpolation between period bands accounts for 3-D variations introduced throughout the inversion.

Starting Model
We use SASSY21 (Wehner et al., 2022)-a recent 3-D full-waveform tomographic model of Southeast Asia-as a starting model, which was obtained using seismic data filtered in the period band from 20-150 s. Surface topography and bathymetry were accounted for using Earth2014 (Hirt & Rexer, 2015) and the fluid ocean was approximated by the weight of its water column (see Section 5) to obtain SASSY21. Starting model SASSY21 (Wehner et al., 2022) Topography and bathymetry Earth2014 (Hirt & Rexer, 2015) Frequency-dependent attenuation Approximated with five linear solids (e.g. Afanasiev et al., 2019;van Driel & Nissen-Meyer, 2014) Misfit function to quantify waveform differences Time-frequency phase misfits (Fichtner et al., 2008) Data selection Manual, accounting for body-wave arrivals separately

Optimization scheme
Trust-region based L-BFGS (e.g. Conn et al., 2000) Inversion parameters v P , v SH , v SV , and ρ Note. For a more detailed explanation, see Wehner et al. (2022). Note. The compute time per event in minutes in the last column is obtained by using the values in this table, but does not include storing wavefield snapshots for the adjoint simulation (Anderson et al., 2012). Note that absolute run times are indicative since a simulation with exactly the same settings can have very different run times depending on the specifications of the computer it was executed on. The wavefield simulation was run in parallel on up to 56 cores for the actual inversion.

Data Availability and Selection
To date, data from only a relatively small proportion of permanent network stations in Southeast Asia have been made publicly available, which were downloaded using obspyDMT (Hosseini & Sigloch, 2017). However, we have been able to include data from several networks with restricted access, resulting in an unprecedented data set that comprises recordings from 255 stations within the study region (see Figure 1). In addition to the stations used to obtain the starting model SASSY21, data from two recent deployments with restricted access on Sulawesi and in the Celebes Sea were added, including 11 ocean-bottom seismometers (Rawlinson et al., 2020).
Our event selection process aims to produce as uniform a distribution of earthquakes throughout the study region as possible. At the same time, we avoid events potentially affected by interference with other events, mitigate finite-source effects by removing large-magnitude earthquakes and review source time functions using SCAR-DEC by Vallée et al. (2011), since we find that long or suspicious source time functions correlate with large event misfits throughout the inversion. Furthermore, recent events are preferred due to the deployment of temporary networks within the region, which increases the number of recordings and thus, enhances the efficiency of the adjoint-based inversion (see Section 3). In order to balance the effect of dense networks, we implement a geographical station weighting following Ruan et al. (2019), which means that a station is assigned a larger weight throughout the inversion if it has few nearby stations, and vice versa. This helps to remove an imprint of the data coverage from the gradient, as presented in more detail in Supporting Information S1.
The final event catalog for this study contains 67 earthquakes (see Figure 1) of magnitude 5.4 ≤ M w ≤ 6.0, which occurred between April 2014 and December 2020. It should be noted that only 20 out of the 67 events were used in the construction of SASSY21. Event locations and moment tensors are taken from the GCMT catalog (Ekström et al., 2012) and remain constant throughout the inversion. A detailed overview of the selected events, available stations and how waveform data was accessed is available in Supporting Information S1.
In FWI, we aim to invert three-component seismograms, but noisy waveform segments and cycle skips can contaminate the tomography and need to be removed. Depending on the chosen misfit function, it is common practice to define sections of a seismogram that are suitable for waveform comparison. Several toolboxes have been published that aim to automate the selection of suitable measurement windows (e.g., Krischer et al., 2015;Maggi et al., 2009), but there is still a trade-off between including as much signal as possible and avoiding noisy data. Here, we manually select these windows in order to 1. Avoid noisy portions of the seismogram. 2. Exploit as many waveforms as possible. 3. Optimize for depth sensitivity by accounting for small-amplitude body-wave arrivals separately. Note that windows around body wave arrivals are not treated differently from windows around surface wave arrivals.
More information on the data selection, including an example, can be found in the Supporting Information of Wehner et al. (2022).
In summary, we use ∼6,500 windows gathered from ∼7,100 unique source-receiver pairs resulting in >200 hr of analyzed waveform data per period band for the fluid ocean inversion (see Table 3). For the ocean load inversion, we use the same source-receiver pairs but the analyzed window length is ∼22% shorter due to the match between predicted and observed surface wave trains being of shorter duration on many components. Figure 3. Comparison between the starting and final models of this study, and various regional and global tomographic models for a north-south cross-section through Sulawesi and the Indonesian volcanic arc. An overview of the different models can be found in Supporting Information S1. Note that some models are v P (bottom two panels) and some are v S and thus, a direct comparison is not always warranted. Perturbations are in % relative to the depth-average within the region. The limits of the colorscale X are shown in the lower left corner of each plot. Red lines in the sections indicate slab depths taken from Slab2 (Hayes et al., 2018), which is derived from the distribution of earthquake locations in this region. Note. We use the same source-receiver pairs for the ocean load inversion.

Accounting for Surface Topography, Bathymetry, and the Fluid Ocean
We aim to produce realistic synthetic waveforms that properly account for model complexities that might otherwise manifest as artifacts in our inversion results. However, surface topography, bathymetry, and the effect of the fluid ocean are not routinely accounted for in surface wave and adjoint waveform tomography, mainly due to the challenge of incorporating them in finite-difference methods and the added computational burden required to simulate waves in the low-velocity fluid ocean. The spectral-element method provides greater flexibility than conventional finite-difference methods in terms of meshing and thus, inherits a natural ability to handle complex topographic media and fluid-solid boundaries, such as the ocean-crust boundary (e.g., Komatitsch & Tromp, 1999;Komatitsch & Vilotte, 1998).
Shorter period seismic waves are expected to be particularly sensitive to variations in surface elevation and the fluid ocean due to increased sensitivity to the upper few kilometers; consequently, we investigate their frequency-dependent effects on synthetic waveforms. Furthermore, we explore path-dependent effects, for example, we would expect source-receiver pairs lying on either side of oceanic regions to produce waveforms that are more sensitive to an ocean layer compared to source-receiver pairs that are not separated by significant regions of ocean.

Surface Topography and Bathymetry
Large topographic contrasts are present in the study region including several mountain ranges, with the tallest peak being Mount Kinabalu (4,095 m) in northern Borneo, Malaysia. At the opposite end of the scale, the deepest point within the region-the so-called Galathea Depth-is located along the Philippine Trench at 10,540 m below sea level. Throughout this study, surface topography and bathymetry are implemented using the global relief model Earth2014 (Hirt & Rexer, 2015) by distorting the upper layer of elements, accommodating the highest frequency considered within each period band (see Supporting Information S1 for further information). This requires sophisticated meshing techniques but does not result in a notable increase in computational cost. For this reason, surface topography and bathymetry were implemented across all period bands to obtain the starting model SASSY21 (see Section 4.1) and are also included in this study. Figure 4 presents the frequency-dependent effect of surface topography and bathymetry on synthetic waveforms. For this simulation setup, we observe that implementing surface elevation can result in a considerable phase advance and change in amplitude of the surface wave train at periods ≤20 s. Further synthetic tests show that both horizontal components and the vertical component are likewise affected, and that the effect is stronger for shallow events and source-receiver paths passing through oceanic regions for this simulation setup. This is not surprising as bathymetry exhibits larger variations than topography in Southeast Asia, and we thus observe a stronger effect from ocean floor profiles than mountain ranges. Further synthetic tests and several waveform comparisons at 15 s are presented in Supporting Information S1.

The Fluid Ocean
For spectral-element methods, the free-surface boundary condition is implicitly taken into account (e.g., Komatitsch & Vilotte, 1998), which implies that the ocean is ignored entirely. For our purposes, two options to account for the ocean are available: 1. The ocean can be approximated by the weight of its water column, which means the ocean does not need to be meshed explicitly and is implemented via a modification of the mass matrix. This accurately reproduces the effect of the ocean on the seismic wave, provided the thickness of the ocean layer is small compared to the minimum wavelength considered (Komatitsch & Tromp, 2002;Zhou et al., 2016). In the following, we will refer to this as "ocean loading" (sometimes referred to as the "water column approximation"). 2. The volume of the ocean can be replaced by acoustic elements to represent it as a fluid medium. This allows reverberated phases associated with the water layer to be simulated, but it requires solving a coupled system of the acoustic and elastic wave equations, which is computationally more expensive. In the following, we will refer to this as "fluid ocean." We use Salvus  to mesh the fluid ocean explicitly for this study, where we broaden the period band to 15-150 s. In fact, a combination of the methods described above is implemented, where deep ocean areas are modeled with acoustic elements, while the fluid ocean is replaced by an equivalent load when the ocean becomes too shallow (here: <1.5 km). The seismic properties of the fluid ocean are set to v P = 1,450 m/s, v SH = v SV = 0, and ρ = 1,020 kg/m 3 . A more detailed figure showing the distribution of elastic and acoustic mesh elements, including which parts of the domain are approximated by an equivalent ocean load, is provided in Supporting Information S1. Table 2 illustrates the computational cost associated with implementing an ocean load versus a fluid ocean for this simulation setup. Note how the number of mesh elements increases by <10% but the time step for the wavefield simulation is significantly reduced as a result of the low velocities introduced in the fluid ocean, thus leading to a considerable increase in computational cost. During the inversion, we do not allow for any updates in the fluid ocean and in the mesh element layer below to avoid potential artifacts in the gradient from the smoothing preconditioner (see Table 1) and the interface condition.
The ocean was approximated by an equivalent ocean load to obtain the starting model SASSY21 (see Section 4.1). We find this to be a valid approximation at the periods considered for SASSY21 (20-150 s) compared to explicitly modeling the fluid ocean, as demonstrated in Figure 4. At shorter periods, however, simulating the fluid ocean has a noticeable effect on the later parts of the seismogram (phase delay, amplitudes, and duration of surface wave train). We observe the strongest effect on the radial and vertical components, which suggests that the ocean's effects are confined to the source-receiver plane, as previously suggested by Fernando et al. (2020).
In order to investigate path-dependent effects of the fluid ocean, we compute full-trace time-frequency phase misfits for a synthetic source and receiver distribution as shown in Figure 5. While the misfit calculation is non-linear, misfit values tend to be higher at larger epicentral distances and near the Philippines, where strong bathymetry values and gradients are present. Note that we can already observe a path-dependent effect for the  (Dziewonski & Anderson, 1981) with the crust represented by CRUST1.0 (Laske et al., 2013). Vertical lines indicate predicted P-(blue) and S-wave (green) first arrival times obtained from the TauP toolkit (Crotwell et al., 1999) for PREM (Dziewonski & Anderson, 1981).
deep ocean near the Philippines at 50 s, even though there is only a minor difference in synthetic waveforms at these periods (see Figure 4), which would be below the noise level for any real-world application. At 15 s, we observe an increased sensitivity to other oceanic regions such as the Banda Sea and Celebes Sea. Further synthetic tests and several waveform comparisons at 15 s are presented in Supporting Information S1.

Results and Discussion
A total of 34 L-BFGS iterations divided over two period bands (15 iterations at 20-150 s and 19 iterations at 15-150 s) are carried out, accounting for the fluid ocean explicitly. In the following, we will refer to this model as SASSIER22. Furthermore, we ran 19 iterations at 15-150 s with the fluid ocean approximated by its weight ("ocean load") using the same source-receiver pairs, which allows for a direct comparison of the tomographic results. In this section, we present the inversion results, including a model validation, some key highlights of the final model SASSIER22, and a comparison with the model obtained from ocean loading.

Model Validation
Once a model has been determined, we would like to assess the robustness of the solution. However, obtaining reliable information on model uncertainty information remains an active area of research in adjoint waveform tomography (e.g., Liu et al., 2022) since most traditional approaches are currently computationally not feasible in FWI. Here, we pursue more data-driven approaches toward validating the final model as presented below. Furthermore, a synthetic checkerboard recovery test is presented in Supporting Information S1, which shows that we have good constraints around the Indonesian volcanic arc, Sulawesi and south of the Philippines, using the given data coverage and inversion setup described in this study.

Misfit Development
For both the fluid ocean and ocean loading, the waveform difference is successively reduced across all iterations but ocean loading shows a ∼10% higher misfit value, even though the analyzed window length is ∼22% less compared to the fluid ocean inversion. The overall misfit decreases by 39.8% and 34.0% for the entire data set for  (Fichtner et al., 2008) are normalized for each plot. Background model: PREM (Dziewonski & Anderson, 1981) with the crust represented by CRUST1.0 (Laske et al., 2013). Topographic and bathymetric variations are taken from ETOPO1 (Amante & Eakins, 2008).
the fluid ocean and ocean load, respectively, as shown in Figure 6. The misfit development is discussed in more detail in Supporting Information S1. Table 3 shows an increase in the window length per event as the period band is broadened, thus indicating that we are successively including more data per event as the iterations progress. For comparison, we use the automated data selection algorithm FLEXWIN (Maggi et al., 2009) to suggest windows for the starting and final model for both ocean load and ocean layer; FLEXWIN selects a similar total window length for the starting model for both ocean types, which reinforces that ocean loading is a valid approximation at 20 s. However, the window length for ocean layer increases by 63%, while the increase for ocean load is only 39% for the final model, indicating that the model with an ocean layer matches observed waveforms better at 15 s.

Misfit Increase per Parameter
For SASSIER22, we test each inversion parameter separately by replacing the corresponding parameter in the final model by the values of the starting model and evaluate the misfit increase. This shows that shear-wave velocity drives the inversion (v SV : +24% and v SH : +18%) but the updates in v P (+7%) and density (+4%) are also required to explain observed waveforms.

Validation Data Set
We test the validity of SASSIER22 for three events not used in the inversion as shown in Figure 7. This also shows that SASSIER22 is able to explain true-amplitude waveforms from network 7G (Passarelli et al., 2016)-see source-receiver pair #1 -, which only recently became available and was thus not used to obtain the starting model SASSY21 or SASSIER22. Note that the wavefield gets significantly more complex at these periods (compared to ∼30, 40 s) and we have only decreased the minimum period from 20 to 15 s.

Final Model: SASSIER22
The final tomographic model is obtained in a period band for which the wavefield is surface-wave dominated. However, P-wave velocity shows meaningful updates down to the mantle transition zone; for example, we observe higher velocities beneath the Celebes Sea and hints of subduction along the Indonesian volcanic arc, North Sulawesi Trench and Philippine Trench. Nonetheless, the P-wave model is not as clear as the S-wave model, which we largely attribute to the broader sensitivity kernels of P-waves. As a consequence, the following interpretation is based on shear-wave velocity v S , which is computed using the Voigt average: S = √ (2 2 + 2 ) ∕3 (e.g., Babuska & Cara, 1991;Panning & Romanowicz, 2006). The results for all inversion parameters are presented in Supporting Information S1. We also present radial anisotropy depth slices from 50 to 600 km but it should be noted that a detailed interpretation of radial anisotropy patterns is complicated by the differing sensitivities of Love and Rayleigh waves (e.g., Fichtner et al., 2013). The results show overall positive radial anisotropy in the lithosphere and negative radial anisotropy at greater depths, which is consistent with the discussion in Wehner et al. (2022). Figure 8 shows v S depth slices for the starting model SASSY21 and the final model SASSIER22. The depth slices at 25 and 50 km show that the final model is significantly more sensitive to shallower structure as a result of the shorter-period data considered. For instance, we can distinguish between high velocities arising from oceanic lithosphere (e.g., beneath the Celebes Sea) and low velocities arising from continental lithosphere (e.g., beneath Borneo). At greater depths (≥100 km), we observe significant updates around the Philippine Trench. Furthermore, the starting and final model agree on high velocities along the northern arm of Sulawesi and the Indonesian volcanic arc, including a hole at ∼300 km depth. Wehner et al. (2022) suggest the hole may be associated with the  (Fichtner et al., 2008) in the upper left corner of each trace. Vertical lines indicate predicted P-(blue) and S-wave (green) first arrival times obtained from the TauP toolkit (Crotwell et al., 1999) for PREM (Dziewonski & Anderson, 1981). pertusion of continental lithosphere from the northward moving Australian plate (e.g., Keep & Haig, 2010). The 180° curvature around the Banda Arc is still imaged as a single bent and deformed slab. Figure 3 presents a comparison between the starting and final models of this study, and other continental-and global-scale tomography models for a north-south cross-section through the North Sulawesi Trench. For the starting, fluid ocean and ocean load model considered in this study, we track a steeply dipping slab down to the mantle transition zone and not just 200 km as suggested by Slab2 (Hayes et al., 2018), which only uses the earthquake location distribution for this region to derive slab depths. The slab appears to have a small dip angle at shallower depths before subducting steeply from ∼100 km onwards, similar to a deep aseismic slab observed beneath Alaska (Gou et al., 2019). Aseismic slabs have been observed worldwide (e.g., Fan & Zhao, 2018;Gou et al., 2019;Huang et al., 2013) and have been associated with high temperatures, which facilitate a slab descent generating little seismicity (Huang et al., 2013). The deep subduction along the North Sulawesi Trench is observed from ∼120-123°E longitude, with further cross-sections provided in Supporting Information S1. Figure 9 presents a comparison between the starting and final models of this study, and other continental-and global-scale tomography models for a west-east cross-section through the southern end of the Philippine Trench. For SASSIER22, the convergent double-subduction zone in this region is visible, which extends from ∼1.5 to 6°N latitude and was not apparent in the starting model SASSY21. In particular, we observe a westward dipping slab beneath the southern end of the Philippine Trench and an eastward dipping slab that exhibits a reversal in subduction polarity from 100 km down to the mantle transition zone. Such a subduction polarity flip has previously been suggested for settings where two oceanic plates collide (Gasser et al., 2021;Su et al., 2019). Furthermore, P.-F. Chen et al. (2020) suggest an east-dipping slab from 80 to 300 km depth at 1-6°N latitude based on the ISC-EHB (Engdahl et al., 2020) earthquake distribution for this region. Other shear-wave models do not show a clear high velocity anomaly but a divergent double-subduction zone is visible in P-wave models (Amaru, 2007;Obayashi et al., 2013). Further south, we observe hints of this inverted U-shaped subduction zone from −0.5 to 1.5°N latitude, which is consistent with UUP07 (Amaru, 2007) and GAP-P4 (Obayashi et al., 2013). Further cross-sections from 0 to 8°N latitude showing the transition from a divergent to a convergent subduction system are presented in Supporting Information S1. Figure 8 presents a comparison of the tomographic models obtained from ocean loading versus ocean layer at iteration 19 and 34, respectively, for depth slices from 25 to 300 km. The main differences appear to be in the upper ∼50 km, with higher velocities obtained beneath the Celebes and Banda Sea for the fluid ocean model, as expected for oceanic lithosphere. However, Figure 9 shows that there are also significant differences at greater depths, as illustrated via a comparison of cross-sections through the convergent double-subduction zone around the southern segment of the Philippine Trench, which was not apparent in the starting model. Both ocean types are able to recover the double-subduction, but the slab associated with subduction along the Philippine Trench is clearer in SASSIER22.

The Effect of the Ocean on the Tomographic Model: Ocean Loading Versus Fluid Ocean
We mesh irregularly patched oceans (rather than a global ocean layer of constant thickness as in Fernando et al., 2020), which allows us to investigate waveforms along different source-receiver paths. Figure 10 presents several waveform matches for observed data with synthetics obtained from both ocean types. This shows that there is a minor difference between synthetic waveforms for a source-receiver path that mostly passes through a continental shelf (#3), while the synthetics from the final fluid ocean model are able to explain true-amplitude observed surface wave trains significantly better than synthetics from the final ocean loading model for source-receiver paths passing partially (#2) or entirely (#1) through oceanic regions. Note that the receiver for source-receiver pair #1 is an ocean-bottom seismometer.

Limitations
An obvious limitation of FWI is the high computational cost of the forward problem, which translates to the use of a smaller data set and relatively long periods compared to what is typically exploited in ray tomographic studies. Ideally, we would also invert for other physical parameters such as attenuation, more complex forms of anisotropy or source parameters as discussed in more detail in Wehner et al. (2022). However, this is a challenging task, mainly due to the difficulty in determining the optimal observables for constraining a specific parameter type and the lack of data constraints, in particular azimuthal data coverage. While we acknowledge the potential for source errors to map as artifacts in the tomographic result, Blom et al. (2022) investigate the issue of source parameter errors in adjoint waveform tomography and suggest that strong, local source effects can be mitigated by removing a near-source volume from the event gradient, as employed in this study (see Section 4).
Many ray-based tomography studies employ crustal corrections to account for the behavior of seismic waves in the upper few kilometers. Bozdağ and Trampert (2008) suggest that this is not necessary in FWI since crust and mantle are jointly imaged as a result of the computation of 3-D sensitivity kernels. Wehner et al. (2022) did not account for crustal structure explicitly to obtain the starting model for this study (SASSY21) and demonstrate that it is not meaningful to add a crustal model in hindsight since the final model has already recovered crustal information at the periods we are sensitive to. While implementing a crustal model can provide useful prior Figure 9. Comparison between the starting and final models of this study, and various regional and global tomographic models for an east-west cross-section through the Philippine Trench. An overview of the different models can be found in Supporting Information S1. Note that some models are v P (bottom two panels) and some are v S and thus, a direct comparison is not always warranted. Perturbations are in % relative to the depth-average within the region. The limits of the colorscale X are shown in the lower left corner of each plot. Red lines in the sections indicate slab depths taken from Slab2 (Hayes et al., 2018), which is derived from the distribution of earthquake locations in this region. The map in the top right corner shows the section's location and seismicity (M w > 5) taken from the ISC catalog (International Seismological Centre, 2016), colored by depth. The dark red box indicates the study area for SASSIER22. Figure 10. Left: Map of three earthquakes (M w 5.6-5.8). The white area indicates where the fluid ocean is >1.5 km deep and thus, meshed explicitly to obtain SASSIER22. Right: Vertical and horizontal seismogram traces showing the waveform match in the period band from 15 to 150 s between the observed waveforms (black), synthetics from the final model obtained using the ocean load approximation (blue) and synthetics from the final model that meshes the fluid ocean explicitly (SASSIER22, red) for the three source-receiver pairs shown on the left. Epicentral distances are given in the bottom left corner of each trace. Vertical lines indicate predicted P-(blue) and S-wave (green) first arrival times obtained from the TauP toolkit (Crotwell et al., 1999) for PREM (Dziewonski & Anderson, 1981). information, Southeast Asia is very complex and a comparison of CRUST1.0 (Laske et al., 2013) model predictions against 25 s Rayleigh wave group velocity maps indicate that the region has some of the largest misfits on the globe. For a more detailed discussion of this issue and corresponding synthetic tests, we refer the interested reader to Wehner et al. (2022).
We only investigate the effects of surface topography, bathymetry, and the fluid ocean for a Southeast Asian simulation setup at 15 s at a regional scale. Thus, the effects for other regions and different scales remain unclear, but our results are in good agreement with Fernando et al. (2020), who consider epicentral distances of up to 90° across the Pacific Ocean. While we only decrease the minimum period band from 20 to 15 s compared to SASSY21, we expect that the effects of surface elevation and the fluid ocean become even more significant at shorter periods and when running additional iterations.

Conclusions
We have imaged the lithosphere and underlying mantle beneath eastern Indonesia and its surroundings at periods between 15 and 150 s using multi-scale FWI, and explored the influence of surface elevation (topography and bathymetry) and the fluid ocean, which is usually ignored in other studies. We conclude that their effects on synthetic waveforms become important at periods ≤20 s. In particular, surface elevation can result in a considerable phase advance and change in amplitude of the surface wave train, and has an effect on both horizontal and the vertical seismogram components. The fluid ocean results in a phase delay as well as a change in amplitudes and duration of the surface wave train, and affects the radial and vertical components. At periods ≤20 s, accounting for the fluid ocean explicitly can lead to more realistic lithospheric velocities and a more refined image compared to the frequently used ocean load approximation, even at greater depths. Furthermore, it allows for an improved waveform match for source-receiver paths passing partially or entirely through oceanic regions. Our final model reveals a convergent double-subduction zone with a reversal of polarity of the western limb at depths ≥100 km around the southern segment of the Philippine Trench, which was not apparent in the starting model. A more detailed illumination of the slab beneath the North Sulawesi Trench subduction zone (∼120-123°E) reveals a pronounced positive wavespeed anomaly down to 200 km depth, consistent with the maximum depth of seismicity, and a more diffuse but aseismic positive wavespeed anomaly that continues to the 410 km discontinuity.