Enhancing regional seismic velocity model with higher-resolution local results using sparse dictionary learning

We use sparse dictionary learning to develop transformations between seismic velocity models of diﬀerent resolution and spatial extent. Starting with results in the common region of both models, the method can be used to enhance a regional lower-resolution model to match the style and resolution of local higher-resolution results while preserving its regional coverage. The method is demonstrated by applying it to two-dimensional Vs and three-dimensional VP and VS local and regional velocity models in southern California. The enhanced reconstructed models exhibit clear visual improvements, especially in the reconstructed VP/VS ratios, and better correlations with geological features. We demonstrate the improvements of the reconstructed model relative to the original velocity model by comparing waveform simulation results to observations. The improved ﬁtting to observed waveforms extends beyond the domain of the overlapping region. The developed dictionary learning approach provides physically interpretable results and oﬀers a powerful tool for additional applications for data enhancement in earth sciences.

and such procedures disregard useful information from the low-resolution results in the area covered by the high-resolution model.In the present study we develop a method for merging multi-scale velocity models by establishing a transformation between models of different resolutions.Specifically, we develop a transformation by comparing high-and low-resolution imaging results in a given region and then utilize this transformation to enhance the low-resolution model both in the overlapping region as well as outside it.
The developed methodology is inspired by progress made in machine learning algorithms for super-resolution (Dong et al., 2015) and style transfer (Gatys et al., 2016) of images.To merge seismic velocity models, we use a data-driven approach based on sparse dictionary learning (Mairal et al., 2009;Yang et al., 2008).Since the available amount of data is limited, we opt for this approach over Convolutional Neural Networks (CNN) that demand a large number of trainable parameters.Dictionary learning involves a linear decomposition of an input signal utilizing a small set of basis signals, referred to as atoms, which are learned from the data.This technique has achieved impressive outcomes in various image and video processing tasks.In seismology, dictionary learning was used for signal denoising (Beckouche & Ma, 2014) and for seismic tomography (Bianco & Gerstoft, 2018).Dictionary learning has the added benefits of being physically interpretable (in contrast to CNN), and better generalization capability than CNN when the data set is limited (Sulam et al., 2022).
In the following sections, we first introduce a framework for representing seismic velocity models using dictionaries and subsequently devise a method for developing a transformation between models that can enrich the lower-resolution model.As initial applications, we implement the technique on 2D and 3D local and regional V P and V S models in southern California with different resolution and spatial extent.We generate an enhanced reconstructed version of the regional community velocity model CVM-S4.26(Lee et al., 2014) utilizing a higher-resolution model around the San Jacinto fault zone (Fang et al., 2019), and demonstrate by comparing waveform simulations with observations that the reconstructed regional model outperforms the original version.We note that the goal of the paper is to present a new method for merging multi-scale velocity models, rather than to produce the best available V P and/or V S models in any given subregion.The derived multi-scale models can be improved by merging iteratively within the enhanced regional models additional local higher-resolution results.

Dictionary Representation of Velocity Models
Dictionary Learning involves identifying a sparse representation of input data in the form of a linear combination of basis atoms inferred from the input data (Mairal et al., 2009), and it provided important results in compressed sensing and signal recovery.In this section, we present a novel method for applications on merging and enhancing seismic velocity models.
Seismic velocity models are represented as 2D or 3D grids that contain information about P wave velocity V P and/ or S wave velocity V S at a set of specific spatial locations.Given a seismic velocity model   , we can utilize a sliding window which has the same dimension as the model to extract patches from it, resulting in N patches p 1 , …, p N that represent values inside subsections of the model (Figure 1a).To simplify the description, we flatten each patch into a 1D array with length L representing the number of grids within the window (Dong et al., 2015).For example, L = 9 if the size of a 2D window is 3 × 3, and L = 90 if the window is a 3 × 3 × 10 cube.The set of flattened patches is denoted as The dictionary representation of the seismic velocity model is used to find a linear relationship among the patches in P and their low-dimensional projections, that is, a dictionary   * = [1, . . .,  ] ∈ ℝ × contains K prototype atoms (with K < N usually), each with the same length as the patches (Figure 1b).Each patch    ∈ ℝ  can be represented as a sparse linear combination of the atoms in D*: (1) where

𝑐𝑐𝐾𝐾,𝑗𝑗
]  and c i,j represents the sparse coefficients for patches p j for atoms d i (Figure 1c).These linear combinations can be written in a matrix form as  et al., 2003).
To solve the dictionary D* and coefficients C* from patches, we use a sparse coding algorithm (e.g., Rubinstein et al., 2010) that seeks to minimize the difference between the patches and the reconstruction from linear superposition  2019) model in the same region as (a) but with higher resolution.(f) A dictionary learned from the velocity model in (e).All atoms are reshaped to the window size for better visualization.
where ‖ • ‖ 2 and ‖ • ‖ 0 are the second and zero order norm, respectively.The regularization parameter λ controls the tradeoff between the sparsity and the minimization error.Various algorithms, such as Orthogonal Matching Pursuit (OMP) (Mallat & Zhang, 1993) and Least Absolute Shrinkage and Selection Operator (LASSO) Tibshirani (1996), can be used to solve this optimization problem Equation 3.After obtaining the dictionary D* and corresponding representation matrix C* using the sparse coding algorithm, the reconstructed patches is calculated as  P =  *  * .The updated velocity model can be then developed by stitching the reconstructed patches together in their original spatial order (Dong et al., 2015).

Transformation Between Two Seismic Velocity Models
The dictionary representation technique can facilitate the transformation between two different seismic velocity models by establishing a correspondence between their respective dictionaries.To solve the problem described above, we divide it into three steps.In the first step, we extract patches . Such pairs of patches, for example,    1 and    2 , extracted from the same spatial locations are aligned by their index in P 1 and P 2 .The length of the patches, L 1 and L 2 , may differ due to the different resolution of  1 and  2 .In the second step, we transfer the patches from   U 1 to  P to ensure that under this transformation the transformed P 1 is consistent with P 2 .Finally, the transformed patches  P are put back to their original spatial location to rebuild the transformed velocity model   .
The dictionary representation technique described in Section 2.1 is applied in the second step, that is, the transformation of patches from   U 1 to  P .First, we represent patches in P 1 as sparse linear combination of K atoms in a This produces a sparse representation   ∈ ℝ × 2 that contains the sparse coefficients for each patch in P 1 (Figure 1c).The dictionary D 1 and the sparse representation C can be obtained simultaneously using sparse coding algorithms (Section 2.1).We also represent the patches in P 2 as a linear combination of atoms in a second (higher-resolution) dictionary  2 with the same sparse representation C (Figures 1d  and 1f) (5) The atom    2 in D 2 have one-to-one corresponding to    1 in D 1 due to the duality between P 1 and P 2 .Therefore, we can first represent   U 1 on D 1 using a representation matrix C U and then construct the transferred patches In practice, we randomly divide the patches P 1 and P 2 into training set , and put each pair of patches with the same location in the same data set.The training set is utilized to learn the dictionary and sparse representation matrix, while the validation set is used to select optimal parameters.We use the OMP algorithm to obtain the dictionary D 1 for the train set   T 1 and the corresponding sparse representation C T from following optimization: For the derived sparse representation C T , we calculate the corresponding dictionary 10.1029/2023JB027016 5 of 15 from   T 2 using the Least Squares Method.Then we use the validation set to evaluate the performance of the trained dictionaries D 1 and D 2 .The patches in   V 1 are represented on the trained D 1 using a sparse representation We then compare the difference between the transferred validation set  PV = 2 V and   V 2 .The difference between patches P and the baseline patches P 0 is defined as The optimal hyper-parameters in the training procedure (e.g., the K number of atoms in a dictionary, the patch size, and the sparsity control parameter λ) are found by minimizing the difference . If the velocity model has intricate internal structures a larger dictionary with more atoms is needed to capture the fine details, while for a simpler velocity model a smaller K may suffice.

Capability of the Dictionary Representation
As the basis of our methodology for transforming seismic velocity models, we first demonstrate the capability of representing a seismic velocity model using a dictionary with minimal error.We use a horizontal slice of V P at 2 km depth from the CVM-S4.26model (Lee et al., 2014) as an example (Figure 2a).This waveform-based community velocity model for southern California was shown to have a good overall performance in a validation study using waveform simulations (Lu & Ben-Zion, 2022).With a horizontal resolution of 0.09° × 0.09°, there are 30 grid points along the longitude direction and 24 grid points along the latitude direction in this horizontal slice.
We extract 616 patches from this model using a sliding window of size 3 × 3 and a stride (amount of grid points of window's movement over the model in one direction) of 1.In the training of the dictionary using OMP, we set the regularization parameter λ as 0.1 and the number of atoms in the dictionary to be 20 (Figure 1b).The resulting sparse representation of the model in the dictionary is then used to reconstruct the model (Figure 2b).The average error between the reconstruction and the original model is 2.2%, while the peak error at a single grid is 7% (Figure 2c), demonstrating accurate reconstruction using the dictionary representation.We applied the dictionary representation also to models from Zigone et al. (2015) and Fang et al. (2019) utilized in this study, and their resulting errors are 2.6% and 1.9%, respectively.

Transformation Between 2D V S Models
In this section, we evaluate the performance of the method for 2D sections of V S results.Zigone et al. (2015) derived a 3D V S model around the San Jacinto and San Andreas faults in southern California from Rayleigh wave velocities constructed from the ambient seismic noise.The model, referred to below as Z2015 and chosen as the target model  2 , has high-resolution in the top 7 km.As illustrated in Figure 3a with a horizontal slice at a depth of 2 km, the results exhibit informative lateral variations including low velocity zones and velocity contrasts across the main faults.On the other hand, the regional model CVM-S4.26used as the initial model  1 shows clearly low velocity sediments in the Salton Trough (Figure 3a), outside the spatial extent of Z2015.The horizontal resolution of  1 and  2 are 0.03° × 0.03° and 0.01° × 0.01°, respectively, and the range of CVM-S4.26used in the analysis is larger than that of  2 .
Considering the spatial scale of geological units and the number of training samples, patches are extracted from the overlapping area using a 6 × 6 sliding square window on  1 with a stride of 1 and a 18 × 18 sliding window on  2 with a stride of 3. To have efficient training of dictionaries, we compute the cross-correlation (CC) between upsampled low-resolution patches and their corresponding high-resolution patches.We only utilize patch pairs with CC > 0 for both training and validation.This screening step excludes outlier model values and ensures an efficient convergence of the training process.The specific CC threshold should be tuned based on the training process to retain most patch pairs.We also test other CC thresholds (0.1 and 0.2) and obtain similar results.Following this selection process, 1,131 of 1,212 patch pairs are available for analysis in this example application, from which we randomly select 1,000 pairs as the training set   T 1 and   T 2 , while the remaining pairs are allocated for the validation set   V 1 and   V 2 .Out of 40 combinations of hyperparameter configurations, the optimal number of atoms in D 1 and D 2 is determined to be 20, with an optimal sparsity control factor of 0.1.The dictionaries that are trained are presented in Figures 3b  and 3c.Notably, there is a coherence between the value ranges of atom pairs in the low-resolution dictionary D 1 and the high-resolution dictionary D 2 .This similarity does not appear explicitly in the loss function, but is rather implicitly required by the design of the algorithm.However, textures of atom pairs are not entirely consistent (Figure S1 in Supporting Information S1), allowing the transformation of the model's style.The difference between   T 2 and the transformed  PT 1 is 6.1%, while the difference between   V 2 and the transformed  PV 1 is 7.6%.To evaluate the performance of trained dictionaries, we upsample  1 to the same resolution as  2 using bicubic interpolation, and the difference between  2 and the upsampled  1 in the overlapping region is 11.6%.The difference between models decreases on both training and validation sets, indicating the effectiveness of the method.Subsequently, the transformation is applied to the entire region of  1 , and the reconstructed model is shown in Figure 3a.
The reconstructed model is smoother than CVM-S4.26 in the entire domain and more consistent with the Z2015 model in the common region.In most areas of the model, the reconstructed results show a relative decrease in velocity values compared to CVM-S4.26,aiming to approach the systematically lower velocities of Z2015 relative to CVM-S4.26.In regions with extreme low velocity anomalies in CVM-S4.26, the reconstructed results yield higher velocity values, making the reconstruction smoother.For the grids within the area of Z2015, the differences between the reconstruction and CVM-S4.26 are positively correlated with the differences between Z2015 and CVM-S4.26(Figure S2 in Supporting Information S1).Additionally, the reconstructed model shows a good consistency with geological features, for example, low velocities around both the San Jacinto and San Andreas faults.

Transformation Between Both 3D V P and V S Models
To demonstrate more general application scenarios, we show the efficacy of our methodology by analyzing 3D velocity models that incorporate both V P and V S .Merging and enhancing 3D models can improve derivation of earthquake source properties, simulations of ground motion, and other applications.Additionally, the V P /V S ratio is an important parameter that provides key information on various geological and mechanical aspects such as lithology, rock damage, partial melting and water saturation.A seismic velocity model of high quality should display coherent geological features in the V P /V S ratios.Fang et al. (2019) developed a joint inversion technique that utilizes both body and surface wave data to improve the resolution of V P (Figure 4, first row) and V S (Figure 5, first row).The resulting velocity model (referred to as F2019) exhibits distinct regions of V P /V S ratios along the coast, in the vicinity of the San Jacinto and San Andreas faults, and the Salton Sea regions (Figure 6, top row).In contrast, the regional seismic tomography model CVM-S4.26lacks coherent regions of V P /V S ratios consistent with geological features (Figure 6, middle row).When transforming an under-constrained V P /V S ratio model to a well-constrained model, it is important to verify that the resulting V P /V S ratios correlate well with geological features.
In this case study, the initial seismic velocity model  1 is taken from velocities at 10 different depth sections (1-10 km) of the CVM-S4.26,while the target model  2 is taken from velocities of the F2019 model at the same depths.Both models have two sets of velocities, V P and V S , which are trained simultaneously with 20 layers in total representing values at different depths.The horizontal resolution resolutions of  1 and  2 are 0.09° × 0.09° and 0.03° × 0.03°, respectively.To ensure the smoothness of the transformed model, we employ bicubic interpolation to upsampled  to match the horizontal resolution of  2 .We then extract patches from both models using a 9 × 9 × 20 sliding window (20 being the number of layers) with a stride of 1.We employ a 3D window because velocities at different depths are not entirely independent and are likely to include correlated variations.Utilizing 3D reconstruction ensures continuity and smoothness in depth for the reconstructed model.This approach is similar to applying super-resolution techniques to videos, where not only individual frames are considered but also information from neighboring frames is utilized (Kappeler et al., 2016).
The resulting patch pairs total 980, from which 800 are randomly selected for the training set and the remaining pairs are used for validation.Each of our trained dictionaries contain 20 atoms with a dimension of 9 × 9 × 20.

10.1029/2023JB027016
9 of 15 scale model.The method is illustrated using V P , V S , and V P /V S tomographic results of local and regional models in southern California.The utilized sparse dictionary learning is amenable to interpretation and generalizations.Reconstructed 2D slices and 3D volumes of the regional CVM-S4.26model (Lee et al., 2014) exhibit visual enhancements and resemblance to results of the local higher-resolution target models (Fang et al., 2019;Zigone et al., 2015), which are especially clear for V P /V S ratios.The superiority of the reconstructed regional model relative to the original version is demonstrated further below by examining correlations of results with geological features, and model validation involving comparisons between numerical simulations of earthquake waveforms with observed data.
Since our method combines information from existing models, rather than developing a velocity model from seismic data, it is difficult to assess directly the resolution of the reconstructed model.However, it is reasonable to assume that the resolution of the reconstructed model in the overlapping region is similar to that of the more detailed model, while in the outer region it falls between the resolutions of the two used models.The increased resolution of the reconstructed model in the outer region relative to the original model is supported by the improved fitting of simulated waveforms to observed data illustrated in Section 4.2.

Interpretability of the Dictionary-Based Method
Sparse dictionary learning is an interpretable approach that produces a set of fundamental atoms that can be visualized, and the sparse representation can itself be physically meaningful.For instance, we can define the dominant atom d m for a given patch p j as the atom with the largest coefficient in the sparse representation of the patch, that is, c m,j = max(c i,j ), and assign the index of the dominant atom (m) to the center location of the patch.This procedure may be used to classify different regions and to obtain a map of dominant atoms distribution which can be further analyzed.
To illustrate, we compare the distribution of the dominant atoms in the results of Section 3.3 with velocity profiles and geological features (Figure 8).Patches from areas with low V P , V S and high V P /V S ratios, such as the north end  of simulation results for Event 1 (ID: 38460311) to observed seismograms at five stations located both inside and outside the overlapping region.The simulated results based on the reconstructed model exhibit a closer fit to the observations than those produced using the baseline model for both the body and surface waves portions of the seismograms.The improvement is evident by an increase of the CC coefficient between synthetic and observed waveforms denoted below the seismograms in Figure 9b.Similar improvements are found for simulations of other events (Figures S5-58 in Supporting Information S1) and a wider period range of 1-10 s (Figure S6 in Supporting Information S1).The average CC between synthetic and observed waveforms for the 5 events and stations indicated in Figure 9a is 0.33 using the reconstructed model, while for the original model it is 0.12.The closer fit of the simulated arrival times and amplitude of multiple phases to data (with an average CC gain factor of 2.75) indicates that the reconstructed model can be used to improve predictions of earthquake ground motion.
Due to the significant modifications of low-velocity values within the LA basin in our reconstructed model, we further analyze the similarity between synthetic and observed waveforms using data recorded by stations inside the LA basin (CI.USC, CI.DJJ, CI.RPV).The improvements in fitting observed data in the LA basin using the reconstructed model is reduced to an average CC gain factor of 1.2 (Figures S9 and S10 in Supporting Information S1).The regional baseline model produces better results on some portions of waveforms, but the performance of the reconstructed model is still better overall than that using the CVM-S4.26model.To assess further the ability of the different models to fit observed data, we also calculate the average normalized mean squared error (NRMSE) between synthetic and observed waveforms.The results indicate an overall decrease in the average NRMSE from 2.06 for the regional baseline model to 1.24 for the reconstructed model.We note that the relatively high NRMSE values for both models stem from the high frequency band included in the analysis.While this illustrates again the potential of the presented method, using additional metrics for evaluation can provide additional information and may be used in future work.

Limitations and Outlook for Future Research
The developed sparse dictionary learning method for transferring high-resolution seismic velocity results to lower resolution regional models is shown to be effective, but it has several limitations.First, the method requires a local seismic velocity model of high quality, and the overall consistency between the regional and local models can influence the quality of the trained dictionaries and the reconstructed models.Moreover, it has been demonstrated that achieving a factor 4 improvement in image super-resolution poses a challenge (e.g., Dong et al., 2015;Yang et al., 2008).This likely produces a corresponding limitation in the resolution differences between the used highand low-resolution models in our method.Additionally, the selection of hyper-parameters (Section 2.2) may vary for different models and should be tuned through experimentation and validation.
The validation results of Section 4.2 demonstrate quantitatively that the enhanced multi-scale Vp, Vs models outperform overall the baseline regional models.Including additional metrics for evaluation (e.g., Taborda et al., 2016), and comparing separately waveforms of different phases and different frequency can provide additional information on the quality of the models.The reduced validation performance in the LA basin motivates additional work to further improve the reconstructed multi-scale models in locations characterized by low velocities.This may be achieved by merging iteratively within the enhanced models detailed tomographic results for basin structures and the shallow crust, and may be the subject of a follow up work.
The developed method for merging and enhancing velocity models can be applied to other regions that have well-developed regional seismic velocity models and local high-quality models, such as central and northern California, Europe and Japan.In addition, the method may be applied in future studies hierarchically to include higher-resolution smaller-scale results of the type derived from dense deployment data (e.g., Mordret et al., 2019;Qiu et al., 2021).The improved multi-scale velocity models can have numerous applications in the field of seismology.Examples include more accurate derivation of earthquake source properties and more realistic simulations of earthquake ground motion over a wider range of frequencies leading to better estimates of seismic hazards.The method has also a potential application in providing V P values for noise-based models that are typically derived only for V S (e.g., Zigone et al., 2015) by building a transformation with a model that has both V P and V S values.
The concept of dictionary representation and transformation presented in this paper can be applied to additional earth science data sets.For instance, high-resolution surface deformation observed from InSAR is limited to areas with high correlations not including, for example, highly vegetated regions (e.g., Wei & Sandwell, 2010).An approach similar to the one developed here can enhance regional low-resolution deformation observations with higher-resolution local results.Furthermore, the low temporal resolution of InSAR data can be improved by high temporal resolution GPS data regarding the timeline as an extra dimension (e.g., Xu et al., 2022).The interpretability of the dictionary learning method can facilitate discovering spatial and temporal correlations between different physical quantities, such as geological features, seismic velocity structure, and surface deformation.

Figure 1 .
Figure 1.Dictionary representation of seismic velocity models and transformation between them.(a) Horizontal slice of CVM-S4.26 at 2 km depth with a resolution of 0.09° × 0.09°.The red square indicates the patch used in (c).Black lines show major fault traces.SAF-San Andreas Fault, SJF-San Jacinto Fault, EF-Elsinore Fault.(b) A dictionary consisting of 20 atoms learned from the velocity model in (a).(c) Each patch extracted from the velocity model can be represented as a linear superposition of atoms in the dictionary.(d) The patch extracted from the model in (e) at the same location can be represented using another dictionary with the same sparse representation.(e) Horizontal slice of the Fang et al. (2019) model in the same region as (a) but with higher resolution.(f) A dictionary learned from the velocity model in (e).All atoms are reshaped to the window size for better visualization.

Figure 2 .
Figure 2. (a) Horizontal slice of V P at 2 km depth from CVM-S4.26.The sampled resolution is 0.09° × 0.09°.(b) Reconstruction of the model in (a) using a dictionary representation.(c) The difference (error) between the reconstruction and the original model normalized at each grid by the value in the original model.Black lines mark major fault traces.

Figure 3 .
Figure 3. (a) Horizontal slices of V S at 2 km depth from the Zigone et al. (2015) model (top left), CVM-S4.26(top right), the reconstructed model (bottom left), and the difference between the reconstructed and CVM-S4.26models (bottom right).The resolution of the Zigone et al. (2015) and reconstructed models is 0.01° × 0.01°, while the resolution of CVM-S4.26 is 0.03° × 0.03°.Black lines show major fault traces.ST denotes the Salton Trough.(b) Low-resolution dictionary D 1 and (c) High-resolution dictionary D 2 learned from the training data.All atoms are reshaped to the window size for better visualization.

Figure 5 .
Figure 5. Same as Figure 4 for V S .

Figure 6 .Figure 7 .
Figure 6.Same as top 3 rows of Figure 4 for V P /V S .

Figure 9 .
Figure 9. (a) Locations of events with focal mechanisms indicated by beach balls and stations used for validation with waveform simulations.Black lines show major fault traces.The light-brown polygon represents the coverage of the Fang et al. (2019) model.Triangles mark stations of the CI network; larger red triangles denote stations used in (b) and Figures S5-S10 in Supporting Information S1; station CI.CRY is also used but not shown since it is far to the east at (34.68°N, 119.63°W).Stations CI.USC, CI.DJJ and CI.RPV are inside the LA basin.(b) Vertical components of observed (black lines) and synthetic waveforms from event E1 at five stations.The synthetic waveforms for the reconstructed and CVM-S4.26models are plotted with red and blue lines, respectively.All waveforms are bandpass filtered from 0.2 to 1 Hz and normalized by the maximum amplitude of the observed waveform.The CC coefficients between observed and synthetic results are denoted below the waveforms.