Machine Learning Analysis of Seismograms Reveals a Continuous Plumbing System Evolution Beneath the Klyuchevskoy Volcano in Kamchatka, Russia

Volcanoes produce a variety of seismic signals and, therefore, continuous seismograms provide crucial information for monitoring the state of a volcano. According to their source mechanism and signal properties, seismo‐volcanic signals can be categorized into distinct classes, which works particularly well for short transients. Applying classification approaches to long‐duration continuous signals containing volcanic tremors, characterized by varying signal characteristics, proves challenging due to the complex nature of these signals. That makes it difficult to attribute them to a single volcanic process and questions the feasibility of classification. In the present study, we consider the whole seismic time series as valuable information about the plumbing system (the combination of plumbing structure and activity distribution). The considered data are year‐long seismograms recorded at individual stations near the Klyuchevskoy Volcanic Group (Kamchatka, Russia). With a scattering network and a Uniform Manifold Approximation and Projection (UMAP), we transform the continuous data into a two‐dimensional representation (a seismogram atlas), which helps us to identify sudden and continuous changes in the signal properties. We observe an ever‐changing seismic wavefield that we relate to a continuously evolving plumbing system. Through additional data, we can relate signal variations to various state changes of the volcano including transitions from deep to shallow activity, deep reactivation, weak signals during quiet times, and eruptive activity. The atlases serve as a visual tool for analyzing extensive seismic time series, allowing us to associate specific atlas areas, indicative of similar signal characteristics, with distinct volcanic activities and variations in the volcanic plumbing system.


Introduction
Volcanoes and their eruptions are surface manifestations of dynamics of underlying complex plumbing systems.Most volcanoes are fed by the magma generated in the upper mantle that is then arising through the crust toward the surface.Large volumes of magma can be stored at intermediate depths and exhibit chemical transformations via crystallization, degassing, differentiation, and interaction with surrounding rocks resulting in formation of complex plumbing systems.Our knowledge about their deep structure remains limited."End-member" concepts of this structure (e.g., Holness et al., 2019) are: magma storage occurring mainly in relatively shallow and longlived reservoirs known as "magma chambers," and trans-crustal systems largely composed of a mixture of melt, crystals, and volatiles known as "crystal mush" (e.g., Cashman et al., 2017).
Modern volcano monitoring employs various surface and satellite-based observations to detect and anticipate volcanic unrest.When possible, efforts are made to interpret the observations in terms of the physical processes occurring inside the volcanic plumbing systems, which lead to eruptions.The latter task is particularly challenging, given our limited understanding of the structure of the mentioned plumbing systems.Consequently, monitoring relies heavily on the empirical recognition of patterns in accumulated observational data sets.In this regard, modern machine learning methods, both supervised and unsupervised, are actively utilized.
Volcanoes produce a wide range of seismic signals (e.g., Chouet & Matoza, 2013;Journeau et al., 2022;McNutt, 2005;Wilding et al., 2022).Volcano-seismologists have classified seismic signals with volcanic origin into distinct classes based on the source mechanism and signal characteristics.These classes include various types of impulsive events such as volcano-tectonic earthquakes, long-period (LP) events, hybrid events, tornillos, rockfalls, and a large family of nearly continuous signals known as volcanic tremors (e.g., Hotovec et al., 2013;Julian, 1994;Konstantinou & Schlindwein, 2003;Shapiro et al., 2023;Unglert & Jellinek, 2015).Some studies observed a continuous transition from discrete LP events to tremor episodes and back, making the boundary between these two signal classes blurry (e.g., Beroza & Ide, 2011;Fehler, 1983;Latter, 1979).Very often, many different seismo-volcanic sources act simultaneously, with potential interactions, resulting in non-stationary signals (e.g., Chouet & Matoza, 2013;Konstantinou & Schlindwein, 2003) composed of a mixture of impulsive "earthquake" and continuous "tremors." Tools adapted from earthquake seismology can detect the short-duration impulsive signals in continuous seismograms and most often locate their underlying sources, resulting in a discrete event catalog.In recent years, supervised learning strategies have been used extensively for event detection and classification tasks (e.g., Curilem et al., 2009;Falcin et al., 2021;Hibert et al., 2017;Lara et al., 2020;Maggi et al., 2017;Malfante et al., 2018;Titos et al., 2018).Applying such approaches to long-duration continuous signals that contain volcanic tremors and environmental noise and have a varying appearance in frequency and amplitude is, however, more problematic.The complex nature of these continuous signals makes it difficult to link them to a single volcanic process and challenges the applicability of the classification.
In this study, we propose a new unsupervised machine learning approach for monitoring the state of active volcanic plumbing systems based on continuous seismological observations.An important idea behind our approach is that a modification in the state of a plumbing system will result in a modification of properties of continuous seismic signals recorded in the vicinity of volcanoes.These latter are composed of seismic waves generated by a variety of internal and environmental sources and propagating through different parts of the plumbing system.A change in the volcanic activity will affect the properties of the seismo-volcanic sources while a change in the plumbing structure will affect the media through which the seismic waves propagate.This implies that these signals are very sensitive to the state of the plumbing system.Therefore, even in the absence of precise knowledge regarding the specific changes and locations within a plumbing system, we can identify such changes through observed modifications in the properties of seismic signals.
We employ a mathematical decomposition and transformation of segments from continuous signals to represent them in parametric spaces.Subsequently, we investigate a potential mapping between these signal parameter spaces and the states of the volcanic plumbing system.The underlying idea is that a particular state of the plumbing system, characterized by a combination of plumbing structure and activity distribution, generates signals with distinct properties, positioning them in a unique location within the signal parameter space.Although the precise mapping and its connection to physical processes may elude exact description, we can still analyze it to identify changes in the plumbing system state over time.In practical terms, if the signal segments of an observation period are positioned at the same location in the signal parameter space, the plumbing system state is considered "stationary."Conversely, any modification in volcanic activity distribution or plumbing structure lead to a shift in the signal parameter space position.
We employ mathematical tools like scattering networks (Andén & Mallat, 2014), independent component analysis (ICA; Comon, 1994), and Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2018).Given the rich and various appearance of seismicity in a volcanic environment, we test the hypothesis that the data-driven analysis of continuous seismograms offers new and different insights into the inner workings of a volcano than what current discrete event catalogs or supervised classification schemes can provide.ICA retrieves continuous features from the seismic time series, describing the temporal evolution of signal patterns.We are motivated by the results presented in Steinmann, Seydoux, and Campillo (2022) where the authors capture the signal-altering effect of surface freezing and thawing onto a single independent component.In a similar mindset, Hyvärinen et al. (2010) applied ICA to the Short-Term Fourier Transform of electroencephalography (EEG) and magnetoencephalography (MEG) time series data, revealing interesting information related to brain activity.Additionally, ICA has shown successful applications in analyzing various types of time series data, such as the examination of InSAR image time series (Ebmeier, 2016;Gaddes et al., 2018;Ghosh et al., 2021).Besides the interpretation of independent components, the seismogram atlas -a two-dimensional data representation of the seismic time series obtained using UMAP-offers a novel way to visualize the signal content of large seismic time series.By avoiding clustering and focusing on the analysis of the features and the seismogram atlas, we can observe the continuous evolution of the signal characteristics over time, providing a more complete picture of the mixing of different non-stationary seismic sources in seismo-volcanic signals.
We apply these machine learning methods to individual year-long continuous seismograms recorded in the vicinity of Klyuchevskoy volcano (Kamchatka, Russia) during an eruption and its preparation period.Both the results of the ICA analysis and the seismogram atlases clearly show continuously evolving signals with many pattern changes.To interpret these changes and to try to understand their origin, we compare them with known eruptive history and the satellite-inferred lava emission rates as well as with the catalog of volcanic tremors of Journeau et al. (2022) that provides key information about the depth where the observed changes are occurring.

Klyuchevskoy Volcano Group (KVG) and Its Seismic Activity
The KVG is one of the largest and most active clusters of subduction volcanoes in the World (e.g., Fedotov et al., 2010;Shapiro, Sens-Schönfelder, et al., 2017).Its origin is related to the unique tectonic setting at the corner between the Kuril-Kamchatka and Aleutian trenches.The enhanced supply of the melt from the mantle might be caused by the around-slab-edge asthenospheric flow (Levin et al., 2002;Yogodzinski et al., 2001) and related crustal extension (Green et al., 2020;Koulakov et al., 2020) or by fluids released from the thick, highly hydrated Hawaiian-Emperor crust subducted beneath this corner (Dorendorf et al., 2000).There is also evidence that the distinct volcanoes of KVG interact with each other on various time scales, affecting their steady state regimes and magma output (Coppola et al., 2021).

An Ideal Data Representation With a Scattering Network
In the following, we want to outline how we create data-driven features with ICA and seismogram atlases with UMAP from continuous seismograms.For two main reasons, waveform data is a poor data representation to perform those tasks.First, waveform data is sensitive to translation, meaning it encodes information about the signal's position in time.Second, waveform data is sensitive to small signal deformations, meaning that small deformations cause large variations in the data representation.To address these challenges, it is essential to find a representation that is both translation-invariant and stable to small deformations.While the amplitude spectrum of the Fourier transform is translation-invariant, it lacks stability to signal deformations, particularly at higher frequencies (Bruna & Mallat, 2013).The wavelet transform replaces the non-localized sine waves of the Fourier transform with localized waveforms, offering stability to deformations.However, the wavelet transform is not inherently translation-invariant.By adding non-linear averaging operators to the wavelet transform, we achieve both translation invariance and stability and create an architecture similar to Convolutional Neural Networks (CNNs).However, it's worth noting that the non-linear averaging operator can potentially remove important signal information.To mitigate this information loss, we repeat the wavelet transform in combination with the averaging non-linear operators and extract information at each layer.This iterative approach allows us to recover most of the lost information, resulting in a representation that is both translation-invariant and stable to small deformations.Bruna and Mallat (2013) and Andén and Mallat (2014) introduced this architecture and it has been recently applied to continuous seismograms, capturing intriguing patterns (Barkaoui et al., 2021;Moreau et al., 2022;Rodríguez et al., 2021;Seydoux et al., 2020;Steinmann, Seydoux, Beaucé, & Campillo, 2022;Steinmann, Seydoux, & Campillo, 2022).Moreover, scattering coefficients performed better for classification and data exploration tasks in comparison to spectral coefficients from the Fourier transform (Andén & Mallat, 2014;Steinmann, Seydoux, Beaucé, & Campillo, 2022).We apply a scattering network with a sliding window to the continuous three-component seismograms to retrieve the scattering coefficients (Figure 2).Each layer produces an output and the convolutional filters, classically learned in the case of CNNs, are restricted to a set of predefined wavelets.Considering a mother wavelet ψ(t), we can define a set of filter bank ψ λ (t) = λψ(λt) by dilating the mother wavelet ψ(t) with a set of dilation factors λ ∈ R. In the frequency domain, the set of wavelet banks would be ψλ (ω) = ψ(ω/λ).The dilation factor λ can then be defined as with Q ∈ N being the number of wavelets per octave and J ∈ N being the number of octaves.This definition of the dilation factor provides a logarithmic grid of the center frequencies for the set of wavelet filter banks.
By convolving a time series x(t) ∈ R with a set of wavelet filter banks ψ λ (t) and taking the modulus (which plays the role of an activation function), we obtain a real-valued time-frequency representation U λ (t) of the time series called a scalogram such as defining the first convolutional layer of the scattering network with ⋆ standing for convolution operation.In Andén and Mallat (2014) the authors introduce a low-pass filter ϕ(t) to retrieve the first-order scattering coefficients, as where the low pass filter ϕ(t) smooths the representation and makes it more stable for small deformation of the signal.However, it also removes other small-scale structures of the signal which might be important for pattern recognition tasks.This information is recovered by repeating the convolution and modulus operation, retrieving higher-order scattering coefficients.Note that the set of dilation factors λ differs with the layer of the scattering network.With two sets of wavelet filter banks, ψ λ 1 (t) at the first layer and ψ λ 2 (t) at the second layer, we can calculate the second-order scattering coefficients By repeating this operation many times, we can retrieve higher-order scattering coefficients which add more and more information.However, Andén and Mallat (2014) already concluded that the information gain beyond second-order scattering coefficients is marginal compared to the increasing computational costs.Therefore, we build a two-layer scattering network recovering first-and second-order scattering coefficients.The wavelets of the scattering network are Gabor wavelets as initially proposed in Andén and Mallat (2014).The Gabor wavelet ψ (t) with a center frequency f is a complex exponential multiplied with a Gaussian window, defined by (5) While f are the center frequencies defining the modulation of the Gabor wavelet, a defines the exponential dropoff of the waveform.We define a as a function of the wavelet's bandwidth d and its center frequency f, which in turn depends on the Nyquist frequency f N of the signal x(t) and the dilation factor λ In this work, we design a two-layered scattering network with a sliding window of 20 min and an overlap of 10 min, resulting in a temporal resolution of 10 min of the scattering coefficient matrix.The first layer wavelets are adapted to the possible frequency content of the tremors; the wavelet's center frequencies range from 0.78 to 10 Hz with a logarithmic grid.The second layer wavelets start at much lower frequencies since they gather information about the modulation and shape of the signal.The first layer covers four octaves and is densely spaced with four wavelets per octave.The second layer covers eight octaves and is sparsely sampled with one wavelet per octave.With three-component seismograms, this results in a scattering coefficient matrix of 480 dimensions.

Pooling as a Low Pass Filter in the Scattering Network
We use a pooling operation with a one-dimensional kernel as the low-pass filter ϕ(t) retrieving the scattering coefficients from the scalogram at each layer.The pooling operation retrieves a single value for each scale in the scalogram and, thus, acts as a low pass filter and downsampling operation (Dumoulin & Visin, 2016), ensuring a stable and translation-invariant representation.There are many different types of pooling operations, filtering different types of information and preserving different signal characteristics.In Seydoux et al. (2020), the authors applied the scattering network with an average pooling and other possibilities are maximum pooling or median pooling, taking the maximum or median value of each scale in the scalogram.As an example case, we analyze the scattering coefficient retrieved with maximum and median pooling for a 20 min seismogram recorded at station SV13 (Figure 1).The dominant signal in this 20 min seismogram is a broadband transient event arriving after 800 s and lasting for 100 s.Moreover, there are also persistent harmonic signals, possibly of volcanic origin, around 0.8 and 2 Hz with a lower amplitude than the transient event.Besides the broadband transient and the harmonics, we can identify changing amplitudes at frequencies around 10 Hz.This example shows the variety of signals of a single seismogram and we must acknowledge that any representation without time information -such as the Fourier spectrum or scattering coefficients -will simplify the data.The information retrieved by the scattering coefficients depends largely on the settings of the scattering network: number of wavelets, frequency range of wavelets, and pooling operation.Figures 1c-1e show the median and maximum pooled scattering coefficients together with the Fourier spectrum.The first-order maximum pooled coefficients resemble a smoothed Fourier spectrum (Figure 1c).The first-order median pooled coefficients are lower in amplitude and contain different local maxima and minima.They show larger amplitude for the two harmonics and lower amplitudes in between.The transient event with large amplitudes between 0.2 and 10 Hz seems to have no influence on the median pooled coefficients.In contrast, the maximum pooled coefficients have an amplitude distribution that matches much better the transient event.The type of pooling operation, which transforms the scalogram into scattering coefficients, filters the data and stores different types of information.Median pooled coefficients contain the information of the background wavefield and ignore any short-lived transients in the seismogram.Maximum pooled coefficients are sensitive to any type of short-lived transient in the seismogram which mask the background wavefield.Note also that maximum pooling would save the information of two transient events, if they appear in different frequency ranges.Thus, it could be a representation of a mixture of large amplitude events with different frequency content.Both pooling operations are valid, however, we need to acknowledge that both representations are biased and simplify the nature of the seismic data, an important fact to consider for exploratory data analysis.In this work, we will consider median pooling to mitigate the effects of impulsive short-term signals.

Feature Extraction With Independent Component Analysis
The scattering coefficients are collected in a data matrix X in such a way that the rows contain the scattering coefficient time series and the columns contain all scattering coefficients for one sliding time window (Figure 2).We refer to the time series of the independent sources as features for the following text.The aim of ICA is the separation of multivariate signals into independent, non-Gaussian source signals, which can be formalized in the following way where X ∈ R F×N are the N observations of dimension F, A ∈ R F×C is the mixing matrix, and S ∈ R C×N are the C independent sources.ICA estimates S by applying the inverse or pseudo-inverse of the mixing matrix, called unmixing matrix, W ∈ R C×F to the observed data in X S = WX.
ICA solves this equation by maximizing the statistical independence of the sources in S. The independence is estimated by a measurement of non-Gaussianity such as the kurtosis or negentropy (Hyvärinen & Oja, 2000).
The number of sources C is not known and is one of the most important parameters impacting the results of ICA.Often, this parameter is set according to a measurement estimating the information loss such as the explained variance score.Note that ICA is often described as a generalization of principal component analysis (PCA), since the independent components (sources) have no constraints of orthogonality (Comon, 1994).Also in contrast to PCA, the sign and amplitude of the independent sources cannot be determined, because both S and A are unknown and a scaling factor can always be canceled out.Therefore, ICA does not provide any  ranking to the retrieved sources.It is common practice to center and whiten the data in X since it constrains the unmixing matrix to be orthogonal and therefore the number of free parameters reduces (Hyvärinen & Oja, 2000).

Seismogram Atlases With Uniform Manifold Approximation and Projection (UMAP)
UMAP is a manifold learning technique, which has been introduced in the work of McInnes et al. (2018).Similar to ICA, UMAP is a tool to reduce the dimensions of a high-dimensional data set for downstream tasks such as visualization.Since we are interested in a visualization of the high dimensional scattering coefficient matrix, we restrict the number of dimensions to two.Any dimensionality reduction technique comes with a loss of information and the loss depends on the objective of the dimensionality reduction technique.Because ICA performs a linear mapping, it preserves well the pair-wise distances, but it loses information about local structures.UMAP learns the manifold of the given data and, thus, performs better in preserving local structures at the price of distorting the global structure.Hence, the distances between neighboring points are more reliable than distances between clusters of data points or the area of a cluster.& Hinton, 2008).However, compared to UMAP, t-SNE performs poorly in preserving global structures and its computation time is much slower (Becht et al., 2019;McInnes et al., 2018).Despite its relatively recent introduction, UMAP has been already utilized in many scientific domains to create a two-dimensional representations, simplifying the visualization of large and highdimensional data sets.The resulting two-dimensional UMAP spaces have been coined "atlases" such as the activation atlas of neural networks (Carter et al., 2019), the mouse organogenesis cell atlas (Cao et al., 2019), or the metagenomic atlas (Lin et al., 2022).
UMAP comes with a set of hyperparameters to tune such as the number of neighbors and the minimum distance, drawing the focus either toward preserving local or global structures.The number of neighbors limits the number of neighboring points when UMAP learns the local manifold structure.A low number draws the focus to the local structure while losing the bigger picture.A large number draws the focus on the global structure while losing finer details.The minimum distance controls how closely UMAP is allowed to bring data points together.A low number results in a more dense and clumpier representation and preserves better the local structure of the data.A large number avoids putting points close to each other and draws a broader picture of the data.

The Data: Continuous Seismograms, Tremor Catalog, and Lava Discharge Rate
In this work, we apply exploratory data analysis to the continuous seismograms of a joint Russian-German-French temporary seismic experiment named KISS (Klyuchevskoy Investigation-Seismic Structure of an Extraordinary Volcanic System; Shapiro, Sens-Schönfelder, et al., 2017), including short period and broadband sensors and covering the time period between August 2015 and July 2016.We focus on continuous three-component seismograms recorded by six individual broadband stations (Figure 3).The seismograms are demeaned, detrended, and down-sampled to a sampling rate of 25 Hz.Additional data such as a tremor catalog from Journeau et al. (2022) and the time-averaged lava discharge rate (TADR) from Coppola et al. (2021) will support the exploratory data analysis, connecting signal patterns to known state changes of the volcanic system.Due to the remoteness of the KVG, seismic and satellite data are the only available data sources.

The Scattering Coefficients
To visualize the continuous seismograms and introduce the concept of scattering coefficients, we depict the firstand second-order coefficient time series of the east channel of station SV 13 in Figures 4a and 4b.We choose station SV13 since it is located directly above the cataloged tremor activity (Figure 3).The first-order scattering coefficient time series resembles a spectrogram with spectral coefficients based on Fourier analysis (Figure 4a).The second-order scattering coefficient time series appear as multiples of the first-order scattering coefficients due to the application of the second-order wavelet transform to the first-order scalogram (Figure 4b).
Journal of Geophysical Research: Solid Earth 10.1029/2023JB027167

The Tremor Catalog
The authors of Journeau et al. (2022) used the network's spectral covariance matrix (Journeau et al., 2020;Seydoux et al., 2016;Soubestre et al., 2018Soubestre et al., , 2019) ) to detect and locate the coherent signals in a continuously moving time window and built a catalog of volcanic tremors including locations of their sources.We want to emphasize here that the catalog is a valuable source of information in validating the results of our work, however, it does not hold the ground truth, either.Figure 4c 4c).

Lava Discharge Rate Time Series and Daily Activity Level
We also use the time-averaged lava discharge rate (TADR) time series of Coppola et al. (2021), indicating the timing and strength of the eruption.The TADR data is estimated from infrared satellite data, assuming that the radiated energy of a lava body is linearly correlated to the bulk erupted volume.In the first half of April 2016, an eruption unfolded at the Klyuchevskoy volcano, indicated by TADR values above 0.1 m 3 s 1 and lasted throughout the remaining recording time of the KISS experiment (Figure 4d).However, the exact starting time is not known.The scattering coefficients depict the largest amplitudes during the first half of April 2016 and show elevated amplitudes for the remaining recording time.Moreover, the Kamchatka Branch of the Russian Geophysical Survey (KBGS) determines the daily activity level for the Klyuchevskoy volcano based on detected seismic activity combined with visual and satellite observations, when available.The orange color in Figure 4d corresponds to an ongoing eruption.

Hierarchical ICA With Increasing Number of Components
The complete scattering coefficient matrix of station SV13, including the east, north and vertical channels, is the input for the ICA model.We apply ICA models with four (M 4 ), 12 (M 12 ), and 50 (M 50 ) independent sources to explore the impact of dimensionality.M 4 reaches an explained variance score of 94 %, M 12 reaches an explained variance score of 98 % and M 50 reaches an explained variance score of 99.6 %. Figure 5 shows the smoothed time history of the independent sources (features) of each model.The features show negative and positive values of arbitrary units centered around zero due to the centering and whitening of the scattering coefficient matrix.We sort the features according to their maximum absolute amplitude appearance in time, helping the visualization of any time-dependent processes.
The features of the three models show very different time series and in the following, we want to use the models to understand better the underlying seismic data.First of all, we provide a qualitative comparison between the features of the three models.While there is no single feature matching between M 4 and M 50 (Figures 5b and 5d), we can find similar features between M 4 and M 12 such as feature 2 in both models (Figures 5b and 5c).M 50 is very different from the other two models, since its features appear more sparse, that is, they are mostly centered around zero except for a short duration.Moreover, it is striking that if one feature shows large amplitudes in a negative or positive direction (saturated blue and red colors), almost every other feature is centered around zero.These characteristics of M 50 together with the sorting of the features result in a this color-saturated diagonal line in the time-feature space (Figure 5d).In contrast, the data points represented by the features of M 4 and M 12 do have nonzero values for more than one dimension.The M 50 model indicates that the seismic time series witness an everchanging seismic wavefield with new signal characteristics throughout the recording time.The comparison with  5d).This suggests more stable signal patterns lasting for longer periods compared to the plumbing system reactivation in early December.Interestingly, periods of a high tremor count correlate with a positive sign on the diagonal line of the M 50 model Overall, we find correlations between feature changes-indicating signal pattern changes in the seismograms-and state changes of the Klyuchevskoy volcano.Moreover, we see that the number of components has an effect on the retrieved features, showing different time histories.The different ICA realization can be seen as a hierarchical ICA, where a model with a larger number of components-such as M 50 -can account for smaller differences in the signal characteristics.

Interpreting M 4 Features With the Mixing Matrix
To understand better what the features represent, we recall the equation of ICA (see Equations 7 and 8).The whitened and centered scattering coefficient matrix X is estimated as the sum of rank-1 matrices, resulting from the outer product of a feature (rows in S) with the corresponding columns in the mixing matrix A. Hence, the columns of A reveal how each feature contributes to the estimation of X.The visualization of the columns of A and its outer product with the corresponding feature can help to understand better the underlying signal characteristic of each feature.In Steinmann, Seydoux, and Campillo (2022), we used this method to reveal the changing signal patterns due to the freezing and thawing of the near subsurface.In Figure 6, we visualize the mixing weights of the M 4 model according to the center frequencies of the first-and second-order wavelets.We can use the shown mixing weights to attribute signal characteristics to the features of M 4 in Figure 5b.For example, feature 3 in Figure 5b shows a general correlation with the occurrence of tremors.The corresponding mixing weights show mainly negative amplitudes peaking at f 1 = 2 Hz in all components and for both first-and second-order scattering coefficients (Figure 6).Figures 7b-7d show the reconstructed first-order scattering coefficients, resulting from the outer product of feature 3 with its mixing weights.We disregard the second-order scattering coefficients for visualization purposes, however, we want to emphasize that they contain important signal information.We also add the mean over the scattering coefficients, which we subtracted before the ICA during the whitening process.The reconstruction makes clear that the tremor periods are characterized by a broadband amplitude increase peaking around 2 Hz (Figures 7b-7d).Note that both the mixing weights (source 3 in Figure 6) and the feature amplitudes during tremor-active periods (source 3 in Figure 5b) are negative, resulting in positive amplitudes of the reconstructed scattering coefficients due to the matrix multiplication (Figures 7b-7d).This example shows the ambiguity of the sign attached to the sources: any change of sign in feature 3 can be equalized with a change of sign of the corresponding column vector of the mixing matrix, resulting in the same  7).For visualization purposes, we reshaped the mixing matrix to display the weights related to the first-order coefficients in the left column, and the weights related to the second-order coefficients are split into three different subplots according to the seismometer's component.
rank-1 matrix.The reconstruction shows that feature 3 of M 4 , correlating with general tremor occurrence, relates to broadband amplitude changes.We do not need machine learning approaches to observe a correlation between a broadband amplitude increase and tremor occurrence, however, we want to show that the features represent meaningful patterns.
A more interesting example is feature 2, which correlates only with the tremor sequences in August and September 2015 (feature 2 in Figure 5b).The corresponding mixing weights show positive and negative amplitudes depending on the frequencies f 1 and f 2 and the channel (source 2 in Figure 6).Similar to before, we can visualize the first-order scattering coefficients of the obtained rank-1 matrix by the outer product of the mixing weights with feature 2 (Figures 7e-7g).We see a clear anti-correlation for scattering coefficients below and above 1 Hz for the east channel (Figure 7e): an amplitude increase above 1 Hz occurs together with an amplitude decrease below 1 Hz (e.g., the tremor-dominated periods in August and September 2015).Similarly, an amplitude increase below 1 Hz occurs together with an amplitude decrease above 1 Hz (e.g., October-December 2015).This anti-correlation can be already observed by the negative and positive weights of the mixing matrix (source 2, Figure 6).Weights with the same sign indicate the scattering coefficients which correlate with the corresponding independent source.The observed anti-correlation is nothing physical and this rank-1 matrix reflects only a part of the data without taking into account the other independent sources.Nonetheless, Figures 7e-7g suggest that the deep tremor activity in August and September 2015 is different from the other tremor episodes mainly due to different patterns at the east channel around 1 Hz.The reconstruction can indicate the underlying pattern changes in the seismogram and seems to be useful for ICA models with a low number of components.However, this becomes unfeasible for ICA models with a large number of components such as for M 50 .

Comparing M 50 Models of Multiple Seismic Stations
The M 50 model of station SV13 (Figure 5d) pictures a seismic time series with many pattern changes, indicating an ever-changing seismic wavefield.This seems surprising and it might be a particular characteristic of the data recorded close to the active Klyuchevskoy volcano.By retrieving M 50 models from different stations with an increasing distance to the volcano, we can verify this assumption.The considered stations-named SV13, IR12, IR18, SV7, OR18, and ESO in Figure 3-are located between 5 and 122 km away from the active volcano.
Figure 8 shows the corresponding M 50 models, revealing a diagonal line in the time feature space degrading with increasing distance to the volcano.This confirms our assumption that the large amount of pattern changes, represented by the diagonal line in the time-feature space, are only recorded in direct proximity to the active volcano.The activation of the whole plumbing system in December and the eruption period are characterized by dominant features even for stations further away such as SV7 and OR18.However, the large number of features makes it cumbersome to understand the pattern changes and relate them to physical processes in the volcano.
Manifold learning techniques such as UMAP might help to overcome this issue by capturing more information on fewer dimensions.

Seismogram Atlases of Individual Stations
We obtain seismogram atlases for the six station with UMAP and color-code the data points with their corresponding calendar time, highlighting the temporal evolution of the seismic time series (Figure 9).The seismogram atlases share the same hyperparameters: the minimum distance is set to 0.5 and the number of neighbors to 50.We tested different hyperparameters for the data of station SV13 (see Figure 10).The atlases depict different cluster shapes and distances with regard to the hyperparameters, emphasizing the trade-off between local and global structures in the atlas.However, all the examples confirm the smooth time gradient and little to no overlap of different periods.We opted for a minimum distance of 0.5 and 50 neighboring points, a decision primarily guided by visual assessment, much like the process commonly used in many other dimensionality reduction techniques.The chosen parameters seem to be a good choice for preserving local and global structures without resulting in too many disjoint clusters, as observed with 10 neighboring points.The seismogram atlases of SV13, IR18, and IR12 picture a variety of shapes with many linear and curved structures, where data points with different colors hardly overlap (Figures 9a-9c).Many data points with a similar color seem to be located close to each other, which gives rise to a smooth color gradient across the atlas.Therefore, neighboring data points in the atlas are likely neighboring data points in time, suggesting smooth and slow signal changes.However, there are also isolated or disconnected structures, indicating more sudden signal changes from time to time, especially for stations IR12 and IR18.The atlases of the data recorded at SV7, OR18, and ESO look different: there are less linear or curved structures and different colors overlap more often (Figures 9e and 9f).The two-dimensional seismogram atlases confirm our interpretation of the 50-dimensional ICA models: close to the active volcano, the seismograms witness many pattern changes, which are not visible by stations further away.

Connecting the Seismogram Atlas to Known Physical Processes
To provide a meaningful interpretation, we color-code the data points in the atlas with other physical parameters such as the TADR and the depth of the located tremors.Since the TADR time series are sampled irregularly, we interpolate the TADR data linearly, matching a TADR value with a data point in the atlas.We do that for the station closest to the volcano -SV13 -and for the station furthest from the volcano -ESO (see Figures 11a and  11b).For station SV13, we can identify the eruption activity (TADR >0.1 m 3 s 1 ) in the eastern area with multiple

Discussion
The seismogram atlases and the ICA features reveal a permanently evolving seismic wavefield with many pattern changes in the vicinity of the active Klyuchevskoy volcano.This suggests that the continuous seismograms from close stations contain relevant information about the dynamic processes occurring in the volcano plumbing system.We found particularly interesting the difference between the atlases obtained for stations located in the close vicinity of the active Klyuchevskoy volcano and those from distant stations.The latter look like "diffuse" clouds of points without distinctive structures.The former contains many well-defined "lineaments."We hypothesize that these "lineaments" in the seismogram atlases of nearby stations are associated with dynamic processes occurring within the volcano plumbing system.During such periods of "dynamic activity" the system evolution is characterized by certain "continuity" that is reflected in the "continuity" of the seismogram atlas "lineaments."In order to quantify this continuity, we use two parameters.We start with computing all vectors (steps) connecting two atlas points consecutive in time.Then, we compute the amplitudes of these steps.Small steps correspond to a more continuous atlas evolution.Therefore, we average the step amplitudes over N consecutive points and call the inverse of this value the "seismogram atlas continuity."Then we evaluate if consecutive steps follow a preferential direction.For this, we compute an Euclidean distance between points separated by N steps and compare it with the sum of amplitudes of these N steps.If all steps are perfectly aligned in the same direction these two quantities are perfectly equal.If the directions of steps are random, the distance between the first and last points is much smaller than the sum of step amplitudes.We compute the ratio between these two quantities and call it "seismogram atlas gradient continuity".Its value maximizes at 1 for a perfectly straight atlas "lineament".

Interpretation of the SV13 Seismogram Atlas
Our data analysis showed that station SV13, the closest station to the active Klyuchevskoy volcano, witnesses many pattern changes and we have seen correlations with other data sources such as the event catalogs and the TADR time series.In the following, we try to synthesize all data sources to identify meaningful areas within the SV13 seismogram atlas.Moreover, we visualize the spectrograms of some data points, showing characteristic examples of various atlas areas (Figures 12 and 13).In both Figures, we connect the data points in the atlas to known cataloged volcanic tremors from Journeau et al. (2022).The example spectrograms focus on tremor active periods and we connect the different data points with arrows, highlighting the temporal evolution of the tremor signals.
The SV13 seismogram atlas shows an interesting data landscape of continuous and isolated linear and curved structures with temporal smooth gradients, correlating with the occurrence of tremors (Figure 11c).The time series of atlas continuity and atlas gradient continuity for the three stations in the vicinity of the active volcano are depicted in Figures 14a-14c.As a comparison, Figure 14d shows the daily count of shallow tremors in blue (above 10 km depth) and deep tremors in red (below 10 km depth), highlighting periods of deep or shallow activity.Figure 15 provides an interpretation with the same labels for the SV13 seismogram atlas by connecting certain periods to volcanic activity.The interpretations of both Figures are based on the temporal evolution depicted in Figure 9a, the catalog association with example spectrograms shown in Figures 12 and 13, the TADR and tremor depth association in Figures 11a and 11c, and the Movie S1, highlighting the temporal evolution of the atlas.We also used information from the daily reports of the Kamchatka Branch of the Russian Geophysical Survey available at their website (in Russian): http://www.emsd.ru/˜ssl/monitoring/main.htm and publications about the Klyuchevskoy activity in 2015-2016 (Girina et al., 2019(Girina et al., , 2023)).In the following, we relate the pattern evolution of the continuous seismograms with distinct phases of the Klyuchevskoy volcano.From August to the end of September 2015, tremor activity occurs mainly at greater depth (marked with a blue 1 in Figure 14d).The same period corresponds to a connected point cloud with multiple linear and curved structures in the atlas (marked with a blue 1 in Figure 15).The spectrograms in that area picture narrow-banded continuous tremor signals centered around 1 Hz, which is different from the other tremor signals (blue framed spectrograms in Figure 12).This confirms our interpretation of the reconstructed scattering coefficients in Figure 7, indicating a signal difference around 1 Hz between the early deep tremor period and the tremors after the reactivation in December 2015.
At the end of September 2015, the deep tremor activity decreases and only a few isolated days denote tremor activity.Since there is no strong seismic activity in this period (marked with a gray 2 in Figure 14d), we can assume that it is mainly dominated by weak seismo-volcanic signals and ambient seismic noise.For the same period, the atlas depicts a gray point cloud below the deep tremor activity (marked with a gray 2 in Figure 15).Interestingly, this period is placed close to the deep tremor activity with overlapping structures.In fact, the gray linear structures reaching into the blue area are related to the deep tremor activity in October and November (see Movie S1).
After a period of quiescence, the tremor catalog shows a short phase of increased deep tremor activity at 4 December 2015, starting a longer period of increased tremor activity at various depths (Figure 14d).That deep reactivation of the plumbing system at 4 December 2015 is characterized by a jump in the atlas from the seismic noise area to a linear structure in the northern area (marked by a blue 3 in Figure 15), suggesting a sudden pattern change in the seismograms.A high continuity value on all three stations reflects the temporal-spatial disconnection in the seismogram atlas (blue 3 in Figures 14a-14c).After that date the atlas depicts a complex trajectory where we can identify shallow and deep tremor phases (marked with 4, 5, and 6 in Figures 14 and 15), indicating an ever-changing seismic wavefield with continuous pattern changes.In the same period, the spectrograms in Figure 12 indicate a transition from pure continuous signals to continuous signals with impulses.We hypothesize that the continuous pattern change of the tremor signals indicates a continuously changing plumbing system.However, we can not link the signal changes directly to physical processes.Either a shift in the sources generating tremor signals, alterations in the plumbing structure sensed by seismic waves, or potentially a combination of both may be occurring.The tremor catalog in Figure 14d indicates a deceasing tremor activity in February 2016, marked by neon-green 6 and gray 7.
From the deep reactivation in December 2015 until the quiet period in February 2016, the atlas depicts a continuous trajectory, ending close to the area of low seismo-volcanic activity in October/November 2015 (Figure 15).The purple-framed spectrograms of Figure 12 confirm the gradual decrease of continuous signals, which are characteristic of volcanic tremors.It is interesting to note that the linear trajectory in the atlas continues, even after the catalog does not show any tremor detections (see Movie S1).This suggests continuing and not cataloged weak tremor activity.Since station SV13 is located close to the center of tremor activity, this station likely records weak tremor activity, which does not generate a coherent signal across an array of stations.It is also interesting that the data points of the quiet period in February/March 2016 (marked with a gray 7) do barely mix with the data points of the October/November 2015 period (marked with a gray 2), indicating a different type of ambient seismic noise or different weak seismo-volcanic signals.We can exclude the oceanic microseisms or large-scale meteorological phenomena for this behavior since the center frequencies of the first-order wavelets do not cover frequencies below 0.78 Hz.Other studies have shown that the signal properties of the ambient seismic noise can change due to volcanic activity (Glynn & Konstantinou, 2016;Ichihara et al., 2023).
Toward the end of the quiet period, the data points move away from the noise area toward an area where we have mainly pre-eruptive shallow tremors (yellow 9 in Figure 15 and Movie S1), indicating slowly emerging and not cataloged tremor signals.This slow transition is interrupted by a deep reactivation starting on 17 March 2016, indicated by a sudden increase of deep tremors after a period of quiescence (Figure 14d).During that period, the atlas depicts a jump in the atlas to the curved purple structure marked with an 8.In fact, the deep reactivation is characterized by a spatial-temporal disconnection in the atlas for the three stations close to the volcano, suggesting a sudden pattern change on the three stations similar to the 4 December 2015 reactivation (Figure 14).The deep reactivation is characterized by pure broadband continuous signals compared to the times before and after (blueframed spectrograms in Figure 13).After the sudden pattern changes, the SV13 atlas indicates a more continuous pattern change until the start of the eruption (yellow 9 and magenta 10 in Figure 15).The spectrograms of the same period show a continuous transition from repeating impulsive signals to more continuous tremor signals (orange-framed spectrograms in Figure 13).After the eruption started on 4 April 2016 (Girina et al., 2019(Girina et al., , 2023)), the spectrograms show mainly continuous tremor signals (green-and red-framed spectrograms in Figure 13).The April/May 2016 co-eruptive tremor period is located on the far left in the atlas and a neighboring linear structure characterizes a period of shallow tremor activity with an increase in the lava discharge rate occurring at the end of May 2016.After this event, the co-eruptive tremor signals build a point cloud with no linear structures (light-pink 14 in Figure 15), indicating no continuous pattern evolution for a longer time.This behavior is very different from the previous tremor-dominated periods, characterized by mainly linear or curved structures and continuous or sudden pattern changes.During that time in June 2016, most tremor sources are at a shallow depth (Figure 14d).The significant increase of the atlas continuity and gradient continuity seen at all three stations at the beginning of July 2016 (Figure 14) marks a sudden pattern change before the strong explosion that took place on 7 July 2016 at the crater of the active Klyuchevskoy volcano.On this date, an ash column reached 10 km in altitude according to KBGS and the ash cloud traveled more than 400 km (Girina et al., 2019).The atlas denotes a jump back to the eruption area around 4 July 2016, indicating similar types of signals before the explosion (red-yellow 15 in Figure 15).
The seismogram atlas of station SV13 shows that the seismograms follow a complex signal evolution with many pattern changes due to volcanic activity and a changing plumbing system.The atlas shows no overlap of the various tremor periods, indicating changing tremor signal patterns throughout the recording time.We see both smooth and sudden transitions between different activities, reflecting smooth and sudden regime transitions of the volcanic system.The interpretation shows that UMAP preserves global and local structures.Globally, we can identify seismograms with continuous signal characteristics in the outer circle of the atlas and seismograms with impulsive characteristics in an area inside the circle.Locally, we can identify certain periods of volcanic activity such as shallow or deep tremor activity, and see continuous or sudden changes between these different periods.In some areas we have continuous transitions between continuous and impulsive signals, indicating seismograms containing both types of signal characteristics.Journeau et al. (2022) observed a similar signal separation and transition within the variables space obtained from the network's covariance matrix.In our case, the second-order scattering coefficients contain information about impulsive and continuous signal characteristics and, therefore, the separation of seismograms according to these characteristics is reasonable.Our findings show that the majority of the recording time is dominated by tremor signals.This agrees with the findings of Makus et al. (2023) where their reference correlation function for the same data set is dominated by tremor activity.

Conclusion
With data-driven features and seismogram atlases, we have analyzed the signal content of continuous seismograms recorded during the KISS experiment close to the active Klyuchevskoy volcano.A scattering network transformed the continuous seismograms into a stable data representation (scattering coefficient matrix) for exploratory data analysis.With an ICA, we extracted features describing data-driven signal characteristics of the seismogram.These features have shown a continuously evolving seismogram with many pattern changes, in particular for stations a few kilometers away from the active Klyuchevskoy volcano.A larger number of features was necessary to observe this behavior.Simultaneously, the large number made the interpretation and analysis more difficult.Therefore, we utilized a non-linear dimensionality reduction technique called UMAP to create a two-dimensional representation of the scattering coefficient matrix, which we call a seismogram atlas.In the seismogram atlas, we find various structures with closeby data points representing similar seismograms and distant data points representing dissimilar seismograms.The atlases offer a visual tool to analyze long seismic time series and they confirmed our observation of an ever-changing wavefield.We were able to relate certain atlas areas, representing similar signal characteristics, to different volcanic activities and changes in the volcanic plumbing system.While the seismic wavefield seems to change throughout the recording time, the atlas helped us to identify sudden and continuous pattern changes.Deep reactivations are characterized by sudden pattern changes and tremor depth changes are mainly characterized by continuous pattern changes.We also want to emphasize that each tremor period seems to have its own signal characteristics, resulting in distinct structures in the seismogram atlas.

Figure 1 .
Figure 1.Comparison between spectral and scattering coefficients of a seismic signal: panel (a) shows an example seismogram with normalized amplitude in time domain; panel (b) shows its corresponding Fourier spectrogram; panel (c) shows the Fourier amplitude spectrum and the first-order median and maximum pooled scattering coefficients of the signal shown in (a); panel (d) shows the second-order maximum pooled scattering coefficients and panel (e) shows the second-order median pooled scattering coefficients as a function of the wavelet's center frequencies f 1 and f 2 .

Figure 2 .
Figure 2. Detailed view of a two-layered scattering network applied to continuous three-component seismograms with a sliding window.The dashed line in the firstorder scalogram indicates that each first-order scale is convolved with the second-layer wavelet banks, creating multiple second-order scalograms.The final scattering coefficient matrix contains the first-and second-order scattering coefficients from the three-component seismogram.
depicts the time-depth evolution of the catalog, showing periods of shallow and deep activity.Before December 2015, the tremors are mainly located in the deeper part of the Klyuchevskoy plumbing system with periods of increased tremor activity in August and September 2015.The whole plumbing system becomes active after 4 December 2015 with shallow and deep tremor sources.After a short absence of tremors, they restart in March 2016 and last until the end of the cataloged time period.All scattering coefficient amplitudes are elevated during the presence of tremors (Figures 4a-

Figure 3 .
Figure 3. Map of the Klyuchevskoy volcano group with the seismic stations (SV13, IR18, IR12, SV7, OR18, and ESO) considered in this study, shown with white triangles.The orange triangle shows the location of the Klyuchevskoy volcano.The averaged spatial density of the tremor source location according to Journeau et al. (2022) is shown with a colormap.Black circles and purple crosses indicate hypocenters of individual detections of tremors and deep long-period earthquakes (DLP), respectively.

Figure 4 .
Figure 4. Time series of SV13 scattering coefficients and other data sources: panel (a) shows the time series of first-order scattering coefficients of the east channel of SV13; panel (b) shows the time series of second-order scattering coefficients of the east channel of SV13.The y-axis represents the center frequency of the first-order wavelets f 1 and the center frequency of the second-order wavelets f 2 lies between two f 1 values (details in Section 3.1).However, due to reasons of clarity, we do not show the f 2 ticks.Panel (c) show the localized tremors from Journeau et al. (2022) as a function of calendar time and depth; panel (d) shows the TADR time series from Coppola et al. (2021) as black dots and the daily activity level from the KBGS as color-coded dots.Green corresponds to low volcanic activity, yellow corresponds to medium volcanic activity, orange corresponds to ongoing eruption, and red corresponds to an explosion on 7 July 2016.

Figure 5 .
Figure 5. Hierarchical Independent Component Analysis (ICA) for station SV13: in panel (a) the gray histogram describes the daily number of localized tremors based on Journeau et al. (2022) and the colored circles indicate the daily activity level of the Klyuchevskoy volcano, where green represents low activity, yellow represents medium activity, orange represents an ongoing eruption, and red represents an explosion on 7 July 2016.The black dots correspond to the time-averaged lava discharge rate (TADR) from Coppola et al. (2021).By applying multiple ICA models to the SV13 scattering coefficient matrix, we retrieve time histories of the independent sources (features), describing signal patterns in the seismogram.Panel (b) shows the features of the 4-component model M 4 , panel (c) shows the features of the 12component model M 12 , and panel (d) shows the features of the 50-component model M 50 .Note that the features were sorted with respect to their absolute maximum value in time for better visualization.

Figure 6 .
Figure 6.Mixing weights for the M 4 model for its four independent sources at station SV13.The matrix multiplication of the mixing weights with the M 4 features in Figure 5b estimates the scattering coefficient matrix (see Equation7).For visualization purposes, we reshaped the mixing matrix to display the weights related to the first-order coefficients in the left column, and the weights related to the second-order coefficients are split into three different subplots according to the seismometer's component.

Figure 7 .
Figure 7. Reconstructed first-order scattering coefficients based on the outer product of an M 4 feature with its mixing weights for station SV13: panel (a) shows the localized tremors; panels (b), (c), and (d) show the reconstruction of the first-order coefficients of the east, north, and vertical channel, based on feature 3 (Figure5b); panels (e), (f), and (g) show the reconstruction of the first-order coefficients of the east, north, and vertical channel, based on feature 2 (Figure5b).

Figure 8 .
Figure 8.By applying a 50-component Independent Component Analysis model to the scattering coefficient matrices, we retrieve M 50 features for the continuous seismograms recorded at station SV13 (a), IR18 (b), IR12 (c), SV7 (d), OR18 (e) and ESO (f).Each feature represents a signal pattern, which dominates when the color is saturated.The results are ordered according to the distance to the active Klyuchevskoy volcano mentioned in the title of the subfigures.Panel (g) shows the number of localized tremors per day as a gray histogram, the time-averaged lava discharge rate (TADR) from Coppola et al. (2021) as black dots and the daily activity level from the KBGS as color-coded dots.Green corresponds to low volcanic activity, yellow corresponds to medium volcanic activity, orange corresponds to ongoing eruption, and red corresponds to an explosion on 7 July 2016.

Figure 9 .
Figure 9.By applying the manifold learning technique UMAP to the scattering coefficient matrices, we obtain individual seismogram atlases for the continuous seismograms recorded at station (a) SV13, (b) IR18, (c) IR12, (d) SV7, (e) OR18, and (f) ESO.Each data point in the two-dimensional representation corresponds to 20 min three-component seismograms and the color-code reflects the calendar starting time of the 20 min window.The distance in the atlas reflects the similarity of the three-component seismograms.The panels are ordered according to the distance to the active Klyuchevskoy volcano mentioned in their titles.

Figure 10 .
Figure 10.Seismogram atlases obtained with changing UMAP hyperparameters for the data recorded at SV13.Each data point in the two-dimensional representation corresponds to 20 min three-component seismograms and the color-code reflects the calendar starting time of the 20 min window.The results shown in Figure 9a correspond to min_dist = 0.5 and n_neighbors = 50.

Figure 11 .
Figure 11.Seismogram atlases color-coded with interpolated time-averaged lava discharge rate (TADR) and localized tremors for two stations.Seismogram atlas with interpolated TADR for station SV13 in (a) and ESO in (b), highlighting areas in the atlas with strong eruptive activity.Seismogram atlas with shallow and deep tremors for station SV13 in (c) and ESO in (d), highlighting areas with rather shallow or deep volcanic activity.We color-code a data point with tremor activity in case they matched in time.Deep and shallow tremors are separated at 10 km depth.

Figure 12 .
Figure 12.The seismogram atlas with example spectrograms of tremor signals from the period between August 2015 and February 2016.The upper image shows the SV13 seismogram atlas.Data points matching in time with localized tremors are black and data points with weak or unknown signals are gray.Each data point represents 20 min of three-component seismograms.For some data points (marked with the colored arrows) we visualized the east-component spectrograms with their calendar date on the top of the subfigure.The color-coding of the arrows matches the color-coding of the spectrogram's frame and the arrows point toward the next data point in time.

Figure 13 .
Figure 13.The seismogram atlas with example spectrograms of tremor signals from the period between February 2016 and July 2016.The upper image shows the SV13 seismogram atlas.Data points matching in time with localized tremors are black and data points with weak or unknown signals are gray.Each data point represents 20 min of three-component seismograms.For some data points (marked with the colored arrows) we visualized the east-component spectrograms with their calendar date on the top of the subfigure.The color-coding of the arrows matches the color-coding of the spectrogram's frame and the arrows point toward the next data point in time.

Figure 14 .
Figure 14.Time series of the atlas continuity and atlas gradient continuity of the three stations (a) SV13, (b) IR18, and (c) IR12.The continuity has been averaged over 250 consecutive steps (1.74 days).The gradient continuity has been averaged over 500 steps (3.47 days).Signal RMS amplitudes (averaged over 250 steps and normalized) are shown with light gray areas, for reference.Panel (d) shows the tremor count per day.Deep (below 10 km) and shallow (above 10 km) tremors are shown with blue and red shaded areas, respectively.The black continuous line shows the difference between the deep and shallow tremor counts.Colored numbers indicate different events and episodes of the activity of the Klychevskoy volcano plumbing system.The vertical dashed lines correspond to specific events and the horizontal numbered bars correspond to specific periods in the interpreted SV13 atlas in Figure 15.

Figure 15 .
Figure 15.Synthesis of the results: interpreted seismogram atlas of SV13 with identification of volcanic activity and its relation to the plumbing system of the Klyuchevskoy volcano.
Without going into further details, the inner workings of UMAP are based on topological data analysis and Riemannian Geometry, providing a complex but safe and sound mathematical background (see the original work of McInnes et al., 2018, for more details).It shares similarities with the t-distributed Stochastic Neighbor Embedding (t-SNE), which has been used extensively for visualizations since its appearance in the 2000s (Van der Maaten