GRAPES: Earthquake Early Warning by Passing Seismic Vectors Through the Grapevine

Estimating an earthquake's magnitude and location may not be necessary to predict shaking in real time; instead, wavefield‐based approaches predict shaking with few assumptions about the seismic source. Here, we introduce GRAph Prediction of Earthquake Shaking (GRAPES), a deep learning model trained to characterize and propagate earthquake shaking across a seismic network. We show that GRAPES’ internal activations, which we call “seismic vectors”, correspond to the arrival of distinct seismic phases. GRAPES builds upon recent deep learning models applied to earthquake early warning by allowing for continuous ground motion prediction with seismic networks of all sizes. While trained on earthquakes recorded in Japan, we show that GRAPES, without modification, outperforms the ShakeAlert earthquake early warning system on the 2019 M7.1 Ridgecrest, CA earthquake.


Introduction
The competing goals of accuracy and speed in earthquake early warning (EEW) obey an uncertainty principle: initial shaking alerts will be inherently inaccurate due to the constraint of using limited data from the start of rupture, whereas accurate ground motion predictions, based on longer records of real-time waveform data and more sophisticated rupture models, arrive after they are needed (Minson et al., 2018; Wald, 2020). Traditionally, EEW systems first estimate an earthquake's magnitude and location and then transform that information into a prediction of future shaking (Allen & Melgar, 2019). An EEW system does not necessarily need to locate or determine an earthquake's magnitude to predict ground shaking in real time (Hoshiba, 2013). Instead, wavefield-based approaches use the current seismic wavefield to predict future shaking. This type of approach redirects the focus of ground shaking prediction from earthquake source physics to seismic wave propagation but still requires assumptions about attenuation (Kodera et al., 2018) or scattering physics (Hoshiba & Aoki, 2015).
Here, we introduce a deep learning (DL) model to continuously predict ground motions across a seismic network. We build upon previous work using Graph Neural Networks (GNNs) (van den Ende & Ampuero, 2020; Bloemheuvel et al., 2022) and transformer models (Münchmeyer et al., 2021a, 2021b) for EEW. Our algorithm, GRAph Prediction of Earthquake Shaking (GRAPES; Clements (2023b)), predicts future earthquake shaking using the previous 4 s of acceleration waveforms across a seismic network (Figure 1). At the heart of GRAPES is a GNN, which propagates high-dimensional representations of seismic waveforms, which we call "seismic vectors", across a seismic station graph (Figure 1c), and empirically learns the behavior of seismic wave propagation, attenuation, and scattering.

Data and Methods
We interpret a seismic network as a graph of connected seismometers embedded in the continuous seismic wavefield, where the graph's nodes are co-located with seismometers, the graph's edges are connections between nearby seismometers, and the graph's features are seismic waveforms (McBrearty & Beroza, 2023). We create seismic station graphs by virtually connecting each seismometer to its 20 nearest neighboring seismometers and to all seismometers within 30 km (Kodera et al., 2018). We additionally add 1 random, long-range connection to each station, where the probability of adding a long-range connection at a distance R away scales as P(R) ∝ R⁻² (Kleinberg, 2000). Such "small-world" shortcuts increase the signal propagation distance across a seismic station graph (Figure S1 in Supporting Information S1) (Watts & Strogatz, 1998).
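The graph construction described above can be sketched as follows. This is a minimal illustration, not the published implementation: the function name, the handling of ties, and the use of Euclidean distance on planar coordinates are all assumptions; the parameters (20 nearest neighbors, 30 km radius, one long-range edge with probability ∝ R⁻²) come from the text.

```python
# Sketch of seismic station graph construction: k nearest neighbors, all
# stations within a radius, plus one random "small-world" long-range edge
# per station drawn with probability proportional to distance**-2
# (Kleinberg, 2000). Illustrative only; names and details are assumptions.
import numpy as np

def build_station_graph(xy, k=20, radius_km=30.0, rng=None):
    """Return a set of undirected edges (i, j) over station positions xy (n, 2) in km."""
    rng = np.random.default_rng(rng)
    n = len(xy)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    edges = set()
    for i in range(n):
        order = np.argsort(dist[i])                      # nearest first; order[0] == i
        for j in order[1:k + 1]:                         # k nearest neighbors
            edges.add((min(i, j), max(i, j)))
        for j in np.flatnonzero(dist[i] <= radius_km):   # all stations within radius
            if j != i:
                edges.add((min(i, j), max(i, j)))
        # one long-range shortcut: P(j) proportional to dist**-2
        others = np.array([j for j in range(n) if j != i])
        p = dist[i, others] ** -2.0
        j = rng.choice(others, p=p / p.sum())
        edges.add((min(i, j), max(i, j)))
    return edges
```

Because the long-range edge probability decays as R⁻², most shortcuts remain regional, but occasional distant connections shrink the hop count between far-apart stations, as in Watts-Strogatz small-world graphs.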
We characterize EEW as a spatio-temporal graph learning problem. We create a data set of seismic station graphs from 3,759 earthquakes with magnitudes between 3 and 9.1, recorded by the K-NET and KiK-net networks (Aoi et al., 2011) in Japan from 1997 to 2018 (Figures S2-S4 in Supporting Information S1). Our data set is a subset of Münchmeyer et al. (2021a)'s data set: we selected earthquakes that had at least 30 stations recording a P-wave arrival and only used surface stations. For each earthquake, we created overlapping seismic station graphs starting with the P-wave arrival at nearby stations and ending 5.0 s after the surface waves arrived at the furthest stations. We chose random subsets of 30-100 stations for each timestep and used a 4-s input window to balance computational efficiency against capturing longer-period ground motion (Text S1-S3 in Supporting Information S1). The use of variable subsets of stations for each time step makes the approach flexible to station distributions and station availability. This resulted in 59,991 unique seismic station graphs.
We create an additional 14,992 station graphs (20% of the total) that contain solely ambient noise data, with the intention that training on ambient noise will allow GRAPES to predict ground motion continuously in a production setting. For our set of noise station graphs, we sample 4-s windows of ambient noise recorded before the P-wave arrival from earthquakes rejected by our selection criteria above (Text S4 in Supporting Information S1). We split the combined earthquake plus noise station graph data set into a training, validation, and test set by time to mimic how operational EEW systems learn and adjust after subsequent earthquakes. The training, validation, and test sets span earthquakes and noise samples from 1997 to 2014, 2014 to 2017, and 2017 to 2018 and account for 70%, 20%, and 10% of the data set, respectively.

GRAPES predicts future Peak Ground Acceleration (PGA) across a seismic network using continuous (i.e., updated each second) seismic station graphs as input. GRAPES contains 4 sequential neural networks: a convolutional neural network (CNN) for extracting time-frequency features from input waveforms, a dense neural network (DNN) for encoding time-frequency and amplitude features into a "seismic vector", a GNN for virtually propagating seismic vectors across a seismic station graph, and a final DNN for decoding aggregated seismic vectors into a future PGA prediction at each station.
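The four-stage pipeline above (CNN → encoder DNN → GNN → decoder DNN) can be sketched end to end with numpy. Everything here other than the 128-dimensional seismic vector is an illustrative assumption: the random filter bank stands in for learned CNN kernels, the weight matrices are random rather than trained, and the layer sizes are not the published architecture.

```python
# Minimal numpy sketch of the four-stage GRAPES-style pipeline: per-station
# waveform features -> 128-d "seismic vector" -> max-aggregation over the
# station graph -> per-station log10(PGA) prediction. Weights are random;
# this shows the data flow, not the trained model.
import numpy as np

rng = np.random.default_rng(0)
D = 128                                   # seismic vector dimension (from text)

def relu(x):
    return np.maximum(x, 0.0)

def forward(waveforms, peak_amp, adj):
    """waveforms: (n, 3, samples); peak_amp: (n,) in cm/s^2; adj: (n, n) with self-loops."""
    n = len(waveforms)
    # 1. "CNN" stand-in: time-domain features from a bank of random 1-D filters.
    filters = rng.standard_normal((16, 25))
    feats = np.array([[np.abs(np.convolve(w.sum(axis=0), f, mode="valid")).max()
                       for f in filters] for w in waveforms])
    # 2. Encoder DNN: fuse features with log10 peak amplitude into a seismic vector.
    x = np.concatenate([feats, np.log10(peak_amp)[:, None]], axis=1)
    vec = relu(x @ (rng.standard_normal((x.shape[1], D)) * 0.1))
    # 3. GNN: 5 rounds of element-wise max-aggregation over the station graph.
    for _ in range(5):
        vec = np.array([vec[adj[i] > 0].max(axis=0) for i in range(n)])
    # 4. Decoder DNN: seismic vector -> predicted log10(PGA) per station.
    return (vec @ (rng.standard_normal((D, 1)) * 0.1)).ravel()
```

A usage example would pass 4-s, 3-component acceleration windows for each station together with the station adjacency matrix; the output is one future log₁₀(PGA) estimate per station, updated every time step.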
At first order, the CNN distinguishes whether the normalized waveform input at a particular station is a P wave, an S wave, or another type of wave, for example, ambient noise, P-wave coda, or surface wave (Movie S1). The DNN then combines this phase information with the current log10(peak amplitude) at each station (Münchmeyer et al., 2021a) into a 128-dimensional "seismic vector".
GRAPES uses a set of 5 graph convolution layers in its GNN to propagate station-level seismic vectors across the seismic station graph (Z. Wu et al., 2020). These graph layers work like a game of seismic telephone: survey your neighbors' seismic vectors, take the features with the highest activations, and pass those features to your neighbors and neighbors' neighbors (Text S5 in Supporting Information S1). As shown in Figure 1c, GRAPES's GNN passes seismic vectors from a local to regional scale.

GRAPES predicts ground motion using a DNN, as opposed to traditional Ground Motion Models (GMMs), which use a set of task-specific features, such as distance from the fault and earthquake magnitude, to predict earthquake shaking (Joyner & Boore, 1981). GRAPES's DNN has an easier task than most GMMs: predict future shaking at each station using a combination of amplitude and phase features at nearby stations (Figure 1d; Figure S5 in Supporting Information S1).
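The "seismic telephone" can be demonstrated on a toy graph: if each round every station takes the element-wise maximum of its own and its neighbors' vectors, a feature activated at one station reaches its h-hop neighbors after h rounds, so 5 layers cover a 5-hop neighborhood. The chain topology and single active feature below are illustrative assumptions.

```python
# Toy demonstration of max-aggregation message passing: a feature lit up at
# station 0 of a chain spreads one hop per round, so after 5 rounds it has
# reached stations within 5 hops but not beyond.
import numpy as np

def propagate(vec, neighbors, rounds=5):
    """vec: (n, d) seismic vectors; neighbors: list of neighbor index lists."""
    for _ in range(rounds):
        vec = np.array([np.max(vec[[i] + neighbors[i]], axis=0)
                        for i in range(len(vec))])
    return vec

# Chain of 7 stations: 0-1-2-3-4-5-6.
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
vec = np.zeros((7, 4))
vec[0, 2] = 1.0                      # feature 2 activates at station 0 only
out = propagate(vec, nbrs, rounds=5)  # feature reaches stations 0-5, not 6
```

This is why the small-world shortcuts matter: a single long-range edge lets high-activation features skip many hops, extending the effective reach of a fixed number of graph layers.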
We train GRAPES in a supervised fashion: given the current state of the seismic station graph, the model predicts the logarithm of the PGA at each station over the next 40 s. We train GRAPES end-to-end to minimize the mean squared error between recorded and predicted log10 acceleration across the network (Text S6; Figure S6 in Supporting Information S1). End-to-end training forces each parameter of the network to optimize for predicting ground motion rather than for intermediate tasks such as determining an earthquake's magnitude or location, as in traditional EEW systems. We evaluate GRAPES's predictive performance using the real-time seismic intensity measure (Ir), which approximates the Japanese Meteorological Agency's (JMA) seismic intensity (I JMA) (Kodera et al., 2018).
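The objective can be written compactly. The intensity helper below uses the commonly quoted instrumental approximation I ≈ 2 log₁₀(a) + 0.94 (a in cm/s²), omitting the filtering and duration criteria JMA actually applies; it is included only to relate log-amplitude errors to intensity units, and is an assumption rather than the paper's Ir definition.

```python
# Sketch of the supervised objective: mean squared error between predicted
# and recorded log10(PGA) across the network, plus a simplified JMA-style
# instrumental intensity (approximation; real JMA intensity applies a
# vector filter and a 0.3-s duration criterion).
import numpy as np

def loss(pred_log10_pga, true_pga):
    """MSE in log10 acceleration; true_pga in cm/s^2."""
    return np.mean((pred_log10_pga - np.log10(true_pga)) ** 2)

def jma_intensity(a_cm_s2):
    """Approximate JMA instrumental intensity: I = 2*log10(a) + 0.94."""
    return 2.0 * np.log10(a_cm_s2) + 0.94
```

Because intensity scales as twice the log amplitude, an error of 0.1 log₁₀ units in predicted PGA corresponds to roughly 0.2 intensity units.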

Results
While we trained GRAPES solely to predict ground motion, GRAPES's station-level neural network (Figure 1b) developed an internal vector space of seismic embeddings with a nuanced relationship to the seismic wavefield. Ambient noise, body waves (P waves and S waves), the P-wave coda, surface waves, and later-arriving coda waves activate similar sets of features across distinct earthquakes and recording locations (Figures 2a-2c; Figures S7-S9 in Supporting Information S1). Some features are purely phase-based (Figure 2b), whereas others solely encode amplitude (Figure 2c). About half of the 128 features in GRAPES's seismic vector space activate on either pre-event ambient noise, P-wave coda, or surface wave coda, rather than on direct P or S waves. We infer that distinguishing noise from coda is important for GRAPES to intuit whether peak ground motion has already occurred at a particular location.
We illustrate the tight linkage between input waveforms, seismic vectors, and shaking predictions with a M6.1 earthquake from the test set (Figure 3). The P wave arrives 3 s after origin time at the closest station to the epicenter (SMN006, 8 km from the epicenter; Figures 2a-2d; Figure 3). Within half a second of the P-wave arrival, SMN006's seismic vector switches from an ambient-noise state to an earthquake state. Even though GRAPES's intensity prediction (I pred; Figure 2e, purple curve) quickly reaches the peak intensity (I max; Figure 2e, dashed horizontal line), if GRAPES had been operating as an EEW system during this earthquake, SMN006 would have received little to no advance warning: I pred exceeds Ir 5, the intensity level at which JMA provides alerts (Hoshiba et al., 2008), only two seconds before the observed shaking exceeds that threshold (Figure 3). This exemplifies the limitation of EEW for shallow crustal earthquakes: areas of strongest shaking may not receive a warning before strong shaking occurs (Minson et al., 2018; Wald, 2020). At station HRS008 (66 km from the epicenter, Ir 3.7), the warning time exceeds 10 s (Figure 2j).
We test GRAPES's timeliness and accuracy on a test set of 641 mostly crustal earthquakes recorded in Japan from 2017 to 2018 (NIED, 2019). The average magnitude of the test set is M4.3, while the largest event is the 2018 M6.7 Hokkaido Eastern Iburi earthquake (Figure S3 in Supporting Information S1). We evaluate GRAPES's performance using the warning time framework of Meier et al. (2020) with a warning threshold of Ir 3.0, which is roughly equivalent to Modified Mercalli Intensity (MMI) 4.5. We calculate a maximum possible warning time for each station as the time between when GRAPES predicts shaking above the threshold and when ground motion above that threshold occurs. We include 1 s of processing time in our warning time estimates but do not account for alert delivery times (McBride et al., 2023). We compare GRAPES's performance to the Propagation of Local Undamped Motion (PLUM) algorithm (Kodera et al., 2018), a ground motion-based EEW algorithm currently in use in Japan (Kodera et al., 2020), and to a hypothetical and unrealistic (Minson et al., 2018) "perfect" algorithm, which exactly predicts shaking at all locations at the instant rupture begins (Figures 4a-4c). We use the perfect algorithm as an overly optimistic estimate of the maximum available warning time, which for the test set falls between the P-wave and surface wave travel times. For instance, within 20 km of the hypocenter, there is a minimum of 4 and a maximum of 10 s of hypothetically possible warning time (Figure 4a). For Ir 3.0 shaking, the perfect algorithm warns 50% of locations at least 19 s before shaking arrives, whereas GRAPES and PLUM warn 50% of locations with at least 7 and 3 s of warning, respectively (inset, Figure 4a). Within a single earthquake, though, GRAPES's percent of maximum possible warning increases with hypocentral distance out to 100 km (Figure 4d). GRAPES generally overpredicts ground motions below Ir 3, which fall below the warning threshold, but is unbiased at higher ground motions after site amplification factors are removed (Figure 4e; Text S7 in Supporting Information S1). We additionally compare GRAPES's and PLUM's alerting accuracy using precision and recall (Figures 4b and 4c). For the test set, GRAPES and PLUM achieve a precision of 0.48 and 0.21 and a recall of 0.78 and 0.88, respectively, on a station-wise basis. This suggests GRAPES is less prone to false alerts but may miss slightly more alerts than PLUM. For smaller events (M < 6), GRAPES takes on average less than 2 s to achieve prediction accuracy within 1 Ir unit (Figure S10 in Supporting Information S1). This is more than half the rupture duration (3 s) of a M6 earthquake (Trugman et al., 2019). For the 2018 M6.7 Hokkaido event, which had an estimated rupture duration of 15 s (Kobayashi et al., 2019), GRAPES's intensity predictions grow with the earthquake, taking 5 s to predict the shaking to within 1 Ir intensity unit.
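The station-level bookkeeping behind these warning-time and precision/recall numbers can be sketched as follows. The function names, the toy time series, and the treatment of stations that never cross the threshold are illustrative assumptions; the 1 s processing delay and the Ir 3.0 threshold come from the text.

```python
# Sketch of station-level warning time (first above-threshold prediction,
# plus processing time, versus first above-threshold observation) and of
# alert precision/recall over a set of stations. Illustrative only.
import numpy as np

def warning_time(t, pred_I, obs_I, threshold=3.0, processing_s=1.0):
    """Seconds of warning at one station; None if either series never
    crosses the threshold. t, pred_I, obs_I are aligned 1-D arrays."""
    alert = np.flatnonzero(pred_I >= threshold)
    exceed = np.flatnonzero(obs_I >= threshold)
    if len(alert) == 0 or len(exceed) == 0:
        return None
    return t[exceed[0]] - (t[alert[0]] + processing_s)

def precision_recall(alerted, exceeded):
    """alerted, exceeded: boolean arrays over stations (alert issued /
    shaking actually exceeded the threshold)."""
    tp = np.sum(alerted & exceeded)
    precision = tp / max(np.sum(alerted), 1)
    recall = tp / max(np.sum(exceeded), 1)
    return precision, recall
```

Under this convention a negative warning time means the alert would have arrived after the shaking, which is how the near-epicenter stations in Figure 3 end up with little to no usable warning.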
To demonstrate that GRAPES generalizes well to different areas, we compare GRAPES, without any modifications, to the U.S. Geological Survey's ShakeAlert EEW system on the 2019 M7.1 Ridgecrest, CA earthquake (Figure 5). GRAPES works without modification with Southern California waveform data because, while its time (4-s waveform) and channel (3-component) input dimensions are fixed, its graph network can accept any station geometry. ShakeAlert, which was operational at the time, estimated a M5.5, 1.6 magnitude units less than the true magnitude, 6.9 s after the origin time, and its estimate grew to M6.3, 0.8 units less than the true magnitude, 22 s after the origin time. ShakeAlert's underestimate of the magnitude led to underpredicted ground motion intensities in real time. Consequently, ShakeAlert did not send EEW alerts to Los Angeles County, even though the system criteria then called for an alert to be sent at a predicted shaking level of 25 cm/s² or MMI 4 (Chung et al., 2020). In our retrospective test, which considered real-time data transmission latency (Stubailo et al., 2021) but not processing or alert delivery times (Text S7 in Supporting Information S1), the P wave arrived at the first operational seismometer 4 s after origin time. GRAPES could have predicted MMI 4+ shaking in portions of LA County within 6 s of origin time and at a majority of locations in LA County within 9 s of origin time, at which time GRAPES's average error across the seismic network was 0.1 MMI units (Figure 5). This highlights that the model generalized well to a new region with a different station distribution than the original training set.
Our retrospective analysis suggests that GRAPES can rapidly and accurately predict peak ground motions across a region through a basic understanding of the seismic wavefield. However, its behavior during small magnitude (M < 3) earthquakes and on problematic data (e.g., noise spikes) is unknown, as these data were not included in training and testing; thus, additional analyses may be necessary prior to implementation in real time. We trained GRAPES to predict PGA, which is useful for human perception of shaking, but not peak ground velocity, which is important for predicting building damage (Y.-M. Wu et al., 2003), or peak ground displacement from Global Navigation Satellite Systems (Goldberg et al., 2021), which does not saturate. We did not optimize GRAPES's preprocessing for computational efficiency, but it can predict shaking for a few thousand stations in less than a second on a graphical processing unit (Figure S11 in Supporting Information S1). Finally, GRAPES has not been tested on any M > 8 earthquakes, but it predicted Ir 5+ within 5 s of origin time in a retrospective test of the 2024 M7.6 Noto Peninsula, Japan earthquake (Figure S12 in Supporting Information S1).

Conclusions
We provide evidence that GRAPES, a deep learning model trained on full earthquake waveforms, from ambient noise to coda waves, encodes seismic phases in its seismic vector space. We show how GRAPES's seismic vector representations are linked both spatially and temporally to shaking predictions. With a P-wave arrival at a nearby station, GRAPES can rapidly and accurately predict future ground motion intensities at farther stations in the seismic network, aided by long-range connections embedded in its graph neural network. Further, we find that GRAPES learned to use not only P waves and S waves but also P-wave coda, surface waves, and ambient noise to understand the seismic wavefield. GRAPES's performance on a geographically out-of-distribution test event, the 2019 Ridgecrest, CA earthquake, suggests that its seismic vector space generalized rather than memorized the training data.
Writing - review & editing: T. Clements, E. S. Cochran, A. Baltay, S. E. Minson, C. E. Yoon

Figure 1. The Graph Prediction of Earthquake Shaking (GRAPES) model. (a) GRAPES takes in acceleration waveforms across a seismic network in real time. (b) A station-level network extracts seismic feature vectors from three-component waveforms. (c) A graph neural network propagates seismic vectors across the seismic station graph. Features are shared within a k = 5 hop neighborhood. (d) Future shaking is predicted in real time at each station using a deep neural network.

Figure 2. Internal representations of the seismic wavefield. (A-C) Activations of the station-level network's features 1, 12, and 76, respectively, on the 9 April 2018 M6.1 earthquake in Shimane/Hiroshima Prefectures, Japan (JMA event ID 2018040901323081; NIED (2019)). Seismograms are plotted with increasing epicentral distance and scaled to unit amplitude. Feature activations in (A-D) and (F-I) are scaled between 0 and 1. (D) East-west acceleration waveform and station-level feature vector representation at the nearest station (8 km) to the earthquake. Peak ground acceleration (PGA) is 504 cm/s². Dashed vertical lines indicate the arrival times of the P wave, S wave, and PGA, respectively. (E) GRAPES's predicted future (purple), observed real-time (solid gray), and peak observed (dashed horizontal gray; I max) intensities 8 km from the epicenter. (F-H) Activations of the graph-level network's features 2, 46, and 27, respectively, on the same earthquake. The dashed gray vertical line indicates the time of the P-wave arrival at the station closest to the epicenter. (I) East-west acceleration waveform and graph-level feature vector representation of the regional wavefield at a station 66 km from the same earthquake as in (D). Dashed vertical lines indicate the arrival times of the P wave, S wave, and PGA at the stations 8 and 66 km, respectively, from the epicenter. (J) GRAPES's predicted future (purple), observed real-time (gray), and peak observed (dashed gray; I max) intensities 66 km from the epicenter.

Figure 3. Shaking predictions 3-6 s after origin time for an earthquake in the test set. (a) Shaking intensity map of the 9 April 2018 M6.1 earthquake in Shimane/Hiroshima Prefectures, Japan (JMA event ID 2018040901323081; NIED (2019)). (b) Example seismic station graph for this earthquake. Stations are circles colored by the number of hops (connections shown by lines between stations) from the nearest station (dark blue circle) to the epicenter. An arrow points to a small-world connection (green circle, bottom) located only one hop away, despite the longer distance from the nearest station. (c) Intensity observations (Ir) within 100 km of the epicenter. Closed circles are locations of seismic sensors, colored by Ir. Black and blue dotted lines indicate the approximate locations of the P wave and S wave, respectively, at each time step. (d) GRAPES real-time intensity predictions (I pred). (e) GRAPES intensity prediction error. The error value at bottom right is the mean prediction error at that timestep across the entire seismic network, where the prediction error at a single station is given by (I pred(t) − I max), where I max is the maximum Ir at each station throughout the earthquake.

Figure 4. Comparison of warning times on the test set. (a)-(c) Maximum possible warning times for the hypothetical "perfect" algorithm (Minson et al., 2018), GRAPES, and the Propagation of Local Undamped Motion (PLUM) algorithm (Kodera et al., 2018) using a warning threshold of seismic intensity 3.0. The inset of (a) shows empirical cumulative distribution functions (CDFs) of warning times for seismic intensity 3.0 for each algorithm. Precision and recall values for GRAPES and PLUM alerts are shown in (b) and (c), respectively. (d) GRAPES's and PLUM's percentage of maximum warning times for the two largest earthquakes in the test set. (e) Comparison of GRAPES predicted and observed intensities on the test set at the station level using a warning threshold of seismic intensity 3.0. Site amplification factors have been removed from GRAPES predictions (Kodera et al., 2018). The dashed line is the 1:1 line, for reference.

Figure 5. Comparison of GRAPES and ShakeAlert (Chung et al., 2020) predictions for the 2019 M7.1 Ridgecrest, CA earthquake at t = 4, 6, 8, and 10 s after origin time. (a) Observed Modified Mercalli Intensity (MMI) near the epicenter. Closed circles are locations of seismic sensors, colored by MMI. Black and blue dotted lines indicate the approximate locations of the P wave and S wave, respectively, at each time step. Seismic sensors with >1 s of real-time data transmission latency are removed from the input (Stubailo et al., 2021). (b) GRAPES MMI predictions (MMI pred). (c) ShakeAlert MMI predictions. The error value at bottom left in (b) and (c) is calculated in a similar fashion to Figure 3, using MMI.