Modern Seismic Monitoring Systems: How ML and the Cloud Build Earthquake Catalogs

Utpal Kumar · 9 minute read

A tour of how modern earthquake-monitoring systems turn continuous seismic waveforms into earthquake catalogs — the ML pipeline of picking, association, and location, and the cloud systems (QuakeFlow, LOC-FLOW, and more) that run it at scale.

Every day, seismic networks stream terabytes of ground-motion data from thousands of stations. Buried in that continuous hum are earthquakes — most of them tiny, far more of them than any analyst could ever pick by hand. A seismic monitoring system is the machinery that turns those raw waveforms into an earthquake catalog: a list of events with their times, locations, and magnitudes. Over the last few years, machine learning and cloud computing have rebuilt this machinery from the inside out, and the payoff is dramatic — modern systems routinely find an order of magnitude more events than traditional catalogs [1][2].

The one mental model

A monitoring system is a pipeline, not a single algorithm:

continuous waveforms → pick P/S arrivals → associate picks into events → locate (and relocate) → catalog.

Machine learning has supercharged the first two stages, and the cloud lets the whole pipeline scale to years of data across thousands of stations.

Why this matters

Traditional catalogs are built with hand-tuned detectors (like STA/LTA) and human analysts, so they miss the smallest events. That matters because small earthquakes illuminate fault structure, aftershock behavior, induced seismicity, and volcanic plumbing. When Zhu and colleagues reprocessed three years of data from Puerto Rico with their cloud workflow QuakeFlow, they found more than ten times the events in the standard catalog — and in Hawaii the extra events revealed the deep structure of the magmatic system [1]. This “more complete catalog” story now repeats across the literature [2][3].

Stage 1 — Phase picking: finding the P and S arrivals

The first job is to scan each station’s waveform and mark when the P-wave and S-wave arrive. Classic pickers used simple energy ratios; deep-learning pickers learn the shape of an arrival from millions of analyst-labeled examples.
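
To make "simple energy ratios" concrete, here is a minimal NumPy sketch of an STA/LTA detector, the classic baseline. The window lengths, the threshold of 4, and the synthetic trace are illustrative assumptions, not values from any of the cited systems.

```python
import numpy as np

def sta_lta(trace, n_sta, n_lta):
    """Ratio of short-term to long-term average signal energy, per sample.

    Both windows end at the current sample. The first n_lta - 1 samples,
    where the long window is incomplete, are left at zero.
    """
    energy = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(len(energy))
    end = np.arange(n_lta, len(energy) + 1)        # exclusive end index of each window
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, 1e-12)  # guard against division by zero
    return ratio

# Synthetic trace: unit-variance noise with a 5 Hz burst starting at sample 1000.
rng = np.random.default_rng(0)
fs = 100.0  # samples per second (assumed)
trace = rng.normal(0.0, 1.0, 2000)
burst = np.arange(200)
trace[1000:1200] += 10.0 * np.sin(2 * np.pi * 5.0 * burst / fs)

ratio = sta_lta(trace, n_sta=20, n_lta=400)
trigger = int(np.argmax(ratio > 4.0))  # first sample where the ratio exceeds the threshold
print(f"trigger at sample {trigger} ({trigger / fs:.2f} s)")
```

The detector only knows that energy jumped. It cannot tell a P arrival from an S arrival or from a noise burst, which is the gap the learned pickers fill.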

Two models dominate the field:

  • PhaseNet — a U-Net that reads three-component waveforms and outputs probability curves for P arrival, S arrival, and noise. It was trained on 30+ years of Northern California data and beats classic methods on both accuracy and recall [4].
  • EQTransformer — an attention-based network that does detection and P/S picking simultaneously, which improves both tasks by sharing information across the full waveform [5].

Newer entrants keep pushing accuracy and deployability — a compact transformer, EQCCT, reports production-ready performance in the Texas network [6] — but PhaseNet and EQTransformer remain the workhorses, and independent benchmarks confirm they (with GPD) are the strongest general-purpose pickers [7].
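
A picker's probability curve still has to be turned into discrete pick times. The sketch below shows one common post-processing idea, thresholded peak finding with a minimum separation. The function name, the 0.5 threshold, and the synthetic Gaussian "probability" trace are assumptions for illustration, not the post-processing shipped with PhaseNet or EQTransformer.

```python
import numpy as np

def picks_from_probability(prob, sampling_rate, threshold=0.5, min_separation_s=1.0):
    """Turn one probability trace into a list of (time_s, probability) picks."""
    prob = np.asarray(prob, dtype=float)
    min_gap = int(round(min_separation_s * sampling_rate))
    # Local maxima at or above the threshold are pick candidates.
    candidates = [i for i in range(1, len(prob) - 1)
                  if prob[i] >= threshold and prob[i - 1] <= prob[i] > prob[i + 1]]
    # Keep the strongest peaks first and drop weaker ones that sit too close.
    kept = []
    for i in sorted(candidates, key=lambda k: prob[k], reverse=True):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return [(i / sampling_rate, float(prob[i])) for i in sorted(kept)]

# Synthetic P-probability trace: two confident peaks and one weak one.
fs = 100.0  # samples per second (assumed)
t = np.arange(3000) / fs

def bump(center, height, width=0.1):
    return height * np.exp(-0.5 * ((t - center) / width) ** 2)

p_prob = bump(5.0, 0.9) + bump(12.0, 0.7) + bump(20.0, 0.3)
picks = picks_from_probability(p_prob, fs)
print(picks)
```

The weak peak at 20 s never becomes a pick because it stays below the threshold. Lowering the threshold trades more detections for more false positives.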

Check your understanding

What does a phase picker like PhaseNet or EQTransformer actually output?

Stage 2 — Phase association: grouping picks into events

A picker running on a 200-station network produces a blizzard of arrivals — but which picks belong to the same earthquake? That is phase association: link picks across stations whose arrival times and amplitudes are consistent with one source. It sounds easy until an aftershock sequence produces many events per minute that overlap in time.
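
A toy version of "consistent with one source" can be sketched with a brute-force grid search. Everything here is an assumption for illustration: a uniform 6 km/s P velocity, a flat 2-D geometry, noise-free P picks only, and made-up station coordinates. It is not the algorithm used by any associator named in this article.

```python
import numpy as np

V_P = 6.0  # km/s; uniform P velocity, an illustrative assumption

def best_source(picks, stations, grid, tol):
    """Find the (node, origin time) that explains picks at the most stations."""
    names = [name for name, _ in picks]
    xy = np.array([stations[name] for name in names], dtype=float)
    t_arr = np.array([t for _, t in picks], dtype=float)
    best = None
    for node in grid:
        # Origin time each pick would imply if the source sat at this node.
        implied = t_arr - np.linalg.norm(xy - node, axis=1) / V_P
        for t0 in implied:
            closest = {}  # station -> (residual, pick index), best pick per station
            for i, res in enumerate(np.abs(implied - t0)):
                if res <= tol and (names[i] not in closest or res < closest[names[i]][0]):
                    closest[names[i]] = (float(res), i)
            misfit = sum(res ** 2 for res, _ in closest.values())
            score = (len(closest), -misfit)  # more stations first, then tighter fit
            if best is None or score > best[0]:
                members = sorted(i for _, i in closest.values())
                origin = float(np.mean(implied[members]))
                best = (score, tuple(float(c) for c in node), origin, members)
    return best

def associate(picks, stations, grid, tol=0.3, min_stations=4):
    """Greedily peel off one event at a time until too few picks agree."""
    remaining, events = list(picks), []
    while len(remaining) >= min_stations:
        (n_sta, _), node, origin, members = best_source(remaining, stations, grid, tol)
        if n_sta < min_stations:
            break
        events.append({"location": node, "origin_time": origin,
                       "picks": [remaining[i] for i in members]})
        remaining = [p for i, p in enumerate(remaining) if i not in members]
    return events

stations = {"S1": (0, 0), "S2": (40, 3), "S3": (2, 38),
            "S4": (37, 40), "S5": (20, 5), "S6": (15, 30)}  # km, hypothetical
true_events = [((10.0, 10.0), 0.0), ((30.0, 20.0), 1.0)]     # (location, origin time)
picks = [(name, t0 + float(np.hypot(x - sx, y - sy)) / V_P)
         for (sx, sy), t0 in true_events for name, (x, y) in stations.items()]
grid = np.array([(x, y) for x in range(0, 41, 5) for y in range(0, 41, 5)], dtype=float)

events = sorted(associate(picks, stations, grid), key=lambda e: e["origin_time"])
for ev in events:
    print(ev["location"], round(ev["origin_time"], 3), len(ev["picks"]))
```

The two synthetic events start one second apart, so their arrivals interleave across the network, yet each set of picks collapses to a single origin time only at its own source. With noisy picks, false picks, and many events per minute, that separation becomes far less clean, and brute force becomes far too slow.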

There is a whole zoo of associators, and picking the right one matters:

  • GaMMA treats association as unsupervised clustering with a Gaussian Mixture Model, jointly estimating each event’s location, time, and magnitude — no grid search, no training [8]. It’s the associator inside QuakeFlow.
  • PhaseLink learns to link picks with a neural network trained on synthetic arrival-time sequences [9].
  • GENIE uses a graph neural network over the station and source geometry, and re-detects ~96% of USGS events while finding ~4× more over a 100-day test [10].
  • PyOcto takes a classic 4D space–time partitioning approach and is at least 10× faster than other associators while matching or beating their sensitivity [11].

Check your understanding

Why is association the hard step during a dense aftershock sequence?

Stage 3 — Location, magnitude, and relocation

Once picks are grouped, classic physics-based tools take over: an initial hypocenter and origin time from packages like HYPOINVERSE, VELEST, or NonLinLoc, then high-precision relocation with double-difference methods such as hypoDD or GrowClust to sharpen fault geometry [12]. Magnitude usually comes from the amplitudes the associator already collected. The trend now is to fold picking, association, and even polarity into a single multitask model and hand the clean measurements to these physics-based locators [13][14].
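
The core idea behind absolute location, finding the source position and origin time that minimize travel-time residuals, fits in a few lines. This is a sketch under strong assumptions (straight rays, a uniform 6 km/s velocity, noise-free P arrivals, hypothetical station coordinates). Production locators such as HYPOINVERSE or NonLinLoc use layered or 3-D velocity models and far more careful inversion.

```python
import numpy as np

def grid_search_locate(stations, arrivals, grid, velocity=6.0):
    """Return (location, origin_time, rms) of the best-fitting grid node."""
    # Straight-ray travel times from every node to every station: (n_nodes, n_stations).
    tt = np.linalg.norm(grid[:, None, :] - stations[None, :, :], axis=2) / velocity
    # For a fixed node, the least-squares origin time is the mean of (arrival - travel time).
    origin = np.mean(arrivals[None, :] - tt, axis=1)
    residuals = arrivals[None, :] - tt - origin[:, None]
    rms = np.sqrt(np.mean(residuals ** 2, axis=1))
    k = int(np.argmin(rms))
    return grid[k], float(origin[k]), float(rms[k])

# Hypothetical surface stations (x, y, z in km) and a synthetic event.
stations = np.array([[0, 0, 0], [40, 3, 0], [2, 38, 0],
                     [37, 40, 0], [20, 5, 0], [15, 30, 0]], dtype=float)
true_source, true_origin = np.array([12.0, 18.0, 8.0]), 5.0
arrivals = true_origin + np.linalg.norm(stations - true_source, axis=1) / 6.0

# Candidate hypocenters on a 2 km grid down to 20 km depth.
axes = [np.arange(0.0, 41.0, 2.0), np.arange(0.0, 41.0, 2.0), np.arange(0.0, 21.0, 2.0)]
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

location, origin_time, rms = grid_search_locate(stations, arrivals, grid)
print(location, round(origin_time, 3), rms)
```

Double-difference relocation then refines the relative positions of nearby events using differences in travel time between event pairs, which this sketch does not attempt.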

Deep dive: the full modern pipeline, stage by stage

A representative end-to-end workflow (this is essentially what LOC-FLOW assembles) looks like:

  1. Pick — PhaseNet (or STA/LTA) on continuous waveforms.
  2. Associate — REAL (or GaMMA / PyOcto) groups picks into events.
  3. Locate — VELEST + HYPOINVERSE for absolute locations.
  4. Relocate — hypoDD + GrowClust for double-difference precision.

Applied to 16 days around the 2004 Parkfield sequence, LOC-FLOW recovered 3.7× more earthquakes than the reference catalog — “hands-free,” directly from continuous data [12]. The modularity is the point: each box can be swapped for a better model without rebuilding the rest.
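
The "swap a box" idea can be sketched as a dictionary of stage functions. The stages below are deliberately trivial stand-ins with made-up names and data. They are not PhaseNet, REAL, or any LOC-FLOW component, and only illustrate how a shared interface keeps the stages independent.

```python
def pick_first_exceedance(waveforms, threshold=4.0):
    """Toy picker A: first sample above a fixed amplitude threshold."""
    picks = []
    for station, samples in waveforms.items():
        hit = next((i for i, a in enumerate(samples) if a > threshold), None)
        if hit is not None:
            picks.append((station, float(hit)))
    return picks

def pick_peak_amplitude(waveforms):
    """Toy picker B: time of the largest amplitude."""
    return [(station, float(max(range(len(s)), key=s.__getitem__)))
            for station, s in waveforms.items() if s]

def associate_by_gap(picks, max_gap=5.0):
    """Toy associator: start a new event whenever consecutive picks are far apart."""
    events = []
    for pick in sorted(picks, key=lambda p: p[1]):
        if events and pick[1] - events[-1][-1][1] <= max_gap:
            events[-1].append(pick)
        else:
            events.append([pick])
    return events

def build_catalog(events):
    """Toy 'locator': record the first arrival time and the pick count only."""
    return [{"time": ev[0][1], "n_picks": len(ev)} for ev in events]

def run_pipeline(data, stages):
    for stage in stages.values():  # dicts preserve insertion order
        data = stage(data)
    return data

stages = {"pick": pick_first_exceedance, "associate": associate_by_gap,
          "catalog": build_catalog}
swapped = {**stages, "pick": pick_peak_amplitude}  # replace one box, keep the rest

waveforms = {  # one sample per second, made-up amplitudes
    "STA1": [0, 0, 0, 5, 9, 3, 0, 0],
    "STA2": [0, 0, 0, 0, 6, 8, 2, 0],
    "STA3": [0, 0, 0, 0, 0, 7, 9, 1],
}
catalog_a = run_pipeline(waveforms, stages)
catalog_b = run_pipeline(waveforms, swapped)
print(catalog_a)
print(catalog_b)
```

Because each stage only agrees on what it consumes and produces (waveforms to picks, picks to events, events to catalog rows), changing the picker shifts the event time but leaves the associator and catalog code untouched.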

Putting it together: several real systems

“Seismic monitoring system” can mean anything from a research script to a 24/7 operational service. Here’s how a few notable ones compare — QuakeFlow is one design point, not the only one:

System | Core approach | Scale / deployment
--- | --- | ---
QuakeFlow [1] | PhaseNet + GaMMA, each step containerized | Kubernetes auto-scaling on the cloud; Kafka/Spark for real-time streaming
LOC-FLOW [12] | PhaseNet + REAL + HYPOINVERSE/VELEST + hypoDD/GrowClust | Modular, end-to-end, local–regional scale
MALMI [15] | EQTransformer + waveform migration (no association step) | Great for low-SNR microseismicity (geothermal, induced)
BPMF [16] | Backprojection + matched filtering, with ML detectors | GPU/C-accelerated; very low completeness magnitude
RT-MEMS (Taiwan) [17] | SeisBlue picker + PhasePAPY, fed by SeedLink | Real-time, operational microearthquake monitoring
SCSN post-processing [14] | PhaseNet + GaMMA bolted onto a real-time network | Regional operational network (Southern California)

A useful contrast: traditional operational software like SeisComP still anchors many observatories, and ML workflows are often measured against it — MALMI, for instance, found 36% more events than a SeisComP reference catalog in an Icelandic geothermal field [15]. And not everything ML-adjacent is deep learning: matched filtering (template matching) remains a complementary powerhouse, often catching the very smallest events that ML pickers miss [16].

The cloud angle: why containers and Kubernetes

Two things make the cloud the natural home for these pipelines. First, the workload is embarrassingly parallel — each station-day of data can be processed independently. Second, demand is bursty — you might reprocess a decade of archives once, then idle. QuakeFlow leans into both by containerizing every stage and running on Kubernetes, so the cluster auto-scales up for a big archive job and back down afterward [1].
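
"Embarrassingly parallel" means the station-day tasks share no state, so they can be farmed out to any number of workers. The sketch below uses a local thread pool as a stand-in for a cluster of containers. The station codes, the worker count, and the placeholder `process_station_day` function are all hypothetical, and this is not how QuakeFlow schedules its jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def process_station_day(task):
    """Placeholder for 'fetch one station-day, run the picker, store the picks'."""
    station, day = task
    n_picks = (sum(map(ord, station)) + day) % 10  # stand-in for real picker output
    return station, day, n_picks

stations = ["STA1", "STA2", "STA3"]   # hypothetical station codes
days = range(1, 8)                    # one week of day-of-year indices
tasks = list(product(stations, days))

# No task depends on another, so the pool can run them in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_station_day, tasks))

total_picks = sum(n for _, _, n in results)
print(f"{len(results)} station-days processed, {total_picks} picks")
```

On a cluster, each task would typically become its own containerized job, and the scheduler (Kubernetes in QuakeFlow's case) would decide how many run at once.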

How far does this scale? In 2025, a cloud-native workflow on AWS launched ~145,000 containerized jobs to extract 4.3 billion P/S picks from 1.3 petabytes of continuous data across 47,354 stations — finishing in under three days [18]. That is a different universe from a single workstation. For individual researchers, step-by-step guides now exist for running detection-and-association pipelines on commercial clouds, with the honest caveat that the learning curve is steep but the cost is modest [19].
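
A quick back-of-the-envelope calculation puts those figures in perspective. It assumes decimal units (1 PB = 10^15 bytes), treats three days as an upper bound on wall time, and takes the approximate job count at face value, so the results are rough averages rather than numbers reported in [18].

```python
# Figures quoted above from [18]; unit conventions are assumptions.
data_bytes = 1.3e15          # 1.3 PB, decimal petabytes assumed
n_jobs = 145_000             # approximate number of containerized jobs
n_picks = 4.3e9              # P and S picks combined
n_stations = 47_354
wall_seconds = 3 * 24 * 3600  # "under three days", so an upper bound

gb_per_job = data_bytes / n_jobs / 1e9                 # average input per job, about 9 GB
min_throughput_gbs = data_bytes / wall_seconds / 1e9   # sustained rate, at least about 5 GB/s
picks_per_station = n_picks / n_stations               # average, about 91,000

print(f"{gb_per_job:.1f} GB per job on average")
print(f"at least {min_throughput_gbs:.1f} GB/s sustained")
print(f"about {picks_per_station:,.0f} picks per station")
```

A sustained rate of several gigabytes per second is what pushes this kind of job toward object storage and many parallel workers.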

Tip: Many of these models are distributed through SeisBench, a common framework that standardizes datasets and pretrained pickers — so you can swap EQTransformer for PhaseNet with a one-line change and benchmark them on your own data [7].

The catch: generalization and quality control

The headline results are real, but so are the failure modes — and this is where practitioners get burned.

Pretrained models don’t transfer for free. A picker trained in one region can lose a lot of recall in another: one study saw recall drop 13–56% when models trained elsewhere were applied to the Yangbi and Maduo sequences in China [20]. Independent benchmarks show regional models especially fail to transfer to teleseismic distances [7]. The usual fix is fine-tuning on local labeled data (as done for the USTC-Pickers set for China) — and monitoring agencies increasingly conclude that reliable, consistent performance needs location-specific training data plus human QC to weed out false positives [21].
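
Recall and precision for a picker are usually measured by matching predicted picks to analyst picks within a time tolerance. Here is a minimal sketch of that bookkeeping. The 0.5 s tolerance and the pick times are made up for illustration and are not taken from [20] or [7].

```python
def pick_precision_recall(predicted, reference, tolerance=0.5):
    """One-to-one matching of sorted pick times within +/- tolerance seconds."""
    predicted, reference = sorted(predicted), sorted(reference)
    i = j = matched = 0
    while i < len(predicted) and j < len(reference):
        diff = predicted[i] - reference[j]
        if abs(diff) <= tolerance:
            matched += 1      # true positive: consume both picks
            i += 1
            j += 1
        elif diff < 0:
            i += 1            # predicted pick with no reference nearby (false positive)
        else:
            j += 1            # reference pick the model missed (false negative)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall

reference = [10.0, 25.0, 40.0, 55.0, 70.0]          # analyst picks (s), made up
in_region = [10.1, 24.8, 40.2, 55.3, 69.9]          # picker applied where it was trained
out_of_region = [10.2, 40.9, 70.1, 80.0]            # same picker in a new region

print(pick_precision_recall(in_region, reference))
print(pick_precision_recall(out_of_region, reference))
```

Running this kind of comparison on even a small set of locally labeled picks is a simple way to find out whether a pretrained model transfers before trusting its catalog.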

In other words: an ML catalog is a starting point, not gospel. Real deployments still lean on quality control (Wadati diagrams, probability thresholds, template matching for the smallest events) before an event reaches an official catalog.
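
A Wadati diagram plots the S-minus-P time against the P arrival time for each station. For a constant Vp/Vs ratio the points fall on a line, ts - tp = (Vp/Vs - 1)(tp - t0), so a station far off the line probably has a bad pick. The sketch below fits that line robustly with a Theil-Sen (median of pairwise slopes) estimate. The 0.3 s tolerance and the synthetic picks are assumptions for illustration.

```python
from itertools import combinations
import numpy as np

def wadati_check(tp, ts, tol=0.3):
    """Return (vp_vs, origin_time, flagged station indices) from P and S times."""
    tp, ts = np.asarray(tp, dtype=float), np.asarray(ts, dtype=float)
    sp = ts - tp
    # Theil-Sen fit: the median of pairwise slopes resists a few bad picks.
    slopes = [(sp[j] - sp[i]) / (tp[j] - tp[i])
              for i, j in combinations(range(len(tp)), 2) if tp[j] != tp[i]]
    slope = float(np.median(slopes))
    intercept = float(np.median(sp - slope * tp))
    residuals = sp - (slope * tp + intercept)
    flagged = [int(i) for i in np.flatnonzero(np.abs(residuals) > tol)]
    # slope = Vp/Vs - 1, and the line crosses zero at the origin time.
    return slope + 1.0, -intercept / slope, flagged

# Synthetic event: Vp = 6 km/s, Vp/Vs = 1.73, origin time 10 s.
dist_km = np.array([12.0, 20.0, 28.0, 35.0, 44.0, 50.0])
tp = 10.0 + dist_km / 6.0
ts = 10.0 + dist_km * 1.73 / 6.0
ts[3] += 1.5  # simulate one mispicked S arrival

vp_vs, origin_time, flagged = wadati_check(tp, ts)
print(round(vp_vs, 3), round(origin_time, 3), flagged)
```

A check like this needs no velocity model or station coordinates, which is why it is a popular first-pass sanity test on associated events.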

Check your understanding

You run a US-trained picker in a new region and recall drops. What's the usual fix?

Recap

Without scrolling up, can you name the pipeline? A modern seismic monitoring system:

  • picks P/S arrivals with deep learning (PhaseNet, EQTransformer),
  • associates picks into events (GaMMA, PhaseLink, PyOcto, GENIE),
  • locates and relocates those events with physics-based tools (NonLinLoc, hypoDD, GrowClust),
  • and wires all of this into one system, from research workflows (LOC-FLOW, MALMI, BPMF) to cloud-native, auto-scaling services (QuakeFlow) and real-time operational monitors.

The reward is catalogs an order of magnitude more complete; the price is careful attention to generalization and quality control. That trade-off — more events, more vigilance — is the defining tension of modern earthquake monitoring.

References

  1. QuakeFlow: A Scalable Machine-learning-based Earthquake Monitoring Workflow with Cloud Computing — Zhu et al., 2022, Geophysical Journal International.
  2. Machine Learning in Earthquake Seismology — Mousavi & Beroza, 2022, Annual Review of Earth and Planetary Sciences.
  3. Recent advances in earthquake monitoring II: Emergence of next-generation intelligent systems — Li, 2021, Earthquake Science.
  4. PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method — Zhu & Beroza, 2018, Geophysical Journal International.
  5. Earthquake Transformer — an attentive deep-learning model for simultaneous earthquake detection and phase picking — Mousavi et al., 2020, Nature Communications.
  6. EQCCT: A Production-Ready Earthquake Detection and Phase-Picking Method Using the Compact Convolutional Transformer — Saad et al., 2023, IEEE TGRS.
  7. Which Picker Fits My Data? A Quantitative Evaluation of Deep Learning Based Seismic Pickers — Münchmeyer et al., 2021, JGR: Solid Earth.
  8. Earthquake Phase Association Using a Bayesian Gaussian Mixture Model (GaMMA) — Zhu et al., 2021, JGR: Solid Earth.
  9. PhaseLink: A Deep Learning Approach to Seismic Phase Association — Ross et al., 2018, JGR: Solid Earth.
  10. Earthquake Phase Association with Graph Neural Networks (GENIE) — McBrearty & Beroza, 2022.
  11. PyOcto: A high-throughput seismic phase associator — Münchmeyer, 2023, Seismica.
  12. LOC-FLOW: An End-to-End Machine Learning-Based High-Precision Earthquake Location Workflow — Zhang et al., 2022, Seismological Research Letters.
  13. Towards End-to-End Earthquake Monitoring Using a Multitask Deep Learning Model (PhaseNet+) — Zhu et al., 2025.
  14. Improvements from incorporating machine learning algorithms into near real-time operational post-processing — Tepp et al., 2025, Scientific Reports.
  15. MALMI: An Automated Earthquake Detection and Location Workflow Based on Machine Learning and Waveform Migration — Shi et al., 2022, Seismological Research Letters.
  16. BPMF: A Backprojection and Matched-Filtering Workflow for Automated Earthquake Detection and Location — Beaucé et al., 2023, Seismological Research Letters.
  17. A Deep-Learning-Based Real-Time Microearthquake Monitoring System (RT-MEMS) for Taiwan — Sun et al., 2025, Sensors.
  18. A Global-scale Database of Seismic Phases from Cloud-based Picking at Petabyte Scale — Ni et al., 2025, Seismica.
  19. Seismology in the cloud: guidance for the individual researcher — Krauss et al., 2023, Seismica.
  20. Comparison of the Earthquake Detection Effects of PhaseNet and EQTransformer (Yangbi and Maduo earthquakes) — Jiang et al., 2021, Earthquake Science.
  21. Challenges and Opportunities of Machine Learning Earthquake Detection for Regional Monitoring — Noel et al., 2025, BSSA.
