Probabilistic Earthquake Location with NonLinLoc

Utpal Kumar 10 minute read GEOPHYSICS July 03, 2026

Ask a seismologist “where was the earthquake?” and the honest answer is rarely a single point. Picks have timing errors, the velocity model is imperfect, and the station geometry may leave whole directions poorly constrained. The classic locators paper over that by handing you one hypocenter and an error ellipse. NonLinLoc takes a different stance: it computes the entire probability density of where the source could be, then lets you summarize it. In this post I’ll walk through the method — the Bayesian idea, how it fits the arrival times, how it searches, and how to read the result.

The one idea

NonLinLoc doesn’t return a point — it returns a probability density function (PDF) over a 3-D volume. The hypocenter is wherever that density piles up. Everything else in this post is machinery for computing and summarizing that cloud.

The classic way — and where it hurts

The traditional approach is Geiger’s method: guess a starting hypocenter, linearize the travel-time equations around it, solve the least-squares system for the correction, and iterate until the residuals stop shrinking. It’s fast and it powers workhorses like Hypo71, HYPOINVERSE, and LOCSAT. (If you want the mechanics with code, see my earlier post on Geiger’s method.)

The trouble is what linearization assumes:

The misfit surface is locally smooth and Gaussian near the current guess — often false, especially with sparse or one-sided networks.
Following the gradient from a starting point can settle in a local minimum; the answer depends on where you start.
Because the misfit sums squared residuals, one bad pick can drag the whole solution.
You get one point ± an ellipse, which hides genuinely irregular, multi-peaked, or “banana-shaped” uncertainty.

Check your understanding

Why can a Geiger-style linearized locator converge to the wrong hypocenter?

Location as a probability density

NonLinLoc reframes the question. Instead of iterating toward one point, it maps the full solution over a search volume using Bayes’ rule. We want the hypocenter $\mathbf{x}=(x,y,z)$ given the observed arrival times $\mathbf{t}^{\mathrm{obs}}$:

\[\underbrace{p(\mathbf{x}\mid \mathbf{t}^{\mathrm{obs}})}_{\text{posterior PDF}} \;\propto\; \underbrace{p(\mathbf{t}^{\mathrm{obs}}\mid \mathbf{x})}_{\text{likelihood (data fit)}} \;\cdot\; \underbrace{p(\mathbf{x})}_{\text{prior}}\]

The answer is that posterior density over the volume. There’s no linearization, no starting model, and no iteration to convergence — it’s a global, non-linear search grounded in the Tarantola–Valette inverse-theory framework [1][5]. And because it’s a density rather than a point, it represents irregular, multi-peaked uncertainty naturally.

The forward problem: theoretical travel times

To evaluate the likelihood at any trial source $\mathbf{x}$, we need the predicted travel time $T_i(\mathbf{x})$ from that point to each station $i$, for each phase (P, S, …). Recomputing ray paths on the fly for millions of trial points would be hopeless, so NonLinLoc precomputes a 3-D travel-time grid — one per station, per phase — in two steps:

Vel2Grid      velocity model  ->  slowness grid
Grid2Time     slowness grid   ->  travel-time grid   (Eikonal finite-difference solve)
NLLoc         travel-time grids + picks  ->  posterior PDF

Grid2Time solves the eikonal equation with the finite-difference scheme of Podvin & Lecomte [3], which handles strong velocity contrasts without two-point ray shooting. The grids are built once and reused for every event, so the actual location step is fast.

NonLinLoc’s grid-building pipeline. Each program runs once to produce a travel-time grid per station and phase; the location step then reuses those grids for every event.

Why precompute grids instead of ray-tracing per event?

A global search evaluates the posterior at a huge number of trial locations. If each evaluation required tracing rays from that trial point to every station, the search would be prohibitively slow. Precomputing turns the expensive physics into a one-time cost:

Build once: run Vel2Grid then Grid2Time per station/phase. This is where the wave physics lives.
Reuse forever: every earthquake in that network just interpolates the stored grids — a table lookup, not a ray trace.

The trade-off is disk space and the up-front build, in exchange for fast, repeatable location. It also means the location is only as good as the velocity model baked into those grids.

Fitting the data: two likelihood functions

The likelihood measures how well a trial source explains the picks. NonLinLoc offers two.

The classic $L_2$ / Gaussian fit. With residual $r_i = t_i^{\mathrm{obs}} - t_0 - T_i(\mathbf{x})$ and pick uncertainty $\sigma_i$:

\[p_{L_2}(\mathbf{t}^{\mathrm{obs}}\mid\mathbf{x}) \;\propto\; \exp\!\Big[-\tfrac{1}{2}\textstyle\sum_i r_i^2 / \sigma_i^2\Big]\]

The origin time $t_0$ is recovered analytically for each trial $\mathbf{x}$, so the search is over space only. But squaring the residuals makes this a product of Gaussians — one outlier pick with a large $r_i$ can dominate the sum and pull the peak away from the truth.

A single mis-picked phase can dominate the $L_2$ misfit and bias the location. That’s exactly what the EDT likelihood is designed to survive.

The EDT (Equal Differential Time) likelihood. EDT compares pairs of observations rather than absolute times. For a true source, the difference of two observed times should equal the difference of the two predicted times:

\[\big(t_a^{\mathrm{obs}}-t_b^{\mathrm{obs}}\big)\;\approx\;\big(T_a(\mathbf{x})-T_b(\mathbf{x})\big)\] \[p_{\mathrm{EDT}}(\mathbf{x})\;\propto\; \sum_{a,b}\frac{1}{\sigma_{ab}}\, \exp\!\Big(-\tfrac{(\,\Delta t^{\mathrm{obs}}_{ab}-\Delta T_{ab}(\mathbf{x})\,)^2}{2\,\sigma_{ab}^2}\Big), \qquad \Delta t_{ab}=t_a-t_b\]

Two properties make this powerful [4]:

Origin time cancels in the difference, so there’s one fewer unknown and no trade-off between depth and $t_0$.
It’s a sum of Gaussians over pairs, not a product. A bad pick only spoils the pairs it belongs to; all the consistent picks still “vote” together, so the PDF stays peaked at the true location. That’s what makes EDT robust to outliers — a real asset for automatic picks.

Check your understanding

Why is the EDT likelihood more robust to a single bad pick than the $L_2$ fit?

Searching the volume

With a likelihood in hand, how do we actually find where the posterior piles up? NonLinLoc gives three engines:

Systematic grid search — evaluate the posterior at every node of a regular 3-D grid. It’s complete (it can’t miss a peak the grid resolves) but exhaustive: cost grows like $N^3$ with resolution. A good baseline or coarse scan.
Oct-tree importance sampling — start from a coarse grid of large cells, evaluate the posterior at each cell center, subdivide the most probable cell into eight, and repeat. Sampling automatically densifies around the peaks [2]. It’s orders of magnitude cheaper than a full grid yet still global, which is why it’s the usual default.
Metropolis–Gibbs — a directed random walk (MCMC) that accepts/rejects proposed steps so the density of samples reproduces the posterior. It yields a scatter of samples directly, but is generally slower and less complete than oct-tree for routine work [1].

Three ways to explore the posterior volume. Grid search evaluates every node (complete but slow); oct-tree refines cells toward the peaks (fast yet global — the usual default); Metropolis–Gibbs walks the volume, leaving samples in proportion to probability.

What this looks like in an NLLoc control file

You choose the engine and the likelihood with a couple of statements in the NLLoc control file:

LOCSEARCH  OCT  ...        # oct-tree importance sampling (init cells + stopping rules)
LOCMETH    EDT_OT_WT  ...  # EDT likelihood, origin-time-variance weighted
LOCGRID    ...             # the search grid geometry

Swap LOCMETH EDT_OT_WT for LOCMETH GAU_ANALYTIC to use the classic $L_2$/Gaussian fit instead, or LOCSEARCH GRID/LOCSEARCH MET to change engines. The concepts in this post map one-to-one onto those keywords.

Check your understanding

Which search engine is both fast and global — NonLinLoc's usual default?

The solution is a PDF, not a point

The search returns a scatter cloud of samples that traces the posterior. NonLinLoc then summarizes it:

Maximum-likelihood (ML) hypocenter — the single best-fit point, the peak of the PDF.
Expectation hypocenter — the probability-weighted mean of the cloud.
68% confidence ellipsoid — from a singular-value decomposition of the sample covariance matrix.
Quality metrics — RMS residual, azimuthal gap, number of stations, depth control.

The posterior as a scatter cloud with its 68% confidence ellipse. The maximum-likelihood hypocenter (the peak) and the expectation (the probability-weighted mean) coincide for a symmetric cloud but pull apart when it is skewed — the shape carries information a single point cannot.

Here’s the payoff of the whole approach: when the PDF is a clean blob, the ML point, the expectation, and the ellipsoid agree, and you’ve basically reproduced the classical answer. But when the network geometry makes the true uncertainty irregular, the ML and expectation hypocenters differ, and the cloud’s shape tells you how the location is poorly constrained — something a single ellipse can never show. The results are written to a .loc.hyp hypocenter–phase file plus a binary scatter file you can inspect in NonLinLoc’s Seismicity Viewer.

Read the cloud, not just the dot

If the ML and expectation hypocenters sit on top of each other and the ellipsoid is tight, trust the point. If they separate or the cloud is banana-shaped, the geometry is telling you the depth or epicenter is weakly constrained — investigate before you believe the number.

The whole pipeline at a glance

Every piece we have covered fits into one flow. Read it left to right: inputs on the left, the posterior cloud on the right.

The whole method end to end. A velocity model and phase picks feed precomputed travel-time grids and the EDT likelihood; an oct-tree search maps the posterior; the output is a scatter cloud summarized by a maximum-likelihood hypocenter and a confidence ellipsoid. The grids are built once and reused for every event.

When to reach for NonLinLoc — and the costs

Strengths. Honest, non-Gaussian uncertainty (clouds and ellipsoids, not just a point); outlier-robust via EDT, which suits automatic picks; works with 3-D / heterogeneous velocity models; no starting model and no local-minimum trap.

Costs and caveats. You must precompute travel-time grids (Vel2Grid → Grid2Time) for every station and phase, which needs disk and an up-front build. The location quality is bounded by the velocity model you put in. And a global search over a volume is more computation per event than a quick linearized step — although oct-tree keeps that very manageable.

The rule of thumb: reach for NonLinLoc when you need trustworthy uncertainties, robustness to bad picks, or a 3-D model — and you can afford to build the grids up front.

Recap

Without scrolling up, can you reconstruct the method?

NonLinLoc treats location as a posterior PDF over a volume — global, non-linear, no starting model — instead of iterating to one point.
Travel times are precomputed on grids (Grid2Time, eikonal / Podvin–Lecomte) and reused every event.
The EDT likelihood compares pairs of picks, so origin time cancels and one bad pick can’t dominate — unlike the $L_2$ product of Gaussians.
Oct-tree importance sampling makes the global search fast while staying global (the usual default).
The output is a scatter cloud → ML hypocenter, expectation, and a 68% confidence ellipsoid; when ML and expectation disagree, read the cloud’s shape.

Where to go next

NonLinLoc software and documentation: the current docs live at nlloc.clam.moe (and the classic site at alomax.free.fr/nlloc), including the control-file reference and the Seismicity Viewer.
Related posts here: Locating Earthquakes using Geiger’s Method for the linearized baseline this post contrasts with, Monte Carlo methods and the earthquake location problem for another sampling-based view, and Modern Seismic Monitoring Systems for where NonLinLoc fits in an ML-driven catalog pipeline.

References

Probabilistic Earthquake Location in 3D and Layered Models — Lomax, Virieux, Volant & Berge-Thierry, 2000, Advances in Seismic Event Location (Springer).
Earthquake Location, Direct, Global-Search Methods — Lomax, Michelini & Curtis, 2009, Encyclopedia of Complexity and Systems Science (Springer).
Finite difference computation of traveltimes in very contrasted velocity models — Podvin & Lecomte, 1991, Geophysical Journal International.
Hypocentre determination offshore of eastern Taiwan using the Maximum Intersection method — Font, Kao, Lallemand, Liu & Chiao, 2004, Geophysical Journal International.
Inverse problems = quest for information — Tarantola & Valette, 1982, Journal of Geophysics 50, 159–170.

Disclaimer of liability

The information provided by the Earth Inversion is made available for educational purposes only.

Whilst we endeavor to keep the information up-to-date and correct. Earth Inversion makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services or related graphics content on the website for any purpose.

UNDER NO CIRCUMSTANCE SHALL WE HAVE ANY LIABILITY TO YOU FOR ANY LOSS OR DAMAGE OF ANY KIND INCURRED AS A RESULT OF THE USE OF THE SITE OR RELIANCE ON ANY INFORMATION PROVIDED ON THE SITE. ANY RELIANCE YOU PLACED ON SUCH MATERIAL IS THEREFORE STRICTLY AT YOUR OWN RISK.

Subscribe to our weekly newsletter

The classic way — and where it hurts

Location as a probability density

The forward problem: theoretical travel times

Fitting the data: two likelihood functions

Searching the volume

The solution is a PDF, not a point

The whole pipeline at a glance

When to reach for NonLinLoc — and the costs

Recap

Where to go next

References

Disclaimer of liability

Leave a comment