Solving a Toy Earthquake Location Problem Using a Genetic Algorithm

Utpal Kumar 13 minute read DATASCIENCE September 07, 2024

Discover how Genetic Algorithms can be applied to solve the earthquake location problem in seismology. This post walks through generating synthetic seismic data, implementing a GA to estimate earthquake locations, and visualizing results.

Introduction

Determining the exact location of an earthquake based on seismic data is a complex problem that plays a crucial role in seismology. This challenge, known as the earthquake location problem, involves estimating the earthquake’s origin coordinates and depth using observed arrival times of seismic waves at various stations. Traditional methods often require simplifications or assumptions that can limit accuracy.

However, more sophisticated approaches like the Genetic Algorithm (GA) offer a robust alternative by iteratively improving estimates based on natural selection principles.

In this article, I walk through the earthquake location problem and demonstrate how to solve it using a Genetic Algorithm. If you are unfamiliar with the basics of Genetic Algorithms, check this post first: Genetic Algorithm: a highly robust inversion scheme for geophysical applications.

Complete Script

import os
import numpy as np
import pandas as pd
import numpy.linalg as linalg
from deap import base, creator, tools, algorithms
from mayavi import mlab
import noise
import matplotlib.pyplot as plt

np.random.seed(0)

# Define the boundaries of the grid to be used in simulations
minx, maxx = -5, 5
miny, maxy = -5, 5

# Set constraints on depth, reflecting realistic limits
depth_min = -3
depth_max = 0

# Create a directory to save intermediate steps of the genetic algorithm
output_dir = "ga_steps"
os.makedirs(output_dir, exist_ok=True)


# Part 1: Generate synthetic data representing an earthquake and seismic stations
def generate_data():
    error = 0.05  # Introduce a 5% error in the initial guess
    numstations = 30  # Define the number of seismic stations

    # Randomly place seismic stations within the defined grid boundaries
    stn_locs = []
    xvals = minx + (maxx - minx) * np.random.rand(numstations)
    yvals = miny + (maxy - miny) * np.random.rand(numstations)
    for num in range(numstations):
        stn_locs.append([xvals[num], yvals[num], 0])

    # Define the actual earthquake location
    eq_loc = [2, 2, -2]  # Coordinates in km
    vel = 6  # Velocity in km/s
    origintime = 0  # Origin time of the earthquake

    # Function to calculate arrival times of seismic waves at the stations
    def calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime):
        eqx, eqy, eqz = eq_loc
        dist = np.sqrt((eqx - stnx) ** 2 + (eqy - stny) ** 2 + (eqz - stnz) ** 2)
        arr = dist / vel + origintime
        return arr

    # Generate an initial guess of the earthquake location with some error
    eq_initial_guess = [
        eq_loc[0] + error * eq_loc[0],
        eq_loc[1] + error * eq_loc[1],
        eq_loc[2] + error * eq_loc[2],
    ]
    m_intial = [
        eq_initial_guess[0],
        eq_initial_guess[1],
        eq_initial_guess[2],
        vel + error * vel,
        origintime + error * origintime,
    ]

    d_obs = []  # Observed arrival times
    d_pre = []  # Predicted arrival times based on the initial guess
    noise_level_data = 0.05  # Introduce a 5% noise level

    # Calculate observed and predicted arrival times at each station
    for stnx, stny, stnz in stn_locs:
        arr = calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime)
        noise_val = np.random.normal(loc=0, scale=noise_level_data * arr)
        d_obs.append(arr + noise_val)
        d_pre.append(calc_arrival_time(eq_initial_guess, stnx, stny, stnz, m_intial[3], m_intial[4]))

    # Save the observed and predicted data to CSV files
    arr_df = pd.DataFrame({"dobs": d_obs, "d_pre": d_pre})
    arr_df.to_csv("arrival_times.csv", index=False)

    stn_locs = np.array(stn_locs)
    station_loc_df = pd.DataFrame({"xloc": stn_locs[:, 0], "yloc": stn_locs[:, 1], "zloc": stn_locs[:, 2]})
    station_loc_df.to_csv("station_locations.csv", index=False)

    return eq_loc, stn_locs, noise_level_data


# Part 2: Implement the Genetic Algorithm (GA) for earthquake location estimation
def read_arrivaltimes(filename):
    data = np.loadtxt(filename, delimiter=",", skiprows=1)
    dobs = data[:, 0]
    return dobs


def read_stnloc(filename):
    station_locations = np.loadtxt(filename, delimiter=",", skiprows=1)
    return station_locations


def fitness_function(params):
    # Apply depth constraints to the parameters
    if params[2] < depth_min or params[2] > depth_max:
        return float("inf"),  # Penalize solutions that violate the depth constraint

    station_locations = read_stnloc("station_locations.csv")
    observed_arrival_times = read_arrivaltimes("arrival_times.csv")

    # Extract parameters from the individual
    x, y, z, velocity, origin_time = params
    predicted_arrival_times = []

    # Calculate predicted arrival times based on the current guess
    for station in station_locations:
        dist = np.sqrt((x - station[0]) ** 2 + (y - station[1]) ** 2 + (z - station[2]) ** 2)
        arrival_time = dist / velocity + origin_time
        predicted_arrival_times.append(arrival_time)

    # Calculate the error between observed and predicted arrival times
    predicted_arrival_times = np.array(predicted_arrival_times)
    error = np.sum((observed_arrival_times - predicted_arrival_times) ** 2)
    return error,


# Run the genetic algorithm to optimize earthquake location
def run_ga(visualize_steps=False):
    # Define a minimization problem with fitness based on minimizing the error
    if "FitnessMin" not in creator.__dict__:
        creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    if "Individual" not in creator.__dict__:
        creator.create("Individual", list, fitness=creator.FitnessMin)

    # Set up the genetic algorithm toolbox
    toolbox = base.Toolbox()
    toolbox.register("attr_float", np.random.uniform, -3, 3)
    toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, n=5)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("mate", tools.cxBlend, alpha=0.5)
    toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2)
    toolbox.register("select", tools.selTournament, tournsize=3)
    toolbox.register("evaluate", fitness_function)

    # Initialize the population and parameters for the genetic algorithm
    population = toolbox.population(n=300)
    n_generations = 200
    cxpb = 0.7  # Crossover probability
    mutpb = 0.3  # Mutation probability

    # Track the best individual and statistics during the run
    hall_of_fame = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("min", np.min)
    stats.register("mean", np.mean)
    logbook = tools.Logbook()
    logbook.header = ["gen", "min", "mean"]

    # Main loop to run the genetic algorithm
    for gen in range(n_generations):
        population, _ = algorithms.eaSimple(population, toolbox, cxpb=cxpb, mutpb=mutpb, ngen=1, verbose=False)

        # Apply depth constraints after mutation
        for ind in population:
            if ind[2] < depth_min:
                ind[2] = depth_min
            elif ind[2] > depth_max:
                ind[2] = depth_max

        # Record statistics and update the hall of fame
        record = stats.compile(population)
        logbook.record(gen=gen, **record)
        hall_of_fame.update(population)

        # Optionally visualize intermediate steps
        if visualize_steps and gen % 10 == 0:
            save_intermediate_plot(population, hall_of_fame, gen)

    # Return the best solution found by the algorithm
    best_individual = hall_of_fame[0]
    best_fitness = fitness_function(best_individual)[0]

    return best_individual, best_fitness, logbook


# Placeholder for intermediate plot saving, if needed for debugging or analysis
def save_intermediate_plot(population, hall_of_fame, gen):
    pass


# Part 3: Estimate an ellipsoid around the best estimated earthquake location
def estimate_ellipsoid(center, location_error, noise_level):
    # Combine location error and noise level to form the ellipsoid's radii
    coeffs = location_error + noise_level
    A = np.array([[coeffs[0], 0, 0], [0, coeffs[1], 0], [0, 0, coeffs[2]]])
    _, s, rotation = linalg.svd(A)

    # Handle potential division by zero
    s = np.where(s == 0, 1e-10, s)
    radii = 1.0 / np.sqrt(s)

    u = np.linspace(0.0, 2.0 * np.pi, 100)
    v = np.linspace(0.0, np.pi, 100)

    xx = radii[0] * np.outer(np.cos(u), np.sin(v))
    yy = radii[1] * np.outer(np.sin(u), np.sin(v))
    zz = radii[2] * np.outer(np.ones_like(u), np.cos(v))

    # Rotate and translate the ellipsoid to the estimated location
    for i in range(len(u)):
        for j in range(len(v)):
            [xx[i, j], yy[i, j], zz[i, j]] = np.dot([xx[i, j], yy[i, j], zz[i, j]], rotation) + center

    return xx, yy, zz


# Part 4: Visualization of results with Mayavi, including rough topography
def generate_rough_topography(X, Y, scale=1.0, octaves=6, persistence=0.5, lacunarity=2.0):
    # Generate a rough topography using Perlin noise
    Z = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Z[i, j] = noise.pnoise2(X[i, j] * scale, Y[i, j] * scale, octaves=octaves,
                                    persistence=persistence, lacunarity=lacunarity)
    return Z


def plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level):
    location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))

    # Initialize the Mayavi figure
    mlab.figure(size=(1000, 800), bgcolor=(0.2, 0.2, 0.2))

    # Generate and plot the rough topography surface
    X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
    Z = generate_rough_topography(X, Y, scale=0.5)
    mlab.surf(X, Y, Z, colormap="terrain", warp_scale=1.0)

    # Plot station locations
    mlab.points3d(stn_locs[:, 0], stn_locs[:, 1], stn_locs[:, 2],
                  color=(0.1, 0.5, 1), scale_factor=0.3, mode="cube", opacity=1)

    # Actual earthquake location in red
    mlab.points3d(eq_loc[0], eq_loc[1], eq_loc[2], color=(1, 0.3, 0.3), scale_factor=0.6, mode="sphere")

    # Estimated location in green
    mlab.points3d(best_params[0], best_params[1], best_params[2],
                  color=(0.3, 1, 0.3), scale_factor=0.6, mode="sphere")

    # Error ellipsoid around estimated location
    if np.any(location_error + noise_level > 0):
        center = [best_params[0], best_params[1], best_params[2]]
        xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
        ellipsoid = mlab.mesh(xx, yy, zz, color=(1, 1, 0.6), opacity=0.6)
        ellipsoid.actor.property.interpolation = "phong"
        ellipsoid.actor.property.specular = 0.1
        ellipsoid.actor.property.specular_power = 20
        ellipsoid.actor.property.diffuse = 0.5
        ellipsoid.actor.property.ambient = 0.3

    # Camera view and axes
    mlab.view(azimuth=45, elevation=75, distance=30)
    mlab.axes(color=(1, 1, 1), xlabel="X (km)", ylabel="Y (km)", zlabel="Depth (km)")
    mlab.outline(color=(1, 1, 1))

    mlab.savefig("earthquake_loc_ga_mayavi.png", magnification=3)
    mlab.clf()


# Alternative visualization with Matplotlib
def plot_results(eq_loc, stn_locs, best_params, noise_level):
    location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))

    fig = plt.figure()
    ax = plt.axes(projection="3d")

    X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
    Z = generate_rough_topography(X, Y, scale=0.5)
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap="terrain", edgecolor="none", alpha=0.7)

    elevated_z = np.array([x[2] + 0.05 for x in stn_locs])
    ax.scatter([x[0] for x in stn_locs], [x[1] for x in stn_locs], elevated_z,
               c="blue", marker="s", s=120, label="Stations", edgecolors="k")

    ax.scatter(eq_loc[0], eq_loc[1], eq_loc[2], c="red", marker="o", s=100,
               label=f"Actual EQ location ({eq_loc[0]:.2f}, {eq_loc[1]:.2f}, {eq_loc[2]:.2f})")

    ax.scatter(best_params[0], best_params[1], best_params[2], c="green", marker="*", s=150,
               label=f"Estimated EQ location ({best_params[0]:.2f}, {best_params[1]:.2f}, {best_params[2]:.2f})")

    if np.any(location_error + noise_level > 0):
        center = [best_params[0], best_params[1], best_params[2]]
        xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
        ax.plot_surface(xx, yy, zz, rstride=4, cstride=4, color="yellow", alpha=0.3, edgecolor="none")

    ax.set_xlim([minx, maxx])
    ax.set_ylim([miny, maxy])
    ax.set_zlim([-5, 0.1])
    ax.set_xlabel("X Coordinate")
    ax.set_ylabel("Y Coordinate")
    ax.set_zlabel("Depth")
    ax.view_init(elev=30, azim=60)

    ax.legend(loc="lower left")
    plt.tight_layout()
    plt.savefig("earthquake_loc_ga.png", bbox_inches="tight", dpi=300)


# Main execution block
if __name__ == "__main__":
    eq_loc, stn_locs, noise_level = generate_data()
    best_params, best_fitness, _ = run_ga(visualize_steps=True)

    plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level)
    plot_results(eq_loc, stn_locs, best_params, noise_level)

    print(f"Actual EQ Location: {eq_loc}")
    print(f"Best Estimated EQ Location: {best_params}")
    print(f"Location Error: {np.abs(np.array(eq_loc) - np.array(best_params[:3]))}")
    print(f"Best Fitness (Error): {best_fitness}")

Here is a brief explanation of the code used in the process:

Data Generation: The function generate_data() simulates an earthquake event and the corresponding seismic station observations. It introduces errors and noise to mimic real-world data.
Genetic Algorithm: The run_ga() function executes the Genetic Algorithm, using a fitness function (fitness_function()) to evaluate how close each candidate solution is to the actual earthquake location.
Visualization: plot_results_with_mayavi() and plot_results() create visual representations of the results. Mayavi is used for more detailed 3D plots, while Matplotlib provides an alternative.
Error Analysis: The code calculates and visualizes the error between the actual and estimated earthquake locations, providing a clear measure of the GA’s effectiveness.

Generating Synthetic Seismic Data

To tackle the earthquake location problem, we first need to generate synthetic data that mimics real seismic observations. This involves placing seismic stations randomly within a defined grid and simulating the arrival times of seismic waves at these stations based on a known earthquake location.

Here is a breakdown of the process:

Grid Boundaries: The simulation grid is defined with x and y boundaries ranging from -5 to 5 km. The depth is constrained between -3 and 0 km to reflect realistic limits.
Station Placement: 30 seismic stations are randomly distributed within the grid.
True Earthquake Location: The actual earthquake is set at coordinates (2, 2, -2) km.
Arrival Times: The arrival times of seismic waves at each station are calculated using the distance from the earthquake location and the wave velocity. To simulate real-world conditions, we introduce a small amount of noise into these times.

The synthetic data generated serves as the basis for our Genetic Algorithm to estimate the earthquake’s location.

Implementing the Genetic Algorithm

The core of this approach is a Genetic Algorithm that iteratively searches for the best solution to the earthquake location problem. The GA starts with a population of random solutions and evolves them over generations using operations inspired by natural selection: crossover, mutation, and selection.

Here is a step-by-step explanation:

Fitness Function: The fitness function measures how well a candidate solution predicts the observed arrival times. It calculates the error between the observed and predicted times. The goal is to minimize this error.
Population Initialization: The algorithm begins with a population of 300 random individuals, each representing a potential earthquake location and other parameters (velocity and origin time).
Evolution Process: Over 200 generations, the population evolves through crossover (combining two solutions) and mutation (randomly altering a solution) to produce new solutions. The best solutions are selected to continue to the next generation.
Depth Constraints: To ensure realistic results, we apply constraints on the depth of the earthquake during the mutation step.

The GA runs until it finds the best solution, i.e., the set of parameters that minimize the error in predicting the observed arrival times.

Genetic Algorithm Evolution

Visualizing the Results

Once the Genetic Algorithm identifies the best solution, we visualize the estimated earthquake location and compare it to the actual location. The visualization includes:

Station Locations: Displayed as blue cubes on a 3D plot.
Actual and Estimated Locations: Highlighted as red and green spheres, respectively.
Error Ellipsoid: An ellipsoid around the estimated location representing the uncertainty in the solution.

We use Mayavi for interactive 3D visualization and Matplotlib for static 3D plots. These visualizations provide insights into the accuracy of the GA in solving the earthquake location problem.

Earthquake Location using Genetic Algorithm (MATPLOTLIB)

Earthquake Location using Genetic Algorithm (MAYAVI)

Video Walkthrough

Limitations of the Above Example

While the Genetic Algorithm (GA) provides a robust approach to solving the earthquake location problem, the example presented here has several limitations that should be considered when interpreting the results:

Simplified Grid and Depth Constraints: The simulation grid is relatively small, with x and y boundaries ranging from -5 to 5 km, and the depth is constrained between -3 and 0 km. While this setup is suitable for demonstrating the GA, real-world earthquake location problems often involve much larger geographical areas and more complex depth variations. The constrained depth in this example may not fully capture the challenges of locating deeper earthquakes.
Synthetic Data with Simplified Assumptions: The synthetic data generated assumes a constant velocity of seismic waves (6 km/s) and a known origin time. In reality, the Earth’s subsurface is highly heterogeneous, and seismic velocities can vary significantly across different regions and depths. Additionally, the true origin time of an earthquake is often unknown or uncertain, adding another layer of complexity to the problem.
Noise and Error Representation: The example introduces a fixed 5% noise level to simulate observational errors, but in practice, noise in seismic data can be much more complex and variable. Factors such as instrument sensitivity, environmental conditions, and human activities can all contribute to varying noise levels. The simple noise model used here may not fully represent the challenges faced in processing real seismic data.
Limited Number of Seismic Stations: The example uses 30 seismic stations randomly placed within the grid. While this number is sufficient for the demonstration, real-world earthquake monitoring networks often have more stations spread over larger areas, providing more data points and improving location accuracy. The limited number of stations in this example could lead to less accurate estimates.
Fitness Function Simplification: The fitness function used in the GA is based on minimizing the error between observed and predicted arrival times. However, this approach assumes that the only source of error is the difference in arrival times. In reality, other factors, such as incorrect velocity models or mislocated stations, could contribute to errors, making the fitness landscape more complex and difficult to navigate.
Algorithm Parameters and Convergence: The GA parameters, such as population size, mutation rate, and crossover probability, are chosen based on standard practices, but these may not be optimal for all problems. Additionally, the GA may converge to a local minimum rather than the global minimum, especially in complex fitness landscapes. This can result in suboptimal solutions, particularly if the initial population is not diverse enough.
Visualization and Interpretation: While the visualizations provide valuable insights into the estimated earthquake location, they rely on the assumption that the error ellipsoid accurately represents the uncertainty in the solution. In reality, the shape and size of the error ellipsoid can be influenced by many factors, and interpreting these results requires careful consideration of the underlying assumptions.
Scalability to Real-World Applications: The example is designed for educational purposes and may not scale well to real-world applications where the number of stations, data points, and computational resources required are significantly higher. Implementing the GA for large-scale earthquake location problems would require more advanced techniques, such as parallel processing or the use of more sophisticated fitness functions.

Conclusion

The Genetic Algorithm offers a powerful method for solving the earthquake location problem, allowing for accurate and efficient estimations even in complex scenarios. The approach can handle the noise and uncertainty inherent in real seismic data, making it a valuable tool in seismology.

If you are interested in more details on Genetic Algorithms and their applications, do not forget to check out my previous post.

Disclaimer of liability

The information provided by the Earth Inversion is made available for educational purposes only.

Whilst we endeavor to keep the information up-to-date and correct. Earth Inversion makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services or related graphics content on the website for any purpose.

UNDER NO CIRCUMSTANCE SHALL WE HAVE ANY LIABILITY TO YOU FOR ANY LOSS OR DAMAGE OF ANY KIND INCURRED AS A RESULT OF THE USE OF THE SITE OR RELIANCE ON ANY INFORMATION PROVIDED ON THE SITE. ANY RELIANCE YOU PLACED ON SUCH MATERIAL IS THEREFORE STRICTLY AT YOUR OWN RISK.

Subscribe to our weekly newsletter