Solving a Toy Earthquake Location Problem Using a Genetic Algorithm

Discover how Genetic Algorithms can be applied to solve the earthquake location problem in seismology. This post walks through generating synthetic seismic data, implementing a GA to estimate earthquake locations, and visualizing results. While effective, the approach has limitations, including simplified assumptions and constraints.
Kumar, U. (2020, June 7). Genetic Algorithm explained in the context of earthquake location problem. https://doi.org/10.5281/zenodo.4679544

Introduction

Determining the exact location of an earthquake based on seismic data is a complex problem that plays a crucial role in seismology. This challenge, known as the “earthquake location problem,” involves estimating the earthquake’s origin coordinates and depth using observed arrival times of seismic waves at various stations. Traditional methods often require simplifications or assumptions that can limit accuracy. However, more sophisticated approaches like the Genetic Algorithm (GA) offer a robust alternative by iteratively improving estimates based on natural selection principles.

In this article, I’ll walk you through the earthquake location problem and demonstrate how to solve it using a Genetic Algorithm. If you’re unfamiliar with the basics of Genetic Algorithms, I recommend checking out my previous post for a detailed introduction.

Complete Script

import os
import numpy as np
import pandas as pd
import numpy.linalg as linalg
from deap import base, creator, tools, algorithms
from mayavi import mlab
import noise
import matplotlib.pyplot as plt

np.random.seed(0)

# Define the boundaries of the grid to be used in simulations
minx, maxx = -5, 5
miny, maxy = -5, 5

# Set constraints on depth, reflecting realistic limits
depth_min = -3
depth_max = 0

# Create a directory to save intermediate steps of the genetic algorithm
output_dir = "ga_steps"
os.makedirs(output_dir, exist_ok=True)

# Part 1: Generate synthetic data representing an earthquake and seismic stations
def generate_data():
    error = 0.05  # Introduce a 5% error in the initial guess
    numstations = 30  # Define the number of seismic stations

    # Randomly place seismic stations within the defined grid boundaries
    stn_locs = []
    xvals = minx + (maxx - minx) * np.random.rand(numstations)
    yvals = miny + (maxy - miny) * np.random.rand(numstations)
    for num in range(numstations):
        stn_locs.append([xvals[num], yvals[num], 0])

    # Define the actual earthquake location
    eq_loc = [2, 2, -2]  # Coordinates in km
    vel = 6  # Velocity in km/s
    origintime = 0  # Origin time of the earthquake

    # Function to calculate arrival times of seismic waves at the stations
    def calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime):
        eqx, eqy, eqz = eq_loc
        dist = np.sqrt((eqx - stnx) ** 2 + (eqy - stny) ** 2 + (eqz - stnz) ** 2)
        arr = dist / vel + origintime
        return arr

    # Generate an initial guess of the earthquake location with some error
    eq_initial_guess = [eq_loc[0] + error * eq_loc[0], eq_loc[1] + error * eq_loc[1], eq_loc[2] + error * eq_loc[2]]
    m_intial = [eq_initial_guess[0], eq_initial_guess[1], eq_initial_guess[2], vel + error * vel, origintime + error * origintime]

    d_obs = []  # Observed arrival times
    d_pre = []  # Predicted arrival times based on the initial guess
    noise_level_data = 0.05  # Introduce a 5% noise level

    # Calculate observed and predicted arrival times at each station
    for stnx, stny, stnz in stn_locs:
        arr = calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime)
        noise_val = np.random.normal(loc=0, scale=noise_level_data * arr)  # Add random noise to simulate real data
        d_obs.append(arr + noise_val)
        d_pre.append(calc_arrival_time(eq_initial_guess, stnx, stny, stnz, m_intial[3], m_intial[4]))

    # Save the observed and predicted data to CSV files
    arr_df = pd.DataFrame({'dobs': d_obs, 'd_pre': d_pre})
    arr_df.to_csv('arrival_times.csv', index=False)

    stn_locs = np.array(stn_locs) # Convert stn_locs to a NumPy array
    station_loc_df = pd.DataFrame({'xloc': stn_locs[:, 0], 'yloc': stn_locs[:, 1], 'zloc': stn_locs[:, 2]})
    station_loc_df.to_csv('station_locations.csv', index=False)

    return eq_loc, stn_locs, noise_level_data

# Part 2: Implement the Genetic Algorithm (GA) for earthquake location estimation
def read_arrivaltimes(filename):
    data = np.loadtxt(filename, delimiter=',', skiprows=1)
    dobs = data[:, 0]
    return dobs

def read_stnloc(filename):
    station_locations = np.loadtxt(filename, delimiter=',', skiprows=1)
    return station_locations

def fitness_function(params):
    # Apply depth constraints to the parameters
    if params[2] < depth_min or params[2] > depth_max:
        return float('inf'),  # Penalize solutions that violate the depth constraint

    station_locations = read_stnloc('station_locations.csv')
    observed_arrival_times = read_arrivaltimes('arrival_times.csv')
    
    # Extract parameters from the individual
    x, y, z, velocity, origin_time = params
    predicted_arrival_times = []

    # Calculate predicted arrival times based on the current guess
    for station in station_locations:
        dist = np.sqrt((x - station[0]) ** 2 + (y - station[1]) ** 2 + (z - station[2]) ** 2)
        arrival_time = dist / velocity + origin_time
        predicted_arrival_times.append(arrival_time)

    # Calculate the error between observed and predicted arrival times
    predicted_arrival_times = np.array(predicted_arrival_times)
    error = np.sum((observed_arrival_times - predicted_arrival_times) ** 2)
    return error,

# Run the genetic algorithm to optimize earthquake location
def run_ga(visualize_steps=False):
    # Define a minimization problem with fitness based on minimizing the error
    if "FitnessMin" not in creator.__dict__:
        creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    if "Individual" not in creator.__dict__:
        creator.create("Individual", list, fitness=creator.FitnessMin)

    # Set up the genetic algorithm toolbox
    toolbox = base.Toolbox()
    toolbox.register("attr_float", np.random.uniform, -3, 3)  # Generate random floats within a specified range
    toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, n=5)  # Create individuals
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)  # Create populations
    toolbox.register("mate", tools.cxBlend, alpha=0.5)  # Define crossover operation
    toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2)  # Define mutation operation
    toolbox.register("select", tools.selTournament, tournsize=3)  # Define selection operation
    toolbox.register("evaluate", fitness_function)  # Evaluate fitness

    # Initialize the population and parameters for the genetic algorithm
    population = toolbox.population(n=300)
    n_generations = 200
    cxpb = 0.7  # Crossover probability
    mutpb = 0.3  # Mutation probability

    # Track the best individual and statistics during the run
    hall_of_fame = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("min", np.min)
    stats.register("mean", np.mean)
    logbook = tools.Logbook()
    logbook.header = ["gen", "min", "mean"]

    # Main loop to run the genetic algorithm
    for gen in range(n_generations):
        population, log = algorithms.eaSimple(population, toolbox, cxpb=cxpb, mutpb=mutpb, ngen=1, verbose=False)

        # Apply depth constraints after mutation
        for ind in population:
            if ind[2] < depth_min:
                ind[2] = depth_min
            elif ind[2] > depth_max:
                ind[2] = depth_max
        
        # Record statistics and update the hall of fame
        record = stats.compile(population)
        logbook.record(gen=gen, **record)
        hall_of_fame.update(population)

        # Optionally visualize intermediate steps
        if visualize_steps and gen % 10 == 0:
            save_intermediate_plot(population, hall_of_fame, gen)

    # Return the best solution found by the algorithm
    best_individual = hall_of_fame[0]
    best_fitness = fitness_function(best_individual)[0]
    
    return best_individual, best_fitness, logbook

# Placeholder for intermediate plot saving, if needed for debugging or analysis
def save_intermediate_plot(population, hall_of_fame, gen):
    pass

# Part 3: Estimate an ellipsoid around the best estimated earthquake location
def estimate_ellipsoid(center, location_error, noise_level):
    # Combine location error and noise level to form the ellipsoid's radii
    coeffs = location_error + noise_level
    A = np.array([[coeffs[0], 0, 0], [0, coeffs[1], 0], [0, 0, coeffs[2]]])
    U, s, rotation = linalg.svd(A)

    # Handle potential division by zero
    s = np.where(s == 0, 1e-10, s)
    
    radii = 1.0 / np.sqrt(s)
    u = np.linspace(0.0, 2.0 * np.pi, 100)  # Generate parametric values for the ellipsoid
    v = np.linspace(0.0, np.pi, 100)
    xx = radii[0] * np.outer(np.cos(u), np.sin(v))
    yy = radii[1] * np.outer(np.sin(u), np.sin(v))
    zz = radii[2] * np.outer(np.ones_like(u), np.cos(v))
    
    # Rotate and translate the ellipsoid to the estimated location
    for i in range(len(u)):
        for j in range(len(v)):
            [xx[i, j], yy[i, j], zz[i, j]] = np.dot([xx[i, j], yy[i, j], zz[i, j]], rotation) + center
    
    return xx, yy, zz

# Part 4: Visualization of results with Mayavi, including rough topography
def generate_rough_topography(X, Y, scale=1.0, octaves=6, persistence=0.5, lacunarity=2.0):
    # Generate a rough topography using Perlin noise
    Z = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Z[i, j] = noise.pnoise2(X[i, j] * scale, Y[i, j] * scale, octaves=octaves,
                                    persistence=persistence, lacunarity=lacunarity)
    return Z

def plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level):
    location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))

    # Initialize the Mayavi figure
    fig = mlab.figure(size=(1000, 800), bgcolor=(0.2, 0.2, 0.2))
    
    # Generate and plot the rough topography surface
    X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
    Z = generate_rough_topography(X, Y, scale=0.5)
    mlab.surf(X, Y, Z, colormap='terrain', warp_scale=1.0)

    # Plot the locations of seismic stations as blue cubes
    mlab.points3d(stn_locs[:, 0], stn_locs[:, 1], stn_locs[:, 2],
                  color=(0.1, 0.5, 1), scale_factor=0.3, mode='cube', opacity=1)

    # Highlight the actual earthquake location in red
    mlab.points3d(eq_loc[0], eq_loc[1], eq_loc[2],
                  color=(1, 0.3, 0.3), scale_factor=0.6, mode='sphere')

    # Show the estimated earthquake location in green
    mlab.points3d(best_params[0], best_params[1], best_params[2],
                  color=(0.3, 1, 0.3), scale_factor=0.6, mode='sphere')

    # Visualize the error as an ellipsoid around the estimated location
    if np.any(location_error + noise_level > 0):
        center = [best_params[0], best_params[1], best_params[2]]
        xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
        ellipsoid = mlab.mesh(xx, yy, zz, color=(1, 1, 0.6), opacity=0.6)

        # Enhance the ellipsoid's appearance with advanced lighting and shading
        ellipsoid.actor.property.interpolation = 'phong'
        ellipsoid.actor.property.specular = 0.1
        ellipsoid.actor.property.specular_power = 20
        ellipsoid.actor.property.diffuse = 0.5
        ellipsoid.actor.property.ambient = 0.3

    # Set up the camera view and add axes and grid lines for context
    mlab.view(azimuth=45, elevation=75, distance=30)
    mlab.axes(color=(1, 1, 1), xlabel='X (km)', ylabel='Y (km)', zlabel='Depth (km)',
              x_axis_visibility=True, y_axis_visibility=True, z_axis_visibility=True)
    mlab.outline(color=(1, 1, 1))

    # Save the final Mayavi plot
    mlab.savefig('earthquake_loc_ga_mayavi.png', magnification=3)
    mlab.clf()  # Clear the figure after saving

# Alternative: Visualization of results with Matplotlib
def plot_results(eq_loc, stn_locs, best_params, noise_level):
    location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))

    X = np.linspace(minx, maxx, 200)
    Y = np.linspace(miny, maxy, 200)
    X, Y = np.meshgrid(X, Y)
    Z = (X**2 + Y**2)*0

    fig = plt.figure()
    ax = plt.axes(projection='3d')

    # Generate and plot rough topography using Matplotlib
    X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
    Z = generate_rough_topography(X, Y, scale=0.5)
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='terrain', edgecolor='none', alpha=0.7)
    
    # Slightly elevate station locations to ensure they are visible
    elevated_z = np.array([x[2] + 0.05 for x in stn_locs])

    # Plot the stations as blue squares
    ax.scatter([x[0] for x in stn_locs], [x[1] for x in stn_locs], elevated_z, 
               c='blue', marker='s', s=120, label="Stations", edgecolors='k')
    
    # Highlight the actual and estimated earthquake locations
    ax.scatter(eq_loc[0], eq_loc[1], eq_loc[2], c='red', marker='o', s=100, 
               label=f'Actual EQ location ({eq_loc[0]:.2f}, {eq_loc[1]:.2f}, {eq_loc[2]:.2f})')
    ax.scatter(best_params[0], best_params[1], best_params[2], c='green', marker='*', s=150, 
               label=f'Estimated EQ location ({best_params[0]:.2f}, {best_params[1]:.2f}, {best_params[2]:.2f})')

    # Display the error ellipsoid in yellow
    if np.any(location_error + noise_level > 0):
        center = [best_params[0], best_params[1], best_params[2]]
        xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
        ax.plot_surface(xx, yy, zz, rstride=4, cstride=4, color='yellow', alpha=0.3, edgecolor='none')

    # Set axis labels, limits, and adjust the view
    ax.set_xlim([minx, maxx])
    ax.set_ylim([miny, maxy])
    ax.set_zlim([-5, 0.1])
    ax.set_xlabel('X Coordinate')
    ax.set_ylabel('Y Coordinate')
    ax.set_zlabel('Depth')
    ax.view_init(elev=30, azim=60) 

    # Add legend and save the plot
    ax.legend(loc="lower left")
    plt.tight_layout()
    plt.savefig('earthquake_loc_ga.png', bbox_inches='tight', dpi=300)

# Main execution block: Run the GA and visualize results
if __name__ == "__main__":
    eq_loc, stn_locs, noise_level = generate_data()  # Generate synthetic data
    best_params, best_fitness, logbook = run_ga(visualize_steps=True)  # Run the genetic algorithm

    # Visualize results using Mayavi
    plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level)

    # Visualize results using Matplotlib
    plot_results(eq_loc, stn_locs, best_params, noise_level)

    # Output the results to the console
    print(f"Actual EQ Location: {eq_loc}")
    print(f"Best Estimated EQ Location: {best_params}")
    print(f"Location Error: {np.abs(np.array(eq_loc) - np.array(best_params[:3]))}")
    print(f"Best Fitness (Error): {best_fitness}")

Here’s a brief explanation of the code used in the process:

  1. Data Generation:
    • The function generate_data() simulates an earthquake event and the corresponding seismic station observations. It introduces errors and noise to mimic real-world data.
  2. Genetic Algorithm:
    • The run_ga() function executes the Genetic Algorithm, using a fitness function (fitness_function()) to evaluate how close each candidate solution is to the actual earthquake location.
  3. Visualization:
    • plot_results_with_mayavi() and plot_results() create visual representations of the results. Mayavi is used for more detailed 3D plots, while Matplotlib provides an alternative.
  4. Error Analysis:
    • The code calculates and visualizes the error between the actual and estimated earthquake locations, providing a clear measure of the GA’s effectiveness.

Generating Synthetic Seismic Data

To tackle the earthquake location problem, we first need to generate synthetic data that mimics real seismic observations. This involves placing seismic stations randomly within a defined grid and simulating the arrival times of seismic waves at these stations based on a known earthquake location.

Here’s a breakdown of the process:

  • Grid Boundaries: The simulation grid is defined with x and y boundaries ranging from -5 to 5 km. The depth is constrained between -3 and 0 km to reflect realistic limits.
  • Station Placement: 30 seismic stations are randomly distributed within the grid.
  • True Earthquake Location: The actual earthquake is set at coordinates (2, 2, -2) km.
  • Arrival Times: The arrival times of seismic waves at each station are calculated using the distance from the earthquake location and the wave velocity. To simulate real-world conditions, we introduce a small amount of noise into these times.

The synthetic data generated serves as the basis for our Genetic Algorithm to estimate the earthquake’s location.

Implementing the Genetic Algorithm

The core of our approach is a Genetic Algorithm that iteratively searches for the best solution to the earthquake location problem. The GA starts with a population of random solutions and evolves them over generations using operations inspired by natural selection: crossover, mutation, and selection.

Here’s a step-by-step explanation:

  • Fitness Function: The fitness function measures how well a candidate solution predicts the observed arrival times. It calculates the error between the observed and predicted times. The goal is to minimize this error.
  • Population Initialization: The algorithm begins with a population of 300 random individuals, each representing a potential earthquake location and other parameters (velocity and origin time).
  • Evolution Process: Over 200 generations, the population evolves through crossover (combining two solutions) and mutation (randomly altering a solution) to produce new solutions. The best solutions are selected to continue to the next generation.
  • Depth Constraints: To ensure realistic results, we apply constraints on the depth of the earthquake during the mutation step.

The GA runs until it finds the best solution, i.e., the set of parameters that minimize the error in predicting the observed arrival times.

Genetic Algorithm Evolution

Visualizing the Results

Once the Genetic Algorithm identifies the best solution, we visualize the estimated earthquake location and compare it to the actual location. The visualization includes:

  • Station Locations: Displayed as blue cubes on a 3D plot.
  • Actual and Estimated Locations: Highlighted as red and green spheres, respectively.
  • Error Ellipsoid: An ellipsoid around the estimated location representing the uncertainty in the solution.

We use Mayavi for interactive 3D visualization and Matplotlib for static 3D plots. These visualizations provide insights into the accuracy of the GA in solving the earthquake location problem.

Earthquake Location using Genetic Algorithm (MATPLOTLIB)
Earthquake Location using Genetic Algorithm (MAYAVI)

Limitations of the Above Example

While the Genetic Algorithm (GA) provides a robust approach to solving the earthquake location problem, the example presented here has several limitations that should be considered when interpreting the results:

  1. Simplified Grid and Depth Constraints:
    • The simulation grid is relatively small, with x and y boundaries ranging from -5 to 5 km, and the depth is constrained between -3 and 0 km. While this setup is suitable for demonstrating the GA, real-world earthquake location problems often involve much larger geographical areas and more complex depth variations. The constrained depth in this example may not fully capture the challenges of locating deeper earthquakes.
  2. Synthetic Data with Simplified Assumptions:
    • The synthetic data generated assumes a constant velocity of seismic waves (6 km/s) and a known origin time. In reality, the Earth’s subsurface is highly heterogeneous, and seismic velocities can vary significantly across different regions and depths. Additionally, the true origin time of an earthquake is often unknown or uncertain, adding another layer of complexity to the problem.
  3. Noise and Error Representation:
    • The example introduces a fixed 5% noise level to simulate observational errors, but in practice, noise in seismic data can be much more complex and variable. Factors such as instrument sensitivity, environmental conditions, and human activities can all contribute to varying noise levels. The simple noise model used here may not fully represent the challenges faced in processing real seismic data.
  4. Limited Number of Seismic Stations:
    • The example uses 30 seismic stations randomly placed within the grid. While this number is sufficient for the demonstration, real-world earthquake monitoring networks often have more stations spread over larger areas, providing more data points and improving location accuracy. The limited number of stations in this example could lead to less accurate estimates.
  5. Fitness Function Simplification:
    • The fitness function used in the GA is based on minimizing the error between observed and predicted arrival times. However, this approach assumes that the only source of error is the difference in arrival times. In reality, other factors, such as incorrect velocity models or mislocated stations, could contribute to errors, making the fitness landscape more complex and difficult to navigate.
  6. Algorithm Parameters and Convergence:
    • The GA parameters, such as population size, mutation rate, and crossover probability, are chosen based on standard practices, but these may not be optimal for all problems. Additionally, the GA may converge to a local minimum rather than the global minimum, especially in complex fitness landscapes. This can result in suboptimal solutions, particularly if the initial population is not diverse enough.
  7. Visualization and Interpretation:
    • While the visualizations provide valuable insights into the estimated earthquake location, they rely on the assumption that the error ellipsoid accurately represents the uncertainty in the solution. In reality, the shape and size of the error ellipsoid can be influenced by many factors, and interpreting these results requires careful consideration of the underlying assumptions.
  8. Scalability to Real-World Applications:
    • The example is designed for educational purposes and may not scale well to real-world applications where the number of stations, data points, and computational resources required are significantly higher. Implementing the GA for large-scale earthquake location problems would require more advanced techniques, such as parallel processing or the use of more sophisticated fitness functions.

Conclusion

The Genetic Algorithm offers a powerful method for solving the earthquake location problem, allowing for accurate and efficient estimations even in complex scenarios. The approach can handle the noise and uncertainty inherent in real seismic data, making it a valuable tool in seismology.

If you’re interested in more details on Genetic Algorithms and their applications, don’t forget to check out my previous post.

Utpal Kumar
Utpal Kumar

Geophysicist | Geodesist | Seismologist | Open-source Developer
I am a geophysicist with a background in computational geophysics, currently working as a postdoctoral researcher at UC Berkeley. My research focuses on seismic data analysis, structural health monitoring, and understanding deep Earth structures. I have had the opportunity to work on diverse projects, from investigating building characteristics using smartphone data to developing 3D models of the Earth's mantle beneath the Yellowstone hotspot.

In addition to my research, I have experience in cloud computing, high-performance computing, and single-board computers, which I have applied in various projects. This includes working with platforms like AWS, Docker, and Kubernetes, as well as supercomputing environments such as STAMPEDE2, ANVIL, Savio and PERLMUTTER (and CORI). My work involves developing innovative solutions for structural health monitoring and advancing real-time seismic response analysis. I am committed to applying these skills to further research in computational seismology and structural health monitoring.

Articles: 32

Leave a Reply

Your email address will not be published. Required fields are marked *