*Genetic Algorithm explained in the context of earthquake location problem*. https://doi.org/10.5281/zenodo.4679544

## Introduction

Determining the exact location of an earthquake based on seismic data is a complex problem that plays a crucial role in seismology. This challenge, known as the “earthquake location problem,” involves estimating the earthquake’s origin coordinates and depth using observed arrival times of seismic waves at various stations. Traditional methods often require simplifications or assumptions that can limit accuracy. However, more sophisticated approaches like the Genetic Algorithm (GA) offer a robust alternative by iteratively improving estimates based on natural selection principles.

In this article, I’ll walk you through the earthquake location problem and demonstrate how to solve it using a Genetic Algorithm. If you’re unfamiliar with the basics of Genetic Algorithms, I recommend checking out my previous post for a detailed introduction.

## Complete Script

```
import os
import numpy as np
import pandas as pd
import numpy.linalg as linalg
from deap import base, creator, tools, algorithms
from mayavi import mlab
import noise
import matplotlib.pyplot as plt
np.random.seed(0)
# Define the boundaries of the grid to be used in simulations
minx, maxx = -5, 5
miny, maxy = -5, 5
# Set constraints on depth, reflecting realistic limits
depth_min = -3
depth_max = 0
# Create a directory to save intermediate steps of the genetic algorithm
output_dir = "ga_steps"
os.makedirs(output_dir, exist_ok=True)
# Part 1: Generate synthetic data representing an earthquake and seismic stations
def generate_data():
error = 0.05 # Introduce a 5% error in the initial guess
numstations = 30 # Define the number of seismic stations
# Randomly place seismic stations within the defined grid boundaries
stn_locs = []
xvals = minx + (maxx - minx) * np.random.rand(numstations)
yvals = miny + (maxy - miny) * np.random.rand(numstations)
for num in range(numstations):
stn_locs.append([xvals[num], yvals[num], 0])
# Define the actual earthquake location
eq_loc = [2, 2, -2] # Coordinates in km
vel = 6 # Velocity in km/s
origintime = 0 # Origin time of the earthquake
# Function to calculate arrival times of seismic waves at the stations
def calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime):
eqx, eqy, eqz = eq_loc
dist = np.sqrt((eqx - stnx) ** 2 + (eqy - stny) ** 2 + (eqz - stnz) ** 2)
arr = dist / vel + origintime
return arr
# Generate an initial guess of the earthquake location with some error
eq_initial_guess = [eq_loc[0] + error * eq_loc[0], eq_loc[1] + error * eq_loc[1], eq_loc[2] + error * eq_loc[2]]
m_intial = [eq_initial_guess[0], eq_initial_guess[1], eq_initial_guess[2], vel + error * vel, origintime + error * origintime]
d_obs = [] # Observed arrival times
d_pre = [] # Predicted arrival times based on the initial guess
noise_level_data = 0.05 # Introduce a 5% noise level
# Calculate observed and predicted arrival times at each station
for stnx, stny, stnz in stn_locs:
arr = calc_arrival_time(eq_loc, stnx, stny, stnz, vel, origintime)
noise_val = np.random.normal(loc=0, scale=noise_level_data * arr) # Add random noise to simulate real data
d_obs.append(arr + noise_val)
d_pre.append(calc_arrival_time(eq_initial_guess, stnx, stny, stnz, m_intial[3], m_intial[4]))
# Save the observed and predicted data to CSV files
arr_df = pd.DataFrame({'dobs': d_obs, 'd_pre': d_pre})
arr_df.to_csv('arrival_times.csv', index=False)
stn_locs = np.array(stn_locs) # Convert stn_locs to a NumPy array
station_loc_df = pd.DataFrame({'xloc': stn_locs[:, 0], 'yloc': stn_locs[:, 1], 'zloc': stn_locs[:, 2]})
station_loc_df.to_csv('station_locations.csv', index=False)
return eq_loc, stn_locs, noise_level_data
# Part 2: Implement the Genetic Algorithm (GA) for earthquake location estimation
def read_arrivaltimes(filename):
data = np.loadtxt(filename, delimiter=',', skiprows=1)
dobs = data[:, 0]
return dobs
def read_stnloc(filename):
station_locations = np.loadtxt(filename, delimiter=',', skiprows=1)
return station_locations
def fitness_function(params):
# Apply depth constraints to the parameters
if params[2] < depth_min or params[2] > depth_max:
return float('inf'), # Penalize solutions that violate the depth constraint
station_locations = read_stnloc('station_locations.csv')
observed_arrival_times = read_arrivaltimes('arrival_times.csv')
# Extract parameters from the individual
x, y, z, velocity, origin_time = params
predicted_arrival_times = []
# Calculate predicted arrival times based on the current guess
for station in station_locations:
dist = np.sqrt((x - station[0]) ** 2 + (y - station[1]) ** 2 + (z - station[2]) ** 2)
arrival_time = dist / velocity + origin_time
predicted_arrival_times.append(arrival_time)
# Calculate the error between observed and predicted arrival times
predicted_arrival_times = np.array(predicted_arrival_times)
error = np.sum((observed_arrival_times - predicted_arrival_times) ** 2)
return error,
# Run the genetic algorithm to optimize earthquake location
def run_ga(visualize_steps=False):
# Define a minimization problem with fitness based on minimizing the error
if "FitnessMin" not in creator.__dict__:
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
if "Individual" not in creator.__dict__:
creator.create("Individual", list, fitness=creator.FitnessMin)
# Set up the genetic algorithm toolbox
toolbox = base.Toolbox()
toolbox.register("attr_float", np.random.uniform, -3, 3) # Generate random floats within a specified range
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, n=5) # Create individuals
toolbox.register("population", tools.initRepeat, list, toolbox.individual) # Create populations
toolbox.register("mate", tools.cxBlend, alpha=0.5) # Define crossover operation
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2) # Define mutation operation
toolbox.register("select", tools.selTournament, tournsize=3) # Define selection operation
toolbox.register("evaluate", fitness_function) # Evaluate fitness
# Initialize the population and parameters for the genetic algorithm
population = toolbox.population(n=300)
n_generations = 200
cxpb = 0.7 # Crossover probability
mutpb = 0.3 # Mutation probability
# Track the best individual and statistics during the run
hall_of_fame = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("min", np.min)
stats.register("mean", np.mean)
logbook = tools.Logbook()
logbook.header = ["gen", "min", "mean"]
# Main loop to run the genetic algorithm
for gen in range(n_generations):
population, log = algorithms.eaSimple(population, toolbox, cxpb=cxpb, mutpb=mutpb, ngen=1, verbose=False)
# Apply depth constraints after mutation
for ind in population:
if ind[2] < depth_min:
ind[2] = depth_min
elif ind[2] > depth_max:
ind[2] = depth_max
# Record statistics and update the hall of fame
record = stats.compile(population)
logbook.record(gen=gen, **record)
hall_of_fame.update(population)
# Optionally visualize intermediate steps
if visualize_steps and gen % 10 == 0:
save_intermediate_plot(population, hall_of_fame, gen)
# Return the best solution found by the algorithm
best_individual = hall_of_fame[0]
best_fitness = fitness_function(best_individual)[0]
return best_individual, best_fitness, logbook
# Placeholder for intermediate plot saving, if needed for debugging or analysis
def save_intermediate_plot(population, hall_of_fame, gen):
pass
# Part 3: Estimate an ellipsoid around the best estimated earthquake location
def estimate_ellipsoid(center, location_error, noise_level):
# Combine location error and noise level to form the ellipsoid's radii
coeffs = location_error + noise_level
A = np.array([[coeffs[0], 0, 0], [0, coeffs[1], 0], [0, 0, coeffs[2]]])
U, s, rotation = linalg.svd(A)
# Handle potential division by zero
s = np.where(s == 0, 1e-10, s)
radii = 1.0 / np.sqrt(s)
u = np.linspace(0.0, 2.0 * np.pi, 100) # Generate parametric values for the ellipsoid
v = np.linspace(0.0, np.pi, 100)
xx = radii[0] * np.outer(np.cos(u), np.sin(v))
yy = radii[1] * np.outer(np.sin(u), np.sin(v))
zz = radii[2] * np.outer(np.ones_like(u), np.cos(v))
# Rotate and translate the ellipsoid to the estimated location
for i in range(len(u)):
for j in range(len(v)):
[xx[i, j], yy[i, j], zz[i, j]] = np.dot([xx[i, j], yy[i, j], zz[i, j]], rotation) + center
return xx, yy, zz
# Part 4: Visualization of results with Mayavi, including rough topography
def generate_rough_topography(X, Y, scale=1.0, octaves=6, persistence=0.5, lacunarity=2.0):
# Generate a rough topography using Perlin noise
Z = np.zeros_like(X)
for i in range(X.shape[0]):
for j in range(X.shape[1]):
Z[i, j] = noise.pnoise2(X[i, j] * scale, Y[i, j] * scale, octaves=octaves,
persistence=persistence, lacunarity=lacunarity)
return Z
def plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level):
location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))
# Initialize the Mayavi figure
fig = mlab.figure(size=(1000, 800), bgcolor=(0.2, 0.2, 0.2))
# Generate and plot the rough topography surface
X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
Z = generate_rough_topography(X, Y, scale=0.5)
mlab.surf(X, Y, Z, colormap='terrain', warp_scale=1.0)
# Plot the locations of seismic stations as blue cubes
mlab.points3d(stn_locs[:, 0], stn_locs[:, 1], stn_locs[:, 2],
color=(0.1, 0.5, 1), scale_factor=0.3, mode='cube', opacity=1)
# Highlight the actual earthquake location in red
mlab.points3d(eq_loc[0], eq_loc[1], eq_loc[2],
color=(1, 0.3, 0.3), scale_factor=0.6, mode='sphere')
# Show the estimated earthquake location in green
mlab.points3d(best_params[0], best_params[1], best_params[2],
color=(0.3, 1, 0.3), scale_factor=0.6, mode='sphere')
# Visualize the error as an ellipsoid around the estimated location
if np.any(location_error + noise_level > 0):
center = [best_params[0], best_params[1], best_params[2]]
xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
ellipsoid = mlab.mesh(xx, yy, zz, color=(1, 1, 0.6), opacity=0.6)
# Enhance the ellipsoid's appearance with advanced lighting and shading
ellipsoid.actor.property.interpolation = 'phong'
ellipsoid.actor.property.specular = 0.1
ellipsoid.actor.property.specular_power = 20
ellipsoid.actor.property.diffuse = 0.5
ellipsoid.actor.property.ambient = 0.3
# Set up the camera view and add axes and grid lines for context
mlab.view(azimuth=45, elevation=75, distance=30)
mlab.axes(color=(1, 1, 1), xlabel='X (km)', ylabel='Y (km)', zlabel='Depth (km)',
x_axis_visibility=True, y_axis_visibility=True, z_axis_visibility=True)
mlab.outline(color=(1, 1, 1))
# Save the final Mayavi plot
mlab.savefig('earthquake_loc_ga_mayavi.png', magnification=3)
mlab.clf() # Clear the figure after saving
# Alternative: Visualization of results with Matplotlib
def plot_results(eq_loc, stn_locs, best_params, noise_level):
location_error = np.abs(np.array(eq_loc) - np.array(best_params[:3]))
X = np.linspace(minx, maxx, 200)
Y = np.linspace(miny, maxy, 200)
X, Y = np.meshgrid(X, Y)
Z = (X**2 + Y**2)*0
fig = plt.figure()
ax = plt.axes(projection='3d')
# Generate and plot rough topography using Matplotlib
X, Y = np.mgrid[minx:maxx:200j, miny:maxy:200j]
Z = generate_rough_topography(X, Y, scale=0.5)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='terrain', edgecolor='none', alpha=0.7)
# Slightly elevate station locations to ensure they are visible
elevated_z = np.array([x[2] + 0.05 for x in stn_locs])
# Plot the stations as blue squares
ax.scatter([x[0] for x in stn_locs], [x[1] for x in stn_locs], elevated_z,
c='blue', marker='s', s=120, label="Stations", edgecolors='k')
# Highlight the actual and estimated earthquake locations
ax.scatter(eq_loc[0], eq_loc[1], eq_loc[2], c='red', marker='o', s=100,
label=f'Actual EQ location ({eq_loc[0]:.2f}, {eq_loc[1]:.2f}, {eq_loc[2]:.2f})')
ax.scatter(best_params[0], best_params[1], best_params[2], c='green', marker='*', s=150,
label=f'Estimated EQ location ({best_params[0]:.2f}, {best_params[1]:.2f}, {best_params[2]:.2f})')
# Display the error ellipsoid in yellow
if np.any(location_error + noise_level > 0):
center = [best_params[0], best_params[1], best_params[2]]
xx, yy, zz = estimate_ellipsoid(center, location_error, [noise_level] * 3)
ax.plot_surface(xx, yy, zz, rstride=4, cstride=4, color='yellow', alpha=0.3, edgecolor='none')
# Set axis labels, limits, and adjust the view
ax.set_xlim([minx, maxx])
ax.set_ylim([miny, maxy])
ax.set_zlim([-5, 0.1])
ax.set_xlabel('X Coordinate')
ax.set_ylabel('Y Coordinate')
ax.set_zlabel('Depth')
ax.view_init(elev=30, azim=60)
# Add legend and save the plot
ax.legend(loc="lower left")
plt.tight_layout()
plt.savefig('earthquake_loc_ga.png', bbox_inches='tight', dpi=300)
# Main execution block: Run the GA and visualize results
if __name__ == "__main__":
eq_loc, stn_locs, noise_level = generate_data() # Generate synthetic data
best_params, best_fitness, logbook = run_ga(visualize_steps=True) # Run the genetic algorithm
# Visualize results using Mayavi
plot_results_with_mayavi(eq_loc, stn_locs, best_params, noise_level)
# Visualize results using Matplotlib
plot_results(eq_loc, stn_locs, best_params, noise_level)
# Output the results to the console
print(f"Actual EQ Location: {eq_loc}")
print(f"Best Estimated EQ Location: {best_params}")
print(f"Location Error: {np.abs(np.array(eq_loc) - np.array(best_params[:3]))}")
print(f"Best Fitness (Error): {best_fitness}")
```

Here’s a brief explanation of the code used in the process:

**Data Generation**:- The function
`generate_data()`

simulates an earthquake event and the corresponding seismic station observations. It introduces errors and noise to mimic real-world data.

- The function
**Genetic Algorithm**:- The
`run_ga()`

function executes the Genetic Algorithm, using a fitness function (`fitness_function()`

) to evaluate how close each candidate solution is to the actual earthquake location.

- The
**Visualization**:`plot_results_with_mayavi()`

and`plot_results()`

create visual representations of the results. Mayavi is used for more detailed 3D plots, while Matplotlib provides an alternative.

**Error Analysis**:- The code calculates and visualizes the error between the actual and estimated earthquake locations, providing a clear measure of the GA’s effectiveness.

### Generating Synthetic Seismic Data

To tackle the earthquake location problem, we first need to generate synthetic data that mimics real seismic observations. This involves placing seismic stations randomly within a defined grid and simulating the arrival times of seismic waves at these stations based on a known earthquake location.

Here’s a breakdown of the process:

**Grid Boundaries**: The simulation grid is defined with x and y boundaries ranging from -5 to 5 km. The depth is constrained between -3 and 0 km to reflect realistic limits.**Station Placement**: 30 seismic stations are randomly distributed within the grid.**True Earthquake Location**: The actual earthquake is set at coordinates (2, 2, -2) km.**Arrival Times**: The arrival times of seismic waves at each station are calculated using the distance from the earthquake location and the wave velocity. To simulate real-world conditions, we introduce a small amount of noise into these times.

The synthetic data generated serves as the basis for our Genetic Algorithm to estimate the earthquake’s location.

### Implementing the Genetic Algorithm

The core of our approach is a Genetic Algorithm that iteratively searches for the best solution to the earthquake location problem. The GA starts with a population of random solutions and evolves them over generations using operations inspired by natural selection: crossover, mutation, and selection.

Here’s a step-by-step explanation:

**Fitness Function**: The fitness function measures how well a candidate solution predicts the observed arrival times. It calculates the error between the observed and predicted times. The goal is to minimize this error.**Population Initialization**: The algorithm begins with a population of 300 random individuals, each representing a potential earthquake location and other parameters (velocity and origin time).**Evolution Process**: Over 200 generations, the population evolves through crossover (combining two solutions) and mutation (randomly altering a solution) to produce new solutions. The best solutions are selected to continue to the next generation.**Depth Constraints**: To ensure realistic results, we apply constraints on the depth of the earthquake during the mutation step.

The GA runs until it finds the best solution, i.e., the set of parameters that minimize the error in predicting the observed arrival times.

### Visualizing the Results

Once the Genetic Algorithm identifies the best solution, we visualize the estimated earthquake location and compare it to the actual location. The visualization includes:

**Station Locations**: Displayed as blue cubes on a 3D plot.**Actual and Estimated Locations**: Highlighted as red and green spheres, respectively.**Error Ellipsoid**: An ellipsoid around the estimated location representing the uncertainty in the solution.

We use Mayavi for interactive 3D visualization and Matplotlib for static 3D plots. These visualizations provide insights into the accuracy of the GA in solving the earthquake location problem.

## Limitations of the Above Example

While the Genetic Algorithm (GA) provides a robust approach to solving the earthquake location problem, the example presented here has several limitations that should be considered when interpreting the results:

**Simplified Grid and Depth Constraints**:- The simulation grid is relatively small, with x and y boundaries ranging from -5 to 5 km, and the depth is constrained between -3 and 0 km. While this setup is suitable for demonstrating the GA, real-world earthquake location problems often involve much larger geographical areas and more complex depth variations. The constrained depth in this example may not fully capture the challenges of locating deeper earthquakes.

**Synthetic Data with Simplified Assumptions**:- The synthetic data generated assumes a constant velocity of seismic waves (6 km/s) and a known origin time. In reality, the Earth’s subsurface is highly heterogeneous, and seismic velocities can vary significantly across different regions and depths. Additionally, the true origin time of an earthquake is often unknown or uncertain, adding another layer of complexity to the problem.

**Noise and Error Representation**:- The example introduces a fixed 5% noise level to simulate observational errors, but in practice, noise in seismic data can be much more complex and variable. Factors such as instrument sensitivity, environmental conditions, and human activities can all contribute to varying noise levels. The simple noise model used here may not fully represent the challenges faced in processing real seismic data.

**Limited Number of Seismic Stations**:- The example uses 30 seismic stations randomly placed within the grid. While this number is sufficient for the demonstration, real-world earthquake monitoring networks often have more stations spread over larger areas, providing more data points and improving location accuracy. The limited number of stations in this example could lead to less accurate estimates.

**Fitness Function Simplification**:- The fitness function used in the GA is based on minimizing the error between observed and predicted arrival times. However, this approach assumes that the only source of error is the difference in arrival times. In reality, other factors, such as incorrect velocity models or mislocated stations, could contribute to errors, making the fitness landscape more complex and difficult to navigate.

**Algorithm Parameters and Convergence**:- The GA parameters, such as population size, mutation rate, and crossover probability, are chosen based on standard practices, but these may not be optimal for all problems. Additionally, the GA may converge to a local minimum rather than the global minimum, especially in complex fitness landscapes. This can result in suboptimal solutions, particularly if the initial population is not diverse enough.

**Visualization and Interpretation**:- While the visualizations provide valuable insights into the estimated earthquake location, they rely on the assumption that the error ellipsoid accurately represents the uncertainty in the solution. In reality, the shape and size of the error ellipsoid can be influenced by many factors, and interpreting these results requires careful consideration of the underlying assumptions.

**Scalability to Real-World Applications**:- The example is designed for educational purposes and may not scale well to real-world applications where the number of stations, data points, and computational resources required are significantly higher. Implementing the GA for large-scale earthquake location problems would require more advanced techniques, such as parallel processing or the use of more sophisticated fitness functions.

**Conclusion**

The Genetic Algorithm offers a powerful method for solving the earthquake location problem, allowing for accurate and efficient estimations even in complex scenarios. The approach can handle the noise and uncertainty inherent in real seismic data, making it a valuable tool in seismology.

If you’re interested in more details on Genetic Algorithms and their applications, don’t forget to check out my previous post.