Empirical Orthogonal Function analysis to inspect the spatial coherency in the geospatial data

Empirical Orthogonal Functions analysis decomposes the continuous space-time field into a set of orthogonal spatial patterns along with a set of associated uncorrelated time series or principal components. Introductory concepts of EOF analysis

Empirical Orthogonal Functions analysis

Empirical Orthogonal Functions analysis decomposes the continuous space-time field into a set of orthogonal spatial patterns along with a set of associated uncorrelated time series or principal components (Fukuoka 1951; Lorenz 1956). It is a form of PCA but not restricted to multivariate statistics. Similar to PCA, it can also be used to reduce the large number of variables of the original data to a fewer variables (dimensionality reduction), without much compromising the variability in the data. This approach is essential for big data analysis or in machine learning applications. EOFs are multipurpose and have been used widely for pattern or feature extraction from the data (Fukunaga & Koontz 1970).

The continuous space-time field, X(t,\vec{s}) can be expressed as

where M is the number of modes, u_k(\vec{s}) is the basis function of space and c_k(t) is the expansion function of time. The aim of EOF analysis is to find a set of new variables that captures most of the observed variance in the data through linear combinations of the original variables.

Formulation and computation of EOFs

To apply EOF analysis, I first construct the data matrix F, which contains all the detrended and demeaned values for each three components (north, east, or vertical) where columns correspond to each selected station (say, n stations) and each row is the daily time data snapshot (say, spanning p days).

where the value of the field at discrete time t_i and spatial grid point s_j is denoted by f_{ij}. For i = 1, \dots , p and j = 1, \dots , n. I obtain the sample covariance matrix, B as

Since the EOF aims to find the uncorrelated linear combinations of different variables that explain maximum variance. This translates mathematically to find a unit length direction \vec{u} such that F\vec{u} has maximum variability. This can be expressed as

The above equation can be achieved by considering an eigenvalue problem b_{ij}\vec{u} = \lambda^{2}\vec{u}. Here, I decompose the data matrix F directly, using the Singular Value Decomposition approach (Golub & Van Loan 1996), as

where A and U are respectively p \times r and r \times n unitary matrix ( U^TU = A^TA = I_r) and r \leq min(n, p) is the rank of F. The diagonal matrix \Lambda = Diag(\lambda_1,\lambda_2,\dots,\lambda_r). The columns of A (a_1, a_2,\dots,a_r), and U (u_1, u_2,\dots,u_r) respectively are the EOFs and principal components (PCs) of the data matrix F. To improve the computation of the above equation, it is advisable to take the min(p,n) as the first dimension of F. The EOFs, A are orthogonal and the PCs, U are uncorrelated. Hence, the Equation above yields the decomposition

Accounted variance

The principal components and the corresponding eigenvectors respectively represent the temporal and spatial part of the given input data matrix F. I arrange the eigenvectors in decreasing order of eigenvalues, such that the first PC represents the biggest contributor to the variance of the space-time field. It is conventional to write the accounted variance in percentage as

The spectrum of the covariance matrix B provides information on the distribution of power (or energy) as a function of scale, and on the separation (or degeneracy) of the EOF patterns (represented as modes). The high and low power respectively are most commonly associated with the low and high-frequency variability. Hence, low frequency (and large scale) patterns tend to capture most of the observed variance in the system.

If two eigenvalues are degenerate (or indistinguishable within estimated uncertainties), then their corresponding EOF pattern can possibly mix and may not be particularly interesting.

Application on the GPS data of Taiwan (Kumar et al., 2020)

I constructed the space-time data matrix of the GPS residuals for the daily time samples for 10 years, and 47 selected stations. EOF retrieves the coherent spatio-temporal signals mathematically by solving for a series of eigenmodes of the covariance matrix expressed in standing oscillations in the form of the product of spatial pattern and time series for a given target region and selected timespan (Chao & Liau 2019; Chang & Chao 2014). I arranged the eigenmodes in decreasing order of (the positive) eigenvalues, such that the first eigenmode represents the biggest contributor to the variance (measured by the ratio of the eigenvalue for the given mode index to the sum of all eigenvalues) of the GPS residuals.

Here the first mode is for the CME, while the higher modes reflect secondary effects or local and short-period signals or noises not of interest here. The calculated eigenvalue spectrum of the EOF along with their standard errors (calculated using Monte Carlo simulations) shows that the first mode is nondegenerate and separated from the rest.

I normalize the spatial pattern for the first mode by dividing with the RMS value (given that it is near-uniform in space), whereas for the second mode by dividing with the standard deviation (given its fluctuation around 0). The corresponding time series, or principal components, are scaled by multiplying by the same normalization value, transferring the magnitude information of eigenvectors into the principal components, which hence has the same unit as the data (mm in our study).

The first two EOF modes of the CGPS residuals. The EOF decomposes the data into a linear combination of orthogonal modes (Kumar et al., 2020)

MATLAB Function for EOF analysis

Consider citing Kumar et al., 2020, if you use the codes.

function [eigenValues, varpercent, eofMapPattern, eof_temp_coeff] = eof_n_optimizado_A(data)


Z=detrend(Matrix,'constant'); %removes the mean value from each column of the matrix

% the matrix dimensions: 
% n = number of temporal samples
% p = number of points with data
[n,p] = size(Z);

% If n>=p we estimate the EOF in the State Space Setting, i.e., using Z'*Z which is p x p.
% If n<p  we estimate the EOF in the Sample Space Setting, i.e., using Z*Z' which is n x n 
if n>=p
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%% State Space Setting %%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    % Covariance matrix
    R=Z'*Z;
    
    % We estimate the eigenvalues and eigenvectors of the covariance matrix.
    [eof_p,L]=eigs(R, eye(size(R)),10);       % The eigenvelues are in the diagonal of L
                            % and the associated eigenvectors are in the columns of eof_p

    eigenValues=diag(L);                    
    
    % We estimate the expansion coefficients (time series) associated to each EOF
    [a,b]=size(L);
    for i=1:a
        exp_coef(:,i)= Z * eof_p(:,i);
    end
    
elseif n<p
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%% Sample Space Setting %%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    fprintf('Sample space setting\n');
    % Scatter matrix
    R=Z*Z';
    
    % We estimate the eigenvalues and eigenvectors of the scatter matrix.
    [eof_p_f,L]=eigs(R, eye(size(R)),10);     % The eigenvelues are in the diagonal of M
                            % and the associated eigenvectors are in the columns of eof_p_f
                            
    eigenValues=diag(L);
    
    % We estimate the expansion coefficients (time series) associated to each EOF
    [a,b]=size(L);
    for i=1:a
        exp_coef_b(:,i)= Z' * eof_p_f(:,i);
    end
    
    % We transform the EOF and their associated expansion coefficient (time series) in the Sample 
    % space setting to those in the State space setting
    
    %Expansion coefficients
    for i=1:a
        eof_p(:,i)= exp_coef_b(:,i) / sqrt(eigenValues(i));
    end
    
    %Expansion coefficients
    for i=1:a
        exp_coef(:,i)= sqrt(eigenValues(i)) * eof_p_f(:,i);
    end
    
end


% We estimate the percentage of variance explained by each EOF/eigenvector
varpercent = ( diag(L)/trace(L) )*100;

%We normalized the EOF
[a_1,b_1]=size(eof_p);
[a_11,b_11]=size(exp_coef);
eofMapPattern=eof_p;
eof_temp_coeff=exp_coef;

for i=1:b_1
    
    eofMapPattern(:,i) = eof_p(:,i)./ rms(eof_p(:,i));
    eof_temp_coeff(:,i) = exp_coef(:,i).* rms(eof_p(:,i));

%     using standard deviation
%     eofMapPattern(:,i) = eof_p(:,i)./ std(exp_coef(:,i));
%     eof_temp_coeff(:,i) = exp_coef(:,i) .* std(exp_coef(:,i));

    
end

References

Kumar, U., Chao, B. F., & Chang, E. T.-Y. Y. (2020). What Causes the Common‐Mode Error in Array GPS Displacement Fields: Case Study for Taiwan in Relation to Atmospheric Mass Loading. Earth and Space Science, 0–2. https://doi.org/10.1029/2020ea001159
Lorenz, E. N. (1956). Empirical Orthogonal Functions and Statistical Weather Prediction. Technical Report Statistical Forecast Project Report 1 Department of Meteorology MIT 49, 1(Scientific Report No. 1, Statistical Forecasting Project), 52.
Chao, B. F., & Liau, J. R. (2019). Gravity Changes Due to Large Earthquakes Detected in GRACE Satellite Data via Empirical Orthogonal Function Analysis. Journal of Geophysical Research: Solid Earth, 124(3), 3024–3035. https://doi.org/10.1029/2018JB016862
Fukuoka, A. (1951). The Central Meteorological Observatory, A study on 10-day forecast (A synthetic report). Geophysical Magazine, 22(3), 177–208. http://ci.nii.ac.jp/naid/40018672317/en/
Chang, E. T. Y., & Chao, B. F. (2014). Analysis of coseismic deformation using EOF method on dense, continuous GPS data in Taiwan. Tectonophysics, 637(C), 106–115. https://doi.org/10.1016/j.tecto.2014.09.011
Fukunaga, K., & Koontz, W. L. G. (1970). Application of the Karhunen-Loeve expansion to feature selection and ordering. IEEE Transactions on Computers, 100(4), 311–318.
Golub, G., & Van Loan, C. (1996). Matrix computations. Matrix, 1000(13), 9.
Utpal Kumar
Utpal Kumar

Geophysicist | Geodesist | Seismologist | Open-source Developer
I am a geophysicist with a background in computational geophysics, currently working as a postdoctoral researcher at UC Berkeley. My research focuses on seismic data analysis, structural health monitoring, and understanding deep Earth structures. I have had the opportunity to work on diverse projects, from investigating building characteristics using smartphone data to developing 3D models of the Earth's mantle beneath the Yellowstone hotspot.

In addition to my research, I have experience in cloud computing, high-performance computing, and single-board computers, which I have applied in various projects. This includes working with platforms like AWS, Docker, and Kubernetes, as well as supercomputing environments such as STAMPEDE2, ANVIL, Savio and PERLMUTTER (and CORI). My work involves developing innovative solutions for structural health monitoring and advancing real-time seismic response analysis. I am committed to applying these skills to further research in computational seismology and structural health monitoring.

Articles: 29

Leave a Reply

Your email address will not be published. Required fields are marked *