Discrimination of Icequakes and Earthquakes in Southeast Alaska Using
Random Forest and Principal Component Analysis
Abstract
Seismic event classification can be challenging in regions where
different types of seismicity overlap in space, time, and magnitude. In
this paper, I evaluate the performance of a supervised machine learning
technique called Random Forest (RF) for the discrimination of icequakes and
earthquakes in southeast Alaska at 15 stations surrounding the region. I
train the Random Forest on about 3000 icequakes and earthquakes that
occurred in the region over the last 17 years. For each event, the absolute
values of the frequency spectrum are used as input features. Classification
accuracy at individual stations ranges from 75% to 95%, with an average
of about 90%. I run tests to select the optimum number of
decision trees in the RF model and compare the results obtained by
applying bandpass filters with different frequency bands to the input
waveforms. I further experiment with reducing the dimensionality of the
input features using Principal Component Analysis (PCA), and run
tests to select the minimum number of components and the frequency
band that gives the best results. Applying PCA yields slightly better
results, and the model that performs best across all tests is chosen as
the final model. The accuracy of the final model is further analyzed with
respect to the amount of available data, the average distance of a
station from all the glaciers, and the local geology.
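
As a rough illustration of the workflow summarized above, the following
Python sketch computes absolute frequency spectrum values for each waveform,
reduces them with PCA, and trains a Random Forest classifier using
scikit-learn. It is a minimal sketch under stated assumptions, not the
implementation used in this study: the waveforms are synthetic sinusoids
with noise rather than the Alaska dataset, and the sampling rate, dominant
frequencies, number of trees, and number of components are placeholder
values chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def amplitude_spectrum(waveforms):
    """Return the absolute values of the one-sided FFT of each row."""
    return np.abs(np.fft.rfft(waveforms, axis=1))


def make_synthetic_events(n_per_class=200, n_samples=512, fs=100.0, seed=0):
    """Generate toy two-class waveforms (assumption: NOT real seismic data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    waveforms, labels = [], []
    # Hypothetical dominant frequencies (Hz) for the two event classes.
    for label, f0 in ((0, 3.0), (1, 12.0)):
        for _ in range(n_per_class):
            freq = f0 + rng.normal(scale=0.3)
            phase = rng.uniform(0.0, 2.0 * np.pi)
            signal = np.sin(2.0 * np.pi * freq * t + phase)
            noise = rng.normal(scale=1.0, size=n_samples)
            waveforms.append(signal + noise)
            labels.append(label)
    return np.array(waveforms), np.array(labels)


# Build features: one row of absolute spectrum values per event.
waveforms, labels = make_synthetic_events()
features = amplitude_spectrum(waveforms)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

# PCA reduces the feature dimension before the Random Forest is trained.
# n_components and n_estimators are placeholder values.
model = make_pipeline(
    PCA(n_components=20, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

Repeating the fit while varying n_estimators and n_components in a loop
would mimic the kind of selection tests described above.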