Transfer Learning to Build a Scalable Model for the Declustering of
Earthquake Catalogs
Abstract
The rate of background seismicity, that is, of earthquakes not directly
triggered by another earthquake, is indicative of the stressing rate of
fault systems in active seismic regions. However, aftershock sequences
often dominate the seismicity rate, masking this background seismicity.
The identification of aftershocks in earthquake catalogs, also known as
declustering, is thus an important problem in seismology. Most solutions
rely on spatio-temporal distances between successive events, as in the
widely used Nearest-Neighbor-Distance algorithm.
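To make the metric concrete, a minimal sketch of the nearest-neighbor distance computation is given below; the fractal dimension d, the Gutenberg-Richter b-value, and the planar-distance approximation are illustrative assumptions, not the exact choices made in this work.

```python
# Minimal sketch of the nearest-neighbor space-time distance
# eta_ij = dt_ij * r_ij**d * 10**(-b * m_i), where event i < j is a
# candidate parent of event j. d and b are illustrative assumptions.
import numpy as np

def nearest_neighbor_distance(t, x, y, m, d=1.6, b=1.0):
    """t: origin times (sorted); x, y: epicentral coordinates (km);
    m: magnitudes. Returns eta and the nearest-parent index per event."""
    n = len(t)
    eta = np.full(n, np.inf)
    parent = np.full(n, -1)
    for j in range(1, n):
        dt = t[j] - t[:j]                         # inter-event times
        r = np.hypot(x[j] - x[:j], y[j] - y[:j])  # epicentral distances
        cand = dt * np.maximum(r, 1e-3) ** d * 10.0 ** (-b * m[:j])
        k = int(np.argmin(cand))                  # closest candidate parent
        eta[j], parent[j] = cand[k], k
    return eta, parent
```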
This algorithm assumes that the space-time metric follows a bimodal
distribution, with one mode corresponding to the background seismicity
and the other to the aftershocks. Constraining these two distributions
is key to accurately separating the aftershocks from the background
events. Recent work often splits the population with a single linear
threshold on the nearest-neighbor distance, ignoring the overlap between
the two populations and thus misidentifying both background earthquakes
and aftershock sequences.
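Rather than a single hard threshold, the overlap can be handled probabilistically, for instance by fitting a two-component Gaussian mixture to the log nearest-neighbor distances (one of the clustering approaches compared below). A minimal sketch with placeholder values:

```python
# Hedged sketch: a two-component Gaussian mixture on log10(eta) gives
# each event a membership probability instead of a hard yes/no split.
# The synthetic log_eta values below are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
log_eta = np.concatenate([
    rng.normal(-6.0, 1.0, 400),   # stand-in for the aftershock mode
    rng.normal(-2.5, 1.0, 600),   # stand-in for the background mode
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(log_eta)
aftershock = int(np.argmin(gmm.means_))           # lower eta = triggered
p_aftershock = gmm.predict_proba(log_eta)[:, aftershock]
# Events with p_aftershock near 0.5 sit in the overlap region that a
# single linear threshold would classify arbitrarily.
```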
We revisit this problem here with both machine-learning classification
and clustering algorithms. After testing several popular algorithms, we
show that a random forest trained on various synthetic catalogs
generated by an Epidemic-Type Aftershock Sequence (ETAS) model
outperforms approaches such as K-means, Gaussian-mixture models, and
support vector classifiers. We evaluate different data features and
discuss their importance in classifying aftershocks.
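A minimal sketch of this supervised step is shown below; the features are random placeholders standing in for quantities such as log10(eta), its rescaled time and space components, and magnitude, while in the real workflow the labels come from the ETAS simulator, which records each event's parent.

```python
# Hedged sketch: train a random forest on features from synthetic ETAS
# catalogs where the aftershock/background labels are known exactly.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20_000, 4))  # placeholder features
y_train = (X_train[:, 0] + 0.3 * X_train[:, 1] < 0).astype(int)  # 1 = aftershock

clf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(clf.feature_importances_)   # relative importance of each feature
```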
We then apply our model to two real earthquake catalogs: the relocated
Southern California Earthquake Center catalog and the GeoNet catalog of
New Zealand. Our model adapts well to these two different tectonic
contexts, highlighting differences in aftershock productivity between
crustal and intermediate-depth seismicity.
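Continuing the sketch above, the transfer step amounts to computing the same features for a real catalog and applying the already-trained classifier; X_real below is a placeholder for features derived from, e.g., the relocated SCEC or GeoNet catalog.

```python
# Hedged sketch of the transfer step, reusing clf from the training
# sketch above; X_real stands in for real-catalog features.
import numpy as np

rng = np.random.default_rng(2)
X_real = rng.normal(size=(5_000, 4))    # placeholder for real features
labels = clf.predict(X_real)            # 1 = aftershock, 0 = background
print(f"background fraction: {1.0 - labels.mean():.2f}")
```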