Rene Steinmann and three co-authors

Continuous seismograms contain a wealth of information, with a large variety of signals of different origins. Identifying these signals is a crucial step toward understanding the physics of geological objects. We propose a strategy to identify classes of seismic signals in continuous single-station seismograms in an unsupervised fashion. Our strategy relies on extracting meaningful waveform features with a deep scattering network combined with an independent component analysis. Based on the extracted features, agglomerative clustering then groups the waveforms hierarchically and records the successive merges in a dendrogram. We use the dendrogram to explore the seismic data and identify different classes of signals. To test our strategy, we investigate a two-day-long seismogram recorded in the vicinity of the North Anatolian Fault, Turkey. We analyze the occurrence rate, spectral characteristics, size, and waveform and envelope characteristics of the automatically inferred clusters. At a low level in the cluster hierarchy, we obtain three clusters related to anthropogenic and ambient seismic noise and one cluster related to earthquake activity. At a high level in the cluster hierarchy, we identify both a seismic crisis comprising more than 200 repeating events and high-frequency signals of anthropogenic origin with correlated envelopes. The application shows that the cluster hierarchy helps to identify particular families of signals and to extract subclusters for further analysis. This is valuable when certain types of signals, such as earthquakes, are under-represented in the data. Because it is entirely data-driven, the proposed method may also discover new types of signals.
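The hierarchical clustering step described above can be illustrated with a small, pure-Python sketch. It assumes the feature vectors (in the abstract, scattering-network coefficients reduced by independent component analysis) have already been computed; here they are hypothetical 2-D points. Ward linkage is an assumption for illustration, and the helper names `agglomerate` and `cut_tree` are hypothetical. Each recorded merge `(i, j, cost, size)` is one step of the dendrogram, and cutting the tree after a given number of merges yields a flat clustering at that level of the hierarchy.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each cluster is a tuple (size, centroid, member_indices).
    """
    na, ca = a[0], a[1]
    nb, cb = b[0], b[1]
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq


def agglomerate(features):
    """Merge clusters bottom-up with Ward linkage; return the merge history."""
    clusters = {i: (1, list(f), [i]) for i, f in enumerate(features)}
    next_id = len(features)
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest merge cost.
        (i, j), cost = min(
            (((p, q), ward_cost(clusters[p], clusters[q]))
             for p, q in combinations(sorted(clusters), 2)),
            key=lambda item: item[1],
        )
        ni, ci, mi = clusters.pop(i)
        nj, cj, mj = clusters.pop(j)
        n = ni + nj
        centroid = [(ni * x + nj * y) / n for x, y in zip(ci, cj)]
        clusters[next_id] = (n, centroid, mi + mj)
        merges.append((i, j, cost, n))  # one dendrogram step
        next_id += 1
    return merges


def cut_tree(n_samples, merges, k):
    """Replay the first n_samples - k merges to obtain k flat clusters."""
    members = {i: [i] for i in range(n_samples)}
    next_id = n_samples
    for i, j, _, _ in merges[: n_samples - k]:
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    labels = [0] * n_samples
    # Number clusters by their smallest member index for stable labels.
    for label, ids in enumerate(sorted(members.values(), key=min)):
        for s in ids:
            labels[s] = label
    return labels


# Hypothetical feature vectors: two well-separated groups.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]]
merges = agglomerate(features)
labels = cut_tree(len(features), merges, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

The naive search costs O(n³) and only suits toy inputs. For real waveform catalogs, `scipy.cluster.hierarchy.linkage(..., method="ward")` returns a linkage matrix whose rows have the same layout (two cluster indices, a merge distance, a cluster size), and `fcluster` cuts it at a chosen level.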
The rate of background seismicity in active seismic regions, that is, of earthquakes not directly triggered by another earthquake, is indicative of the stressing rate of fault systems. However, aftershock sequences often dominate the seismicity rate, masking this background seismicity. Identifying aftershocks in earthquake catalogs, a task known as declustering, is thus an important problem in seismology. Most solutions rely on spatio-temporal distances between successive events, such as the Nearest-Neighbor-Distance algorithm, which is widely used in various contexts. This algorithm assumes that the space-time metric follows a bimodal distribution, with one peak related to the background seismicity and another representing the aftershocks. Constraining these two distributions is key to accurately separating aftershocks from background events. Recent work often applies a linear split based on a nearest-neighbor distance threshold, which ignores the overlap between the two populations and results in the misidentification of background earthquakes and aftershock sequences. We revisit this problem here with both machine-learning classification and clustering algorithms. After testing several popular algorithms, we show that a random forest trained on various synthetic catalogs generated by an Epidemic Type Aftershock Sequence (ETAS) model outperforms approaches such as K-means, Gaussian mixture models, and support vector classifiers. We evaluate different data features and discuss their importance in classifying aftershocks. We then apply our model to two actual earthquake catalogs: the relocated Southern California Earthquake Center catalog and the GeoNet catalog of New Zealand. Our model adapts well to these two different tectonic contexts, highlighting the differences in aftershock productivity between crustal and intermediate-depth seismicity.
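To make the threshold-based baseline concrete, the sketch below computes a nearest-neighbor distance for each event using the commonly used form of the metric, η_ij = t_ij · r_ij^d_f · 10^(−b·m_i), where event i precedes event j, t_ij is the inter-event time, r_ij the inter-event distance, m_i the magnitude of the earlier event, b the Gutenberg-Richter b-value, and d_f a fractal dimension of epicenters. It then applies a linear split on log10(η). This is the simple thresholding approach the abstract argues against, not the random forest classifier it proposes. The function names, planar coordinates in kilometers, and the parameter values (b = 1.0, d_f = 1.6, a distance floor r_min = 0.01 km, threshold log10 η0 = −5) are all hypothetical choices for illustration.

```python
import math


def nearest_neighbor_distances(events, b=1.0, d_f=1.6, r_min=0.01):
    """For each event j, find the earlier event i minimizing
    eta_ij = t_ij * r_ij**d_f * 10**(-b * m_i).

    events: list of (t_years, x_km, y_km, magnitude), sorted by time.
    Returns a list of (parent_index or None, eta) pairs.
    """
    result = []
    for j, (tj, xj, yj, _) in enumerate(events):
        parent, best = None, math.inf
        for i in range(j):
            ti, xi, yi, mi = events[i]
            dt = tj - ti
            if dt <= 0:  # only strictly earlier events can be parents
                continue
            # Hypothetical distance floor so co-located events avoid eta = 0.
            r = max(math.hypot(xj - xi, yj - yi), r_min)
            eta = dt * r ** d_f * 10 ** (-b * mi)
            if eta < best:
                parent, best = i, eta
        result.append((parent, best))
    return result


def linear_split(nnd, log_eta0=-5.0):
    """Hard threshold: 'clustered' (aftershock candidate) if log10(eta) < log_eta0."""
    labels = []
    for parent, eta in nnd:
        if parent is not None and math.log10(eta) < log_eta0:
            labels.append("clustered")
        else:
            labels.append("background")
    return labels


# Hypothetical toy catalog: (time in years, x km, y km, magnitude).
events = [
    (0.000, 0.0, 0.0, 6.0),      # M6 mainshock
    (0.001, 1.0, 0.0, 3.0),      # about 9 hours later, 1 km away
    (5.000, 200.0, 100.0, 3.0),  # 5 years later, about 220 km away
]
nnd = nearest_neighbor_distances(events)
labels = linear_split(nnd)
print(labels)  # ['background', 'clustered', 'background']
```

A single cut on log10(η) assigns every event on one side of the threshold to the same class, so events in the overlap between the two modes of the distribution are necessarily misassigned. Training a classifier on synthetic ETAS catalogs, where the true parent of each event is known, is one way to use additional features beyond this one-dimensional cut.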