Abstract
Introduction
We compare a CNN model against a simple logistic regression model to assess the benefits of simple models. For feature extraction, we use the tsfresh package. We find that logistic regression detects five events, compared to six by the CNN, on data not used for training. It also takes less time to train than the CNN.

Method

1. Data selection and pre-processing
We use seismic records from the G-network of the
Groningen area for detecting low-magnitude earthquakes. The network comprises 70 borehole stations, each containing five sensors: one surface accelerometer and four velocity sensors installed at 50 m depth intervals. We use data from 47 events listed in the KNMI catalogue between October 1, 2017 and February 28, 2018. For training and validation, we use the lowest-depth velocity sensor at five stations (G19, G23, G24, G29, G67) over the five months. For testing, we use a separate four-hour dataset.

2. Feature extraction and analysis
The tsfresh package
lets us identify and extract relevant features using statistical measures such as approximate entropy, skewness, variance, and standard deviation. It calculated 293 relevant features. We then used univariate selection and correlation analysis to find the best combination and selected the top 20 features.

3. Models used
We compare the performance of the logistic regression and CNN models. The CNN model
consists of three convolutional layers, each followed by a max-pooling layer. We then flatten the output of the last pooling layer and pass it through two fully connected layers followed by an output layer, which determines whether the input is signal or noise. The CNN is trained with the Adam optimizer, using binary cross-entropy as the loss function.

Result
Two events
were already listed in the KNMI catalogue for the test data (at 00:12:28 and 00:57:46), with magnitudes M1.9 and M2.2. Table 1 shows the list of
operations from which we calculate the top 20 features. Table 2 shows
the number of correct predictions, the training time, and the timestamps of
uncatalogued events detected for both models. Both models were able to
detect the events listed in the catalogue in addition to other
uncatalogued events.

Conclusion
Simple models are easier to understand, debug, train, and interpret than complex black-box models. A detailed study on diverse data is needed to improve our understanding.
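
The feature-based pipeline described in the Method section (statistical features per window, followed by logistic regression) can be illustrated with a minimal sketch. This is not the study's implementation. It uses only the Python standard library instead of tsfresh, and the synthetic noise and event windows, the burst shape, the choice of three features, and all function names and hyperparameters are assumptions made for the example.

```python
import math
import random
import statistics


def window_features(window):
    """Return [variance, standard deviation, skewness] of one window."""
    mean = statistics.fmean(window)
    var = statistics.pvariance(window, mu=mean)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0, 0.0, 0.0]
    # Population skewness: third central moment divided by std cubed.
    skew = sum((x - mean) ** 3 for x in window) / (len(window) * std ** 3)
    return [var, std, skew]


def make_window(rng, has_event, n=200):
    """Hypothetical synthetic data: Gaussian noise, plus a decaying burst for events."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if has_event:
        onset = rng.randrange(20, 100)
        for i in range(onset, n):
            t = i - onset
            samples[i] += 6.0 * math.exp(-t / 30.0) * math.sin(0.6 * t)
    return samples


def make_dataset(rng, n_windows):
    labels = [i % 2 for i in range(n_windows)]  # alternate noise (0) and event (1)
    feats = [window_features(make_window(rng, bool(y))) for y in labels]
    return feats, labels


def fit_scaler(X):
    """Per-feature mean and standard deviation, estimated on the training set."""
    cols = list(zip(*X))
    return [statistics.fmean(c) for c in cols], [statistics.pstdev(c) or 1.0 for c in cols]


def scale(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
X_train, y_train = make_dataset(rng, 200)
X_test, y_test = make_dataset(rng, 100)
means, stds = fit_scaler(X_train)
w, b = train_logistic(scale(X_train, means, stds), y_train)
preds = [int(predict_proba(w, b, x) >= 0.5) for x in scale(X_test, means, stds)]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic windows: {accuracy:.2f}")
```

With real data, the handcrafted features would be replaced by the selected tsfresh feature matrix and the synthetic windows by labelled seismic traces. The learned weights `w` remain directly inspectable, one per feature, which is the interpretability advantage the comparison is concerned with.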