Clustering Analysis Methods for GNSS Observations: A Data-Driven
Approach to Identifying California’s Major Faults
Abstract
We present a data-driven approach to clustering or grouping Global
Navigation Satellite System (GNSS) stations according to their observed
velocities, displacements or other selected characteristics. Clustering
GNSS stations has the potential for identifying useful scientific
information and is a necessary initial step in other analyses, such as
detecting aseismic transient signals (Granat et al., 2013). Desired
features of the data can be selected for clustering, including some
subset of displacement or velocity components, uncertainty estimates,
station location, and other relevant information. Based on those
selections, the clustering procedure autonomously groups the GNSS
stations according to a selected clustering method. We have implemented
this approach as a Python application, allowing us to draw upon the full
range of open source clustering methods available in Python’s
scikit-learn package (Pedregosa et al., 2011). The application returns
the stations labeled by group as a table and a color-coded KML file and is
designed to work with the GNSS information available from GeoGateway
(Heflin et al., 2020; Donnellan et al., 2021) but is easily extensible.
We focused on California and western Nevada. The results show partitions
that follow faults or geologic boundaries, including partitions associated
with recent large earthquakes and post-seismic motion. The San Andreas fault system is
most prominent, reflecting Pacific-North American plate boundary motion.
Deformation reflected as class boundaries is distributed north and south
of the central California creeping section. For most models, the
southernmost San Andreas fault connects with the Eastern California
Shear Zone (ECSZ) rather than continuing through the San Gorgonio Pass.
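
The workflow summarized above (select features, cluster, return labeled stations as a table and a color-coded KML file) can be illustrated with a minimal sketch. This is not the authors' application. The station IDs and velocities are synthetic placeholders rather than GeoGateway data, the function names `cluster_stations` and `to_kml` are hypothetical, and k-means is assumed here only as one example of the scikit-learn clustering methods the abstract refers to.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical records: (id, lon, lat, east velocity, north velocity), mm/yr.
# Synthetic placeholder values, NOT real GeoGateway output.
stations = [
    ("STA1", -120.9, 36.0, -25.1, 24.8),
    ("STA2", -121.0, 36.2, -24.7, 25.3),
    ("STA3", -121.2, 36.1, -25.4, 24.9),
    ("STA4", -119.8, 36.4, -10.2, 9.8),
    ("STA5", -119.6, 36.6, -9.9, 10.4),
    ("STA6", -119.7, 36.3, -10.5, 10.1),
]


def cluster_stations(records, n_clusters=2, columns=(3, 4), seed=0):
    """Return one integer group label per station.

    `columns` selects which record fields are used as clustering features
    (here the east and north velocity components).
    """
    features = np.array([[rec[c] for c in columns] for rec in records], dtype=float)
    # Standardize so that no single feature dominates the distance metric.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled).tolist()


def to_kml(records, labels, colors=("ff0000ff", "ffff0000", "ff00ff00")):
    """Build a minimal KML string with one colored placemark per station.

    KML colors are hexadecimal in aabbggrr order (opaque red, blue, green here).
    """
    placemarks = []
    for (name, lon, lat, *_), label in zip(records, labels):
        color = colors[label % len(colors)]
        placemarks.append(
            f"<Placemark><name>{name} (group {label})</name>"
            f"<Style><IconStyle><color>{color}</color></IconStyle></Style>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )


labels = cluster_stations(stations)
kml_text = to_kml(stations, labels)

# Table of stations labeled by group.
for (name, lon, lat, ve, vn), label in zip(stations, labels):
    print(f"{name}\t{lon}\t{lat}\t{ve}\t{vn}\t{label}")
```

In this sketch, changing `columns` (for example, to include longitude and latitude) changes which characteristics drive the grouping, and swapping `KMeans` for another scikit-learn estimator that provides `fit_predict` changes the clustering method without altering the output steps.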