An algorithm for unsupervised partitioning of geoscientific datasets using flexible similarity metrics
Abstract
A simple yet flexible and robust unsupervised classification algorithm
is described for efficiently partitioning a dataset into compact,
non-overlapping groups, or classes, based on pairwise similarity. Unlike
most clustering algorithms, it does not assume that natural clusters
exist in the dataset, although clusters that are present may be
preferentially assigned to one or more classes. Nor does the method
require data objects to be compared within any coordinate system;
instead, the user may quantify pairwise similarity using almost any
conceivable criterion. For these reasons, the method lends itself to
certain geoscientific applications for which conventional clustering
methods are unsuited, including the two non-trivial and distinctly
different datasets presented here as examples. The computer memory
required for the user-defined similarity matrix is 4N^2 bytes, which is
the sole practical limit on the size N of a dataset that can be
classified directly. Much larger datasets can readily be accommodated
by assigning their members to classes previously determined from a
representative subset.
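The abstract states only that similarity is user-defined and that the full matrix costs 4N^2 bytes. The following minimal sketch illustrates both points, assuming float32 storage (4 bytes per entry) and a symmetric similarity function. The rule for assigning new objects to classes found from a subset (highest mean similarity to the subset's class members) is a hypothetical stand-in, not the paper's procedure, and the function names `similarity_matrix` and `assign_to_classes` are illustrative.

```python
import numpy as np

def similarity_matrix(objects, sim):
    """Fill an N x N float32 matrix with user-defined pairwise similarities.

    `sim` is any callable sim(a, b) -> float, so objects need not live in a
    coordinate system. float32 uses 4 bytes per entry: 4 * N**2 bytes total.
    """
    n = len(objects)
    S = np.empty((n, n), dtype=np.float32)
    for i in range(n):
        S[i, i] = sim(objects[i], objects[i])
        for j in range(i + 1, n):
            # Assumption: similarity is symmetric, so compute each pair once.
            S[i, j] = S[j, i] = sim(objects[i], objects[j])
    return S

def assign_to_classes(new_objects, subset, labels, sim):
    """Hypothetical rule: assign each new object to the class whose members
    in the already-classified subset have the highest mean similarity to it."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    out = []
    for x in new_objects:
        s = np.array([sim(x, y) for y in subset])
        means = [s[labels == c].mean() for c in classes]
        out.append(classes[int(np.argmax(means))])
    return np.array(out)

# Usage: scalar objects, similarity = negative absolute difference.
neg_dist = lambda a, b: -abs(a - b)
subset = [0.0, 0.1, 5.0, 5.2]
S = similarity_matrix(subset, neg_dist)
print(S.nbytes)  # 64, i.e. 4 * 4**2
print(assign_to_classes([0.05, 4.9], subset, [0, 0, 1, 1], neg_dist))  # [0 1]
```

At N = 10,000 the same storage rule gives 4 × 10^8 bytes (about 400 MB), which is why classifying a representative subset and then assigning the remaining objects, at a cost proportional to the subset size per object, scales much further than classifying directly.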