The Pairwise Similarity Partitioning algorithm: a method for
unsupervised partitioning of geoscientific and other datasets using
arbitrary similarity metrics

Grant Petty

University of Wisconsin-Madison
Corresponding Author: gwpetty@wisc.edu
Abstract
A simple yet flexible and robust algorithm is described for fully
partitioning an arbitrary dataset into compact, non-overlapping groups
or classes, sorted by size, based entirely on a pairwise similarity
matrix and a user-specified similarity threshold. Unlike many clustering
algorithms, there is no assumption that natural clusters exist in the
dataset, though clusters, when present, may be preferentially assigned
to one or more classes. The method also does not require data objects to
be compared within any coordinate system but rather permits the user to
define pairwise similarity using almost any conceivable criterion. The
method therefore lends itself to certain geoscientific applications for
which conventional clustering methods are unsuited, including two
non-trivial and distinctly different datasets presented as examples. In
addition to identifying large classes containing numerous similar
dataset members, it is also well suited for isolating rare or anomalous
members of a dataset. The method is inductive, in that prototypes
identified in a representative subset of a larger dataset can be used to
classify the remainder.
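
The abstract does not spell out the partitioning steps, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It assumes a greedy scheme: the unassigned member with the most unassigned neighbors at or above the threshold becomes a prototype, those neighbors form its class, and the process repeats until every member is assigned. The names greedy_partition and classify, the tie-breaking rule, and the toy similarity function are all hypothetical and introduced here purely for illustration.

```python
def greedy_partition(sim, threshold):
    """Partition indices 0..n-1 given a symmetric similarity matrix `sim`.

    Assumed greedy scheme (not taken from the source): returns a list of
    (prototype, members) pairs, largest class first.
    """
    unassigned = set(range(len(sim)))
    classes = []
    while unassigned:
        def candidates(i):
            # Unassigned members at least `threshold` similar to member i.
            return [j for j in unassigned if j == i or sim[i][j] >= threshold]

        # Prototype = member with the largest candidate class;
        # ties go to the lowest index (an arbitrary choice).
        prototype = max(sorted(unassigned), key=lambda i: len(candidates(i)))
        members = sorted(candidates(prototype))
        classes.append((prototype, members))
        unassigned -= set(members)
    return classes


def classify(sim_to_prototypes, threshold):
    """Inductive step: index of the most similar prototype, or None if no
    prototype reaches the threshold (the object is left unclassified)."""
    best = max(range(len(sim_to_prototypes)), key=lambda k: sim_to_prototypes[k])
    return best if sim_to_prototypes[best] >= threshold else None


# Toy data: the similarity is an arbitrary user-defined function of two objects.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]

def similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))

sim = [[similarity(a, b) for b in values] for a in values]
classes = greedy_partition(sim, threshold=0.8)

# Classify a new object against the prototypes found above.
prototype_values = [values[p] for p, _ in classes]
label = classify([similarity(5.05, v) for v in prototype_values], threshold=0.8)
```

In this sketch the class sizes come out in non-increasing order because each new prototype is chosen from a shrinking pool, and an isolated member such as the last toy value ends up in a class of its own, which is how rare or anomalous members would surface. Only the similarity matrix is consulted during partitioning, so no coordinate system is required.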