The Chameleon algorithm
Chameleon models the feature space as a k-nearest neighbour graph
(sparse graph) with samples forming vertices connected by links that are
proportional to pairwise similarity between samples (Figure 2). The user
specifies the number of links between samples (neighbourhood range) and
then in the first phase, links are progressively dissolved (in order of
increasing similarity) until a user-specified number of sub-partitions
has formed. In this partitioning phase the algorithm seeks to minimise
the summed length of all links hence minimising the affinity between
samples in different sub-partitions (Karypis, 1999). Sub-partitions are
then (optionally) merged using a hierarchical agglomerative clustering
algorithm to resolve the number of groups required of the solution. An
advantage of this approach is that it encapsulates the concept of
environmental /compositional continua by weighting cluster
interconnectivity over homogeneity. That is, samples that are distantly
related may still share a cluster if they are linked by strongly
interconnected neighbours. One of the key features of the Chameleon
algorithm is that the structure of inter-sample relationships is
preserved through the partitioning phase because co-membership of
sub-partitions is dependent on pairwise inter-sample connectivity. In
contrast, traditional algorithms merge or split samples progressively
and the outcome at each step depends on comparing samples with
intermediate clusters, the compositional characteristics of which are
artificial and reflect the range of the samples merged (Han et
al ., 2012).