A comparison of central-tendency and interconnectivity approaches to
clustering multivariate data with irregular structure
Abstract
Questions: Most clustering methods assume data are structured
as discrete hyper-spheroidal clusters to be evaluated by measures of
central-tendency. If vegetation data do not conform to this model, they
may be clustered incorrectly. What are the implications
for cluster stability and evaluation if clusters are of irregular shape
or density? Location: Southeast Australia. Methods: We define
misplacement as the placement of a sample in a cluster distinct from
that of its nearest neighbour and hypothesise that optimising
homogeneity incurs the cost of higher rates of misplacement. The
Chameleon algorithm emphasises interconnectivity and thus is sensitive
to the shape and distribution of clusters. We contrasted its solutions
with those of traditional non-hierarchical and hierarchical
(agglomerative and divisive) approaches. Results: Chameleon-derived
solutions had lower rates of misplacement and only marginally higher
heterogeneity than those of k-means in the range 15–60 clusters, but
their metrics converged with larger numbers of clusters. Solutions
derived by agglomerative clustering had the best metrics (and divisive
clustering the worst), but both produced high-level solutions inferior
to those of Chameleon because they merged distantly related clusters.
Conclusions: Our results suggest that Chameleon may have an advantage
over traditional algorithms when data exhibit discontinuities and
variable structure, potentially producing more stable solutions (due to
lower rates of misplacement), but scoring lower on traditional metrics
of central-tendency. Chameleon’s advantages are less obvious in the
partitioning of data from continuous gradients; however, its graph-based
partitioning protocol facilitates hierarchical integration of solutions.
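
The misplacement measure defined in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Euclidean distance between samples (the abstract does not state which dissimilarity was used), and the function names misplacement_rate and mean_centroid_distance are hypothetical. The second function is one possible central-tendency measure of heterogeneity, included only to show the two quantities side by side.

```python
import numpy as np


def misplacement_rate(X, labels):
    """Fraction of samples whose nearest neighbour lies in a different cluster.

    Assumption: Euclidean distance; the source does not name the measure.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix (adequate for small data sets).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A sample must not count as its own nearest neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    return float(np.mean(labels != labels[nearest]))


def mean_centroid_distance(X, labels):
    """Mean distance of samples to their own cluster centroid.

    Assumption: a stand-in for a central-tendency heterogeneity metric.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total / len(X)


if __name__ == "__main__":
    # Toy data: two well-separated groups along a single axis.
    X = np.array([[0.0], [1.0], [2.5], [10.0], [11.0], [12.5]])
    natural = np.array([0, 0, 0, 1, 1, 1])
    shifted = np.array([0, 0, 1, 1, 1, 1])  # third sample moved to cluster 1
    for name, lab in (("natural", natural), ("shifted", shifted)):
        print(name, misplacement_rate(X, lab), mean_centroid_distance(X, lab))
```

In the toy data, moving the third sample away from the cluster of its nearest neighbour raises the misplacement rate from zero to one in six. On real data with irregular cluster shapes the two measures need not move together, which is the trade-off the hypothesis addresses.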