A comparison of central-tendency and interconnectivity approaches to clustering multivariate data with irregular structure
  • Mark Tozer
  • David Keith
Mark Tozer
University of New South Wales

Corresponding Author: [email protected]

David Keith
University of New South Wales

Abstract

Questions: Most clustering methods assume that data are structured as discrete hyper-spheroidal clusters to be evaluated by measures of central-tendency. If vegetation data do not conform to this model, they may be clustered incorrectly. What are the implications for cluster stability and evaluation if clusters are of irregular shape or density?

Location: Southeast Australia.

Methods: We define misplacement as the placement of a sample in a cluster other than the one containing its nearest neighbour, and hypothesise that optimising homogeneity incurs the cost of higher rates of misplacement. The Chameleon algorithm emphasises interconnectivity and is thus sensitive to the shape and distribution of clusters. We contrasted its solutions with those of traditional non-hierarchical and hierarchical (agglomerative and divisive) approaches.

Results: Chameleon-derived solutions had lower rates of misplacement and only marginally higher heterogeneity than those of k-means in the range 15–60 clusters, but the metrics of the two methods converged with larger numbers of clusters. Solutions derived by agglomerative clustering had the best metrics (and divisive clustering the worst), but both produced high-level solutions inferior to those of Chameleon because they merged distantly related clusters.

Conclusions: Our results suggest that Chameleon may have an advantage over traditional algorithms when data exhibit discontinuities and variable structure, potentially producing more stable solutions (due to lower rates of misplacement) while scoring lower on traditional metrics of central-tendency. Chameleon’s advantages are less obvious in the partitioning of data from continuous gradients; however, its graph-based partitioning protocol facilitates hierarchical integration of solutions.
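The misplacement measure defined in the Methods can be illustrated with a minimal sketch. It counts the fraction of samples whose nearest neighbour has been assigned to a different cluster. The function name `misplacement_rate` and the use of Euclidean distance are assumptions made for illustration only; the abstract does not state which dissimilarity measure the authors used, and vegetation studies often use other measures.

```python
from math import dist  # Euclidean distance, Python 3.8+


def misplacement_rate(samples, labels):
    """Fraction of samples whose nearest neighbour is in another cluster.

    Hypothetical helper: Euclidean distance is an assumption, not
    necessarily the dissimilarity measure used in the paper.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    misplaced = 0
    for i in range(n):
        # Index of the closest sample other than sample i itself
        # (ties are resolved in favour of the lowest index).
        nearest = min(
            (k for k in range(n) if k != i),
            key=lambda k: dist(samples[i], samples[k]),
        )
        if labels[nearest] != labels[i]:
            misplaced += 1
    return misplaced / n


# Two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Clustering that keeps each pair together: no sample is misplaced.
print(misplacement_rate(points, [0, 0, 1, 1]))  # 0.0

# Clustering that splits the first pair: both of its members are misplaced.
print(misplacement_rate(points, [0, 1, 1, 1]))  # 0.5
```

The pairwise search is quadratic in the number of samples, which is adequate for a toy example; a spatial index or a precomputed distance matrix would be the usual choice for larger data sets.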
22 Aug 2022: Submitted to Ecology and Evolution
06 Sep 2022: Submission Checks Completed
06 Sep 2022: Assigned to Editor
06 Sep 2022: Review(s) Completed, Editorial Evaluation Pending
23 Oct 2022: Editorial Decision: Accept
Nov 2022: Published in Ecology and Evolution, volume 12, issue 11. 10.1002/ece3.9496