A hierarchical ensemble manifold methodology for new knowledge on
spatial data: An application to ocean physics
Abstract
Algorithms to determine regions of interest in large or highly complex
and nonlinear data are becoming increasingly important. Novel
methodologies from computer science and dynamical systems are well
placed as analysis tools, but are underdeveloped for applications within
the Earth sciences, and many produce misleading results. I present a
novel and general workflow, the Native Emergent Manifold Interrogation
(NEMI) method, which is easy to use and widely applicable. NEMI is able
to quantify and leverage the highly complex ‘latent’ space presented by
noisy, nonlinear and unbalanced data common in the Earth sciences. NEMI
uses dynamical systems and probability theory to strengthen
associations within the data, simplifying covariance structures, by
means of a manifold (Riemannian) methodology that uses domain-specific
charting of the underlying space. On the manifold, an agglomerative
clustering methodology is applied to isolate the now observable areas of
interest. The construction of the manifold introduces a stochastic
component, which is beneficial to the analysis as it enables latent space
regularization. NEMI uses an ensemble methodology to quantify the
sensitivity of the results to noise. The areas of interest, or clusters,
are sorted within individual ensemble members and co-located across the
set. A metric such as a majority vote, entropy, or similar then
quantifies whether a data point within the original data belongs to a
certain cluster. NEMI is clustering-method agnostic, but the use of an
agglomerative methodology and sorting in the described case study allows
a filtering, or nesting, of clusters to tailor to a desired application.
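
The ensemble step described above can be illustrated with a minimal sketch. It is not the NEMI implementation: the function names are hypothetical, clusters are sorted by size (an assumed sorting criterion), and clusters sharing a rank after sorting are treated as co-located across members, which is a simplification. Given one label array per ensemble member, the sketch returns a majority-vote label and a per-point entropy as an uncertainty measure.

```python
import numpy as np

def sort_labels_by_size(labels):
    """Relabel clusters so 0 is the largest, 1 the next largest, and so on.

    Sorting by cluster size is an assumption made for this sketch.
    """
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    order = ids[np.argsort(-counts, kind="stable")]  # largest cluster first
    lookup = {old: new for new, old in enumerate(order)}
    return np.array([lookup[label] for label in labels])

def ensemble_vote(ensemble):
    """Majority vote and entropy over sorted labels.

    ensemble: array of shape (n_members, n_points) holding integer labels
    that have already been sorted within each member.
    """
    ensemble = np.asarray(ensemble)
    n_members = ensemble.shape[0]
    n_clusters = int(ensemble.max()) + 1
    # counts[k, i] = number of members assigning point i to cluster k
    counts = np.stack([(ensemble == k).sum(axis=0) for k in range(n_clusters)])
    majority = counts.argmax(axis=0)
    p = counts / n_members
    # Shannon entropy per point; terms with p == 0 contribute nothing
    with np.errstate(divide="ignore", invalid="ignore"):
        entropy = -np.where(p > 0, p * np.log(p), 0.0).sum(axis=0)
    return majority, entropy

# Three hypothetical ensemble members with arbitrary raw cluster ids
members = [
    [5, 5, 5, 2, 2, 9],
    [1, 1, 1, 0, 0, 2],
    [3, 3, 3, 3, 7, 8],
]
sorted_members = np.array([sort_labels_by_size(m) for m in members])
majority, entropy = ensemble_vote(sorted_members)
```

Points on which all members agree have zero entropy, while points whose assignment changes between members have positive entropy. This is one way of expressing the sensitivity of the result to the stochastic component.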