Automated identification of characteristic droplet size distributions in
stratocumulus clouds utilizing a data clustering algorithm
Abstract
Droplet-level interactions in clouds are often parameterized by a
modified gamma distribution fitted to a “global” droplet size distribution. Do
“local” droplet size distributions of relevance to microphysical
processes look like these average distributions? This paper describes an
algorithm to search for and classify characteristic size distributions
within a cloud. The approach combines hypothesis testing, specifically
the Kolmogorov-Smirnov (KS) test, and a widely used machine-learning
algorithm for identifying clusters of samples with similar properties:
Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The
two-sample KS test does not presume any specific distribution, is
parameter-free, and avoids biases from binning. Importantly, the number
of clusters is not an input parameter of the DBSCAN algorithm, but is
independently determined in an unsupervised fashion. As implemented, the
clustering operates on an abstract space constructed from the KS test
results, so samples in a cluster need not be spatially correlated. The
method is explored using
data obtained from the Holographic Detector for Clouds (HOLODEC) deployed
during the Aerosol and Cloud Experiments in the Eastern North Atlantic
(ACE-ENA) field campaign. The algorithm finds evidence of clusters of
nearly identical local size distributions. It
is found that cloud segments have as few as one and as many as seven
characteristic size distributions. To assess its robustness, the
algorithm is tested on a synthetic dataset, where it successfully
recovers the predefined distributions at plausible noise levels. The
algorithm is general and is expected to be useful in other applications,
such as remote sensing of cloud and rain properties.
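The KS-plus-DBSCAN procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the abstract space is built by taking the two-sample KS statistic D between every pair of local size distributions as a distance, and it clusters that precomputed distance matrix with a bare-bones DBSCAN. The function names (ks_distance, dbscan_precomputed, cluster_dsds), the values eps=0.15 and min_samples=3, and the gamma-distributed synthetic diameters are all hypothetical choices made for illustration.

```python
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)|.

    Computed from the empirical CDFs of the raw diameters, so no binning
    and no assumed distribution shape are involved. The supremum is
    attained at one of the pooled sample points.
    """
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def dbscan_precomputed(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix.

    Returns one label per sample; -1 marks noise. The number of clusters
    is an output, not an input. min_samples counts the point itself.
    """
    n = len(dist)
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # noise unless later reached from a core point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            neighbors_j = [k for k in range(n) if dist[j][k] <= eps]
            if len(neighbors_j) >= min_samples:
                queue.extend(neighbors_j)  # j is also a core point: expand
    return labels


def cluster_dsds(samples, eps=0.15, min_samples=3):
    """Hypothetical helper: cluster local size distributions by pairwise KS distance."""
    dist = [[ks_distance(a, b) for b in samples] for a in samples]
    return dbscan_precomputed(dist, eps, min_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic local DSDs: 400 diameters each, drawn from two gamma
    # populations (shape and scale chosen arbitrarily for illustration).
    samples = ([[rng.gammavariate(9.0, 1.5) for _ in range(400)] for _ in range(6)]
               + [[rng.gammavariate(4.0, 5.0) for _ in range(400)] for _ in range(6)])
    print(cluster_dsds(samples))
```

Off-the-shelf equivalents exist: scipy.stats.ks_2samp returns the KS statistic together with a p-value, and sklearn.cluster.DBSCAN with metric="precomputed" clusters a precomputed distance matrix. A hypothesis-testing variant could instead link two distributions only when the KS p-value exceeds a chosen significance level; the sketch does not attempt to reproduce how the paper maps KS results into its abstract space.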