Classification of Solar Flares using Data Analysis and Clustering of
Active Regions
Abstract
We devised a new data analysis technique to identify the threat level of
solar active regions by processing a combined data set of magnetic field
parameters and flaring activity. The data set is composed of two
elements: a reduced factorization of SHARP parameters of the active
regions, and information about the flaring activity at the time of
measurement of the SHARP parameters. Machine learning is used to reduce
the data and to subsequently classify the active regions. For this
classification we used both supervised and unsupervised methods. The
following processing steps are applied to reduce and enhance the SHARP
data: outlier detection, redundancy elimination with common factor
analysis, addition of sparsity with autoencoders, and construction of a
balanced data set with under- and over-sampling. The supervised method,
K-nearest neighbors, performs well on the strong X- and M-flares, with
true skill statistic (TSS) scores of 0.94 and 0.75, respectively. As
unsupervised methods we used two clustering models: K-means and Gaussian
Mixture Models.
We find that both techniques are able to distinguish non-flaring and
flaring active regions. Moreover, K-means is able to distinguish
C-/M-flaring active regions from X-flaring active regions. When
processing new observations an unsupervised method is more useful, since
the type of flare will not be available as a label. We therefore
conclude that K-means provides the most promising results.
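
For readers unfamiliar with the TSS quoted above, the following is a
minimal sketch of how the score is commonly computed for a binary
flare/no-flare classification: the true positive rate minus the false
positive rate. The function name and the toy labels are illustrative
assumptions, not taken from the analysis itself.

```python
def true_skill_statistic(y_true, y_pred):
    """Return TSS = TP / (TP + FN) - FP / (FP + TN) for binary labels.

    y_true and y_pred are equal-length sequences of 0/1 (or False/True),
    where 1 marks a flaring sample. Illustrative helper only.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1          # flare correctly predicted
        elif truth and not pred:
            fn += 1          # flare missed
        elif not truth and pred:
            fp += 1          # false alarm
        else:
            tn += 1          # quiet sample correctly predicted
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs at least one positive and one negative sample")
    return tp / (tp + fn) - fp / (fp + tn)


# Toy example (made-up labels): 4 flaring and 4 non-flaring samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tss = true_skill_statistic(y_true, y_pred)
```

The score ranges from -1 to 1; a classifier that always predicts the
same class scores 0 and a perfect classifier scores 1. Because it is
built from two rates, each normalized within its own class, the score
does not depend on the ratio of flaring to non-flaring samples.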