
Classification of Solar Flares using Data Analysis and Clustering of Active Regions
  • Hanne Baeke,
  • Jorge Amaya,
  • Giovanni Lapenta
Hanne Baeke
KU Leuven

Corresponding Author: hanne.baeke@kuleuven.be

Jorge Amaya
Giovanni Lapenta
Univ Leuven, KU Leuven, Dept Wiskunde


We devised a new data analysis technique to identify the threat level of solar active regions by processing a combined data set of magnetic field parameters and flaring activity. The data set is composed of two elements: a reduced factorization of the SHARP parameters of the active regions, and information about the flaring activity at the time the SHARP parameters were measured. Machine learning is used first to reduce the data and then to classify the active regions, with both supervised and unsupervised methods. The following processing steps are applied to reduce and enhance the SHARP data: outlier detection, redundancy elimination with common factor analysis, addition of sparsity with autoencoders, and construction of a balanced data set with under- and over-sampling. The supervised method, K-nearest neighbors, produces very good results on the strong X- and M-flares, with TSS scores of 0.94 and 0.75, respectively. As unsupervised methods we used two clustering models, K-means and Gaussian Mixture Models. We find that both techniques are able to distinguish non-flaring from flaring active regions. Moreover, K-means is able to distinguish C-/M-flaring active regions from X-flaring active regions. When processing new data, an unsupervised method is more useful, since the type of flare will not be available. Therefore, we conclude that K-means provides the most promising results.
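The abstract reports TSS scores without defining the metric. The sketch below assumes TSS means the true skill statistic in its usual form, TP/(TP+FN) - FP/(FP+TN), computed on a one-vs-rest basis per flare class. The function name `true_skill_statistic` and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
def true_skill_statistic(y_true, y_pred):
    """Return TSS = TP/(TP+FN) - FP/(FP+TN) for binary labels.

    Assumption: 1 marks an active region that produced the flare class
    of interest, 0 marks every other region (one-vs-rest).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs at least one positive and one negative sample")
    return tp / (tp + fn) - fp / (fp + tn)


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
score = true_skill_statistic(y_true, y_pred)
print(f"TSS = {score:.3f}")
```

Because TSS is the hit rate minus the false-alarm rate, it ranges from -1 to 1 and does not depend on the ratio of flaring to non-flaring samples, which is why it is commonly preferred for strongly imbalanced flare data.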
29 Jun 2023: Submitted to ESS Open Archive
09 Jul 2023: Published in ESS Open Archive