Open issues in statistical forecasting of solar proton events: a Machine
Learning perspective
Abstract
Several techniques have been developed in the last two decades to
forecast the occurrence of Solar Proton Events (SPEs), mainly based on
the statistical association between the $>$10 MeV proton
flux and precursor parameters. The Empirical model for Solar Proton
Events Real Time Alert (ESPERTA, Laurenza et al., 2009) provides reasonably
good and timely predictions of SPEs after the occurrence of
$\geq$M2 X-ray bursts, using as input parameters the
flare heliolongitude, the soft X-ray fluence, and the $\sim$1
MHz radio fluence. Here, we reinterpret the ESPERTA model in the
framework of machine learning and perform a cross-validation, which yields
comparable performance. Moreover, we find that applying a cut-off
on the heliolongitude of $\geq$M2 flares reduces the False Alarm
Rate (FAR). The cut-off is set at E20°, where the cumulative
distribution of $\geq$M2 flares associated with SPEs
shows a break that reflects the poor magnetic connection between the
Earth and eastern-hemisphere flares. The best performance is obtained by
using the SMOTE algorithm, which gives a probability of detection of 0.83
and a FAR of 0.39. Nevertheless, we demonstrate that a sizeable FAR in
the predictions is a natural consequence of the sample base rates. From
a Bayesian point of view, we find that the FAR explicitly contains the
prior knowledge about the class distributions. This is a critical issue
for any statistical approach: model validation must preserve the class
distributions within the training and test datasets.
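
The dependence of the FAR on the base rate can be illustrated with a minimal sketch. It assumes that the FAR is the fraction of issued alerts that are false, i.e. P(no SPE | alert), and expands that quantity with Bayes' theorem. The function name, the false-positive rate `pofd = 0.05`, and the list of base rates are hypothetical values chosen for illustration; only the probability of detection of 0.83 is taken from the text above.

```python
def false_alarm_ratio(pod, pofd, base_rate):
    """Return P(no event | alert) from Bayes' theorem.

    pod       : P(alert | event), the probability of detection
    pofd      : P(alert | no event), the false-positive rate (assumed value)
    base_rate : P(event), the prior fraction of flares followed by an SPE
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    true_alerts = pod * base_rate             # P(alert and event)
    false_alerts = pofd * (1.0 - base_rate)   # P(alert and no event)
    return false_alerts / (true_alerts + false_alerts)


# Hypothetical base rates: the classifier (pod, pofd) is held fixed,
# only the prior class distribution changes.
for base_rate in (0.5, 0.2, 0.1, 0.05):
    far = false_alarm_ratio(pod=0.83, pofd=0.05, base_rate=base_rate)
    print(f"base rate {base_rate:.2f} -> FAR {far:.2f}")
```

With the classifier held fixed, the FAR increases as events become rarer, which is why a test set whose class proportions differ from those of the real population would be expected to give a misleading FAR.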