Open issues in statistical forecasting of solar proton events: a Machine Learning perspective
  • Mirko Stumpo,
  • Simone Benella,
  • Monica LAURENZA,
  • Tommaso Alberti,
  • Giuseppe Consolini,
  • Maria Federica Marcucci
Mirko Stumpo
INAF-Istituto di Astrofisica e Planetologia Spaziali
Simone Benella
INAF-Istituto di Astrofisica e Planetologia Spaziali
Monica LAURENZA
INAF-Istituto di Astrofisica e Planetologia Spaziali

Corresponding Author: [email protected]

Tommaso Alberti
INAF-Istituto di Astrofisica e Planetologia Spaziali
Giuseppe Consolini
Istituto Nazionale di Astrofisica
Maria Federica Marcucci
INAF-Istituto di Astrofisica e Planetologia Spaziali

Abstract

Several techniques have been developed over the last two decades to forecast the occurrence of Solar Proton Events (SPEs), mainly based on the statistical association between the $>$10 MeV proton flux and precursor parameters. The Empirical model for Solar Proton Events Real Time Alert (ESPERTA, Laurenza et al., 2009) provides reasonably good and timely predictions of SPEs after the occurrence of $\geq$M2 X-ray bursts, using as input parameters the flare heliolongitude, the soft X-ray fluence, and the $\sim$1 MHz radio fluence. Here, we reinterpret the ESPERTA model in the framework of machine learning and perform a cross-validation, which yields a comparable performance. Moreover, we find that applying a cut-off on the heliolongitude of $\geq$M2 flares reduces the False Alarm Rate (FAR). The cut-off is set to E20°, where the cumulative distribution of $\geq$M2 flares associated with SPEs shows a break that reflects the poor magnetic connection between the Earth and eastern-hemisphere flares. The best performance is obtained by using the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, leading to a probability of detection of 0.83 and a FAR of 0.39. Nevertheless, we demonstrate that a substantial FAR in the predictions is a natural consequence of the sample base rates. From a Bayesian point of view, we find that the FAR explicitly contains the prior knowledge about the class distributions. This is a critical issue for any statistical approach, and it requires that model validation preserve the class distributions within the training and test datasets.
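
The dependence of the FAR on the base rates can be illustrated with a minimal sketch. It assumes that the FAR is defined as the fraction of issued alarms that are false, i.e. P(no SPE | alarm), a definition the abstract does not state explicitly. The function name, the probability of false detection (`pofd`) and the base-rate values below are hypothetical and chosen only for illustration; only the probability of detection of 0.83 is taken from the abstract.

```python
def false_alarm_ratio(pod, pofd, base_rate):
    """Return P(no event | alarm) from Bayes' theorem.

    pod       -- P(alarm | event), the probability of detection
    pofd      -- P(alarm | no event), the probability of false detection
    base_rate -- P(event), the prior fraction of flares followed by an SPE
    """
    hits = pod * base_rate                    # P(alarm and event)
    false_alarms = pofd * (1.0 - base_rate)   # P(alarm and no event)
    return false_alarms / (hits + false_alarms)


if __name__ == "__main__":
    pod = 0.83    # value quoted in the abstract
    pofd = 0.10   # hypothetical value, for illustration only
    # Hypothetical base rates: the classifier is held fixed while the
    # share of SPE-associated flares in the sample changes.
    for base_rate in (0.5, 0.2, 0.1, 0.05):
        far = false_alarm_ratio(pod, pofd, base_rate)
        print(f"base rate = {base_rate:.2f} -> FAR = {far:.2f}")
```

Under these assumptions, the same classifier (fixed `pod` and `pofd`) produces a larger FAR as the events become rarer, which is why a validation that alters the class proportions between training and test sets also alters the FAR it reports.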