Azim Ahmadzadeh and 3 more authors

Strong solar flares are rare events, which makes flare classification a rare-event problem. Solar energetic particle events are even rarer space weather events, with only a few instances recorded each year. With the unprecedented growth in the employment of Machine Learning algorithms for rare-event classification and forecasting problems, proper evaluation of rare-event models has become a necessary skill for domain experts. This task remains an outstanding challenge, as both the learning process and the metrics used for quantitative verification of models can easily obscure or skew a model's true performance and yield misleading, biased results. To help mitigate this effect, we introduce a bounded semimetric space that provides a generic representation for any deterministic performance verification metric. This space, named Contingency Space, can be easily visualized and sheds light on models' performance as well as on metrics' distinct behaviors. An arbitrary model's performance maps to a unique point in this space, which allows simultaneous comparison of multiple models under a given metric. Using this geometrical setting, we show the difference between a metric's interpretation of performance and the true performance of the model. From this perspective, models that are seemingly different but practically identical, or only marginally different, can be easily spotted. By tracking a learner's performance at each epoch, we can also compare different learners' learning paths, which provides a deeper understanding of the utilized algorithms and the challenges they face during learning. Moreover, in the Contingency Space, a given verification metric can be represented by a geometrical surface, which allows a visual comparison between different metrics—a task that, without this concept, could be done only by tedious algebraic comparison of the metrics' formulae.
Moreover, using such a surface, we can for the first time see and quantify the impact of data scarcity (intrinsic to rare-event problems) on different metrics. This extra knowledge provides the information needed to choose an appropriate metric for evaluating rare-event models.
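To make the geometric idea concrete, the sketch below maps a model's confusion matrix to a point in a two-dimensional space and treats metrics as surfaces over that space. The choice of axes here—false positive rate and true positive rate—and all function names are illustrative assumptions for this sketch, not necessarily the coordinates the paper uses for its Contingency Space.

```python
def confusion_to_point(tn, fp, fn, tp):
    """Map a confusion matrix to a point in a 2-D performance space.
    Assumed axes (an illustrative choice): (x, y) = (FPR, TPR)."""
    fpr = fp / (fp + tn)  # false positive rate
    tpr = tp / (tp + fn)  # true positive rate
    return fpr, tpr

def tss(fpr, tpr):
    """True Skill Statistic as a surface over the (FPR, TPR) plane.
    Note it does not depend on the class imbalance."""
    return tpr - fpr

def accuracy(fpr, tpr, pos_ratio):
    """Accuracy as a surface over the same plane; unlike TSS, its shape
    depends on the positive-class ratio P/(P+N) of the dataset."""
    return tpr * pos_ratio + (1 - fpr) * (1 - pos_ratio)

# Two models evaluated on datasets of very different sizes can land on
# the same point, i.e., they are practically identical under any metric:
model_a = confusion_to_point(tn=900, fp=100, fn=10, tp=90)
model_b = confusion_to_point(tn=9000, fp=1000, fn=100, tp=900)
print(model_a, model_b)            # both map to (0.1, 0.9)
print(tss(*model_a), tss(*model_b))  # identical skill: 0.8
```

Because each metric is just a function over this plane, comparing two metrics reduces to comparing two surfaces visually, rather than comparing their formulae algebraically.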

Azim Ahmadzadeh and 5 more authors

In spite of more than 20 years of substantial advances, solar flare prediction remains a largely outstanding problem, partly because of the scarcity of major flares. Effective flare prediction, if ever achieved, would help mitigate substantial projected economic damage, with long-term estimates of 1 to 2 trillion dollars for the US alone. Prediction could also help mitigate, or even prevent, serious health risks to astronauts exposed to flares' electromagnetic and particulate radiation. While many recent flare prediction studies have opted to employ Machine Learning techniques to better tackle the problem, a lack of sufficient understanding of how to properly treat the data often leads to overly optimistic results. We use the recently generated GSU solar flare benchmark dataset, called Space Weather ANalytics for Solar Flares (SWAN-SF), to show how a 'mediocre' forecast model can turn into an 'impressive' one by simply overlooking some basic practices in data mining and machine learning. The benchmark is a multivariate time series collection, extracted from magnetographic measurements in the solar photosphere, spanning over eight years of the Solar Dynamics Observatory Helioseismic and Magnetic Imager (SDO/HMI) era. We briefly explain the data collection process and the sampling and slicing of the time series, and then outline a series of experiments using machine learning models to illustrate common mistakes, fallacies, and pitfalls in forecasting rare events. We particularly elaborate on how and why imbalanced datasets impact models' performance, and how different under- or over-sampling methodologies and weighting practices can produce models that appear accurate but are in practice weak. In conclusion, we aim to draw attention to the impact of these practices on flare forecasting models and to how models can be trained by accentuating statistical robustness over a relative accuracy in prediction.
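The imbalance pitfall described above can be demonstrated with a toy calculation. The numbers below are hypothetical (not from SWAN-SF): a rare-event test set with 1% positives, where a trivial "always quiet" model scores high accuracy but zero skill, while a genuinely skillful model scores lower accuracy. This is a minimal sketch of why accuracy alone is misleading for rare events.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, True Skill Statistic) from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    tss = tp / (tp + fn) - fp / (fp + tn)
    return acc, tss

# Hypothetical rare-event test set: 10 flaring, 990 quiet instances.

# An 'always quiet' model: impressive accuracy, no skill at all.
acc_trivial, tss_trivial = scores(tp=0, fp=0, fn=10, tn=990)
print(acc_trivial, tss_trivial)   # 0.99 accuracy, 0.0 TSS

# A model with real skill (90% recall, 10% false alarm rate):
# lower accuracy, yet clearly the better forecaster by TSS.
acc_skill, tss_skill = scores(tp=9, fp=99, fn=1, tn=891)
print(acc_skill, tss_skill)       # 0.90 accuracy, 0.8 TSS
```

Skill scores such as TSS are insensitive to the class ratio, which is why they are commonly preferred over accuracy when verifying rare-event forecast models.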