Abstract
One of the most important steps in any AI/ML application is the
pre-processing of the data. The objective of this step is to project the
original data onto a new basis, or into a new latent space, where the
different features of the problem are comparable and where their
distribution covers a large range of values. Using the data in its
natural basis can lead to under-performing AI/ML models. While almost
all papers in our domain are careful to normalize or standardize the
data, simple linear PCA transformations are used less often, and more
complex non-linear projections into latent spaces are rarer still. Here
we show how our research team uses autoencoder neural networks to
perform non-linear
transformations of images, simulations and time-series used in
heliophysical applications. Autoencoder transformations can parametrize
any type of data by projecting it onto a latent space of higher or
lower dimension. In these latent spaces the transformed data often has
better statistical properties, which can improve the subsequent AI/ML
modeling. In addition, autoencoders can be used as generative
techniques, i.e. to produce “artificial”
or “synthetic” data. We will present three particular examples of the
use of autoencoders: 1) parametrization of solar wind observations using
standard feed-forward autoencoders, 2) parametrization of magnetosphere
simulations using convolutional autoencoders, and 3) parametrization and
generation of solar active regions using variational convolutional
autoencoders. We will show how these parametrizations can then be used
for AI/ML classification and forecasting. This project has received
funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 776262 (AIDA).
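The feed-forward parametrization described above can be illustrated with a minimal sketch. It assumes only NumPy and uses a single tanh hidden layer trained by full-batch gradient descent on synthetic data. It is not the team's actual architecture, data or training setup; the names `standardize` and `TinyAutoencoder`, the layer sizes and the learning rate are hypothetical, illustrative choices. The sketch first standardizes the features, as most studies already do, and then learns a 2-D non-linear latent representation whose coordinates could be passed to a downstream classifier or forecaster.

```python
import numpy as np


def standardize(x):
    """Z-score each feature (column) so that all features are comparable."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unscaled
    return (x - mu) / sigma, mu, sigma


class TinyAutoencoder:
    """Hypothetical one-hidden-layer feed-forward autoencoder
    (tanh encoder, linear decoder), for illustration only."""

    def __init__(self, n_in, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_latent))
        self.b1 = np.zeros(n_latent)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_latent), (n_latent, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        # Non-linear projection onto the latent space.
        return np.tanh(x @ self.W1 + self.b1)

    def decode(self, z):
        # Linear map from the latent space back to the input space.
        return z @ self.W2 + self.b2

    def loss(self, x):
        # Mean squared reconstruction error over all entries.
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))

    def fit(self, x, lr=0.1, epochs=2000):
        # Full-batch gradient descent with hand-derived backpropagation.
        for _ in range(epochs):
            z = self.encode(x)
            x_hat = self.decode(z)
            d_xhat = 2.0 * (x_hat - x) / x.size  # dLoss/dx_hat
            d_pre = (d_xhat @ self.W2.T) * (1.0 - z ** 2)  # back through tanh
            self.W2 -= lr * (z.T @ d_xhat)
            self.b2 -= lr * d_xhat.sum(axis=0)
            self.W1 -= lr * (x.T @ d_pre)
            self.b1 -= lr * d_pre.sum(axis=0)
        return self


# Synthetic stand-in data (not real observations):
# 6 observed features driven by 2 hidden factors.
rng = np.random.default_rng(42)
t = rng.uniform(-1.0, 1.0, size=(200, 2))
x = np.column_stack([
    t[:, 0], t[:, 1], t[:, 0] ** 2, t[:, 0] * t[:, 1],
    np.sin(3.0 * t[:, 1]), t[:, 0] + 0.05 * rng.normal(size=200),
])

x_std, mu, sigma = standardize(x)
ae = TinyAutoencoder(n_in=6, n_latent=2)
loss_before = ae.loss(x_std)
ae.fit(x_std)
loss_after = ae.loss(x_std)
latent = ae.encode(x_std)  # 2-D parametrization usable as ML features
print(f"reconstruction MSE: {loss_before:.3f} -> {loss_after:.3f}")
print("latent shape:", latent.shape)
```

Replacing the dense layers with convolutions, or making the encoder output the parameters of a latent distribution that is then sampled, leads toward the convolutional and variational variants mentioned above; in practice a deep learning framework would compute the gradients instead of the hand-written backpropagation used here.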