Evaluation
The evaluation criteria are a crucial aspect of any benchmark dataset:
they need to be concretely defined and to accurately reflect the objectives
of the machine learning task. Ideally, the criteria are also simple to
implement, so that they can serve as the target of any loss function
used to train emulators. The spatial characteristics of
the outputs in this task also need to be considered. As a primary metric
we choose the area-weighted root-mean-square error (RMSE), calculated
as follows: