Evaluation

The evaluation criteria are a crucial aspect of any benchmark dataset: they need to be concretely defined and to accurately reflect the objectives of the machine learning task. Ideally, the criteria are also simple to implement, so that they can be used as a target in any loss function used to train emulators. The spatial characteristics of the outputs in this task also need to be considered. As a primary metric we choose the area-weighted root-mean-square error (RMSE), calculated as follows: