As climate modellers prepare their code for kilometre-scale global simulations, the computationally demanding radiative transfer parameterization is a prime candidate for machine learning (ML) emulation. Because of the computational demands, many weather centres use a reduced spatial grid and reduced temporal frequency for radiative transfer calculations in their forecast models. This strategy is known to affect forecast quality, which further motivates the use of ML-based radiative transfer parameterizations. This paper contributes to the discussion on how to incorporate physical constraints into an ML-based radiative parameterization, and how different neural network (NN) designs and output normalisation affect prediction performance. A random forest (RF) is used as a baseline method, with the European Centre for Medium-Range Weather Forecasts (ECMWF) model ecRad, the operational radiation scheme in the Icosahedral Nonhydrostatic Weather and Climate Model (ICON), used for training. Surprisingly, the RF is not affected by the top-of-atmosphere (TOA) bias found in all NNs tested (e.g., MLP, CNN, UNet, RNN) in this and previously published studies. At lower atmospheric levels, the RF is able to compete with all NNs tested, but its memory requirements quickly become prohibitive. For a fixed memory size, most NNs outperform the RF except at TOA. For the best emulator, we use a recurrent neural network architecture that closely imitates the physical process it emulates. We additionally normalize the shortwave and longwave fluxes to reduce their dependence on the solar angle and surface temperature, respectively. Finally, we train the model with an additional heating-rate penalty in the loss function.
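The flux normalisation mentioned above can be sketched as follows. A common choice in the radiative-transfer emulation literature is to scale shortwave fluxes by the incident top-of-atmosphere solar flux (total solar irradiance times the cosine of the solar zenith angle) and longwave fluxes by the surface blackbody emission σT_s⁴; the function names, constants, and array layout here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant [W m^-2 K^-4]
S0 = 1361.0             # total solar irradiance [W m^-2] (assumed value)

def normalize_sw(sw_flux, cos_zenith, eps=1e-6):
    """Scale shortwave fluxes (batch, levels) by the incoming TOA solar
    flux S0 * cos(zenith), per batch sample; eps guards low-sun cases."""
    return sw_flux / (S0 * np.maximum(cos_zenith, eps)[..., None])

def normalize_lw(lw_flux, surface_temp):
    """Scale longwave fluxes (batch, levels) by the surface blackbody
    emission sigma * T_s^4, per batch sample."""
    return lw_flux / (SIGMA * surface_temp[..., None] ** 4)
```

After this scaling, target fluxes are of order one across very different sun angles and surface temperatures, which typically makes NN training better conditioned.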
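The heating-rate penalty can likewise be sketched. Heating rates follow from the vertical divergence of the net flux, dT/dt = -(g/c_p) ∂F/∂p, so a penalty term compares heating rates derived from predicted and true flux profiles by finite differences. The function names, the penalty weight, and the plain-NumPy formulation are assumptions for illustration, not the paper's training code.

```python
import numpy as np

G = 9.81     # gravitational acceleration [m s^-2]
CP = 1004.0  # specific heat of dry air at constant pressure [J kg^-1 K^-1]

def heating_rate(net_flux, pressure):
    """Heating rate [K s^-1] from vertical flux divergence.

    net_flux : (batch, levels) net radiative flux [W m^-2] at interfaces
    pressure : (batch, levels) pressure [Pa] at the same interfaces
    """
    dF = np.diff(net_flux, axis=-1)
    dp = np.diff(pressure, axis=-1)
    return -(G / CP) * dF / dp

def loss_with_heating_penalty(pred_flux, true_flux, pressure, weight=1.0):
    """MSE on fluxes plus a weighted MSE on the derived heating rates."""
    flux_mse = np.mean((pred_flux - true_flux) ** 2)
    hr_mse = np.mean((heating_rate(pred_flux, pressure)
                      - heating_rate(true_flux, pressure)) ** 2)
    return flux_mse + weight * hr_mse
```

Penalising heating rates directly discourages small, vertically correlated flux errors that would otherwise produce large errors in the quantity the dynamical core actually consumes.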