Reliable forecasting models are necessary to mitigate the risks posed by solar flares to human technology. This study introduces a novel deep learning forecasting approach while emphasizing the need for performance evaluation methods tailored to better highlight the limitations of current models. In particular, we show that models reaching state-of-the-art performance on traditional metrics have explanatory power similar to that of no-skill persistence models and, notably, struggle to forecast changes in activity significantly better than random guessing. We also discuss other shortcomings of traditional evaluation metrics such as the True Skill Statistic (TSS), which we prove, for the first time, to be mathematically dependent on the class balance. We introduce Patch-Distributed-CNNs (P-CNN), which perform full-disk forecasts while also providing event probabilities for solar sub-regions and predicted flare positions. This new framework offers information similar to that of Active Region (AR)-based forecasting models while bypassing the problem of unrecorded and misattributed flares, which is detrimental to machine learning training. As a result, the model operates independently of prior feature extraction and AR detection, offering promising operational utility with minimal external dependencies. Finally, we propose a method for constructing balanced and independent cross-validation folds for full-disk models. Models combining SDO/AIA EUV images as inputs perform better than those using SDO/HMI photospheric magnetograms, reaching a TSS of 0.74 for the C+ model and 0.62 for the M+ model.
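For reference, the standard TSS definition is TSS = TP/(TP + FN) - FP/(FP + TN), i.e. the true positive rate minus the false positive rate. The minimal Python sketch below computes it for a persistence baseline, assuming binary per-time-step flare labels (1 = flare, 0 = no flare). The helper names (`confusion_counts`, `true_skill_statistic`, `persistence_forecast`) and the toy label sequence are hypothetical illustrations, not the study's code or data, and the sketch does not reproduce the class-balance proof mentioned above.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels (1 = flare, 0 = no flare)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1  # flare forecast and observed
        elif t:
            fn += 1  # flare observed but missed
        elif p:
            fp += 1  # flare forecast but not observed
        else:
            tn += 1  # quiet forecast and observed
    return tp, fn, fp, tn


def true_skill_statistic(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TSS = TPR - FPR = TP/(TP+FN) - FP/(FP+TN)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS is undefined without both positive and negative samples")
    return tp / (tp + fn) - fp / (fp + tn)


def persistence_forecast(labels: Sequence[int]) -> List[int]:
    """Hypothetical no-skill baseline: the forecast for step t+1 is the label observed at step t."""
    return list(labels[:-1])


# Toy label sequence (illustrative only, not data from the study).
labels = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
y_true = labels[1:]                    # targets start at the second step
y_pred = persistence_forecast(labels)  # each forecast repeats the previous label
print(confusion_counts(y_true, y_pred))                 # (3, 2, 1, 3)
print(round(true_skill_statistic(y_true, y_pred), 2))  # 0.35
```

Because flaring activity tends to persist across consecutive steps, even this baseline, which can never anticipate a change in activity, scores a positive TSS on the toy sequence; every one of its errors occurs at a transition between quiet and active periods.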