Weather stations can represent local weather variability and extremes more reliably than gridded products and are therefore better suited for local climate impact applications such as calculating the Fire Weather Index (FWI), a multivariate index for wildfire danger assessment. However, prediction at multiple sites poses the challenge of preserving spatial consistency across locations, which requires a suitable multi-site approach. This study evaluates the potential of Convolutional Neural Networks (CNNs) for statistical downscaling (SD) of FWI predictions across the Iberian Peninsula. We compare our CNN-Multi-Gaussian (CNN-MG) model against Generalized Linear Models (GLMs) and a benchmark single-site CNN approach. Our evaluation focuses on predictive accuracy, distributional congruence, spatial coherence and the reproduction of extreme events, using daily FWI data from 29 locations in Spain. The CNN-MG model, which integrates the covariance structure of the predictands, outperformed the other methods in representing FWI distributions at both single-site and multi-site scales. Moreover, our model offers greater physical interpretability through eXplainable Artificial Intelligence (XAI) techniques while remaining simple and easy to train.