Detecting deformation associated with (a)seismic events down to low magnitudes in GNSS time series remains a challenging issue. The considerable amount of noise in the data makes it difficult to reveal patterns of small ground deformation. Traditional analyses and methodologies can effectively retrieve the deformation associated with medium- to large-magnitude events. However, the automatic detection and characterization of such events is still a complex task, because traditionally employed methods often separate the time series analysis from the source characterization. Here we propose a first end-to-end framework to characterize seismic sources from geodetic data by means of deep learning, which can be an efficient alternative to the traditional workflow and may surpass its performance. We exploit three different geodetic data representations to leverage the intrinsic spatio-temporal structure of the GNSS noise and of the target signal associated with (slow) earthquake deformation: time series, images, and image time series, accounting for the temporal, spatial, and spatio-temporal domains, respectively. We then design and develop a specific deep learning model for each representation. We analyze the performance of the tested models on both synthetic and real data from northern Japan, showing that image time series of geodetic deformation are an effective representation for embedding the spatio-temporal evolution of the signal, with the associated deep learning model outperforming the other two. Jointly accounting for the spatial and temporal evolution may therefore be the key to effectively detecting and characterizing fast or slow earthquakes.
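
To make the three data representations concrete, the sketch below pairs each one with a minimal deep learning model: a 1D convolutional encoder for station time series (temporal domain), a 2D convolutional encoder for gridded displacement images (spatial domain), and a 3D convolutional encoder for image time series (spatio-temporal domain). This is a hypothetical PyTorch illustration only; the station count, grid size, channel layout, layer choices, and the four-parameter source output are assumptions made for the example and do not reproduce the models developed in the study.

```python
import torch
import torch.nn as nn

# Hypothetical data dimensions and output size (not from the study).
N_STATIONS, N_TIMES, GRID = 60, 128, 32   # stations, time steps, grid cells per side
N_SOURCE_PARAMS = 4                       # e.g. location, depth, magnitude (assumed)


class TimeSeriesModel(nn.Module):
    """Temporal domain: stacked E/N/U displacement time series per station."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3 * N_STATIONS, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, N_SOURCE_PARAMS)

    def forward(self, x):                  # x: (batch, 3 * stations, times)
        return self.head(self.encoder(x))


class ImageModel(nn.Module):
    """Spatial domain: displacement components interpolated on a regular grid."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, N_SOURCE_PARAMS)

    def forward(self, x):                  # x: (batch, 3, grid, grid)
        return self.head(self.encoder(x))


class ImageTimeSeriesModel(nn.Module):
    """Spatio-temporal domain: a sequence of displacement images."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, N_SOURCE_PARAMS)

    def forward(self, x):                  # x: (batch, 3, times, grid, grid)
        return self.head(self.encoder(x))


if __name__ == "__main__":
    # Random tensors stand in for GNSS-derived inputs; each model maps its
    # representation to the same set of source parameters.
    ts = torch.randn(2, 3 * N_STATIONS, N_TIMES)
    img = torch.randn(2, 3, GRID, GRID)
    img_ts = torch.randn(2, 3, N_TIMES, GRID, GRID)
    print(TimeSeriesModel()(ts).shape)           # torch.Size([2, 4])
    print(ImageModel()(img).shape)               # torch.Size([2, 4])
    print(ImageTimeSeriesModel()(img_ts).shape)  # torch.Size([2, 4])
```

The design choice the sketch highlights is that only the third model convolves jointly over space and time, which is one plausible way to read the finding that the image time series representation outperforms the other two.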