With the growth of Global Navigation Satellite System (GNSS) observations, the demand for objective and automated detection of slow slip event (SSE) signals hidden in displacement time series is increasing. However, machine learning has rarely been applied to GNSS time series, primarily because SSEs are far less numerous than ordinary earthquakes, which makes it difficult to prepare training data. In this study, we performed single-site SSE detection with a machine learning model trained on real GNSS observations from southwest Japan, so that the complicated spatiotemporal characteristics of observational noise are considered directly. Based on a catalog of 284 short-term SSEs, approximately 16,000 time series containing SSE signals or noise were extracted as training data. The signal data predominantly had amplitudes of 1.5–2.0 mm. We then adopted a model architecture following Generalized Phase Detection, which was originally proposed for seismic wave detection, and obtained an accuracy of 97–98% on the test data. As expected, signals with smaller amplitudes produced false negatives more frequently. The true-positive ratio was highest in the western Shikoku region, where signal amplitudes are largest, whereas the false-positive ratio showed a nearly random spatial distribution. False detections appear to be caused primarily by bends in the time series that resemble the onset or termination of an SSE signal. The analysis presented in this study is expected to enable a straightforward evaluation of how noise characteristics influence detection performance and to clarify the key issues for improving detection precision.
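
For readers who want a concrete picture of the classification setup, the following is a minimal sketch of a Generalized Phase Detection-style 1-D convolutional classifier applied to fixed-length single-site displacement windows labeled as SSE signal or noise. It is not the authors' model: the window length, channel count, layer sizes, optimizer, and training data shown here are illustrative assumptions only.

```python
# Illustrative sketch (assumed parameters, not the study's architecture) of a
# GPD-style 1-D CNN that classifies single-site GNSS displacement windows as
# "contains SSE signal" (1) or "noise only" (0).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW_LEN = 60   # assumed number of daily samples per extracted window
N_CHANNELS = 2    # assumed displacement components (e.g., east and north)

def build_gpd_style_model():
    model = models.Sequential([
        layers.Input(shape=(WINDOW_LEN, N_CHANNELS)),
        # Convolution + pooling blocks extract temporal features from the window
        layers.Conv1D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        # Fully connected layers map the features to a binary SSE/noise decision
        layers.Dense(128, activation="relu"),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage with random placeholder arrays standing in for the
# extracted training windows and their signal/noise labels.
X_train = np.random.randn(1000, WINDOW_LEN, N_CHANNELS).astype("float32")
y_train = np.random.randint(0, 2, size=1000)
model = build_gpd_style_model()
model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=0)
```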