Machine learning has been widely applied in numerical weather prediction, but incorporating new observation sites into models trained on stations with long historical records remains a challenge. Here we propose a post-processing framework that combines three machine learning methods: station clustering with K-means, temperature prediction based on decision trees, and transfer learning for newly built stations. We apply this framework to post-process forecasts of surface air temperature at 301 weather stations in China. The results show significant reductions, ranging from 39.4% to 20.0%, in the root-mean-square error of operational forecasts at lead times of up to 7 days. Moreover, using transfer learning to incorporate new stations improves forecasts at the new site by 36.4% after only one year of data collection. These results demonstrate the potential of clustering and transfer learning to strengthen existing applications of machine learning in weather forecasting.