Upscaling flux tower measurements with machine learning (ML) algorithms is an essential approach for estimating net ecosystem CO₂ exchange (NEE) at large scales, but existing ML upscaling methods struggle to capture NEE interannual variations (IAVs), which may be related to lagged effects. Because they can characterize temporal memory effects, Long Short-Term Memory (LSTM) networks are expected to help address this problem. Here we explored the potential of LSTM for predicting NEE across various ecosystems using flux tower data from 82 sites in North America. The LSTM model with differentiated plant functional types (PFTs) explained 79.19% (R² = 0.79) of the monthly variations in NEE within the testing set, with RMSE and MAE values of 0.89 and 0.57 g C m⁻² d⁻¹, respectively (r = 0.89, p < 0.001). Moreover, the LSTM model performed robustly in predicting cross-site variability: 67.19% of the sites that could be predicted by both LSTM models (with and without differentiated PFTs) showed improved predictive ability. Most importantly, the IAV of predicted NEE was highly correlated with that of the flux observations (r = 0.81, p < 0.001), clearly outperforming the random forest model (r = -0.21, p = 0.011). Across all nine PFTs, solar-induced chlorophyll fluorescence, downward shortwave radiation, and leaf area index were the most important variables for explaining NEE variations, collectively accounting for approximately 54.01%. This study highlights the great potential of LSTM for improving carbon flux upscaling with multi-source remote sensing data.
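The evaluation statistics quoted above (R², RMSE, MAE, Pearson r, and the IAV correlation) can be illustrated with a minimal Python sketch. It is not the study's code. The function names `evaluation_metrics` and `annual_anomalies` are hypothetical, and the IAV definition used here (annual means of monthly NEE minus the multi-year mean at one site) is an assumption, since the abstract does not state how IAV was computed.

```python
import numpy as np


def evaluation_metrics(obs, pred):
    """Return R2, RMSE, MAE and Pearson r for observed vs. predicted NEE.

    Hypothetical helper; R2 is taken as the coefficient of determination
    (1 - SS_res / SS_tot), which is an assumption about the paper's usage.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = obs - pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "r": float(np.corrcoef(obs, pred)[0, 1]),
    }


def annual_anomalies(monthly, years):
    """Return (years, anomalies) for one site's monthly series.

    Assumed IAV definition: mean of the monthly values in each year,
    minus the mean of those annual means.
    """
    monthly = np.asarray(monthly, dtype=float)
    years = np.asarray(years)
    unique_years = np.unique(years)
    annual_means = np.array([monthly[years == y].mean() for y in unique_years])
    return unique_years, annual_means - annual_means.mean()


if __name__ == "__main__":
    # Synthetic data for illustration only; not flux tower observations.
    rng = np.random.default_rng(0)
    years = np.repeat(np.arange(2001, 2011), 12)
    obs = rng.normal(loc=-1.0, scale=1.5, size=years.size)
    pred = obs + rng.normal(scale=0.8, size=years.size)

    print(evaluation_metrics(obs, pred))

    _, iav_obs = annual_anomalies(obs, years)
    _, iav_pred = annual_anomalies(pred, years)
    print("IAV correlation:", float(np.corrcoef(iav_obs, iav_pred)[0, 1]))
```

With real data, `obs` and `pred` would hold the monthly testing-set NEE in g C m⁻² d⁻¹, and the anomaly step would be applied per site before the observed and predicted anomalies are correlated.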