Lingtong Meng et al.

Accurately estimating the spatial distribution of soil organic carbon (SOC) in coal mining regions is crucial for restoring soil quality and for understanding the global carbon cycle. The mechanisms that govern SOC in coal mining areas are complex. As a result, few studies have dynamically mapped SOC content at high precision before and after subsidence and reclamation in mining areas with high groundwater levels. In this study, we combined four machine learning algorithms, namely Cubist, Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), with a model fusion technique to analyze SOC content across three land types that represent different subsidence stages in high-groundwater-level mining areas: control land (CL), subsided land (SL), and reclaimed land (RL). By integrating high-resolution imagery from China's GF-1 satellite, we produced a predictive map of surface SOC content. We also used an optimal parameter-based geographical detector (OPGD) model to quantify the key factors driving the spatial variation of SOC within the study area. The fusion model combining RF and Cubist outperformed the other models, achieving a coefficient of determination (R²) of 0.73, a root mean square error (RMSE) of 0.73 g/kg, and a ratio of performance to interquartile distance (RPIQ) of 2.50. The predictive map shows that high SOC concentrations in the mining area occur predominantly on reclaimed land. Organism-related factors had the strongest explanatory power for SOC content in these areas and formed the most important group of predictors in model development. This cost-effective and efficient approach can support SOC research and inform soil remediation strategies for mining-affected land.
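The abstract names the fusion step, three accuracy metrics, and the OPGD model without giving formulas. The sketch below shows one common way to compute each, under explicit assumptions: fusion is assumed to be a weighted average of two models' predictions (the actual fusion rule is not stated); R² is taken as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation); RPIQ is (Q3 − Q1)/RMSE using inclusive (linear-interpolation) quartiles; and the geographical detector q-statistic is q = 1 − Σ_h N_h σ_h² / (N σ²). The OPGD search here tries only equal-interval discretization over a range of class counts, whereas OPGD implementations typically also compare quantile, natural-breaks, and other schemes. All function names (`fuse`, `r2`, `rmse`, `rpiq`, `q_statistic`, `equal_interval`, `best_equal_interval`) are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def fuse(pred_a: Sequence[float], pred_b: Sequence[float], w: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted average of two models' predictions."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the observations (e.g. g/kg)."""
    return math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def rpiq(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Interquartile range of the observations divided by RMSE."""
    q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
    return (q3 - q1) / rmse(obs, pred)


def q_statistic(y: Sequence[float], strata: Sequence[Hashable]) -> float:
    """Geodetector q: share of the variance of y explained by the strata."""
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    within = sum(len(g) * statistics.pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * statistics.pvariance(y))


def equal_interval(x: Sequence[float], k: int) -> List[int]:
    """Assign each value of a continuous factor to one of k equal-width classes."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) if width > 0 else 0 for v in x]


def best_equal_interval(
    y: Sequence[float], x: Sequence[float], ks: Sequence[int] = range(3, 8)
) -> Tuple[float, int]:
    """Simplified OPGD step: the class count k whose discretization maximizes q."""
    return max((q_statistic(y, equal_interval(x, k)), k) for k in ks)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = fuse([1.0, 2.0, 3.0, 4.0, 4.0], [2.0, 2.0, 3.0, 4.0, 5.0])
    print(pred, r2(obs, pred), rmse(obs, pred), rpiq(obs, pred))
    print(best_equal_interval([1, 1, 1, 5, 5], [0, 1, 2, 9, 10], ks=range(2, 4)))
```

In the usage example the fused prediction is `[1.5, 2.0, 3.0, 4.0, 4.5]`, which gives R² = 0.95, RMSE ≈ 0.316, and RPIQ ≈ 6.32. Because quartile conventions differ between tools, RPIQ values computed with another quantile method may not match exactly.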